What are the basic knowledge points of ZooKeeper

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the basic knowledge points of ZooKeeper: its core concepts, its characteristics, its design goals, the roles in a ZooKeeper cluster, and the ZAB protocol.

I. The concept of ZooKeeper

1. The basic concept of ZooKeeper

The design goal of ZooKeeper is to encapsulate complex, error-prone distributed consistency services into an efficient and reliable set of primitives, and to provide them to users through a series of easy-to-use interfaces.

ZooKeeper is a typical distributed data consistency solution. Based on ZooKeeper, distributed applications can implement functions such as data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.

One of the most common usage scenarios for ZooKeeper is to act as a registry for service producers and consumers. For example, ZooKeeper plays the role of the registry in Dubbo.

2. Important concepts of ZooKeeper

ZooKeeper itself is a distributed program: as long as more than half of the nodes survive, ZooKeeper can serve requests normally.

To ensure high availability, it is best to deploy ZooKeeper as a cluster, so that ZooKeeper itself remains available as long as a majority of the machines in the cluster are available.

ZooKeeper keeps data in memory, which ensures high throughput and low latency. However, memory limits how much data can be stored, and this limit is a further reason to keep the amount of data stored in each znode small.

ZooKeeper is high-performance, especially in applications where reads outnumber writes, because every write causes all servers to synchronize state.

ZooKeeper has the concept of temporary (ephemeral) nodes. A temporary node exists for as long as the client session that created it remains active, and it is deleted when that session ends. A persistent node, by contrast, stays in ZooKeeper once it has been created until it is explicitly removed.

In fact, the bottom layer of ZooKeeper provides only two functions: (1) managing (storing and reading) the data submitted by user programs; (2) providing a data-node monitoring (watch) service for user programs.
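The "more than half of the nodes survive" rule from the points above can be made concrete with a little arithmetic. The following is only a minimal sketch; the function names `quorum_size`, `has_quorum`, and `tolerated_failures` are made up for this illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is strictly more than half."""
    return ensemble_size // 2 + 1


def has_quorum(alive: int, ensemble_size: int) -> bool:
    """True if the surviving servers still form a majority."""
    return alive >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority still survives."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "servers: quorum =", quorum_size(n),
              ", tolerated failures =", tolerated_failures(n))
```

Under this rule a cluster of 4 machines tolerates no more failures than a cluster of 3, which is one reason odd cluster sizes are commonly chosen.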

3. Session

A Session is the session between a ZooKeeper server and a client. In ZooKeeper, a client connection is a long-lived TCP connection between the client and the server. When the client starts, it first establishes a TCP connection with a server, and the lifecycle of the client session begins with this first connection. Over this connection the client keeps the session alive through heartbeat detection, sends requests to the ZooKeeper server and receives responses, and also receives Watch event notifications from the server. The sessionTimeout value of a Session sets the timeout for the client session. When the client connection is broken for any reason, such as excessive server load, a network failure, or the client disconnecting on its own initiative, the previously created session remains valid as long as the client can reconnect to any server in the cluster within the time specified by sessionTimeout.

Before creating a session for a client, the server first assigns the client a sessionID. Because the sessionID is an important identifier of a ZooKeeper session, and many session-related mechanisms are based on it, the sessionID assigned by any server must be globally unique.
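To illustrate the sessionTimeout rule described above, here is a toy model of a session, written as a minimal sketch. The `Session` class, its fields, and the `touch` method are hypothetical names chosen for this example; in a real cluster the servers track session expiry, and the details differ.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: int
    session_timeout: float      # seconds; plays the role of sessionTimeout
    last_heard: float = 0.0     # time of the last heartbeat, request, or reconnect
    closed: bool = False

    def is_expired(self, now: float) -> bool:
        """A session is gone once it was closed or stayed silent for too long."""
        return self.closed or (now - self.last_heard) > self.session_timeout

    def touch(self, now: float) -> bool:
        """Record a heartbeat or a reconnect; returns False if it came too late."""
        if self.is_expired(now):
            self.closed = True
            return False
        self.last_heard = now
        return True


if __name__ == "__main__":
    s = Session(session_id=1, session_timeout=10.0)
    print(s.touch(5.0))    # heartbeat while connected
    print(s.touch(14.0))   # reconnect 9 seconds later, still inside the timeout
    print(s.touch(30.0))   # reconnect 16 seconds later, outside the timeout
```

Time is passed in explicitly so that the behaviour is easy to follow; a real client would rely on a clock instead.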

4. Znode

When we talk about "nodes" in a distributed system, we usually mean the individual machines that make up the cluster. In ZooKeeper, however, "nodes" fall into two categories: the first is the machines that make up the cluster, which we call machine nodes; the second is the data units in the data model, which we call data nodes, or ZNodes.

ZooKeeper stores all of its data in memory. The data model is a tree (the ZNode Tree), and each path separated by slashes (/) is a ZNode, for example /foo/path2. Each ZNode stores its own data content together with a series of attributes.

In ZooKeeper, ZNodes can be divided into two categories: persistent nodes and temporary nodes. A persistent node, once created, stays in ZooKeeper until it is explicitly removed. A temporary node is different: its lifecycle is bound to the client session, so once the client session ends, all temporary nodes created by that client are removed. In addition, ZooKeeper allows users to add a special attribute to a node: SEQUENTIAL. When a node marked with this attribute is created, ZooKeeper automatically appends an integer to its node name; this integer is an auto-incrementing number maintained by the parent node.
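The difference between persistent, temporary, and SEQUENTIAL nodes can be sketched with a toy in-memory tree. This is only an illustration under stated assumptions: `TinyTree` and its methods are invented for this example and do not reflect ZooKeeper's real data structures. The 10-digit zero-padded suffix mirrors how real ZooKeeper formats sequence numbers, which is a detail not covered in the text above.

```python
class TinyTree:
    """Toy znode tree: path -> data, plus bookkeeping for node types."""

    def __init__(self):
        self.nodes = {"/": b""}   # every existing path and its data
        self.owner = {}           # temporary path -> session id that created it
        self.counters = {}        # parent path -> next sequence number

    def create(self, path, data=b"", session_id=None,
               ephemeral=False, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if sequential:
            # The parent hands out an auto-incrementing number.
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data
        if ephemeral:
            self.owner[path] = session_id
        return path

    def close_session(self, session_id):
        """Remove every temporary node created by the given session."""
        for path in [p for p, s in self.owner.items() if s == session_id]:
            del self.nodes[path]
            del self.owner[path]


if __name__ == "__main__":
    tree = TinyTree()
    tree.create("/app")                                   # persistent
    print(tree.create("/app/worker-", session_id=7,
                      ephemeral=True, sequential=True))   # temporary + SEQUENTIAL
    tree.close_session(7)
    print(sorted(tree.nodes))                             # only persistent nodes remain
```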

5. Version

As mentioned earlier, each ZNode in ZooKeeper stores data, and for each ZNode, ZooKeeper maintains a data structure called Stat. Stat records three versions for the ZNode: version (the data version of the current ZNode), cversion (the version of the current ZNode's children), and aversion (the ACL version of the current ZNode).
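A common use of the data version is a conditional update: a write succeeds only if the znode still has the version the writer last read. The sketch below shows the idea with hypothetical classes (`Stat`, `Znode`, `BadVersion`) written for this example. Treating -1 as "skip the check" follows the convention of the real ZooKeeper client API, which is an addition to what the text above states.

```python
from dataclasses import dataclass


@dataclass
class Stat:
    version: int = 0    # incremented on each change to this znode's data
    cversion: int = 0   # incremented on each change to this znode's children
    aversion: int = 0   # incremented on each change to this znode's ACL


class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class Znode:
    def __init__(self, data: bytes = b""):
        self.data = data
        self.stat = Stat()

    def set_data(self, data: bytes, expected_version: int = -1) -> int:
        """Write only if expected_version matches; -1 skips the check."""
        if expected_version != -1 and expected_version != self.stat.version:
            raise BadVersion(
                f"expected {expected_version}, found {self.stat.version}")
        self.data = data
        self.stat.version += 1
        return self.stat.version


if __name__ == "__main__":
    node = Znode(b"v0")
    print(node.set_data(b"v1", expected_version=0))   # succeeds
    try:
        node.set_data(b"v2", expected_version=0)      # stale version
    except BadVersion as exc:
        print("rejected:", exc)
```

Two clients that both read version 0 cannot both overwrite the data silently: the second write is rejected and that client has to re-read first.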

6. Watcher

Watcher (event listener) is a very important feature of ZooKeeper. ZooKeeper allows users to register Watchers on specified nodes, and when certain events are triggered, the ZooKeeper server notifies the interested clients of those events. This mechanism is an important part of how ZooKeeper implements distributed coordination services.
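The register-then-notify flow can be sketched as a small registry of callbacks. This is a minimal illustration, not the server's implementation: `WatchRegistry`, `watch`, and `fire` are hypothetical names. The sketch also assumes the one-time-trigger behaviour of standard ZooKeeper watches (a watch is removed once it has fired), which is not spelled out in the text above.

```python
from collections import defaultdict


class WatchRegistry:
    def __init__(self):
        self._watchers = defaultdict(list)   # path -> registered callbacks

    def watch(self, path, callback):
        """Register interest in the next event on path."""
        self._watchers[path].append(callback)

    def fire(self, path, event_type):
        """Notify every watcher on path, then drop them (one-time trigger)."""
        callbacks = self._watchers.pop(path, [])
        for callback in callbacks:
            callback(event_type, path)
        return len(callbacks)


if __name__ == "__main__":
    registry = WatchRegistry()
    registry.watch("/config", lambda event, path: print(event, "on", path))
    registry.fire("/config", "NodeDataChanged")   # the callback runs
    registry.fire("/config", "NodeDataChanged")   # nothing runs: watch already used
```

A client that wants to keep observing a node therefore registers a new watch each time it handles a notification.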

7. ACL

ZooKeeper uses ACLs (Access Control Lists) for permission control, similar to the permission control of the UNIX file system. ZooKeeper defines the following five permissions: CREATE (create child nodes), READ (read a node's data and list its children), WRITE (update a node's data), DELETE (delete child nodes), and ADMIN (set a node's ACL).

In particular, note that both the CREATE and DELETE permissions control operations on child nodes.
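ZooKeeper's five permissions (CREATE, READ, WRITE, DELETE, ADMIN) are combined as bit flags, which the sketch below imitates. The numeric values follow the constants used by the real ZooKeeper implementation; the `allowed` helper and its operation names are hypothetical and exist only for this example.

```python
from enum import IntFlag


class Perm(IntFlag):
    READ = 1      # read the node's data and list its children
    WRITE = 2     # update the node's data
    CREATE = 4    # create child nodes
    DELETE = 8    # delete child nodes
    ADMIN = 16    # set the node's ACL


# Hypothetical operation names mapped to the permission each one needs.
REQUIRED = {
    "get_data": Perm.READ,
    "set_data": Perm.WRITE,
    "create_child": Perm.CREATE,
    "delete_child": Perm.DELETE,
    "set_acl": Perm.ADMIN,
}


def allowed(granted: Perm, operation: str) -> bool:
    """Check whether the granted permission bits cover the operation."""
    return bool(granted & REQUIRED[operation])


if __name__ == "__main__":
    granted = Perm.READ | Perm.CREATE
    print(allowed(granted, "create_child"))
    print(allowed(granted, "delete_child"))
```

Because CREATE and DELETE concern child nodes, a check like `allowed(granted, "delete_child")` would be evaluated against the permissions on the parent of the node being deleted.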

II. Characteristics of ZooKeeper

Sequential consistency: transaction requests from the same client will eventually be applied to ZooKeeper strictly in the order in which they were issued.

Atomicity: the processing results of all transaction requests are consistent on all machines in the entire cluster, that is, either all machines in the entire cluster successfully apply a transaction, or none of them are applied.

Single system image: no matter which ZooKeeper server the client connects to, it sees the same server-side data model.

Reliability: once a change request is applied, the result of the change is persisted until it is overwritten by the next change.

III. ZooKeeper design goal

3.1 Simple data model

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace, similar to a standard file system. The namespace consists of data registers, called znodes in ZooKeeper, which are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which means it can achieve high throughput and low latency.

3.2 Clusters can be built

To ensure high availability, it is best to deploy ZooKeeper as a cluster, so that ZooKeeper itself remains available as long as a majority of the machines in the cluster are available (that is, it can tolerate a certain number of machine failures). When using ZooKeeper, the client needs to know the list of cluster machines and uses the service by establishing a TCP connection with one machine in the cluster. The client uses this TCP connection to send requests, receive results, receive watch events, and send heartbeats. If the connection is broken unexpectedly, the client can connect to another machine.

The architecture diagram officially provided by ZooKeeper:

Each Server in the figure above represents a machine on which the ZooKeeper service is installed. The servers that make up the ZooKeeper service each keep the current server state in memory and communicate with one another. Data consistency among the servers in the cluster is maintained through the ZAB protocol (ZooKeeper Atomic Broadcast).

3.3 Sequential access

For each update request from a client, ZooKeeper assigns a globally unique, increasing number that reflects the order of all transaction operations. Applications can use this ordering to implement higher-level synchronization primitives. This number acts as a timestamp and is called the zxid (ZooKeeper Transaction Id).
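The zxid is a 64-bit number. In the real implementation its upper 32 bits hold the Leader epoch and its lower 32 bits hold a counter that increases with every transaction in that epoch; this layout is a detail added here and is not described in the text above. The helper names `make_zxid` and `split_zxid` are hypothetical, and the code is only a minimal sketch of the bit arithmetic.

```python
from typing import Tuple

COUNTER_MASK = 0xFFFFFFFF   # lower 32 bits


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a per-epoch counter into one 64-bit style number."""
    return (epoch << 32) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a zxid."""
    return zxid >> 32, zxid & COUNTER_MASK


if __name__ == "__main__":
    zxid = make_zxid(epoch=2, counter=5)
    print(hex(zxid), split_zxid(zxid))
```

With this layout, comparing two zxids as plain integers orders transactions first by epoch and then by counter, so any transaction from a newer epoch sorts after every transaction from an older one.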

3.4 High performance

ZooKeeper is high-performance, especially in applications where reads outnumber writes, because every write causes all servers to synchronize state. (More reads than writes is the typical scenario for a coordination service.)

IV. Introduction to ZooKeeper cluster roles

The most typical cluster mode is the Master/Slave mode (active/standby mode). In this mode, the Master server usually acts as the primary server and provides write services, while the Slave servers obtain the latest data from the Master through asynchronous replication and provide read services.

ZooKeeper, however, does not adopt the traditional Master/Slave concept. Instead, it introduces three roles: Leader, Follower, and Observer, as shown in the following figure.

All the machines in a ZooKeeper cluster choose one machine, called the Leader, through a Leader election process. The Leader provides both write and read services to clients. Followers and Observers can only provide read services. The only difference between a Follower and an Observer is that an Observer does not take part in Leader election, nor in the "more than half must write successfully" strategy for write operations, so Observers can improve the read performance of the cluster without affecting write performance.
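The effect of Observers on the write path can be sketched as follows: only the Leader and the Followers count towards the majority, so adding Observers never raises the number of acknowledgements a write needs. The `Server` class and the `voters` and `write_committed` helpers are hypothetical names for this minimal illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    role: str   # "leader", "follower" or "observer"


def voters(ensemble):
    """Observers are excluded from voting; everyone else counts."""
    return [s for s in ensemble if s.role != "observer"]


def write_committed(ensemble, acked_names):
    """A write succeeds once more than half of the voters acknowledged it."""
    voting = voters(ensemble)
    acks = sum(1 for s in voting if s.name in acked_names)
    return acks > len(voting) // 2


if __name__ == "__main__":
    ensemble = [
        Server("a", "leader"),
        Server("b", "follower"),
        Server("c", "follower"),
        Server("o1", "observer"),
        Server("o2", "observer"),
    ]
    print(write_committed(ensemble, {"a", "b"}))          # 2 of 3 voters
    print(write_committed(ensemble, {"a", "o1", "o2"}))   # only 1 of 3 voters
```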

V. ZooKeeper & ZAB protocol & Paxos algorithm

5.1 ZAB protocol & Paxos algorithm

The Paxos algorithm can be said to be the soul of ZooKeeper. However, ZooKeeper does not adopt the Paxos algorithm wholesale; it uses the ZAB protocol as its core algorithm for ensuring data consistency. The official ZooKeeper documentation also points out that the ZAB protocol, unlike Paxos, is not a general-purpose distributed consistency algorithm: it is a crash-recoverable atomic message broadcast algorithm designed specifically for ZooKeeper.

5.2 Introduction to the ZAB protocol

The ZAB (ZooKeeper Atomic Broadcast) protocol is an atomic broadcast protocol with crash recovery support, designed specifically for the distributed coordination service ZooKeeper. ZooKeeper relies mainly on the ZAB protocol to achieve distributed data consistency. Based on this protocol, ZooKeeper implements an active/standby system architecture that keeps the data consistent among the replicas in the cluster.

5.3 Two basic modes of the ZAB protocol: crash recovery and message broadcasting

The ZAB protocol includes two basic modes: crash recovery and message broadcasting. When the whole service framework is starting up, or when the Leader server suffers a network outage, crashes, exits, or restarts, the ZAB protocol enters recovery mode and elects a new Leader server. Once a new Leader server has been elected and more than half of the machines in the cluster have completed state synchronization with it, the ZAB protocol exits recovery mode. State synchronization here means data synchronization, which ensures that more than half of the machines in the cluster are consistent with the data state of the Leader server.

When more than half of the Follower servers in the cluster have completed state synchronization with the Leader server, the whole service framework can enter message broadcast mode. When a server that also follows the ZAB protocol starts up and joins the cluster, and the cluster already has a Leader server responsible for broadcasting messages, the newly added server automatically enters data recovery mode: it finds the server where the Leader is located, synchronizes its data with it, and then takes part in the message broadcast process. As mentioned above, ZooKeeper is designed to allow only one Leader server to process transaction requests. After receiving a transaction request from a client, the Leader server generates the corresponding transaction proposal and initiates a round of the broadcast protocol. If another machine in the cluster receives a transaction request from a client, that non-Leader server first forwards the request to the Leader server.
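The message broadcast flow described above (forward to the Leader, propose, commit once a majority has acknowledged) can be sketched as a small simulation. This is a heavily simplified illustration under stated assumptions: there is no network, no log on disk, and no recovery; the `Server` and `Cluster` classes are invented for this example; and the acknowledge-then-commit step reflects the usual description of ZAB rather than anything spelled out in the text above.

```python
class Server:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up            # False simulates a crashed or unreachable server
        self.committed = []     # committed (zxid, value) pairs, in order
        self.cluster = None     # set when the server joins a Cluster

    def submit(self, value):
        """Entry point for a client transaction request on any server."""
        leader = self.cluster.leader
        if self is not leader:
            return leader.submit(value)       # non-Leader: forward to the Leader
        return self.cluster.broadcast(value)  # Leader: start a broadcast round


class Cluster:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        self.next_zxid = 1
        for server in [leader, *followers]:
            server.cluster = self

    def broadcast(self, value):
        zxid = self.next_zxid
        self.next_zxid += 1
        members = [self.leader, *self.followers]
        # Step 1: send the proposal; every reachable server acknowledges it.
        acked = [s for s in members if s.up]
        if len(acked) <= len(members) // 2:
            return None                       # no majority: nothing is committed
        # Step 2: tell the servers that acknowledged to commit.
        for server in acked:
            server.committed.append((zxid, value))
        return zxid


if __name__ == "__main__":
    leader, f1, f2 = Server("leader"), Server("f1"), Server("f2", up=False)
    Cluster(leader, [f1, f2])
    print(f1.submit("set /config = 1"))   # forwarded, then committed by 2 of 3
    print(leader.committed)
```

In this toy version a server that was down simply misses the transaction; in the protocol described above it would catch up by synchronizing with the Leader when it rejoins.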

At this point, you should have a deeper understanding of the basic knowledge points of ZooKeeper. The best way to consolidate it is to try it out in practice.