Explanation of the basic principles of ZooKeeper

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the basic principles of ZooKeeper: its design goals, data model, sessions, watches, consistency guarantees, and the Zab protocol that underlies leader election and state broadcast.

Introduction to ZooKeeper

ZooKeeper is an open source coordination service for distributed applications. It exposes a simple set of primitives on which distributed applications can build synchronization, configuration maintenance, and naming services.

Design goals of ZooKeeper

1. Eventual consistency: no matter which server a client connects to, it is shown the same view. This is the most important property of ZooKeeper.

2. Reliability: the service is simple, robust, and performs well. If message m is accepted by one server, it will be accepted by all servers.

3. Timeliness: ZooKeeper ensures that a client learns of server updates, or of a server failure, within a bounded time interval.

However, because of network delay and other factors, ZooKeeper cannot guarantee that two clients see newly updated data at the same moment. If a client needs the latest data, it should call the sync() interface before reading.

4. Wait-free: slow or failed clients must not interfere with the requests of fast clients, so no client is held up waiting for another.

5. Atomicity: an update either succeeds or fails, with no intermediate state.

6. Ordering: this covers global order and partial order. Global order means that if message a is published before message b on one server, a is published before b on all servers. Partial order means that if message b is published after message a by the same sender, a is ordered before b.

ZooKeeper data model

Zookeeper maintains a hierarchical data structure that is very similar to a standard file system, as shown in the figure:

The data structure of Zookeeper has the following characteristics:

1) Each entry in the tree, such as NameService, is called a znode, and a znode is uniquely identified by its path. For example, the znode Server1 is identified as /NameService/Server1.

2) A znode can have child znodes, and each znode can store data. Note that znodes of the EPHEMERAL (temporary) type cannot have children.

3) A znode has a version. The version number is incremented automatically each time the data stored in the znode changes.

4) Type of znode:

A Persistent node, once created, is not lost accidentally, even if the servers are fully restarted. Each Persistent node can contain both data and child nodes.

An Ephemeral node is deleted automatically when the session of the client that created it ends. If an outage lasts long enough for that session to expire, the Ephemeral znodes it owns are deleted as well.

For a Non-sequence node, when multiple clients try to create the same node at the same time, only one succeeds and the others fail. The name of the created node is exactly the name specified at creation time.

For a Sequence node, ZooKeeper appends a 10-digit decimal sequence number to the specified name. When multiple clients create a node with the same name, all of them succeed, but each node gets a different sequence number.

5) A znode can be watched, including changes to the data stored in the znode and changes to its children. When a change occurs, the client that set the watch is notified. This is the core feature of ZooKeeper, and many ZooKeeper functions are built on it.

6) ZXID: each change to the state of ZooKeeper produces a zxid (ZooKeeper Transaction Id). Zxids are globally ordered: if zxid1 is less than zxid2, the change identified by zxid1 happened before the change identified by zxid2.
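The naming rule for Sequence nodes described above can be illustrated with a small in-memory sketch. This is not the ZooKeeper client API: `ParentNode` and `create_sequential` are hypothetical names, the counter is assumed to be kept per parent and to start at 0, and the number is assumed to be zero-padded to 10 digits.

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Append a 10-digit, zero-padded decimal counter to the requested name."""
    return f"{prefix}{counter:010d}"


class ParentNode:
    """Hypothetical stand-in for a parent znode that hands out sequence numbers."""

    def __init__(self):
        self.next_seq = 0      # assumed: one counter per parent, starting at 0
        self.children = []

    def create_sequential(self, prefix: str) -> str:
        # Every caller succeeds; each one receives a distinct suffix.
        name = sequential_name(prefix, self.next_seq)
        self.next_seq += 1
        self.children.append(name)
        return name
```

Two clients asking for the same prefix both succeed and receive different names, which is the behaviour that lock and queue recipes typically rely on.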

ZooKeeper Session

A client establishes a connection to the ZooKeeper cluster, and the session state changes over its lifetime as shown in the figure:

If the client loses its connection to the ZooKeeper server because of a timeout, it enters the CONNECTING state and automatically tries to connect to a server again. If it connects to a server successfully within the validity period of the session, it returns to the CONNECTED state.

Note: if the client and server lose contact because of poor network conditions, the client stays in its current state and keeps trying to reconnect to a ZooKeeper server. The client cannot declare its own session expired; session expiration is determined by the ZooKeeper server. The client can, however, choose to close the session on its own initiative.
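The session behaviour described above can be sketched as a small transition table. The event names (`connected`, `disconnected`, `session_expired`, `close`) and the terminal state name `CLOSED` are assumptions made for this illustration, not identifiers from the ZooKeeper client library.

```python
# (current state, event) -> next state; anything not listed leaves the state unchanged.
TRANSITIONS = {
    ("CONNECTING", "connected"): "CONNECTED",
    ("CONNECTED", "disconnected"): "CONNECTING",   # client retries automatically
    ("CONNECTING", "session_expired"): "CLOSED",   # decided by the server, not the client
    ("CONNECTING", "close"): "CLOSED",             # client closes on its own initiative
    ("CONNECTED", "close"): "CLOSED",
}


def next_state(state: str, event: str) -> str:
    """Return the session state after an event, per the table above."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="CONNECTING"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Note that there is no transition that lets a disconnected client expire its own session: in this sketch, as in the text, only a `session_expired` event attributed to the server or an explicit `close` ends the session.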

ZooKeeper Watch

The ZooKeeper watch is a change-notification mechanism. All ZooKeeper read operations, getData(), getChildren(), and exists(), can set a watch, and a watch event can be understood as a one-time trigger.

The official definition is as follows:

A watch event is one-time trigger, sent to the client that set the watch, which occurs when the data for which the watch was set changes.

Three key points of Watch:

One-time trigger

When the watched data changes, a watch event is sent to the client.

For example, if the client calls getData("/znode1", true) and the data on /znode1 is later changed or deleted, the client receives a watch event for /znode1.

If /znode1 changes again, the client receives no further event unless it has set a new watch on /znode1.
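The one-time nature of a watch can be shown with a minimal in-memory sketch. `WatchedNode` is a hypothetical class written for this illustration; it imitates the behaviour described above and does not talk to a real ZooKeeper server.

```python
class WatchedNode:
    """Hypothetical in-memory znode whose data watches fire at most once."""

    def __init__(self, data: bytes):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None) -> bytes:
        # Reading with a watcher registers a one-time data watch.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data: bytes) -> None:
        self.data = data
        # Detach all watchers before notifying them, so each fires only once.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher("NodeDataChanged")
```

A client that wants to follow every change therefore has to read again with a watcher inside its callback, and it should expect that changes occurring between the event and the re-registration are only visible through the data it reads back.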

Sent to the client

The ZooKeeper client and server communicate over a socket. Watch events are sent to the watcher asynchronously, and a network failure may prevent an event from reaching the client.

ZooKeeper itself provides an ordering guarantee: a client will never see a change for which it has set a watch until it first sees the watch event.

Network latency or other factors may cause different clients to observe a watch event at different times, but all clients see events in the same order.

The data for which the watch was set

This refers to the different ways in which a znode can change. You can think of ZooKeeper as maintaining two lists of watches: data watches and child watches. getData() and exists() set data watches, and getChildren() sets child watches.

Alternatively, you can think of watches as being set according to the kind of data returned: getData() and exists() return information about the znode itself, while getChildren() returns a list of children.

Therefore, setData() triggers the data watches set on a znode (assuming the write succeeds), while a successful create() triggers the data watches set on the znode being created as well as the child watches set on its parent.

A successful delete() triggers both the data watches and the child watches set on the deleted znode, as well as the child watches set on its parent.

Watches in ZooKeeper are lightweight, so they are cheap to set, maintain, and dispatch. While a client is disconnected from the ZooKeeper server it receives no watch events. When it reconnects, any previously registered watches are re-registered and triggered if needed, which is usually transparent to the developer.

There is one case in which a watch event can be lost: a watch is set through exists() on a znode that does not yet exist, and the client is disconnected during the interval in which the znode is created and then deleted. In that case the client is not notified of the event, even after it reconnects to the ZooKeeper server.
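The rules for which watches fire on each write can be summarised as a small lookup function. This is only a sketch of the rules stated in the text: `triggered_watches` is a hypothetical helper, and the labels "data" and "child" stand for the two kinds of watch described above.

```python
import posixpath


def triggered_watches(operation: str, path: str) -> set:
    """Return the (znode path, watch kind) pairs fired by a successful operation."""
    parent = posixpath.dirname(path)
    if operation == "setData":
        return {(path, "data")}
    if operation == "create":
        # Data watch on the new znode (set earlier via exists()) plus the parent's child watch.
        return {(path, "data"), (parent, "child")}
    if operation == "delete":
        return {(path, "data"), (path, "child"), (parent, "child")}
    raise ValueError(f"unsupported operation: {operation}")
```

For example, deleting /NameService/Server1 fires the data and child watches on that znode and the child watch on /NameService.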

Consistency Guarantees

ZooKeeper is an efficient and scalable service. Both read and write operations are designed to be fast, and reads are faster than writes.

Sequential consistency (Sequential Consistency): update requests from a client are applied in the order in which they were sent.

Atomicity: an update either succeeds or fails; there are no partial results.

Single system image (Single System Image): no matter which server a client connects to, it sees the same view of the system.

Reliability: once an update has been applied, it persists until it is overwritten.

Timeliness (Timeliness): the view of the system seen by each client is guaranteed to be up to date within a certain time bound.

How ZooKeeper works

In a ZooKeeper cluster, each node takes one of the following three roles and four states:

Role: leader, follower, observer

Status: leading, following, observing, looking

The core of ZooKeeper is atomic broadcast, which keeps the servers synchronized. The protocol that implements this mechanism is called the Zab protocol (ZooKeeper Atomic Broadcast protocol). Zab has two modes: recovery mode (leader election) and broadcast mode (state synchronization).

When the service starts, or after the leader crashes, Zab enters recovery mode. Recovery mode ends once a leader has been elected and a majority of servers have finished synchronizing their state with the leader. State synchronization ensures that the leader and the other servers have the same system state.

To guarantee a consistent order of transactions, ZooKeeper identifies each transaction with an incrementing transaction id (zxid). Every proposal carries a zxid when it is issued.

In the implementation, a zxid is a 64-bit number. Its high 32 bits hold the epoch, which identifies whether the leadership has changed: each time a leader is elected it gets a new epoch that marks its period of rule. The low 32 bits hold an incrementing counter.
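The 64-bit layout described above (epoch in the high 32 bits, counter in the low 32 bits) can be sketched with plain integer arithmetic. `make_zxid` and `split_zxid` are hypothetical helper names used only for this illustration.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack the epoch into the high 32 bits and the counter into the low 32 bits."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK
```

Because the epoch occupies the high bits, any zxid issued under a newer leader compares greater than every zxid issued under an older one, regardless of the counter value. That is what makes a plain integer comparison of zxids sufficient for ordering.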

Each Server has four states during its operation:

LOOKING: currently Server doesn't know who leader is and is searching for it.

LEADING: the current Server is the elected leader.

FOLLOWING: a leader has been elected and the current Server is synchronized with it.

OBSERVING: an observer behaves exactly like a follower in most cases, but it does not take part in elections or votes; it only accepts (observes) their results.

Leader Election

When the leader crashes, or when the leader loses contact with most of its followers, ZooKeeper enters recovery mode. A new leader must be elected so that all servers can be restored to a correct state.

ZooKeeper has two election algorithms: one based on basic Paxos and the other based on fast Paxos.

The default election algorithm is fast Paxos. The basic Paxos process is introduced first:

1. The election thread is the thread on the current server that initiates the election. Its main job is to count the votes and select the recommended server.

2. The election thread first sends a query to all servers (including itself).

3. After receiving a reply, the election thread verifies that the reply belongs to the query it initiated (by checking that the zxid matches), records the responder's id (myid) in the list of servers queried in this round, and stores the leader information (id, zxid) proposed by the responder in the voting record table for the current election.

4. After receiving replies from all servers, the thread finds the server with the largest zxid and sets it as the server to vote for in the next round.

5. The thread sets the server with the largest zxid as the leader recommended by the current server. If the winning server has received at least n/2 + 1 votes (where n is the number of servers), the thread sets the recommended leader to the winning server and sets its own state according to the winner's information. Otherwise, the process repeats until a leader is elected.

From this process we can conclude that, for the leader to be supported by a majority of servers, the total number of servers should be an odd number 2n+1, and the number of surviving servers must not be less than n+1.

Each server repeats the above process after it starts. In recovery mode, a server that has just started or just recovered from a crash also restores its data and session information from disk snapshots. ZooKeeper records transaction logs and takes periodic snapshots to make this state recovery possible.

In the fast Paxos process, a server first proposes to all servers that it become the leader. When another server receives the proposal, it resolves any conflict in epoch and zxid, accepts the proposal, and sends back a message saying so. This process repeats until a leader is elected.
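The two decisions in the steps above, choosing whom to recommend and checking whether a candidate has won, can be sketched as follows. This is a simplified illustration, not ZooKeeper's election code: `pick_candidate` and `elect` are hypothetical names, and breaking zxid ties by the larger server id is an assumption the text does not state.

```python
from collections import Counter


def pick_candidate(replies: dict) -> int:
    """replies maps server id -> last zxid; recommend the server with the largest zxid.

    Assumption: ties on zxid are broken in favour of the larger server id.
    """
    return max(replies, key=lambda sid: (replies[sid], sid))


def elect(votes: dict, ensemble_size: int):
    """votes maps voter id -> recommended leader id.

    Returns the winner once it holds at least ensemble_size // 2 + 1 votes, else None.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count >= ensemble_size // 2 + 1 else None
```

When `elect` returns None, the round is repeated with updated recommendations, mirroring the "continue until a leader is elected" step.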
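The arithmetic behind the odd-ensemble recommendation is easy to check. The sketch below assumes a simple majority quorum; `quorum_size` and `tolerated_failures` are hypothetical helper names.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)
```

An ensemble of 5 servers needs 3 for a quorum and tolerates 2 failures. An ensemble of 6 needs 4 and still tolerates only 2, so the extra server adds cost without adding fault tolerance.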

Leader workflow

Leader has three main functions:

Recover data

Maintain the heartbeat with follower, receive follower requests and determine the request message type of follower

The main message types of follower are PING message, REQUEST message, ACK message and REVALIDATE message, which are processed differently according to different message types.

Description:

A PING message carries the follower's heartbeat. A REQUEST message carries a request forwarded by the follower, including write requests and sync requests.

An ACK message is the follower's reply to a proposal. If more than half of the followers acknowledge a proposal, the leader commits it.

REVALIDATE messages are used to extend the validity of a session.

Follower workflow

Follower has four main functions:

Send a request to Leader (PING message, REQUEST message, ACK message, REVALIDATE message)

Receive Leader messages and process them

Receive requests from the client and, if a request is a write, send it to the leader for voting

Return the result to the client

Follower's message loop handles the following messages from Leader:

PING message: heartbeat message

PROPOSAL message: a proposal initiated by Leader asking Follower to vote

COMMIT message: information about the latest proposal on the server side

UPTODATE message: indicates that synchronization is complete

REVALIDATE message: depending on the leader's REVALIDATE result, either close the session awaiting revalidation or allow it to accept messages

SYNC message: return the SYNC result to the client; this message is originally initiated by the client to force an up-to-date view

Zab: Broadcasting State Updates

When a ZooKeeper server receives a request and is a follower, it forwards the request to the leader. The leader executes the request and broadcasts the result in the form of a transaction.

How does the ZooKeeper cluster decide whether a transaction is committed? Through a two-phase commit protocol:

The leader sends a PROPOSAL message to all followers.

Each follower that receives the PROPOSAL message writes it to disk and sends an ACK message to the leader to confirm receipt.

When the leader has received ACKs from a quorum of followers, it sends a COMMIT message so that the transaction is applied.
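The three steps above can be simulated in memory as follows. This is a simplified sketch, not the Zab implementation: `Follower` and `broadcast` are hypothetical names, lists stand in for disk, and the quorum is assumed to be a majority of the voting ensemble with the leader counting as one acknowledgement.

```python
class Follower:
    """Hypothetical follower that logs proposals and applies commits."""

    def __init__(self, name: str, reachable: bool = True):
        self.name = name
        self.reachable = reachable
        self.log = []          # proposals "written to disk"
        self.committed = []    # zxids applied after COMMIT

    def on_proposal(self, zxid: int, txn: str) -> bool:
        if not self.reachable:
            return False       # no ACK comes back
        self.log.append((zxid, txn))
        return True            # ACK

    def on_commit(self, zxid: int) -> None:
        if self.reachable:
            self.committed.append(zxid)


def broadcast(zxid: int, txn: str, followers: list) -> bool:
    """Phase 1: PROPOSAL and ACKs. Phase 2: COMMIT once a quorum has acknowledged."""
    ensemble_size = len(followers) + 1     # followers plus the leader (assumption)
    needed = ensemble_size // 2 + 1
    acks = 1                               # the leader counts itself (assumption)
    for follower in followers:
        if follower.on_proposal(zxid, txn):
            acks += 1
    if acks < needed:
        return False                       # no quorum, so nothing is committed
    for follower in followers:
        follower.on_commit(zxid)
    return True
```

Unlike a classic two-phase commit, the leader does not wait for every participant: in this sketch a transaction commits as soon as a majority has logged it, so one unreachable follower in a three-server ensemble does not block progress.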

The Zab protocol guarantees:

If the leader broadcasts T1 and then T2, every server must commit T1 before T2.

If any server commits T1 and then T2, all other servers must also commit T1 before T2.

The biggest problem with the two-phase commit protocol is that if the leader crashes or temporarily loses its connection after sending a PROPOSAL message, the whole cluster is left in an uncertain state (the followers do not know whether to abandon the commit or carry it out).

ZooKeeper then elects a new leader and moves request processing to it. Different leaders are distinguished by different epochs. When switching leaders, the following two issues must be resolved:

1. Never forget delivered messages

The leader crashes after committing a transaction locally but before the COMMIT message has been delivered to any follower. The new leader must guarantee that this transaction is also committed.

2. Let go of messages that are skipped

The leader generates a proposal but crashes before any follower has seen it. When that server recovers, the proposal must be discarded.

ZooKeeper tries to ensure that there are never two active leaders at the same time, because two different leaders would leave the cluster in an inconsistent state. The Zab protocol therefore also guarantees:

Before the new leader broadcasts any transaction, all transactions committed by the previous leader are applied first.

At no time do two servers each have the support of a quorum.

The quorum here is more than half of the servers, or more precisely, more than half of the servers with the right to vote (excluding observers).

Thank you for reading. This concludes the explanation of the basic principles of ZooKeeper. After studying this article you should have a deeper understanding of these principles; how they apply in a specific deployment still needs to be verified in practice.
