How to Understand the ZooKeeper Consistency Protocol Zab

2025-07-03 Update. From: SLTechnology News&Howtos

This article explains in some detail how to understand Zab, the ZooKeeper consistency protocol. I hope interested readers find it helpful.

ZooKeeper is a fault-tolerant distributed coordination system, used to coordinate and synchronize state across multiple servers.

To guarantee HA, an application usually needs N servers (N > 1) to provide its service, of which M are masters and N-M are slaves. If one server dies, the remaining N-1 can still provide service. For the same reason, the data is backed up as N copies spread across these servers. The questions then become: how do you manage these N servers? How do you elect a new master when the master node fails? How do you ensure that the backup data stored on all the servers stays consistent?

If you run into thorny problems like these, ZooKeeper can help you:

Store common configuration information

Track the running status of all processes

Manage the nodes in the cluster

Elect a master

Provide distributed synchronization support (distributed locking and other functions)

The list above is the officially supported feature set; there are many other uses waiting to be discovered.

When your system relies on ZooKeeper for HA and consistency, you are bound to wonder how ZooKeeper itself guarantees these two properties. The hero behind the scenes is often overlooked: it is the ZooKeeper Atomic Broadcast protocol (Zab). If you have studied well-known distributed consistency algorithms such as two-phase commit, Paxos, and Raft, Zab will feel familiar, because they all pursue the same goal: ensuring the HA and consistency of the application.

In order to understand the following, there are some terms that need to be understood:

The theory and implementation of Zab are not exactly the same.

The implementation adds a number of optimizations on top of the theory, and these were introduced gradually as ZooKeeper evolved. The explanation that follows, written with reference to this paper, is based on the ZooKeeper 3.3.3 implementation and is summarized in my own words.

The protocol in theory

When the system starts or recovers, it goes through the following four phases described in the Zab protocol:

Phase 0: leader election. Each peer picks its own prospective leader from the quorum of peers.

Phase 1: discovery. The prospective leader finds the latest data among a quorum of Followers and overwrites its own stale data with it.

Phase 2: synchronization. The prospective leader synchronizes its latest data to a quorum of Followers using two-phase commit. Once this step completes, the prospective leader becomes the established leader.

Phase 3: broadcast. The Leader accepts write requests and broadcasts them to a quorum of Followers using two-phase commit.

That was a brief description of the protocol in theory, but the ideal and the reality differ, and there is a lot of room for improvement or compromise. I will go through the issues one by one:

The leader election in phase 0 is actually very crude: a peer is picked as prospective leader more or less on sight, so the prospective leader's data may not be the most up to date.

Because the prospective leader's data may be stale, phase 1 is needed to compensate. By exchanging data with one another, the peers find the latest data and bring the prospective leader up to date.

The extra data exchanged between peers means greater network overhead.

Protocol implementation

Having seen where the ideal falls short, let us return to reality.

The real Apache ZooKeeper makes one change in its implementation: it optimizes leader election so that the peer with the most up-to-date data is chosen directly as the prospective Leader. Phase 0 and phase 1 can then be merged, which reduces both the network overhead and the complexity of having several procedures.

As can be seen from the figure, in terms of code implementation, the protocol is simplified into three phases:

Fast Leader Election phase: select the peer with the latest data from the quorum of peers as the Leader

Recovery phase: the Leader synchronizes data to a quorum of Followers

Broadcast phase: the Leader accepts write requests and broadcasts them to a quorum of Followers

Talk is cheap. Show me the code

As Linus said, however eloquent the words, only the code truly expresses what the author had in mind. The three phases of the implementation were only sketched above, and only code can really clear away the fog around the Zab protocol. (To save space, pseudocode is used here.)

Fast Leader Election phase (FLE)

First, initialize some data:

ReceivedVotes: the ballot box, which stores each voting node together with its current vote

OutOfElection: stores information about nodes that have already become Leader or Follower and no longer take part in voting, along with their past votes

Write your own name on the ballot.

Send notifications: broadcast the ballot. The ballot is placed into the queue of every target node, which amounts to voting for yourself and recommending yourself to the others.

While the current node is in the election state, it also receives ballot notifications from other nodes. It keeps polling its queue for vote information (on a timeout it lengthens the timeout, up to an upper limit) and handles each notification according to the state of the node that sent it.

Sender in the election state: if the sender's round is greater than the node's own, the node's election information is out of date, so it updates its election round, empties the ballot box, updates its own ballot, and notifies the other nodes of the new ballot. If the sender's round equals its own and the sender's vote is newer than its own, it only needs to update its own ballot and notify the other nodes. If the sender's round is less than its own, the sender's vote is stale, carries no useful information, and is ignored. Every ballot that is not ignored goes into the ballot box. Finally, the node checks the ballot box to see whether its current vote is held by a majority and, if so, elects the Leader named in that vote.

Sender Leading or Following: if the sender's round equals the node's own, the sender took part in the same round of voting. If the sender is Leading, or its vote holds a majority in the ballot box, the election completes immediately. If the sender finished its election in a different round, or its vote has not gathered enough ballots, its information is stored in OutOfElection. As more nodes finish the election and the number of entries in OutOfElection reaches a quorum, OutOfElection is used as the ballot box to check whether the sender's vote has a majority, and if so the Leader is elected directly.
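The handling of ballots from senders that are still electing can be sketched roughly as follows. This is a minimal illustration in Python, not ZooKeeper's actual code: the names Vote, Elector, is_newer and on_looking_notification are hypothetical, "newer" is assumed to mean a higher zxid with the server id as tie-breaker, and when the round jumps the node is assumed to keep whichever of the two votes is newer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    election_round: int  # round in which the vote was cast
    leader: int          # id of the proposed leader
    zxid: int            # latest zxid of the proposed leader


def is_newer(a: Vote, b: Vote) -> bool:
    # Assumption: higher zxid wins, server id breaks ties.
    return (a.zxid, a.leader) > (b.zxid, b.leader)


class Elector:
    def __init__(self, my_id: int, my_zxid: int, ensemble_size: int):
        self.my_id = my_id
        self.ensemble_size = ensemble_size
        self.election_round = 1
        self.received_votes = {}  # ballot box: sender id -> Vote
        self.outbox = []          # ballots waiting to be sent to all peers
        self._cast(my_id, my_zxid)  # start by voting for ourselves

    def _cast(self, leader: int, zxid: int) -> None:
        self.vote = Vote(self.election_round, leader, zxid)
        self.received_votes[self.my_id] = self.vote
        self.outbox.append(self.vote)

    def on_looking_notification(self, sender: int, n: Vote):
        """Process a ballot from an electing sender; return the leader id once elected."""
        if n.election_round < self.election_round:
            return None  # stale ballot: ignore it
        if n.election_round > self.election_round:
            # Our round is out of date: catch up and empty the ballot box.
            self.election_round = n.election_round
            self.received_votes.clear()
            best = n if is_newer(n, self.vote) else self.vote
            self._cast(best.leader, best.zxid)
        elif is_newer(n, self.vote):
            self._cast(n.leader, n.zxid)
        self.received_votes[sender] = n
        supporters = sum(
            1 for v in self.received_votes.values() if v.leader == self.vote.leader
        )
        if supporters > self.ensemble_size // 2:
            return self.vote.leader
        return None
```

In a three-node ensemble, a node with zxid 5 that receives a same-round ballot for a node with zxid 7 switches its vote and immediately sees a majority of two for that node.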

Recovery phase

After FLE, the node whose log contains the most recently committed transaction has been selected as the prospective Leader. The implementation is described below from the Leader's and the Follower's perspectives.

Leader perspective

First, the Leader updates lastZxid: the epoch is incremented by 1 and the counter is reset to zero, announcing the start of a new reign. From then on, every time it receives a data synchronization request from a Follower, it sends back its own lastZxid, telling every Follower to treat the Leader's lastZxid as authoritative. It then decides how to synchronize data to each Follower depending on the situation.

If the Leader's latest committed transaction is newer than the Follower's latest transaction, the Follower's data needs updating. How it is updated depends on whether the Leader's earliest transaction is newer than the Follower's latest transaction. If it is, then from the Leader's point of view everything the Follower has recorded is too old to be worth keeping, and the Leader simply sends its entire history to the Follower (a SNAP response). If it is not, the Follower is missing only the transactions between its own lastZxid and the end of the Leader's log, so the Leader extracts that part and sends it to the Follower (a DIFF response).

If the Leader's latest committed transaction is older than the Follower's latest transaction, the Follower holds uncommitted transactions, and these must be discarded (a TRUNC response).
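The epoch bump and the choice between the three responses can be illustrated with a short Python sketch. It assumes the usual ZooKeeper layout of a zxid as a 64-bit number with the epoch in the high 32 bits and the counter in the low 32 bits. The function names and the representation of the Leader's history as a sorted, non-empty list of zxids are hypothetical simplifications, and the case where both sides are equally up to date is treated here as an empty DIFF.

```python
EPOCH_SHIFT = 32


def make_zxid(epoch: int, counter: int) -> int:
    # Epoch in the high 32 bits, counter in the low 32 bits.
    return (epoch << EPOCH_SHIFT) | counter


def epoch_of(zxid: int) -> int:
    return zxid >> EPOCH_SHIFT


def next_epoch_zxid(last_zxid: int) -> int:
    """New Leader: epoch + 1, counter reset to zero."""
    return make_zxid(epoch_of(last_zxid) + 1, 0)


def choose_sync(leader_committed, follower_last_zxid):
    """Decide how the Leader brings one Follower up to date.

    leader_committed: sorted, non-empty list of zxids the Leader still holds.
    Returns (mode, payload).
    """
    oldest, newest = leader_committed[0], leader_committed[-1]
    if follower_last_zxid > newest:
        # The Follower is ahead: it must roll back to the Leader's newest zxid.
        return ("TRUNC", newest)
    if follower_last_zxid < oldest:
        # The Follower is too far behind: send the whole history.
        return ("SNAP", list(leader_committed))
    # Otherwise send only what the Follower is missing (possibly nothing).
    return ("DIFF", [z for z in leader_committed if z > follower_last_zxid])
```

For example, with a Leader log covering counters 1 to 5 of epoch 4, a Follower whose last zxid is in epoch 3 gets SNAP, one at counter 3 of epoch 4 gets a DIFF with counters 4 and 5, and one claiming counter 8 gets TRUNC.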

When a Follower finishes synchronizing, it sends a synchronization ack. Once the Leader has received acks from a quorum, the data synchronization phase is complete and the final phase, broadcast, begins.

Follower perspective

The Follower notifies the Leader that it wants to synchronize with the Leader's data.

If it receives a rejection from the Leader, the Leader does not recognize it as a Follower. The Leader may not be reliable, so the Follower restarts FLE.

If it receives a SNAP or DIFF response, the Follower applies the transactions sent by the Leader.

If it receives a TRUNC response, the Follower discards all of its uncommitted data.

When a Follower has completed the synchronization above, it sends an ack to the Leader and enters the broadcast phase.
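On the Follower side, applying the three kinds of response might look like the sketch below. It is only an illustration under simplifying assumptions: the log is modelled as a sorted list of zxids, the function name apply_sync is hypothetical, and the TRUNC payload is assumed to be the last zxid the Follower is allowed to keep.

```python
def apply_sync(follower_log, mode, payload):
    """Return the Follower's log after applying the Leader's sync response."""
    if mode == "SNAP":
        # Replace the local history with the Leader's full history.
        return list(payload)
    if mode == "DIFF":
        # Append only the transactions that are newer than what we already have.
        last = follower_log[-1] if follower_log else -1
        return follower_log + [z for z in payload if z > last]
    if mode == "TRUNC":
        # Drop everything after the zxid named by the Leader.
        return [z for z in follower_log if z <= payload]
    raise ValueError(f"unknown sync mode: {mode}")
```

After any of the three branches the Follower's log is a prefix-compatible copy of the Leader's, which is the condition under which it can send its ack.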

Broadcast phase

Reaching this phase means that all data has been synchronized and the prospective Leader has been promoted to the established Leader. ZooKeeper's most common workflow now begins: broadcast.

The broadcast phase is the one that actually accepts transaction requests (write requests), and it is ZooKeeper's normal working state. Every node can accept write requests from clients, but a Follower forwards them to the Leader, and only the Leader can turn these requests into transactions and broadcast them. Nodes in this phase again have two roles, and the explanation below follows those two roles.

Leader perspective:

The Leader must run the ready procedure before it can accept write requests. Once ready, the Leader continuously accepts write requests, converts them into transaction requests, and broadcasts them to a quorum of Followers.

When the Leader receives acks from a quorum of Followers, meaning that they have processed the transaction, the Leader broadcasts a commit request and the Followers complete the commit.

When the Leader finds that a new Peer is asking to join as a Follower, it sends its own epoch and transaction log to the Peer so that the Peer can complete the recovery phase described above. When it receives the Peer's sync-complete ack, the Leader sends a commit request so that the Peer commits all the synchronized transactions. At that point the Peer becomes a regular Follower and the Leader adds it to the quorum of Followers.
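The Leader's propose, ack, commit cycle can be sketched as below. This is a simplified Python illustration rather than ZooKeeper's implementation: the class and method names are hypothetical, the Leader is assumed to count itself towards the majority, and acks are assumed to arrive in proposal order (FIFO channels), so no explicit ordering of commits is enforced.

```python
class LeaderBroadcast:
    def __init__(self, epoch: int, follower_count: int):
        self.epoch = epoch
        self.counter = 0
        # Majority of the whole ensemble (Followers plus the Leader itself).
        self.quorum = (follower_count + 1) // 2 + 1
        self.acks = {}        # zxid -> set of Follower ids that acked
        self.committed = []   # zxids committed so far, in commit order

    def propose(self, request):
        """Turn a write request into a transaction and return the message to broadcast."""
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter
        self.acks[zxid] = set()
        return ("PROPOSE", zxid, request)

    def on_ack(self, follower_id: int, zxid: int):
        """Record an ack; return a COMMIT message once a quorum is reached."""
        if zxid not in self.acks:
            return None  # unknown or already committed
        self.acks[zxid].add(follower_id)
        if len(self.acks[zxid]) + 1 >= self.quorum:  # +1 for the Leader
            del self.acks[zxid]
            self.committed.append(zxid)
            return ("COMMIT", zxid)
        return None
```

With four Followers the ensemble has five members, so the second Follower ack is the one that triggers the commit, and later acks for the same transaction are ignored.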

Follower perspective:

When a Follower finds that it is itself in the Leading state, it executes the ready procedure so that it can accept write requests.

When a transaction request is received from the Leader, the Follower records the transaction in its history and responds with an ack.

When a commit request is received from the Leader, the Follower checks its history for earlier transactions that are still uncommitted. If there are any, it must wait for those earlier transactions to be committed, in FIFO order, before committing this one.
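The FIFO commit rule on the Follower side can be illustrated with a small Python sketch. The class FollowerBroadcast and its fields are hypothetical names chosen for this example; the sketch only shows the ordering logic, not persistence or networking.

```python
from collections import deque


class FollowerBroadcast:
    def __init__(self):
        self.history = deque()        # accepted, uncommitted (zxid, request) in arrival order
        self.pending_commits = set()  # commits received but not yet deliverable
        self.delivered = []           # committed transactions, in FIFO order

    def on_proposal(self, zxid, request):
        """Record the transaction in history and answer with an ack."""
        self.history.append((zxid, request))
        return ("ACK", zxid)

    def on_commit(self, zxid):
        """Commit zxid, but only after every earlier transaction has been committed."""
        self.pending_commits.add(zxid)
        while self.history and self.history[0][0] in self.pending_commits:
            head_zxid, request = self.history.popleft()
            self.pending_commits.discard(head_zxid)
            self.delivered.append((head_zxid, request))
```

If the commit for the second transaction arrives before the commit for the first, nothing is delivered until the first commit arrives, and then both are delivered in order.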

This article does not cover how to use ZooKeeper; it focuses on the implementation of its core protocol, Zab. As mentioned above, the original idea of Zab is not the same as the current implementation. Today's implementation was optimized and refined step by step as ZooKeeper developed and grew, and the early implementation may well have been what Yahoo envisioned in the paper. Rome was not built in a day, and nobody grows fat from a single bite. If ZooKeeper had started out by trying to optimize everything to the extreme, the project itself would have suffered badly, because it would probably have been overtaken before it was ever released.

In that sense, premature optimization is the root of all evil. At the same time, a good programmer does not forget the parts that need optimizing: he locates the relevant code and then improves it. This is what the developers of ZooKeeper did.
