
How to implement a Zab protocol in Zookeeper

Updated 2025-07-15 (report dated 06/02), from SLTechnology News&Howtos, Shulou (Shulou.com)

This article explains in detail how the Zab protocol is implemented in Zookeeper, as a reference for readers who want to understand how it works.

I. A brief introduction to Zookeeper

Zookeeper is a distributed data consistency solution, based on which distributed applications can implement functions such as data publish / subscribe, load balancing, naming service, distributed coordination / notification, cluster management, Master election, distributed locking and distributed queue. Zookeeper is committed to providing a distributed coordination system with high performance, high availability and strict sequential access control.

Because Zookeeper mainly operates on state data, it defines two safety properties to keep that state consistent:

Total order: if message a is sent before message b, every Server must see the two messages in that same order.

Causal order: if message a happens before message b (a causes b) and both are sent, a is always executed before b.

To guarantee these two safety properties, Zookeeper relies on the TCP protocol and a Leader:

The total order of messages is guaranteed by the TCP protocol (first come, first served).

The causal order problem is solved by the Leader: first come, first executed. However, the Leader may suffer network outages, crashes, exits and restarts, so a Leader election algorithm is needed.

ZAB (Zookeeper Atomic Broadcast, the Zookeeper atomic message broadcast protocol) is the core algorithm behind Zookeeper's data consistency, so the ZAB protocol is introduced next.

II. What is the Zab protocol

ZAB (Zookeeper Atomic Broadcast, the Zookeeper atomic message broadcast protocol) is an atomic broadcast protocol with crash-recovery support, designed specifically for Zookeeper. Based on this protocol, Zookeeper implements a master-slave architecture that keeps the data consistent across the replicas in the cluster.

Zookeeper uses a single main process to receive and process all transaction requests (that is, write requests) from clients. When the state of the server data changes, the cluster uses the ZAB atomic broadcast protocol to broadcast the change to all replica processes in the form of a transaction Proposal. The ZAB protocol guarantees a global order of changes: each transaction is assigned a globally increasing number, the zxid.

When a Zookeeper client connects to a node in the Zookeeper cluster and submits a read request, that node answers directly from the data it has saved. If the request is a write and the node is not the Leader, the node forwards the request to the Leader, and the Leader broadcasts the write operation in the form of a proposal. As soon as more than half of the nodes agree to the write operation, it is committed. The Leader then broadcasts again to all Learners, telling them to synchronize the data.
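The routing rule described above (reads served locally, writes forwarded to the Leader) can be sketched in Python as follows. This is only an illustration, not Zookeeper's actual code: the `Node` class, its `handle` method and the `"proposed"` return value are hypothetical names introduced for this sketch.

```python
class Node:
    """Hypothetical cluster node; names are illustrative, not Zookeeper's API."""

    def __init__(self, name, leader=None):
        self.name = name
        # Assumption: a node created without a leader acts as the Leader itself.
        self.leader = leader if leader is not None else self
        self.data = {}
        self.proposals = []  # only filled on the Leader

    def is_leader(self):
        return self.leader is self

    def handle(self, op, key, value=None):
        if op == "read":
            # Reads are answered from the node's own saved data.
            return self.data.get(key)
        if not self.is_leader():
            # Writes received by a non-Leader are forwarded to the Leader.
            return self.leader.handle(op, key, value)
        # The Leader turns the write into a proposal to be broadcast.
        self.proposals.append((key, value))
        return "proposed"
```

A write sent to a Follower therefore ends up in the Leader's proposal list, while a read never leaves the node that received it.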

III. The principle of the Zab protocol

The Zab protocol requires each Leader to go through three stages: discovery, synchronization, and broadcast.

Discovery: the Zookeeper cluster must elect a Leader process, and the Leader maintains a list of available Followers. Clients can later communicate with these Follower nodes.

Synchronization: the Leader synchronizes its own data to the Followers, so that the data is stored in multiple copies. This also reflects the high availability and partition tolerance of CAP. After a Follower has consumed the outstanding requests in its queue, it writes them to its local transaction log.

Broadcast: the Leader accepts new transaction requests from clients and broadcasts the resulting Proposals to all Followers.

IV. Core of Zab protocol

The core of the Zab protocol is that it defines how transaction requests are handled:

All transaction requests must be coordinated by a globally unique server, which is called a Leader server. The remaining servers are Follower servers.

The Leader server is responsible for converting a client transaction request into a transactional Proposal and distributing the Proposal to all Follower servers in the cluster, that is, sending data broadcast requests (or data replication) to all Follower nodes.

After distributing the Proposal, the Leader server waits for feedback (Acks) from the Follower servers. In the Zab protocol, as soon as more than half of the Follower servers have returned a correct Ack, the Leader sends a Commit message to all Follower servers, asking them to commit that transaction Proposal.
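The "more than half" rule can be made concrete with a small Python sketch. The names `has_quorum` and `ProposalTracker` are hypothetical, and who counts as a voter (for example, whether the Leader's own vote is included) is left to the caller as an assumption of this sketch.

```python
def has_quorum(ack_count, voter_count):
    """True when strictly more than half of the voters have acknowledged."""
    return ack_count > voter_count // 2


class ProposalTracker:
    """Hypothetical Leader-side bookkeeping for one proposal."""

    def __init__(self, zxid, voter_count):
        self.zxid = zxid
        self.voter_count = voter_count
        self.acked = set()      # a set, so duplicate Acks count only once
        self.committed = False

    def on_ack(self, server_id):
        self.acked.add(server_id)
        if not self.committed and has_quorum(len(self.acked), self.voter_count):
            # At this point a real Leader would send COMMIT to the Followers.
            self.committed = True
        return self.committed
```

With five voters, the proposal commits on the third distinct Ack; a repeated Ack from the same server does not move it any closer to the quorum.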

V. Content of the Zab protocol

The Zab protocol includes two basic modes: crash recovery and message broadcasting.

1. Protocol process

While the whole cluster is starting up, or when the Leader server loses its network connection, crashes, exits or restarts, the Zab protocol enters crash recovery mode and a new Leader is elected.

When a new Leader is elected and more than half of the machines in the cluster have completed state synchronization (that is, data synchronization) with the Leader server, the Zab protocol will exit the crash recovery mode and enter message broadcast mode.

If a server that complies with the Zab protocol joins the cluster at this point, it automatically enters recovery mode because a Leader server is already broadcasting messages in the cluster: it finds the Leader server and completes data synchronization with it. After synchronization is complete, it takes part in message broadcast as a new Follower.

2. Protocol state switching

When the Leader crashes, exits or its machine restarts, or when the Leader can no longer communicate normally with more than half of the servers in the cluster, Zab enters crash recovery again, starts a new round of Leader election and synchronizes the data. After synchronization is complete, it enters message broadcast mode and accepts transaction requests.

3. Guaranteeing message order

Throughout message broadcast, the Leader converts each transaction request into a corresponding Proposal. Before broadcasting a transaction Proposal, the Leader server first assigns it a globally unique, monotonically increasing ID, called the transaction ID (zxid). Because the Zab protocol must guarantee a strict order for every message, Proposals must be sorted and processed in zxid order.

VI. Crash recovery

Once the Leader server crashes or the Leader server loses contact with more than half of the Follower due to network reasons, it will enter crash recovery mode.

As described earlier, crash recovery has two phases: Leader election and initialization synchronization. When the election completes, the winner is still only a quasi-Leader; it must finish initialization synchronization before it becomes the real Leader.

Initialization synchronization

The specific process is as follows:

To ensure that Leader sends proposals to Learner in an orderly manner, Leader prepares a queue for each Learner server

Leader encapsulates the transactions that each Learner has not yet synchronized as Proposals

Leader sends these Proposal to each Learner one by one, and each Proposal is followed by a COMMIT message, indicating that the transaction has been committed and that the Learner can receive and execute it directly.

Learner receives Proposal from Leader and updates it locally

When the Learner update is successful, ACK information will be sent to the quasi-Leader.

Upon receiving an ACK from a Learner, the Leader server adds that Learner to the list of actually available Followers or Observers. A Learner that sends no ACK, or whose ACK never reaches the Leader, is not added to either list.
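The steps above can be sketched as a per-Learner queue of PROPOSAL and COMMIT messages. This is a simplified illustration under stated assumptions: logs are plain lists of `(zxid, txn)` pairs sorted by zxid, and the function names `build_sync_queue` and `learner_apply` are hypothetical, not taken from Zookeeper.

```python
from collections import deque


def build_sync_queue(leader_log, learner_last_zxid):
    """Leader side: queue a PROPOSAL followed by a COMMIT for every
    transaction the Learner has not synchronized yet."""
    queue = deque()
    for zxid, txn in leader_log:  # assumed sorted by zxid
        if zxid > learner_last_zxid:
            queue.append(("PROPOSAL", zxid, txn))
            queue.append(("COMMIT", zxid, None))
    return queue


def learner_apply(queue, local_log):
    """Learner side: hold each proposal until its COMMIT arrives, apply it
    to the local log, then report success with an ACK."""
    pending = {}
    while queue:
        kind, zxid, txn = queue.popleft()
        if kind == "PROPOSAL":
            pending[zxid] = txn
        else:
            local_log.append((zxid, pending.pop(zxid)))
    return "ACK"
```

Using a FIFO queue per Learner keeps the messages in the order the Leader produced them, which is the point of the first step in the list.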

VII. Two principles of the recovery mode

When the cluster is in the process of starting up, or when the Leader is disconnected from more than half of the hosts, the cluster enters recovery mode. There are two principles to follow for the state of the data to be restored.

1. Messages that have been processed cannot be lost.

When the Leader receives more than half of the ACKs of the Follower, it broadcasts the COMMIT message to each Follower and approves each Server to perform the write transaction. When each Server receives a COMMIT message from Leader, it performs the write operation locally and then responds to the client that the write operation was successful.

However, if the Leader dies before all Followers have received the COMMIT message, the consequence is that some Servers have already executed the transaction while others, not having received the COMMIT, have not. When a new Leader is elected, the cluster must ensure that a transaction already executed by some Servers is executed on all Servers once recovery mode ends.

2. Discarded messages cannot reappear.

Suppose a new transaction has passed on the Leader and the Leader has applied it locally, but the Leader goes down before any Follower has received the COMMIT (earlier than the crash described above), so no Follower is aware that the Proposal exists. After a new Leader is elected and the whole cluster returns to normal service, the dead Leader host restarts and registers as a Follower. If the Proposal that nobody else knows about were kept on that host, its data would contain more than the other hosts, leaving the whole system in an inconsistent state. Therefore the Proposal should be discarded. Transactions like this must not reappear in the cluster and should be cleared.
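Both principles can be illustrated with one small reconciliation function. This is a deliberately simplified sketch, not Zookeeper's real synchronization logic: it assumes that logs are lists of `(zxid, txn)` pairs, that the new Leader's log is the reference history, and the name `reconcile` is hypothetical.

```python
def reconcile(host_log, leader_log):
    """Return the host's log after it synchronizes with the new Leader.

    Both logs are lists of (zxid, txn) pairs with unique zxids.
    """
    leader_zxids = {zxid for zxid, _ in leader_log}
    # Principle 2: discard proposals the new Leader does not know about.
    kept = [(zxid, txn) for zxid, txn in host_log if zxid in leader_zxids]
    # Principle 1: add every transaction the host is still missing.
    have = {zxid for zxid, _ in kept}
    kept += [(zxid, txn) for zxid, txn in leader_log if zxid not in have]
    return sorted(kept)
```

For example, a restarted old Leader holding an extra proposal with zxid 3 ends up with exactly the new Leader's history: the stray proposal is dropped and any missing committed transactions are filled in.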

VIII. Message broadcast

When the Learner in the cluster completes initialization state synchronization, the entire zk cluster enters normal working mode.

If the Learner node in the cluster receives a transaction request from the client, the Learner forwards the request to the Leader server. Then perform the following specific process:

After receiving the transaction request, Leader assigns a globally unique 64-bit self-increasing id, namely zxid, to the transaction. The orderly management of the transaction can be realized by comparing the size of the zxid, and then the transaction is encapsulated as a Proposal.

Leader gets all the Follower based on the Follower list, and then sends the Proposal through the queues of these Follower to each Follower.

When a Follower receives the Proposal, it first compares the Proposal's zxid with the largest zxid recorded in its local transaction log. If the Proposal's zxid is greater than that maximum, the Follower records the Proposal in its local transaction log and returns an ACK to the Leader.

When Leader receives more than half of the ACKs, Leader sends COMMIT messages to all Follower queues and Proposal to all Observer queues.

When Follower receives the COMMIT message, the transactions in the log are officially updated locally. When Observer receives the Proposal, it updates the transaction directly locally.

Both Follower and Observer need to send a successful ACK to Leader after the synchronization is complete.
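The Follower side of this flow (compare the zxid, log the Proposal, ACK, then apply on COMMIT) might look like the following Python sketch. The `Follower` class and its method names are hypothetical and exist only for this illustration.

```python
class Follower:
    """Hypothetical Follower; method names are illustrative only."""

    def __init__(self):
        self.max_logged_zxid = 0
        self.pending = {}   # zxid -> txn, logged but not yet committed
        self.applied = []   # transactions applied after COMMIT

    def on_proposal(self, zxid, txn):
        # Accept only proposals newer than anything already in the log.
        if zxid <= self.max_logged_zxid:
            return None
        self.max_logged_zxid = zxid
        self.pending[zxid] = txn
        return "ACK"

    def on_commit(self, zxid):
        # COMMIT makes the logged transaction take effect locally.
        self.applied.append((zxid, self.pending.pop(zxid)))
```

A stale or duplicated proposal gets no ACK, so it can never count toward the Leader's quorum.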

IX. Implementation principle

1. Three types of roles

To avoid a single point of failure, Zookeeper itself also runs as a cluster. There are three main types of roles in a zk cluster:

Leader: receives and processes read requests from clients; is the only processor of transaction requests in the zk cluster; is responsible for initiating proposals and votes; after a passed transaction request has been processed locally, synchronizes the result to the other hosts in the cluster.

Follower: receives and processes read requests from clients; forwards transaction requests to the Leader; synchronizes data from the Leader; when the Leader dies, takes part in the Leader election (with the right to vote and to stand for election).

Observer: a Follower without the right to vote or to stand for election (a temporary worker, so to speak). If the zk cluster is under heavy read pressure, add Observers rather than Followers, because adding Followers increases the cost of voting and counting ballots, which lowers the efficiency of both write operations and Leader elections.

These three types of roles have different names in different situations (this can be learned in preparation for the next reading of the source code):

Learner = Follower + Observer

QuorumServer = Follower + Leader

2. Three pieces of data

There are three important pieces of data in ZAB:

Zxid: is a 64-bit Long type. The high 32 bits represent epoch and the low 32 bits represent xid.

Epoch: each Leader has a different epoch, which distinguishes different periods (it can be thought of as the reign title of a dynasty).

Xid: the transaction id, a serial number that is incremented from 0 and restarts at every change of dynasty, that is, at every Leader change.

Whenever a new Leader is elected, the largest zxid among the Proposals in the Leader server's local transaction log is taken, the epoch is parsed from that zxid and incremented by 1 to give the new epoch value, and the low 32 bits are reset to zero so that the xid counts up from 0 again.
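The 64-bit layout described above maps directly onto shift and mask operations. The following Python sketch shows the arithmetic; the function names are hypothetical helpers written for this example.

```python
EPOCH_SHIFT = 32
XID_MASK = (1 << 32) - 1  # low 32 bits


def make_zxid(epoch, xid):
    """Pack epoch (high 32 bits) and xid (low 32 bits) into one zxid."""
    return (epoch << EPOCH_SHIFT) | (xid & XID_MASK)


def epoch_of(zxid):
    return zxid >> EPOCH_SHIFT


def xid_of(zxid):
    return zxid & XID_MASK


def first_zxid_of_new_leader(largest_logged_zxid):
    """New epoch = old epoch + 1, with the low 32 bits reset to zero."""
    return make_zxid(epoch_of(largest_logged_zxid) + 1, 0)
```

Because the epoch occupies the high bits, any zxid from a newer epoch compares greater than every zxid from an older one, so plain integer comparison is enough to order transactions across Leader changes.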

3. Three states

Each host in the zk cluster is in a different state at different stages. There are four states in total:

LOOKING: the election state

FOLLOWING: the normal working state of a Follower, in which it synchronizes data from the Leader

LEADING: the normal working state of the Leader, in which it broadcasts data updates

OBSERVING: the normal working state of an Observer, in which it synchronizes data from the Leader

The Zab protocol itself only needs the first three. The code implementation has one more state, OBSERVING, which was added when Observers were introduced into Zookeeper. An Observer does not take part in elections, is a read-only node, and actually has nothing to do with the Zab protocol. The concept is included here to help with reading the source code.
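As a rough illustration, the four states can be modelled as an enumeration. The member names follow the list above; the helper `state_after_election` and its parameters are hypothetical and only show which state a host would move to once a Leader is known.

```python
from enum import Enum


class ServerState(Enum):
    LOOKING = "electing a Leader"
    FOLLOWING = "Follower synchronizing data from the Leader"
    LEADING = "Leader broadcasting data updates"
    OBSERVING = "Observer synchronizing data from the Leader"


def state_after_election(my_id, leader_id, is_observer=False):
    """Pick the state a host moves to once the Leader is known."""
    if is_observer:
        return ServerState.OBSERVING
    if my_id == leader_id:
        return ServerState.LEADING
    return ServerState.FOLLOWING
```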

4. The four stages of Zab

Myid: the unique identity of a server in the zk cluster. For example, if there are three zk servers, they are numbered 1, 2, 3.

Logical clock: the logical clock, logicalclock, is an integer. It is called logicalclock during an election and epoch once the election has finished. In other words, epoch and logicalclock are the same value under different names in different situations.

1) Election stage (Leader Election)

At the beginning every node is in the election stage. As soon as a node obtains the votes of more than half of the nodes, it is elected quasi-Leader; it becomes the real Leader only after the third stage (the synchronization phase).

Zookeeper stipulates that all valid votes must be in the same round, and each server will increment the logicalClock it maintains when it starts a new round of voting.

Each server empties its own ballot box (recvset) before broadcasting its own ballot. The ballot box records the votes received.

For example, if Server_2 votes for Server_3 and Server_3 votes for Server_1, then the ballot box of Server_1 contains (2, 3), (3, 1) and (1, 1). (Each server votes for itself by default.)

In each pair, the first number is the voter and the second number is the server voted for. The ballot box keeps only the most recent vote of each voter: if a voter changes its vote, the other servers update that voter's entry in their own ballot boxes when they receive the new ballot. Think about it: how should this be implemented? The source code handles it in a very ingenious way.
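One natural way to keep only the last vote of each voter is a map keyed by the voter's id, so that a new ballot simply overwrites the old one. The sketch below illustrates the idea in Python; the `BallotBox` class is a hypothetical name, and this is not a copy of Zookeeper's election code.

```python
from collections import Counter


class BallotBox:
    """Hypothetical ballot box: voter id -> id of the server voted for."""

    def __init__(self):
        self.votes = {}

    def clear(self):
        # Emptied before a server broadcasts its ballot in a new round.
        self.votes.clear()

    def record(self, voter, candidate):
        # Overwrites any earlier vote from the same voter.
        self.votes[voter] = candidate

    def winner(self, cluster_size):
        """Return the candidate holding more than half of the votes, if any."""
        if not self.votes:
            return None
        candidate, count = Counter(self.votes.values()).most_common(1)[0]
        return candidate if count > cluster_size // 2 else None
```

With the example from the text (Server_2 votes for Server_3, Server_3 votes for Server_1, Server_1 votes for itself), Server_1 holds two of three votes and wins; if Server_3 later changes its vote, the same key is overwritten and the tally changes accordingly.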

The purpose of this stage is to select a quasi-Leader and then move on to the next stage.

2) Discovery phase (Discovery)

In this stage the Followers communicate with the quasi-Leader elected in the previous stage and synchronize the transaction Proposals that the Followers have most recently received.

The main purpose of this phase is to discover the latest Proposals received by the majority of nodes. The quasi-Leader also generates a new epoch, which the Followers accept and use to update their acceptedEpoch.
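The text does not spell out how the new epoch is chosen. One plausible rule, used here purely as an assumption, is to take the largest acceptedEpoch reported by the Followers and add 1. Both function names below are hypothetical.

```python
def propose_new_epoch(accepted_epochs):
    """Assumed rule: one more than the largest acceptedEpoch reported."""
    return max(accepted_epochs) + 1


def follower_accept(accepted_epoch, new_epoch):
    """A Follower adopts the new epoch only if it is larger than its own."""
    return new_epoch if new_epoch > accepted_epoch else accepted_epoch
```

Choosing a value strictly above every reported epoch means each Follower that answered will accept it, and no zxid issued under an older Leader can be confused with one from the new Leader.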

3) Synchronization phase (Synchronization)

The synchronization phase uses the latest Proposal history that the Leader obtained in the previous stage to synchronize all replicas in the cluster.

Only when a quorum (more than half of the nodes) has completed synchronization does the quasi-Leader become the real Leader. A Follower only accepts Proposals whose zxid is larger than its own lastZxid.

4) Broadcast phase (Broadcast)

In this stage the Zookeeper cluster can formally provide transaction services to the outside world, and the Leader can broadcast messages. If a new node joins, it must be synchronized first. Note that, unlike 2PC, Zab does not need an Ack from every Follower to commit a transaction; it only needs Acks from a quorum (more than half of the nodes).

X. Zab and Paxos

The above has made a detailed description of the process involved in the Zab protocol, so what is the relationship between it and Paxos?

The authors of Zab consider Zab to be different from Paxos, and say they did not adopt Paxos because Paxos does not guarantee total order:

Because multiple leaders can propose a value for a given instance two problems arise. First, proposals can conflict. Paxos uses ballots to detect and resolve conflicting proposals. Second, it is not enough to know that a given instance number has been committed, processes must also be able to figure out which value has been committed.

It is true that the Paxos algorithm does not care about the logical order between requests and only considers the total order of the data. However, few people use the Paxos algorithm directly; it is usually simplified and optimized first.

Burrows, the designer and developer of Google's coarse-grained locking service Chubby, once said: "All consistency protocols are essentially Paxos or variants of it." There is some truth in this. ZAB is essentially a simplified form of Paxos.

