What is the data synchronization process of ZooKeeper cluster

Updated 2025-07-13. From: SLTechnology News&Howtos (Shulou.com), Development.

This article explains the data synchronization process of a ZooKeeper cluster: how Followers and Observers bring their data in line with the Leader once an election has finished.

1. Completion of the election

After the election, Ma Guoguo was honorably elected Leader of the office cluster. Assume the offices now look like this (in the example used below, the Leader's zxid is 99 and the two Followers are at 77 and 88):

Let's look at how Ma Xiaoyun and Ma Xiaoteng synchronize their data with Ma Guoguo.

After a tiring election, Ma Xiaoyun and Ma Xiaoteng narrowly lost and had no choice but to become Followers. Once they have collected themselves, the first thing they do is report their own information to Ma Guoguo through the operator, using the code word FOLLOWERINFO. The data mainly consists of their own epoch and myid.

Next comes Ma Guoguo. As FOLLOWERINFO messages arrive, he keeps count until more than half of the cluster has reported. Combining the information from each Follower, he calculates a new epoch and sends it back to the Followers with the code word LEADERINFO.
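The counting and epoch calculation can be sketched as below. This is only an illustration: the class and method names (`EpochNegotiation`, `onFollowerInfo`) are hypothetical, and the rule "highest reported epoch plus one" is an assumption about how the new epoch is derived, not a quote from the ZooKeeper source.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: collects FOLLOWERINFO reports and derives the new epoch. */
final class EpochNegotiation {
    private final int clusterSize;                        // voting members, Leader included
    private final Set<Long> reported = new HashSet<>();   // myid of every member heard from
    private long newEpoch;

    EpochNegotiation(int clusterSize, long leaderMyid, long leaderEpoch) {
        this.clusterSize = clusterSize;
        reported.add(leaderMyid);                         // the Leader counts itself
        newEpoch = leaderEpoch + 1;
    }

    /** Records one FOLLOWERINFO (myid + epoch); the new epoch must exceed every epoch seen. */
    void onFollowerInfo(long myid, long followerEpoch) {
        reported.add(myid);
        newEpoch = Math.max(newEpoch, followerEpoch + 1);
    }

    /** True once more than half of the cluster has reported. */
    boolean hasQuorum() {
        return reported.size() > clusterSize / 2;
    }

    /** The epoch to send back in LEADERINFO; only available once a quorum has reported. */
    long newEpoch() {
        if (!hasQuorum()) {
            throw new IllegalStateException("still waiting for more than half of the cluster");
        }
        return newEpoch;
    }
}
```

With five members, a Leader at epoch 3 and Followers reporting epochs 5 and 4, the quorum is reached after the second report and the sketch proposes epoch 6.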

Back to Ma Xiaoyun and Ma Xiaoteng: after receiving LEADERINFO they record the new epoch, then reply to Ma Guoguo with the code word ACKEPOCH, carrying the largest zxid on their side, to acknowledge that LEADERINFO was received.

Ma Guoguo then waits until more than half of the ACKEPOCH replies have arrived, after which he picks a synchronization strategy for each Follower based on the information it reported. Here is a quick preview of the strategies:

DIFF: if the Follower's records differ only slightly from the Leader's, incremental synchronization is used and the missing write requests are sent to the Follower one by one.

TRUNC: the Follower's zxid is ahead of the current Leader's (the Follower may have been the previous Leader), so the Follower must truncate the excess records and roll back to be consistent with the Leader.

SNAP: if the Follower's records differ too much from the current Leader's, the Leader sends its entire in-memory data directly to the Follower.

Which strategy is chosen, and how that is decided, is explained one by one below.

1.1 DIFF

Every ZK node maintains a queue of write requests (the default size is 500, configurable through zookeeper.commitLogCount) and records each write request it receives in that queue. The earliest zxid in the queue is minZxid (hereinafter min), and the zxid of the most recent write request is maxZxid (hereinafter max). When the limit is reached, the oldest write request is removed. With these two values in hand, let's see how DIFF is decided.
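A bounded queue with these min and max values might look like the following sketch. `CommittedLog` is a hypothetical class that stores only zxids, and the exact moment of eviction in the real server may differ from this simplified model.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the in-memory write request queue; it stores only zxids. */
final class CommittedLog {
    private final int capacity;                       // the article's default is 500
    private final Deque<Long> zxids = new ArrayDeque<>();

    CommittedLog(int capacity) {
        this.capacity = capacity;
    }

    /** Appends a write request; the oldest entry is evicted once the limit is reached. */
    void add(long zxid) {
        if (zxids.size() == capacity) {
            zxids.removeFirst();
        }
        zxids.addLast(zxid);
    }

    /** Oldest zxid still held (min); assumes at least one write was recorded. */
    long min() {
        return zxids.getFirst();
    }

    /** Most recent zxid (max); assumes at least one write was recorded. */
    long max() {
        return zxids.getLast();
    }

    /** True if a Follower at this zxid can be caught up from the queue alone. */
    boolean covers(long followerZxid) {
        return !zxids.isEmpty() && followerZxid >= min() && followerZxid <= max();
    }
}
```

An `ArrayDeque` keeps both ends cheap to access, which is all that min, max and eviction need.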

1.1.1 Recovering from the write request queue in memory

In the first case, if the zxid reported by the Follower through ACKEPOCH lies between min and max, the DIFF strategy is used for data synchronization.

In our example the Leader's zxid is 99, so the queue that can hold 500 write requests is far from full: min is 1 and max is 99. Clearly 77 and 88 both fall in this interval, so Ma Guoguo finds the range each of the two Followers is missing, first sends a DIFF to the Follower, and then sends each write request as a PROPOSAL followed by a COMMIT.

1.1.2 Recovering from the log files on disk

In the other case, the Follower's zxid is not within the range of min and max, but zookeeper.snapshotSizeFactor is configured greater than 0 (the default is 0.33). The Leader then tries to use the log files for DIFF: as long as the total size of the log files that need to be synchronized does not exceed about 1/3 of the size of the latest snapshot file (taking the default 0.33 as an example), DIFF synchronization can be done by reading the write request records from the log files. The procedure is the same as above: first send a DIFF to the Follower, then locate the Follower's range in the log files, and send PROPOSAL and COMMIT one by one.
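The size condition boils down to one comparison. The sketch below uses hypothetical names (`TxnLogSyncCheck`, `canSyncFromLog`) and assumes the sizes are measured in bytes.

```java
/** Hypothetical check for "may DIFF be served from the log files on disk?". */
final class TxnLogSyncCheck {
    /** The article's default for zookeeper.snapshotSizeFactor. */
    static final double DEFAULT_SNAPSHOT_SIZE_FACTOR = 0.33;

    /**
     * @param logBytesToSend      total size of the log files that would have to be read
     * @param latestSnapshotBytes size of the latest snapshot file
     * @param factor              configured snapshotSizeFactor; 0 or less disables the feature
     */
    static boolean canSyncFromLog(long logBytesToSend, long latestSnapshotBytes, double factor) {
        if (factor <= 0) {
            return false;
        }
        long limit = (long) (latestSnapshotBytes * factor);
        return logBytesToSend <= limit;
    }
}
```

For a 1000-byte snapshot and the default factor, 300 bytes of log pass the check while 400 bytes do not, in which case the Leader falls back to another strategy.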

When the Follower receives a PROPOSAL message, it processes it just as if it were a client request, and gradually restores its data until it is consistent with the Leader.

1.2 SNAP

Suppose the three offices are like this now.

Under the default configuration, Ma Guoguo's write request queue holds the write requests from 277 to 777. Assuming the current scenario also does not satisfy the conditions of 1.1.2 above, Ma Guoguo knows that synchronization has to be done through SNAP.

Ma Guoguo will first send a SNAP request to Ma Xiaoyun and Ma Xiaoteng to get them ready.

Then he serializes all the data currently in memory (in the same format as a snapshot file) and sends it to Ma Xiaoyun and Ma Xiaoteng in one go.

After receiving the entire snapshot from Ma Guoguo, Ma Xiaoyun and Ma Xiaoteng will first empty all the information in their current database, and then deserialize the received snapshot directly to complete the recovery of the entire memory data.

1.3 TRUNC

The scenario for the last strategy is assumed to be like this:

Suppose Ma Xiaoteng was the previous Leader and, after a power outage, has recovered and rejoined the cluster as a Follower, but his zxid is larger than max. In this case Ma Guoguo sends TRUNC to Ma Xiaoteng. (Why is Ma Xiaoyun not also shown as a TRUNC example in the figure? Because if Ma Xiaoyun's zxid were also larger than Ma Guoguo's, Ma Guoguo could not have been elected Leader in this scenario.)

Ma Guoguo will send TRUNC to Ma Xiaoteng (ignore Ma Xiaoyun here)

Suppose Ma Xiaoteng's local log file directory looks like this:

/tmp
└── zookeeper
    └── log
        └── version-2
            ├── log.0
            ├── log.500
            └── log.800

After Ma Xiaoteng receives the TRUNC, he deletes every local log file that starts beyond 777, here log.800. He then finds the record with zxid 777 in log.500 and moves that file's read/write pointer to the position of 777. From then on, reads and writes on the file continue from 777, so the later records are overwritten.
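The file selection in this step can be sketched as follows. `TruncPlan` is a hypothetical helper, and it parses the file suffix as a decimal number only because the article's example is written that way; a real server may encode the suffix differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner for the TRUNC step; file names follow the article's log.N pattern. */
final class TruncPlan {
    final List<String> toDelete = new ArrayList<>();  // files that start beyond the target zxid
    String toTruncate;                                // file containing the target zxid, or null

    static TruncPlan plan(List<String> logFiles, long targetZxid) {
        TruncPlan plan = new TruncPlan();
        long bestStart = -1;
        for (String name : logFiles) {
            // Assumption: the suffix is the first zxid in the file, written in decimal.
            long firstZxid = Long.parseLong(name.substring("log.".length()));
            if (firstZxid > targetZxid) {
                plan.toDelete.add(name);
            } else if (firstZxid > bestStart) {
                // The latest file that starts at or before the target holds the target record.
                bestStart = firstZxid;
                plan.toTruncate = name;
            }
        }
        return plan;
    }
}
```

For log.0, log.500 and log.800 with target 777, the plan deletes log.800 and truncates inside log.500, matching the walkthrough above.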

Once Ma Guoguo has decided on the synchronization strategy and sent the data to the two Followers, he sends each of them a NEWLEADER message.

After receiving NEWLEADER, Ma Xiaoyun and Ma Xiaoteng force a new snapshot file to be written if they synchronized through SNAP. They then reply to Ma Guoguo with an ACK message, telling him that their data synchronization is complete.

Ma Guoguo in turn waits until more than half of the ACKs have been received, and then sends an UPTODATE to the two Followers, telling them that the data in all offices is now consistent and they can start serving requests.

After receiving UPTODATE, Ma Xiaoyun and Ma Xiaoteng reply to Ma Guoguo with one more ACK, but this time Ma Guoguo does nothing with it. So after UPTODATE, every office can officially provide service.

Everything above assumed that Ma Xiaoyun and Ma Xiaoteng are Followers. What if a node is an Observer? How does it synchronize?

The only difference is in the first step: a Follower sends FOLLOWERINFO, while an Observer sends OBSERVERINFO. The remaining data synchronization steps are the same as for a Follower.

2. Digging deeper

Now for some of the details in programmer terms ("ape talk"). The way the Leader actually sends data to a Follower differs between the three synchronization strategies.

2.1 How the three strategies are sent

With DIFF or TRUNC synchronization, the Leader does not send each piece of differing data as soon as it finds it. Instead it first puts the messages into a queue in order, and finally starts a thread that sends them one by one.

DIFF:

TRUNC:

With SNAP synchronization, however, nothing is put into the queue: both the SNAP message and the entire serialized in-memory snapshot are written directly to the socket between the servers.
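The queue-then-send pattern used for DIFF and TRUNC can be illustrated with a blocking queue and a single sender thread. Everything here is hypothetical: `QueuedSender`, the string packets and the end marker are stand-ins, and a list plays the role of the socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical illustration: packets are queued in order, then one thread sends them. */
final class QueuedSender {
    private static final String END = "__END__";          // marker that stops the sender thread
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> wire = new ArrayList<>();  // stands in for the socket

    /** Called while the Leader works out what the Follower is missing. */
    void enqueue(String packet) {
        queue.add(packet);
    }

    /** Starts the sender thread, waits until the queue is drained, and returns what was sent. */
    List<String> sendAll() {
        queue.add(END);
        Thread sender = new Thread(() -> {
            try {
                for (String p = queue.take(); !p.equals(END); p = queue.take()) {
                    wire.add(p);                           // a real sender would write to the socket
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        try {
            sender.join();                                 // join makes the writes to wire visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return wire;
    }
}
```

Queuing first keeps the order of DIFF, PROPOSAL and COMMIT fixed no matter how long it takes the Leader to look them up.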

2.2 The God's-eye view

Let's take another look at the whole message exchange for each of the three strategies, taking Ma Xiaoyun as the example.

2.2.1 DIFF

2.2.2 TRUNC

2.2.3 SNAP

You can see that the beginning and the end are the same; only the messages in the middle differ according to the strategy. That is pretty much the whole logic of how a Follower or Observer synchronizes data with the Leader.
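The three exchanges can be written down as ordered lists of the message names used in this article. The sketch below is only a summary aid; `SyncConversation` is a hypothetical class, and the DIFF case shows a single PROPOSAL and COMMIT pair where a real exchange repeats the pair for every missing write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical summary of the message order for each synchronization strategy. */
final class SyncConversation {
    enum Strategy { DIFF, TRUNC, SNAP }

    static List<String> messages(Strategy strategy, boolean observer) {
        List<String> seq = new ArrayList<>();
        seq.add(observer ? "OBSERVERINFO" : "FOLLOWERINFO"); // learner -> Leader: epoch and myid
        seq.add("LEADERINFO");                                // Leader -> learner: the new epoch
        seq.add("ACKEPOCH");                                  // learner -> Leader: largest zxid
        switch (strategy) {
            case DIFF:
                seq.addAll(Arrays.asList("DIFF", "PROPOSAL", "COMMIT"));
                break;
            case TRUNC:
                seq.add("TRUNC");
                break;
            case SNAP:
                seq.addAll(Arrays.asList("SNAP", "<serialized snapshot>"));
                break;
        }
        // The ending is shared: NEWLEADER, ACK, UPTODATE, and a final ACK the Leader ignores.
        seq.addAll(Arrays.asList("NEWLEADER", "ACK", "UPTODATE", "ACK"));
        return seq;
    }
}
```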

2.3 Summary

There are three ways for a Follower or Observer to synchronize data: DIFF, SNAP, and TRUNC.

DIFF requires that the zxid of the Follower or Observer is within the Leader's range of min and max, or that recovery from log files is allowed by configuration.

TRUNC applies when the zxid of the Follower or Observer is larger than the Leader's; the node must delete the excess zxid-related data and roll back to be consistent with the Leader.

SNAP is the last resort for data synchronization: the Leader serializes its entire in-memory data and sends it to the Follower or Observer to restore from.
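Put together, the decision reads roughly like the sketch below. It is a simplification under stated assumptions: `SyncStrategyChooser` is a hypothetical name, the real server handles more edge cases, and `logSyncAllowed` stands for "zookeeper.snapshotSizeFactor is greater than 0 and the size condition from 1.1.2 holds".

```java
/** Hypothetical, simplified version of the strategy decision summarized above. */
final class SyncStrategyChooser {
    enum Strategy { DIFF, TRUNC, SNAP }

    static Strategy choose(long followerZxid, long min, long max, boolean logSyncAllowed) {
        if (followerZxid > max) {
            return Strategy.TRUNC;   // the learner is ahead of the Leader and must roll back
        }
        if (followerZxid >= min) {
            return Strategy.DIFF;    // the in-memory queue covers what the learner is missing
        }
        // Too far behind for the queue: DIFF from the disk logs if allowed, otherwise SNAP.
        return logSyncAllowed ? Strategy.DIFF : Strategy.SNAP;
    }
}
```

For example, `choose(77, 1, 99, false)` yields DIFF, while a learner at a zxid below 277 with a queue of 277 to 777 and no log recovery ends up with SNAP.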

Looking at the word count so far, I decided to add a little more and start a short piece on ACL, which I have been putting off explaining for a long time.

3. Without rules, nothing gets done

First, a quick refresher. In the node creation code shown in an earlier article, ZooDefs.Ids.OPEN_ACL_UNSAFE is the ACL parameter.

    client.create("/update video/dance/20201101",
            "this is Data, you can either record some business data or write at will".getBytes(),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

First of all, if zookeeper.skipACL is configured with the value yes (note the lower case), the current node skips ACL verification altogether. The default is no.

So how is an ACL defined, what permissions are there, and how is it represented on the server side? An ACL has two aspects: Permission and Scheme. Permission is which operations are allowed, while Scheme specifies which authentication mode is used. Let's take them in turn.

3.1 Introduction to Permission

First of all, ZK divides permissions into five categories:

READ (hereinafter R): get node data or the list of child nodes

WRITE (hereinafter W): set node data

CREATE (hereinafter C): create a node

DELETE (hereinafter D): delete a node

ADMIN (hereinafter A): set the ACL of a node

At the code level the five permissions are plain int values. To check whether a permission is present, apply the & operation between the granted permissions and the target permission: if the result is not 0, the permission is granted. The values are as follows:

Permission  Int  Binary
R             1  00001
W             2  00010
C             4  00100
D             8  01000
A            16  10000

Suppose the client permission is RWC, and the corresponding value is the sum of all permissions 1 + 2 + 4 = 7.

Permission  Int  Binary
RWC           7  00111

For any operation that requires R, W or C, the result of the & is not 0, so the client is judged to hold the three permissions R, W and C.

However, if the client tries to delete the target node, the permission check yields 0, which means the client does not have DELETE permission, and a permission error is returned to the client.
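The same check in code, as a small self-contained sketch. The constants repeat the values listed above; the class `PermCheck` and the method `allows` are hypothetical names used only for illustration.

```java
/** Hypothetical illustration of the bitmask check; the values match the list above. */
final class PermCheck {
    static final int READ = 1;
    static final int WRITE = 2;
    static final int CREATE = 4;
    static final int DELETE = 8;
    static final int ADMIN = 16;

    /** True if the granted mask contains the required permission bit. */
    static boolean allows(int granted, int required) {
        return (granted & required) != 0;
    }

    public static void main(String[] args) {
        int rwc = READ | WRITE | CREATE;            // 1 + 2 + 4 = 7
        System.out.println(allows(rwc, CREATE));    // prints true
        System.out.println(allows(rwc, DELETE));    // prints false, the client would get a permission error
    }
}
```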

Permission  Int    Binary
RWC           7    00111
D             8  & 01000
result        0    00000

3.2 Scheme introduction

There are four kinds of Scheme: ip, world, digest, and super. They really fall into two categories: ip works on IP addresses, while world, digest and super work on something like "username:password". A complete ACL actually has three parts, scheme:id:perms, and the value of id depends on the scheme. For ip, the id is the specific IP address, while perms is the RWCDA introduced in the previous section.

The first two parts, scheme:id, tell the server "who am I?", and the last part, perms, says "what can I do?". If either answer does not match, the server throws a NoAuthException telling the client that it does not have sufficient permissions.

3.2.1 IP

Let's go straight to a piece of code, in which I picked the IP 10.11.12.13 arbitrarily.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.ACL;
    import org.apache.zookeeper.data.Id;

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    List<ACL> aclList = new ArrayList<>();
    // Only clients connecting from 10.11.12.13 get all permissions on the node
    aclList.add(new ACL(ZooDefs.Perms.ALL, new Id("ip", "10.11.12.13")));
    String path = client.create("/abc", "test".getBytes(), aclList, CreateMode.PERSISTENT);
    System.out.println(path); // output: /abc
    client.close();

You can see that /abc is output correctly, and the /abc node also shows up when listing the child nodes of /.

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    List<String> children = client.getChildren("/", false);
    System.out.println(children); // output: [abc, zookeeper]
    client.close();

But now if you access the data of this node, you will get an error.

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    byte[] data = client.getData("/abc", false, null);
    System.out.println(new String(data));
    client.close();

    Exception in thread "main" org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /abc

Readers can try changing the IP above to 127.0.0.1 and recreating the node; it can then be accessed normally. In general, ip mode is not used much in production (or maybe it is just me). If you want to control access by IP, a firewall whitelist or similar is the usual means; I don't think this level needs to be managed by ZK.

3.2.2 World

This mode is probably the most commonly used (just kidding).

Let's look at a piece of code.

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    List<ACL> aclList = new ArrayList<>();
    aclList.add(new ACL(ZooDefs.Perms.READ, new Id("world", "anyone"))); // this is the line that differs
    String path = client.create("/abc", "test".getBytes(), aclList, CreateMode.PERSISTENT);
    System.out.println(path); // output: /abc
    client.close();

I changed the scheme to world. In world mode the id is fixed to anyone and no other value can be used. I also set perms to R, so this node's data can only be read and no other operation is allowed. If you use setData to modify its data, you get the same permission error.

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    Stat stat = client.setData("/abc", "newData".getBytes(), -1); // NoAuth for /abc

Looking back at ZooDefs.Ids.OPEN_ACL_UNSAFE from earlier, it is a commonly used static constant provided by ZK that grants all permissions to anyone, so in effect no permission is verified.

    Id ANYONE_ID_UNSAFE = new Id("world", "anyone");
    ArrayList<ACL> OPEN_ACL_UNSAFE = new ArrayList<ACL>(Collections.singletonList(new ACL(Perms.ALL, ANYONE_ID_UNSAFE)));

3.2.3 Digest

This is the familiar user name and password. Again, code first.

    import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    List<ACL> aclList = new ArrayList<>();
    aclList.add(new ACL(ZooDefs.Perms.ALL,
            new Id("digest", DigestAuthenticationProvider.generateDigest("laoxun:kaixin")))); // 1
    String path = client.create("/abc", "test".getBytes(), aclList, CreateMode.PERSISTENT);
    System.out.println(path);
    client.close();

Note that in this style the username:password string at 1 must be wrapped in the DigestAuthenticationProvider.generateDigest method, which encodes the incoming string.

After wrapping, laoxun:kaixin actually becomes laoxun:/xQjqfEf7WHKtjj2csJh2/aEee8=. The process is as follows:

Hash the whole string laoxun:kaixin with SHA1.

Encode the hash with Base64.

Concatenate the user name, a colon, and the encoded result.
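These three steps can be reproduced with the JDK alone. The sketch below is a re-implementation for illustration, not the library method itself; `DigestSketch` is a hypothetical class, and it assumes the input contains a colon and is encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical re-implementation of the three steps above using only the JDK. */
final class DigestSketch {
    /** Turns "user:password" into "user:base64(sha1(user:password))". */
    static String generateDigest(String idPassword) {
        try {
            // Assumption: the input always contains a colon separating user and password.
            String user = idPassword.substring(0, idPassword.indexOf(':'));
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(idPassword.getBytes(StandardCharsets.UTF_8)); // step 1: hash the whole string
            String encoded = Base64.getEncoder().encodeToString(sha1);    // step 2: Base64
            return user + ":" + encoded;                                  // step 3: concatenate
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available on this JVM", e);
        }
    }
}
```

A SHA-1 hash is 20 bytes, so the Base64 part is always 28 characters long, which is the shape of the encoded value shown above.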

The node creation code above can also be written as follows, using addAuthInfo to add the permission information to the client context:

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    client.addAuthInfo("digest", "laoxun:kaixin".getBytes()); // 1
    List<ACL> aclList = new ArrayList<>();
    aclList.add(new ACL(ZooDefs.Perms.ALL, new Id("auth", ""))); // 2: the scheme is auth and the id is an empty string
    String path = client.create("/abc", "test".getBytes(), aclList, CreateMode.PERSISTENT);
    System.out.println(path);
    client.close();

There are two changes here. At 1, addAuthInfo adds auth information to the current client session; the id value for digest is username:password, given directly in plaintext, and both the user name and the password are up to you. At 2, the Id uses the auth scheme with an empty id, which stands for the identity already added to the session.

And then the query code.

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    client.addAuthInfo("digest", "laoxun:kaixin".getBytes()); // without this line, getData reports an error
    byte[] data = client.getData("/abc", false, null);
    System.out.println(new String(data)); // output: test

No matter which way the node was created, you have to add the permission information to the session with addAuthInfo before you can query the node.

3.2.4 Super

As the name suggests, this mode is the administrator mode. Nodes created with a user name and password cannot be accessed by other clients, and if the creating client goes away for good, nobody can operate on those nodes any more. An administrator role is therefore needed that can override the restrictions on such nodes.

Super mode has to be enabled first. Suppose the administrator's user name is HelloZooKeeper and the password is niubi, which encodes to HelloZooKeeper:PT8Sb6Exg9YyPCS7fYraLCsqzR8=. You then need to set zookeeper.DigestAuthenticationProvider.superDigest in the server's startup environment, with the value HelloZooKeeper:PT8Sb6Exg9YyPCS7fYraLCsqzR8=.

Assume the node was created with laoxun:kaixin; it can then be accessed normally with the administrator's user name and password.

    ZooKeeper client = new ZooKeeper("127.0.0.1:2181", 3000, null);
    client.addAuthInfo("digest", "HelloZooKeeper:niubi".getBytes()); // 1
    byte[] data = client.getData("/abc", false, null);
    System.out.println(new String(data)); // output: test
    client.close();

You can see at 1 that super mode is essentially digest: the scheme specified is digest, and the id value is in plaintext, not the encoded format. Remember that!

3.3 Permission Summary Table

Here I list the Permission permissions for most of the operations provided by the server:

Operation             Required permission                Description
create                CREATE on the parent node          create a node
create2               CREATE on the parent node          create a node and return the node data at the same time
createContainer       CREATE on the parent node          create a container node
createTTL             CREATE on the parent node          create a node with a timeout
delete                DELETE on the parent node          delete a node
setData               WRITE on the current node          set node data
setACL                ADMIN on the current node          set the node's permission information
reconfig              WRITE on the current node          reset some configurations (to be introduced later)
getData               READ on the current node           query node data
getChildren           READ on the current node           get the list of child nodes
getChildren2          READ on the current node           get the list of child nodes
getAllChildrenNumber  READ on the current node           get the number of all child nodes (including grandchildren)
getACL                ADMIN or READ on the current node  get the node's permission information

You can see that deleting and creating nodes check the permissions of the parent node, while the other operations check the permissions of the node itself. Operations that do not appear in the table can be regarded as needing no ACL verification; they either only require that the client has a valid session or serve special purposes, such as createSession and closeSession. More about sessions is saved for the next article.

3.4 The principle behind ACL

We just spent some space on what ACL is and how to use it. Now let's look at how ACL is implemented underneath on the ZK server side. To save space, we go straight to programmer talk this time.

First, here is a picture from an earlier article to refresh everyone's memory.

The permission field in the figure (blue font) was skipped in the previous article without explanation. Today we talk about this permission field.

You can also see from the figure that the permission field is stored directly in the server-side node as a number (a long, that is, a 64-bit integer). -1 is a special value meaning that no permission verification is done, corresponding to the OPEN_ACL_UNSAFE constant mentioned earlier.

Whether the ACL is supplied when the node is created (the ACL parameter is a List) or through the addAuthInfo method (which can be called multiple times), both designs mean that one client can hold multiple identities, such as several user names and passwords, several IP addresses, and so on.

As mentioned earlier, an ACL consists of three parts, scheme:id:perms. For brevity I will use this form to represent an ACL from here on.

The server will use two hash tables to store the bidirectional relationship between the ACL list received so far and its corresponding numbers, like this (the ACL value in the figure is made up by me at random):
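A minimal sketch of such a pair of tables is shown below. It is an illustration only: `AclCacheSketch` is a hypothetical class, each ACL is represented as a plain scheme:id:perms string, and the counter is assumed to hand out 1 for the first list it sees.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bidirectional cache: ACL list <-> number, each ACL written as scheme:id:perms. */
final class AclCacheSketch {
    private final Map<Long, List<String>> numberToAcl = new HashMap<>();
    private final Map<List<String>, Long> aclToNumber = new HashMap<>();
    private long lastNumber = 0;  // the first ACL list seen is assigned 1

    /** Returns the number for this ACL list, registering it in both tables if it is new. */
    long toNumber(List<String> acls) {
        Long known = aclToNumber.get(acls);
        if (known != null) {
            return known;
        }
        long number = ++lastNumber;
        aclToNumber.put(acls, number);   // callers must not modify the list afterwards
        numberToAcl.put(number, acls);
        return number;
    }

    /** Looks up the ACL list stored for a node's permission number, or null if unknown. */
    List<String> toAcl(long number) {
        return numberToAcl.get(number);
    }
}
```

Storing only the number in each node keeps nodes small when many of them share the same ACL list.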

The ZK server maintains a counter starting from 1. When a new ACL list is received, it is put into both hash tables at the same time (in the source code these are two Maps, one Map<Long, List<ACL>> and one Map<List<ACL>, Long>). Besides these two hash tables, the ZK server also keeps permission information for each client session, added by the client through addAuthInfo. Only the scheme:id part is saved for a client, so permission verification of a client operation combines the following three pieces of information:

The node's permission information, scheme:id:perms, looked up through the two hash tables; there can be several entries.

The permission information in the client session context, scheme:id only; there can be several entries.

The permission required by the operation itself, that is, the required permission listed in the table in 3.3.

The process of verification is as follows:
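Combining the three inputs listed above, a check could look roughly like this sketch. It is a deliberate simplification with hypothetical names (`AclCheckSketch`, `allowed`): identities are compared as plain scheme:id strings, and the skipACL switch, the super user and scheme-specific matching (for example comparing the client address for ip) are left out.

```java
import java.util.List;

/** Hypothetical, simplified permission check over a node's ACL entries. */
final class AclCheckSketch {
    /** One ACL entry of a node: scheme, id and the perms bitmask. */
    static final class Acl {
        final String scheme;
        final String id;
        final int perms;

        Acl(String scheme, String id, int perms) {
            this.scheme = scheme;
            this.id = id;
            this.perms = perms;
        }
    }

    /**
     * @param nodeAcls    ACL entries of the node (or of its parent for create and delete)
     * @param sessionAuth scheme:id strings stored for the client session
     * @param required    permission bit needed by the operation
     */
    static boolean allowed(List<Acl> nodeAcls, List<String> sessionAuth, int required) {
        for (Acl acl : nodeAcls) {
            if ((acl.perms & required) == 0) {
                continue;                                   // entry does not grant this permission
            }
            if (acl.scheme.equals("world") && acl.id.equals("anyone")) {
                return true;                                // open to every client
            }
            if (sessionAuth.contains(acl.scheme + ":" + acl.id)) {
                return true;                                // the session carries a matching identity
            }
        }
        return false;                                       // the server would report NoAuth
    }
}
```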

One more thing worth mentioning: the validator can be customized. Users can define their own scheme and their own verification logic by setting, in the server's environment, a configuration entry whose name starts with zookeeper.authProvider. The value is the fully qualified name of a class that must implement the org.apache.zookeeper.server.auth.AuthenticationProvider interface, and the class must be loadable by the ZK server so that the custom scheme can be parsed and control the whole verification logic. This feature is fairly advanced and I have never used it, so just take it as supplementary knowledge.
