2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article looks at the election mechanism of ZooKeeper. The editor found it practical and is sharing it here; hopefully you will take something away from it. Without further ado, let's get started.
Starting today, we will dig into how ZK elections work.
I. Basic rules of the election
ZKr~ This time I am not going to open with a story, because a few things that matter a great deal in a ZK election have to be explained first.
1.1 zxid
The zxid is the transaction ID mentioned earlier. It is an 8-byte integer, but ZK splits it into two parts so that one number does two jobs!
An 8-byte integer is 64 bits long. The first 32 bits record the epoch and the last 32 bits are a counter. You may be asking: epoch? What is that?
The zxid starts at 0, like this:
00000000000000000000000000000000 00000000000000000000000000000000
Each write request increments the last 32 bits. Assuming 10 write requests have been made so far (whether or not a request actually changed any data), the zxid looks like this:
00000000000000000000000000000000 00000000000000000000000000001010
When an election is held, the first 32 bits are increased by 1 and the last 32 bits are cleared:
00000000000000000000000000000001 00000000000000000000000000000000
Besides elections, when the last 32 bits are used up (all 1s, meaning ZK has handled 2^32 - 1 write requests without a single election, impressive!), the first 32 bits are also increased by 1, which amounts to a carry:
# before the carry
00000000000000000000000000000000 11111111111111111111111111111111
# after the carry
00000000000000000000000000000001 00000000000000000000000000000000
Now I can answer the earlier question. The epoch is the number held in the first 32 bits of the zxid. The word itself means "era" and signals a changeover, while the last 32 bits of the zxid are simply a count of write requests.
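To make the bit layout concrete, here is a minimal Python sketch of packing and unpacking a zxid. The helper names (make_zxid, epoch_of, counter_of, next_epoch) are made up for illustration and are not ZooKeeper APIs; only the 32/32 split comes from the description above.

```python
EPOCH_SHIFT = 32
COUNTER_MASK = 0xFFFFFFFF  # the low 32 bits


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into one 64-bit zxid (hypothetical helper)."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def epoch_of(zxid: int) -> int:
    """High 32 bits: the epoch."""
    return zxid >> EPOCH_SHIFT


def counter_of(zxid: int) -> int:
    """Low 32 bits: the write counter within the current epoch."""
    return zxid & COUNTER_MASK


def next_epoch(zxid: int) -> int:
    """After an election: bump the epoch and clear the counter."""
    return make_zxid(epoch_of(zxid) + 1, 0)


zxid = make_zxid(0, 10)            # ten writes in epoch 0
print(f"{zxid:064b}")              # 64-bit binary, as in the text
print(f"{next_epoch(zxid):064b}")  # epoch 1, counter 0
```

Because the epoch sits in the high bits, adding 1 to a zxid whose counter is all 1s naturally carries into the epoch, which matches the carry case described above.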
1.2 myid
In the earlier short stories I gave every node in the ZK cluster a memorable name (memorable indeed!). But how does ZK itself name each node in the cluster? With myid!
ZK's startup configuration, zoo.cfg, has a dataDir entry that specifies where data is stored (/tmp/zookeeper in the sample configuration). Create a text file named myid under this path. Its content is a single number, and that number is the myid of the current node.
/tmp
└── zookeeper
    ├── myid
    └── ...
The cluster information is then configured like this in zoo.cfg:
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
The number after server. is the myid, and it must be unique across the whole cluster. I forget where I read it, but I always believed that myid could only be a number from 1 to 255. This time I tested it properly and let the facts decide; my experiments covered versions 3.4, 3.5 and 3.6 (all simple three-machine clusters). The conclusion: myid can be any value other than -1 (a fixed value that makes the node fail at startup), as long as it is no greater than Long.MAX_VALUE and no less than Long.MIN_VALUE. However, if zookeeper.extendedTypesEnabled=true is configured on a node, that node's myid can be at most 254 (negative values are unaffected; I do not know the reason for 254, but the check is there in the code). Another odd bit of trivia learned?
Configuration will be covered in more detail in a separate post; that is all on it for today.
1.3 Election rules
What is the use of knowing all this? It is very important, because the election of a Leader depends entirely on these values:
Epoch
Number of write requests
myid
They are compared in priority order from top to bottom: whoever has the larger value is more qualified to become Leader, and a tie at one level moves the comparison to the next level until there is a winner. Because myid values cannot repeat, there is always a winner in the end!
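The three-level comparison can be sketched in a few lines of Python. This illustrates the rule as described here and is not ZooKeeper's source; the Candidate type and the beats function are hypothetical names.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Fields in priority order, so plain tuple comparison applies the rule."""
    epoch: int
    counter: int  # number of write requests (low 32 bits of the zxid)
    myid: int


def beats(new: Candidate, current: Candidate) -> bool:
    """True if `new` is more qualified than `current` to be Leader."""
    # Tuples compare field by field: epoch first, then counter, then myid.
    return new > current


# Same epoch and counter, so the larger myid decides.
print(beats(Candidate(0, 0, 69), Candidate(0, 0, 56)))
# A higher epoch wins even against a larger counter and myid.
print(beats(Candidate(1, 0, 1), Candidate(0, 99, 69)))
```

Putting the fields in priority order lets the language's built-in tuple ordering do the level-by-level tie-breaking.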
All right, now that you know the most basic election rules, let's move on to the next section.
II. The battle of the three Mas
Ma Guoguo probably never imagined that one day he would be mentioned in the same breath as two famous star entrepreneurs. Let's see what happened.
2.1 Preparing to start work
Ma Guoguo previously stipulated that the three offices must select a Leader before opening to the outside world, and before the formal start of the selection, each office also has some preparatory work to do:
Each office must know how many offices there are.
Hire additional operators who specialize in communicating with other offices
Prepare a ballot box for collecting and tallying votes.
Set up a fixed myid for each office
So now the layout of the office looks like this (I omitted the other elements of the previous chapter):
With these preparations, all offices can enter the election stage, and the village committee has stipulated several states to indicate the stage the current office is in:
LOOKING: the office is looking for a Leader. An office in this state cannot provide services.
LEADING: the current office is the Leader and can provide services to the outside world.
FOLLOWING: the current office is following the Leader and can provide services.
It is obvious that all the offices that have just been prepared are now in the state of LOOKING, so let's officially enter the election process.
2.2 Starting the election
Since the offices have only just finished preparing, they have not yet exchanged any messages. They are all surnamed Ma and each secretly wants to be the boss, so every office begins by filling in a ballot for itself and sending it to the other offices. The main information on a ballot is:
sid: who am I?
leader: who do I choose?
state: my current status
epoch: my current epoch
zxid: the largest transaction ID of the leader I chose
Take Ma Guoguo as an example:
Ma Xiaoyun and Ma Xiaoteng are the same. At the beginning, they both chose themselves as Leader candidates, and both sent their votes to the other two candidates (and themselves).
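As a rough picture of what such a ballot carries, here is a small Python sketch. The Ballot class and initial_ballot function are illustrative names that mirror the five fields listed above; they are not the actual ZooKeeper classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ballot:
    sid: int     # who am I?
    leader: int  # who do I choose?
    state: str   # my current status
    epoch: int   # my current epoch
    zxid: int    # largest transaction ID of the leader I chose


def initial_ballot(myid: int, epoch: int, zxid: int) -> Ballot:
    """At the start of an election every node votes for itself."""
    return Ballot(sid=myid, leader=myid, state="LOOKING", epoch=epoch, zxid=zxid)


# The three offices from the story, all freshly started.
ballots = [initial_ballot(myid, epoch=0, zxid=0) for myid in (69, 56, 49)]
for ballot in ballots:
    print(ballot)
```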
2.2.1 Ma Guoguo's perspective
Each office also receives ballots from other offices (and possibly from itself). Every ballot that arrives has to be compared with the office's current Leader candidate. In theory, the vote an office casts for itself reaches its own ballot box first, because it needs no network communication and has the shortest path. That ballot matches the office's own candidate, so there is nothing to compare; it is simply recorded in the ballot box. Taking Ma Guoguo as an example:
On the left is the name of the office and on the right is the Leader that office chose. The current tally is the number of ballots received by the Leader that the current node has chosen.
Suppose he receives Ma Xiaoyun's vote again:
Ma Guoguo first sees that Ma Xiaoyun is also in the LOOKING state.
He then compares his own candidate's ballot with Ma Xiaoyun's (the left side is the current office's candidate and the right side is the ballot received; the same applies below).
e:0 == e:0
z:0 == z:0
l: Ma Guoguo (69) > l: Ma Xiaoyun (56)
In the end Ma Guoguo wins, because his myid 69 is larger than Ma Xiaoyun's myid 56. Although Ma Guoguo wins, the current tally cannot be changed yet: Ma Xiaoyun's ballot in this round still names Ma Xiaoyun, so Ma Guoguo has to wait for him to vote again before updating the tally.
It will then be recorded in the ballot box:
This was followed by Ma Xiaoteng's vote:
e:0 == e:0
z:0 == z:0
l: Ma Guoguo (69) > l: Ma Xiaoteng (49)
Ma Guoguo still wins!
Record the ballot box:
Every time a ballot arrives, Ma Guoguo tallies the votes against the current statistics. Unfortunately the election still cannot be concluded, because the rule for ending it is that some office must hold more than half of the votes. So far Ma Guoguo has only his own vote, which is not more than half, so he can only wait.
While Ma Guoguo is busy here, Ma Xiaoyun and Ma Xiaoteng are also doing the same thing.
2.2.2 Ma Xiaoyun's perspective
We will skip how Ma Xiaoyun records his own vote. Suppose he receives Ma Guoguo's ballot first; how does he handle it?
e:0 == e:0
z:0 == z:0
l: Ma Xiaoyun (56) < l: Ma Guoguo (69)
Ma Xiaoyun sees that his own Leader candidate has been beaten by Ma Guoguo's ballot, so he changes his candidate to Ma Guoguo and broadcasts the new ballot.
He then records in his own ballot box:
For the sake of completeness, let's look at Ma Xiaoteng's ballot as well:
e:0 == e:0
z:0 == z:0
l: Ma Guoguo (69) > l: Ma Xiaoteng (49)
Ma Guoguo still won, so Ma Xiaoyun's ballot box ended up like this:
Next we would take Ma Xiaoteng's perspective and repeat the same process, but it is almost identical to Ma Xiaoyun's. To keep the story moving, we go back to Ma Guoguo's perspective, because Ma Xiaoyun changed his vote after losing to Ma Guoguo and sent out another round of ballots.
2.2.3 Ma Guoguo's perspective (again)
Ma Guoguo receives Ma Xiaoyun's ballot once more (the changed one), and the ballot box changes to this:
With this ballot, the current tally gains Ma Xiaoyun's record, and Ma Guoguo finds that his candidate now has more than half of the votes. He then performs a second confirmation: he waits a while to see whether any newer ballots arrive. Assuming none do, he checks whether the candidate with more than half of the votes is himself. If it is, he is the Leader; if not, he is a Follower.
Ma Guoguo is obviously the Leader, so he changes his status to LEADING.
Meanwhile Ma Xiaoyun and Ma Xiaoteng have also tallied their votes. The result for both is Follower, so they change their status to FOLLOWING and each synchronizes data with the Leader. Once synchronization is complete, the whole office cluster can provide services to the outside world.
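The "more than half" check and the final role decision can be sketched as follows. This is a simplified illustration of the steps in this section, not the real implementation: has_quorum and decide_role are hypothetical names, and the waiting period of the second confirmation is left out.

```python
from typing import Dict


def has_quorum(ballot_box: Dict[int, int], candidate: int, cluster_size: int) -> bool:
    """ballot_box maps each office's myid to the Leader it last voted for."""
    votes = sum(1 for leader in ballot_box.values() if leader == candidate)
    return votes > cluster_size // 2  # strictly more than half


def decide_role(myid: int, candidate: int) -> str:
    """Once the candidate has a quorum, each office picks its own state."""
    return "LEADING" if candidate == myid else "FOLLOWING"


# Ma Guoguo's ballot box after the first round: everyone voted for themselves.
box = {69: 69, 56: 56, 49: 49}
print(has_quorum(box, candidate=69, cluster_size=3))  # only one vote so far

# Ma Xiaoyun loses the comparison and votes again, this time for 69.
box[56] = 69
print(has_quorum(box, candidate=69, cluster_size=3))  # two of three votes
print(decide_role(myid=69, candidate=69), decide_role(myid=56, candidate=69))
```

The same check also explains why a cluster with too few live nodes stays in LOOKING: no candidate can ever collect more than half of the votes.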
2.3 Ma Xiaoteng has a power outage
An election involves communication across the cluster as well as state management and state changes on each node, so it is a fairly complex process. What I showed above is only the simplest case, an election at startup. Here are a few more examples to help you understand the logic of the whole election.
Now suppose the offices have been serving smoothly for a while when Ma Xiaoteng's office suddenly loses power and can no longer communicate with the other two Mas. After going some time without any message from Ma Xiaoteng, the other two realize that something is wrong. By their own count, though, two offices can still serve, which is more than half of the cluster, so villagers can keep handling their business. The whole cluster now looks like this:
After a while, thanks to prompt repairs by the power company, Ma Xiaoteng's office is restored and ready to reopen. But every office is in the LOOKING state before it opens and votes for itself first, and it reads its local archives to find the latest data it holds. Suppose Ma Xiaoteng's state before the outage was this:
e:0 z:21 l: Ma Xiaoteng (49) LOOKING
He will send his own votes to the other two offices as before.
But unlike the previous situation, both Ma Guoguo and Ma Xiaoyun are now working. After receiving Ma Xiaoteng's vote, they will send him the current Leader, that is, Ma Guoguo's vote information as well as their current status.
Ballot information sent by Ma Guoguo:
e:0 z:30 l: Ma Guoguo (69) LEADING
Ballot information sent by Ma Xiaoyun:
e:0 z:30 l: Ma Guoguo (69) FOLLOWING
After receiving their ballot information, Ma Xiaoteng knows that the current Leader is Ma Guoguo, and Ma Guoguo himself confirms that he is in the LEADING state. So Ma Xiaoteng immediately changes his status to FOLLOWING and synchronizes data with the Leader as before. I will save the details of synchronization for a later article.
After synchronization, Ma Xiaoteng's state is the same as Ma Xiaoyun's.
Now let me imagine a parallel world. Suppose that when Ma Xiaoteng has just had his power restored and is about to come back online, his state is this:
e:1 z:7 l: Ma Xiaoteng (49) LOOKING
His epoch is larger than the current Leader's, so on paper he is more qualified to be Leader. But because the other offices in the cluster already have an established Leader, Ma Xiaoteng has to swallow his pride (who told you to lose power?) and still join the cluster as a Follower, synchronizing with the current Leader's data. You can also think of it as a downgrade (his epoch is downgraded back to 0).
The workplace is that cruel: take a long holiday and things may look different when you come back.
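The decision a returning LOOKING node makes can be sketched roughly like this. It is a simplification under stated assumptions: join_existing is a hypothetical function, and requiring both a majority of settled nodes and a confirmation from the Leader itself is my reading of the story rather than a copy of the real code. Note that the returning node's own epoch and zxid play no part in the decision.

```python
from typing import List, Optional, Tuple

Reply = Tuple[int, int, str]  # (sender myid, leader it names, sender state)


def join_existing(replies: List[Reply], cluster_size: int) -> Optional[int]:
    """Return the myid of an established Leader to follow, or None."""
    # Only nodes that are already out of the election count here.
    settled = {sid: leader for sid, leader, state in replies
               if state in ("LEADING", "FOLLOWING")}
    leading = {sid for sid, _, state in replies if state == "LEADING"}
    for leader in set(settled.values()):
        supporters = sum(1 for chosen in settled.values() if chosen == leader)
        # Assumed rule: more than half agree AND the Leader confirms it is LEADING.
        if supporters > cluster_size // 2 and leader in leading:
            return leader
    return None


# Ma Xiaoteng (49) comes back and hears from the two working offices.
replies = [(69, 69, "LEADING"), (56, 69, "FOLLOWING")]
print(join_existing(replies, cluster_size=3))
```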
2.4 Ma Guoguo falls ill again
Ma Guoguo is getting on in years, after all, and has fallen ill again, so his office has to close in tears. Unlike Ma Xiaoteng's power outage, this time it is the Leader that has stopped serving, and by the earlier rule the office cluster must always have a Leader. When Ma Xiaoyun and Ma Xiaoteng find that the Leader cannot be reached, which means the Leader can no longer serve, they know they must choose a new one. So they change their status to LOOKING, once again name themselves as candidate, and broadcast their ballots to the other offices that can still serve (in this scenario, to each other).
Whichever of them receives the other's ballot and compares, the result is that Ma Xiaoteng wins:
e:1 == e:1
z:77 < z:80
l: Ma Xiaoyun (56)   l: Ma Xiaoteng (49)
Ma Xiaoyun changes his candidate to Ma Xiaoteng and sends his ballot again. Ma Xiaoteng now has 2 votes, which is more than half of the office cluster, so Ma Xiaoteng and Ma Xiaoyun change their status to LEADING and FOLLOWING respectively. As described earlier, the epoch is increased by one and the counter part is cleared, and service to the villagers finally resumes.
Once Ma Guoguo has recovered, he reopens as in the previous example, starting from the LOOKING state. After learning from the other two Mas that the current Leader is Ma Xiaoteng, he synchronizes data with Ma Xiaoteng and joins the office cluster as a Follower.
2.5 Attracting investment
The village committee noticed how popular the offices were and thought: if just three offices can do this, what would more offices achieve? So, after talking it over with the three Mas, it decided to bring in private capital from outside and let investors set up new offices on the existing model. The village committee would not have to pay a penny, and the villagers would get real benefits.
For a while the move attracted a lot of attention from investors. After some discussion, though, the three Mas felt that letting in too many outside forces would surely weaken their own power, so they added another rule: the three Mas styled themselves Participants, and only the three of them were eligible to run for Leader, while the offices set up with outside capital could only join the office cluster as Observers providing read-only service, with no right to compete for Leader. This raises the read throughput of the whole office cluster without making elections more complex.
To declare the current node an Observer, first configure peerType=observer in zoo.cfg.
In addition, each Observer's entry in the declared cluster information must end with :observer, so that the other nodes know that the nodes with myid 1 and 2 are Observers.
server.69=maguoguo:2888:3888
server.56=maxiaoyun:2888:3888
server.49=maxiaoteng:2888:3888
server.1=dongdong:2888:3888:observer
server.2=jitaimei:2888:3888:observer
An Observer in the LOOKING state also picks itself as its initial Leader candidate, but its ballot information is set like this, for example:
e:Long.MIN_VALUE z:Long.MIN_VALUE l: Dongdong (1) LOOKING
Because the epoch is set to the minimum value, this ballot is as good as non-existent and can be ignored outright. The three Mas also keep a list of Participants, and any ballot from an office that is not a Participant is simply ignored. So an Observer's ballot has no effect on the election result. Finally, the Observer waits to be notified of the result of the election among the Participants, changes its own status to OBSERVING, and starts synchronizing data with the Leader, which is no different from what a Follower does. From here on, Observers and Followers are collectively called Learners.
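Here is a small Python sketch of how the Participant list could be derived from the server entries and used to drop Observer ballots. The parser handles only the entry form shown in this article, and parse_servers and participant_ballots are hypothetical helpers, not part of ZooKeeper.

```python
from typing import Dict, Iterable, List, Tuple


def parse_servers(lines: Iterable[str]) -> Dict[int, str]:
    """Map myid -> role from 'server.N=host:port:port[:observer]' lines."""
    roles: Dict[int, str] = {}
    for line in lines:
        key, _, value = line.strip().partition("=")
        if not key.startswith("server."):
            continue  # skip blank lines and unrelated settings
        myid = int(key[len("server."):])
        is_observer = value.split(":")[-1] == "observer"
        roles[myid] = "observer" if is_observer else "participant"
    return roles


def participant_ballots(ballots: List[Tuple[int, int]],
                        roles: Dict[int, str]) -> List[Tuple[int, int]]:
    """Keep only (sender, leader) ballots whose sender is a Participant."""
    return [(sid, leader) for sid, leader in ballots
            if roles.get(sid) == "participant"]


CONFIG = """
server.69=maguoguo:2888:3888
server.56=maxiaoyun:2888:3888
server.49=maxiaoteng:2888:3888
server.1=dongdong:2888:3888:observer
server.2=jitaimei:2888:3888:observer
""".splitlines()

roles = parse_servers(CONFIG)
ballots = [(69, 69), (56, 69), (1, 1), (2, 2)]
print(participant_ballots(ballots, roles))  # the Observer ballots are dropped
```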
2.6 Summary
Leader election looks at three fields: epoch, number of write requests, and myid. They are compared in that order, and whoever has the larger value is more qualified to be Leader.
The office chosen by more than half of the offices officially becomes Leader and changes its status to LEADING.
The other Participants change to FOLLOWING, and Observers change to OBSERVING.
If a Leader already exists in the cluster, an office that joins midway simply follows that Leader.
It should also be mentioned that if the nodes currently able to serve do not make up more than half of the cluster, the election will never produce a Leader: every node stays in the LOOKING state and the whole office cluster cannot provide services.
III. Programmer talk
Enough chatter; now let's use our own jargon and go a little deeper into some of the concepts.
First of all, the three Mas in the story are described as three characters for dramatic effect, but in practice ZK servers make no such distinction. They all run the same code and simply start with different configurations, which is how they end up in the Leader, Follower and Observer roles at runtime. So something closer to reality would be the Shadow Clone Jutsu in Naruto or the Afterimage Technique in Dragon Ball (it seems something strange got mixed in).
I drew a simple flowchart of the election:
I have already covered most of it, so here I will only add a note on the red part. Because of network problems, a node may not have received the ballots the other side sent out. The purpose of re-broadcasting its own ballot is to prompt the other side to send its ballots again.
Unlike the client port 2181, the servers in a cluster talk to each other over plain Sockets, without NIO or Netty. Since there are only a few server nodes and each one starts a thread to listen to the others, this relatively primitive, blocking style of communication is simpler and more direct. If the other side's service is unavailable, the Socket simply throws an error and exits.
Sending and receiving ballots also follows the producer-consumer model, which is very common in ZK. Two blocking queues are maintained, one for outgoing ballots and one for incoming ballots, and each is polled by its own child thread.
Earlier versions of ZK had three election strategies. The other two had long been deprecated and were not recommended, but they could still be forced on through the configuration file. In the latest version, 3.6.2, however, those two have been removed from the source code entirely, leaving a single election strategy, which corresponds to FastLeaderElection in the source. I have not studied the other two, so I will not go into them.
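The two-queue arrangement can be illustrated with Python's standard queue and threading modules. This is a toy sketch: the socket is replaced by a loopback so that it runs on its own, only the sending side gets a worker thread, and all names are made up for the example.

```python
import queue
import threading

send_queue: "queue.Queue" = queue.Queue()  # ballots waiting to go out
recv_queue: "queue.Queue" = queue.Queue()  # ballots that have arrived
STOP = object()  # sentinel that tells the worker thread to exit


def sender_loop() -> None:
    """Worker thread: take ballots off send_queue and 'deliver' them.

    A real server would write to a socket here; this sketch loops each
    ballot straight back into recv_queue instead.
    """
    while True:
        ballot = send_queue.get()  # blocks until something is queued
        if ballot is STOP:
            break
        recv_queue.put(ballot)


worker = threading.Thread(target=sender_loop, daemon=True)
worker.start()

# The election logic only touches the queues, never the transport.
send_queue.put({"sid": 69, "leader": 69})
send_queue.put({"sid": 56, "leader": 69})
send_queue.put(STOP)
worker.join()

received = [recv_queue.get_nowait() for _ in range(recv_queue.qsize())]
print(received)
```

Keeping the queues between the election logic and the transport means a slow or broken connection blocks only the worker thread, not the code that compares ballots.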
About heartbeat detection between servers:
Heartbeat detection (PING) between servers is initiated by the Leader and sent to all the other nodes in the cluster.
After receiving a PING, a Follower sends a PING back to the Leader and attaches its own client session data.
When the Leader receives a Follower's PING, it renews the sessions of those clients.
That is how the election mechanism of ZooKeeper works. The editor believes some of these points are things you may see or use in your daily work, and hopes you have learned something from this article.
© 2024 shulou.com SLNews company. All rights reserved.