II. Basic principles of zookeeper-- 04/28 Update SLTechnology News&Howtos

II. Basic principles of zookeeper--

2025-04-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

I. Overview

1. Basic overview

Zookeeper is an open source distributed Apache project that provides coordination services for distributed applications. Important components of Hadoop and Hbase. It is a software that provides consistent services for distributed applications.

2. What does zk provide

Although you can implement many functions with zk, in fact zk provides only three things: the file system, the notification mechanism, and the cluster management mechanism.

(1) File system

The structure of the data stored in zk, which is similar to a file system, is as follows:

Figure 1.1 zk file system

Each node is called znode, each znode is a structure similar to KV, the name of each node is equivalent to key, and the corresponding data is saved in each node, similar to the value corresponding to Key. There can be multiple child nodes under each znode, which goes on and on, forming an architecture similar to the Linux file system.

(2) Notification mechanism

When a client listens on a node (watch mechanism, described later), zk notifies the client listening to the node when the node changes (perhaps by adding more child nodes, or when the node value changes, etc.). What to do in the future depends on the processing logic of the client.

(3) Cluster management mechanism

Zk itself is a cluster structure with a leader node responsible for writing requests and multiple follower for responding to read requests. And when the leader node fails, the new leader will be automatically selected from the remaining follower according to the election mechanism.

3. Characteristics

1) Zookeeper: a cluster of leaders (leader) and multiple followers (follower). 2) Leader is responsible for initiating and deciding votes and updating the status of the system. 3) Follower is used to receive customer requests and return results to the client, and participate in voting during the election of Leader. 4) as long as more than half of the odd servers in the cluster survive, the Zookeeper cluster will be able to serve normally. 5) Global data consistency: each server keeps a copy of the same data, and the data is consistent no matter which server the client is connected to. 6) the update requests are carried out sequentially, and the update requests from the same client are executed in the order in which they are sent. 7) data update atomic surname, a data update is either successful or failed. 8) Real-time surname, within a certain time range, client can read the latest data.

Second, the concept and principle of zk 1. The role classification of nodes in zk leader: 1. Restore data; 2. Maintain the heartbeat with Learner, receive Learner request and judge the request message type of Learner; the main message types of 3.Learner are PING message, REQUEST message, ACK message and REVALIDATE message, which are processed differently according to different message types. PING message refers to the heartbeat information of Learner; REQUEST message is the proposal information sent by Follower, including write request and synchronous request; ACK message is the reply of Follower to the proposal, and if more than half of the Follower is approved, commit the proposal; REVALIDATE message is used to prolong the validity period of SESSION.

Follower: 1) send a request (PING message, REQUEST message, ACK message, REVALIDATE message) to Leader; 2) receive Leader message and process it; 3) receive a request from Client, if it is a write request, send it to Leader for processing; 4) return Client result.

Follower's message cycle processes the following messages from Leader: 1) PING message: heartbeat message 2) PROPOSAL message: proposal initiated by Leader, asking Follower to vote 3) COMMIT message: information on the latest proposal on the server side 4) UPTODATE message: indicating that synchronization is completed 5) REVALIDATE message: based on the REVALIDATE result of Leader, the session waiting for revalidate is closed or allowed to accept message 6) SYNC message: return the SYNC result to the client This message was originally initiated by the client to force the latest updates.

Observer: general term for learner:follower and observer that are similar to follower but do not participate in voting and elections

2. Types of nodes in zk

Transient node-ephemeral: after the client and server are disconnected, the created node is deleted by itself. And in the process of not being disconnected, this temporary node is visible to other clients. It can also be divided into ordinary short-lived nodes and short-lived nodes with sequence numbers. A brief node with a sequence number is called ephemeral_sequential, which means that a sequence number is added to the end of the node's name to indicate the order.

Persistent node-persistent: the created node is permanent, even if the client and server are disconnected. It is also divided into ordinary and serial numbers, the difference is similar to the above

3. The principle of zk election leader

First of all, let's talk about a few related concepts: zxid: every time you modify the data in the zk and the change in the state of the zk (such as an election, etc.), there will be a string of zxid numbers, which will increase gradually as the transaction increases. So small zxid transactions must be the first to happen.

Myid (sid): the server number of each zk node, as specified in the configuration file, must be globally unique.

Several states of zk: looking-- says it is searching for leader leading--. After the election of leader following--leader, it is synchronizing data between the new leader and follower. Observing:observer is accepting the election results.

The following is a formal talk about the election process: the election of leader is based on the paxos algorithm. Here, without going into the principle of the algorithm, we will simply talk about the election process. (1) the first round of voting: all living zk nodes will vote for themselves and broadcast the results to other nodes. Because we don't know about the other nodes at this time, there are two key messages on the ballot: the latest zxid and sid of the current node. The larger the zxid, the newer the data owned by the node, so the node with large zxid will be elected as leader first; if the zxid is the same, the larger sid,sid will be elected as leader. And because sid is globally unique, according to this principle, a unique leader must be selected. (2) after receiving the vote from other nodes, get the zxid and sid in the ballot paper, select the latest one according to the comparison principle of (1), then update the ballot paper, send the new ballot paper, and mark the ballot paper as "second round ballot". We should note that because the time at which each node receives ballots from other nodes is generally inconsistent, different nodes will have delays, which will lead to the emergence of ballots with different election rounds. So when the node finds that the number of votes it receives is larger than the number of votes it currently cast, it directly updates the current ballot and then votes it out. And each node will file ballots (the same round of ballots). If it is found that no node has more than half of the votes, the node will continue to cast the latest (zxix,sid) ballots. (3) after going through the (2) process for many times, if a node archives ballots and finds that there is more than half of the ballots, it will stop voting. If the node finds the result of the archive statistics and finds that leader is itself, it will broadcast to other nodes that I am leader. (4) when other nodes receive new leader messages, they will start data synchronization between leader and follower.

4. Node meta-information stat

When we use stat path or get path to view node meta-information, there will be a lot of information items, so what does each item represent?

1) czxid- causes the zxid created by the znode, and the zxid of the transaction that creates the node receives a timestamp in the form of zxid, that is, the ZooKeeper transaction ID, each time the ZooKeeper state is modified. Transaction ID is the total order of all modifications in ZooKeeper. Each modification has a unique zxid, and if zxid1 is less than zxid2, then zxid1 occurs before zxid2. 2) number of milliseconds that ctime-znode was created (since 1970) 3) mzxid-znode last updated zxid 4) number of milliseconds last modified by mtime-znode (since 1970) 5) pZxid-znode last updated child node zxid 6) cversion-znode child node change number, znode child node change number 7) dataversion-znode data change number 8) aclVersion-znode access control list change number 9) if ephemeralOwner- is a temporary node This is the session id of the znode owner. 0 if it is not a temporary node. 10) data length of dataLength- znode 11) number of numChildren-znode child nodes

5. Zk writing process

Figure 2.1 zk write data flow

Reading is a local surname, that is, client only needs to read data from the follower connected to it; when writing a request, follower forwards the request to leader,leader for broadcast and execution in the form of a transaction (transaction). What is the process? (1) leader will send a PROPOSAL proposal message to all follower. (2) A follower receives the PROPOSAL message, writes it to disk, and sends an ACK message to leader to inform it that it has been received. (3) when the Leader receives the ACK of the follower of the quorum (quorum), the commit message is sent to execute. Note: only after the commit is sent, the changes will be committed, otherwise it will be backed back; if the write timeout is found, the update operation will be reversed.

6. The principle of listener

1) first, there must be a main () thread 2) create a ZK client in the main thread, then two threads will be created, one responsible for network connection communication (connect) and one responsible for listening (listener) 3) send registered listening events to ZK 4) add registered listeners to the list of registered listeners in ZK. 5) ZK listens when data or path changes. This message is sent to the listener thread via the connect thread. 6) the process () method is called inside the Listener thread.

Under the command line: the client listens for changes in the child nodes of a node in a way similar to ls path watch. When the child nodes of the node change, the client will be notified and the following information will be displayed:

WATCHER::

WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/

WatchedEvent state:SyncConnected--- event status: synchronous updates

Type:NodeChildrenChanged--- type: the child node of the node changes

Path:/-- path: /

3. Application scenario 1. Unified naming service

Figure 3.1 zk Unified naming Service

The so-called naming service is to name resources for better positioning of resources. The file system structure of zk itself can create nodes with path names, which are used to store information such as server addresses. And the name of each node will not be repeated, strictly in accordance with the limitations of the file system. The distributed service framework DUBBO developed by Alibaba uses zookeeper as its naming service to maintain a global list of servers. When the server starts, a node representing itself is created under a certain path of ZK, and the value of the node stores the service address of the corresponding service (such as URL address).

2. Unified configuration management

Figure 3.2 zk Unified configuration Management

In the services of the environment in the cluster, the same program will run on multiple machines, making it easy to scale out. Programs generally require some configuration information, and it becomes difficult to change the configuration one by one if the programs are deployed on multiple machines. At the same time, with the popularity of various distributed systems and micro-services, how to effectively manage the configuration of different components is also an important problem. You can now create a node on zk with configuration information as the value of the node. Then the relevant applications listen to this node. When the data in the node changes, zk notifies the listeners, "something has changed." These listeners will automatically go to zk to get the changed configuration information, and then how to deal with it depends on the specific business logic. In fact, this is using the zk watcher mechanism, that is, publish / subscribe mechanism, the client can register watcher with the zookeeper server, subscribe to the node they are interested in, and when the corresponding node changes, the zookeeper server will publish a notification to the client. Similar scenarios can also be used in highly available services, such as two database servers, one master and one standby.

Figure 3.3 zk unified configuration management-database master / slave

Both master and slave data run a zk monitoring process client, which is used to monitor the available last name of the database service. When the service is available, the client of the master / slave database establishes a persistent session with zk and sends a heartbeat message to zk to indicate that it is running normally. When zk receives the heartbeat information, it creates a corresponding node on the node, and the node data is the address of the currently available master data (actually, it's not that simple, it's just simplified and easy to understand). If any of the database services are offline, the corresponding nodes on the zk will also be deleted.

3. Dynamic online and offline perception of service

Figure 3.4 dynamic online and offline awareness of zk services

For some complex distributed systems, there are many different components in the system, that is, there are many different services. There are also multiple servers for concurrent processing in the same service. So how do you know which of these services are available and which are not? You need to use zk. When the service comes online, it creates a zkclient, maintains a persistent session with the zk, and then creates a temporary znode under a specific directory of the zk (znode disappears as soon as the session is disconnected). Clients that need to access these services will listen in this directory (watcher mechanism) and listen through the api of getChildren (). When the nodes in the directory change, it means that those services are offline or online. At this point, zk will notify the client listening on (subscribing) the node to get the latest list of available services. In this way, you can dynamically go up and down the available services.

4. Unified cluster management

Zk can be used to store the status information of each node in other business clusters, especially in some master-slave clusters. When a failover occurs, the status information of each current node can be obtained from zk. Typically, for example, the election of the master node of HBase is achieved through zk coordination state. Each node in the cluster will create its own child znode under a corresponding znode to save its own state. Then each node listens under this znode, that is, to listen for changes in the state of each node. Normally, if a cluster does not fail, all node information stored in zk will not change. If there is a change in zk, it means that there is a failure in the cluster. Subsequent clusters can use the node state data in zk for further fault handling, such as electing a new master and so on.

5. Soft load balancing

Figure 3.5 zk soft load balancing

For each service node registered under the service list, the data under the corresponding znode stores the service address and the number of visits currently processed. When the client comes to obtain the service list, a service node with less visits can be selected to distribute the request. Achieve a certain role of load balancing.

6. Distributed lock

In the use of some resources, in order to prevent the reading and writing of different requests from interfering with each other, there is the concept of lock. For example, in the case of concurrency, buy a limited number of goods. In order to ensure that the quantity of goods is safe, we need to use the lock mechanism. Locks are divided into shared locks (also called read locks) and exclusive locks (also called write locks). When a shared lock is acquired, all transactions can read the data, but none of them can modify the data. When an exclusive lock is acquired, only the current transaction can read and modify the data, not other transactions. In distributed systems, distributed locks must be used to restrict some resources, which is more complex than traditional non-distributed locks. You can use zk to implement it, and there are many ways to implement it, one of which is described below.

First, the person who needs to acquire the lock creates a temporary node in a specific directory of the zk and indicates whether it is a read lock or a write lock. So, all the nodes in this directory are the people who want to acquire the lock. Then listen in this directory, there is a notice, it means that there is a new person to acquire the lock. Because there are many people who want to acquire locks, there must be an order, which is determined by the order of the nodes. Then, there is the problem of the order of nodes. If they are all read locks, then the order will not be affected; if they are all write locks, the order of acquiring write locks must be determined strictly according to the order of nodes; if the two kinds of locks are mixed, the order of nodes will affect the order of acquiring locks. For such a scenario, the implementation logic is as follows: (1) if you currently need to acquire a read lock, determine whether there is a write node in front of the node whose sequence number is smaller than its own. If so, you must wait for the write node to acquire the write lock before it is released; if not, it means that all the read locks in front of you are read locks, and you can acquire the read locks together without waiting. (2) if you currently need to acquire a write lock, you need to see if you are the node with the lowest sequence number. If not, it means that the previous write node and read node, no matter which one is, cannot allow yourself to acquire the write lock. Finally, because the nodes created are temporary nodes, as long as the session is disconnected, it will be deleted automatically, and the lock will be released.

7. Queue management

In distributed system, a very important component is message queue, which can realize application decoupling, asynchronous message and traffic reduction. First of all, the traditional user requests are sent directly to the server by the requestor, and the server returns the result to the requester after the processing is completed, which leads to a problem that the requester will block after sending the request. Nothing can be done before receiving a reply from the server, and if a message queue is introduced, the processing process will become that the requester sends the request to the message queue first. Then the server gets the message that needs to be processed from the message queue, and after the processing is completed, the server returns the result to the requestor. In this way, the requester can just send the request to the message queue and do something else. In this way, the asynchronous processing of the request is realized, and the coupling between the requester and the server is uncoupled. In a typical high-traffic scenario, when a large number of requests pour into the server, the service will be paralyzed, so there needs to be a flow restriction measure, but the request will not be lost. So you need a component to temporarily store the request and then forward it to the server for processing at a certain speed, which is the message queue. This practice is called traffic abatement. There is also a typical panic buying scenario in which a large number of panic buying requests pour into the server to prevent the server from crashing. You can use message queues and set the size of the queue to the number of panic purchases. If you exceed it, you will return a failure and you will not be able to grab anything. The successful ones will be temporarily stored in the queue for the server to deal with slowly. The zk implementation queue is as simple as creating nodes in a specific directory to store requests. The server then reads the requests in the node from this directory, processes them, and returns the results. Delete the node when the processing is complete.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.