Detailed explanation of Zookeeper (1): distribution and Zookeeper

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Distributed systems

The biggest problem facing applications in a distributed architecture is data consistency. Zookeeper is a widely used solution to it: it plays a coordinating role in a distributed framework.

What is Zookeeper?

Zookeeper is a high-performance distributed coordination service and distributed data consistency solution created at Yahoo!. It is an open source implementation of Google's Chubby. Chubby is a distributed lock service that GFS and Bigtable use to solve some of their distributed coordination problems, and its underlying consistency implementation is based on the Paxos algorithm.

Zookeeper provides the following distributed consistency guarantees: sequential consistency, atomicity, a single system image (a client sees the same data model no matter which ZK server it connects to), reliability, and timeliness (a client is guaranteed to read the latest data state within a bounded period of time; it is not guaranteed that every server is updated the instant a transaction is committed).

Zookeeper's data model is a tree of nodes. After the service starts, all data is loaded into memory to improve server throughput and reduce latency.
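To make the tree-shaped data model concrete, here is a minimal sketch of an in-memory tree addressed by slash-separated paths. The class `ZNodeTree` and its methods are hypothetical names for illustration only; this is not how Zookeeper stores data internally.

```python
class ZNodeTree:
    """Toy in-memory node tree keyed by path (illustration only)."""

    def __init__(self):
        # The root always exists; other nodes need an existing parent.
        self.data = {"/": b""}

    def create(self, path, value=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = value

    def get(self, path):
        return self.data[path]

    def children(self, path):
        # Direct children only: strip the parent prefix and reject deeper paths.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"v1")
print(tree.children("/"), tree.children("/app"))
```

Because everything lives in one in-process dictionary, reads never touch disk, which is the property the paragraph above credits for throughput and latency.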

Basic concepts of Zookeeper

Cluster Role:

Leader: The core role of the ZK cluster, chosen by election. It provides read and write services to clients, that is, it processes transaction requests.

Follower: Follows the state of the cluster. It takes part in the election; a server that is not elected Leader takes this role. It provides read services, that is, it processes non-transaction requests, and forwards any transaction requests it receives to the Leader server.

Observer: Does not take part in elections. Like a Follower, it provides read services, that is, it processes non-transaction requests, and forwards any transaction requests it receives to the Leader server.
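The division of labour between the three roles can be summarised as a small lookup table. This is only a sketch of the rules described above; the names `ROLES` and `route` are made up for the example.

```python
# Which roles vote in elections and which one applies transaction requests.
ROLES = {
    "leader":   {"votes": True,  "handles_writes": True},
    "follower": {"votes": True,  "handles_writes": False},
    "observer": {"votes": False, "handles_writes": False},
}


def route(role, is_transaction):
    """Return what a server with the given role does with a client request."""
    if is_transaction and not ROLES[role]["handles_writes"]:
        return "forward to leader"
    return "handle locally"


for role in ROLES:
    print(role, "| read:", route(role, False), "| write:", route(role, True))
```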

Session:

A session is the connection between a client and ZK. The client keeps a long-lived TCP connection to a ZK server, and the session is maintained over it. Through this connection the client sends heartbeats to keep the session alive, sends requests and receives responses, and receives WATCH events.

Data nodes:

The word "node" has two meanings:

A machine in the cluster is called a node.

A data unit in the data tree is also called a node, or Znode. Znodes are either persistent nodes or ephemeral (temporary) nodes.
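The practical difference between the two Znode kinds is lifetime: an ephemeral node is tied to the session that created it and disappears when that session ends, while a persistent node stays until it is deleted. The sketch below models only that rule; `SessionStore` and its methods are hypothetical names, not Zookeeper APIs.

```python
class SessionStore:
    """Toy store that tracks which session, if any, owns each node."""

    def __init__(self):
        # path -> owning session id, or None for a persistent node
        self.nodes = {}

    def create(self, path, session_id=None):
        self.nodes[path] = session_id

    def close_session(self, session_id):
        # Remove every ephemeral node owned by the closing session.
        expired = [p for p, owner in self.nodes.items() if owner == session_id]
        for p in expired:
            del self.nodes[p]
        return sorted(expired)


store = SessionStore()
store.create("/services")                        # persistent
store.create("/services/worker-1", session_id=7)  # ephemeral, owned by session 7
print(store.close_session(7), sorted(store.nodes))
```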

Version:

A version in ZK is different from a version in the usual sense. It counts the number of times a node's data, its list of child nodes, or its ACL has been modified. A data structure called Stat records the following version information:

version: version number of the data content of the current data node

cversion: version number of the child nodes of the current data node

aversion: ACL version number of the current data node

These versions can be used to implement distributed lock services.

Pessimistic locking: also known as pessimistic concurrency control or exclusive locking. It prevents data inconsistency by stopping different transactions from updating the same data concurrently.

Optimistic locking: assumes that different transactions accessing the same data rarely interfere with each other, so strict concurrency control is unnecessary; it is still a form of locking. For example, add a version number column to each database table. A transaction reads the data, and with it the version number, before modifying it, and then supplies that version number when it updates. If the version read was 1, the update is conditional on the version still being 1. If the update fails, another transaction has modified the data in the meantime, and the caller must handle that, for example by re-reading and retrying.
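The version check described above is a compare-and-set. The following sketch shows the read, conditional update, and retry cycle on a single in-memory node. `VersionedNode`, `BadVersion`, and `update_with_retry` are illustrative names chosen for this example, not part of any Zookeeper client.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedNode:
    def __init__(self, data):
        self.data = data
        self.version = 0

    def read(self):
        return self.data, self.version

    def set_data(self, data, expected_version):
        # Reject the write if someone else updated the node since we read it.
        if expected_version != self.version:
            raise BadVersion(f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def update_with_retry(node, transform, max_attempts=3):
    """Re-read and retry when the optimistic check fails."""
    for _ in range(max_attempts):
        data, version = node.read()
        try:
            return node.set_data(transform(data), version)
        except BadVersion:
            continue
    raise BadVersion("gave up after repeated conflicts")


node = VersionedNode(10)
_, stale = node.read()
node.set_data(11, stale)           # succeeds, version becomes 1
try:
    node.set_data(12, stale)       # fails: version 0 is now stale
except BadVersion as exc:
    print("conflict:", exc)
print(update_with_retry(node, lambda x: x + 1), node.data)
```

No lock is held between the read and the write; a conflict is detected only at update time, which is why this suits workloads where conflicts are rare.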

Watcher:

ZK allows clients to register watchers on a node. When the node changes, ZK sends a notification of the change to the clients that registered a watcher on it.
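A watcher is essentially a callback registered against a path. The sketch below assumes one-shot semantics (a watcher fires once and must be registered again), which is how Zookeeper's standard watches behave, and it passes only the path to the callback, not the new data. `WatchedStore` is a hypothetical class for illustration.

```python
from collections import defaultdict


class WatchedStore:
    """Toy key-value store with one-shot change notifications."""

    def __init__(self):
        self.data = {}
        self.watchers = defaultdict(list)

    def get(self, path, watcher=None):
        # Reading a node is also the moment a watcher can be registered on it.
        if watcher is not None:
            self.watchers[path].append(watcher)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Remove the watchers before calling them, so each fires at most once.
        for callback in self.watchers.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.get("/config", watcher=events.append)
store.set("/config", "v1")   # triggers the watcher
store.set("/config", "v2")   # no watcher left, so no notification
print(events)
```

On receiving the notification, a client would typically read the node again and re-register the watcher in the same call.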

ACL Permission Control:

Permission control can be set on nodes

CREATE: Permission to create child nodes

READ: Access to node data and child node lists

WRITE: Permission to update node data

DELETE: Permission to delete child nodes

ADMIN: Set permissions for node ACL
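The five permissions combine naturally as bit flags. The sketch below uses Python's `IntFlag`; the bit values follow the layout of the constants in Zookeeper's Java client (`ZooDefs.Perms`), but treat the exact numbers here as an assumption of the example. The helper `allowed` is a made-up name.

```python
from enum import IntFlag


class Perm(IntFlag):
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


def allowed(granted, required):
    """True only if every required permission bit is present in granted."""
    return (granted & required) == required


granted = Perm.READ | Perm.WRITE
print(allowed(granted, Perm.READ))               # read-only check
print(allowed(granted, Perm.DELETE))             # not granted
print(allowed(granted, Perm.READ | Perm.ADMIN))  # partially granted is not enough
```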

ZAB protocol

Zookeeper does not implement Paxos directly. Its core algorithm is ZAB (Zookeeper Atomic Broadcast), an atomic message broadcast protocol with crash recovery that was designed specifically for Zookeeper.

Zookeeper uses a single main process (the Leader) to receive and process all transaction requests from clients. It turns each resulting state change into a transaction proposal and broadcasts it to all replica processes.

The ZAB protocol consists of two basic modes, crash recovery and message broadcasting.

When the cluster is starting up, or when the Leader is disconnected, crashes, or restarts, ZAB enters recovery mode and elects a new Leader. Once a new Leader has been elected and more than half of the machines in the cluster have completed state synchronization (data synchronization) with it, ZAB exits recovery mode and enters message broadcast mode.

If a new server is added to the current cluster, the new server automatically enters recovery mode and enters message broadcast mode after synchronization with the cluster Leader is completed.

The Leader server receives a client transaction request and generates a transaction proposal and broadcasts it; if a non-Leader server receives a client request, it forwards the request to the Leader server.

Message broadcast:

Before broadcasting a transaction, the Leader server assigns it a globally unique, monotonically increasing ID, the transaction ID (ZXID). Transactions must be processed in ZXID order. The Leader keeps a separate queue for each Follower and puts the transactions to be broadcast into those queues. When a Follower receives a transaction, it writes it to local disk as a transaction log entry and, once the write succeeds, returns an ACK to the Leader. When the Leader has received ACKs from more than half of the servers, it broadcasts a Commit message to all Followers, telling them to commit the transaction, and commits the transaction itself at the same time.
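The propose, ACK, commit cycle can be sketched as a single-process simulation. Several simplifications are assumed here and are not taken from the article: the Leader counts its own vote towards the majority, the per-Follower queues and the network are replaced by direct method calls, and the ZXID is built as a 32-bit epoch followed by a 32-bit counter. The classes `Leader` and `Follower` are illustrative only.

```python
class Follower:
    def __init__(self, up=True):
        self.up = up
        self.log = []        # proposals written to the "transaction log"
        self.committed = []  # ZXIDs applied after a Commit message

    def propose(self, zxid, txn):
        if not self.up:
            return False     # no ACK from an unreachable follower
        self.log.append((zxid, txn))
        return True          # ACK

    def commit(self, zxid):
        if self.up:
            self.committed.append(zxid)


class Leader:
    def __init__(self, followers, epoch=1):
        self.followers = followers
        self.epoch = epoch
        self.counter = 0
        self.committed = []

    def next_zxid(self):
        # Assumed layout: high 32 bits = epoch, low 32 bits = counter.
        self.counter += 1
        return (self.epoch << 32) | self.counter

    def broadcast(self, txn):
        zxid = self.next_zxid()
        acks = 1 + sum(f.propose(zxid, txn) for f in self.followers)
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:   # strict majority required
            for f in self.followers:
                f.commit(zxid)
            self.committed.append(zxid)
            return zxid
        return None                    # proposal stays uncommitted


followers = [Follower(), Follower(), Follower(up=False), Follower(up=False)]
leader = Leader(followers)
print(hex(leader.broadcast("set /a=1")))   # 3 of 5 ACKs: committed
followers[1].up = False
print(leader.broadcast("set /a=2"))        # 2 of 5 ACKs: not committed
```

The strict "more than half" test is what lets a five-server cluster keep committing with two servers down but not with three.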

Crash Recovery:

The goal of crash recovery is to elect a new Leader as quickly as possible and make it known to the other servers, while keeping the data state consistent across the whole cluster.

The ZAB protocol discards transactions that were only proposed on the Leader server and never committed.
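One way to picture the discard rule is the sketch below. It assumes, beyond what the article states, that the new Leader is the surviving server whose log ends with the highest ZXID (ties broken by the higher server id), and that a returning old Leader drops any log entries the new Leader never saw. Both function names are hypothetical, and the real election and synchronization logic is considerably more involved.

```python
def elect_leader(last_zxids):
    """Pick the live server with the highest last ZXID; ties go to the higher id."""
    return max(last_zxids, key=lambda sid: (last_zxids[sid], sid))


def truncate_to_leader(log, leader_log):
    """Keep only the entries that also appear in the new leader's log."""
    known = set(leader_log)
    return [zxid for zxid in log if zxid in known]


# Server 0 was the leader and crashed after proposing ZXID 4 only to itself.
logs = {0: [1, 2, 3, 4], 1: [1, 2, 3], 2: [1, 2, 3], 3: [1, 2]}
survivors = {sid: log[-1] for sid, log in logs.items() if sid != 0}

new_leader = elect_leader(survivors)
print(new_leader)
print(truncate_to_leader(logs[0], logs[new_leader]))
```

In this example ZXID 4 never reached any other server, so it is dropped when server 0 rejoins, while ZXIDs 1 to 3 survive on the new Leader.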
