Today I will talk about how to understand Zookeeper-based distributed locks, a topic that many people find confusing. To help you understand it better, the editor has summarized the following. I hope you get something out of this article.
At present there are three popular ways to implement a distributed lock: based on a database, on Redis, or on Zookeeper. Plenty of material on the first two can be found online, so let's take a look at how a distributed lock is implemented with Zookeeper.
What is Zookeeper?
Zookeeper (commonly abbreviated as zk in the industry) is a centralized service that provides configuration management, distributed coordination, and naming. These are low-level, indispensable building blocks of distributed systems, yet achieving high throughput and low latency while maintaining consistency and availability is genuinely difficult. Zookeeper provides these capabilities so that developers can build their own distributed systems on top of it.
Although Zookeeper's implementation is complex, the model it exposes is very simple. Zookeeper provides a hierarchical node namespace (each node is called a znode). Every node is identified by a slash-separated path, and every node except the root has a parent, much like a file system. For example, /foo/doo is a znode whose parent is /foo, whose parent in turn is /, the root node, which has no parent. Unlike a file system, any node can hold associated data, whereas in a file system only files, not directories, can store data. To guarantee high throughput and low latency, Zookeeper keeps this tree-like structure in memory, which means Zookeeper cannot be used to store large amounts of data; each node can hold at most 1 MB.
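As a minimal, illustrative sketch of this model with the native ZooKeeper Java client (the connect string, session timeout, and paths are placeholders, not values from this article):

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeExample {
    public static void main(String[] args) throws Exception {
        // Connect and wait until the session is established.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {
            if (event.getState() == KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // A parent znode and a child znode, each holding a small payload (well under the 1 MB limit).
        zk.create("/foo", "parent-data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/foo/doo", "child-data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        System.out.println(new String(zk.getData("/foo/doo", false, null))); // prints "child-data"
        zk.close();
    }
}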
To guarantee high availability, zookeeper needs to be deployed as a cluster, so that as long as a majority of the machines in the cluster are up (it can tolerate a certain number of machine failures), zookeeper itself remains available. When using zookeeper, the client needs to know the list of cluster machines and uses the service by establishing a TCP connection to one machine in the cluster. The client uses this TCP connection to send requests, receive results, receive watch events, and send heartbeats. If the connection breaks abnormally, the client can connect to another machine.
The overall architecture is as follows:
A client's read request can be handled by any machine in the cluster, and if the read request registers a watch on a node, that watch is also handled by the machine the client is connected to. A write request, however, is forwarded to the other zookeeper machines and must reach agreement among them before the request returns success. Therefore, as the number of machines in the cluster grows, read throughput goes up while write throughput goes down.
Ordering is a very important property of zookeeper. All updates are globally ordered, and each update carries a unique identifier called a zxid (Zookeeper Transaction Id). Reads are ordered only relative to updates; that is, the result of a read request carries the latest zxid seen by the zookeeper server that handled it.
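For illustration (assuming the connected zk handle from the sketch above, and that /foo exists), the zxids that order these updates can be seen on any znode's Stat:

// Every znode's Stat records the zxids of the transactions that touched it.
org.apache.zookeeper.data.Stat stat = zk.exists("/foo", false);
System.out.println("created by zxid:       " + stat.getCzxid()); // zxid of the create
System.out.println("last modified by zxid: " + stat.getMzxid()); // zxid of the most recent update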
How to use zookeeper to implement distributed locks?
Before describing the algorithm flow, take a look at a few interesting properties about nodes in zookeeper:
Sequential (ordered) nodes: if there is a parent node /lock, we can create child nodes under it. Zookeeper offers an optional sequential feature: for example, if we create child nodes named "/lock/node-" and request ordering, zookeeper automatically appends an increasing integer sequence number based on the current number of child nodes. The first child created becomes /lock/node-0000000000, the next /lock/node-0000000001, and so on.
Ephemeral (temporary) nodes: a client can create an ephemeral node, which zookeeper deletes automatically when the session ends or times out.
Event watches: when reading data, we can set a watch on the node at the same time, and zookeeper notifies the client when the node's data or structure changes. Zookeeper supports four kinds of events: 1) node created; 2) node deleted; 3) node data changed; 4) child nodes changed. (A short sketch of these primitives follows this list.)
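As a quick sketch of these primitives with the native client (reusing a connected zk handle like the one in the earlier example, and assuming /lock already exists):

// Ephemeral + sequential: zookeeper appends the sequence number and deletes the node
// automatically when the session that created it ends or times out.
String myNode = zk.create("/lock/node-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
System.out.println("created " + myNode); // e.g. /lock/node-0000000003

// Read the child list and register a watch in the same call; the watch fires once
// when the set of children under /lock changes (a child is added or removed).
zk.getChildren("/lock", event -> {
    if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
        System.out.println("children of /lock changed");
    }
});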
The following describes the algorithm flow for implementing a distributed lock with zookeeper, assuming the root node of the lock space is /lock:
The client connects to zookeeper and creates an ephemeral, sequential child node under /lock; the first client gets /lock/lock-0000000000, the second /lock/lock-0000000001, and so on.
The client fetches the list of child nodes under /lock and checks whether the child node it created has the lowest sequence number in the list. If so, it considers that it holds the lock; otherwise it watches for child-node change events on /lock and, after being notified, repeats this step until it obtains the lock.
Execute the business code.
After the business logic completes, delete the corresponding child node to release the lock.
The ephemeral node created in step 1 guarantees that the lock can still be released after a failure. Consider this scenario: suppose client a's child node currently has the lowest sequence number, but the client machine crashes after acquiring the lock and never deletes its child node. If a persistent node had been created, the lock would never be released and we would have a deadlock. Because an ephemeral node was created, once the client goes down and zookeeper stops receiving its heartbeats for a certain period, zookeeper judges the session invalid, deletes the ephemeral node, and releases the lock.
In addition, careful readers may wonder about the atomicity of fetching the child list and setting the watch in step 2. Consider this scenario: client a owns child node /lock/lock-0000000000 and client b owns /lock/lock-0000000001. When client b fetches the child list it finds it is not the lowest-numbered node, but before it can set the watch, client a finishes its business logic and deletes /lock/lock-0000000000. Won't the watch set by client b miss this event and wait forever? This problem does not exist, because in the API provided by zookeeper the read operation and the watch registration are performed atomically: the watch is set in the same call that reads the child list, which guarantees that no event is lost.
Finally, there is a significant optimization to this algorithm. If 1000 clients are currently waiting for the lock and the client holding the lock releases it, all 1000 clients are woken up; this is known as the "herd effect". Zookeeper then has to notify 1000 clients, which blocks other operations, when ideally only the client owning the new smallest node should be woken. What to do? When registering the watch, each client should watch only the child node immediately before its own. For example, if the child list is /lock/lock-0000000000, /lock/lock-0000000001, /lock/lock-0000000002, the client with sequence number 1 watches for the deletion of the node with sequence number 0, and the client with sequence number 2 watches for the deletion of the node with sequence number 1.
So the adjusted distributed-lock algorithm flow is as follows:
The client connects to zookeeper and creates an ephemeral, sequential child node under /lock; the first client gets /lock/lock-0000000000, the second /lock/lock-0000000001, and so on.
The client fetches the list of child nodes under /lock and checks whether the child node it created has the lowest sequence number in the list. If so, it considers that it holds the lock; otherwise it watches for the deletion of the child node immediately before its own and, after being notified, repeats this step until it obtains the lock. (A sketch of this flow with the native client appears after this list.)
Execute the business code.
After the business logic completes, delete the corresponding child node to release the lock.
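Under the assumption that /lock already exists, the following is a rough sketch of this flow using the native ZooKeeper Java client. It is illustrative only: the class name SimpleZkLock is invented for this example, and production code would also need to handle session expiry, interrupts, and retries (or simply use Curator, discussed below).

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkLock {
    private static final String LOCK_ROOT = "/lock"; // assumed to already exist
    private final ZooKeeper zk;
    private String ourPath; // full path of the node we created, e.g. /lock/lock-0000000003

    public SimpleZkLock(ZooKeeper zk) {
        this.zk = zk;
    }

    // Step 1: create an ephemeral, sequential child node under /lock, then loop until we hold the lock.
    public void acquire() throws KeeperException, InterruptedException {
        ourPath = zk.create(LOCK_ROOT + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            // Step 2: fetch the child list and sort it by sequence number.
            List<String> children = zk.getChildren(LOCK_ROOT, false);
            Collections.sort(children);
            int ourIndex = children.indexOf(ourPath.substring(LOCK_ROOT.length() + 1));
            if (ourIndex == 0) {
                return; // lowest sequence number: we hold the lock
            }
            // Otherwise watch only the node immediately before ours, to avoid the herd effect.
            String previous = LOCK_ROOT + "/" + children.get(ourIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            try {
                // getData (rather than exists) is used so that no watch is left behind if the node is
                // already gone; Curator makes the same choice, as discussed later in this article.
                zk.getData(previous, event -> {
                    if (event.getType() == EventType.NodeDeleted) {
                        latch.countDown();
                    }
                }, null);
                latch.await(); // block until the previous node is deleted, then re-check
            } catch (KeeperException.NoNodeException e) {
                // the previous node is already gone; loop and re-check immediately
            }
        }
    }

    // Steps 3-4: the caller runs its business code, then releases the lock by deleting our node.
    public void release() throws KeeperException, InterruptedException {
        zk.delete(ourPath, -1);
    }
}

Usage would be roughly: acquire(), run the business code, then call release() in a finally block.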
Source Code Analysis of Curator
Although the API exposed by the native zookeeper client is very concise, implementing a distributed lock by hand is still troublesome. Instead, we can directly use the distributed-lock implementation provided by the open-source project Curator.
We just need to introduce the following dependency (based on maven):
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>4.0.0</version>
</dependency>
And then you can use it! The code is as follows:
public static void main(String[] args) throws Exception {
    // create a zookeeper client
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.newClient(
            "10.21.41.181:2181,10.21.42.47:2181", retryPolicy);
    client.start();

    // create a distributed lock; the root node of the lock space is /curator/lock
    InterProcessMutex mutex = new InterProcessMutex(client, "/curator/lock");
    mutex.acquire();

    // lock acquired, proceed with the business logic
    System.out.println("Enter mutex");

    // business logic done, release the lock
    mutex.release();

    // close the client
    client.close();
}
You can see that the only key operations are mutex.acquire() and mutex.release(). How convenient!
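In practice you would usually guard against waiting forever and make sure the lock is always released. A small variation of the example above, using the timed acquire(long, TimeUnit) overload that InterProcessMutex also provides (the 3-second timeout is an arbitrary illustrative choice; client is the CuratorFramework from the example, and java.util.concurrent.TimeUnit is imported):

InterProcessMutex mutex = new InterProcessMutex(client, "/curator/lock");
if (mutex.acquire(3, TimeUnit.SECONDS)) {      // wait at most 3 seconds for the lock
    try {
        System.out.println("Enter mutex");     // business logic goes here
    } finally {
        mutex.release();                       // always release, even if the business logic throws
    }
} else {
    System.out.println("could not acquire the lock within 3 seconds");
}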
Let's analyze the source code of the lock implementation. The acquire method is as follows:
/*
 * Acquire the lock; when the lock is held by someone else this call blocks and waits. The operation is
 * reentrant for the same thread (i.e. the lock can be acquired repeatedly), and the number of acquire
 * calls must match the number of release calls.
 *
 * @throws Exception ZK errors, connection interruptions
 */
@Override
public void acquire() throws Exception {
    if ( !internalLock(-1, null) ) {
        throw new IOException("Lost connection while trying to acquire lock: " + basePath);
    }
}
Note that when communication with zookeeper fails, acquire throws the exception directly, leaving retries to the caller's own retry strategy. The call internalLock(-1, null) means "block forever while the lock is held by someone else". The code of internalLock is as follows:
private boolean internalLock(long time, TimeUnit unit) throws Exception {
    // Handle reentrancy for the same thread: if this thread already holds the lock,
    // just increment the acquire count in the bookkeeping structure and return success.
    Thread currentThread = Thread.currentThread();
    LockData lockData = threadData.get(currentThread);
    if ( lockData != null ) {
        lockData.lockCount.incrementAndGet();
        return true;
    }

    // This is where the lock is actually acquired from zookeeper.
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    if ( lockPath != null ) {
        // Lock acquired: record which thread holds it; on reentry only the count in LockData is incremented.
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }

    // Blocking returned without the lock; in this context that implies a zookeeper communication problem.
    return false;
}
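In other words, a thread that already holds the mutex can acquire it again without another round trip to zookeeper, as long as every acquire is matched by a release. A small sketch, reusing the mutex from the earlier example:

mutex.acquire();         // first acquire: creates the lock node in zookeeper
mutex.acquire();         // same thread re-enters: only the in-memory lockCount is incremented
try {
    // business logic
} finally {
    mutex.release();     // lockCount drops back to 1, the lock is still held
    mutex.release();     // lockCount reaches 0, the znode is deleted and the lock is released
}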
Detailed comments have been added to the code, so we will not expand on it further. Now take a look at the concrete implementation of acquiring the lock from zookeeper:
String attemptLock(long time, TimeUnit unit, byte[] lockNodeBytes) throws Exception {
    // parameter initialization, omitted here
    // ...

    // spin to acquire the lock
    while ( !isDone ) {
        isDone = true;
        try {
            // create the ephemeral, sequential child node under the lock space
            ourPath = driver.createsTheLock(client, path, localLockNodeBytes);
            // check whether we got the lock (our child node has the lowest sequence number);
            // return immediately if so, otherwise block until the previous child node is deleted
            hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath);
        } catch ( KeeperException.NoNodeException e ) {
            // The code guarantees that NoNodeException is only thrown here when the session expires,
            // so retry according to the retry policy.
            if ( client.getZookeeperClient().getRetryPolicy().allowRetry(retryCount++,
                    System.currentTimeMillis() - startMillis, RetryLoop.getDefaultRetrySleeper()) ) {
                isDone = false;
            } else {
                throw e;
            }
        }
    }

    // return the path of our child node if we got the lock
    if ( hasTheLock ) {
        return ourPath;
    }
    return null;
}
There are two main steps in the above code:
driver.createsTheLock: creates the ephemeral, sequential child node. The implementation is relatively straightforward and not expanded here; the main thing to note is the set of node modes: 1) PERSISTENT (permanent); 2) PERSISTENT_SEQUENTIAL (permanent and ordered); 3) EPHEMERAL (temporary); 4) EPHEMERAL_SEQUENTIAL (temporary and ordered).
internalLockLoop: blocks and waits until the lock is acquired.
Take a look at how internalLockLoop decides whether the lock is held and how it blocks and waits. Some irrelevant code has been removed here, leaving only the main flow:
// spin until the lock is acquired
while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock ) {
    // get the list of all child nodes, sorted by sequence number from smallest to largest
    List<String> children = getSortedChildren();
    // check, by sequence number, whether our child node is the smallest one
    String sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash
    PredicateResults predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
    if ( predicateResults.getsTheLock() ) {
        // our node is the smallest, so the lock is considered acquired
        haveTheLock = true;
    } else {
        // otherwise find the child node immediately before ours
        String previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();
        // The object monitor is used for thread synchronization here: while the lock is not held, a watch
        // is set on the deletion of the previous child node and wait() is called; when that node is deleted
        // (i.e. the lock is released), the watcher callback wakes this thread via notifyAll() and the loop
        // spins again to check whether the lock is now ours.
        synchronized(this) {
            try {
                // getData() is used instead of checkExists(): if the previous child node has already been
                // deleted, getData() throws an exception and no watch is set. checkExists() would also report
                // whether the node exists, but it would leave behind a watch that can never fire, which
                // amounts to a resource leak on the zookeeper side.
                client.getData().usingWatcher(watcher).forPath(previousSequencePath);
                // if a maximum blocking time was set
                if ( millisToWait != null ) {
                    millisToWait -= (System.currentTimeMillis() - startMillis);
                    startMillis = System.currentTimeMillis();
                    if ( millisToWait <= 0 ) {
                        doDelete = true; // timed out: mark our node for deletion and stop waiting
                        break;
                    }
                    wait(millisToWait);
                } else {
                    wait();
                }
            } catch ( KeeperException.NoNodeException e ) {
                // the previous node was already gone when getData() ran; loop and re-check
            }
        }
    }
}