ZooKeeper or Redis: How to Choose and Implement a Distributed Lock

2025-09-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to choose between ZooKeeper and Redis when implementing a distributed lock, and how each implementation works.

Text

The conclusion first: ZooKeeper is considerably more reliable than Redis as a lock service, but it is less efficient. If concurrency is not especially high and reliability is the priority, choose ZooKeeper. If efficiency is the priority, choose Redis.

Why use distributed locks?

The purpose of a distributed lock is to ensure that only one client operates on a shared resource at a time. Martin Kleppmann points out, however, that locks fall into two categories according to their purpose:

(1) Multiple clients are allowed to operate on the shared resource. In this case the operation must be idempotent: no matter how many times it runs, the result is the same. The lock is used only to avoid repeating work on the shared resource, which improves efficiency.

(2) Only one client is allowed to operate on the shared resource. In this case the operation is generally not idempotent, and if several clients operate on the resource at once, the data may become inconsistent or be lost.

Round one: single-node comparison

(1) Redis

Let's start with locking. According to the documentation on the official Redis website, the lock is acquired with the following command:

SET resource_name my_random_value NX PX 30000

my_random_value is a random string generated by the client. It identifies the client that holds the lock.

NX means the SET succeeds only if the key resource_name does not already exist, so only the first client to make the request acquires the lock.

PX 30000 means the lock expires automatically after 30 seconds (30000 milliseconds).
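A minimal Python sketch of the acquisition step. It assumes a client object that exposes a redis-py style set(name, value, nx=..., px=...) method. The InMemoryRedis stand-in and the helper name acquire_lock are hypothetical and exist only so that the example runs without a server.

```python
import secrets
import time


class InMemoryRedis:
    """Hypothetical stand-in that mimics only SET ... NX PX and GET."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry as a monotonic timestamp)

    def _purge(self, name):
        entry = self._data.get(name)
        if entry is not None and entry[1] <= time.monotonic():
            del self._data[name]

    def set(self, name, value, nx=False, px=None):
        self._purge(name)
        if nx and name in self._data:
            return None  # key exists, so SET NX does nothing
        expires_at = time.monotonic() + px / 1000 if px is not None else float("inf")
        self._data[name] = (value, expires_at)
        return True

    def get(self, name):
        self._purge(name)
        entry = self._data.get(name)
        return entry[0] if entry else None


def acquire_lock(client, resource, ttl_ms=30000):
    """Return the random token on success, or None if the lock is taken."""
    token = secrets.token_hex(16)  # plays the role of my_random_value
    if client.set(resource, token, nx=True, px=ttl_ms):
        return token
    return None


client = InMemoryRedis()
first = acquire_lock(client, "resource_name")   # first request wins
second = acquire_lock(client, "resource_name")  # key already exists
```

The caller has to keep the returned token, because it is needed later to prove ownership when releasing the lock.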

As for unlocking: to prevent the lock acquired by client 1 from being released by client 2, release the lock with the following Lua script:

if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

When this Lua script is executed, KEYS[1] is resource_name and ARGV[1] is my_random_value. The script first reads the value stored for the lock and deletes the key only if that value equals the one passed in by the client, which prevents a client's lock from being released by someone else. Running the logic as a Lua script also makes it atomic. If the check and the delete were not atomic, the following could happen: client 1 reads the value and finds that it matches, the lock then expires and client 2 acquires it, and client 1's delete removes the lock that now belongs to client 2.
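A small Python sketch of why the check and the delete must form one atomic step. LockStore is a hypothetical in-memory model, not a Redis client, and the expiry of client 1's lock is simulated by overwriting the key by hand.

```python
class LockStore:
    """Hypothetical in-memory model: one value per key, no real expiry."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0

    def compare_and_delete(self, key, expected):
        """Models the Lua script: the check and the delete run as one step."""
        if self.data.get(key) == expected:
            return self.delete(key)
        return 0


store = LockStore()
store.data["resource_name"] = "token-client-1"

# Non-atomic release by client 1: GET and DEL are two separate round trips.
still_mine = store.get("resource_name") == "token-client-1"  # check passes
# ... client 1's lock expires here and client 2 acquires it ...
store.data["resource_name"] = "token-client-2"
if still_mine:
    store.delete("resource_name")  # removes client 2's lock by mistake
lost_by_client_2 = store.get("resource_name") is None

# Atomic release under the same interleaving: the check happens at delete time.
store.data["resource_name"] = "token-client-2"
deleted = store.compare_and_delete("resource_name", "token-client-1")
```

In the second half the stale token no longer matches, so nothing is deleted and client 2 keeps its lock.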

Analysis: this Redis lock and unlock mechanism looks perfect, but it has an unavoidable weakness: how to choose the expiration time. If the client blocks for a long time while operating on the shared resource and the lock expires in the meantime, further access to the shared resource is no longer safe. Some people will object:

After the client finishes operating on the shared resource, it can check whether it still holds the lock. If it does, it commits the change and releases the lock. If it does not, it discards the change.

This only reduces the probability that several clients operate on the shared resource at once. It does not solve the problem. To make this easier to follow, here is a business scenario.

Business scenario: we have a content modification page. To prevent several client requests from modifying the same page at once, we use a distributed lock, and only the client that holds the lock may modify the page. The normal process of modifying a page is shown in the following figure.

Note that going from step (3) to step (4.1) in the figure is not an atomic operation. Step (3) may report that the lock is still valid, but because of transmission delay or other reasons the lock may have expired by the time step (4.1) runs. At that point another client can acquire the lock, and two clients end up operating on the shared resource together.

Whatever compensation you add, you can only reduce the probability that several clients operate on the shared resource at once. You cannot rule it out. For example, a long GC pause may occur during step (4.1) and the lock may time out during the pause, so that another client acquires it.
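The gap between checking the lock and acting on the result can be replayed with a simulated clock. This is only a sketch: FakeClock and ExpiringLock are hypothetical helpers, and the 2-second pause stands in for any delay (GC, network) between step (3) and step (4.1).

```python
class FakeClock:
    """Hypothetical clock that only moves when the test advances it."""

    def __init__(self):
        self.now_ms = 0

    def advance(self, ms):
        self.now_ms += ms


class ExpiringLock:
    """Hypothetical lock with an expiration time, like SET NX PX."""

    def __init__(self, clock):
        self.clock = clock
        self.owner = None
        self.expires_at = 0

    def acquire(self, client, ttl_ms):
        if self.owner is not None and self.clock.now_ms < self.expires_at:
            return False
        self.owner, self.expires_at = client, self.clock.now_ms + ttl_ms
        return True

    def held_by(self, client):
        return self.owner == client and self.clock.now_ms < self.expires_at


clock = FakeClock()
lock = ExpiringLock(clock)
writes = []  # who modified the shared page, in order

client_1_got_lock = lock.acquire("client-1", ttl_ms=30000)
clock.advance(29000)
check_passed = lock.held_by("client-1")  # step (3): lock still looks valid

clock.advance(2000)  # pause or delay; the lock expires in the meantime
client_2_got_lock = lock.acquire("client-2", ttl_ms=30000)
if client_2_got_lock:
    writes.append("client-2")

if check_passed:  # step (4.1): client 1 acts on a stale check result
    writes.append("client-1")
```

Both clients end up writing, even though client 1 verified its lock immediately before the pause.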

(2) ZooKeeper

First, the principle in brief. According to documentation found online, ZooKeeper's distributed lock relies on the behavior of ephemeral nodes (EPHEMERAL).

When a znode is declared EPHEMERAL and the client that created it crashes, the znode is deleted automatically. This avoids having to choose an expiration time.

Each client tries to create a znode, for example /lock. The first client succeeds, which is equivalent to acquiring the lock. The other clients fail to create it (the znode already exists) and therefore fail to acquire the lock.
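The create-or-fail behavior can be sketched in Python with a toy model. TinyZk, NodeExists and try_lock are hypothetical names for illustration only; this is not a ZooKeeper client and it ignores networking entirely.

```python
class NodeExists(Exception):
    """Raised when the path is already taken (hypothetical name)."""


class TinyZk:
    """Hypothetical in-memory model of ephemeral znodes."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that owns the znode

    def create_ephemeral(self, path, session_id):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = session_id

    def close_session(self, session_id):
        # Ephemeral znodes disappear together with their session.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session_id}


def try_lock(zk, session_id, path="/lock"):
    """Creating the znode is acquiring the lock; a failure means it is taken."""
    try:
        zk.create_ephemeral(path, session_id)
        return True
    except NodeExists:
        return False


zk = TinyZk()
results = [try_lock(zk, "session-1"), try_lock(zk, "session-2")]
zk.close_session("session-1")  # client 1 crashes or disconnects
after_crash = try_lock(zk, "session-2")
```

No expiration time appears anywhere: the lock is freed only when its owning session ends.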

Analysis: although this avoids having to choose an expiration time, several clients may still operate on the shared resource at once. If ZooKeeper receives no heartbeat from a client for longer than the session timeout, it considers the session expired and automatically deletes every ephemeral znode that the session created. The following situation can then arise.

As the figure shows, when client 1 goes through a GC pause, ZooKeeper cannot detect its heartbeat, so several clients may again operate on the shared resource at the same time. You could of course tune the JVM to avoid GC pauses. Note, however, that such measures only make concurrent access less likely. They cannot eliminate it.
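A Python sketch of the session-expiry case. SessionTracker is a hypothetical model of the server side, and the 10-second session timeout and 15-second pause are made-up numbers chosen for the example.

```python
class SessionTracker:
    """Hypothetical model of server-side session expiry driven by heartbeats."""

    def __init__(self, session_timeout_ms):
        self.timeout = session_timeout_ms
        self.last_heartbeat = {}  # session id -> time of last heartbeat (ms)
        self.lock_owner = None    # session holding the ephemeral /lock znode

    def heartbeat(self, session, now_ms):
        self.last_heartbeat[session] = now_ms

    def expire_sessions(self, now_ms):
        for session, seen in list(self.last_heartbeat.items()):
            if now_ms - seen > self.timeout:
                del self.last_heartbeat[session]
                if self.lock_owner == session:
                    self.lock_owner = None  # the ephemeral znode is deleted

    def try_lock(self, session, now_ms):
        self.expire_sessions(now_ms)
        if self.lock_owner is None and session in self.last_heartbeat:
            self.lock_owner = session
        return self.lock_owner == session


server = SessionTracker(session_timeout_ms=10000)
server.heartbeat("client-1", now_ms=0)
server.heartbeat("client-2", now_ms=0)
client_1_locked = server.try_lock("client-1", now_ms=0)

# Client 1 enters a 15 s GC pause and sends no heartbeats; client 2 keeps going.
server.heartbeat("client-2", now_ms=8000)
server.heartbeat("client-2", now_ms=15000)
client_2_locked = server.try_lock("client-2", now_ms=15000)

# When client 1 wakes up, its local view still says it holds the lock.
client_1_local_view = client_1_locked
```

After the pause the server has handed the lock to client 2, while client 1's local state has not changed, which is exactly the overlap described above.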

Round two: cluster comparison

In production we usually run clusters, so the single-node case discussed in round one was only a warm-up.

(1) Redis

For high availability, it is common to attach a slave to each Redis master and use Sentinel to fail over between master and slave. However, Redis master-slave replication is asynchronous, so the master may go down during data synchronization and a slave that has not yet received the latest data may be promoted to master, which loses data. The specific process is as follows:

(1) Client 1 acquires the lock from the master.

(2) The master goes down before the key that stores the lock has been synchronized to the slave.

(3) The slave is promoted to master.

(4) Client 2 acquires the lock for the same resource from the new master.
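The four steps above can be replayed in a few lines of Python. RedisNode is a hypothetical model in which a write is acknowledged to the client before it reaches the replica, which is the property of asynchronous replication that matters here.

```python
class RedisNode:
    """Hypothetical node model: a dict plus a queue of unreplicated writes."""

    def __init__(self):
        self.data = {}
        self.pending = []  # acknowledged writes that the replica has not seen

    def set_nx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        self.pending.append((key, value))
        return True

    def replicate_to(self, replica):
        for key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()


master, slave = RedisNode(), RedisNode()

got_1 = master.set_nx("resource_name", "token-client-1")  # (1) client 1 locks
master = None        # (2) master dies before replicate_to(slave) ever runs
new_master = slave   # (3) the slave is promoted
got_2 = new_master.set_nx("resource_name", "token-client-2")  # (4) client 2 locks
```

Both calls succeed, so two clients hold the lock for the same resource.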

To deal with this situation, antirez, the author of Redis, proposed the RedLock algorithm. The steps are as follows (the process comes from the official documentation), assuming that we have N master nodes (the official documentation sets N to 5, though N >= 3 is enough in practice).

(1) Get the current time in milliseconds.

(2) Request the lock on the N nodes in turn, using the same key and the same random value. In this step the client uses a timeout for each request that is much smaller than the automatic release time of the lock. For example, if the automatic release time is 10 seconds, the timeout for each node might be in the range of 5-50 milliseconds. This prevents the client from blocking for too long on a master that is down. If a master is unavailable, the client should try the next one as soon as possible.

(3) The client calculates how long step (2) took. The lock counts as acquired only if the client obtained it on a majority of the master nodes (3 in this case) and the total time spent does not exceed the automatic release time.

(4) If the lock is acquired, its remaining validity time is the initial automatic release time minus the time spent acquiring it.

(5) If the lock is not acquired, whether because fewer than a majority of nodes were locked or because the total elapsed time exceeded the automatic release time, the client releases the lock on every master node, including those on which it believes it did not succeed.
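The five steps can be sketched in Python against in-memory nodes. This is an illustration under simplifying assumptions, not a production RedLock client: Node and redlock_acquire are hypothetical names, an unreachable node fails immediately instead of after a 5-50 ms timeout, and clock drift is ignored.

```python
import secrets
import time


class Node:
    """Hypothetical in-memory master; up=False simulates an unreachable node."""

    def __init__(self, up=True):
        self.up = up
        self.locks = {}  # key -> (token, expiry as a monotonic timestamp)

    def set_nx_px(self, key, token, ttl_ms):
        if not self.up:
            raise ConnectionError("node unreachable")
        current = self.locks.get(key)
        if current and current[1] > time.monotonic():
            return False
        self.locks[key] = (token, time.monotonic() + ttl_ms / 1000)
        return True

    def release(self, key, token):
        if self.up and self.locks.get(key, (None,))[0] == token:
            del self.locks[key]


def redlock_acquire(nodes, key, ttl_ms):
    token = secrets.token_hex(16)
    start = time.monotonic()                        # step (1)
    acquired = 0
    for node in nodes:                              # step (2)
        try:
            if node.set_nx_px(key, token, ttl_ms):
                acquired += 1
        except ConnectionError:
            continue                                # try the next node at once
    elapsed_ms = (time.monotonic() - start) * 1000  # step (3)
    validity_ms = ttl_ms - elapsed_ms               # step (4)
    if acquired >= len(nodes) // 2 + 1 and validity_ms > 0:
        return token, validity_ms
    for node in nodes:                              # step (5)
        node.release(key, token)
    return None


nodes = [Node(), Node(), Node(), Node(up=False), Node(up=False)]
first = redlock_acquire(nodes, "resource_name", ttl_ms=10000)
second = redlock_acquire(nodes, "resource_name", ttl_ms=10000)
```

With three of five nodes reachable, the first caller reaches a majority and gets a token plus the remaining validity time. The second caller finds the key taken on every reachable node and gets None.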

Analysis: on closer inspection, the RedLock algorithm still has the following problems.

If a node crashes and restarts, multiple clients can hold the lock.

Suppose there are five Redis nodes: A, B, C, D, E. Imagine the following sequence of events:

(1) Client 1 locks A, B and C and therefore acquires the lock (D and E are not locked).

(2) Node C crashes and restarts, but the lock that client 1 set on C was not persisted and is lost.

(3) After node C restarts, client 2 locks C, D and E and also acquires the lock. Client 1 and client 2 now hold the lock for the same resource at the same time.
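A compact Python replay of this sequence, with a control run in which C keeps its state. The helper names lock_majority and scenario are hypothetical, and each node is reduced to a single slot that records which client holds the lock on it.

```python
def lock_majority(nodes, names, client):
    """Lock the named nodes that are free; succeed only with a majority."""
    free = [n for n in names if nodes[n] is None]
    if len(free) < len(nodes) // 2 + 1:
        return False
    for n in free:
        nodes[n] = client
    return True


def scenario(c_loses_state):
    # Five nodes; each maps to the client currently holding the lock on it.
    nodes = {name: None for name in "ABCDE"}
    ok_1 = lock_majority(nodes, "ABC", "client-1")  # (1)
    if c_loses_state:
        nodes["C"] = None  # (2) C restarts without the persisted lock
    ok_2 = lock_majority(nodes, "CDE", "client-2")  # (3)
    return ok_1, ok_2


with_restart = scenario(c_loses_state=True)
without_restart = scenario(c_loses_state=False)
```

Only the run in which C forgets its lock lets client 2 reach a majority.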

To deal with lock failures caused by node restarts, antirez proposed delayed restarts: after a node crashes, it is not restarted immediately but only after a waiting period longer than the lock validity time. Every lock the node took part in will then have expired before the restart, so the restart cannot affect existing locks. This, too, is a manual compensation measure that reduces the probability of inconsistency.

Clock jump problem

Suppose there are five Redis nodes: A, B, C, D, E. Imagine the following sequence of events:

(1) Client 1 acquires the lock on Redis nodes A, B and C (a majority). Communication with D and E fails because of network problems.

(2) The clock on node C jumps forward, so the lock held on it expires early.

(3) Client 2 acquires the lock for the same resource on Redis nodes C, D and E (a majority).

(4) Both client 1 and client 2 now believe they hold the lock.
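The same kind of replay works for the clock jump, this time with a per-node clock. ClockedNode and acquire are hypothetical helpers, and the 60-second jump and 10-second lock lifetime are made-up values.

```python
class ClockedNode:
    """Hypothetical node whose lock expiry depends on its own local clock."""

    def __init__(self):
        self.clock_ms = 0
        self.owner = None
        self.expires_at = 0

    def lock_is_live(self):
        return self.owner is not None and self.clock_ms < self.expires_at

    def try_lock(self, client, ttl_ms):
        if self.lock_is_live():
            return False
        self.owner, self.expires_at = client, self.clock_ms + ttl_ms
        return True


def acquire(nodes, reachable, client, ttl_ms=10000):
    """Succeed when a majority of all nodes granted the lock."""
    won = sum(nodes[name].try_lock(client, ttl_ms) for name in reachable)
    return won >= len(nodes) // 2 + 1


nodes = {name: ClockedNode() for name in "ABCDE"}

client_1_ok = acquire(nodes, "ABC", "client-1")  # D and E are unreachable
nodes["C"].clock_ms += 60000                     # C's clock jumps forward
client_2_ok = acquire(nodes, "CDE", "client-2")  # C's lock looks expired

# On A, whose clock did not move, client 1's lock is still live.
a_still_client_1 = nodes["A"].owner == "client-1" and nodes["A"].lock_is_live()
```

Both acquisitions report success while client 1's lock is still live on A, so the two clients overlap.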

To deal with lock failures caused by clock jumps, antirez proposed forbidding manual changes to the system time and using an ntpd program that adjusts the system clock without "jumping". This is again a manual compensation measure that reduces the probability of inconsistency.

Timeout leads to lock failure

The RedLock algorithm does not solve the problem of the lock expiring because the operation on the shared resource takes too long. Recall the flow of the RedLock algorithm, shown in the following figure.

The figure is divided into two parts. For the steps in the upper box, RedLock copes with delays of any cause: the client never ends up with a lock that it believes is valid but that has in fact expired. For the steps in the lower box, however, a delay that lets the lock expire makes it possible for client 2 to acquire the lock. The RedLock algorithm therefore does not solve this problem.

(2) ZooKeeper

In a cluster deployment, the number of ZooKeeper nodes is generally odd and at least 3. Let's first recall how ZooKeeper writes data.

The accompanying figure was copied from another article rather than drawn for this one.

The steps of the write flow are as follows:

1. The client sends a write request to a follower.

2. The follower forwards the request to the leader.

3. On receiving it, the leader initiates a vote and asks the followers to vote.

4. The followers send their votes to the leader. The proposal passes as soon as more than half of them return an ACK.

5. After tallying the results, the leader, if the write is to go ahead, performs the write, notifies the followers of it, and then commits.

6. The follower returns the result of the request to the client.
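The "more than half" rule in step 4 is simple arithmetic, sketched here in Python. The function name leader_write is hypothetical, and the sketch assumes the leader's own acknowledgement counts toward the majority of the whole ensemble.

```python
def leader_write(follower_acks, ensemble_size):
    """Commit only when more than half of the ensemble acknowledges.

    follower_acks holds one boolean per follower: True if it returned an ACK.
    Assumption of this sketch: the leader counts as one acknowledgement.
    """
    acks = 1 + sum(follower_acks)
    return acks > ensemble_size // 2


# Three-node ensemble: the leader plus two followers.
committed_with_one_follower = leader_write([True, False], ensemble_size=3)
committed_with_none = leader_write([False, False], ensemble_size=3)

# Five-node ensemble: two followers down, a majority is still reachable.
committed_five = leader_write([True, True, False, False], ensemble_size=5)
```

A write that cannot gather a majority is never committed, which is why a client either gets the lock consistently or does not get it at all.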

In addition, ZooKeeper serializes all operations globally.

OK, now for the analysis.

Cluster synchronization

The client writes data to a follower, but the follower is down. Will the data become inconsistent? No. In this case the client fails to create the node and does not get the lock at all. The client writes data to a follower, the follower forwards the request to the leader, and the leader goes down. Will the data become inconsistent? No. In this case ZooKeeper elects a new leader and continues with the write flow described above.

In short, with ZooKeeper as the distributed lock, either you do not get the lock at all or, once you have it, the node data is guaranteed to be consistent. There is no data loss from asynchronous replication as there is with Redis.

Clock jump problem

ZooKeeper does not depend on a global time, so how could this problem arise?

Timeout leads to lock failure

ZooKeeper does not depend on a validity time, so how could this problem arise?

Round three: comparison of other lock features

(1) Redis has much better read and write performance than ZooKeeper. If ZooKeeper is used as a distributed lock under high concurrency, lock acquisition may fail and performance becomes a bottleneck.

(2) ZooKeeper can implement read-write locks; Redis cannot.

(3) ZooKeeper has a watch mechanism. When a client tries to create a znode and finds that it already exists, the creation fails and the client waits. When the znode is deleted, ZooKeeper notifies the client through the watch, and the client can then retry the creation (that is, acquire the lock). This lets the client use the distributed lock like a local lock: a failed acquisition blocks until the lock is obtained. Redis cannot provide this mechanism.
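The blocking behavior that a watch makes possible can be sketched in Python with threads. WatchedLock is a hypothetical in-memory model in which a threading.Event plays the role of the watch notification. It is not a ZooKeeper client.

```python
import threading


class WatchedLock:
    """Hypothetical in-memory model of a znode lock with delete watches."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owner = None
        self._watchers = []

    def _try_create(self, client):
        with self._guard:
            if self._owner is None:
                self._owner = client
                return None
            watch = threading.Event()  # register a watch on the existing znode
            self._watchers.append(watch)
            return watch

    def acquire(self, client):
        while True:
            watch = self._try_create(client)
            if watch is None:
                return    # znode created: lock acquired
            watch.wait()  # block until the znode is deleted, then retry

    def release(self, client):
        with self._guard:
            if self._owner != client:
                return
            self._owner = None
            watchers, self._watchers = self._watchers, []
        for watch in watchers:
            watch.set()   # notify every waiting client


lock = WatchedLock()
order = []


def worker(name):
    lock.acquire(name)
    order.append(name)
    lock.release(name)


lock.acquire("client-1")
waiter = threading.Thread(target=worker, args=("client-2",))
waiter.start()
order.append("client-1")
lock.release("client-1")
waiter.join(timeout=5)
```

Client 2 simply calls acquire and is woken when client 1 releases, with no polling loop on the caller's side.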
