
How to implement distributed locks with Redis

Updated 2025-07-13. From: SLTechnology News&Howtos


This article introduces how to implement distributed locks with Redis: the commands involved, the locking logic, and the problems that arise in practice.

Redis Command Introduction

Implementing a distributed lock with Redis relies on four commands, introduced below.

SETNX command (SET if Not eXists)

Syntax:

SETNX key value

Function:

If and only if the key does not exist, set the value of the key to value and return 1; if the given key already exists, SETNX does nothing and returns 0.

GETSET command

Syntax:

GETSET key value

Function:

Sets the value of the given key to value and returns the key's old value. Returns an error if the key exists but does not hold a string, and nil if the key did not exist.

GET command

Syntax:

GET key

Function:

Returns the string value associated with key, or nil if key does not exist.

DEL command

Syntax:

DEL key [key …]

Function:

Deletes one or more of the given keys. Keys that do not exist are ignored.

Quality matters more than quantity: our distributed lock relies on just these four commands. The implementation, however, has many details that need careful thought, because with many concurrent processes in a distributed system a mistake at any point can cause a deadlock and hang the processes waiting on the lock.
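The return-value rules of the four commands can be modelled in a few lines. The following is a minimal in-memory sketch in Python, not a Redis client: the class name `MiniRedis` and its methods are hypothetical stand-ins that mimic only the replies described above, for string values.

```python
class MiniRedis:
    """Hypothetical in-memory stand-in mimicking four Redis commands (strings only)."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        # SETNX: set only if the key is absent; reply 1 on success, 0 otherwise.
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def getset(self, key, value):
        # GETSET: store the new value and return the old one (None models nil).
        old = self._data.get(key)
        self._data[key] = value
        return old

    def get(self, key):
        # GET: return the stored value, or None (nil) if the key does not exist.
        return self._data.get(key)

    def delete(self, *keys):
        # DEL: remove the given keys, ignore missing ones, reply with the count removed.
        removed = 0
        for key in keys:
            if key in self._data:
                del self._data[key]
                removed += 1
        return removed


r = MiniRedis()
first = r.setnx("foo.lock", "100")    # key absent, so the set succeeds
second = r.setnx("foo.lock", "200")   # key present, so nothing changes
old = r.getset("foo.lock", "300")     # swaps the value and returns the previous one
removed = r.delete("foo.lock", "missing.key")
print(first, second, old, removed, r.get("foo.lock"))
```

The point of the model is that SETNX and GETSET each combine a check and a write in a single command, which is what the locking logic in the following sections depends on.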

Locking implementation

SETNX can implement a lock directly. For example, to lock the keyword foo, a client can try:

SETNX foo.lock <value>

If it returns 1, the client has acquired the lock and can proceed. When the operation is complete, it releases the lock with the command

DEL foo.lock

If it returns 0, foo is already locked by another client. For a non-blocking lock, the caller can return immediately. For a blocking lock, the client enters a retry loop until it acquires the lock or the retry timeout elapses. The ideal is beautiful, but reality is harsh: locking with SETNX alone is subject to race conditions and can deadlock in certain situations.
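The blocking and non-blocking variants described above can be sketched as follows. This is only an illustration under stated assumptions: `MiniRedis` is a hypothetical in-memory stand-in for a Redis connection, and the function names, the `retry_timeout` and `sleep` parameters, and their default values are invented for the example.

```python
import time


class MiniRedis:
    """Hypothetical in-memory stand-in for the two commands used here."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def acquire(r, key, blocking=True, retry_timeout=0.05, sleep=0.01):
    """Try SETNX; when blocking, retry until success or retry_timeout elapses."""
    deadline = time.monotonic() + retry_timeout
    while True:
        if r.setnx(key, "1") == 1:
            return True
        if not blocking or time.monotonic() >= deadline:
            return False
        time.sleep(sleep)  # pause between attempts to reduce load on the server


def release(r, key):
    r.delete(key)


r = MiniRedis()
got_first = acquire(r, "foo.lock")                   # lock is free
got_second = acquire(r, "foo.lock", blocking=False)  # held: return at once
got_third = acquire(r, "foo.lock")                   # held: retries, then gives up
release(r, "foo.lock")
got_fourth = acquire(r, "foo.lock")                  # free again after DEL
print(got_first, got_second, got_third, got_fourth)
```

Note that nothing in this version ever expires the lock. If the holder never calls `release`, every later `acquire` fails, which is the deadlock the next section deals with.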

Deadlock handling

With the approach above, if the client holding the lock takes too long, is killed, or crashes for some other reason, the lock is never released and the result is a deadlock. The lock therefore needs an expiry check. When locking, we store the current timestamp as the lock's value. A client compares the current timestamp with the one stored in Redis, and if the difference exceeds a threshold, it treats the lock as expired, so the lock cannot be held indefinitely. Under high concurrency, however, several clients may detect the expired lock at the same time. If each simply deletes the dead lock and then locks again with SETNX, a race condition arises in which multiple clients acquire the lock at once.

C1 acquires the lock and crashes. C2 and C3 call SETNX on foo.lock, which returns 0, then read the timestamp in foo.lock and find by comparison that the lock has timed out.

C2 sends DEL to foo.lock.

C2 sends SETNX to foo.lock and acquires the lock.

C3 sends DEL to foo.lock. This DEL actually removes C2's lock.

C3 sends SETNX to foo.lock and acquires the lock.
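The interleaving in these steps can be replayed deterministically. The sketch below is an illustration only: `MiniRedis` is a hypothetical in-memory stand-in, timestamps are plain integers rather than strings, and `EXPIRE`, the helper name, and the clock values are assumptions made for the example.

```python
class MiniRedis:
    """Hypothetical in-memory stand-in for the commands used in this replay."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


EXPIRE = 10  # assumed lock lifetime, in the same units as the timestamps


def sees_expired_lock(r, key, now):
    """SETNX fails and the stored timestamp is older than EXPIRE."""
    if r.setnx(key, now) == 1:
        return False
    return now - r.get(key) > EXPIRE


r = MiniRedis()
r.setnx("foo.lock", 100)  # C1 locks at time 100, then crashes

now = 120
c2_expired = sees_expired_lock(r, "foo.lock", now)  # C2 judges the lock dead
c3_expired = sees_expired_lock(r, "foo.lock", now)  # so does C3

r.delete("foo.lock")                        # C2: DEL
c2_locked = r.setnx("foo.lock", now) == 1   # C2: SETNX succeeds
r.delete("foo.lock")                        # C3: DEL removes C2's lock
c3_locked = r.setnx("foo.lock", now) == 1   # C3: SETNX succeeds as well
print(c2_locked, c3_locked)
```

Both flags end up true because the expiry check, the DEL, and the SETNX are three separate commands, so another client's commands can run between them.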

At this point both C2 and C3 hold the lock, which is a race condition, and with higher concurrency even more clients may acquire it. DEL therefore cannot be used directly when a lock times out. Fortunately there is GETSET. Suppose there is another client, C4, and see how GETSET avoids the problem.

C1 acquires the lock and crashes. Like C2 and C3, C4 calls SETNX, which returns 0, then calls GET to obtain the timestamp T1 stored in foo.lock and finds by comparison that the lock has timed out.

C4 sends the GETSET command to foo.lock,

GETSET foo.lock <current timestamp>

and receives the old timestamp T2 stored in foo.lock.

If T1 = T2, C4 has acquired the lock.

If T1 != T2, another client C5 called GETSET before C4 and acquired the lock, so C4 has not. C4 can only sleep and enter the next loop iteration.

The only remaining question is whether C4's writing a new timestamp to foo.lock affects the lock. In practice C4 and C5 execute within a very short time of each other, and both write valid timestamps to foo.lock, so the lock is not affected.
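The C4 and C5 case can be traced step by step. As before, this is a sketch under assumptions: `MiniRedis` is a hypothetical in-memory stand-in, timestamps are integers, and `EXPIRE` and the clock values 120 and 121 are invented for the illustration.

```python
class MiniRedis:
    """Hypothetical in-memory stand-in for the commands used in this trace."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old


EXPIRE = 10  # assumed lock lifetime

r = MiniRedis()
r.setnx("foo.lock", 100)  # C1 locks at time 100, then crashes

# Both clients read the same stale timestamp T1 and judge it expired.
t1_c4 = r.get("foo.lock")
t1_c5 = r.get("foo.lock")
both_expired = (121 - t1_c4 > EXPIRE) and (120 - t1_c5 > EXPIRE)

# C5 happens to reach GETSET first, C4 immediately afterwards.
t2_c5 = r.getset("foo.lock", 120)  # returns the stale value 100
t2_c4 = r.getset("foo.lock", 121)  # returns 120, the value C5 just wrote

c5_locked = t2_c5 == t1_c5  # T1 == T2: C5 owns the lock
c4_locked = t2_c4 == t1_c4  # T1 != T2: C4 must sleep and retry
final_value = r.get("foo.lock")
print(c5_locked, c4_locked, final_value)
```

Only one client sees its GETSET return the value it read earlier, so only one wins. The loser's write leaves a slightly later but still valid timestamp in foo.lock, which matches the observation above that it does not affect the lock.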

To make the lock more robust, the client holding it should call GET again when it reaches critical business logic and compare the returned value with the timestamp T0 that it wrote, so that it notices if the lock has been unexpectedly released by a DEL for some other reason. The steps and scenarios above are easy to find in other references. Client-side handling and failure modes can be very complex. Besides crashing, a client may be blocked for a long time by some operation and then try to execute DEL even though the lock is by then in the hands of another client. Improper handling can also cause deadlocks, and an unreasonable sleep setting can let Redis be overwhelmed under high concurrency. The most common questions are the following.

What logic should be followed when GET returns nil?

The first approach: follow the timeout logic

C1 acquires the lock and, after finishing its work, deletes it with DEL. Just before that DEL, C2 tries to set timestamp T0 on foo.lock with SETNX, finds that another client holds the lock, and proceeds to the GET step.

C2 sends GET to foo.lock and receives the return value T1 (nil).

C2 evaluates T0 > T1 + expire and enters the GETSET step.

C2 calls GETSET to write timestamp T0 to foo.lock and receives the previous value T2.

If T2 = T1, C2 has acquired the lock. If T2 != T1, it has not.

The second approach: loop back to the SETNX logic

C2 sends GET to foo.lock and receives the return value T1 (nil).

C2 loops and enters the next SETNX attempt.

Both approaches appear to work, but the first is logically flawed. When GET returns nil, the lock was deleted rather than timed out, so locking should go through SETNX. In the first approach, a lock that has just been released is taken through GETSET instead of the normal SETNX path, and if the comparison condition is not handled properly this causes a deadlock. I ran into exactly this in practice. See the next question for details.
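The second approach amounts to a three-way decision on the GET reply. A minimal sketch, assuming millisecond timestamps stored as strings; the function name, the returned labels, and `EXPIRE_MS` are hypothetical and chosen only for this illustration.

```python
EXPIRE_MS = 10_000  # assumed lock lifetime in milliseconds


def after_failed_setnx(t1, now_ms):
    """Decide the next step from the GET reply t1 (None models nil)."""
    if t1 is None:
        # The lock was released, not timed out: go back to the SETNX path.
        return "retry_setnx"
    if now_ms - int(t1) > EXPIRE_MS:
        # The stored timestamp is stale: attempt the GETSET takeover.
        return "try_getset"
    # The lock is held and still valid: sleep, then loop.
    return "sleep_and_retry"


print(after_failed_setnx(None, 50_000))
print(after_failed_setnx("30000", 50_000))
print(after_failed_setnx("45000", 50_000))
```

Handling nil as its own branch keeps a freshly released lock out of the timeout path, which is the flaw identified in the first approach.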

What should I do when GETSET returns nil?

C1 and C2 both call GET, and C1 receives T1. C3, on a faster network, quickly acquires the lock and then releases it with DEL. C2 then receives T2 (nil). Both C1 and C2 enter the timeout-handling logic.

C1 sends GETSET to foo.lock and receives the return value T11 (nil).

C1 compares T1 with T11, finds them different, and concludes that it has not acquired the lock.

C2 sends GETSET to foo.lock and receives the return value T22 (the timestamp written by C1).

C2 compares T2 with T22, finds them different, and concludes that it has not acquired the lock.

At this point both C1 and C2 believe they have not acquired the lock. In fact C1 has acquired it, but its logic does not handle GETSET returning nil and simply compares the GET and GETSET values. Why does this happen? First, with multiple clients each connected to Redis, a client's commands are not executed back to back. To a single client its commands look consecutive, but on the Redis server many commands from other clients, such as DEL and SETNX, may be interleaved between any two of them. Second, the clocks of different clients are not synchronized, or not strictly synchronized.
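One way to handle this case is to treat a nil reply from GETSET as a successful acquisition. The sketch below replays the scenario with a hypothetical in-memory stand-in (`MiniRedis`), integer timestamps, and an invented helper `won_lock`. It shows one possible fix under those assumptions, not necessarily the one the author adopted.

```python
class MiniRedis:
    """Hypothetical in-memory stand-in for the commands used in this replay."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def won_lock(t1, t_old):
    """t1 is the earlier GET reply, t_old the GETSET reply (None models nil)."""
    if t_old is None:
        # The key was deleted in between, so our GETSET created the lock.
        return True
    return t_old == t1


r = MiniRedis()
r.setnx("foo.lock", 100)

t1 = r.get("foo.lock")   # C1 reads T1 = 100
r.delete("foo.lock")     # C3 has taken and released the lock; only its DEL matters here
t2 = r.get("foo.lock")   # C2 reads T2 = nil

t11 = r.getset("foo.lock", 120)  # C1: GETSET returns nil
t22 = r.getset("foo.lock", 121)  # C2: GETSET returns C1's timestamp

naive = (t11 == t1, t22 == t2)                  # plain comparison: nobody wins
fixed = (won_lock(t1, t11), won_lock(t2, t22))  # nil-aware: C1 wins, C2 does not
print(naive, fixed)
```

With the plain comparison both clients walk away and the key stays set until it times out. With the nil-aware check exactly one of them proceeds.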

Timestamp problem

Because the value of foo.lock is a timestamp, the lock is only reliable across multiple clients if the servers' clocks are synchronized. If the clocks differ, clients will disagree about when the lock has timed out, which leads to race conditions.

Lock expiry depends strictly on the timestamp, and the timestamp itself has limited precision. If the precision is one second, the whole sequence of locking, executing, and unlocking usually completes within a single second, so the cases described above can occur easily. It is therefore best to raise the precision to milliseconds, which keeps the lock safe at millisecond granularity.
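A small worked example shows how coarse timestamps can distort the expiry check. The numbers, the `is_expired` helper, and the 1.5 second lifetime are assumptions chosen for illustration. `now_ms` shows one way to take a millisecond timestamp in Python.

```python
import time


def now_ms():
    # Wall-clock time in whole milliseconds.
    return int(time.time() * 1000)


def is_expired(stored, now, expire):
    return now - stored > expire


EXPIRE_MS = 1_500  # assumed lock lifetime: 1.5 seconds

locked_ms = 10_900    # lock written 10.9 s into some reference clock
checked_ms = 12_000   # checked 1.1 s later, so really still valid

# Millisecond precision: 12000 - 10900 = 1100, which is not above 1500.
expired_ms = is_expired(locked_ms, checked_ms, EXPIRE_MS)

# Second precision truncates to 10 and 12: 12 - 10 = 2, which is above 1.5.
expired_sec = is_expired(locked_ms // 1000, checked_ms // 1000, EXPIRE_MS / 1000)

print(expired_ms, expired_sec)
```

With second precision the check declares a live lock expired after only 1.1 seconds, which would let a second client take it over. The millisecond comparison gives the correct answer.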

The Problem of Distributed Locks

1: A timeout mechanism is necessary. If the client holding the lock crashes, the lock must expire. Otherwise no other client can acquire it, which is a deadlock.

2: With a distributed lock, timestamps across clients cannot be guaranteed to be strictly consistent, so under certain conditions lock holders may overlap. The mechanism should be reasonably lenient and able to tolerate such low-probability events.

3: Lock only the critical processing step. A good habit is to prepare the required resources first, for example connect to the database, then acquire the lock, do the work directly, and release the lock, so the lock is held for as short a time as possible.

4: Should the lock be checked while it is held? If the logic depends strictly on the lock's state, it is best to check the lock at the critical step. In our tests under high concurrency, however, each check costs a few milliseconds, while our entire lock-holding logic takes less than 10 milliseconds.

5: Sleeping is an art. To reduce pressure on Redis, clients must sleep between loop iterations when trying to acquire the lock, but choosing the sleep duration takes some study. It should be calculated from your own Redis QPS, the lock holding time, and so on.

6: As for why Redis MULTI, EXPIRE, WATCH and similar mechanisms are not used, consult a reference for the reasons.
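The check described in point 4, and the earlier advice to compare the stored value with the timestamp T0 the client wrote, can be sketched as a guarded release. `MiniRedis` is a hypothetical in-memory stand-in, and `still_held` and `release` are names invented for this illustration.

```python
class MiniRedis:
    """Hypothetical in-memory stand-in for the commands used here."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def still_held(r, key, t0):
    """The lock is ours only while the stored value equals the T0 we wrote."""
    return r.get(key) == t0


def release(r, key, t0):
    """Delete the lock only if it still carries our timestamp."""
    # GET followed by DEL is two commands, so this check narrows but does
    # not close the window in which another client can take over the lock.
    if still_held(r, key, t0):
        r.delete(key)
        return True
    return False


r = MiniRedis()
r.setnx("foo.lock", 100)    # C1 locks with T0 = 100, then stalls
r.getset("foo.lock", 120)   # C2 takes over after the timeout

c1_released = release(r, "foo.lock", 100)  # C1 wakes up: the lock is no longer its own
value_after_c1 = r.get("foo.lock")
c2_released = release(r, "foo.lock", 120)  # C2 releases its own lock
print(c1_released, value_after_c1, c2_released, r.get("foo.lock"))
```

The extra GET is the cost mentioned in point 4: one more round trip per check, which has to be weighed against how short the lock-holding logic is.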

Lock test data

Without sleep

First, the lock is retried without sleeping. For a single request, the lock, execute, and unlock times were measured.

Locking and unlocking are both fast. We then stress-test this method with ab at a concurrency of 100 and 1000 requests in total:

ab -n1000 -c100 'http://sandbox6.wanke.etao.com/test/test_sequence.php?tbpm=t'

We'll see that the lock acquisition time becomes 0, the lock holding time becomes 0, and the lock deletion time is approximately 10 ms. Why is that?

1: After acquiring the lock, our execution logic calls Redis again, and under high concurrency Redis operations are noticeably slower.

2: The lock deletion time grows from 0.2 ms to 9.8 ms, a slowdown of nearly 50 times.

In this case our QPS is 49. It turns out that QPS depends on the total number of requests in the stress test: with 100 concurrent requests, QPS exceeds 110.

With sleep

For a single request:

Performance is comparable with and without the sleep mechanism. Under the same stress-test conditions:

The lock acquisition time is significantly longer and the lock release time significantly shorter, only about half of what it was without sleep. The execution time is also longer, because we create a new database connection during execution. We can also compare the command load on Redis.

In the figure above, the tall, thin part is the load without the sleep mechanism and the short, wide part is the load with it. The load drops by about 50%. The drawback of sleeping is a clear reduction in QPS: under our test conditions it is only 35, and some requests time out. Weighing everything, we still decided to adopt the sleep mechanism, mainly to prevent Redis from being overwhelmed under high concurrency. That is a very bad outcome which we have run into before, so we will definitely use sleep.
