Q: one application server, one database. Operation: query the user's current balance, then deduct 3% of it as a handling fee.
A: a synchronized lock or a database lock will do.
Q: two application servers, one database. Same operation: query the user's current balance, then deduct 3% of it as a handling fee.
A: a distributed lock.
What kind of distributed locks do we need?
1. It guarantees that, in a distributed application cluster, the same method can be executed by only one thread on one machine at a time.
2. The lock is reentrant (to avoid deadlock).
3. The lock is preferably a blocking lock (decide according to business needs).
4. The lock is preferably a fair lock (decide according to business needs).
5. Acquiring and releasing the lock is highly available.
6. Acquiring and releasing the lock performs well.
1. Distributed lock based on database
Distributed Lock based on Table
CREATE TABLE `methodLock` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `method_name` varchar(64) NOT NULL DEFAULT '' COMMENT 'name of the locked method',
  `desc` varchar(1024) NOT NULL DEFAULT '' COMMENT 'remarks',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'record update time, generated automatically',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uidx_method_name` (`method_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='locked methods';
When we want to lock a method, execute the following SQL:
INSERT INTO methodLock (method_name, `desc`) VALUES ('method_name', 'desc');
Because method_name carries a uniqueness constraint, if multiple requests are submitted to the database at the same time, the database guarantees that only one insert succeeds, so we can consider the thread whose insert succeeded to hold the method's lock and let it execute the method body.
After the method finishes, execute the following SQL to release the lock:
DELETE FROM methodLock WHERE method_name = 'method_name';
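To make the scheme concrete, here is a minimal JDBC sketch of this insert/delete lock. The table and columns follow the DDL above, while the class shape, DataSource wiring and error handling are assumptions.

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MethodLock {
    private final DataSource dataSource; // assumed to be configured elsewhere

    public MethodLock(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Returns true if the insert succeeded, i.e. this caller now holds the lock.
    public boolean tryLock(String methodName) {
        String sql = "INSERT INTO methodLock (method_name, `desc`) VALUES (?, ?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, methodName);
            ps.setString(2, "lock");
            return ps.executeUpdate() == 1;
        } catch (SQLException e) {
            // A duplicate-key error means another caller already holds the lock.
            return false;
        }
    }

    public void unlock(String methodName) {
        String sql = "DELETE FROM methodLock WHERE method_name = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, methodName);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException("failed to release lock", e);
        }
    }
}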
The above simple implementation has the following problems:
1. The lock depends strongly on the availability of the database. The database is a single point: once it goes down, the business system becomes unavailable.
2. The lock has no expiration time. Once an unlock operation fails, the lock record stays in the database forever and no other thread can acquire the lock.
3. The lock can only be non-blocking: a failed insert reports an error immediately, the losing thread is not queued, and it must issue the acquire operation again to retry.
4. The lock is non-reentrant: until it is released, the same thread cannot acquire it again, because the record already exists.
5. The lock is unfair: all waiting threads compete, and whoever is lucky wins.
Of course, there are other ways to solve the above problems.
1. Database a single point? Deploy two databases that synchronize data bidirectionally; once one goes down, quickly switch to the standby.
2. No expiration time? Run a scheduled task that periodically cleans up timed-out lock records in the database.
3. Non-blocking? Wrap the insert in a while loop and return success only once the insert succeeds.
4. Non-reentrant? Add fields to the table recording the host and thread that currently hold the lock; on the next acquire, query the table first, and if the record matches the current host and thread, grant the lock directly (a sketch follows this list).
5. Unfair? Create an intermediate table recording all threads waiting for the lock, sorted by creation time, and let only the first acquire it.
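A minimal JDBC sketch of the reentrant variant from point 4, assuming hypothetical host_info and thread_info columns added to the methodLock table; the hostname lookup and error handling are illustrative.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical reentrant acquire: re-enter if this host/thread already holds
// the lock, otherwise fall back to the plain insert.
public boolean tryLockReentrant(DataSource ds, String methodName) {
    String host;
    try {
        host = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
        host = "unknown";
    }
    String thread = Thread.currentThread().getName();
    try (Connection conn = ds.getConnection()) {
        // 1. If this host/thread already holds the lock, re-enter directly.
        String check = "SELECT id FROM methodLock WHERE method_name = ? AND host_info = ? AND thread_info = ?";
        try (PreparedStatement ps = conn.prepareStatement(check)) {
            ps.setString(1, methodName);
            ps.setString(2, host);
            ps.setString(3, thread);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) return true; // reentrant acquisition
            }
        }
        // 2. Otherwise try to take the lock by inserting the record.
        String insert = "INSERT INTO methodLock (method_name, `desc`, host_info, thread_info) VALUES (?, '', ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(insert)) {
            ps.setString(1, methodName);
            ps.setString(2, host);
            ps.setString(3, thread);
            return ps.executeUpdate() == 1;
        }
    } catch (SQLException e) {
        return false; // duplicate key: another owner holds the lock
    }
}

A production version would also need a re-entry counter, so that the outermost unlock, rather than the first nested one, actually deletes the record.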
Distributed lock based on exclusive lock
Besides inserting and deleting records in a table, you can also use the database's own locks to implement a distributed lock.
We reuse the table just created. A distributed lock can be built on the database's exclusive locks; with MySQL's InnoDB engine, the locking operation can be implemented as follows:
// 'connection' and 'methodName' are fields of the enclosing class
public boolean lock() {
    try {
        connection.setAutoCommit(false); // hold the transaction open to keep the row lock
        while (true) {
            try (PreparedStatement ps = connection.prepareStatement(
                    "SELECT id FROM methodLock WHERE method_name = ? FOR UPDATE")) {
                ps.setString(1, methodName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        return true; // the row exists and we now hold its exclusive lock
                    }
                }
            } catch (SQLException e) {
                // lock wait timed out: another transaction holds the row; retry
            }
            Thread.sleep(1000);
        }
    } catch (Exception e) {
        return false;
    }
}
Adding FOR UPDATE to the query makes the database place an exclusive lock on the matching record during the query (InnoDB takes a row-level lock here only because method_name is indexed by our unique key; without an index it would lock at table level). Once a record holds an exclusive lock, other connections can no longer lock that record.
We can assume that the thread that acquires the exclusive lock can acquire the distributed lock. After acquiring the lock, we can execute the business logic of the method, and then unlock it by the following methods:
public void unlock() throws SQLException {
    connection.commit(); // committing the transaction releases the exclusive lock
}
The lock is released by the connection.commit() operation.
This method effectively solves the failure-to-release and non-blocking problems mentioned above.
Blocking? The FOR UPDATE statement returns as soon as it succeeds and stays blocked while it fails, until it succeeds.
Lock not released after the service goes down? With this approach, once the service goes down its connection is closed, the transaction is rolled back, and the database releases the lock by itself.
However, the database single point, reentrancy and fair-lock problems are still not directly solved.
To summarize the database approaches: both rely on a table; one decides whether a lock is held by whether a record exists in the table, the other implements the lock through the database's exclusive locks.
Advantages of database-based distributed locks
It builds directly on the database and is easy to understand.
Disadvantages of database-based distributed locks
All sorts of problems come up, and solving them makes the whole scheme increasingly complicated.
Operating the database has a certain overhead, so performance has to be considered.
2. Distributed lock based on cache
Compared with the database-based scheme, a cache-based implementation performs better.
Many mature cache products exist, including Redis and Memcached; this article takes Redis as the example for analyzing cache-based distributed locks.
There are many articles online about Redis-based distributed locks; the main approach is the Jedis setNX method.
public boolean tryLock(String key) {
    // SETNX returns 1 when the key was newly set, 0 when it already existed
    Long result = jedis.setnx(key, "This is a Lock.");
    return result != null && result == 1L;
}

public void unlock(String key) {
    jedis.del(key); // delete the key to release the lock
}
There are also several problems with the above implementation:
1. Single point problem: a single Redis instance can go down.
2. The lock has no expiration time. Once an unlock operation fails, the lock record stays in Redis forever and no other thread can acquire the lock.
3. The lock can only be non-blocking: setNX returns immediately whether it succeeds or fails.
4. The lock is non-reentrant: after a thread acquires the lock, it cannot acquire it again before releasing it, because the key already exists in Redis and setNX cannot succeed again.
5. The lock is unfair: all waiting threads issue setNX at the same time, and the lucky one acquires the lock.
Of course, there are ways to solve these problems too.
1. Single point? Mainstream cache services now support cluster deployment, which solves the single point problem.
2. No expiration time? Redis lets you set an expiry on the key (for example with the EXPIRE command), after which the key is deleted automatically (see the sketch after this list).
3. Non-blocking? Retry in a while loop until the acquire succeeds.
4. Non-reentrant? After a thread acquires the lock, record the current host and thread information, and before the next acquire check whether the caller is the lock's current owner.
5. Unfair? Put waiting threads into a queue before they attempt the lock, and grant the lock first-in, first-out.
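Below is a minimal Jedis sketch combining two of these fixes: the key is created atomically together with an expiry (SET with NX and PX), and the value stores an owner token so that only the owner can release the lock. The class shape and parameter names are assumptions.

import java.util.Collections;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLock {
    private final Jedis jedis;

    public RedisLock(Jedis jedis) {
        this.jedis = jedis;
    }

    // Create the key atomically with an expiry; 'owner' is a unique id
    // (e.g. a UUID) identifying who may release the lock.
    public boolean tryLock(String key, String owner, long ttlMillis) {
        String reply = jedis.set(key, owner, SetParams.setParams().nx().px(ttlMillis));
        return "OK".equals(reply);
    }

    // Delete the key only if we still own it; the check-and-delete must be
    // atomic, hence the Lua script.
    public boolean unlock(String key, String owner) {
        String script =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";
        Object reply = jedis.eval(script,
                Collections.singletonList(key), Collections.singletonList(owner));
        return Long.valueOf(1L).equals(reply);
    }
}

A caller would generate the owner token once, e.g. String owner = UUID.randomUUID().toString(), and pass the same value to tryLock and unlock.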
Note that synchronization across Redis nodes takes time: thread A's setNX may succeed on one node before the value has propagated to the node where thread B runs setNX, so both can believe they hold the lock, a concurrency problem.
Salvatore Sanfilippo, the author of Redis, therefore proposed the Redlock algorithm, a distributed lock manager (DLM) that is safer and more reliable than a single node.
The Redlock algorithm assumes N Redis nodes that are independent of each other, usually N = 5, running on different machines so that they stay independent at the physical level.
The steps of the algorithm are as follows:
1. The client gets the current time in milliseconds.
2. The client tries to acquire the lock on all N nodes (each node is acquired in the same way as the cache lock above), using the same key and value on every node. The client sets an API access timeout much smaller than the lock timeout; for example, if the lock auto-releases after 10 seconds, the API timeout might be 5-50 ms. That way, when a Redis node is down, access to it fails fast and normal use of the lock is barely affected.
3. The client computes the time spent acquiring locks: the current time minus the time recorded in step 1. Only if the client acquired the lock on a majority of the nodes (at least 3 of 5) and the acquisition time is below the lock timeout does the client hold the distributed lock.
4. The validity time of the lock is the configured lock timeout minus the acquisition time computed in step 3.
5. If the client fails to acquire the lock, it deletes its locks on all the nodes in turn.
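A sketch of these five steps, reusing the single-node RedisLock wrapper above with one instance per independent node; the return convention and error handling are assumptions (a fuller version would also hand back the owner token for the eventual release).

import java.util.List;
import java.util.UUID;

// Sketch of Redlock acquisition over N independent nodes.
public long tryRedlock(List<RedisLock> nodes, String key, long ttlMillis) {
    String owner = UUID.randomUUID().toString(); // the same token on every node
    long start = System.currentTimeMillis();     // step 1
    int acquired = 0;
    for (RedisLock node : nodes) {               // step 2
        try {
            if (node.tryLock(key, owner, ttlMillis)) acquired++;
        } catch (Exception e) {
            // node unreachable or request timed out: counts as not acquired
        }
    }
    long elapsed = System.currentTimeMillis() - start; // step 3
    long validity = ttlMillis - elapsed;               // step 4
    if (acquired >= nodes.size() / 2 + 1 && validity > 0) {
        return validity; // the caller may hold the lock for at most this long
    }
    for (RedisLock node : nodes) {               // step 5: release everywhere
        try {
            node.unlock(key, owner);
        } catch (Exception ignored) {
        }
    }
    return -1; // acquisition failed
}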
With the Redlock algorithm, the lock service keeps working even with up to two of the five nodes down, a large availability improvement over the database and single-node cache locks, and thanks to Redis's efficiency the distributed cache lock performs no worse than the database lock. However, a distributed-systems expert (Martin Kleppmann) wrote an article, "How to do distributed locking", questioning Redlock's correctness.
The expert notes that a distributed lock should be judged from two angles: performance and correctness.
If you need a high-performance distributed lock in a scenario without strict correctness requirements, a cache lock is sufficient.
If you need a highly reliable distributed lock, strict correctness must be considered, and in his view Redlock does not provide it. Why not? He lists several reasons.
Many language runtimes have GC; during a full GC the program pauses, sometimes for a long time, even minutes. The article cites HBase, where GC pauses of several minutes have caused lease timeouts. A full GC can arrive at any moment, outside the program's control. Suppose client 1 has acquired the lock and is about to touch the shared resource when a full GC pauses it until the lock expires. Client 2 then acquires the lock and starts working on the shared resource; while it does, client 1 finishes its GC and resumes working on the resource too, so both clients are operating on the shared resource at once.
The expert's remedy, which looks like MVCC, is to attach a fencing token to the lock. The token is essentially a version number, incremented each time the lock is granted; every operation on the shared resource carries its token, and only requests whose token is current are allowed to touch the shared resource.
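A minimal illustration of the fencing-token idea, with a hypothetical guard in front of the shared resource that rejects writes carrying an out-of-date token:

// Hypothetical fencing guard: the storage side remembers the highest token
// it has seen and rejects anything older, so a paused client that wakes up
// after its lock expired can no longer write.
public class FencedResource {
    private long highestToken = 0;

    public synchronized boolean write(long fencingToken, String data) {
        if (fencingToken < highestToken) {
            return false; // stale lock holder; reject the write
        }
        highestToken = fencingToken;
        doWrite(data); // the actual mutation of the shared resource
        return true;
    }

    private void doWrite(String data) { /* ... */ }
}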
The expert adds that the algorithm depends on local time: when handling key expiry, Redis uses gettimeofday rather than a monotonic clock, so its notion of time can be wrong. Consider two clients, client 1 and client 2, and five Redis nodes (A, B, C, D and E):
1. Client 1 acquires the lock on A, B and C; the requests to D and E time out on the network.
2. Node C's clock runs fast, so the lock on C expires early.
3. Client 2 acquires the lock on C, D and E; the requests to A and B time out on the network.
4. Both client 1 and client 2 now believe they hold the lock.
The expert's two arguments against Redlock can be summarized as:
1. GC pauses and similar events can happen at any time, so a client may acquire the lock, stall past the timeout while processing, and let another client acquire the lock. His remedy is the monotonically increasing fencing token.
2. The algorithm depends on local time, and clocks can be wrong, so two clients can end up holding the lock at the same time.
He therefore concludes that Redlock works correctly only with bounded network delay, bounded process pauses and bounded clock error, and since none of these bounds can be guaranteed, he recommends against Redlock. For scenarios demanding strict correctness he recommends Zookeeper; using Zookeeper for distributed locks is discussed below.
The Redis author's response
When the Redis author saw the expert's article, he wrote a blog post in response: he politely thanked the expert and then explained why he disagreed with the analysis.
I asked for an analysis in the original Redlock specification here: http://redis.io/topics/distlock. So thank you Martin. However I don't agree with the analysis.
The Redis author's response to the token-based fix for lock timeouts can be summarized in the following points:
Point 1: distributed locks are generally used precisely when you have no other way to control a shared resource. If the expert can use a fencing token to guard the shared resource, no distributed lock is needed in the first place.
Point 2: as for generating the token, to guarantee that the tokens handed to different clients are reliable, the token-generating service itself needs a distributed lock to guarantee its reliability.
Point 3: as for the expert's monotonically increasing token, the Redis author considers it entirely unnecessary. Each client can generate a unique UUID as its token and set the shared resource into a state that only the client with that UUID can handle, so other clients cannot touch the shared resource until the lock holder releases the lock.
Point 4: the author argues that token ordering does not actually solve the GC problem the expert raised. If the client holding token 34 hits a GC pause during its write and the lock times out, another client may acquire the lock with token 35 and start writing, and when the first client resumes the writes still conflict. So token ordering cannot be combined with the shared resource in this way.
Point 5: the author believes that in most scenarios distributed locks are used for updates in non-transactional settings, meaning some shared resources are hard to guard with a token, so you have to rely on a lock to fence off the resource while handling it.
The Redis author also addresses the expert's clock problem: the time a client can actually hold the lock is the default timeout minus the time spent acquiring it. If acquisition takes so long that the lock's default timeout is exceeded, the client simply does not obtain the lock, so the expert's scenario does not arise.
My personal take
The first problem I take away is that after a client acquires a distributed lock, the lock may time out and be released while the client is still working. Besides force majeure such as GC pauses, it may simply be that the flow has not finished. Earlier I mentioned that our database lock's timeout was set to 2 minutes; if a task holds an order's lock for more than 2 minutes, another trading center can obtain that order's lock, and the two trading centers end up processing the same order concurrently. Normally a task finishes within seconds, but sometimes an RPC call's timeout is set too long, and a task containing several such timed-out calls can easily exceed the auto-unlock time. Our transaction module was originally written in C++ with no GC; had it been written in Java, full GC pauses could make this worse. After a timeout release, the client cannot perceive that it has lost the lock, which is very serious. I don't think this is a flaw of any particular lock: every distributed lock discussed above has this problem as long as it has a timeout-release feature. If you use that feature, the client must bound its own lock-holding time and take appropriate action instead of continuing to touch the shared resource. Redlock handles this: after acquiring the lock it returns the time the client may occupy it, and the client must use that figure to stop the task once the time is up.
The second problem is that the distributed expert did, in a sense, misunderstand Redlock. A key point of Redlock is that the time a client may hold the lock is the default lock timeout minus the time spent acquiring it, so the client's processing window is a relative duration and does not depend on absolute local time.
Seen this way, Redlock's correctness can be reasonably well guaranteed. Analyzed carefully, the most important thing Redlock offers over single-node Redis is higher reliability, which matters in some scenarios. But I think it pays too high a price for that reliability.
First, you must deploy five nodes to make Redlock reliable enough.
Then the client has to ask all five nodes for the lock. With Futures it can issue the five requests concurrently and then gather the results, which shortens the response time, but it still costs more than a single-node Redis lock (a fan-out sketch follows).
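A sketch of that fan-out using CompletableFuture, again reusing the hypothetical RedisLock wrapper from the earlier sketches:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Issue the per-node acquisitions concurrently, then count successes.
int acquireConcurrently(List<RedisLock> nodes, String key, String owner, long ttlMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
    try {
        List<CompletableFuture<Boolean>> futures = nodes.stream()
                .map(node -> CompletableFuture
                        .supplyAsync(() -> node.tryLock(key, owner, ttlMillis), pool)
                        .exceptionally(e -> false)) // a dead node counts as a miss
                .collect(Collectors.toList());
        return (int) futures.stream().filter(CompletableFuture::join).count();
    } finally {
        pool.shutdown();
    }
}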
Next, since the lock must be acquired on at least three of the five nodes, acquisition conflicts can occur: several clients each grab one or two locks and nobody reaches a majority. The Redis author borrows from the essence of the Raft algorithm, retrying after a random delay to greatly reduce conflicts, but the problem cannot be avoided entirely, especially on the first acquisition, so the time cost of acquiring the lock rises.
If two of the five nodes are down, lock availability drops sharply: you must first wait for the requests to the two dead nodes to time out, and with only three nodes left the client must acquire the lock on all three of them, which is much harder.
If a network partition occurs, it is possible that the client will never be able to acquire the lock.
After all this analysis, I think the biggest issue with Redlock is that it requires the clients to guarantee write consistency: the five backend nodes are completely independent, and every client must operate on all five. If the five nodes had a leader, a client would only need to acquire the lock from the leader and the other nodes could replicate from it, and problems like partitioning, timeouts and conflicts would not occur. So, to guarantee the correctness of a distributed lock, I think a strongly consistent distributed coordination service solves the problem better.
Here comes the question again: how long should the expiration time be? Set it too short and the lock auto-releases before the method finishes, causing concurrency problems; set it too long and, after a holder crashes, other threads wanting the lock must wait needlessly.
The same problem exists when using a database to implement the distributed lock.
The mainstream answer today is to set a fairly short timeout when acquiring the lock, and to have a background thread refresh the lock's timeout each time it is about to expire, terminating that thread when the lock is released. For example, Redisson, a widely used Redis-based distributed lock component for Java, adopts this scheme (its watchdog).
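A minimal sketch of the renewal idea (not Redisson's actual implementation): a scheduled task extends the key's TTL while the owner is alive and is cancelled on unlock. The renewal period and the non-atomic owner check are simplifications.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;

// Sketch: renew the lock key's TTL at a third of its lifetime, so a live
// owner keeps the lock while a crashed owner lets it expire quickly.
public class RenewingLock {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> renewal;

    public void startRenewal(Jedis jedis, String key, String owner, long ttlMillis) {
        long period = ttlMillis / 3;
        renewal = scheduler.scheduleAtFixedRate(() -> {
            // Only extend if we still own the key (simplified, non-atomic check;
            // a production version would use a Lua script here too).
            if (owner.equals(jedis.get(key))) {
                jedis.pexpire(key, ttlMillis);
            }
        }, period, period, TimeUnit.MILLISECONDS);
    }

    public void stopRenewal() {
        if (renewal != null) {
            renewal.cancel(false);
        }
    }
}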
Advantages of using caching to implement distributed locks
The performance is good.
Disadvantages of using caching to implement distributed locks
The implementation is rather involved and there are many factors to consider.
3. Distributed lock based on Zookeeper
It is based on Zookeeper's ephemeral sequential nodes.
The general idea: when a client wants to lock a method, it creates a unique ephemeral sequential node under the directory node designated for that method on Zookeeper. Deciding whether the lock is held is simple: just check whether one's own node has the smallest sequence number among the ordered children. Releasing the lock is just deleting one's ephemeral node, and because ephemeral nodes vanish with the session, this also avoids the deadlock of a lock never being released after a service crash.
Let's see if Zookeeper can solve the problem mentioned earlier.
Lock cannot be released? Zookeeper solves this effectively: the client creates an ephemeral node when locking, and if the client dies after acquiring the lock (its session connection breaks), the ephemeral node is deleted automatically and other clients can acquire the lock.
Non-blocking? A blocking lock can be implemented with Zookeeper: the client creates a sequential node and sets a watcher; when the node set changes, Zookeeper notifies the client, which checks whether its own node now has the smallest sequence number, and if so it holds the lock and executes the business logic.
Non-reentrant? Also solved: when creating its node, the client writes its host and thread information into the node; on the next acquire it compares with the data in the smallest node, and if the information matches its own it takes the lock directly, otherwise it creates another ephemeral sequential node and joins the queue.
Single point? Solved: ZK is deployed as a cluster, and as long as more than half of the machines in the cluster survive it can serve.
Fairness? Solved: the ephemeral nodes clients create in ZK are ordered, and each time the lock is released ZK can notify the owner of the smallest remaining node, which guarantees fairness.
(A sketch of this flow with the raw Zookeeper client follows.)
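For illustration, a sketch of this flow against the raw Zookeeper client; the lock directory, timeouts and error handling are assumptions, and production code would normally use a tested recipe such as Curator's InterProcessMutex, shown later.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: create an ephemeral sequential node, then wait until our node has
// the smallest sequence number, watching only the node just before ours.
public class ZkLock {
    private final ZooKeeper zk;
    private final String lockDir; // e.g. "/locks/methodName", assumed to exist
    private String ourNode;

    public ZkLock(ZooKeeper zk, String lockDir) {
        this.zk = zk;
        this.lockDir = lockDir;
    }

    public void lock() throws Exception {
        ourNode = zk.create(lockDir + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren(lockDir, false);
            Collections.sort(children);
            String ourName = ourNode.substring(lockDir.length() + 1);
            int index = children.indexOf(ourName);
            if (index == 0) {
                return; // smallest node: we hold the lock
            }
            // Watch only the node immediately before ours (avoids a herd effect).
            String prev = lockDir + "/" + children.get(index - 1);
            CountDownLatch latch = new CountDownLatch(1);
            if (zk.exists(prev, event -> latch.countDown()) != null) {
                latch.await(); // woken when the previous node is deleted
            }
        }
    }

    public void unlock() throws Exception {
        zk.delete(ourNode, -1); // deleting our ephemeral node releases the lock
    }
}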
Here comes the question again. We know that Zookeeper requires cluster deployment. Will there be data synchronization problems like Redis clusters?
Zookeeper is a distributed component that guarantees weak consistency, namely eventual consistency.
Zookeeper uses a quorum-based data synchronization protocol (Quorum Based Protocol). If the Zookeeper cluster has N servers (N is usually odd; 3 already satisfies data reliability with high read/write performance, and 5 gives the best balance between reliability and performance), then a user's write is first synchronized to N/2 + 1 servers before the user is told the write succeeded. This quorum-based protocol determines what strength of consistency Zookeeper can support.
In a distributed environment, storage satisfying strong consistency barely exists: it would require all nodes to be updated synchronously whenever one node's data is updated. That strategy appears in synchronously replicated master-slave databases, but its write-performance cost is so high that it is rarely seen in practice. Since Zookeeper synchronously writes only N/2 + 1 nodes and the remaining nodes are updated asynchronously, Zookeeper is not strongly consistent.
A user's update is not guaranteed to be visible to subsequent reads immediately, but consistency is eventually reached. Sacrificing consistency does not mean disregarding it completely; otherwise the data would be chaotic and the system worthless however available it is. It means no longer demanding the strong consistency of relational databases, as long as the system reaches eventual consistency.
Whether Zookeeper satisfies causal consistency depends on how the client is programmed.
The practice that does not satisfy causal consistency:
1. Process A writes data to Zookeeper's /z and returns successfully.
2. Process A notifies process B that it has modified the data at /z.
3. Process B reads /z, but because the Zookeeper server B is connected to may not yet have received A's write, B may fail to read the data A wrote.
The practice of satisfying causal consistency
1. Process B registers a watcher on /z in Zookeeper for data changes.
2. Process A writes data to /z; before the write returns successfully, Zookeeper must invoke the watcher registered on /z, i.e. the Leader notifies process B.
3. Process B's event callback fires and fetches the changed data, so B is guaranteed to see the new value.
The causal consistency here becomes causal consistency between the Leader and B: the Leader told B that the data had changed.
This second, event-listening mechanism is also how Zookeeper should be programmed, so used correctly, Zookeeper can be considered causally consistent.
Therefore, when implementing a distributed lock on Zookeeper, we should use the causally consistent approach: threads waiting for the lock watch the lock's node on Zookeeper, and when the lock is released Zookeeper notifies the waiting thread that satisfies the fair-lock condition.
You can use a third-party Zookeeper client such as Curator directly, which ships a ready-made reentrant lock recipe (InterProcessMutex):
public boolean tryLock(long timeout, TimeUnit unit) {
    try {
        // InterProcessMutex.acquire returns true if the lock was obtained in time
        return interProcessMutex.acquire(timeout, unit);
    } catch (Exception e) {
        e.printStackTrace();
        return false; // acquisition failed
    }
}

public boolean unlock() {
    try {
        interProcessMutex.release();
    } catch (Throwable e) {
        log.error(e.getMessage(), e);
    } finally {
        // schedule cleanup of the lock path (fields from the enclosing class)
        executorService.schedule(new Cleaner(client, path), delayTimeForClean, TimeUnit.MILLISECONDS);
    }
    return true;
}
A lock implemented with ZK seems to meet everything we expected of a distributed lock at the start of this article. However, not quite: its performance may not be as high as a cache-based lock's, because every acquire and release must dynamically create and destroy ephemeral nodes, and node creation and deletion in ZK can only be performed by the Leader, after which the data must be synchronized to all the Follower machines.
Advantages of using Zookeeper to implement distributed locks
It effectively solves the single point, non-reentrancy, non-blocking and unreleasable-lock problems, and is relatively simple to implement.
Disadvantages of using Zookeeper to implement distributed locks
Performance is not as good as a cache-based lock's, and you need to understand ZK's principles.
Comparing the three schemes:
From difficulty of understanding (low to high):
Database > cache > Zookeeper
From the perspective of implementation complexity (from low to high)
Zookeeper > cache > database
From a performance perspective (from high to low)
Cache > Zookeeper >= database
From the perspective of reliability (from high to low)
Zookeeper > cache > database