
Analysis and Solutions for Cache and Database Double-Write Inconsistency in High-Concurrency Scenarios


In this issue, the editor brings you an analysis of, and solutions to, the problem of inconsistent cache and database writes in high-concurrency scenarios. The article is detailed and approaches the topic from a practical point of view; I hope you get something out of it.

Redis is a very important part of enterprise high-concurrency, high-availability architectures. Redis mainly addresses the limited concurrency of relational databases: it helps relieve pressure on them in high-concurrency scenarios and improves system throughput (exactly how Redis improves performance and throughput will be discussed later).

In real-world use of Redis, we inevitably run into data inconsistency between the cache and the database, and it is a problem we must plan for. If you are not yet familiar with it, pull up a chair and read on.

I. Introducing the database + cache double-write inconsistency problem

To discuss the database + cache double-write inconsistency problem, we first need to see how it arises. We will use the inventory service of an e-commerce system, which has high real-time requirements for its data, to illustrate.

Inventory may be modified, and each time the database is modified the cached data must be updated as well; whenever the inventory data in the cache expires or is evicted, front-end requests for inventory data go to the inventory service to fetch the current value.

Should you update the Redis cache directly whenever you write to the database? Actually no, it is not that simple. This is exactly the double-write problem: the database and the cache are written separately, and the two can fall out of sync. Using the real-time inventory service as the running example, let's walk through the double-write inconsistency and its solutions.

II. Inconsistencies at various levels and their solutions

1. The most basic cache inconsistency problem and its solution

Problem

If you modify the database first and then delete the cache, there is a problem: if deleting the cache fails, the database holds new data while the cache holds old data, and the two are inconsistent.

Solution idea

Do the reverse: delete the cache first, then modify the database. When a read misses the cache, it queries the database, gets the latest inventory, and updates the cache. If deleting the cache succeeds but modifying the database fails, the database still holds the old data and the cache is simply empty, so no inconsistency arises: a subsequent read misses the cache, reads the old value from the database, and writes it back to the cache.
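As a concrete illustration, here is a minimal cache-aside sketch of this idea: the write path deletes the cache before updating the database, and the read path repopulates the cache on a miss. It assumes the Jedis client for Redis; InventoryDao and all method and key names are hypothetical stand-ins, not part of the original article.

```java
// Minimal cache-aside sketch, assuming the Jedis client and a hypothetical
// InventoryDao abstraction over the relational database.
import redis.clients.jedis.Jedis;

public class InventoryCacheService {
    // Hypothetical DAO; stands in for whatever persistence layer you use.
    public interface InventoryDao {
        void updateStock(long productId, long stock);
        long getStock(long productId);
    }

    private final Jedis jedis = new Jedis("localhost", 6379);
    private final InventoryDao dao;

    public InventoryCacheService(InventoryDao dao) { this.dao = dao; }

    // Write path: delete the cache first, then modify the database.
    public void updateInventory(long productId, long stock) {
        jedis.del("inventory:" + productId); // step 1: invalidate the cache
        dao.updateStock(productId, stock);   // step 2: write the database
    }

    // Read path: on a cache miss, read the database and repopulate the cache.
    public long getInventory(long productId) {
        String key = "inventory:" + productId;
        String cached = jedis.get(key);
        if (cached != null) return Long.parseLong(cached);
        long stock = dao.getStock(productId);        // miss: read the database
        jedis.setex(key, 300, Long.toString(stock)); // repopulate with a TTL
        return stock;
    }
}
```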

2. Analysis of a more complex data inconsistency

When the inventory data changes, we delete the cache first, and then modify the database.

Imagine that the database modification has not yet completed when a read request suddenly arrives: it reads the cache, finds it empty, queries the database, gets the old (pre-modification) data, and puts it into the cache.

Then the data-change operation completes and the database inventory is set to the new value, but the cache now holds the old data. So the cache and the database are inconsistent once again.
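The interleaving is easier to see in code. The following contrived, minimal reproduction forces exactly this race, using an in-memory map as the "cache" and an AtomicInteger as the "database"; the stock key, the values, and the sleeps are invented to pin down the ordering described above.

```java
// Forces the race: cache deleted, reader repopulates it with the old value,
// then the slow database write completes. Ends with db=99 but cache=100.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleWriteRaceDemo {
    static final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    static final AtomicInteger dbStock = new AtomicInteger(100); // old value

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            cache.remove("stock");   // step 1: delete the cache
            sleep(100);              // the database write is slow...
            dbStock.set(99);         // step 3: ...and only now completes
        });
        Thread reader = new Thread(() -> {
            sleep(10);               // arrives between steps 1 and 3
            Integer v = cache.get("stock");
            if (v == null) {         // step 2: cache miss, read the old value
                v = dbStock.get();
                cache.put("stock", v); // repopulate the cache with stale data
            }
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.printf("db=%d cache=%d%n", dbStock.get(), cache.get("stock"));
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```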

3. Why does this problem appear under high concurrency with hundreds of millions of requests?

The problem above can occur only when the same piece of data is being read and written concurrently.

In fact, if concurrency is very low, especially read concurrency, say 10,000 visits per day, then the inconsistency just described will arise only in rare cases.

But under high concurrency the problems multiply. With hundreds of millions of requests per day and tens of thousands of concurrent reads per second, just one data-update request per second is enough to trigger the database + cache inconsistency described above.

How to solve it?

4. Asynchronous serialization of update and read operations

Here's a solution.

Isn't the trouble that a read queries the database before the update finishes and so sees old data? Isn't it that the read gets ahead of the update? Then make it wait its turn in line.

4.1 Asynchronous serialization

We maintain n memory queues inside the service. When data is updated, the operation is routed by the data's unique key and sent to a memory queue inside one JVM; operations on the same data always land in the same queue. When a read finds the data missing from the cache while an inventory-update operation for it is still queued, the "reload from database and refresh cache" operation is routed by the same key and sent to the same queue in the same JVM. Each queue is served by one worker thread, which takes operations off the queue and executes them one by one, in order.

This way, a data-change operation still deletes the cache first and then updates the database. If a read request arrives before the update completes and sees the empty cache, it enqueues a cache-refresh request, which backs up in the queue behind the database update; once the update finishes, the worker performs the cache refresh, and the read waits synchronously for that refresh before reading.
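Here is a minimal sketch of that routing-plus-worker scheme, assuming N in-process queues with one worker thread per queue; the hashing by productId and the queue count are illustrative choices, not prescribed by the article.

```java
// Per-key serialized execution: route every operation on the same productId
// to the same in-memory queue, each drained by a single worker thread.
import java.util.concurrent.LinkedBlockingQueue;

public class SerializedExecutor {
    private static final int N = 20; // number of queues; tune via load testing
    private final LinkedBlockingQueue<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    public SerializedExecutor() {
        queues = new LinkedBlockingQueue[N];
        for (int i = 0; i < N; i++) {
            queues[i] = new LinkedBlockingQueue<>();
            final LinkedBlockingQueue<Runnable> q = queues[i];
            Thread worker = new Thread(() -> {
                while (true) {
                    try {
                        q.take().run(); // execute operations strictly one by one
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Same key -> same queue -> same worker thread -> serialized execution.
    public void submit(long productId, Runnable op) {
        int idx = (int) (Long.hashCode(productId) & 0x7fffffff) % N;
        queues[idx].offer(op);
    }
}
```

An update is then submitted as one operation (delete cache, write database) and a read-triggered refresh as another (read database, write cache); because both go through submit with the same productId, they can never interleave.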

4.2 Deduplicating read operations

It is pointless to have multiple "reload from database and refresh cache" requests for the same data in the same queue, so they can be filtered: if the queue already contains a cache-refresh request for this data, there is no need to enqueue another; just wait for the earlier one to finish. Once the worker thread for that queue completes the preceding operation (the database modification), it performs the next one (read database, refresh cache), reading the latest value from the database and writing it to the cache.

While a read request is still within its wait budget, it keeps polling the cache; if the value appears, it is returned immediately. If the request waits longer than the threshold, it reads the current (old) value directly from the database and returns that. (Doesn't returning the old value make the cache inconsistent with the database again? This at least limits how often it happens: the wait timeout should be rare, so the probability is very small. The intent is simply that, on timeout, we read and return the old value rather than blocking any longer.)
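A sketch of the read path under these rules might look as follows. It builds on the SerializedExecutor above, uses a ConcurrentHashMap keyed by productId for "a refresh is already queued" deduplication, and polls the cache with a timeout; all names and constants are illustrative. For brevity a single Jedis client is shared between threads; production code would use a connection pool.

```java
// Read path: deduplicate queued cache refreshes, poll the cache within a
// timeout, and fall back to the database's current (possibly old) value.
import java.util.concurrent.ConcurrentHashMap;
import redis.clients.jedis.Jedis;

public class DedupedReader {
    public interface Dao { long getStock(long productId); } // hypothetical DAO

    private final SerializedExecutor executor; // from the sketch above
    private final Dao dao;
    private final Jedis jedis = new Jedis("localhost", 6379);
    // productIds that already have a cache-refresh operation queued
    private final ConcurrentHashMap<Long, Boolean> pending = new ConcurrentHashMap<>();

    public DedupedReader(SerializedExecutor executor, Dao dao) {
        this.executor = executor;
        this.dao = dao;
    }

    public long read(long productId, long timeoutMs) throws InterruptedException {
        String key = "inventory:" + productId;
        String cached = jedis.get(key);
        if (cached != null) return Long.parseLong(cached);

        // Deduplication: enqueue a refresh only if none is pending for this id.
        if (pending.putIfAbsent(productId, Boolean.TRUE) == null) {
            executor.submit(productId, () -> {
                try {
                    jedis.setex(key, 300, Long.toString(dao.getStock(productId)));
                } finally {
                    pending.remove(productId);
                }
            });
        }

        // Poll the cache until the value shows up or the wait budget is spent.
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            String v = jedis.get(key);
            if (v != null) return Long.parseLong(v);
            Thread.sleep(10);
        }
        return dao.getStock(productId); // timeout: return the current (old) value
    }
}
```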

5. Issues to watch out for with this scheme under high concurrency

In high-concurrency scenarios, this solution has a few problems that deserve special attention.

5.1 Long-term blocking of read requests

Because read requests are now slightly asynchronous, pay close attention to read timeouts: every read request must return within its timeout.

The biggest risk of this solution is that frequent data updates cause a large backlog of update operations in the queues, so large numbers of read requests time out and go straight to the database for the old value. Be sure to run realistic tests to see how the system behaves when data is updated frequently.

In addition, because update operations for multiple data items may back up in a single queue, test against your own business profile to decide how many memory queues one instance should have, and whether you need to deploy multiple service instances, each taking a share of the data-update operations.

If inventory-modification operations for 100 items back up in one memory queue and each takes 10ms to complete, the read request for the last item may wait 10 * 100 = 1000ms = 1s before it can get its data.

This leads to long-term blocking of the read request.

Be sure to run stress tests that simulate the production environment based on how the business actually operates, to find out how many update operations the memory queues may accumulate at the busiest time and, correspondingly, how long the read request behind the last update operation may hang. If reads must return within 200ms and you calculate that even at peak at most 10 update operations back up, meaning a wait of at most 200ms, that is fine.

If a memory queue may accumulate a large backlog of update operations, add machines: each deployed service instance then handles less data, and each memory queue holds fewer update operations.

Tips:

In fact, based on past project experience, data writes are usually infrequent, so under normal conditions the update backlog in the queues should be very small.

For projects aimed at high read concurrency with a read-cache architecture, write requests are generally very few compared with reads; a few hundred write QPS would already be a lot.

Take 500 write operations per second: that is 100 writes per 200ms window. With 20 memory queues on a single machine, each queue accumulates a backlog of about 5 write operations per 200ms. Performance tests show each write operation normally completes in about 20ms, so draining those 5 operations takes roughly 100ms.

So the read request for the data in any memory queue hangs only briefly and will certainly be answered within 200ms.

If write QPS grows tenfold, then, since we just calculated that a single machine can comfortably handle several hundred write QPS, simply scale out tenfold: 10 machines, each with 20 queues, 200 queues in total.
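These back-of-envelope numbers are easy to script. The sketch below simply reproduces the article's arithmetic (500 write QPS, 20 queues per machine, 20ms per write, a 200ms read budget) so the figures can be re-run with your own measurements.

```java
// Back-of-envelope capacity check for the memory-queue scheme, using the
// article's example figures; replace the constants with your own measurements.
public class QueueBackOfEnvelope {
    public static void main(String[] args) {
        int writeQps = 500;        // write operations per second
        int queuesPerMachine = 20; // memory queues on one instance
        int msPerWrite = 20;       // measured cost of one write operation
        int readTimeoutMs = 200;   // the read request's wait budget

        // Writes arriving within one read-timeout window, spread over the queues.
        double writesPerWindow = writeQps * (readTimeoutMs / 1000.0); // 100
        double backlogPerQueue = writesPerWindow / queuesPerMachine;  // 5
        double worstWaitMs = backlogPerQueue * msPerWrite;            // 100ms

        System.out.printf("backlog per queue: %.1f ops, worst-case wait: %.0fms "
                + "(budget %dms)%n", backlogPerQueue, worstWaitMs, readTimeoutMs);
    }
}
```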

In most cases the flow should look like this: large numbers of read requests come in and hit the cached data directly. In a small number of cases a read collides with a data update: as described above, the update operation is queued first, and although a burst of read requests for that data may arrive in that instant, thanks to the deduplication only a single cache-refresh operation follows it in the queue.

Once the data update completes, the cache-refresh operation triggered by the reads also completes, and all the briefly waiting read requests can then read the data from the cache.

5.2 Read-request concurrency that is too high

Stress testing is also essential here, because when the above situation occurs there is another risk: a sudden flood of read requests hangs on the service with delays of tens of milliseconds. Test whether the service can absorb that, and how many machines are needed to absorb the maximum peak.

However, not all data is updated at the same moment, and cache entries do not all expire at once; each time only a small fraction of the data loses its cache, after which the read requests for just that data arrive, so the resulting concurrency should not be very large.

Tips:

If writes to reads are taken as 1:99, then 50,000 read QPS corresponds to only about 500 update operations per second.

With 500 write QPS, roughly 500 pieces of data per second are touched by writes. Once the cache entries for those 500 items are invalidated, you need to estimate how many cache-refresh read requests they will send to the inventory service.

Generally speaking, at a 1:2 ratio there would be 1,000 read requests per second hitting data that is being updated, so 1,000 requests per second hang on the inventory service. If each request is required to return within 200ms, then at any instant at most 1000 * 0.2 = 200 read requests are hanging on a single machine at the same time, which is acceptable.

But if the ratio is 1:20, the 500 items updated each second attract 20 * 500 = 10,000 read requests per second, all hanging on the inventory service, and the service will be overwhelmed.

5.3 Request routing with multiple service instances

The inventory service will likely be deployed as multiple instances, so you must ensure that, for any given item, all read and write requests that update its data or refresh its cache are routed to the same machine. You can hash-route between services based on a request parameter, or route to the same service instance through the hash-routing capability of an nginx server.
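Here is a sketch of client-side hash routing, assuming a fixed list of instance addresses; consistent hashing or nginx's upstream hash facility would serve the same purpose, and the instance list here is invented for illustration.

```java
// Client-side hash routing sketch: every request for the same productId is
// sent to the same inventory-service instance, so queue serialization holds.
import java.util.List;

public class InstanceRouter {
    private final List<String> instances;

    public InstanceRouter(List<String> instances) {
        this.instances = instances; // e.g. ["http://inv-0:8080", "http://inv-1:8080"]
    }

    public String instanceFor(long productId) {
        int idx = (int) (Long.hashCode(productId) & 0x7fffffff) % instances.size();
        return instances.get(idx);
    }
}
```

Note that with plain modulo hashing, adding or removing an instance remaps most keys; consistent hashing reduces that churn, which matters if instances scale up and down.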

5.4 Hot-product routing problems that skew requests

If reads and writes of one particular product are extremely heavy and all of them funnel into the same queue on the same machine, that machine may come under excessive pressure.

However, the cache is cleared only when the product's data is updated, and only then does read/write concurrency arise, so if the update frequency is not too high the impact of this problem is not particularly large.

But it is possible that some machines will have a higher load.

Generally speaking, if your system does not strictly require cache + database consistency and can tolerate the cache being occasionally, briefly inconsistent with the database, it is better not to adopt the serialization scheme above. Serializing read and write requests through a memory queue does guarantee that no inconsistency occurs, but it also greatly reduces the system's throughput: you would need several times more machines than normal to support the same online traffic.

The above is the editor's analysis of, and solutions to, inconsistent cache and database writes in high-concurrency scenarios. If you have run into similar doubts, the analysis above may help you understand the problem.
