This article analyzes Redis cache consistency, cache penetration, cache breakdown, and cache avalanche. These situations come up constantly when operating real systems, so let's work through how to handle each of them. I hope you read carefully and come away with something!
(1) Consistency of cache invalidation
The usual way to use a cache is to read the cache first; on a miss, read from the DB and write the result to the cache, so that the next read of that data can be served directly from the cache.
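As a reference, here is a minimal sketch of that read path in Java, assuming the Jedis client and a hypothetical loadFromDb helper standing in for the real data access layer; the 300-second TTL is an illustrative choice.

import redis.clients.jedis.Jedis;

public class CacheAsideRead {
    private static final int TTL_SECONDS = 300; // assumed expiry; tune per business

    // Hypothetical DB loader standing in for the real data access layer.
    static String loadFromDb(String key) { return "World"; }

    static String get(Jedis jedis, String key) {
        String value = jedis.get(key);            // 1. read the cache first
        if (value == null) {                      // 2. cache miss: fall back to the DB
            value = loadFromDb(key);
            jedis.setex(key, TTL_SECONDS, value); // 3. write the result back to the cache
        }
        return value;
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println(get(jedis, "Hello"));
        }
    }
}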
On modification, the cached entry is invalidated first and the DB is updated afterwards. This ordering avoids the case where the DB update succeeds but the cache delete fails because of the network or some other fault, leaving dirty data behind. Even so, dirty data can still arise in a concurrent scenario. Suppose the business issues many reads and modifications for the data Key:Hello Value:World. Thread A reads Key:Hello from OCS, gets Not Found, and fetches the data from the DB, obtaining Key:Hello Value:World. It is about to write this data to OCS, but before the write happens (the network, CPU scheduling, and so on can slow thread A down), another thread B starts modifying the data to Key:Hello Value:OCS. B first invalidates the cache (B does not know whether the entry exists, so it issues the invalidation unconditionally), and OCS handles the invalidation successfully. Thread A then resumes, writes Key:Hello Value:World into the cache, and finishes its task; thread B also successfully updates the DB to Key:Hello Value:OCS. The cache is now stale.

To solve this problem, OCS extends the Memcached protocol (support on the public cloud was said to be coming soon) with a deleteAndIncVersion interface. This interface does not actually delete the data: it tags the data as expired and increments its version number; if the data does not exist, it writes a NULL and generates a random version number. OCS writes support an atomic version comparison: the write is allowed only if the version number passed in matches the version number stored by OCS, or if the data did not previously exist; otherwise the modification is refused.
Back to the earlier scenario: thread A reads Key:Hello from OCS, gets Not Found, fetches Key:Hello Value:World from the DB, and prepares to write it to OCS with the default version number 1. Before A writes, thread B starts modifying the data to Key:Hello Value:OCS and first deletes the cache entry; OCS handles the deleteAndIncVersion request and generates a random version number, say 12345 (by convention, greater than 1000). Thread A then resumes and asks OCS to write Key:Hello Value:World, but the cache system sees that the version numbers do not match (1 != 12345), so the write fails and thread A's task ends. Thread B also successfully updates the DB to Key:Hello Value:OCS.
At this point the data in OCS is Key:Hello Value:NULL Version:12345 while the data in the DB is Key:Hello Value:OCS, and a subsequent read will attempt to load the DB data into OCS again.
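deleteAndIncVersion and the versioned write are OCS-specific extensions, so there is no public API to show. Purely as an illustration, the same compare-version write can be sketched on Redis with a Lua script; the hash layout (value plus version fields) and the script below are assumptions, not the OCS protocol.

import redis.clients.jedis.Jedis;

public class VersionedWrite {
    // Write value+version only if the stored version matches what the writer
    // read earlier, or the key does not exist yet (compare-and-set style).
    private static final String CAS_SCRIPT =
        "local cur = redis.call('HGET', KEYS[1], 'version') " +
        "if cur == false or cur == ARGV[2] then " +
        "  redis.call('HSET', KEYS[1], 'value', ARGV[1], 'version', ARGV[3]) " +
        "  return 1 " +
        "end " +
        "return 0";

    static boolean writeIfVersionMatches(Jedis jedis, String key, String value,
                                         String expectedVersion, String nextVersion) {
        // eval runs the script atomically inside Redis, so the version check
        // and the write cannot interleave with another client's invalidation.
        Object r = jedis.eval(CAS_SCRIPT, 1, key, value, expectedVersion, nextVersion);
        return Long.valueOf(1L).equals(r);
    }
}

In the scenario above, thread A's attempt would amount to writeIfVersionMatches(jedis, "Hello", "World", "1", "2"), which fails once B's invalidation has bumped the stored version to 12345.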
(2) Consistency between cache write synchronization and the DB
As a website grows in scale and its reliability requirements rise, it moves to a multi-IDC deployment in which each IDC runs its own independent DB and cache system. At that point, cache consistency across IDCs becomes a prominent problem.
First, for efficiency the cache system avoids disk I/O entirely, so it does not even write a BINLOG; and for performance it can only delete synchronously and write asynchronously. Cache synchronization therefore generally completes before DB synchronization (the cache system is, after all, far faster), which produces a window in which the cache is empty while the DB still holds old data. If a business request arrives in that window, it gets a cache Not Found, reads the old data from the DB, and loads it into the cache. When the DB synchronization later arrives, only the DB is updated, and the dirty data in the cache is never cleared.
As the above shows, the root cause of the inconsistency is that the two heterogeneous systems do not synchronize cooperatively: nothing guarantees that the DB data is synchronized first and the cached data afterwards. So we have to consider how the cache system can wait for DB synchronization, or whether the two can share a single synchronization mechanism. Driving cache synchronization from the DB BINLOG is a viable option.
The DB in IDC1 is synchronized to the DB in IDC2 through the BINLOG; the data modifications in IDC2's DB in turn generate their own BINLOG, and cache synchronization can be driven from that IDC2-DB BINLOG. A cache-synchronization module parses the BINLOG and invalidates the corresponding cache keys, turning the parallel synchronization into a serial one, which guarantees ordering.
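A minimal sketch of such a cache-synchronization module, assuming the BINLOG has already been parsed into row-change events (in practice by a tool such as Alibaba's Canal or mysql-binlog-connector-java) and assuming a table:primaryKey cache-key scheme:

import redis.clients.jedis.Jedis;
import java.util.List;

public class BinlogCacheInvalidator {
    // Hypothetical parsed-BINLOG event; a real deployment would receive these
    // from a parser such as Canal or mysql-binlog-connector-java.
    record RowChange(String table, String primaryKey) {}

    private final Jedis jedis;

    BinlogCacheInvalidator(Jedis jedis) { this.jedis = jedis; }

    // Events are applied one at a time in BINLOG order, so cache invalidation
    // is serialized behind the DB synchronization instead of racing it.
    void apply(List<RowChange> events) {
        for (RowChange e : events) {
            String cacheKey = e.table() + ":" + e.primaryKey(); // assumed key scheme
            jedis.del(cacheKey); // invalidate only; the next read reloads from the DB
        }
    }
}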
(3) Cache penetration (the DB endures unnecessary query traffic)
Method 1: a Bloom filter. A Bloom filter is a space-efficient probabilistic data structure used to determine whether an element is in a set (similar in spirit to a HashSet). Its core is a long bit vector plus a series of hash functions, and Google's Guava library ships a ready-made implementation (see the sketch below). Caveats: 1) there is a false-positive rate, which grows as more elements are inserted; 2) in general, elements cannot be deleted from a Bloom filter; 3) sizing the bit array and choosing the number of hash functions is non-trivial. Typical uses: 1) spam address filtering (very large numbers of addresses); 2) crawler URL deduplication; 3) defending against cache penetration.
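A short example with Guava's BloomFilter; the expected-insertion count and false-positive rate are illustrative and must be sized from real traffic:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomFilterDemo {
    public static void main(String[] args) {
        // 1M expected keys with a 1% false-positive rate (both figures assumed).
        BloomFilter<String> keys = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

        keys.put("Hello"); // register every key that actually exists in the DB

        // Reject requests for keys that definitely do not exist before they
        // ever reach the cache or the DB.
        System.out.println(keys.mightContain("Hello"));     // true
        System.out.println(keys.mightContain("NoSuchKey")); // false (or a rare false positive)
    }
}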
Method 2: cache the empty result too, and give the empty result a short expiration time (see the sketch below).
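A sketch of this approach with Jedis; the empty-string sentinel and the 60-second TTL for empty results are assumed values:

import redis.clients.jedis.Jedis;

public class NullCaching {
    private static final String NULL_MARKER = ""; // sentinel meaning "no such row"
    private static final int NULL_TTL = 60;       // short TTL for cached misses (assumed)
    private static final int VALUE_TTL = 300;

    static String loadFromDb(String key) { return null; } // hypothetical DB loader

    static String get(Jedis jedis, String key) {
        String cached = jedis.get(key);
        if (cached != null) {
            return NULL_MARKER.equals(cached) ? null : cached; // hit, possibly a cached miss
        }
        String value = loadFromDb(key);
        if (value == null) {
            jedis.setex(key, NULL_TTL, NULL_MARKER); // cache the empty result briefly
            return null;
        }
        jedis.setex(key, VALUE_TTL, value);
        return value;
    }
}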
(4) Cache avalanche (a DB flood peak caused by many keys sharing the same expiration time)
Method 1: most system designers consider locking or queueing to guarantee single-threaded (single-process) cache writes, so that when the cache fails, the flood of concurrent requests does not land on the underlying storage system.
Method 2: add a random offset to each expiration time so that keys do not all expire at once (see the sketch below).
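For example (the 10% jitter ratio is an assumed tuning knob):

import java.util.concurrent.ThreadLocalRandom;

public class JitteredTtl {
    // Spread expirations so keys written together do not all expire together.
    static int ttlWithJitter(int baseSeconds) {
        int jitter = ThreadLocalRandom.current().nextInt(baseSeconds / 10 + 1); // up to +10%
        return baseSeconds + jitter;
    }
    // Usage: jedis.setex(key, ttlWithJitter(300), value);
}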
(5) Cache breakdown (a hotspot key whose expiry causes a small avalanche of concurrent read requests)
When a hotspot key expires at some point in time, there are many concurrent requests for that key. Each of them sees the expired cache, loads the data from the back-end DB, and sets it back into the cache; this burst of concurrent loads can instantly overwhelm the back-end DB.
Method 1: use a mutex supported by the cache (a mutex key). Before loading, try to set the mutex key; only the request whose set succeeds performs the load-from-DB and set-back-to-cache operations, so the DB load is handled by a single thread (see the sketch below).
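A sketch of the mutex-key pattern with Jedis, using the atomic SET NX EX command; the 3-second mutex expiry, 300-second cache TTL, and 50 ms retry interval are assumed values:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class MutexKeyLoader {
    static String loadFromDb(String key) { return "World"; } // hypothetical DB loader

    static String get(Jedis jedis, String key) throws InterruptedException {
        String value = jedis.get(key);
        while (value == null) {
            // SET NX EX is atomic: only one caller wins the mutex, and the
            // 3-second expiry releases it even if that caller crashes.
            String ok = jedis.set("mutex:" + key, "1", SetParams.setParams().nx().ex(3));
            if ("OK".equals(ok)) {
                value = loadFromDb(key);      // load the DB in a single thread
                jedis.setex(key, 300, value); // set the cache back
                jedis.del("mutex:" + key);    // release the mutex
            } else {
                Thread.sleep(50);             // losers back off briefly,
                value = jedis.get(key);       // then re-read the cache
            }
        }
        return value;
    }
}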
Method 2: use the mutex key in advance: embed a timeout value (timeout1) inside the cached value itself, with timeout1 smaller than the actual memcache timeout (timeout2). When a read finds that timeout1 has passed, it immediately extends timeout1 and writes the value back to the cache, then loads fresh data from the database and sets that into the cache. The cost is more intrusion into the business code and higher coding complexity.
Method 3: "never expire". From the Redis point of view, no expiration time is set at all, which guarantees that a hot key can never physically expire. But if a key never expires, does the data not become static? Functionally, we store the expiration time inside the value associated with the key; when a read finds that this deadline is approaching, a background asynchronous thread rebuilds the cache. This is "logical" expiration (see the sketch below).
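A sketch of logical expiration, assuming an expireAtMillis|payload value layout (an invented convention) and a JedisPool, since single Jedis instances are not thread-safe:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LogicalExpire {
    private static final ExecutorService REBUILDER = Executors.newSingleThreadExecutor();
    private final JedisPool pool; // pooled connections for the background thread

    LogicalExpire(JedisPool pool) { this.pool = pool; }

    static String loadFromDb(String key) { return "World"; } // hypothetical DB loader

    // Value layout "expireAtMillis|payload" is an assumed convention; the key
    // itself carries no Redis TTL, so it never physically expires.
    String get(String key) {
        String raw;
        try (Jedis jedis = pool.getResource()) { raw = jedis.get(key); }
        if (raw == null) return null; // hot keys are pre-loaded, so this is rare
        int sep = raw.indexOf('|');
        long expireAt = Long.parseLong(raw.substring(0, sep));
        String payload = raw.substring(sep + 1);
        if (System.currentTimeMillis() > expireAt) {
            // Logically expired: serve the stale value, rebuild in the background.
            REBUILDER.submit(() -> {
                String fresh = loadFromDb(key);
                long next = System.currentTimeMillis() + 300_000; // assumed 5-min window
                try (Jedis jedis = pool.getResource()) {
                    jedis.set(key, next + "|" + fresh);
                }
            });
        }
        return payload;
    }
}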
(6) Common cache-system problems: a full cache, and data loss
These are handled according to the specific business: an LRU eviction strategy usually deals with overflow, and Redis's RDB and AOF persistence strategies ensure data safety within the limits of their guarantees. An illustrative configuration follows.
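For illustration, the relevant redis.conf knobs might look like this (all values are placeholders to be tuned per deployment):

maxmemory 2gb                  # cap memory so eviction can kick in
maxmemory-policy allkeys-lru   # evict least-recently-used keys on overflow
save 900 1                     # RDB: snapshot after 900s if at least 1 change
appendonly yes                 # enable AOF persistence
appendfsync everysec           # fsync the AOF once per second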
That concludes this analysis of Redis cache consistency, cache penetration, cache breakdown, and cache avalanche. Thank you for reading!