Today, the editor will share the relevant knowledge points on how to solve hot key caching problems in Redis: cache penetration, cache breakdown, and cache avalanche. The content is detailed and the logic is clear. Since many readers are still unfamiliar with this topic, this article is shared for your reference; I hope you gain something from reading it.
Comparison of the three
Cache penetration, cache breakdown, and cache avalanche all stem from the same situation: the requested data is not in the cache, so requests fall through to the database.
When that happens at scale, every request hits the database, placing excessive pressure on it; in the worst case the service collapses and the whole system becomes unusable.
Cache penetration
Definition: cache penetration occurs when the data requested by the client exists neither in the cache nor in the database, so every request for it triggers a pointless database query. The defining feature is that the data itself does not exist.
For example, a client requests product details with a product ID that exists in neither the cache nor the database. Every request for that ID therefore goes to the database.
Harm: because no data corresponds to the request parameters, every request reaches the database, increasing its load or even crashing the service, which can in turn affect other business modules. This often happens with malicious requests.
Solutions:
1. Cache a null value for the requested key and give it a short expiration time (see the sketch after this list).
2. Use a Bloom filter: check the filter first; if the key may exist, query the database and add the result to the cache; if the filter says the key does not exist, return "not found" to the client directly.
3. Since cache penetration is often driven by malicious requests, record client IPs and block the malicious ones.
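As a rough illustration of scheme 1, here is a minimal Python sketch using the redis-py client. `query_db` is a hypothetical helper standing in for the real database lookup, and the sentinel value and TTLs are arbitrary choices:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

NULL_PLACEHOLDER = b"__NULL__"  # sentinel marking "known missing" keys
NULL_TTL = 60                   # short, so data added later becomes visible
DATA_TTL = 3600

def get_product(product_id, query_db):
    """query_db is a hypothetical callable returning the record or None."""
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        # Either real data or the cached "does not exist" marker.
        return None if cached == NULL_PLACEHOLDER else cached
    record = query_db(product_id)
    if record is None:
        # Cache the miss briefly to absorb repeated requests for bad IDs.
        r.set(key, NULL_PLACEHOLDER, ex=NULL_TTL)
        return None
    r.set(key, record, ex=DATA_TTL)
    return record
```

The short TTL on the placeholder matches the caveat in the analysis below: if the record is later created in the backend, the stale null entry disappears quickly on its own.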
Scenario analysis:
In the first scheme, a null value is cached for every nonexistent key requested. If there are many such requests and we set a null-value cache for each one, Redis fills up with a large number of useless null entries. And if such a key is, say, a product or article ID, then once the corresponding data is added in the backend, we must update the cached value for that ID and set a reasonable expiration time.
The second scheme is the one most widely used in industry. A Bloom filter can be built on top of Redis: it operates in memory, and its underlying bitmap representation is very memory-efficient. When a record is successfully added in the backend, its ID is added to the Bloom filter, and incoming requests check the filter for existence first. The Bloom filter does have a drawback: hash collisions. When different IDs hash to the same bit positions, the filter can misjudge, reporting that an ID exists when it actually does not (a false positive). In short: when a Bloom filter says an element exists, it may not; when it says an element does not exist, it definitely does not.
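Core Redis has no built-in Bloom filter command (the RedisBloom module provides one), but the idea can be sketched over a plain Redis bitmap. This is an illustrative toy, with the key name, bitmap size, and hash count chosen arbitrarily:

```python
import hashlib
import redis

r = redis.Redis()

BLOOM_KEY = "bloom:product_ids"  # hypothetical key name
BLOOM_BITS = 2 ** 24             # ~16M bits; size it to your data volume
NUM_HASHES = 5

def _bit_offsets(item: str):
    # Derive NUM_HASHES bit positions from salted SHA-256 digests.
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % BLOOM_BITS

def bloom_add(item: str):
    # Called when the backend successfully inserts a new record.
    for off in _bit_offsets(item):
        r.setbit(BLOOM_KEY, off, 1)

def bloom_might_contain(item: str) -> bool:
    # False means definitely absent; True means only "probably present",
    # because colliding hashes can set the same bits (false positives).
    return all(r.getbit(BLOOM_KEY, off) for off in _bit_offsets(item))
```

A production system would normally use the RedisBloom module or a library implementation rather than hand-rolling the hashing.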
In the third scheme, when a single client issues a large number of requests over a short period, repeatedly triggering cache penetration, we can restrict that client's access. However, this cannot fully stop attackers who launch DDoS-style attacks from many sources, so it is not a complete solution on its own.
Scheme summary:
First, at the request level, apply the third scheme: add a rate-limiting mechanism and an IP blacklist to control malicious requests, and unblock an IP if it was blacklisted by mistake. At the cache layer, apply the first scheme and set a reasonable cache time.
For business scenarios that can tolerate misjudgment, the second scheme can be used directly. It is built entirely on Redis, which keeps system complexity down.
Cache breakdown
Definition: cache breakdown occurs when a hot key is missing from the cache, so requests for it go to the database, increasing its load; the pressure may be momentary or sustained. The defining feature is that the data exists in the database but not in the cache.
For example: users view a popular item by requesting its details with the item's ID. If the cached entry has just expired, every request must query the database.
Harm: unlike cache penetration, the data does exist in the database; because the cache expired, requests must go to the database once, after which the cache is repopulated and subsequent requests hit it normally. The harm is therefore the load spike at the database level.
Solutions:
1. Add a mutex (see the sketch after this list). The first request finds no data in the cache, queries the database, and writes the result to the cache; concurrent and subsequent requests then read from the cache instead of the database.
2. Move expiration into business logic. When writing the cache, store a logical expiration time alongside the value. On every read, check it: if the logical expiry is within some window of the current time, trigger a background thread to pull fresh data from the database and update both the cached value and its logical expiry. In effect, the cache lifetime is extended at the code level.
3. Preheat the data: load it into the cache from the backend ahead of time. For example, before a flash sale starts, item inventory is loaded into the cache so that user requests go straight to the cache.
4. Never expire. Make the cached entries permanent and run a dedicated background thread to maintain their refresh schedule and data updates.
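A minimal sketch of the mutex scheme, assuming redis-py and a hypothetical `rebuild` callable that performs the database query; the lock TTL and retry interval are arbitrary:

```python
import time
import uuid
import redis

r = redis.Redis()

LOCK_TTL = 10  # seconds; bounds how long a crashed holder can block others

def get_with_mutex(key, rebuild, ttl=300):
    """rebuild is a hypothetical callable that loads the value from the DB."""
    while True:
        val = r.get(key)
        if val is not None:
            return val
        token = str(uuid.uuid4())
        # SET NX EX: exactly one caller wins the rebuild per lock window.
        if r.set(f"lock:{key}", token, nx=True, ex=LOCK_TTL):
            try:
                val = rebuild()
                r.set(key, val, ex=ttl)
                return val
            finally:
                # Release only a lock we still own (a Lua script would make
                # this check-and-delete atomic; the sketch keeps it simple).
                if r.get(f"lock:{key}") == token.encode():
                    r.delete(f"lock:{key}")
        # Lost the race: wait briefly, then re-check the cache.
        time.sleep(0.05)
```

This is the simple-but-imperfect form of distributed lock that the analysis below alludes to; hardening it (atomic release, lock renewal, fencing) is where the real difficulty lies.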
Scenario analysis:
A mutex guarantees that only one request reaches the database, which is its advantage. But in a distributed system a distributed lock is required, and implementing a distributed lock correctly is itself difficult, so it adds complexity.
The second scheme keeps the Redis key alive forever and expires it only in business logic. It guarantees that every request gets data, while a background thread refreshes it. The drawback is that until the background thread finishes updating, requests receive stale data, which may be unacceptable in business scenarios with strict real-time requirements.
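A sketch of this logical-expiration pattern, again hedged: the JSON envelope and `rebuild` callable are illustrative choices, and a real implementation would guard `_refresh` with the mutex above so concurrent readers don't spawn duplicate refresh threads:

```python
import json
import threading
import time
import redis

r = redis.Redis()

def get_with_logical_expiry(key, rebuild, logical_ttl=300):
    """The Redis key itself never expires; expiry lives inside the payload."""
    raw = r.get(key)
    if raw is None:
        return None  # assumes the key was preheated (see scheme 3)
    entry = json.loads(raw)
    if entry["expire_at"] < time.time():
        # Stale: return the old value immediately, refresh off-thread.
        threading.Thread(
            target=_refresh, args=(key, rebuild, logical_ttl), daemon=True
        ).start()
    return entry["value"]

def _refresh(key, rebuild, logical_ttl):
    entry = {"value": rebuild(), "expire_at": time.time() + logical_ttl}
    r.set(key, json.dumps(entry))  # still no Redis-level TTL
```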
The third scheme uses cache preheating to load the data in advance, similar in spirit to the second. It still has the problem of refreshing hot data, so it suits data with low real-time requirements.
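Preheating itself is just a bulk load run before traffic arrives, for example from a deploy hook or cron job. A sketch, where `load_hot_inventory` is a hypothetical callable returning `{item_id: stock}`:

```python
import redis

r = redis.Redis()

def preheat_inventory(load_hot_inventory):
    """Run before the flash sale opens, so the first requests hit the cache."""
    for item_id, stock in load_hot_inventory().items():
        # TTL chosen comfortably longer than the sale window (assumption).
        r.set(f"stock:{item_id}", stock, ex=6 * 3600)
```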
The fourth scheme is a refinement of the second and third: a background asynchronous thread proactively updates the cached data. The difficulty lies in controlling the update frequency.
Scheme summary:
For data with high real-time requirements, the first scheme is recommended; although it is technically harder, it serves fresh data. If a request waits too long for the lock, you can return an error and ask the client to retry.
For data with low real-time requirements, the fourth scheme can be used.
Cache avalanche
Definition: the cache breakdown discussed above is the failure of a single hot key in the cache, sending a flood of requests to the database. A cache avalanche is the same phenomenon at larger scale, and more serious: a large portion of the cached keys expire at once, not just one or two.
For example: in an e-commerce system, the cached product data for an entire category expires at the same moment, while most current requests are for products in that category. All of those requests then fall through to database queries.
Harm: a large number of requests flood in at once and each must query the database. The sudden surge of database traffic seriously increases its load and can easily bring the database down.
Solutions:
1. Randomize cache times (see the sketch after this list). A mass expiry at one moment means the expiration times were too concentrated; adding random jitter scatters them, so expirations are no longer clustered and there is no burst of simultaneous database queries.
2. Multi-level cache. Instead of relying solely on Redis, also cache in another service such as memcached (just as an example; other caching services work too). Write each entry to both Redis and memcached; if Redis fails, fall back to memcached.
3. Mutex. The mutex described under cache breakdown can also be used in the avalanche case.
4. Set an expiration flag. This reuses the "never expire" idea from cache breakdown: on each request, check the logical expiration time; if it is approaching, set an expiration flag and trigger a separate thread to refresh the cache.
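Jittering TTLs is a one-line change at every cache write. A sketch, with the base TTL and jitter window as arbitrary example values:

```python
import random
import redis

r = redis.Redis()

BASE_TTL = 3600  # nominal cache time (seconds)
JITTER = 600     # spread expirations across a 10-minute window

def set_with_jitter(key, value):
    # Keys written in the same batch now expire at scattered moments
    # instead of all at once.
    r.set(key, value, ex=BASE_TTL + random.randint(0, JITTER))
```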
Scenario analysis:
The first scheme, randomized cache times, spreads out key expirations. The difficulty is choosing the times: for data that needs short cache times and exists in very large volume, the jitter window must be tuned carefully.
The second scheme, multi-level caching, ensures every request can be served from some cache. But it complicates the system architecture and introduces new problems, such as keeping multiple cache tiers consistent when updating.
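The read path of a two-tier cache can be sketched as follows. For self-containment this uses an in-process dict as the fallback tier; the article's proposal would substitute a memcached client (e.g. via a library such as pymemcache) in its place:

```python
import redis

r = redis.Redis()
fallback_tier = {}  # stand-in for memcached, kept in-process for the sketch

def get_two_tier(key):
    # Tier 1: Redis.
    try:
        val = r.get(key)
        if val is not None:
            fallback_tier[key] = val  # keep the fallback tier warm
            return val
    except redis.exceptions.ConnectionError:
        pass  # Redis is down; fall through to tier 2
    # Tier 2: the fallback cache.
    return fallback_tier.get(key)
```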
The third scheme uses mutexes, as discussed under cache breakdown; it works for avalanches too, but it produces a large number of distributed locks.
The fourth scheme uses logical cache times to keep the pressure on the cache and database under control.
Scheme summary:
In practice, a combination of schemes 1, 2, and 4 is recommended.
These are all the contents of the article on how to solve hot key caching problems in Redis. Thank you for reading; I hope you gained something from this article.