2025-04-01 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
In this issue, the editor looks at how to ensure that Redis caches only hot data. The article analyzes the topic from a professional point of view; I hope you get something out of it.
1 Cache avalanche
A cache avalanche occurs when many requests miss the cache at once, for example because a large number of keys expire at the same time. For that scenario, give each key's expiration time a random offset within a range, such as a random value between one day and one day plus one hour. Note that a ttl applies to a whole key, so this does not help with individual fields of collection types such as hash.
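The jitter idea can be sketched in a few lines of Python; the redis-py call is shown only as a comment, and the constants are illustrative:

```python
import random

BASE_TTL = 24 * 3600      # one day, in seconds
MAX_JITTER = 3600         # up to one extra hour

def ttl_with_jitter(base=BASE_TTL, jitter=MAX_JITTER):
    """Return a randomized ttl so keys written together do not all expire together."""
    return base + random.randint(0, jitter)

# With redis-py this would be used as: r.setex(key, ttl_with_jitter(), value)
```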
There are also rarer scenarios, such as a Redis cluster crash under peak traffic, or a Cluster restart or migration with no persistence configured. When the Redis cluster fails, first fall back to an in-process cache such as Ehcache, apply rate limiting and degradation as the situation demands, and then restart the cluster as quickly as possible. Always configure a persistence policy, and scale the cluster according to traffic.
2 Cache penetration
Cache penetration means the requested record does not exist in the database at all, so it can never hit the cache. For example, the table only contains records with ids from 1000 to 100000, but a request queries id 10000000. This is usually a malicious attack, and the best defense is to validate the id against its valid range. In other cases, cache a null value for the queried key and set a ttl expiration time on it.
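Both defenses can be combined, as in this minimal sketch; a plain dict stands in for Redis, and the id range and ttl values are taken from the example above (key names are illustrative):

```python
import time

VALID_IDS = range(1000, 100001)   # ids 1000..100000 exist, per the example
NULL_TTL = 60                     # short ttl for a cached "not found"
DATA_TTL = 3600
cache = {}                        # {key: (value, expires_at)} stand-in for Redis

def get_record(record_id, db_lookup):
    """Check the id range first, then fall back to a cached null for real misses."""
    if record_id not in VALID_IDS:           # reject obviously invalid ids up front
        return None
    key = f"record:{record_id}"
    hit = cache.get(key)
    if hit is not None and hit[1] > time.time():
        return hit[0]                        # may be None: a cached "not found"
    value = db_lookup(record_id)             # at most one DB hit per ttl window
    ttl = NULL_TTL if value is None else DATA_TTL
    cache[key] = (value, time.time() + ttl)
    return value
```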
3 How to ensure that Redis caches only hot data
A. Set a ttl expiration time for each key
This suits business scenarios with low real-time requirements that can tolerate some expired data. Refresh the expiration time each time the key is read or written. To keep junk data out of the cache, a ttl is generally set on every key, except for data that never changes, is always in use, and is never updated, such as the IP library mentioned in the author's previous article.
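The sliding-expiry pattern looks roughly like this; an in-process dict stands in for Redis, and with redis-py the refresh would be a single `r.expire(key, TTL)` after each read:

```python
import time

TTL = 3600
store = {}   # {key: (value, expires_at)} — in-process stand-in for Redis

def put(key, value):
    store[key] = (value, time.time() + TTL)

def get_and_refresh(key):
    """Read a key and slide its expiry forward so hot keys stay cached."""
    entry = store.get(key)
    now = time.time()
    if entry is None or entry[1] <= now:
        store.pop(key, None)                 # lazily drop expired entries
        return None
    store[key] = (entry[0], now + TTL)       # with redis-py: r.expire(key, TTL)
    return entry[0]
```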
B. Choose a cache eviction policy
Choosing a least-recently-used (LRU) eviction policy keeps the cache full of hot data, but the policy only kicks in when memory is tight, that is, when Redis runs out of memory. It is better to clean up cached data proactively and in a timely manner; performance degrades when you rely solely on the eviction policy.
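In Redis this is configured through `maxmemory` and `maxmemory-policy`; the values below are only examples, to be sized for your instance:

```conf
# redis.conf — example values, adjust maxmemory to your instance
maxmemory 2gb
maxmemory-policy allkeys-lru   # evict the least recently used keys when memory is full
```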
C. Cache the visit count and periodically clear rarely-read records
For example, use a Sorted Set to record the read count of each key and periodically delete keys whose read count falls below a threshold. For collection types such as hash, count the reads of each field instead. The disadvantage is the performance overhead of counting on every request.
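A sketch of that counter, with a dict standing in for the Sorted Set; the real commands would be ZINCRBY on each read and a periodic ZRANGEBYSCORE scan plus deletes:

```python
visits = {}   # member -> score, in-process stand-in for one Redis Sorted Set

def record_read(key):
    # ZINCRBY visits 1 <key> — bump the read counter on every cache hit
    visits[key] = visits.get(key, 0) + 1

def purge_cold(min_reads):
    """ZRANGEBYSCORE + delete: drop keys read fewer than min_reads times."""
    cold = [k for k, n in visits.items() if n < min_reads]
    for k in cold:
        visits.pop(k)    # also delete the cached key itself in Redis
    return cold
```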
4 How to update cached data
A. Use an MQ queue to notify updates when a database record changes
This suits cache records that change rarely, such as user information, and business scenarios where a data modification must reach the cache promptly, such as configuration changes that take effect immediately. It is not suitable for strongly real-time data such as commodity inventory.
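The consumer side of that pattern can be as small as this sketch; the event shape, key names, and the dict standing in for Redis are all illustrative assumptions:

```python
import json

cache = {}   # stand-in for Redis; key names below are illustrative

def on_user_changed(body: bytes):
    """MQ consumer callback: refresh the cached copy when the DB row changes."""
    event = json.loads(body)     # e.g. {"id": 42, "name": "alice"}
    key = f"user:{event['id']}"
    cache[key] = event           # with redis-py: r.setex(key, ttl, json.dumps(event))
```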
B. Update the cache directly when you modify the database record
Both this method and the previous one can be implemented with AOP. The difference is that the former decouples multiple services and is used to update caches across services, while this one only works for data updates within a single service. For cost reasons, small companies rarely run a separate Redis cluster per service. Even if multiple microservices share the same Redis cluster, do not share cached data through shared keys; the coupling is too high and problems arise easily.
C. Batch updates via scheduled tasks
Use this together with a ttl set slightly longer than the task cycle, so data does not expire before the next run finishes. It suits business scenarios with modest real-time requirements where a large amount of data changes in a short period. For example, the database holds 100,000 rows, 70-80% of which change every 15 minutes, and each change window lasts only a minute.
For collection types such as hash, combine this with RENAME: write the fresh data to a temporary key and atomically swap it in only after all of it has been written successfully. With large volumes, write in batches through a pipeline instead of one huge hmset. When using collection types such as hash, also be sure to consider dirty (stale) data left behind.
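The write-then-rename step can be sketched as follows, with a dict of hashes standing in for Redis; with redis-py the loop would be a pipeline of `hset` calls followed by `r.rename(tmp, name)`:

```python
def rebuild_hash(store, name, fresh_items):
    """Write a full refresh to a temp key, then swap it in atomically.

    `store` stands in for Redis as {key: {field: value}}.
    """
    tmp = name + ":rebuilding"
    store[tmp] = {}
    for field, value in fresh_items.items():   # batch via pipeline in real Redis
        store[tmp][field] = value
    store[name] = store.pop(tmp)               # RENAME is atomic: readers never
                                               # see a half-written hash

store = {"offers": {"old": "stale"}}
rebuild_hash(store, "offers", {"a": 1, "b": 2})
```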
5 How to handle request skew
Cluster slot allocation can leave cached data unevenly distributed, which in turn skews requests. Suppose a Cluster of three master-replica pairs distributes its slots evenly, but a large number of keys land on the second node; requests then concentrate on that node. The main cause is a large number of hash, set, and sorted set keys, each holding a large amount of data; the second is unreasonable use of hash tags.
The solution is to segment the large hashes, reduce the use of hash tags, and, depending on the actual situation, redistribute some of the second node's slots to the other two nodes.
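Segmenting a large hash usually means routing each field to one of N smaller hashes, as in this sketch; the shard count and key names are illustrative:

```python
import zlib

SHARDS = 16   # number of sub-hashes; pick based on data volume

def shard_key(base, field):
    """Route a field of one huge hash to base:0 .. base:15 so the data
    (and the requests for it) spread across cluster slots."""
    return f"{base}:{zlib.crc32(field.encode()) % SHARDS}"

# hset(shard_key("ad:stats", field), field, value) instead of one giant hash
```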
6 How to choose cache data structures in real business scenarios
Take the advertising industry, which I know best, as the source of a few simple examples.
A. Judging whether an ad order has expired
This can be implemented with either a hash or a bitmap. A bitmap suits purely true/false checks; it reads and writes faster than a hash and uses less memory. For this need I chose hash, because other requirements were involved; I use bitmaps for requirements that are purely boolean, such as quickly determining whether an offer's daily display count has reached its limit.
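The bitmap check can be illustrated with a tiny pure-Python stand-in for SETBIT/GETBIT; the key name in the comment and the use of the offer id as the bit offset are assumptions:

```python
class Bitmap:
    """Tiny stand-in for Redis SETBIT/GETBIT; the offset is the numeric offer id."""
    def __init__(self):
        self.buf = bytearray()

    def setbit(self, offset, value):
        byte, bit = divmod(offset, 8)
        if byte >= len(self.buf):
            self.buf.extend(b"\x00" * (byte - len(self.buf) + 1))
        if value:
            self.buf[byte] |= 1 << (7 - bit)
        else:
            self.buf[byte] &= ~(1 << (7 - bit))

    def getbit(self, offset):
        byte, bit = divmod(offset, 8)
        return (self.buf[byte] >> (7 - bit)) & 1 if byte < len(self.buf) else 0

capped = Bitmap()        # 1 = this offer hit its daily display cap
capped.setbit(1234, 1)   # redis-py: r.setbit("offer:capped:<date>", 1234, 1)
```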
B. Counting the number of ads pulled from each channel
Both plain key-value strings and hashes support atomic increments (incr/hincrby). I chose hash to reduce the number of keys in the cache, and also because hash supports hgetall, which is handy for real-time statistics and makes troubleshooting easier.
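As a sketch, with a dict standing in for one per-day hash (the key name in the comments is an assumption); in real Redis the increment is a single atomic HINCRBY:

```python
pulls = {}   # stand-in for one Redis hash, e.g. "ads:pulls:<date>"

def incr_channel(channel, n=1):
    # HINCRBY ads:pulls:<date> <channel> n — atomic on the Redis server
    pulls[channel] = pulls.get(channel, 0) + n
    return pulls[channel]

def all_counts():
    # HGETALL — one round trip for the whole day's per-channel stats
    return dict(pulls)
```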
C. Limiting CAP by tag
CAP here means capacity: for example, limiting the number of times an ad can be displayed per tag, such as country, city, channel, or advertiser. An ad may match several tags at once, and the check becomes true as soon as the smallest capacity is reached. Store all the tags an ad matches in a Sorted Set scored by capacity, then use zcount with the current display count to see whether any tag's capacity has been reached, that is, whether the zcount result is greater than zero.
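A minimal sketch of that check, with a dict standing in for the per-ad Sorted Set; tag names and caps are illustrative:

```python
tag_caps = {}   # tag -> display cap; stand-in for one Sorted Set per ad

def add_tag(tag, cap):
    tag_caps[tag] = cap   # ZADD ad:<id>:tags <cap> <tag>

def cap_reached(shown):
    """ZCOUNT -inf <shown>: true as soon as any matched tag's cap is hit."""
    return sum(1 for cap in tag_caps.values() if cap <= shown) > 0
```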
D. Filtering duplicate daily IPs
The same approach can filter users who click on ads repeatedly within a short period; this is just an example. Use HyperLogLog to store the IPs. HyperLogLog deduplicates with a small error in accuracy, which has little impact on the business.
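A sketch of the check, with an exact Python set standing in for the HyperLogLog (the real structure trades roughly 0.81% error for about 12 KB of memory); the key name is an assumption:

```python
seen_today = set()   # exact stand-in; a real HyperLogLog is approximate but tiny

def is_repeat_ip(ip):
    """PFADD returns 0 when the element was (probably) already added."""
    if ip in seen_today:
        return True
    seen_today.add(ip)   # redis-py: r.pfadd("ips:<date>", ip)
    return False
```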
The above is how to ensure that Redis caches only hot data. If you have similar doubts, the analysis above may help resolve them. To learn more, you are welcome to follow the industry information channel.