Shulou (Shulou.com), SLTechnology News & Howtos, Development. Updated 2025-01-22.
This article introduces cache penetration, cache breakdown, and cache avalanche: what each problem is, why it happens, and the common ways to deal with it. If you have been unsure about the differences between these three cache failure modes, the explanations and examples below should help.
Cache penetration
What is cache penetration? It occurs when a user queries data that exists in neither the cache nor the database. Since the cache can never be populated with a real value, every such request falls through to the database, and repeated requests put heavy pressure on database access.
For example, suppose a user queries a product with id = -1. Database ids typically auto-increment from 1, so this record obviously does not exist. Because nothing real is ever cached, every such query goes straight to the database, putting great access pressure on it.
So how do we solve this problem? o(╥﹏╥)o
In general, the natural starting point is the cache itself: if the database has no record for a query, we can still cache something, namely an empty object, and return that to the user.
Yes, that is one solution, commonly called caching empty objects (the code is easy to maintain, but the effect is limited).
Another solution, often used in front of Redis, is the Bloom filter (the code is more complex to maintain, but it works well).
"Next, Erha will walk through these two solutions:"
Caching empty objects
Caching an empty object works like this: a request comes in, and the data it asks for exists in neither the cache nor the database. The database returns nothing, so we store an empty object in the cache under that request's key. The next time the same request arrives, it hits the cache and gets the empty object back directly. This reduces the load on the database and protects its access performance.
At this point you may ask: if a large number of requests for nonexistent data come in, won't the cache end up storing a lot of empty objects?
Exactly! That is the drawback of caching empty objects: over time the cache accumulates a large number of empty objects, which not only occupies a lot of memory but also wastes resources. Is there a fix? Let's think: could we clean these objects up after a while?
Yes! Redis provides commands for setting an expiration time (^▽^), so when we cache an empty object we can give it a TTL at the same time, which solves the problem:
SETEX key seconds value: set the key-value pair and specify an expiration time (in seconds)
You can call the API operation directly in Java:
RedisCache.put(Integer.toString(id), null, 60); // expiration time is 60 s
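To make the idea concrete, here is a minimal, self-contained sketch of null-caching with a TTL. The `RedisCache` class above is from the article's own (unshown) codebase; this sketch instead uses a hypothetical in-memory map that mimics SETEX semantics, plus a `NULL_MARKER` sentinel so a cached "this row does not exist" can be told apart from a plain cache miss:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for Redis SETEX semantics: entries expire
// after a given number of milliseconds. Illustrates caching a "null" marker
// for missing rows so repeated misses stop hitting the database.
public class NullCachingDemo {
    static final Object NULL_MARKER = new Object(); // sentinel: "row does not exist"

    static class TtlCache {
        private final Map<String, Object> values = new HashMap<>();
        private final Map<String, Long> expiresAt = new HashMap<>();

        void setex(String key, long ttlMillis, Object value) {
            values.put(key, value);
            expiresAt.put(key, System.currentTimeMillis() + ttlMillis);
        }

        Object get(String key) {
            Long deadline = expiresAt.get(key);
            if (deadline == null || System.currentTimeMillis() > deadline) {
                values.remove(key);
                expiresAt.remove(key);
                return null; // miss (absent or expired)
            }
            return values.get(key);
        }
    }

    static int dbQueries = 0; // counts how often we really hit the "database"

    // Pretend database: only ids 1..100 exist.
    static Object queryDb(int id) {
        dbQueries++;
        return (id >= 1 && id <= 100) ? ("product-" + id) : null;
    }

    static Object getProduct(TtlCache cache, int id) {
        Object cached = cache.get("product:" + id);
        if (cached != null) {
            return cached == NULL_MARKER ? null : cached;
        }
        Object fromDb = queryDb(id);
        // Cache the empty result too, with a short TTL (60 s, as in the article).
        cache.setex("product:" + id, 60_000, fromDb == null ? NULL_MARKER : fromDb);
        return fromDb;
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache();
        getProduct(cache, -1); // first miss goes to the DB
        getProduct(cache, -1); // second miss is served by the cached null marker
        System.out.println("db queries for id=-1: " + dbQueries); // prints 1
    }
}
```

The key point is that only the first query for id = -1 reaches the database; until the 60-second TTL elapses, every repeat is absorbed by the cached marker.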
Bloom filter
A Bloom filter is a probabilistic data structure used to determine whether an element is in a set. You can loosely think of it as an imprecise set structure (a set deduplicates its elements). There is one catch: when you use its contains method to check whether an element exists, it can misjudge. That does not mean a Bloom filter is hopelessly inaccurate; as long as its parameters are set properly, its precision can be controlled, leaving only a small, acceptable probability of misjudgment. When a Bloom filter says a value exists, the value may not actually exist; when it says a value does not exist, it certainly does not.
"Here is a classic analogy from Qian Da:"
When it says it does not know you, it certainly does not; when it says it has seen you before, it may in fact never have met you. It misjudges because your face resembles a combination of features of faces it does know. In a content-recommendation scenario, a Bloom filter can accurately filter out content the user has already seen; among the unseen content, it will also wrongly filter out a very small portion (misjudgments), but it correctly identifies the vast majority of new content. That is enough to guarantee the content recommended to the user is not repetitive.
"Having said all that, what are the characteristics of a Bloom filter?"
It consists of a very large binary array (containing only 0s and 1s) plus several hash functions, and it is very efficient in both space and query time. Standard Bloom filters do not provide a delete operation, which makes the code harder to maintain.
In Redis terms, a Bloom filter corresponds to a large bit array plus several different unbiased hash functions. "Unbiased" means the hash function spreads the hash values of elements out evenly.
When a key is added to the Bloom filter, each hash function hashes the key to an integer, which is taken modulo the length of the bit array to get a position; each hash function computes a different position. All of these positions in the bit array are then set to 1, completing the add operation. (In short: each key is mapped through several hash functions onto a large bit array, and each mapped position is flipped to 1.)
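The add and contains steps just described can be sketched in a few dozen lines of plain Java. This is an illustrative toy, not a production filter: the bit-array size and hash count are arbitrary, and the k "different" hash functions are derived from one base hash (a common double-hashing trick) rather than being truly independent:

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: a large bit array plus k hash functions.
// add() sets k positions to 1; mightContain() checks whether all k are 1.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits in the array
    private final int k; // number of hash functions

    public SimpleBloomFilter(int numBits, int numHashes) {
        this.m = numBits;
        this.k = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Derive k hash positions from two base hashes (double-hashing trick).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16); // second hash: rotate the first
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(String key) {
        for (int i = 0; i < k; i++) {
            bits.set(position(key, i));
        }
    }

    // May return a false positive; never a false negative.
    public boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) {
                return false; // a mapped bit is still 0, so definitely absent
            }
        }
        return true; // all mapped bits are 1: probably present
    }

    public static void main(String[] args) {
        SimpleBloomFilter bf = new SimpleBloomFilter(1 << 20, 5);
        bf.add("product:42");
        System.out.println(bf.mightContain("product:42"));  // true (no false negatives)
        System.out.println(bf.mightContain("product:-1"));  // almost certainly false
    }
}
```

Used in front of the cache, a request for a key the filter reports as absent can be rejected immediately, without touching Redis or the database.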
"So why does a Bloom filter have a misjudgment (false-positive) rate?"
Picture the bit array (the original post illustrated this with a figure): key1 and key2 have set their mapped positions to 1. Now suppose we query a key3 that happens to map onto exactly those same positions. The Bloom filter will conclude that key3 exists, even though it was never added, and that is a misjudgment.
o(∩_∩)o At this point you may ask: how do we improve the accuracy of a Bloom filter?
"To improve the accuracy of a Bloom filter, we need to look at the three important factors that affect it:"
(1) the quality of the hash functions; (2) the size of the storage space (the bit array); (3) the number of hash functions.
The design of the hash functions matters a great deal: a good hash function can greatly reduce the Bloom filter's misjudgment rate. (Much like a well-made machine runs smoothly because its internal parts are properly designed.)
Likewise, the larger the bit array, the more sparsely the positions mapped by the hash functions are spread out, which helps accuracy. Mapping each key through more hash functions marks more positions in the bit array, which also lowers the false-positive rate when a query passes through the filter, but only up to a point: too many hash functions fill the array with 1s and the misjudgment rate starts rising again, so there is an optimal number.
If you are interested in the internals, take a look at the mathematics behind Bloom filters, including the design of the algorithm and its analysis. (It is actually quite simple ~)
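As a pointer for those readers, the standard analysis (stated here from general Bloom-filter theory, not from the original post) gives the approximate false-positive rate for an m-bit array, k hash functions, and n inserted keys, along with the k that minimizes it:

```latex
% Approximate false-positive probability after inserting n keys
p \approx \left(1 - e^{-kn/m}\right)^{k}

% The number of hash functions that minimizes p for given m and n
k_{\mathrm{opt}} = \frac{m}{n}\ln 2
\quad\Rightarrow\quad
p_{\min} \approx \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}} \approx (0.6185)^{m/n}
```

This also quantifies the three factors above: a bigger array (larger m/n) drives the error down exponentially, and increasing k helps only until k_opt, after which the error rises again.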
Cache breakdown
Cache breakdown concerns a single key. Usually it is a hot key, one that is queried constantly and doted on by users (^▽^), like a "regular customer"; occasionally it is a key that is rarely accessed. If a hot key expires from the cache at exactly the wrong moment, or an unpopular key suddenly receives a flood of requests, the large number of concurrent requests punches straight through the cache to the database, and the pressure on the database spikes instantly.
"To sum up, cache breakdown has two causes:"
(1) An "unpopular" key is suddenly requested by a large number of users.
(2) A "hot" key happens to expire in the cache at the very moment a large number of users come to access it.
The common solution to cache breakdown is locking. When the key has expired, acquire a lock before querying the database, so that only the first request actually queries the database and writes the result back into the cache; all other requests for the same key then read it directly from the cache.
In a single-machine environment we can use ordinary locks (such as Lock or synchronized); in a distributed environment we can use distributed locks, for example ones based on the database, Redis, or ZooKeeper.
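Here is a single-JVM sketch of that "lock on rebuild" idea, using a per-key monitor and double-checked reads (the cache and database here are stand-in maps and methods, and a distributed setup would swap the monitor for a Redis- or ZooKeeper-based lock):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// When a hot key is missing, only the first thread queries the "database" and
// refills the cache; concurrent threads for the same key wait on a per-key
// lock and then read the refreshed value.
public class BreakdownGuard {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, Object> keyLocks = new ConcurrentHashMap<>();
    public volatile int dbHits = 0; // how many times we actually hit the DB

    // Pretend database call (slow and expensive in real life).
    private String loadFromDb(String key) {
        dbHits++;
        return "value-of-" + key;
    }

    public String get(String key) {
        String v = cache.get(key);
        if (v != null) return v; // fast path: cache hit

        Object lock = keyLocks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Double-check: another thread may have refilled while we waited.
            v = cache.get(key);
            if (v == null) {
                v = loadFromDb(key);
                cache.put(key, v);
            }
        }
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        BreakdownGuard guard = new BreakdownGuard();
        Thread[] threads = new Thread[50];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> guard.get("hot-key"));
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // All 50 concurrent readers share a single database load.
        System.out.println("database hits for 50 concurrent reads: " + guard.dbHits);
    }
}
```

Locking per key (rather than one global lock) matters: a stampede on one hot key should not block reads of unrelated keys.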
Cache avalanche
Cache avalanche means that a large set of cache entries expires within the same short window. If there are many requests during that window and the volume of data queried is huge, all the requests reach the storage layer, its call volume soars, the database comes under excessive pressure, and it may even go down.
"Causes:"
(1) Redis suddenly goes down, so most cached data becomes unavailable at once. (2) A large batch of keys was given the same expiration time and they all expire together.
"An example to make it concrete:"
Most of us have experienced a shopping carnival. Suppose a merchant runs a deep-discount promotion from 23:00 to 24:00. When building the system, the developer loads the discounted products into the cache at 23:00 and, via Redis's EXPIRE, sets their TTL to one hour. During that hour many users view and buy these products. But right at 24:00 all the cached entries expire together while plenty of users are still browsing, so every request for those products falls on the database. The database suddenly has to withstand enormous pressure, and one misstep can take it down entirely.
(The original post showed two diagrams here: while the entries are still valid, requests are served from the cache; once the cache entries expire, all requests fall through to the database.)
"There are the following solutions for cache avalanche:"
"(1) Redis high availability"
Since Redis can go down, deploy several more Redis instances (one master with multiple replicas, or multiple masters with multiple replicas), so that when one server dies the others keep working. In essence, build a cluster.
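As a rough illustration of the one-master-multiple-replicas idea (the directive names are standard Redis configuration, but the IPs and ports are placeholders, and a real deployment needs far more than this), a replica plus a Sentinel for automatic failover might be configured like:

```conf
# replica's redis.conf — follow the master at 10.0.0.1:6379
# (Redis 5+ uses "replicaof"; older versions call it "slaveof")
replicaof 10.0.0.1 6379

# sentinel.conf — watch the master; 2 sentinels must agree before failover
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
```

With this in place, if the master dies the sentinels promote a replica, so the cache layer as a whole stays up instead of dumping all traffic onto the database.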
"(2) Rate limiting and degradation"
After a cache entry expires, control the number of threads that read the database and write the cache by locking or queueing. For a given key, allow only one thread to query the data and write the cache while the other threads wait.
"(3) Data preheating"
Data preheating means accessing likely-hot data in advance, before the system formally goes live, so that data expected to be accessed heavily is already loaded into the cache. Manually trigger the loading of the various cache keys just before the large burst of concurrent access is expected.
"(4) Staggered expiration times"
Set different expiration times for different keys, so that cache expirations are spread out as evenly as possible over time.
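A common way to do this is to add random jitter to a base TTL, so the promotion keys from the example above would expire gradually over ten minutes instead of all at the stroke of 24:00. A minimal sketch (the base TTL and jitter window are illustrative choices, and the result would be passed to something like SETEX):

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread out expirations to avoid an avalanche: instead of giving every key
// the same fixed TTL (e.g. exactly 1 hour), add a random per-key jitter so
// the keys expire gradually rather than all at once.
public class TtlJitter {
    static final long BASE_TTL_SECONDS = 3600;  // 1 hour base TTL
    static final long MAX_JITTER_SECONDS = 600; // up to 10 extra minutes

    // TTL to pass to e.g. SETEX: base plus a random offset in [0, 600].
    static long jitteredTtlSeconds() {
        return BASE_TTL_SECONDS
                + ThreadLocalRandom.current().nextLong(MAX_JITTER_SECONDS + 1);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("key" + i + " ttl=" + jitteredTtlSeconds() + "s");
        }
    }
}
```

Every key still lives roughly an hour, but no two batches of keys share an exact expiry instant, so the storage layer sees a trickle of refills instead of a stampede.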
This concludes our look at cache penetration, cache breakdown, and cache avalanche. Hopefully it has resolved your doubts; pairing the theory with practice is the best way to make it stick, so go give it a try!