2025-01-18 Update From: SLTechnology News&Howtos
This article explains the causes of cache penetration and the common solutions to it, and also covers the related problems of cache avalanche and hot-key cache rebuilding.
There are three goals when using a cache:
First, speed up user access and improve the user experience.
Second, reduce the back-end load, reduce potential risks, and keep the system stable.
Third, ensure that data is updated in a timely manner "as much as possible".
Causes of cache penetration
Cache penetration means querying data that does not exist at all: neither the cache layer nor the storage layer hits, and because the storage layer finds nothing, the empty result is never written back to the cache layer. The request flow is:
1. The cache layer misses.
2. The storage layer misses, so the empty result is not written back to the cache.
3. An empty result is returned.
Cache penetration causes non-existent data to be queried at the storage layer on every request, which defeats the purpose of the cache as protection for back-end storage.
Cache penetration may increase the load on back-end storage; many back-end stores are not built for high concurrency and may even crash under it. It is useful to count, in the application, the total number of calls, cache-layer hits, and storage-layer hits separately: a large number of storage-layer hits suggests a cache penetration problem.
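The counting described above can be sketched as follows. This is a minimal in-process illustration, not a production implementation; the `cache` and `storage` dictionaries stand in for the real cache and storage layers, and all names are hypothetical.

```python
import threading

# Hypothetical in-memory stand-ins for the cache layer and storage layer.
cache = {}
storage = {"user:1": "alice"}

# Counters for total calls, cache-layer hits, and storage-layer hits.
counters = {"total": 0, "cache_hit": 0, "storage_hit": 0}
_lock = threading.Lock()

def get(key):
    with _lock:
        counters["total"] += 1
    if key in cache:
        with _lock:
            counters["cache_hit"] += 1
        return cache[key]
    value = storage.get(key)  # falls through to the storage layer
    if value is not None:
        with _lock:
            counters["storage_hit"] += 1
        cache[key] = value
    return value
```

If `storage_hit` grows large relative to `cache_hit` in such counters, many requests are reaching the storage layer, which is the signature of cache penetration.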
Cache penetration basically has two causes:
1. Problems in the business's own code or data.
2. Malicious attacks, crawlers, and the like generating a large number of empty hits.
Solutions to cache penetration
1) Cache empty objects
When the storage layer misses, an empty object is still written to the cache layer; subsequent accesses to this data are then served from the cache, protecting the back-end data source.
There are two problems with caching empty objects:
Empty values take up cache space: more keys are stored in the cache layer, which requires more memory (and the problem is worse under attack). A more effective approach is to set a short expiration time on this kind of data so that it is removed automatically.
The cache layer and the storage layer will be inconsistent for a period of time, which may affect the business. For example, with an expiration time of 5 minutes, if the data is then added to the storage layer, the cache and storage layers are inconsistent during that window. A message system or some other mechanism can be used to clear the empty object from the cache layer in that case.
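The empty-object scheme with a short expiration time can be sketched like this. The in-memory `cache` dictionary and the TTL values are illustrative assumptions; in practice the TTLs would map to something like Redis `SET key value EX seconds`.

```python
import time

cache = {}          # key -> (value, expire_at); hypothetical in-memory cache
storage = {"user:1": "alice"}

EMPTY = object()    # sentinel marking a cached "not found"
EMPTY_TTL = 60      # short TTL for empty objects (assumed: 60 seconds)
NORMAL_TTL = 300    # assumed TTL for real values

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, expire_at = entry
        if time.time() < expire_at:
            return None if value is EMPTY else value
        del cache[key]                      # drop the expired entry
    value = storage.get(key)
    if value is None:
        # Storage miss: cache the empty object with a short TTL so that
        # repeated lookups for this key stop hitting the storage layer.
        cache[key] = (EMPTY, time.time() + EMPTY_TTL)
        return None
    cache[key] = (value, time.time() + NORMAL_TTL)
    return value
```

After the first miss for a non-existent key, later lookups are absorbed by the cached empty object until its short TTL elapses.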
2) Intercept with a Bloom filter
Before a request reaches the cache layer and storage layer, store the set of existing keys in a Bloom filter in advance and use it as a first layer of interception. For example, a personalized recommendation system has 400 million user IDs; every hour, algorithm engineers write the personalization derived from each user's historical behavior into the storage layer. Brand-new users, however, have no historical behavior and therefore cause cache penetration. The fix is to build a Bloom filter from the IDs of all users who have personalized recommendation data: if the Bloom filter says a user ID does not exist, the storage layer is not accessed at all, which protects it to some extent.
You can use Redis's Bitmaps to implement a Bloom filter.
This method suits scenarios where the hit rate is low and the data set is relatively fixed (usually a large data set). Code maintenance is more complex, but it uses less cache space.
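A minimal Bloom filter can be sketched as below. Here a Python integer serves as the bit array purely for illustration; the sizing and hash count are arbitrary assumptions, and in production the bits would typically live in Redis Bitmaps (SETBIT/GETBIT) with the same structure.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter backed by a Python int as a bit array (sketch)."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits   # assumed sizing; tune for your key count
        self.k = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive k bit positions by salting one hash function k ways.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent": safe to skip the storage layer.
        # True means "possibly present" (false positives are possible).
        return all(self.bits >> pos & 1 for pos in self._positions(key))
```

The request path checks `might_contain(user_id)` first: a `False` answer short-circuits the lookup, so requests for users with no recommendation data never reach the storage layer.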
Optimization of cache avalanche problem
To prevent and solve the problem of cache avalanche, we can start from the following three aspects.
1) Ensure the high availability of the cache-layer service.
Like an airplane with multiple engines, if the cache layer is designed for high availability, service continues even when individual nodes, individual machines, or even entire data centers go down.
2) Rely on isolation components for back-end rate limiting and degradation.
Both the cache layer and the storage layer can fail, and both can be treated as resources. In a highly concurrent system, if one resource becomes unavailable, all threads may end up blocked on it, making the whole system unavailable. Degradation is perfectly normal in high-concurrency systems: for example, if the personalized recommendation service is unavailable, it can be degraded to serving hot data instead, so the front-end page is not left with a blank area.
In a real project, important resources (such as Redis, MySQL, HBase, and external interfaces) should be isolated so that each resource runs in its own thread pool; even if one resource has problems, other services are unaffected. Managing the thread pools is nontrivial, though: how to close a resource pool, how to reopen it, and how to manage its thresholds. A Java dependency-isolation tool worth recommending is Hystrix (https://github.com/Netflix/Hystrix).
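The per-resource isolation idea can be sketched language-agnostically; the following Python version, using one small thread pool per resource plus a timeout and fallback, is an assumption-laden illustration of the pattern Hystrix implements, not Hystrix's actual API.

```python
import concurrent.futures

# One small pool per resource so a slow or broken resource can only
# exhaust its own threads, never the whole application.
# Resource names and pool sizes here are hypothetical.
pools = {
    "redis": concurrent.futures.ThreadPoolExecutor(max_workers=4),
    "mysql": concurrent.futures.ThreadPoolExecutor(max_workers=4),
}

FALLBACK = "hot-items"   # degraded response, e.g. a precomputed hot list

def call_isolated(resource, fn, timeout=0.5):
    """Run fn in the resource's own pool; degrade on timeout or error."""
    future = pools[resource].submit(fn)
    try:
        return future.result(timeout=timeout)
    except Exception:           # timeout or resource failure
        future.cancel()
        return FALLBACK
```

When the personalized-recommendation call fails or hangs, the caller receives the hot-data fallback instead of blocking, which is exactly the degradation behavior described above.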
3) Rehearse in advance. Before the project goes live, simulate a cache-layer outage, observe the resulting load on the application and back end and the problems that arise, and prepare contingency plans on that basis.
Optimization of cache hotspot key reconstruction
Using the cache-plus-expiration-time strategy, developers can both speed up data reads and writes and ensure that data is refreshed regularly; this model meets most needs. But two problems, if they occur at the same time, can be fatal to the application:
The key is a hot key (for example, a piece of hot entertainment news) with a very high volume of concurrent access.
Rebuilding the cache cannot be done quickly; it may involve complex computation, such as complex SQL, multiple IO operations, multiple dependencies, and so on.
At the moment the cache expires, a large number of threads rush to rebuild the cache, increasing the load on the back end and possibly even crashing the application.
The solution is as follows:
1) Mutex (mutex key)
Only one thread is allowed to rebuild the cache; the other threads wait for the rebuilding thread to finish and then fetch the data from the cache again.
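A sketch of the mutex approach, using a process-local `threading.Lock` for illustration. In a distributed deployment the mutex would be a per-key lock in Redis (e.g. `SET mutex:key v NX EX seconds`); the names, timings, and the global lock here are simplifying assumptions.

```python
import threading
import time

cache = {}                       # key -> value; hypothetical in-memory cache
rebuild_lock = threading.Lock()  # in Redis this would be a per-key mutex

def slow_rebuild(key):
    """Stand-in for an expensive recomputation (complex SQL, multiple IO)."""
    time.sleep(0.05)
    return f"value-for-{key}"

def get_with_mutex(key):
    value = cache.get(key)
    if value is not None:
        return value
    if rebuild_lock.acquire(blocking=False):
        try:
            # Double-check: another thread may have rebuilt it already.
            value = cache.get(key)
            if value is None:
                value = slow_rebuild(key)
                cache[key] = value
            return value
        finally:
            rebuild_lock.release()
    # Another thread is rebuilding: wait briefly, then re-read the cache.
    while key not in cache:
        time.sleep(0.01)
    return cache[key]
```

However many threads arrive at once, `slow_rebuild` runs only once per expiry; the rest spin briefly and then read the rebuilt value from the cache.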
2) Never expire. "Never expire" has two meanings:
At the cache level, no expiration time is actually set, so there are no problems caused by the hot key expiring; the key does not expire "physically".
At the functional level, a logical expiration time is stored with each value; when it is found to have been exceeded, a separate thread is used to rebuild the cache.
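The logical-expiration variant can be sketched as follows. The entry layout, timings, and the `_rebuilding` bookkeeping are assumptions for illustration; the essential point is that the key never physically expires, readers keep getting the stale value, and exactly one background thread refreshes it.

```python
import threading
import time

# Each entry stores the value plus a *logical* expiration timestamp;
# the cache key itself is never physically expired (hypothetical layout).
cache = {"hot:news": {"value": "v1", "logical_expire": time.time() + 0.05}}
_rebuilding = set()
_lock = threading.Lock()

def rebuild(key):
    """Stand-in for the expensive recomputation."""
    time.sleep(0.05)
    cache[key] = {"value": "v2", "logical_expire": time.time() + 300}
    with _lock:
        _rebuilding.discard(key)

def get_logical(key):
    entry = cache[key]
    if time.time() > entry["logical_expire"]:
        # Logically expired: start one background rebuild and keep
        # serving the stale value meanwhile (eventual consistency).
        with _lock:
            if key not in _rebuilding:
                _rebuilding.add(key)
                threading.Thread(target=rebuild, args=(key,)).start()
    return entry["value"]
```

Note that callers are never blocked on a rebuild: they may briefly see stale data, which is exactly the consistency trade-off described in the comparison below.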
Comparison of schemes:
Mutex (mutex key): this solution is relatively simple, but it has hidden dangers: if building the cache fails or takes a long time, there is a risk of deadlock and thread-pool blocking. On the other hand, it reduces back-end storage load well and does better on consistency.
"Never expire": because no real expiration time is set, this scheme avoids the hazards caused by hot-key expiration, but it introduces data inconsistency, and code complexity increases.
That concludes this look at cache penetration and its solutions. Pairing the theory with practice is the best way to learn, so try these approaches out yourself.