
An Introduction to Local Application Caching Algorithms and Caching Strategies


Special statement: this article is compiled from material I found on the Internet and lightly reorganized; I have forgotten exactly where it came from, so please bear with me and don't misunderstand!

Well-designed data segmentation, prefetching, replacement, and update algorithms improve the hit rate and the consistency of cached content.

1) Data segmentation

By default, Memcached uses a mechanism called Slab Allocator to allocate and manage memory. Before this mechanism appeared, memory was allocated by simply calling malloc and free for every record. That approach causes memory fragmentation and burdens the operating system's memory manager; at worst, the memory manager becomes slower than the memcached process itself. Slab Allocator was created to solve this problem.
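A minimal sketch of the idea in Python (the 80-byte minimum chunk and 1.25 growth factor are illustrative defaults, not Memcached's exact internals): chunk sizes grow geometrically, and each item goes into the smallest chunk class that fits, trading a little internal waste for the absence of per-item malloc/free fragmentation.

```python
def build_slab_classes(min_chunk=80, max_chunk=1024 * 1024, factor=1.25):
    """Generate geometrically growing chunk-size classes, slab-allocator style."""
    sizes, size = [], min_chunk
    while size < max_chunk:
        sizes.append(int(size))
        size *= factor
    sizes.append(max_chunk)
    return sizes

def pick_class(sizes, item_size):
    # the smallest chunk size that can hold the item
    for s in sizes:
        if item_size <= s:
            return s
    raise ValueError("item larger than the largest slab class")

classes = build_slab_classes()
print(pick_class(classes, 100))  # a 100-byte item lands in the first class >= 100
```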

2) Prefetch strategy

Most cache algorithms use a prefetching strategy: part of the disk data is placed into the cache in advance to further reduce disk I/O and raise the cache hit rate. By recording and analyzing past request patterns, the algorithm predicts the data segments likely to be requested next and places the segments with the highest access probability into the cache. VoD (video on demand) has a strongly sequential access pattern: absent VCR operations, the parts of a movie file are accessed in order, and data already accessed is rarely accessed again. A VoD caching algorithm must account for this pattern, so sequential prefetching suits VoD systems.
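A sequential-prefetch sketch for this access pattern; the cache and storage objects with their get/put/read methods are hypothetical interfaces, and the fixed read-ahead window is an assumption of the sketch.

```python
def read_segment(cache, storage, movie_id, seg_no, window=4):
    """Serve one segment and sequentially prefetch the next `window`
    segments, matching VoD's mostly front-to-back access pattern."""
    seg = cache.get((movie_id, seg_no))
    if seg is None:
        seg = storage.read(movie_id, seg_no)      # slow disk I/O
        cache.put((movie_id, seg_no), seg)
    for n in range(seg_no + 1, seg_no + 1 + window):
        if cache.get((movie_id, n)) is None:      # read ahead of the player
            cache.put((movie_id, n), storage.read(movie_id, n))
    return seg
```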

3) Replacement strategy

Because data access patterns differ from system to system, a single cache policy rarely performs satisfactorily under all of them. Researchers have therefore proposed different caching strategies for different needs. Caching policies can be divided into the following categories:

Based on access time: these algorithms organize the cache queue by each item's access time and choose the replacement victim accordingly, such as LRU.

Based on access frequency: these algorithms organize the cache by each item's access frequency, such as LFU, LRU-2, 2Q, and LIRS.

Based on both access time and frequency: by taking both factors into account, the cache strategy keeps good performance when the data access pattern changes, such as FBR, LRFU, and ALRFU. Most of these algorithms have a tunable or adaptive parameter that shifts the policy toward some balance of access time and frequency.

Based on access pattern: some applications have clearly characterized data access, and caching policies are built around it. For example, the SARC cache strategy adapts to random and sequential access patterns at the same time.

Caching strategy based on access time: LRU (Least Recently Used) is a widely used caching algorithm. It maintains a queue of cache items sorted by each item's last access time. When the cache is full, the item at the tail of the queue, that is, the one least recently accessed, is deleted, and the new item is placed at the head of the queue.
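As an illustration, a minimal LRU cache in Python, using the standard library's OrderedDict as the time-ordered queue (a sketch, not any particular system's implementation):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # front = LRU end, back = MRU end

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a hit makes the item most recent
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        self.items[key] = value
```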

However, LRU maintains only the access time of each cache block and does not consider factors such as access frequency, so under some access patterns it cannot reach an ideal hit rate. For example, in a VoD system without VCR operations, data is accessed sequentially from front to back, and data already viewed is not accessed again. The most recently accessed data is thus the best candidate for eviction, yet it is exactly what LRU retains the longest, so LRU is unsuitable for VoD systems.

Caching policy based on access frequency:

LFU (Least Frequently Used) sorts cache blocks by access frequency and, when the cache is full, replaces the least frequently accessed item in the cache queue. Its shortcoming mirrors LRU's: LFU maintains only the frequency information of items. If an item had a very high access frequency in the past but is accessed rarely now, it is hard to evict when the cache fills up, which lowers the hit rate.
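A minimal LFU sketch along the same lines; eviction scans for the minimum count in O(n) for clarity, whereas a real implementation would keep frequency-ordered lists:

```python
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data, self.freq = {}, {}

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1          # every hit raises the frequency
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.freq, key=self.freq.get)  # least frequent
            del self.data[victim], self.freq[victim]
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
```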

The LRU-2 [2,3] algorithm records the last two accesses of each cached page and, on replacement, evicts the page whose penultimate access is the oldest.

Under the IRM (Independent Reference Model) access pattern, LRU-2 has the best expected hit rate. Because LRU-2 maintains a priority queue, its per-access cost is O(log N), where N is the number of items in the cache queue.
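A sketch of the LRU-2 idea, with the O(log N) priority queue replaced by a linear scan for brevity; the logical clock and the minus-infinity sentinel for once-accessed keys are implementation choices of this sketch:

```python
import itertools

class LRU2Cache:
    """Evict the key whose second-to-last access is oldest; keys touched
    only once have a penultimate time of -inf and are evicted first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = itertools.count()     # logical time
        self.data = {}
        self.hist = {}                     # key -> (last, penultimate)

    def _touch(self, key):
        last = self.hist.get(key, (float("-inf"),) * 2)[0]
        self.hist[key] = (next(self.clock), last)

    def get(self, key):
        if key not in self.data:
            return None
        self._touch(key)
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.hist[k][1])
            del self.data[victim], self.hist[victim]
        self.data[key] = value
        self._touch(key)
```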

2Q [4] (2 Queues) replaces the priority queue of LRU-2 with LRU queues, reducing the time complexity per access from O(log N) to a constant, and it needs less space to maintain cache item information than LRU-2.
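A simplified 2Q sketch (the full algorithm in [4] also keeps an A1out "ghost" queue of evicted keys, omitted here): first-time items enter a FIFO queue A1, and a second access promotes them to an LRU queue Am, so one-shot scans cannot flush the hot set. All operations are constant time:

```python
from collections import OrderedDict

class TwoQCache:
    def __init__(self, capacity, a1_ratio=0.25):
        self.a1_max = max(1, int(capacity * a1_ratio))
        self.am_max = max(1, capacity - self.a1_max)
        self.a1 = OrderedDict()   # FIFO of items seen once
        self.am = OrderedDict()   # LRU of items seen again

    def get(self, key):
        if key in self.am:
            self.am.move_to_end(key)      # plain LRU behavior in Am
            return self.am[key]
        if key in self.a1:
            value = self.a1.pop(key)      # second access: promote to Am
            self._insert_am(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.am:
            self.am[key] = value
            self.am.move_to_end(key)
        elif key in self.a1:
            self.a1.pop(key)              # a write-hit counts as a touch
            self._insert_am(key, value)
        else:
            if len(self.a1) >= self.a1_max:
                self.a1.popitem(last=False)   # FIFO eviction, O(1)
            self.a1[key] = value

    def _insert_am(self, key, value):
        if len(self.am) >= self.am_max:
            self.am.popitem(last=False)       # LRU eviction, O(1)
        self.am[key] = value
```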

LIRS [5] (Low Inter-Reference Recency Set) maintains a variable-length LRU queue whose LRU end holds the Llirs items that have recently been accessed at least twice (Llirs is an algorithm parameter). LIRS achieves a high hit rate under the IRM access pattern, but its efficiency drops significantly under the SDD access pattern.

For VoD systems, frequency-based strategies can capture hot movie clips, sparing the slow disk I/O from the bulk of the requests for those clips. However, movie hotspots shift over time, and such policies cannot exploit VoD's sequential access pattern, so replacing purely by access frequency is also unsuitable for VoD systems.

Caching strategies based on both access time and frequency:

FBR [6] (Frequency Based Replacement) maintains an LRU queue divided into three segments: New, Middle, and Old, and keeps a count for each cache item in the queue. On a hit, the item is moved to the MRU end of the New segment; if it was originally in the Old or Middle segment its count is incremented, and if it was already in the New segment the count is unchanged. On replacement, the item with the lowest count in the Old segment (the LRU side) is deleted.
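A sketch of the FBR bookkeeping, with the New and Old segments taken as fixed fractions of the queue (assumed parameters of this sketch) and the Middle segment as everything in between; the linear scans are for clarity, since real implementations track segment boundaries incrementally:

```python
from collections import OrderedDict

class FBRCache:
    def __init__(self, capacity, new_frac=0.25, old_frac=0.25):
        self.capacity = capacity
        self.new_size = max(1, int(capacity * new_frac))
        self.old_size = max(1, int(capacity * old_frac))
        self.items = OrderedDict()   # front = LRU end, back = MRU end
        self.count = {}

    def _in_new(self, key):
        # the newest new_size positions form the New segment
        keys = list(self.items)
        return keys.index(key) >= len(keys) - self.new_size

    def get(self, key):
        if key not in self.items:
            return None
        if not self._in_new(key):
            self.count[key] += 1     # counts only grow outside New
        self.items.move_to_end(key)  # hit -> MRU end of New
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            old = list(self.items)[: self.old_size]   # LRU side = Old
            victim = min(old, key=self.count.get)     # lowest count in Old
            del self.items[victim], self.count[victim]
        self.items[key] = value
        self.items.move_to_end(key)
        self.count.setdefault(key, 1)
```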

LRFU [7] (Least Recently/Frequently Used) maintains a weight C(x) for each cache item, initialized to 0 and updated at each time step t as follows:

C(x) = 1 + 2^(-λ) · C(x), if x is accessed at time t
C(x) = 2^(-λ) · C(x), otherwise

On replacement, the item with the lowest C(x) value is deleted. When λ = 0, C(x) degenerates to a pure access count and LRFU behaves like the LFU algorithm; as λ approaches 1, recent accesses dominate the weight and the algorithm approximates LRU. By choosing an appropriate λ value, the algorithm strikes a balance between time and frequency.
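The update rule is one line of code; a small sketch of the weight update and victim selection:

```python
def crf_update(c, accessed, lam):
    """One time step of the LRFU weight: decay the old value by 2^(-lam)
    and add 1 if the item was accessed this step. lam = 0 makes c a pure
    access count (LFU-like); lam near 1 makes recency dominate (LRU-like)."""
    decay = 2.0 ** (-lam)
    return decay * c + (1.0 if accessed else 0.0)

# the item with the lowest weight is the replacement victim:
weights = {"a": 3.2, "b": 0.7, "c": 1.5}
victim = min(weights, key=weights.get)   # -> "b"
```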

Although LRFU uses the λ value to account for both access time and frequency, λ is fixed; when the access pattern changes, the algorithm cannot adjust, and performance degrades. ALRFU [8] (Adaptive LRFU) improves LRFU in this respect: by monitoring the history of the data access pattern, it adjusts λ dynamically to follow changes in the pattern, showing better adaptability than LRFU.

Caching strategy based on access mode:

Based on the characteristics of VoD systems, reference [9] proposes the AbeliL algorithm. The algorithm estimates each cache item's future access frequency from its recorded access history (access times and counts), uses that frequency as the metric, and replaces the item with the lowest value. Because it takes VoD data access characteristics into account, it is widely used in VoD systems.

However, the algorithm derives cache weights directly from the total number and frequency of accesses since a cache segment was created, ignoring that the access hotspots of VoD movies shift over time. A segment with a high historical access frequency but a low recent one keeps a large weight, which crowds out the caching of new hotspots; the algorithm therefore cannot adapt to dynamically changing movie hotspots.

SARC [10], proposed by IBM, is a caching algorithm for large servers that adapts dynamically to random and sequential data access patterns. SARC handles the two patterns by separating random and sequential accesses into two queues managed independently. By analyzing simulated cache size versus hit rate curves, it derives a cost function for replacing items from the random queue and from the sequential queue; at replacement time, the queue with the lower cost is evicted from.

4) Update policy

Update strategies are divided into active updates and passive updates:

Passive update

Set the expiration time of the key so that it expires automatically.

This is the usual cache-expiration setup: for example, both redis and memcache provide APIs such as expire to set an expiration time on a K-V pair.

Generally speaking, a business can tolerate inconsistency between cached data and the real data source (such as mysql or hbase) for some period of time (say, an hour), since the point of caching is to improve access speed and reduce backend load. We can therefore set an expiration time on a piece of data; after it expires, we fetch the data from the real data source, put it back in the cache, and set a new expiration time.
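A cache-aside sketch of passive update using redis-py (the key layout, the one-hour TTL, and the load_user_from_db stub are illustrative):

```python
import json
import redis

r = redis.Redis()  # connection details are illustrative

def load_user_from_db(user_id):
    # stand-in for the real data source (e.g. mysql or hbase)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl=3600):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # served from cache
    user = load_user_from_db(user_id)      # cache miss: hit the backend
    r.set(key, json.dumps(user), ex=ttl)   # passive update: expire after ttl
    return user
```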

Active update

When updating the DB, update the cache at the same time. In practice, most businesses combine active and passive updates.

This suits businesses that require high data consistency, where the cached data must be refreshed immediately after the real data is updated.

A concrete approach: use a messaging system or another mechanism (such as database triggers, or a listener on the data source) to notify the cache to update, as sketched below.
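A sketch of the active-update path, reusing the Redis client from the previous example; invalidating the key instead of rewriting it in place is a common variant, and save_user_to_db stands in for the real DB write. In a messaging setup, the cache step would run in the consumer of the DB-change event.

```python
def save_user_to_db(user_id, fields):
    pass  # stand-in for the real DB write

def update_user(user_id, fields):
    save_user_to_db(user_id, fields)   # 1. update the real data source
    r.delete(f"user:{user_id}")        # 2. invalidate the cached copy so
                                       #    the next read repopulates it
```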
