2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
(The first law of website performance optimization: prioritize the use of caching.)
The basic principle of caching: caching means storing data in a storage medium with relatively fast access so that the system can retrieve and process it quickly.
The essence of caching: a cache is an in-memory Hash table. In website applications, cached data is stored in an in-memory Hash table as key-value pairs (Key, Value).
By computing the Hash table index from the HashCode of a KV pair's key, the data in the Hash table can be accessed quickly. (Many languages can obtain the HashCode of any object; a HashCode can be loosely understood as an identifier of the object, though in general it is not guaranteed to be unique.)
In Java, the hashCode method is defined on the root class Object and returns an int. The Hash table index is then computed from the HashCode; the simplest approach is the remainder method: divide the HashCode by the length of the Hash table array, and the remainder is the Hash table index.
Using this index, you can directly access the KV pairs stored in the Hash table, as shown in the following figure (values are for reference only).
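The remainder method described above can be sketched as follows. This is an illustrative fragment, not code from any particular Hash table implementation; the key and table length are made up:

```java
// Minimal sketch of the remainder method: map an object's HashCode
// to an index into a Hash table array of a given length.
public class HashIndex {
    // Mask off the sign bit first: in Java, hashCode() may be negative,
    // and a negative remainder would be an invalid array index.
    public static int indexFor(Object key, int tableLength) {
        return (key.hashCode() & 0x7fffffff) % tableLength;
    }

    public static void main(String[] args) {
        int idx = indexFor("productCategory", 16);
        System.out.println("index = " + idx); // always in [0, 16)
    }
}
```

Note the sign-bit mask: Java's `%` operator returns a negative remainder for a negative HashCode, so a naive `hashCode() % length` would occasionally produce an invalid index.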
The basic concept of caching: store objects that a program or system frequently uses in memory so that they can be retrieved quickly when needed, without creating new duplicate instances each time. This reduces system overhead and improves efficiency.
The role of caching: to store data with a high read-to-write ratio that rarely changes, such as product category information, popular search terms, and hot-selling products. When the application reads data, it checks the cache first; only if the data is missing or has expired does it query the database, writing the result back to the cache.
Website data access usually follows the 80/20 rule: 80% of accesses fall on 20% of the data. Caching that 20% of the data exploits the fast access of Hash tables and memory, improving system performance, speeding up data reads, and reducing pressure on the storage layer.
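The read path just described (check the cache first, fall back to the database on a miss, then write back) is commonly called the cache-aside pattern. A minimal sketch, with a lambda standing in for the real database query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: read the cache first; on a miss, query the
// "database" and populate the cache so later reads hit it.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database; // stand-in for a real DB query

    public CacheAside(Function<String, String> database) {
        this.database = database;
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {               // cache miss: go to the database
            value = database.apply(key);
            if (value != null) {
                cache.put(key, value);     // write back for subsequent reads
            }
        }
        return value;
    }

    public static void main(String[] args) {
        CacheAside c = new CacheAside(k -> "row-for-" + k);
        System.out.println(c.get("hotItem")); // first read goes to the database
        System.out.println(c.get("hotItem")); // second read is served from cache
    }
}
```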
Problems you should pay attention to when using caching:
1. Frequently modified data should not be cached. As a rule of thumb, caching pays off only when the read-to-write ratio is at least 2:1; frequently modified data may expire before the application even reads it, which only adds load to the system.
2. Data without hot-spot access. Memory is precious and limited, so not all data can be cached; only recently accessed and popular data should be kept, while historical data is evicted from the cache.
3. Data inconsistency and dirty reads. Cached data is usually given an expiration time; once it expires, the data is reloaded from the database, so the system must tolerate a window of inconsistency.
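Point 3 can be illustrated with a toy cache whose entries carry an expiration timestamp; an expired entry is treated as a miss and would be reloaded from the database. The class and field names are invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy TTL cache: each value carries an absolute expiration time;
// reading an expired entry evicts it and reports a miss.
public class TtlCache {
    private static class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long ttlMillis) {
            this.value = value;
            this.expiresAt = System.currentTimeMillis() + ttlMillis;
        }
    }

    private final Map<String, Entry> map = new ConcurrentHashMap<>();

    public void put(String key, String value, long ttlMillis) {
        map.put(key, new Entry(value, ttlMillis));
    }

    public String get(String key) {
        Entry e = map.get(key);
        if (e == null || System.currentTimeMillis() > e.expiresAt) {
            map.remove(key); // expired: evict, caller reloads from the database
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) throws InterruptedException {
        TtlCache cache = new TtlCache();
        cache.put("k", "v", 50);            // 50 ms time-to-live
        System.out.println(cache.get("k")); // v
        Thread.sleep(100);
        System.out.println(cache.get("k")); // null: expired, reload from DB
    }
}
```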
Cache availability:
Caching exists to improve data access performance, so losing cached data or having the cache unavailable should not, in principle, affect the application: the data can still be fetched directly from the database. In practice, however, as the business grows the cache comes to absorb most of the data access load, and the database is no longer accustomed to operating without it. Once the cache becomes unavailable (for example, the cache service crashes), the database may go down because it cannot bear the sudden pressure, taking the entire website with it. This is called a cache avalanche.
In practice, cache availability can be improved with cache hot standby: when a cache server goes down, cache access is switched to the standby server, though this runs counter to the cost-saving intent of caching. A further approach is a distributed cache server cluster, which spreads the cached data across multiple servers and thereby improves cache availability to some extent.
Cache prefetch (Warm Up):
Hot-spot data is kept in the cache, and the cache system identifies it over time through the LRU (Least Recently Used) algorithm, which takes a while. A freshly started cache system contains no data, so while the cache is being rebuilt, both system performance and database load suffer.
It is therefore best to load the hot-spot data when the cache system starts; this preloading is called cache warm-up. For metadata such as city lists and catalog information, all of the data can be loaded from the database into the cache at startup.
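A warm-up step of this kind might look like the following sketch, where a hard-coded list stands in for a real metadata query against the database:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Warm-up sketch: at startup, preload small, stable metadata
// (e.g. a city list) into the cache before serving traffic.
public class CacheWarmUp {
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for a metadata query; a real system would hit the database here.
    static List<String> loadCityListFromDb() {
        return List.of("Beijing", "Shanghai", "Guangzhou");
    }

    static void warmUp() {
        for (String city : loadCityListFromDb()) {
            cache.put("city:" + city, city); // preload so first requests hit the cache
        }
    }

    public static void main(String[] args) {
        warmUp();
        System.out.println(cache.size() + " entries preloaded");
    }
}
```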
Cache penetration:
If, due to flawed business logic or malicious attacks, non-existent data is requested continuously and concurrently, none of those requests can be served from the cache, so they all fall on the database. This puts great pressure on the database and can even crash it. A simple countermeasure is to cache the non-existent keys as well (with a value of null).
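The null-caching countermeasure can be sketched as follows; the sentinel value and the query counter are illustrative, not part of any standard API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-penetration countermeasure: cache a sentinel for keys that do not
// exist in the database, so repeated lookups never reach the database again.
public class NullCaching {
    private static final String NULL_SENTINEL = "__NULL__"; // illustrative marker

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int dbQueries = 0; // counts database hits, for illustration

    private String queryDb(String key) {
        dbQueries++;
        return null; // this key does not exist in the database
    }

    public String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return NULL_SENTINEL.equals(cached) ? null : cached;
        }
        String value = queryDb(key);
        cache.put(key, value == null ? NULL_SENTINEL : value);
        return value;
    }

    public int dbQueryCount() { return dbQueries; }

    public static void main(String[] args) {
        NullCaching c = new NullCaching();
        c.get("missing"); // first lookup hits the database
        c.get("missing"); // served from cache; database untouched
        System.out.println("db queries: " + c.dbQueryCount()); // 1
    }
}
```

In production, such null entries are usually given a short expiration time so that keys which later come into existence are eventually picked up.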
Caches fall into two main categories:
1. File cache: as the name implies, data is stored on disk, whether as XML, a serialized DAT file, or some other file format.
2. Memory cache: a static Map maintained in a class, with operations to add, remove, and query entries in the Map.
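A memory cache of this kind is just a static Map with wrapper methods; using ConcurrentHashMap keeps it thread-safe. A minimal sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal in-process memory cache: a static Map wrapped with
// add/remove/query operations, as described above.
public class MemoryCache {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    public static void put(String key, Object value) { CACHE.put(key, value); }
    public static Object get(String key)             { return CACHE.get(key); }
    public static void remove(String key)            { CACHE.remove(key); }

    public static void main(String[] args) {
        put("greeting", "hello");
        System.out.println(get("greeting")); // hello
        remove("greeting");
        System.out.println(get("greeting")); // null
    }
}
```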
Distributed cache architecture:
Distributed caching means that caching is deployed in a cluster composed of multiple servers and provides caching services in a cluster manner.
It comes in two architectural forms:
1. Distributed caches represented by JBoss Cache, which require update synchronization.
2. Distributed caches represented by Memcached, whose nodes do not communicate with each other.
In JBoss Cache's model, every server in the cluster holds the same cached data. When cached data is updated on one server, that server notifies the other machines in the cluster to update or invalidate their copies. JBoss Cache is usually deployed on the same servers as the application, so the application can read cached data quickly from the local machine. The drawbacks are that the amount of cached data is limited by the memory of a single server, and that in a large cluster, synchronizing update notifications to every machine is expensive. This approach is therefore common in enterprise application systems but rarely used on large websites.
Large websites generally need to cache huge amounts of data, possibly terabytes of memory, and for that the other kind of distributed cache, represented by Memcached, is needed.
Memcached uses centralized cache cluster management, also described as a share-nothing architecture in which nodes do not communicate with each other. The cache is deployed separately from the application, on a group of dedicated servers; the application selects a cache server through a routing algorithm such as consistent Hashing and accesses the cached data remotely. Because the cache servers do not communicate with one another, the cluster can easily be scaled out, giving good scalability.
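Consistent-Hash routing of the kind Memcached clients use can be sketched with a sorted map acting as the hash ring. This simplified version omits the virtual nodes real clients add for balance, and the server names are made up:

```java
import java.util.List;
import java.util.TreeMap;

// Consistent-Hash routing sketch: servers are placed on a ring (a TreeMap
// keyed by hash); a key routes to the first server clockwise from its hash.
// Virtual-node replication is omitted for brevity.
public class ConsistentHashRouter {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public ConsistentHashRouter(List<String> servers) {
        for (String server : servers) {
            ring.put(server.hashCode() & 0x7fffffff, server);
        }
    }

    public String route(String key) {
        int h = key.hashCode() & 0x7fffffff;
        Integer node = ring.ceilingKey(h);        // first server at or after the key's hash
        if (node == null) node = ring.firstKey(); // wrap around the ring
        return ring.get(node);
    }

    public static void main(String[] args) {
        ConsistentHashRouter router = new ConsistentHashRouter(
            List.of("cache-a:11211", "cache-b:11211", "cache-c:11211"));
        // the same key always routes to the same server
        System.out.println(router.route("user:42"));
        System.out.println(router.route("user:42"));
    }
}
```

The key property is that adding or removing one server only remaps the keys adjacent to it on the ring, rather than reshuffling nearly all keys as a plain modulo scheme would.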