Cache penetration problem

2025-01-18 Update From: SLTechnology News&Howtos

I. Cache penetration (requested data misses the cache):

Cache penetration means querying data that does not exist at all. The cache misses, and for fault-tolerance reasons, data that cannot be found in the storage layer is not written to the cache. As a result, every query for the nonexistent data goes through to the storage layer, and the cache loses its purpose.

For example, the following figure shows a typical cache + storage architecture: a cache (such as memcache or redis) in front of storage (such as mysql or hbase). A lookup for a value that does not exist at all will query storage every time unless this case is handled.

II. Hazard:

Penetration puts too much pressure on the underlying data sources (mysql, hbase, http interfaces, rpc calls, etc.), some of which cannot handle high concurrency.

For example, a single mysql machine is generally doing well if it sustains about 1000 QPS (do not argue that your queries are all select * from table where id=xx or that your machine is very powerful; that misses the point).

For example, if someone else provides an http interface with poor performance, penetration may bring their service down.

III. How to discover:

We can record cache hits, storage hits, and total calls separately. If there are many empty hits (neither the cache nor storage has the data), we may have a cache penetration problem.

Note: the cache's own hit rate (for example, info in redis provides such a number) describes only the cache itself. It does not represent the hit rate of storage or of the business.
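The three counters described above could be kept as in the following minimal sketch. The class name PenetrationStats and its methods are hypothetical and not part of the original article; the calling code is assumed to report the outcome of each lookup.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counters for spotting cache penetration (illustrative only). */
final class PenetrationStats {
    private final LongAdder totalCalls = new LongAdder();
    private final LongAdder cacheHits = new LongAdder();
    private final LongAdder storageHits = new LongAdder();

    /** Record a lookup that was served from the cache. */
    void recordCacheHit() {
        totalCalls.increment();
        cacheHits.increment();
    }

    /** Record a lookup that missed the cache but was found in storage. */
    void recordStorageHit() {
        totalCalls.increment();
        storageHits.increment();
    }

    /** Record a lookup that found nothing in the cache or in storage. */
    void recordEmptyHit() {
        totalCalls.increment();
    }

    /** Calls that hit neither layer: total minus cache hits minus storage hits. */
    long emptyHits() {
        return totalCalls.sum() - cacheHits.sum() - storageHits.sum();
    }

    /** Fraction of all calls that were empty hits (0.0 when there were no calls). */
    double emptyHitRatio() {
        long total = totalCalls.sum();
        return total == 0 ? 0.0 : (double) emptyHits() / total;
    }
}
```

A persistently high emptyHitRatio() is the signal to investigate. What counts as "high" depends on the business and is not something this sketch decides.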

IV. Causes and whether the business allows it:

There are many possible causes: bugs in the code itself, problems with the data, or malicious attacks, crawlers, and the like (because http read interfaces are open).

Whether the business allows it depends on the specific project. For example, in a non-real-time recommendation system, a new user really has no recommendation data yet (recommendation data is usually computed from historical behavior), so penetration occurs naturally in this kind of business. Whether that is acceptable needs to be analyzed case by case.

V. Solution:

There are roughly two solutions, summarized in the table below and explained separately afterwards.

| Solution to cache penetration | Applicable scenario | Maintenance cost |
|---|---|---|
| Cache empty object | 1. Data hit rate is not high; 2. Frequent data changes, high real-time requirements | 1. Simple code maintenance; 2. Requires extra cache space; 3. Data inconsistency |
| bloomfilter or compressed filter for early interception | 1. Data hit rate is not high; 2. Data is relatively fixed, low real-time requirements | 1. Complex code maintenance; 2. Less cache space |

1. Cache empty object

(1) Definition: As shown in the figure above, after the MISS in step 2, an empty object is still kept in the Cache (for a few minutes or some other period, depending on the specific case). The next Request for the same key then gets its data from the Cache, which protects the backend Storage.

(2) Applicable scenarios: low data hit rate, frequently changing data with high real-time requirements (some random transfer services)

(3) Maintenance cost: the code is relatively simple, but there are two problems:

The first is that caching null values means more key-values are stored in the cache system, so more space is needed (some people say a null value does not take much space, but large numbers of them add up). The mitigation is to set a shorter expiration time.

The second is that the data will be inconsistent for a period of time. If the Cache entry is set to expire in 5 minutes and Storage gains a value for that key in the meantime, the two are inconsistent until the entry expires. The mitigation is to use messages or some other mechanism to clear the entry from the Cache.
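Both mitigations, a short expiration time for empty objects and clearing the entry when the data is written, can be illustrated with the self-contained sketch below. It uses in-memory maps as stand-ins for the real cache and storage. The class name, the TTL values, and the direct put call (standing in for a message-driven invalidation) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in: caches empty results briefly and clears them on writes. */
final class EmptyObjectCacheDemo {
    /** A cached value (null marks an empty object) with its expiry time. */
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Assumed TTLs: empty objects expire much sooner than real values.
    private static final long EMPTY_TTL_MILLIS = 60_000L;
    private static final long VALUE_TTL_MILLIS = 3_600_000L;

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Map<String, String> storage = new ConcurrentHashMap<>();

    /** Counts how often the storage layer is queried (for demonstration). */
    int storageReads = 0;

    /** Read path; the current time is passed in so the behavior is easy to test. */
    String get(String key, long nowMillis) {
        Entry entry = cache.get(key);
        if (entry != null && entry.expiresAtMillis > nowMillis) {
            return entry.value; // hit, possibly a cached empty object (null)
        }
        storageReads++;
        String value = storage.get(key);
        long ttl = (value == null) ? EMPTY_TTL_MILLIS : VALUE_TTL_MILLIS;
        cache.put(key, new Entry(value, nowMillis + ttl));
        return value;
    }

    /** Write path: update storage, then drop the cached entry so a stale empty object cannot linger. */
    void put(String key, String value) {
        storage.put(key, value);
        cache.remove(key);
    }
}
```

In this sketch, a second get for a missing key within the TTL does not touch storage, and a put makes the new value visible on the next read instead of after the empty object expires.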

(4) Pseudocode:

Java code:

package com.carlosfu.service;

import org.apache.commons.lang.StringUtils;

import com.carlosfu.cache.Cache;
import com.carlosfu.storage.Storage;

/**
 * A service
 *
 * @author carlosfu
 * @Date 2015-10-11
 * @Time PM 6:28:46
 */
public class XXXService {

    /**
     * Placeholder cached for keys that storage does not have
     */
    private static final String EMPTY_VALUE = "";

    /**
     * cache
     */
    private Cache cache = new Cache();

    /**
     * storage
     */
    private Storage storage = new Storage();

    /**
     * Simulates the normal mode
     * @param key
     * @return
     */
    public String getNormal(String key) {
        // Get data from the cache
        String cacheValue = cache.get(key);
        // Cache is empty
        if (StringUtils.isBlank(cacheValue)) {
            // Get from storage
            String storageValue = storage.get(key);
            // If the stored data is not empty, set the stored value in the cache
            if (StringUtils.isNotBlank(storageValue)) {
                cache.set(key, storageValue);
            }
            return storageValue;
        } else {
            // Cache is not empty
            return cacheValue;
        }
    }

    /**
     * Simulates the penetration prevention mode
     * @param key
     * @return
     */
    public String getPassThrough(String key) {
        // Get data from the cache
        String cacheValue = cache.get(key);
        // Cache miss (assumes cache.get returns null only when the key is absent,
        // so a cached empty placeholder counts as a hit)
        if (cacheValue == null) {
            // Get from storage
            String storageValue = storage.get(key);
            if (StringUtils.isBlank(storageValue)) {
                // Storage has no data: cache an empty placeholder
                // and set an expiration time (300 seconds)
                cache.set(key, EMPTY_VALUE);
                cache.expire(key, 60 * 5);
            } else {
                cache.set(key, storageValue);
            }
            return storageValue;
        } else {
            // Cache hit, possibly the empty placeholder
            return cacheValue;
        }
    }
}

2. bloomfilter or compressed filter (bitmap, etc.) for early interception

(1) Definition: As shown in the figure above, before any resource (cache, storage) is accessed, the keys that exist are stored in advance in a Bloom filter, which serves as the first layer of interception. For example, our recommendation service has 400 million user uids, and we make recommendations based on each user's historical behavior (not in real time). All recommendation data is stored in hbase, but many new users come to the website every day, and their requests penetrate to hbase on that day. To handle this, we build a Bloom filter over all uids at 4:00 every day. If the Bloom filter says a uid does not exist, hbase is not accessed, which protects hbase to some extent (a reduction of about 30%).
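The interception step can be sketched as follows. This is a minimal Bloom filter built on java.util.BitSet, not the implementation used by the service described above. The class names UidBloomFilter and GuardedLookup, the double-hashing scheme, and the LongFunction standing in for the hbase lookup are assumptions made for illustration. A BitSet holds at most about 2^31 bits, so a filter for hundreds of millions of uids would likely need sharding or a dedicated library.

```java
import java.util.BitSet;
import java.util.function.LongFunction;

/** Minimal Bloom filter over long uids (illustrative only). */
final class UidBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UidBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Set the bit for each of the numHashes positions derived from the uid. */
    void add(long uid) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(uid, i));
        }
    }

    /** False means the uid was definitely never added; true means it probably was. */
    boolean mightContain(long uid) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(uid, i))) {
                return false;
            }
        }
        return true;
    }

    /** Double hashing: the i-th position is (h1 + i * h2) mod numBits. */
    private int index(long uid, int i) {
        long h = mix(uid);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // force odd so successive positions differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** 64-bit mixing step using the MurmurHash3 finalizer constants. */
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }
}

/** Consults the filter before touching the backing store. */
final class GuardedLookup {
    private final UidBloomFilter filter;
    private final LongFunction<String> store; // stand-in for the cache + hbase lookup

    GuardedLookup(UidBloomFilter filter, LongFunction<String> store) {
        this.filter = filter;
        this.store = store;
    }

    String get(long uid) {
        if (!filter.mightContain(uid)) {
            return null; // definitely absent: skip cache and storage entirely
        }
        return store.apply(uid);
    }
}
```

Because a Bloom filter never reports a false negative, every uid that was added still reaches the store. Only some unknown uids slip through as false positives, which is why the protection is partial rather than complete. A daily rebuild could construct a fresh filter and swap the reference held by the lookup.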

(2) Applicable scenarios: low data hit rate, relatively fixed data with low real-time requirements (usually large data sets)

(3) Maintenance cost: code maintenance is complex, but less cache space is used
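To give a rough sense of the space involved, the standard Bloom filter sizing formulas can be evaluated as in the sketch below. The class name BloomSizing is hypothetical, and the 1% false-positive rate used in the note afterwards is an assumed example, not a figure from the service described above.

```java
/** Standard Bloom filter sizing formulas (illustrative helper). */
final class BloomSizing {
    /** Optimal number of bits: m = -n * ln(p) / (ln 2)^2. */
    static long optimalBits(long expectedItems, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedItems * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Optimal number of hash functions: k = (m / n) * ln 2, at least 1. */
    static int optimalHashes(long expectedItems, long bits) {
        return Math.max(1, (int) Math.round((double) bits / expectedItems * Math.log(2)));
    }
}
```

Under these assumptions, optimalBits(400_000_000L, 0.01) comes to roughly 3.8 billion bits, or about 480 MB (around 9.6 bits per uid), with 7 hash functions.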
