Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Redis implements data deletion strategy and eviction algorithm

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

This article will explain in detail about the data deletion strategy and eviction algorithm of redis. The editor thinks it is very practical, so I share it with you for reference. I hope you can get something after reading this article.

Data storage and expiration date

In the redis workflow, expired data does not need to be deleted immediately. Because whether these deletions are just a state indication, they can be processed asynchronously, and these non-urgent deletion operations can be done when they are not busy, so as to ensure the efficiency of redis.

Storage of data

Data storage in redis requires not only the preservation of the data itself, but also the life cycle of the data, that is, the expiration time. The storage structure of the data in redis is shown below:

Obtain the validity period

Redis is a memory-level database. All data is stored in memory, and the data in memory can be obtained through TTL instructions.

Delete Policy

Finding a balance between memory usage and CPU usage can lead to a decline in overall redis performance and even lead to server downtime or memory leaks.

Scheduled deletion

Create a timer that deletes the key immediately by the timer task when the expiration time is set by key and the expiration time arrives

Advantages

Save memory, delete it then, and quickly release unnecessary memory footprint.

Shortcoming

The pressure of CPU is very high, no matter how high the load of CPU at this time, it takes up CPU, which will affect the response time and instruction throughput of redis server.

Summary

Exchange processor performance for storage space

Lazy deletion

The data arrives at the expiration time and is not processed. The next time the data is accessed, if it is not expired, the data will be returned. Found to have expired, delete, return does not exist. In this way, each time you read and write the data, you need to detect whether the data has reached the expiration time. That is, lazy deletion always occurs when data is read and written.

ExpireIfNeeded function

Check all read and write commands to see if the object you operate on is out of date. Delete if it expires and return it to expire. If you don't expire, you will do nothing.

In the process of data writing, the expiration of the written key is judged by the expireIfNeeded function.

/ * * fetches the value of the key key in the database db to perform a write operation. * * unlike lookupKeyRead, this function does not update the server's hit / miss information. * * the value object is returned when it is found, but NULL is not found. * / robj * lookupKeyWrite (redisDb * db, robj * key) {/ / delete the expired key expireIfNeeded (db,key); / / find and return the value object return lookupKey (db,key) of key;}

In the process of data reading, the expireIfNeeded function is used to judge the expiration of the written key.

/ * * fetches the value of the key key in the database db to perform a read operation. * * and update the hit / miss information of the server according to whether the value is found successfully. * * the value object is returned when it is found, but NULL is not found. * / robj * lookupKeyRead (redisDb * db, robj * key) {robj * val; / / check that the key release has expired expireIfNeeded (db,key); / / fetch the value of the key from the database val = lookupKey (db,key); / / update hit / miss message if (val = = NULL) server.stat_keyspace_misses++; else server.stat_keyspace_hits++; / / return value return val;}

The expireIfNeeded actually does three things internally to execute the expired action, namely:

Check the key to determine whether the expired key is propagated to the slave node and send an event notification to delete the expired key/* * check whether the key has expired and, if so, delete it from the database. * * returning 0 means that the key does not expire or the key does not expire. * * return 1 indicates that the key has been deleted because it has expired. * / int expireIfNeeded (redisDb * db, robj * key) {/ / Expiration time of the fetch key mstime_t when = getExpire (db,key); mstime_t now; / / No expiration time if (when)

< 0) return 0; /* No expire for this key */ /* Don't expire anything while loading. It will be done later. */ // 如果服务器正在进行载入,那么不进行任何过期检查 if (server.loading) return 0; // 当服务器运行在 replication 模式时 // 附属节点并不主动删除 key // 它只返回一个逻辑上正确的返回值 // 真正的删除操作要等待主节点发来删除命令时才执行 // 从而保证数据的同步 if (server.masterhost != NULL) return now >

When; / / runs here, indicating that the key has an expiration time, and the server primary node / * Return when this key has not expired * / returns 0 if (now id) if it does not expire; / / removes the expired key from the database return dbDelete (db,key);}

The data structure to determine whether the key is out of date is db- > expires, that is, to determine whether the data is out of date by the data structure of the expires.

Internally gets the expiration time and returns.

/ * * return the node containing the key key in the dictionary * * find the return node, and cannot find the return NULL * * T = O (1) * / dictEntry * dictFind (dict * d, const void * key) {dictEntry * he; unsigned int h, idx, table; / / dictionary (hash table of if) is empty if (d-> ht [0] .size = = 0) return NULL / * We don't have a table at all * / / if conditions permit, single-step rehash if (dictIsRehashing (d)) _ dictRehashStep (d); / / calculate the hash value of the key h = dictHashKey (d, key); / / find the key / / T = O (1) for (table = 0; table ht [table] .sizemask) in the hash table of the dictionary / / iterate through all nodes of the linked list on a given index, looking for key he = d-> ht [table] .table [idx]; / / T = O (1) while (he) {if (dictCompareKeys (d, key, he- > key)) return he; he = he- > next } / / if the program traverses the hash table No. 0 and still cannot find the node of the specified key / / then the program will check whether the dictionary is rehash, / / and then decide whether to return NULL directly or continue to search the hash table No. 1 if (! dictIsRehashing (d)) return NULL;} / / at this point, the two hash tables have not found return NULL;}

Advantages

Save CPU performance and delete it only when you find it must be deleted.

Shortcoming

The memory pressure is very high, and there is data that takes up memory for a long time.

Summary

Exchange storage space for processor performance

Delete periodically

Periodically poll the timeliness data in the redis database, adopt the strategy of random extraction, and delete the frequency by the proportion of expired data.

Advantages

The CPU performance occupancy setting has a peak, and the detection frequency can be customized.

The memory pressure is not very great, and the cold data that has occupied memory for a long time will be cleaned continuously.

Shortcoming

Storage space needs to be checked periodically.

Delete details on a regular basis

The periodic deletion of redis is achieved through scheduled tasks, that is, timed tasks call the serverCron method in a loop. Then the method to check the expired data periodically is databasesCron. A major feature of regular deletion is that it takes up cpu time to delete expired data on a regular basis, so the usage of cpu is limited to no more than 25% each time databasesCron is executed. What really does the deletion is the activeExpireCycle method.

Time event

For a continuously running server, the server needs to regularly check and organize its own resources and status, so as to keep the server in a healthy and stable state, which is collectively referred to as cron job.

In Redis, general operations are implemented by redis.c/serverCron (), which mainly performs the following operations

Update all kinds of statistical information of the server, such as time, memory footprint, database occupancy and so on.

2 clean up the expired key-value pairs in the database.

(3) resize the unreasonable database.

4 close and clean up the clients whose connection fails.

5 try an AOF or RDB persistence operation.

6 if the server is the primary node, synchronize the subsidiary nodes periodically.

7 if you are in cluster mode, perform regular synchronization and connectivity tests on the cluster.

Because serverCron () needs to be run periodically while the Redis server is running, it is a cyclical time event: serverCron () is executed periodically until the server shuts down.

In Redis 2.6, the program requires serverCron () to run 10 times per second, an average of once every 100ms. Starting with Redis 2.8, users can adjust the number of serverCron () execution per second by modifying the hz option. For more information, please refer to the description of the hz option in the redis.conf file.

View hz

Way1: config get hz # "hz"10" way2: info server # server.hz 10

ServerCron ()

ServerCron () is executed periodically, and the databasesCron () method is called in the execution of serverCron () (serverCron () does a lot of other things, but we won't discuss it now, just delete policies).

Int serverCron (struct aeEventLoop * eventLoop, long long id, void * clientData) {/ / omit the polyunrelated code / * We need to do a few operations on clients asynchronously. * / check the client, close the timeout client, and release the client's extra buffer clientsCron (); / * Handle background operations on Redis databases. * / / perform various operations on the database databasesCron (); / *! The way we focus! , /

DatabasesCron ()

The activeExpireCycle () method is called in databasesCron () to process the expired data. (some other operations will be done here ~ resize the database, active and progressive rehash)

/ / A pair of databases perform deletion of expired keys, resize, and active and progressive rehashvoid databasesCron (void) {/ / determine whether it is the primary server. If active expired keys are executed to clear if (server.active_expire_enabled & & server.masterhost = = NULL) / / clear mode is CYCLE_SLOW, this mode will clear as many expired keys as possible activeExpireCycle (ACTIVE_EXPIRE_CYCLE_SLOW) / / when there is no BGSAVE or BGREWRITEAOF execution, rehash if the hash table (server.rdb_child_pid =-1 & & server.aof_child_pid = =-1) {static unsigned int resize_db = 0; static unsigned int rehash_db = 0; unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL; unsigned int j; / * Don't test more DBs than we have. * / / set the number of databases to be tested if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum; / * Resize * / resize the dictionary for (j = 0; j

< dbs_per_call; j++) { tryResizeHashTables(resize_db % server.dbnum); resize_db++; } /* Rehash */ // 对字典进行渐进式 rehash if (server.activerehashing) { for (j = 0; j < dbs_per_call; j++) { int work_done = incrementallyRehash(rehash_db % server.dbnum); rehash_db++; if (work_done) { /* If the function did some work, stop here, we'll do * more at the next cron loop. */ break; } } } }} activeExpireCycle() 大致流程如下 1 遍历指定个数的db(默认的 16 )进行删除操作 2 针对每个db随机获取过期数据每次遍历不超过指定数量(如20),发现过期数据并进行删除。 3 如果有多于25%的keys过期,重复步骤 2 除了主动淘汰的频率外,Redis对每次淘汰任务执行的最大时长也有一个限定,这样保证了每次主动淘汰不会过多阻塞应用请求,以下是这个限定计算公式: #define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* CPU max % for keys collection */ ``... ``timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100; 也就是每次执行时间的25%用于过期数据删除。 void activeExpireCycle(int type) { // 静态变量,用来累积函数连续执行时的数据 static unsigned int current_db = 0; /* Last DB tested. */ static int timelimit_exit = 0; /* Time limit hit in previous call? */ static long long last_fast_cycle = 0; /* When last fast cycle ran. */ unsigned int j, iteration = 0; // 默认每次处理的数据库数量 unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL; // 函数开始的时间 long long start = ustime(), timelimit; // 快速模式 if (type == ACTIVE_EXPIRE_CYCLE_FAST) { // 如果上次函数没有触发 timelimit_exit ,那么不执行处理 if (!timelimit_exit) return; // 如果距离上次执行未够一定时间,那么不执行处理 if (start < last_fast_cycle + ACTIVE_EXPIRE_CYCLE_FAST_DURATION*2) return; // 运行到这里,说明执行快速处理,记录当前时间 last_fast_cycle = start; } /* * 一般情况下,函数只处理 REDIS_DBCRON_DBS_PER_CALL 个数据库, * 除非: * * 1) 当前数据库的数量小于 REDIS_DBCRON_DBS_PER_CALL * 2) 如果上次处理遇到了时间上限,那么这次需要对所有数据库进行扫描, * 这可以避免过多的过期键占用空间 */ if (dbs_per_call >

Server.dbnum | | timelimit_exit) dbs_per_call = server.dbnum; / / the upper limit of microsecond time processed by the function / / ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC defaults to 25, that is, 25% of the CPU time timelimit = 1000000 active active expired expired CYCLED sloping timed PERCmax 100; timelimit_exit = 0; if (timelimit expires)) = 0) {db- > avg_ttl = 0; break } / / get the number of key-value pairs in the database slots = dictSlots (db- > expires); / / current time now = mstime (); / / the utilization rate of this database is less than 1%, so it is too hard to scan (most of them will MISS) / / skip, and wait for the dictionary shrinking program to run if (num & slots > DICT_HT_INITIAL_SIZE & & (num*100/slots).

< 1)) break; /* * 样本计数器 */ // 已处理过期键计数器 expired = 0; // 键的总 TTL 计数器 ttl_sum = 0; // 总共处理的键计数器 ttl_samples = 0; // 每次最多只能检查 LOOKUPS_PER_LOOP 个键 if (num >

ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP) num = ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP; / / start traversing the database while (num--) {dictEntry * de; long long ttl; / / randomly take a key if with expiration time from expires ((de = dictGetRandomKey (db- > expires)) = = NULL) break; / / calculate TTL ttl = dictGetSignedIntegerVal (de)-now / / if the key has expired, delete it and add an if (activeExpireCycleTryExpire (db,de,now)) expired++; if (ttl) to the expired counter

< 0) ttl = 0; // 累积键的 TTL ttl_sum += ttl; // 累积处理键的个数 ttl_samples++; } /* Update the average TTL stats for this database. */ // 为这个数据库更新平均 TTL 统计数据 if (ttl_samples) { // 计算当前平均值 long long avg_ttl = ttl_sum/ttl_samples; // 如果这是第一次设置数据库平均 TTL ,那么进行初始化 if (db->

Avg_ttl = = 0) db- > avg_ttl = avg_ttl; / * Smooth the value averaging with the previous one. * / / take the average of the last average TTL of the database and the average TTL of this time db- > avg_ttl = (db- > avg_ttl+avg_ttl) / 2;} / / We cannot take too long to process expired keys, / / so this function will return / / update the number of traverses iteration++; / / execute if ((iteration & 0xf) = 0 & / * check once every 16 iterations) 16 times per traversal. * / (ustime ()-start) > timelimit) {/ / if the traversal number happens to be a multiple of 16 / / and the traversal time exceeds timelimit / / then disconnect timelimit_exit timelimit_exit = 1;} / / has timed out, return if (timelimit_exit) return / / if deleted expired keys account for 25% of the total number of keys with expiration time in the current database / / then no longer traverse} while (expired > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4);}}

Scaling up hz will increase the frequency of active elimination of Redis. If your Redis storage contains a lot of cold data that takes up too much memory, you can consider increasing this value, but the Redis authors recommend that this value not exceed 100. We actually increased this value to 100 online and observed that CPU increased by about 2%, but there was a significant increase in memory release speed for cold data (by observing the number of keyspace and used_memory size).

You can see that the relationship between timelimit and server.hz is reciprocal, that is, the larger the hz configuration, the smaller the timelimit. In other words, the higher the expected frequency of active elimination per second, the shorter the maximum duration of each phase-out. Here, the maximum phase-out time per second is a fixed 250ms (1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/100), while the phase-out frequency and the maximum time of each phase-out are controlled by the hz parameter.

Therefore, when the expired key ratio in the redis does not exceed 25%, increasing the hz can significantly increase the minimum number of scanned key. Assuming that hz is 10, at least 200 key are scanned in a second (10 calls per second * at least 20 key are randomly selected each time). If hz is changed to 100, then at least 2000 key; are scanned in one second. On the other hand, if the expired key ratio exceeds 25%, there is no limit to the number of scanned key, but the cpu time takes up 250ms at most per second.

When REDIS is running in master-slave mode, only the master node executes these two expired deletion policies, and then synchronizes the deletion operation "del key" to the slave node.

If (server.active_expire_enabled & & server.masterhost = = NULL) / / to determine whether the master node is a slave node does not need to execute the activeExpireCycle () function. / / clear mode is CYCLE_SLOW, which removes as many expired keys as possible, activeExpireCycle (ACTIVE_EXPIRE_CYCLE_SLOW)

Random number

Redis.config.ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP determines the number of values randomly selected from the database expire per loop

Ejection algorithm

If you don't limit reids's memory usage, it will use all of its memory. You can specify the amount of memory used by redis through config.memory.

Here are the instructions in the redis configuration file

543 # Set a memory usage limit to the specified amount of bytes.

544 # When the memory limit is reached Redis will try to remove keys

545 # according to the eviction policy selected (see maxmemory-policy).

546 #

547 # If Redis can't remove keys according to the policy, or if the policy is

548 # set to 'noeviction', Redis will start to reply with errors to commands

549 # that would use more memory, like SET, LPUSH, and so on, and will continue

550 # to reply to read-only commands like GET.

551 #

552 # This option is usually useful when using Redis as an LRU or LFU cache, or to

553 # set a hard memory limit for an instance (using the 'noeviction' policy).

554 #

555 # WARNING: If you have replicas attached to an instance with maxmemory on

556 # the size of the output buffers needed to feed the replicas are subtracted

557 # from the used memory count, so that network problems / resyncs will

558 # not trigger a loop where keys are evicted, and in turn the output

559 # buffer of replicas is full with DELs of keys evicted triggering the deletion

560 # of more keys, and so forth until the database is completely emptied.

561 #

562 # In short... If you have replicas attached it is suggested that you set a lower

563 # limit for maxmemory so that there is some free RAM on the system for replica

564 # output buffers (but this is not needed if the policy is' noeviction').

Sets the memory usage limit to the specified bytes. When the memory limit is reached, Redis attempts to delete data based on the selected eviction policy (see maxmemory policy).

If Redis cannot remove the key according to the eviction policy, or if the policy is set to "noeviction", Redis will start replying incorrectly to commands that use more memory, such as set, LPUSH, and so on, and will continue to reply to read-only commands, such as GET.

This option is often useful when using Redis as a LRU or LFU cache or setting a hard memory limit for an instance (using the "noeviction" policy).

Warning: if a copy is attached to an maxmemory-enabled instance, the size of the output buffer required to feed the copy will be subtracted from the used memory count, so that network problems / resynchronization will not trigger a key recovery loop, and the copy's output buffer will be filled with recovered key increments, triggering the deletion of more keys, and so on, until the database is completely empty.

In a nutshell. If a copy is attached, it is recommended that you set the lower limit of the maxmemory so that there is some free RAM on the system for the copy output buffer (but this restriction is not required if the policy is "noeviction").

Configuration of eviction strategy

Maxmemery-policy volatile-lru

Active cleanup policy is triggered when the current used memory exceeds the maxmemory limit

Volatile data cleaning

Volatile-lru: LRU only key with expiration time set (default)

Volatile-random: randomly delete an expiring key

Volatile-ttl: delete the expiring

Volatile-lfu: pick the most recently used data to be eliminated

All data cleanup

Allkeys-lru: delete the key of the lru algorithm

Allkeys-lfu: pick the most recently used data to be eliminated

Allkeys-random: random deletion

Prohibition of eviction

(Redis 4.0 default policy)

Noeviction: never expires, returns an error when the mem_used memory has exceeded the maxmemory setting, the redis.c/freeMemoryIfNeeded (void) function is triggered for all read and write requests to clean up the excess memory. Note that the cleanup process is blocked until enough memory space is cleared. So if the maxmemory is reached and the caller is still writing, the active cleanup policy may be triggered repeatedly, resulting in a certain delay in the request.

During cleaning, appropriate cleaning will be done according to the maxmemory-policy configured by the user (usually LRU or TTL). The LRU or TTL policy here does not aim at all key of redis, but takes the maxmemory-samples key in the configuration file as the sample pool for sample cleaning.

The default configuration of maxmemory-samples in redis-3.0.0 is 5. If you increase it, it will improve the accuracy of LRU or TTL. The result of the redis author's test is that when this configuration is 10:00, it is very close to the accuracy of full LRU, and increasing maxmemory-samples will lead to more CPU time consumed during active cleaning. It is recommended:

1 do not trigger maxmemory as far as possible. After the memory consumption of mem_used reaches a certain proportion of maxmemory, you need to consider increasing hz to speed up phase-out, or to expand cluster capacity.

2 if you can control the memory, you do not need to modify the maxmemory-samples configuration; if the Redis itself is a LRU cache service (this service is usually in the maxmemory state for a long time, and Redis automatically does LRU elimination), you can appropriately increase the maxmemory-samples.

To mention here, in fact, redis does not accurately delete the longest unused keys in the entire database, but randomly takes five keys from the database at a time and deletes the longest unused keys of the five keys. All the random operations mentioned above are actually like this, and this 5 can be configured with the maxmemeory-samples parameter in redis's configuration file.

Data eviction policy configuration basis

Use the INFO command to output monitoring information, query cache int and miss times, and tune Redis configuration according to business requirements.

On the redis implementation of data deletion strategy and eviction algorithm to share here, I hope the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report