Learn these skills to keep Redis big key problems away from you.


As an early entrant in the domestic third-party push market, Getui focuses on providing efficient and stable push services to developers. Over nine years of growth it has served hundreds of thousands of apps, including Sina and Didi. Because our push business demands high concurrency and low latency, we chose the high-performance in-memory database Redis. In real business scenarios, however, we have also run into service blocking caused by Redis big keys, and we have accumulated some experience in dealing with it. This article covers how to discover big keys and how to resolve the blocking caused by deleting them.

Scenarios and problems of Redis big keys

Big key scenarios

Most Redis users will have run into big key scenarios, for example:

1. Leaderboards of comments or answers under trending topics.

2. The follower list of a celebrity ("Big V") account.

3. Improper usage, inaccurate business estimates, or garbage data that is never cleaned up in time.

Big key problems

Because the Redis main thread is single-threaded, big keys also bring problems such as:

1. In cluster mode, even when slots are sharded evenly, data and query load become skewed: the nodes holding big keys use more memory and take higher QPS.

2. When a big key is deleted or expires automatically, QPS can swing sharply; in extreme cases the deletion blocks the Redis service so that it cannot respond to requests, and can even cause master-slave replication exceptions. For the relationship between a big key's size and its deletion time, see the following table:

Key type   | Number of fields | Deletion time
Hash       | ~1 million       | ~1000 ms
List       | ~1 million       | ~1000 ms
Set        | ~1 million       | ~1000 ms
Sorted Set | ~1 million       | ~1000 ms
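These figures are indicative and vary with hardware and encoding; a quick way to get a feel for them is to load a large hash and time its deletion. Below is a minimal sketch using the redis-py client, not from the original article; the key name, field count, and connection details are illustrative:

import time
import redis

r = redis.Redis(host='localhost', port=6379)

# Load a hash with ~1 million fields; a pipeline keeps the load fast.
pipe = r.pipeline(transaction=False)
for i in range(1000000):
    pipe.hset('bigkey:demo', 'field:%d' % i, 'x')
    if i % 10000 == 0:
        pipe.execute()
pipe.execute()

# Time a synchronous DEL; on a hash of this size it blocks the Redis
# main thread for roughly the duration shown in the table above.
start = time.time()
r.delete('bigkey:demo')
print('DEL took %.0f ms' % ((time.time() - start) * 1000))

(The client-side timing also includes one network round trip, which is negligible next to the deletion itself.)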

Methods for discovering and deleting big keys before Redis 4.0

1. The redis-rdb-tools tool. Run bgsave on the Redis instance, then analyze the dumped rdb file to find the big keys in it.

2. The redis-cli --bigkeys command. It finds the biggest key of each of the five data types (string, hash, list, set, zset) on an instance.

3. Custom scan scripts, mostly written in Python, similar in spirit to redis-cli --bigkeys.

4. The debug object key command. It shows a key's serialized length, but only for one key at a time, and it is not officially recommended.

The redis-rdb-tools tool

For a detailed description of the rdb tool, see https://github.com/sripathikrishnan/redis-rdb-tools; only the memory-related usage is covered here. The basic command is rdb -c memory dump.rdb (where dump.rdb is the Redis instance's rdb file, which can be generated with bgsave).

The output is as follows:

Database,type,key,size_in_bytes,encoding,num_elements,len_largest_element

0,hash,hello1,1050,ziplist,86,22

0,hash,hello2,2517,ziplist,222,8

0,hash,hello3,2523,ziplist,156,12

0,hash,hello4,62020,hashtable,776,32

0,hash,hello5,71420,hashtable,1168,12

The output includes the data type, key, memory size, encoding, and so on. The rdb tool's advantages are detailed key information, many optional parameters, support for custom requirements, and output in json or csv for easy post-processing. Its disadvantage is that it runs offline and takes a long time to produce results.

The redis-cli --bigkeys command

redis-cli --bigkeys is a command built into redis-cli. It scans the entire Redis keyspace, looks for the biggest keys, and prints statistics.

For example: redis-cli -p 6379 --bigkeys

# Scanning the entire keyspace to find biggest keys as well as

# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec

# per 100 SCAN commands (not usually needed).

[00.72%] Biggest hash found so far 'hello6' with 43 fields

[02.81%] Biggest string found so far 'hello7' with 31 bytes

[05.15%] Biggest string found so far 'hello8' with 32 bytes

[26.94%] Biggest hash found so far 'hello9' with 1795 fields

[32.00%] Biggest hash found so far 'hello10' with 4671 fields

[35.55%] Biggest string found so far 'hello11' with 36 bytes

-------- summary -------

Sampled 293070 keys in the keyspace!

Total key length in bytes is 8731143 (avg len 29.79)

Biggest string found 'hello11' has 36 bytes

Biggest hash found 'hello10' has 4671 fields

238027 strings with 2300436 bytes (81.22% of keys, avg size 9.66)

0 lists with 0 items (00.00% of keys, avg size 0.00)

0 sets with 0 members (00.00% of keys, avg size 0.00)

55043 hashs with 289965 fields (18.78% of keys, avg size 5.27)

0 zsets with 0 members (00.00% of keys, avg size 0.00)

We can see that the printout has two parts: the scanning section shows only the biggest key found up to the current stage, while the summary section gives the biggest key and statistics for each data structure.

The advantage of redis-cli --bigkeys is that it scans online without blocking the service; the disadvantage is that the information is limited and not precise enough. Only string keys are measured in bytes; lists, sets, zsets, and hashes are measured by element count, and a large element count does not necessarily mean large memory usage.

Custom Python scan script

Obtain byte sizes or element counts with strlen, hlen, scard, and similar commands. The results are finer-grained than those of redis-cli --bigkeys, but the drawbacks are the same.
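A minimal sketch of such a script using redis-py (an assumption of what such a script looks like, not the article's original; thresholds and connection details are placeholders):

import redis

r = redis.Redis(host='localhost', port=6379)

# Illustrative thresholds for what counts as a "big" key.
STRING_BYTES = 10 * 1024 * 1024  # 10 MB for strings
ELEMENTS = 100000                # element count for collections

for key in r.scan_iter(count=1000):
    ktype = r.type(key)
    if ktype == b'string' and r.strlen(key) > STRING_BYTES:
        print(key, 'string', r.strlen(key), 'bytes')
    elif ktype == b'hash' and r.hlen(key) > ELEMENTS:
        print(key, 'hash', r.hlen(key), 'fields')
    elif ktype == b'list' and r.llen(key) > ELEMENTS:
        print(key, 'list', r.llen(key), 'items')
    elif ktype == b'set' and r.scard(key) > ELEMENTS:
        print(key, 'set', r.scard(key), 'members')
    elif ktype == b'zset' and r.zcard(key) > ELEMENTS:
        print(key, 'zset', r.zcard(key), 'members')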

In short, the methods above either require lengthy offline parsing or are sampling scans that lack detail, far from the ideal of an online scan that reports detailed memory usage per key. And since Redis had no lazy-free mechanism before 4.0, once a big key was found, a DBA could only remove it piecemeal, deleting a few elements at a time via hscan, sscan, or zscan (as sketched below). For keys removed by expiration, even this workaround is powerless; one can only hope the automatic cleanup of expired keys happens at an off-peak time so the business impact stays small.
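For instance, a large hash can be drained in batches before the final delete, keeping each individual command cheap. A hedged sketch with redis-py (the batch size is arbitrary):

import redis

r = redis.Redis(host='localhost', port=6379)

def delete_big_hash(key, batch=100):
    # Remove fields a small batch at a time so no single HDEL blocks,
    # then delete the now-small key itself.
    cursor = 0
    while True:
        cursor, fields = r.hscan(key, cursor, count=batch)
        if fields:
            r.hdel(key, *fields.keys())
        if cursor == 0:
            break
    r.delete(key)

The same pattern applies to sets (sscan + srem) and sorted sets (zscan + zrem).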

Discovering and deleting big keys after Redis 4.0

Redis 4.0 introduces the memory usage command and the lazyfree mechanism, which significantly improve both the discovery of big keys and the handling of the blocking caused by their deletion or expiration.

Let's look at the characteristics of memory usage and lazyfree in the source code (excerpted from Redis 5.0.4).

Memory usage
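In day-to-day use the command is MEMORY USAGE key [SAMPLES count]. A minimal sketch via redis-py, which exposes it as memory_usage (the key name is illustrative):

import redis

r = redis.Redis(host='localhost', port=6379)

# Estimated size of the key in bytes; SAMPLES controls how many
# elements are sampled (0 samples everything, at a higher cost).
print(r.memory_usage('hello4', samples=5))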

{"memory", "memoryCommand,-2,"rR", 0penol Null, 0re0re0re0re0re0jue 0}

(server.c 285)

Void memoryCommand (client c) {

/... /

The key size is calculated by sampling partial field to estimate the total size. /

Else if (! strcasecmp (c-> argv [1]-> ptr, "usage") & & c-> argc > = 3) {

Size_t usage = objectComputeSize (dictGetVal (de), samples)

/. * /

}

}

(object.c 1299)

From this code you can see that memory usage computes a key's size by calling objectComputeSize. Let's look at the logic of that function.

#define OBJ_COMPUTE_SIZE_DEF_SAMPLES 5 /* Default sample size. */

size_t objectComputeSize(robj *o, size_t sample_size) {
    /*... the code branches on data type; only the hash branch is shown here ...*/
    /* Loop over sampled fields, accumulating their memory. The default sample size is 5. */
    while ((de = dictNext(di)) != NULL && samples < sample_size) {
        ele = dictGetKey(de);
        ele2 = dictGetVal(de);
        elesize += sdsAllocSize(ele) + sdsAllocSize(ele2);
        elesize += sizeof(struct dictEntry);
        samples++;
    }
    dictReleaseIterator(di);
    /* Divide the sampled memory by the sample count, then multiply by the
     * total number of fields to estimate the key's total memory. */
    if (samples) asize += (double)elesize/samples*dictSize(d);
    /*...*/
}

(object.c line 779)

From this we can see that memory usage samples 5 fields by default and accumulates their sizes in a loop to estimate the whole key's memory. The sample size determines both the accuracy and the cost of the estimate: the larger the sample, the more loop iterations, the more precise the result, and the higher the performance cost.

With a Python script we can scan Redis during the cluster's off-peak hours and obtain the memory size of every key at a modest cost. Below is partial pseudocode; in practice you would set a big-key threshold and alert on it.

for key in r.scan_iter(count=1000):
    redis_cli = '/usr/bin/redis-cli'
    configcmd = '%s -h %s -p %s memory usage %s' % (redis_cli, rip, rport, key)
    keymemory = commands.getoutput(configcmd)

Lazyfree mechanism

The idea of lazyfree is to make deletion logical only: the actual release of the key's value is handed to a separate bio (Background I/O) thread, which reduces the blocking that deleting a big key imposes on the Redis main thread and effectively avoids the resulting performance problems. A word on bio threads: Redis is commonly understood as a single-threaded in-memory database, but that is not quite true. Redis runs the main work (network I/O, command execution) on the main thread, yet there are also several bio background threads; the source shows threads for closing files and flushing to disk, plus the lazyfree thread added in Redis 4.0.

/* Background job opcodes */
#define BIO_LAZY_FREE 2 /* Deferred objects freeing. */

(bio.h line 38)

Let's take the unlink command as an example to understand how lazyfree is implemented.

{"unlink",unlinkCommand,-2,"wF",0,NULL,1,-1,1,0,0},

(server.c line 137)

void unlinkCommand(client *c) {
    delGenericCommand(c,1);
}

(db.c line 490)

These excerpts show that the del and unlink commands both call delGenericCommand; the only difference is the second argument, which is the asynchronous-deletion flag.

/* This command implements DEL and LAZYDEL. */
void delGenericCommand(client *c, int lazy) {
    /*...*/
    int deleted = lazy ? dbAsyncDelete(c->db,c->argv[j]) :
                         dbSyncDelete(c->db,c->argv[j]);
    /*...*/
}

(db.c line 468)

You can see that delGenericCommand decides between synchronous and asynchronous deletion based on the lazy flag. The unlink command passes lazy=1 and calls the asynchronous delete function dbAsyncDelete, while the del command passes 0 and calls the synchronous dbSyncDelete. Let's focus on the implementation of dbAsyncDelete:

#define LAZYFREE_THRESHOLD 64
/* Threshold for background deletion: only when a key holds more elements
 * than this is it actually handed to a background thread. */

int dbAsyncDelete(redisDb *db, robj *key) {
    /*...*/
    /* lazyfreeGetFreeEffort returns the number of elements the val object contains. */
    size_t free_effort = lazyfreeGetFreeEffort(val);
    /* Decide how to delete the key: when the threshold condition is met,
     * delete it in the background. */
    if (free_effort > LAZYFREE_THRESHOLD && val->refcount == 1) {
        atomicIncr(lazyfree_objects,1);
        /* Put the object to free into the BIO_LAZY_FREE background
         * thread's task queue. */
        bioCreateBackgroundJob(BIO_LAZY_FREE,val,NULL,NULL);
        /* Set the value fetched in the first step to NULL in the dict. */
        dictSetVal(db->dict,de,NULL);
    }
    /*...*/
}

(lazyfree.c line 53)

As mentioned above, when a deleted key meets the threshold condition, it is placed in the BIO_LAZY_FREE background thread's task queue. Next, let's take a look at that background thread.

/*...*/
else if (type == BIO_LAZY_FREE) {
    if (job->arg1)
        /* Free an object in the background; decrRefCount decrements its
         * reference count, and the memory is actually released when the
         * count reaches 0. */
        lazyfreeFreeObjectFromBioThread(job->arg1);
    else if (job->arg2 && job->arg3)
        /* Empty a database dictionary in the background; dictRelease
         * iterates over the dict and deletes every key. */
        lazyfreeFreeDatabaseFromBioThread(job->arg2,job->arg3);
    else if (job->arg3)
        /* Free the key-slots mapping table in the background (used in
         * Redis Cluster mode). */
        lazyfreeFreeSlotsMapFromBioThread(job->arg3);
}

(bio.c line 197)

To summarize the unlink command's logic: executing unlink calls delGenericCommand with lazy=1, which calls the asynchronous delete function dbAsyncDelete, which in turn puts any big key that meets the threshold into the BIO_LAZY_FREE background thread's task queue for asynchronous deletion. Similar background deletion commands are flushdb async and flushall async: they check the async flag and then call the asynchronous function emptyDbAsync to empty the database. Their implementation can be read in the flushdbCommand source and is not repeated here.
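From the client side, switching to background deletion is just a different command. A minimal redis-py sketch (the key names are illustrative):

import redis

r = redis.Redis(host='localhost', port=6379)

# DEL frees the value on the main thread and can block on a big key.
r.delete('bigkey:sync')

# UNLINK removes the key from the keyspace immediately and, when the
# value holds more than LAZYFREE_THRESHOLD (64) elements, frees it in
# the BIO_LAZY_FREE background thread.
r.unlink('bigkey:async')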

Besides active deletion of big keys and database-emptying operations, deletions triggered by the expiration and eviction of keys can also block the Redis service. So, in addition to the three background deletion commands above, Redis 4.0 added four background deletion configuration options: slave-lazy-flush, lazyfree-lazy-eviction, lazyfree-lazy-expire, and lazyfree-lazy-server-del.

slave-lazy-flush: whether a slave empties its old data asynchronously after receiving an RDB file. Enabling it is recommended: it shortens the slave's flush time and thus reduces the chance of a full master-slave resynchronization.

lazyfree-lazy-eviction: whether evictions when memory is full are asynchronous. If enabled, the memory of evicted keys may not be released promptly and memory usage can overshoot, so enable it with care.

lazyfree-lazy-expire: whether expired keys are deleted asynchronously. Enabling it is recommended.

lazyfree-lazy-server-del: asynchronous deletion for implicit, server-internal deletes. For example, when rename renames oldkey to an existing newkey, the existing newkey is deleted first; if that newkey is a big key, the delete can block. Enabling it is recommended.

The logic behind these four options is much the same: each option decides whether the corresponding deletion path uses dbAsyncDelete or emptyDbAsync for asynchronous deletion.
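These options live in redis.conf and can also be changed at runtime with CONFIG SET. A sketch with redis-py reflecting the recommendations above (whether to enable lazyfree-lazy-eviction depends on the memory-overuse trade-off just noted):

import redis

r = redis.Redis(host='localhost', port=6379)

# Runtime equivalents of the redis.conf options discussed above.
r.config_set('lazyfree-lazy-expire', 'yes')
r.config_set('lazyfree-lazy-server-del', 'yes')
r.config_set('slave-lazy-flush', 'yes')   # named replica-lazy-flush in Redis 5+
# Asynchronous eviction can let memory usage overshoot; weigh before enabling.
r.config_set('lazyfree-lazy-eviction', 'no')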

Summary

In some business scenarios, Redis big keys are unavoidable. The memory usage command and the lazyfree mechanism provide, respectively, a sampling algorithm for measuring keys by memory and an asynchronous deletion optimization; together they help us prevent big keys from forming and resolve the blocking they cause in real business. This direction of Redis kernel optimization can also be seen on the blog of Redis author antirez, who proposed "Lazy Redis is better Redis" and "Slow commands threading" (running slow commands in separate threads); asynchronization should remain a main direction of Redis optimization.

Redis is an important foundation of Getui's message push service, so its performance matters greatly to us. After upgrading our Redis version from 2.8 to 5.0, the blocking caused by the deletion or expiration of some big keys has been effectively resolved. Going forward, Getui will keep following Redis 5.0 and the subsequent Redis 6.0 to explore how to use Redis even better.

Reference documentation:

1. http://antirez.com/news/93

2. http://antirez.com/news/126
