Locating and Optimizing the Redis Large Key Problem in a Distributed Cache Database

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article describes how to locate a Redis large key problem in a distributed cache database and offers optimization suggestions. It works through an actual case step by step, from diagnosis to fix.

[background]

An OOM error occurred when accessing a Redis 5.0 Cluster instance. The error message was: (error) OOM command not allowed when used memory > 'maxmemory'. Some ECS applications could not write to the database, which affected normal use of the service. For example, executing set T2 S2 returned the OOM error, as shown in the following figure:

[topology]

Environment information:

Redis 5.0 Cluster instance, 4 GB memory

DCS network segment: 192.168.1.0/24

Shard 1: master 192.168.1.12, slave 192.168.1.37

Shard 2: master 192.168.1.10, slave 192.168.1.69

Shard 3: master 192.168.1.26, slave 192.168.1.134

[analysis approach]

[detailed steps]

1. Check monitoring

Check the Redis instance monitoring. The memory utilization of the cluster as a whole is 46.97%, which shows no obvious anomaly. The result is shown below:

Then check the memory monitoring of each node. The memory utilization of the master node 192.168.1.10 in shard 2 is 100%, while that of the other two shards is about 20%. The result is shown below:

2. Confirm the abnormal shard

According to the monitoring information above, the memory utilization of shard 2 in the Redis cluster has reached 100%. This is the only shard with abnormal memory usage.
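The gap between the cluster-level and node-level figures can be reproduced with a little arithmetic. The following is a minimal sketch in Python, assuming three equally sized shards and using the approximate master utilizations reported above. The function name find_skewed_nodes and the 1.5x threshold are hypothetical choices made for illustration, not part of any monitoring product.

```python
from statistics import mean

def find_skewed_nodes(usage_pct, ratio=1.5):
    """Return nodes whose memory utilization exceeds `ratio` times the average."""
    avg = mean(usage_pct.values())
    return {node: pct for node, pct in usage_pct.items() if pct > ratio * avg}

# Approximate master-node utilization taken from the monitoring in this case.
usage = {
    "192.168.1.12": 20.0,   # shard 1
    "192.168.1.10": 100.0,  # shard 2
    "192.168.1.26": 20.0,   # shard 3
}

print(round(mean(usage.values()), 2))  # cluster-wide average
print(find_skewed_nodes(usage))        # nodes far above the average
```

The average works out to roughly 47%, close to the 46.97% shown at instance level, which is why the instance chart looked normal while one shard was already full.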

3. Large key analysis

Online mode

① Tool analysis: on the Huawei Cloud management console, use the Cache Analysis > Large Key Analysis tool. After the analysis completes, you can view the results (for the string type the top 20 keys are saved; for the list/set/zset/hash types the top 80 are saved). The result is as follows:

For more information on how to use it, please see the following link: https://support.huaweicloud.com/usermanual-dcs/dcs-ug-190808001.html

② Command analysis: run redis-cli -h IP -p port --bigkeys. The tool lists the largest key found for each data type. The result is shown in the following figure:

As shown in the figure above, the large key of type string in this environment is "nc_filed/_pk", and no large key is found for the list, set, hash, and zset types.
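Because --bigkeys reports only the single largest key per type, a longer ranking is sometimes useful. The following is a minimal sketch of such a scan, assuming a client object that behaves like redis-py's redis.Redis connected to one node (scan_iter and memory_usage are redis-py methods; MEMORY USAGE requires Redis 4.0 or later). The function name and the default thresholds are hypothetical.

```python
import heapq

def find_big_keys(client, min_bytes=10 * 1024, top_n=20, scan_count=1000):
    """Scan one node incrementally and return the top_n (size, key) pairs
    whose reported memory usage is at least min_bytes.

    `client` is assumed to behave like a redis-py `redis.Redis` object
    connected to a single node (ideally a replica, to spare the master).
    """
    found = []
    for key in client.scan_iter(count=scan_count):  # SCAN, never KEYS *
        size = client.memory_usage(key)             # MEMORY USAGE <key>
        if size is not None and size >= min_bytes:  # None: key vanished mid-scan
            found.append((size, key))
    return heapq.nlargest(top_n, found, key=lambda item: item[0])
```

In a cluster, each node holds only its own slots, so a scan like this would have to be repeated per shard. In this case it could be pointed at the slave of shard 2 (192.168.1.69), with the port being whatever the instance actually uses.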

Offline mode

Offline analysis uses the dedicated rdb_bigkeys tool to analyze RDB files. Tool address: https://github.com/weiyanwei412/rdb_bigkeys. The specific steps are as follows:

Compilation method:

# yum install git go -y

# mkdir /home/gocode/

# cd /home/gocode/

# git clone https://github.com/weiyanwei412/rdb_bigkeys.git

# cd rdb_bigkeys

# go build

When the build finishes, the executable file rdb_bigkeys is generated.

How to use it:

./rdb_bigkeys -bytes 1024 -file bigkeys.csv -sorted -threads 4 /home/redis/dump.rdb

Parameter description:

-bytes 1024: only report keys larger than 1024 bytes

-file bigkeys.csv: save the results to the bigkeys.csv file

-sorted: sort from largest to smallest

-threads 4: number of threads to use

/home/redis/dump.rdb: the actual RDB file path

The generated file looks as follows:

The columns are, in order: database number, key type, key name, key size, number of elements, name of the largest element, size of the largest element, and key expiration time. Reference: https://www.cnblogs.com/yqzc/p/12425533.html
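Once the CSV file exists, ranking its rows takes only a few lines. The following is a minimal sketch, assuming the eight columns appear in exactly the order described above and that any header line can be recognized by a non-numeric size field. The column labels, the function name, and the sample rows are made up for illustration and are not taken from the tool's real output.

```python
import csv
import io

# Hypothetical labels for the eight columns described in the text.
COLUMNS = ["db", "type", "key", "size", "elements",
           "max_element", "max_element_size", "expiry"]

def load_big_keys(csv_file, top_n=10):
    """Read rdb_bigkeys-style rows and return the top_n largest keys as dicts."""
    rows = []
    for record in csv.reader(csv_file):
        if len(record) < len(COLUMNS) or not record[3].strip().isdigit():
            continue  # skip a header line or a malformed row
        row = dict(zip(COLUMNS, record))
        row["size"] = int(row["size"])
        rows.append(row)
    return sorted(rows, key=lambda r: r["size"], reverse=True)[:top_n]

# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "0,hash,demo:profile,4096,12,field_a,512,\n"
    "0,string,demo:blob,2048000,0,,0,\n"
)
for row in load_big_keys(sample):
    print(row["type"], row["key"], row["size"])
```

For a real run, the StringIO object would be replaced by open("bigkeys.csv", newline="").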

4. Solution

The root cause of this OOM problem is that a large key made the data unevenly distributed, so the memory of one shard reached maxmemory. Whenever a write is routed to that shard, an OOM error occurs. Export a copy of the shard's RDB file so that the large key can be optimized later.

Interim plan:

To restore service as soon as possible, delete the large key identified in the steps above. (For a non-string large key, do not delete it with del; use hscan, sscan, or zscan to delete its elements gradually.)
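Gradual deletion of a large hash can be sketched as follows. This is a minimal illustration, assuming a client that behaves like redis-py's redis.Redis, whose hscan returns a (cursor, dict) pair; the function name and batch size are hypothetical. Sets and sorted sets could be handled the same way with sscan/srem and zscan/zrem.

```python
def delete_big_hash(client, key, batch=500):
    """Remove a large hash in small HSCAN/HDEL batches, then drop the key itself.

    `client` is assumed to behave like a redis-py `redis.Redis` object.
    Returns the number of fields removed.
    """
    cursor = 0
    removed = 0
    while True:
        # HSCAN returns the next cursor and a batch of field/value pairs.
        cursor, fields = client.hscan(key, cursor=cursor, count=batch)
        if fields:
            removed += client.hdel(key, *fields)  # unpacking the dict yields field names
        if cursor == 0:  # a zero cursor means the scan is complete
            break
    client.delete(key)  # the hash is empty (or tiny) by now, so this is cheap
    return removed
```

Each round trip touches only a bounded number of fields, so the server is never blocked for long the way a single del on a very large hash could block it.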

Long-term plan:

Split the large key into several small keys whose values are value1, value2, ..., valueN. The small keys can then be distributed across different shards, which avoids the uneven data distribution caused by data skew.

Other types of data can be split and reassembled in the same way to avoid the impact of large KEY.
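The splitting idea can be sketched as follows. Redis Cluster maps each key to one of 16384 hash slots using CRC16 of the key (or of its hash tag, when the key contains a non-empty {...} section), so sub-keys with different names generally land in different slots. This is a minimal sketch: the key:0 .. key:N-1 naming scheme and the helper names are hypothetical, and which shard owns a given slot depends on the cluster's own slot assignment.

```python
import binascii

def key_slot(key):
    """Redis Cluster hash slot: CRC16 (XMODEM) of the key, or of its hash tag, mod 16384."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag between the first '{' and the next '}'
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384

def split_value(key, value, parts):
    """Split one large bytes value into `parts` chunks stored under key:0 .. key:N-1."""
    size = -(-len(value) // parts)  # ceiling division
    return {f"{key}:{i}": value[i * size:(i + 1) * size] for i in range(parts)}

def join_value(key, chunks):
    """Reassemble the original value from the chunks produced by split_value."""
    return b"".join(chunks[f"{key}:{i}"] for i in range(len(chunks)))

chunks = split_value("nc_filed/_pk", b"x" * 1000, 4)
for sub_key, chunk in chunks.items():
    print(sub_key, len(chunk), key_slot(sub_key))
```

The sub-keys deliberately avoid a shared hash tag: wrapping the common prefix in braces would force every chunk into the same slot and bring the skew back.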

5. Verify the result

Check the shard monitoring again: the memory utilization of 192.168.1.10 has dropped to 24%. The result is as follows:

Executing set T2 S2 now succeeds. Log in to the cluster and run the get command; the data is returned normally. The results are shown below, and the business has returned to normal.

[optimization and suggestion]

1) Configure node-level alarms on the memory utilization metric. If a node holds a large key, its memory utilization is much higher than that of the other nodes, so the alarm fires and helps users find potential large keys.

2) Configure node-level alarms on the maximum inbound bandwidth, maximum outbound bandwidth, and CPU utilization metrics. If a node holds a hot key, its bandwidth consumption and CPU utilization are higher than those of the other nodes, so the alarm fires and helps users find potential hot keys.

3) Keep string values within 10 KB, and keep hash, list, set, and zset keys to no more than 5000 elements where possible.

4) Regularly check the cluster for large keys with the large key and hot key analysis tools, so that risks are identified as early as possible.
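Suggestion 3 can be turned into a simple check that a periodic job might apply to analysis results. The following is a minimal sketch using the 10 KB and 5000-element guidelines from the list above; the function and constant names are hypothetical.

```python
MAX_STRING_BYTES = 10 * 1024  # 10 KB guideline for string values
MAX_ELEMENTS = 5000           # guideline for hash, list, set and zset

def exceeds_guideline(key_type, size_bytes=0, element_count=0):
    """Return True if a key breaks the size guidelines listed above."""
    if key_type == "string":
        return size_bytes > MAX_STRING_BYTES
    if key_type in {"hash", "list", "set", "zset"}:
        return element_count > MAX_ELEMENTS
    return False  # other types are not covered by these guidelines

print(exceeds_guideline("string", size_bytes=2_048_000))
print(exceeds_guideline("hash", element_count=120))
```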

That concludes this walkthrough of locating and optimizing the Redis large key problem in a distributed cache database. I hope the content above is of some help to you.