Redis blocking analysis

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Redis uses a classic single-threaded architecture: all read and write commands are executed in a single main thread. Under high concurrency, any blocking, however brief, is serious for the application: it leads to a large number of timeouts and application-side failures.

1. Redis blocking has two main kinds of causes:

1.1 Internal causes: unreasonable use of APIs or data structures, CPU saturation, and persistence-related blocking

1.2 External causes: CPU competition, memory swapping, and network problems

1.1 Internal reasons:

1.1.1: How to find slow queries: slowlog get [n]. The optional argument n is the number of log entries to return.
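Once the slow log has been fetched, the entries can be ranked by duration. The following is a minimal sketch, assuming each entry has already been read into a tuple of (id, Unix timestamp, duration in microseconds, argument list), which mirrors the leading fields of a slow log entry. The helper name slowest_commands and the sample data are hypothetical.

```python
from typing import List, Tuple

# (id, unix timestamp, duration in microseconds, command arguments)
SlowlogEntry = Tuple[int, int, int, List[str]]


def slowest_commands(entries: List[SlowlogEntry], top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` slowest entries as (command text, duration in ms)."""
    ranked = sorted(entries, key=lambda entry: entry[2], reverse=True)
    return [(" ".join(entry[3]), entry[2] / 1000.0) for entry in ranked[:top]]


# Hypothetical sample data, not taken from a real server.
sample = [
    (12, 1720500000, 15200, ["KEYS", "*"]),
    (13, 1720500003, 10450, ["HGETALL", "user:profiles"]),
    (14, 1720500009, 98000, ["SMEMBERS", "big:set"]),
]

for command, millis in slowest_commands(sample, top=2):
    print(f"{millis:8.1f} ms  {command}")
```

Sorting by duration rather than by id puts the commands most likely to block the main thread first.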

1.1.2: How to find large objects: redis-cli -h {ip} -p {port} --bigkeys

1.1.3: CPU saturation: single-threaded Redis can use only one CPU core when processing commands. CPU saturation means that Redis drives that single core to nearly 100% utilization. A saturated Redis cannot handle more commands, which seriously affects throughput and the stability of the application side.

How to find CPU saturation: redis-cli -h {ip} -p {port} --stat
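As a complement to watching the live statistics, CPU usage can also be estimated from two INFO samples. This is a rough sketch, assuming the used_cpu_sys and used_cpu_user fields of the INFO cpu section were captured as text at two points in time. These counters cover the whole Redis process, so the result is only an approximation of main-thread load. The function names and sample values are hypothetical.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output ("key:value" lines) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# CPU"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def cpu_utilization(before: str, after: str, interval_seconds: float) -> float:
    """Percent of one core consumed between two INFO cpu samples."""
    first, second = parse_info(before), parse_info(after)
    used_before = float(first["used_cpu_sys"]) + float(first["used_cpu_user"])
    used_after = float(second["used_cpu_sys"]) + float(second["used_cpu_user"])
    return (used_after - used_before) * 100.0 / interval_seconds


# Hypothetical samples taken 10 seconds apart.
before = "# CPU\nused_cpu_sys:100.00\nused_cpu_user:200.00\n"
after = "# CPU\nused_cpu_sys:102.00\nused_cpu_user:207.50\n"
print(f"{cpu_utilization(before, after, interval_seconds=10.0):.1f}% of one core")
```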

1.1.4: Persistence-related blocking:

A. fork blocking: the fork operation itself takes too long and blocks the main thread.

Check the latest_fork_usec metric in info stats, which reports the duration of the most recent fork operation in microseconds. If it is long, for example more than 1 second, optimize: avoid overly large memory instances, or avoid Xen virtual machines, on which fork is slow.
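The fork check can be automated along these lines. This is a minimal sketch, assuming the output of info stats is available as text. The 1-second threshold is the example value from the text, and the helper names and sample output are hypothetical.

```python
FORK_WARN_USEC = 1_000_000  # 1 second, the example threshold from the text


def fork_duration_usec(info_stats: str) -> int:
    """Extract latest_fork_usec (microseconds) from `info stats` output."""
    for line in info_stats.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "latest_fork_usec":
            return int(value)
    raise KeyError("latest_fork_usec not found in INFO output")


def fork_is_slow(info_stats: str, threshold_usec: int = FORK_WARN_USEC) -> bool:
    """True when the most recent fork took longer than the threshold."""
    return fork_duration_usec(info_stats) > threshold_usec


# Hypothetical output fragment.
sample_stats = "# Stats\ntotal_connections_received:1200\nlatest_fork_usec:1350000\n"
print(fork_duration_usec(sample_stats), fork_is_slow(sample_stats))
```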

B. AOF flush blocking: with AOF persistence enabled, the usual flush policy is once per second (everysec): a background thread calls fsync on the AOF file every second. When the disk is under heavy load, fsync has to wait until the write completes. If the main thread finds that more than 2 seconds have passed since the last successful fsync, it blocks until the background fsync finishes, to protect data safety. This blocking is caused mainly by disk pressure. The following message appears in the Redis log:

Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
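To see how often this happens, the log can be scanned for the message. A minimal sketch, assuming the Redis log is readable as lines of text; the function name and the sample log lines (including their timestamp prefix) are hypothetical.

```python
from typing import Iterable

# Leading part of the warning quoted above; a substring match avoids
# depending on the exact punctuation of the rest of the message.
AOF_FSYNC_WARNING = "Asynchronous AOF fsync is taking too long"


def count_aof_fsync_warnings(log_lines: Iterable[str]) -> int:
    """Count log lines that report a delayed AOF fsync."""
    return sum(1 for line in log_lines if AOF_FSYNC_WARNING in line)


# Hypothetical log excerpt.
sample_log = [
    "1234:M 09 Jul 2025 10:00:01.120 * Background saving started by pid 4321",
    "1234:M 09 Jul 2025 10:00:03.457 * Asynchronous AOF fsync is taking too long "
    "(disk is busy?). Writing the AOF buffer without waiting for fsync to complete, "
    "this may slow down Redis.",
    "1234:M 09 Jul 2025 10:00:09.002 * Asynchronous AOF fsync is taking too long "
    "(disk is busy?). Writing the AOF buffer without waiting for fsync to complete, "
    "this may slow down Redis.",
]
print(count_aof_fsync_warnings(sample_log))
```

In practice the argument could be an open file object for the log, since iterating a file yields its lines.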

1.2 External reasons:

1.2.1: CPU competition: Redis is a classic CPU-intensive application and should not be deployed on the same machine as other CPU-intensive programs. The top command helps identify both CPU saturation and CPU competition.

1.2.2: CPU binding: bind the Redis process to a CPU to reduce frequent context switching.

Note: CPU binding is not recommended for master nodes that enable persistence or take part in replication, because the parent and child processes would then compete fiercely for the same CPU and affect Redis stability.

1.2.3: Memory swapping: how to check whether Redis memory is being swapped:

a. Find the Redis process ID: redis-cli -p 6384 info server | grep process_id

b. Query the swap information for that process ID: cat /proc/xxxx/smaps | grep Swap

c. If the Swap values are all 0 kB, or only an occasional 4 kB, this is normal.

d. Lower the system's tendency to use swap: adjust swappiness
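Steps b and c can be condensed into a single total. A minimal sketch, assuming the contents of the process's smaps file have been read into a string; the function name and the sample fragment are hypothetical.

```python
import re

# Match "Swap:" lines only; "SwapPss:" lines are deliberately excluded.
SWAP_LINE = re.compile(r"^Swap:\s+(\d+)\s+kB", re.MULTILINE)


def total_swap_kb(smaps_text: str) -> int:
    """Sum the Swap values (in kB) across all mappings in an smaps dump."""
    return sum(int(kb) for kb in SWAP_LINE.findall(smaps_text))


# Hypothetical smaps fragment with two mappings.
sample_smaps = """\
Size:               1024 kB
Rss:                 512 kB
Swap:                  0 kB
SwapPss:               0 kB
Size:               2048 kB
Rss:                1024 kB
Swap:                  4 kB
SwapPss:               4 kB
"""
print(total_swap_kb(sample_smaps), "kB swapped")
```

A total that stays near zero matches the "normal" case described in step c; a large or growing total suggests the instance is being swapped out.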

1.2.4: network problems:

A. Redis connection rejection: Redis limits the number of client connections with the maxclients parameter (default 10000). The rejected_connections field in info stats shows how many connections have been rejected. Clients should use long-lived connections or a connection pool wherever possible. Process limit optimization: set ulimit -n 65535 to prevent "Too many open files" errors.
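Because rejected_connections is a running total, what matters is whether it grows between checks. A minimal sketch, assuming two info stats outputs captured as text at different times; the helper names and sample values are hypothetical.

```python
def rejected_connections(info_stats: str) -> int:
    """Extract the rejected_connections counter from `info stats` output."""
    for line in info_stats.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "rejected_connections":
            return int(value)
    raise KeyError("rejected_connections not found in INFO output")


def newly_rejected(previous: str, current: str) -> int:
    """Connections rejected between two samples of `info stats`."""
    return rejected_connections(current) - rejected_connections(previous)


# Hypothetical samples.
earlier = "# Stats\nrejected_connections:3\n"
later = "# Stats\nrejected_connections:17\n"
print(newly_rejected(earlier, later))
```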

B. Backlog queue overflow: the system default backlog is 128. Optimization: run echo 512 > /proc/sys/net/core/somaxconn to change the system default. If backlog queue overflow is suspected, check the overflow statistics:

Run netstat -s | grep overflowed and check whether the overflow count keeps growing, which indicates that connections are being rejected.
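The growth check can be scripted by comparing two captures of the netstat -s output. A minimal sketch, assuming the Linux wording "times the listen queue of a socket overflowed" for the counter line; the function name and the sample outputs are hypothetical.

```python
import re

OVERFLOW_LINE = re.compile(
    r"^\s*(\d+) times the listen queue of a socket overflowed", re.MULTILINE
)


def listen_queue_overflows(netstat_output: str) -> int:
    """Return the listen-queue overflow counter, or 0 if the line is absent."""
    match = OVERFLOW_LINE.search(netstat_output)
    return int(match.group(1)) if match else 0


# Hypothetical fragments of two `netstat -s` runs.
first_run = "TcpExt:\n    663 times the listen queue of a socket overflowed\n"
second_run = "TcpExt:\n    701 times the listen queue of a socket overflowed\n"

growth = listen_queue_overflows(second_run) - listen_queue_overflows(first_run)
print("overflows since last check:", growth)
```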

C. Network latency: measure it with:

redis-cli -h {host} -p {port} --latency

The statistics reported are the minimum, maximum, and average latency, plus the number of samples.

Network latency generally shows up in deployments that span data centers.
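The same four statistics can be gathered from application code. A minimal sketch, assuming a caller-supplied ping callable that performs one round trip to the server; measure_latency and the stand-in ping below are hypothetical.

```python
import time
from typing import Callable, Dict


def measure_latency(ping: Callable[[], object], samples: int = 100) -> Dict[str, float]:
    """Time `samples` calls to `ping` and report min/max/avg in milliseconds."""
    durations_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()  # one request/response round trip
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(durations_ms),
        "max": max(durations_ms),
        "avg": sum(durations_ms) / len(durations_ms),
        "samples": len(durations_ms),
    }


# Stand-in for a real round trip; replace with a client call that sends PING.
def fake_ping() -> None:
    time.sleep(0.001)


print(measure_latency(fake_ping, samples=5))
```

Measuring from the application host rather than from the Redis host includes the network path the application actually uses.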

D. Network card soft interrupts: a single network card queue can be served by only one CPU. Under high concurrency, all network card traffic is handled by that one CPU, so the other cores go unused. This soft interrupt bottleneck generally appears under high network traffic and shows up as a very high si value in top.

In top, press 1 to display per-CPU usage and check whether si is concentrated on one CPU.
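The per-CPU si figure can also be derived from the kernel counters directly. This is a rough sketch, assuming two snapshots of /proc/stat captured as text, where softirq is the seventh counter on each per-CPU line. The original text only mentions top, so reading /proc/stat is an assumption here, and the function name and sample values are hypothetical.

```python
from typing import Dict, Tuple


def _per_cpu_counters(proc_stat: str) -> Dict[str, Tuple[int, int]]:
    """Map cpuN -> (softirq ticks, total ticks) from /proc/stat text."""
    counters = {}
    for line in proc_stat.splitlines():
        parts = line.split()
        # Per-CPU lines: "cpuN user nice system idle iowait irq softirq steal ..."
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:]]
            # Sum user..steal only; guest time is already counted in user.
            counters[parts[0]] = (values[6], sum(values[:8]))
    return counters


def softirq_share(before: str, after: str) -> Dict[str, float]:
    """Percent of time each CPU spent in softirq between two snapshots."""
    first, second = _per_cpu_counters(before), _per_cpu_counters(after)
    shares = {}
    for cpu, (si_after, total_after) in second.items():
        si_before, total_before = first.get(cpu, (0, 0))
        elapsed = total_after - total_before
        shares[cpu] = (si_after - si_before) * 100.0 / elapsed if elapsed else 0.0
    return shares


# Hypothetical snapshots: cpu0 handles nearly all soft interrupts.
snap1 = "cpu 2000 0 1000 16400 200 0 400 0 0 0\ncpu0 1000 0 500 8000 100 0 400 0 0 0\ncpu1 1000 0 500 8400 100 0 0 0 0 0\n"
snap2 = "cpu 2300 0 1150 17295 200 0 1055 0 0 0\ncpu0 1100 0 550 8200 100 0 1050 0 0 0\ncpu1 1200 0 600 9095 100 0 5 0 0 0\n"
print(softirq_share(snap1, snap2))
```

A share that is high on one CPU and near zero on the others matches the single-queue bottleneck described above.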
