2025-04-01 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
In this article, the editor explains in detail how to determine whether Redis has a performance problem and how to solve it. The content is detailed and the steps are clear; I hope it helps resolve your doubts.
Redis is usually a critical component in our business systems: caching, login sessions, leaderboards, and so on.
Once Redis request latency increases, it can trigger an avalanche across the business system.
I once worked at a matchmaking-style Internet company, where we launched a Singles' Day promotion that "sent you a girlfriend" with every order placed.
Who would have thought that after midnight the number of users exploded, a technical failure hit, users could not place orders, and the boss was furious!
Digging in, I found Redis reporting "Could not get a resource from the pool": no connection resources were available, and the connection count on a single Redis node in the cluster was very high.
Large volumes of traffic missed the Redis cache and went straight to MySQL, and the database finally went down.
We tweaked the maximum connection count and the connection wait queue over and over; the error frequency eased, but errors kept being reported.
Later, offline testing revealed that the string values stored in Redis were very large, taking about 1 second on average to return.
As you can see, once Redis latency gets too high, all kinds of problems follow.
Today, let's look at how to determine that Redis has a performance problem, and what the solutions are.
Is there something wrong with Redis performance?
Latency here is the time between the client issuing a command and the client receiving the response. Normally Redis processing time is extremely short, on the order of microseconds.
When Redis latency fluctuates badly, say to several seconds or tens of seconds, it is obvious that Redis has slowed down.
But some hardware configurations are high-end: there we might call 0.6 ms slow, while on poor hardware we might not see a problem until 3 ms.
So how do we define that Redis is really slowing down?
We need to measure the baseline performance of Redis in the current environment: the basic latency of the system under low pressure and without interference.
When you find that Redis runtime latency is more than twice the baseline, you can conclude that Redis has slowed down.
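The "twice the baseline" rule can be expressed as a one-line check (a minimal sketch; the function name is mine, not from Redis):

```python
def is_redis_slow(runtime_latency_us: float, baseline_latency_us: float) -> bool:
    """Flag Redis as slow when runtime latency exceeds twice the measured baseline."""
    return runtime_latency_us > 2 * baseline_latency_us

# With the 3079 us baseline measured below, a 7 ms observation counts as slow:
print(is_redis_slow(7000, 3079))  # True
```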
Latency baseline measurement
The redis-cli command provides the --intrinsic-latency option, which monitors and reports the maximum latency observed during the test (in microseconds); this can be used as the baseline performance of Redis:
redis-cli --intrinsic-latency <test-seconds>
For example, execute the following instructions:
redis-cli --intrinsic-latency 100
Max latency so far: 4 microseconds.
Max latency so far: 18 microseconds.
Max latency so far: 41 microseconds.
Max latency so far: 57 microseconds.
Max latency so far: 78 microseconds.
Max latency so far: 170 microseconds.
Max latency so far: 342 microseconds.
Max latency so far: 3079 microseconds.

45026981 total runs (avg latency: 2.2209 microseconds / 2220.89 nanoseconds per run).
Worst run took 1386x longer than the average latency.
Note: the parameter 100 is the number of seconds the test runs. The longer it runs, the more likely it is to catch peak latency.
Running for 100 seconds is usually enough to surface latency problems; you can also run it several times at different hours of the day to reduce error.
The maximum latency observed is 3079 microseconds (about 3 milliseconds), so the baseline performance is 3079 microseconds.
It is important to note that this is run on the Redis server itself, not on a client, so that the network's impact on baseline performance is excluded.
To include the network, connect to the server with redis-cli --latency -h <host> -p <port>; if you want to gauge the network's impact on Redis performance, you can use iperf to measure the latency from client to server.
If network latency reaches hundreds of milliseconds, other high-traffic programs may be running on the network and causing congestion; ask the operations team to sort out the network's traffic distribution.
Slow instruction monitoring
How to tell if it is a slow instruction?
Check whether the operation's complexity is O(N). The official documentation describes each command's complexity; use O(1) and O(log N) commands whenever possible.
Collection-related operations are generally O(N), such as full-collection reads (HGETALL, SMEMBERS) and collection aggregation operations (SORT, LREM, SUNION, and so on).
Is there monitoring data we can look at? If you did not write the code yourself, you may not know whether anyone is using slow commands.
There are two ways to troubleshoot:
Use the Redis slow log feature to find slow commands
Use the latency-monitor (latency monitoring) tool
In addition, you can use OS tools (top, htop, prstat, etc.) to quickly check the CPU consumption of the Redis main process. If CPU usage is high while traffic is not, it usually indicates that slow commands are in use.
Slow log function
The slowlog command in Redis allows us to quickly locate slow commands that exceed the specified execution time. By default, commands that take longer than 10ms will be logged.
Slowlog records only the command's execution time; it does not include I/O round trips and does not log slow responses caused purely by network latency.
We can set a custom slow-command threshold based on baseline performance (for example, twice the baseline's maximum latency).
You can enter the following command in redis-cli to record commands that take more than 6 milliseconds (6000 microseconds):
redis-cli CONFIG SET slowlog-log-slower-than 6000
It can also be set (in microseconds) in the redis.conf configuration file.
To view commands that executed slowly, use redis-cli and type SLOWLOG get; the third field of each returned entry shows the command's execution time in microseconds.
If you only need the last two slow commands, type SLOWLOG get 2.
Example: get the last 2 slow query commands
127.0.0.1:6381> SLOWLOG get 2
1) 1) (integer) 6
   2) (integer) 1458734263
   3) (integer) 74372
   4) 1) "hgetall"
      2) "max.dsp.blacklist"
2) 1) (integer) 5
   2) (integer) 1458734258
   3) (integer) 5411075
   4) 1) "keys"
      2) "max.dsp.blacklist"
Taking the first entry, the HGETALL command, as an example, each slowlog entity has four fields:
Field 1: an integer, the unique sequence number of this slowlog entry; it increments after the server starts and is currently 6.
Field 2: indicates the Unix timestamp when the query was executed.
Field 3: indicates the number of microseconds in which the query is executed, which is currently 74372 microseconds, about 74ms.
Field 4: represents the commands and parameters of the query. If there are many or large parameters, only part of the number of parameters will be displayed. The current command is hgetall max.dsp.blacklist.
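The four fields above can be turned into structured records for quick triage; a minimal sketch, assuming entries arrive as (id, timestamp, microseconds, args) tuples matching the SLOWLOG output shown earlier (class and field names are my own):

```python
from dataclasses import dataclass

@dataclass
class SlowlogEntry:
    entry_id: int     # field 1: unique, incrementing id
    timestamp: int    # field 2: Unix time the command ran
    duration_us: int  # field 3: execution time in microseconds
    command: list     # field 4: command and arguments

def parse_slowlog(raw_entries):
    """Convert raw (id, ts, micros, args) tuples into records, slowest first."""
    entries = [SlowlogEntry(*e) for e in raw_entries]
    return sorted(entries, key=lambda e: e.duration_us, reverse=True)

# Values taken from the SLOWLOG example above.
raw = [
    (6, 1458734263, 74372, ["hgetall", "max.dsp.blacklist"]),
    (5, 1458734258, 5411075, ["keys", "max.dsp.blacklist"]),
]
slowest = parse_slowlog(raw)[0]
print(slowest.command[0], slowest.duration_us / 1000, "ms")  # keys 5411.075 ms
```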
Latency Monitoring
Redis introduced the Latency Monitoring feature in version 2.8.13 to monitor the frequency of various events at a granularity of seconds.
The first step in enabling the latency monitor is to set a latency threshold in milliseconds; only events exceeding this threshold are recorded. For example, based on three times the baseline performance (3 ms), we set the threshold to 9 ms.
It can be set with redis-cli or in redis.conf.
CONFIG SET latency-monitor-threshold 9
Details of the related events recorded by the tool can be found in the official document: https://redis.io/topics/latency-monitor
For example, to get the most recent latency events:
127.0.0.1:6379> debug sleep 2
OK
(2.00s)
127.0.0.1:6379> latency latest
1) 1) "command"
   2) (integer) 1645330616
   3) (integer) 2003
   4) (integer) 2003
The name of the event
The Unix timestamp of the event's latest latency spike
The latest latency in milliseconds
The maximum latency for this event
How to solve the problem of Redis slowing down?
Redis reads and writes data in a single thread; if an operation on the main thread takes too long, the main thread blocks.
Let's analyze what operations will block the main thread, and how should we solve it?
Delay caused by network traffic
The client connects to Redis over a TCP/IP connection or a Unix domain socket. The typical latency of a 1 Gbit/s network is about 200 us.
A Redis client executes a command in four steps:
Send command -> command queuing -> command execution -> return result
This whole cycle is the round-trip time (RTT). MGET and MSET effectively save RTTs, but most commands do not support batch operation (for example HGETALL has no MHGETALL) and cost N round trips. This is where pipelining comes in.
Redis pipeline connects multiple commands together to reduce the number of network response round trips.
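A rough cost model makes the RTT savings concrete (a sketch with hypothetical numbers; per-command execution time is assumed to be tiny relative to the network):

```python
def total_time_ms(n_commands: int, rtt_ms: float, exec_ms: float, pipelined: bool) -> float:
    """Rough cost model: without pipelining each command pays a full round trip;
    with pipelining all commands share a single round trip."""
    if pipelined:
        return rtt_ms + n_commands * exec_ms
    return n_commands * (rtt_ms + exec_ms)

# Hypothetical numbers: 1000 commands, 0.2 ms network RTT, 0.01 ms execution each.
print(total_time_ms(1000, 0.2, 0.01, pipelined=False))  # ~210 ms
print(total_time_ms(1000, 0.2, 0.01, pipelined=True))   # ~10 ms
```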
Delay caused by slow instruction
Use the slow-command monitoring described above to find slow queries. They can be addressed in two ways:
Run O(N) operations such as aggregations on a replica (slave) in a Cluster, or perform the aggregation on the client side.
Replace them with efficient commands. Use incremental iteration to avoid querying large amounts of data at once: see the SCAN, SSCAN, HSCAN, and ZSCAN commands.
In addition, disable the KEYS command in production; it is only suitable for debugging, because it traverses all key-value pairs and its latency is high.
Delay caused by Fork generating RDB
To generate an RDB snapshot, Redis must fork a background process; the fork call itself runs in the main thread and causes latency.
Redis uses the operating system's copy-on-write (COW) technique to achieve snapshot persistence with a reduced memory footprint; copy-on-write ensures that data can still be modified while the snapshot is being taken.
However, fork must copy the process page tables: a 24 GB Redis instance needs a page table of 24 GB / 4 KB * 8 bytes = 48 MB.
Executing bgsave therefore involves allocating and copying 48 MB of memory.
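The arithmetic above can be checked directly; a small sketch (it assumes 4 KB pages and 8-byte page-table entries, as in the text):

```python
def fork_page_table_bytes(instance_bytes: int, page_size: int = 4096, pte_size: int = 8) -> int:
    """Approximate page-table size fork must copy: one ~8-byte entry per 4 KB page."""
    return instance_bytes // page_size * pte_size

GB, MB = 1 << 30, 1 << 20
print(fork_page_table_bytes(24 * GB) / MB)  # 48.0
```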
In addition, a replica cannot serve reads and writes while loading an RDB, so keep the master's data size at around 2-4 GB so that replicas can load quickly.
Large pages of memory (transparent huge pages)
Regular memory pages are allocated in 4 KB units; since kernel 2.6.38, Linux has supported a huge-page mechanism that allocates 2 MB pages.
Redis uses fork to generate RDB for persistence to provide a guarantee of data reliability.
While generating the RDB snapshot, Redis uses copy-on-write so that the main thread can still accept write requests from clients.
That is, when data is modified, Redis copies the affected page and then modifies the copy.
With huge pages enabled, even if a client modifies only 50 B of data, Redis must copy a whole 2 MB page. With many writes this means massive amounts of copying, and performance slows down.
Disable Linux transparent huge pages with the following command:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Swap: operating system paging
When physical memory (RAM) is insufficient, the kernel swaps some in-memory data out to swap space, so that the system does not run out of memory and hit an OOM kill or something more fatal.
When a process requests more memory than the OS has available, the OS swaps out temporarily unused data to the swap partition; this is called SWAP OUT.
When the process needs that data again and the OS finds free physical memory, it swaps the data from the swap partition back into physical memory; this is called SWAP IN.
Memory swap is the operating system's mechanism for moving memory data back and forth between RAM and disk, and it involves disk reads and writes.
What are the situations that trigger swap?
There are two common situations for Redis:
Redis uses more memory than available memory
Other processes on the same machine as Redis are performing heavy file I/O (including generating large RDB files and AOF rewrites by background threads). File I/O consumes memory, reducing what Redis can get and triggering swap.
How can I troubleshoot if the performance is slow due to swap?
Linux provides a good tool to troubleshoot this problem, so when you suspect delays caused by swapping, just follow these steps.
Get Redis instance pid
$ redis-cli info | grep process_id
process_id:13160
Enter the /proc filesystem directory for this process:
cd /proc/13160
This directory contains a file named smaps, which describes the memory layout of the Redis process. Run the following to grep out the Size and Swap fields:
$ cat smaps | egrep '^(Swap|Size)'
Size: 316 kB
Swap: 0 kB
Size: 4 kB
Swap: 0 kB
Size: 8 kB
Swap: 0 kB
Size: 40 kB
Swap: 0 kB
Size: 132 kB
Swap: 0 kB
Size: 720896 kB
Swap: 12 kB
Each Size line shows the size of one memory region used by the Redis instance, and the Swap line beneath it shows how much of that region has been swapped out to disk.
If Size equals Swap, the region has been swapped out entirely.
Here a 720896 kB region has only 12 kB swapped out, which is not a problem.
Redis uses many memory regions of different sizes, hence the many Size lines, some as small as 4 KB and others as large as 720896 kB; different regions have different amounts swapped out.
Key point: if every Swap value is 0 kB, or there are only sporadic 4 kB entries, everything is fine.
When hundreds of MB or even GB-level swap appears, memory pressure on the Redis instance is very high, and it is very likely to slow down.
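The grep above can be automated; a minimal sketch that sums the Swap fields of smaps text (reading a live /proc/<pid>/smaps is left to the caller):

```python
import re

def swapped_kb(smaps_text: str) -> int:
    """Sum the Swap lines of /proc/<pid>/smaps output to see how much of the
    process has been swapped out, in kB."""
    return sum(int(m) for m in re.findall(r"^Swap:\s+(\d+) kB", smaps_text, re.M))

# Sample taken from the smaps output above.
sample = """Size: 316 kB
Swap: 0 kB
Size: 720896 kB
Swap: 12 kB
"""
print(swapped_kb(sample))  # 12
```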
Solution
Increase machine memory
Run Redis on a separate machine to avoid running processes that require a lot of memory on the same machine, thus meeting the memory requirements of Redis
Increase the number of Cluster clusters, share the amount of data, and reduce the memory required for each instance.
Latency caused by AOF and disk I/O
To ensure data reliability, Redis uses AOF and RDB snapshots for fast recovery and persistence.
The appendfsync configuration controls how AOF performs write or fsync to disk, in three modes (you can change it at runtime with CONFIG SET, e.g. redis-cli CONFIG SET appendfsync no):
no: Redis never calls fsync; the only latency comes from write calls, and write only needs to copy the log into the kernel buffer before returning.
everysec: Redis fsyncs once per second, using a background thread to complete the fsync asynchronously; at most one second of data can be lost.
always: every write operation is fsynced before replying OK to the client (Redis actually tries to batch commands executed at the same time into a single fsync); no data is lost, but performance is usually very low. In this mode, fast disks and filesystems that can complete fsync quickly are strongly recommended.
We usually use Redis as a cache: lost data can be re-fetched from the backing database, so high data reliability is not required. Setting appendfsync to no or everysec is recommended.
In addition, to keep the AOF file from growing too large, Redis rewrites the AOF to produce a compacted file.
You can set the configuration item no-appendfsync-on-rewrite to yes, meaning no fsync is performed while the AOF is being rewritten.
In other words, after writing the command into memory, Redis returns without waiting for the background fsync to complete.
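For a pure-cache deployment, these settings might be combined in redis.conf like this (a sketch; tune them to your own durability requirements):

```conf
# Cache-oriented AOF settings: tolerate up to 1s of loss,
# and skip fsync while an AOF rewrite is in progress.
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite yes
```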
Expiring stale data
Redis has two ways to phase out expired data:
Lazy deletion: when a request arrives and the key is found to have expired, it is deleted then
Periodic deletion: every 100 ms, delete some expired keys
The periodic deletion algorithm works as follows:
Randomly sample ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP keys and delete all the expired ones found
If more than 25% of the sampled keys were expired, repeat step 1
ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP defaults to 20, and the cycle runs 10 times per second, so deleting up to 200 keys per second is not a problem.
But if the second condition keeps triggering, Redis keeps deleting expired data to free memory, and that deletion blocks the main thread.
What is the trigger condition?
A large number of keys set with the same expiration time: in the same second, masses of keys expire and must be deleted over many rounds before the expired fraction drops below 25%.
In short: many keys expiring at the same moment can cause performance fluctuations.
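To see why same-second expiry hurts, here is a toy expected-value model of the sampling loop described above (the function and its parameters are my own illustration, not Redis code):

```python
def expire_cycles(n_expired: int, n_live: int, sample_size: int = 20,
                  stop_ratio: float = 0.25) -> int:
    """Expected-value sketch of the active-expire loop: each cycle samples
    sample_size keys, deletes the expired ones it finds, and repeats while
    more than stop_ratio of the sample was expired."""
    cycles = 0
    while n_expired > 0:
        cycles += 1
        ratio = n_expired / (n_expired + n_live)
        hits = round(sample_size * ratio)  # expected expired keys in the sample
        n_expired -= hits
        if hits / sample_size <= stop_ratio:
            break
    return cycles

# 100k keys all expired in the same second: thousands of blocking cycles.
print(expire_cycles(100000, 0))      # 5000
# The same 1k expired keys spread among 99k live ones: one cycle and done.
print(expire_cycles(1000, 99000))    # 1
```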
Solution
If a batch of keys really does need to expire around the same time, add a random offset within a bounded range to the expiration arguments of EXPIREAT and EXPIRE. This keeps the keys expiring within a nearby time window while avoiding the pressure of simultaneous expiry.
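The random offset might be computed like this (a sketch; the function name and jitter range are mine). With redis-py it could be used as `r.expire(key, jittered_ttl(3600))`, where `r` is an assumed `redis.Redis` client:

```python
import random

def jittered_ttl(base_ttl_s: int, max_jitter_s: int = 300) -> int:
    """Spread expirations over [base_ttl, base_ttl + max_jitter] so a batch of
    keys written together does not all expire in the same second."""
    return base_ttl_s + random.randint(0, max_jitter_s)

# 1000 keys cached for an hour now expire spread over a 5-minute window.
ttls = [jittered_ttl(3600) for _ in range(1000)]
print(min(ttls), max(ttls))
```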
Bigkey
Usually, a key with a very large value, or with a very large number of members or list elements, is called a big key. Some concrete examples:
A Key of type STRING whose value is 5MB (the data is too large)
A Key of type LIST with 10,000 elements (too many elements)
A Key of type ZSET with 10000 members (too many members)
A Key of type HASH with only 1,000 members whose total value size is 10MB (members too large)
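The examples above suggest a simple heuristic classifier (a sketch; the thresholds are illustrative, not Redis's, and should be tuned per workload):

```python
def is_big_key(type_: str, value_bytes: int = 0, members: int = 0,
               max_value_bytes: int = 1 << 20, max_members: int = 5000) -> bool:
    """Heuristic bigkey check: large string values, or collections with too
    many members or too much aggregate value data."""
    if type_ == "string":
        return value_bytes > max_value_bytes
    return members > max_members or value_bytes >= 10 * max_value_bytes

print(is_big_key("string", value_bytes=5 << 20))              # True: 5 MB string
print(is_big_key("hash", members=1000, value_bytes=10 << 20)) # True: 10 MB of members
```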
The problems caused by bigkey are as follows:
Growing Redis memory leads to OOM, or reaching the maxmemory setting causes write blocking or important keys being evicted
One node in a Redis Cluster uses far more memory than the others, and it cannot be rebalanced because the minimum granularity of Cluster data migration is the key
The read request of bigkey takes up too much bandwidth, which slows down itself and affects other services on the server.
Deleting a bigkey causes the master database to block for a long time and causes synchronization interruption or master-slave switching.
Find bigkey
Use the redis-rdb-tools tool to find the large Key in a customized way.
Solution
Split large key
Split a HASH key with tens of thousands of members into multiple HASH keys, keeping each key's member count within a reasonable range. In a Redis Cluster, splitting big keys significantly helps balance memory across nodes.
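A common way to perform such a split is to route each member to a sub-key by hashing it; a sketch (the key-naming scheme and crc32 choice are my own, chosen because crc32 is deterministic across processes):

```python
import zlib

def shard_key(base_key: str, member: str, n_shards: int = 16) -> str:
    """Route a hash member to one of n_shards sub-keys so no single HASH
    grows unbounded; all reads/writes of a member hit the same sub-key."""
    shard = zlib.crc32(member.encode()) % n_shards
    return f"{base_key}:{shard}"

# HSET would target e.g. user:profile:<shard> instead of one giant user:profile.
print(shard_key("user:profile", "uid_12345"))
```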
Asynchronously clean up large key
Since version 4.0, Redis has provided the UNLINK command, which reclaims the given key gradually and in a non-blocking manner. With UNLINK, you can safely delete big keys and even huge keys.
That concludes this article on how to determine Redis performance problems and how to solve them. To really master it, you still need to practice and use these techniques yourself. If you want to read more related articles, welcome to follow the industry information channel.