How to Achieve Fast Recovery and Persistence in Redis Without Fear of Downtime

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

It's right to be a maverick, and it's right to fit into the circle. The point is to figure out what kind of life you want and what price you are willing to pay for it.

We usually use Redis as a cache to improve read response times. Once Redis goes down, all data in memory is lost. If every request then goes straight to the database, the flood of traffic hitting MySQL can cause even more serious problems.

In addition, reading data back from the database to refill Redis is necessarily slower than serving it from Redis, so responses slow down as well.

To recover quickly and survive downtime without fear, Redis has two killer features: the AOF (Append Only File) log and the RDB snapshot.

When learning a technology, we often only come across scattered technical points, without building a complete knowledge framework or a systematic view. That makes learning hard: things seem understood at first, are soon forgotten, and leave us confused.

Follow "MageByte" to master Redis: grasp its core principles and practical skills in depth, build a complete knowledge framework, and learn to take a global view of the whole knowledge system.

This article is hard-core. Bookmark and like it, read it calmly, and I believe you will gain a lot.

In the previous article, we analyzed Redis's core data structures, its IO model, its threading model, and how it picks a suitable encoding for different data, and came to understand the real reasons it is so fast.

This article will focus on the following points:

How to recover quickly after downtime?

How can Redis avoid data loss when it is down?

What is a RDB memory snapshot?

AOF log implementation mechanism

What is copy-on-write?

... .

The knowledge points involved are shown in the figure:

[Figure: Redis panorama]

The panorama can be expanded along two dimensions:

Application dimension: cache usage, cluster usage, clever use of data structures

System dimension: can be summarized as the "three highs"

High performance: thread model, network IO model, data structures, persistence mechanism

High availability: master-slave replication, Sentinel cluster, Cluster sharding

High scalability: load balancing

The Redis series revolves around this mind map. This time we explore the persistence mechanism, one of the secrets behind Redis's high performance.

With the panorama in hand, you have a system view.

A system view really matters. To some extent, having a system view when solving problems means you can locate and solve them systematically and methodically.

RDB memory snapshots for fast recovery after downtime

65 Brother: When Redis goes down for some reason, all traffic hits the backend MySQL. I restart Redis immediately, but its data lives in memory, so why is there still no data after the restart? How can I keep a restart from losing data?

Brother 65, don't worry. "MageByte" will take you step by step through how Redis recovers quickly after a crash.

Redis data is stored in memory, so why not write the in-memory data to disk? When Redis restarts, the data saved on disk is quickly loaded back into memory, and normal service resumes after the restart.

65 Brother: I have a plan: every time a "write" operation changes memory, also write it to disk.

This scheme has a fatal problem: every write instruction goes not only to memory but also to disk, and disk is far slower than memory, so Redis performance would degrade badly.

Memory snapshot

65 Brother: So how do we avoid writing to both at the same time?

We usually use Redis as a cache, so even if Redis does not hold all the data, the data can still be fetched from the database. Redis therefore does not have to persist every single write. For persistence, Redis uses the "RDB data snapshot" to recover quickly from downtime.

65 Brother: Then what is an RDB memory snapshot?

As Redis executes write instructions, the in-memory data changes all the time. A memory snapshot is the state of the data in Redis memory at a particular moment.

It is like taking a photo: the photo freezes time and fully records the scene at that instant.

Redis does something similar: it captures the data at a certain moment as a file and writes it to disk. This snapshot file is called the RDB file; RDB is short for Redis DataBase.

Redis takes RDB memory snapshots periodically, so it does not write to disk on every write instruction, only when a snapshot is taken. This keeps Redis fast and unbreakable while still providing persistence and rapid recovery from downtime.

To recover data, Redis reads the RDB file straight into memory.
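The snapshot-and-restore idea can be sketched in a few lines of Python. This is only a toy illustration, not Redis's RDB format: the function names, the JSON encoding, and the file name dump.json are all assumptions made for the example.

```python
import json
import os
import tempfile


def save_snapshot(data, path):
    """Write the whole dataset to `path` as one point-in-time file."""
    # Write to a temporary file first and rename it afterwards, so a crash
    # in the middle of the write never replaces the previous good snapshot.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_snapshot(path):
    """Rebuild the in-memory dataset from the snapshot file."""
    with open(path) as f:
        return json.load(f)


def restart_from_snapshot(data):
    """Simulate 'snapshot, crash, restart': save, forget, load again."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "dump.json")  # hypothetical file name
        save_snapshot(data, path)
        return load_snapshot(path)
```

Calling restart_from_snapshot({"user:1": "alice"}) returns an equal dictionary, which is the whole point: recovery is a single file read rather than a replay of every write.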

65 Brother: Which data gets snapshotted? And how often should snapshots be taken? That affects how efficiently the snapshot runs.

Not bad, Brother 65, you are starting to think about efficiency. From the previous article we know that Redis's single-threaded model means we should avoid blocking the main thread as much as possible, so RDB file generation must not block the main thread.

RDB generation strategy

Redis provides two instructions for generating RDB files:

save: executed by the main thread, which blocks while it runs

bgsave: calls the glibc function fork to create a child process that writes the RDB file. Snapshot persistence is handed over entirely to the child process while the parent process continues to handle client requests. This is the default way of generating RDB files.

65 Brother: While the memory data is being snapshotted, can it still be modified? That is, can write instructions be handled normally?

First, let's be clear: avoiding blocking is not the same thing as being able to handle writes while the RDB file is generated. If the main thread were merely not blocked, then to keep the snapshot consistent it could only serve reads and could not modify the data being snapshotted.

Obviously, Redis cannot afford to suspend write operations just to generate an RDB file.

65 Brother: So how does Redis handle write requests and generate the RDB file at the same time?

Redis uses the operating system's multi-process copy-on-write (COW) mechanism for snapshot persistence. It is an interesting and little-known mechanism, and knowing it is a good indicator of the breadth of a programmer's knowledge.

During persistence, Redis calls the glibc function fork to create a child process. Snapshot persistence is handed over entirely to the child process, and the parent process continues to handle client requests.

When the child process is first created, it shares the code and data segments in memory with the parent process. At this point you can think of parent and child as conjoined twins sharing one body.

This is how the Linux operating system works: it lets the two processes share as much as possible to save memory. At the moment the processes separate, memory usage barely grows.

The bgsave child process shares all of the main thread's in-memory data, reads it, and writes it to the RDB file.

When the SAVE or BGSAVE command creates a new RDB file, the program checks the keys in the database, and expired keys are not saved to the new RDB file.

When the main thread executes a write instruction, the operating system first copies the affected memory page. The main thread modifies its own copy, while the bgsave child process keeps reading the data as it was at the moment of the fork and writes that to the RDB file.

This preserves the integrity of the snapshot and lets the main thread keep modifying data, so normal business is not affected.

In short, Redis uses bgsave to snapshot all the data currently in memory. The work is done by a child process in the background, which lets the main thread keep modifying data at the same time.
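The effect of fork on a snapshot can be observed with a small Python sketch. It assumes a Unix-like system (os.fork is not available on Windows), and the function names and the JSON file are made up for the illustration; it shows only the visible behavior, not how Redis writes an RDB file.

```python
import json
import os
import tempfile


def background_snapshot(data, path):
    """Fork a child that writes `data` as it looked at the moment of fork."""
    pid = os.fork()
    if pid == 0:
        # Child process: it has its own copy-on-write view of memory, so
        # changes the parent makes after fork() are not visible here.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)  # exit at once, skipping the parent's cleanup code
    return pid  # parent: returns immediately and can keep handling writes


def demo():
    """Show that the child's snapshot ignores writes made after the fork."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "dump.json")
        data = {"counter": 1}
        pid = background_snapshot(data, path)
        data["counter"] = 2   # parent keeps writing while the child saves
        os.waitpid(pid, 0)    # wait only so the demo can read the result
        with open(path) as f:
            return json.load(f), data
```

demo() returns the snapshot read back from disk together with the parent's live data: the snapshot still holds the value from fork time, while the parent already sees its own later write.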

65 Brother: Can an RDB snapshot be taken every second, so that at most 1 second of data is lost in an outage?

Taking full snapshots too frequently has two serious performance costs:

Frequently generating RDB files and writing them to disk puts too much pressure on the disk. The previous RDB may not be finished when the next one starts, creating a vicious cycle.

Forking the bgsave child process blocks the main thread while the fork runs, and the more memory the main process uses, the longer the block.

Advantages and disadvantages

Snapshots recover quickly, but the right frequency for generating RDB files is hard to choose: if it is too low, more data is lost when Redis goes down; if it is too high, it costs extra overhead.

RDB writes to disk in a compressed binary format, which gives small files and fast data recovery.

Besides full RDB snapshots, Redis also provides the AOF write-after log. Next, let's look at what the AOF log is.

AOF write-after log to avoid data loss on downtime

The AOF log stores the sequence of instructions executed by the Redis server, and it records only the instructions that modify memory.

Assuming the AOF log has recorded every modifying instruction since the Redis instance was created, you can restore the in-memory state of the current instance by executing all of those instructions in order on an empty Redis instance. This is called "replay".
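Replay can be illustrated with a minimal Python sketch. The three commands handled here (SET, DEL, INCR) and the in-memory dictionary are assumptions chosen to keep the example short; a real server supports far more commands and data types.

```python
def apply(store, command):
    """Apply one write command (a list of strings) to the in-memory store."""
    name, key, *args = command
    name = name.upper()
    if name == "SET":
        store[key] = args[0]
    elif name == "DEL":
        store.pop(key, None)
    elif name == "INCR":
        store[key] = str(int(store.get(key, "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {name}")


def replay(log):
    """Rebuild the state by running every logged command on an empty store."""
    store = {}
    for command in log:
        apply(store, command)
    return store
```

Replaying the log [SET a 1, INCR a, SET b x, DEL b] on an empty store leaves only the key a with the value "2". Note that all four commands had to run to arrive at a state with a single key.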

Comparison of write-ahead and write-after logs

Write-ahead log (WAL): before the data is actually written, the modification is first written to a log file, which guarantees recovery after a failure.

For example, the redo log in MySQL's InnoDB storage engine records each modification in the log before the data is actually modified.

Write-after log: the "write" instruction is executed first, writing the data to memory, and only then is the log recorded.

Log format

When Redis receives the command "set key MageByte" and writes the data to memory, it appends an entry to the AOF file in the following format: "*3", "$3", "set", "$3", "key", "$8", "MageByte", each on its own line.

"*3": indicates that the current instruction has three parts. Each part begins with "$" plus a number, followed by the content of that part: the instruction, the key, or the value.

"$" plus a number: indicates how many bytes the instruction, key, or value in that part occupies. For example, "$3" means the part contains 3 bytes, here the "set" instruction.

65 Brother: Why does Redis use a write-after log?

A write-after log avoids extra checking overhead: there is no need to check the syntax of a command before logging it, because only commands that executed successfully are logged. With a write-ahead log, the syntax would have to be checked first; otherwise the log could record invalid commands and recovery from the log would fail.

In addition, because the log is written after execution, it does not block the current write instruction.
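To make both the entry format and the "execute first, log afterwards" order concrete, here is a small Python sketch. The class and function names are hypothetical, the store only understands "set key value", and the log is kept in a bytes object instead of a real file.

```python
def encode_command(*parts):
    """Encode a command as '*<count>' followed by '$<bytes>' and each part."""
    chunks = [b"*%d\r\n" % len(parts)]
    for part in parts:
        raw = part.encode("utf-8")
        chunks.append(b"$%d\r\n" % len(raw))
        chunks.append(raw + b"\r\n")
    return b"".join(chunks)


class WriteAfterStore:
    """Toy store: run the command first, append its log entry afterwards."""

    def __init__(self):
        self.data = {}
        self.aof = b""  # stands in for the contents of the AOF file

    def execute(self, *parts):
        if len(parts) != 3 or parts[0].lower() != "set":
            # Rejected before execution, so nothing invalid reaches the log.
            raise ValueError("only 'set key value' is supported in this sketch")
        _, key, value = parts
        self.data[key] = value              # 1. execute in memory
        self.aof += encode_command(*parts)  # 2. then record the log entry
```

encode_command("set", "key", "MageByte") yields the bytes *3, $3, set, $3, key, $8, MageByte, each terminated by a carriage return and line feed. A malformed call such as execute("set", "key") raises an error and leaves the log untouched, which is why the log never needs a separate syntax check.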

65 Brother: So with AOF, is everything foolproof?

Silly boy, it's not that simple. If Redis goes down just after executing an instruction but before logging it, the data for that command may be lost.

Also, AOF avoids blocking the current command but may block the next one. The AOF log is written by the main thread, and if disk pressure is high while the log is being written, the write becomes very slow and blocks subsequent "write" instructions.

Notice that both problems are tied to when the log is written back to disk. If we can reasonably control when the AOF log is written back to disk after a "write" instruction executes, both problems become manageable.

Writeback strategy

To make file writes more efficient, when a program calls the write function to write data to a file, the operating system usually holds the data in a memory buffer and only writes it to disk once the buffer fills up or a time limit expires.

This improves efficiency but puts written data at risk: if the machine goes down, the data still sitting in the memory buffer is lost.

For this reason, the system provides two synchronization functions, fsync and fdatasync, which force the operating system to write the buffered data to disk immediately, keeping the written data safe.

The writeback policy set by Redis's AOF configuration item appendfsync directly determines the efficiency and safety of AOF persistence.

always: synchronous writeback. As soon as a write instruction finishes, the contents of the aof_buf buffer are written to the AOF file and synced to disk.

everysec: writeback every second. After a write instruction executes, the log is only written to the AOF file buffer, and the buffer is synced to disk once per second.

no: controlled by the operating system. After a write instruction executes, the log is written to the AOF file's memory buffer, and the operating system decides when to write it to disk.

No strategy gives the best of both worlds; we have to trade off performance against reliability.

With always, data loss is basically avoided, but every "write" instruction must reach the disk, so it has the worst performance.

With everysec, the cost of synchronous writeback is avoided, but up to about one second of writes can be lost if Redis goes down. It is a compromise between performance and reliability.

With no, subsequent "write" instructions can run as soon as the log is in the AOF file buffer. It has the best performance but may lose a lot of data.

65 Brother: Then how should I choose a strategy?

Choose the writeback strategy according to how much the system needs performance versus reliability. To sum up: for high performance, choose no; for high reliability, choose always; if a little data loss is acceptable but performance must not suffer too much, choose everysec.
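The difference between the three policies comes down to when fsync is called after write. The Python sketch below is a simplified model under stated assumptions: the class name AofWriter, the fsync counter, and the file name are invented for the example, and the everysec branch checks the clock on each append instead of using a background timer as a real server would.

```python
import os
import tempfile
import time


class AofWriter:
    """Append log entries to a file and sync to disk according to a policy."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_sync = time.monotonic()
        self.fsync_calls = 0  # counted only so the effect is observable

    def append(self, payload):
        os.write(self.fd, payload)  # data lands in the OS buffer first
        if self.policy == "always":
            self._sync()            # force it to disk on every write
        elif self.policy == "everysec":
            if time.monotonic() - self.last_sync >= 1.0:
                self._sync()        # at most about once per second
        # "no": never call fsync; the operating system decides when to flush

    def _sync(self):
        os.fsync(self.fd)
        self.fsync_calls += 1
        self.last_sync = time.monotonic()

    def close(self):
        os.close(self.fd)


def count_syncs(policy, payloads):
    """Write the payloads under one policy and report the number of fsyncs."""
    with tempfile.TemporaryDirectory() as directory:
        writer = AofWriter(os.path.join(directory, "appendonly.aof"), policy)
        try:
            for payload in payloads:
                writer.append(payload)
        finally:
            writer.close()
        return writer.fsync_calls
```

Three appends under always cost three fsync calls, while the same appends under no cost none. That gap is the performance price of the stronger guarantee.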

Advantages and disadvantages

Advantages: the log is written only after successful execution, which avoids the overhead of checking instruction syntax, and the current write instruction is not blocked.

Disadvantages: AOF records individual instructions (see the log format above), and every instruction has to be re-executed during fault recovery. If the log file is too large, the whole recovery process is very slow.

In addition, file systems limit file size and cannot store files that are too large, and as the file grows, appending to it becomes less efficient.

Log too large: the AOF rewrite mechanism

65 Brother: What if the AOF log file gets too large?

The AOF write-after log records every "write" instruction. It does not cost as much as a full RDB snapshot, but recovery from it is slower than from RDB, and a large log file causes performance problems of its own. For Redis, a real man who is fast and unbreakable, problems caused by an oversized log can never be tolerated.

So Redis designed another killer feature, the "AOF rewrite mechanism", and provides the bgrewriteaof instruction to slim down the AOF log.

The principle: a child process traverses memory and converts the data into a series of Redis instructions, which are serialized into a new AOF log file. Once serialization is complete, the incremental AOF log entries produced in the meantime are appended to the new file. When the append finishes, the new file immediately replaces the old AOF log file, and the slimming is done.

65 Brother: Why can the AOF rewrite mechanism shrink the log file?

The rewrite mechanism works "many to one": multiple instructions in the old log become a single instruction after the rewrite.

For example, three LPUSH instructions on the same list are rewritten by AOF into one. For keys that have been modified many times, the reduction is even more obvious.
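The "many to one" effect follows from generating the new log from the current data instead of from the old log. A minimal Python sketch, assuming a store that holds only strings and lists, and choosing to emit a single RPUSH per list (an assumption of this example):

```python
def rewrite_from_memory(state):
    """Emit one command per key that recreates the current in-memory state."""
    new_log = []
    for key, value in state.items():
        if isinstance(value, list):
            # One command rebuilds the whole list, however many pushes built it.
            new_log.append(["RPUSH", key, *value])
        else:
            new_log.append(["SET", key, value])
    return new_log
```

Suppose the old log contained LPUSH list a, LPUSH list b, LPUSH list c. In memory the list is now c, b, a, and rewrite_from_memory({"list": ["c", "b", "a"]}) produces the single command RPUSH list c b a.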

65 Brother: After the rewrite, the AOF log is smaller, and what finally gets flushed to disk is the operation log for the latest data in the whole database. Does rewriting block the main thread?

As mentioned above, the AOF log is written back by the main thread, but AOF rewriting is done by a background child process started by bgrewriteaof, precisely to avoid blocking the main thread.

Rewriting process

Unlike the AOF log, which is written back by the main thread, the rewrite is done by a background child process started by bgrewriteaof. This avoids blocking the main thread and degrading database performance.

In short, a rewrite involves one memory copy and two logs: the copy of the Redis data, the old AOF log, and the new AOF rewrite log.

During the rewrite, Redis records each incoming "write" instruction in both the old AOF buffer and the AOF rewrite buffer, so the rewrite log also captures the latest operations. Once all the records for the copied data have been rewritten, the operations held in the rewrite buffer are also written to the new AOF file.

Each time the AOF is rewritten, Redis first makes a memory copy and traverses it to generate the rewrite records. The two logs ensure that data written during the rewrite is not lost and that consistency is maintained.
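The "one copy, two logs" flow can be modeled in a single process. In this Python sketch the class name and methods are hypothetical, a deep copy stands in for the memory view the forked child would have, and only SET is supported.

```python
import copy


class RewritableAof:
    """Toy model of an AOF rewrite with an old log and a rewrite buffer."""

    def __init__(self):
        self.store = {}
        self.aof = []               # the old (current) AOF log
        self.rewrite_buffer = None  # a list only while a rewrite is running
        self._frozen = None         # point-in-time copy used by the rewrite

    def set(self, key, value):
        self.store[key] = value
        command = ["SET", key, value]
        self.aof.append(command)                 # the old log stays complete
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # also kept for the new log

    def start_rewrite(self):
        self._frozen = copy.deepcopy(self.store)  # stands in for the child's view
        self.rewrite_buffer = []

    def finish_rewrite(self):
        new_aof = [["SET", k, v] for k, v in self._frozen.items()]
        new_aof.extend(self.rewrite_buffer)  # writes that arrived meanwhile
        self.aof = new_aof                   # the new log replaces the old one
        self.rewrite_buffer = None
        self._frozen = None
```

If a key is set twice before the rewrite and once during it, the new log contains one command for the copied value plus the one buffered command, so nothing written during the rewrite is lost. If the rewrite were abandoned before finish_rewrite, the old log would still be complete.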

65 Brother: AOF rewrite has its own rewrite log. Why doesn't it just share the existing AOF log?

Good question. There are two reasons:

First, parent and child processes writing the same file would inevitably compete, and controlling that competition would hurt the parent process's performance.

Second, if the AOF rewrite fails, the original AOF file would be contaminated and could no longer be used for recovery. So the rewrite goes to a new file: if it fails, the new file is simply deleted and the original AOF file is unaffected; once it succeeds, the new file directly replaces the old one.

Redis 4.0 hybrid log model

When restarting Redis, we rarely rely on rdb alone to restore the memory state, because a lot of data may be lost. We usually replay the AOF log, but replaying AOF is much slower than loading rdb, so startup takes a long time when the Redis instance is large.

To solve this problem, Redis 4.0 introduced a new persistence option: hybrid persistence. It stores the contents of the rdb file together with an incremental AOF log. The AOF log here is no longer the full log but only the increment produced between the start and the end of persistence, which is usually very small.

So when Redis restarts, it can load the rdb contents first and then replay the incremental AOF log. This completely replaces replaying the full AOF file and greatly improves restart efficiency.

RDB memory snapshots can therefore run at a somewhat lower frequency, with the AOF log recording every "write" operation that happens between two RDB snapshots.

This way, snapshots do not need to run frequently, and because AOF only has to record the "write" instructions between two snapshots rather than all operations, the file does not grow too large.
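The hybrid idea, a snapshot followed by a short command tail, can be sketched as follows. The text layout used here (one JSON line for the snapshot, then one JSON line per command) is purely an assumption of this example and is not the on-disk format Redis uses.

```python
import json


def write_hybrid(snapshot, incremental_commands):
    """Build file contents: a snapshot preamble, then the incremental commands."""
    lines = [json.dumps({"snapshot": snapshot})]
    lines.extend(json.dumps(command) for command in incremental_commands)
    return "\n".join(lines) + "\n"


def load_hybrid(text):
    """Load the snapshot first, then replay the (short) incremental tail."""
    first, *rest = text.splitlines()
    store = dict(json.loads(first)["snapshot"])
    for line in rest:
        name, key, *args = json.loads(line)
        if name == "SET":
            store[key] = args[0]
        elif name == "DEL":
            store.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")
    return store
```

Loading a file built from the snapshot {"a": "1", "b": "2"} and the tail [SET a 9, DEL b] gives {"a": "9"}. Only the two tail commands are replayed; everything else arrives in one bulk load.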
