Detailed explanation of Redis persistent storage (1)


2025-07-19 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Why do you want persistent storage?

Persistent storage means writing the data that Redis holds in memory to disk so that it is preserved permanently. Redis is a memory-based NoSQL database, and data held only in memory is easily lost: a server shutdown or any other abnormal condition wipes whatever was stored in memory.

Persistent storage classification

Redis offers two types of persistent storage: appending to an aof log and taking rdb snapshots of the data.

RDB persistent storage

What is RDB persistent storage

RDB persistent storage saves the data that Redis holds in memory to the local disk in the form of a snapshot.

RDB persistence is divided into manual backup and automatic backup.

1. Manual backup is done through the save command or the bgsave command. save is synchronous and blocks the server until the snapshot has been written. bgsave forks a child process that writes the snapshot, so the server is blocked only for the duration of the fork. In practice, therefore, we mostly use bgsave for backups.

redis> SAVE
OK
redis> BGSAVE
Background saving started

2. Automatic backup

a. The configuration item save m n triggers a backup when at least n changes have been made to the dataset within m seconds.

b. When a slave server requests a full copy from the master server, the master server uses the bgsave command to generate the rdb file and transfers it to the slave server.

c. When the debug reload command is executed, the save command is used to generate the rdb file.

d. When the shutdown command is used to stop the service and aof is not enabled, Redis automatically writes an rdb snapshot before exiting. The optional parameter [nosave | save] after shutdown skips or forces this snapshot.
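The save m n rule in item a can be sketched as a small check. This is a simplified model, not Redis's own code: the names SavePoint and should_autosave are made up for illustration, and the sketch assumes the rule is evaluated against the number of changes and the time elapsed since the last successful save.

```python
from dataclasses import dataclass


@dataclass
class SavePoint:
    seconds: int  # the "m" in "save m n"
    changes: int  # the "n" in "save m n"


def should_autosave(save_points, dirty, seconds_since_last_save):
    """Return True if any configured save point is satisfied.

    dirty: number of changes made since the last successful save.
    seconds_since_last_save: time elapsed since the last successful save.
    """
    return any(
        dirty >= sp.changes and seconds_since_last_save > sp.seconds
        for sp in save_points
    )


# Illustrative values only: "save 900 1", "save 300 10", "save 60 10000".
points = [SavePoint(900, 1), SavePoint(300, 10), SavePoint(60, 10000)]
print(should_autosave(points, dirty=10, seconds_since_last_save=301))
```

Several save lines can be configured at once, and a backup starts as soon as any one of them is satisfied, which is why the sketch uses any().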

Implementation principle of bgsave persistent Storage

1. When the bgsave command is executed, the Redis parent process checks whether a child process is already running and, if so, returns immediately.

2. The parent process forks a child process, and the parent is blocked while the fork is in progress. The info stats command shows the duration of the most recent fork in the latest_fork_usec field, in microseconds.

3. Once the fork completes, the parent returns the message Background saving started and is no longer blocked, so it can continue to serve other commands.

4. The forked child process generates a temporary snapshot file from the parent's in-memory data and then replaces the original file with it. The lastsave command shows when the rdb file was last generated, which corresponds to the rdb_last_save_time field of info.

5. When the backup is complete, the child process notifies the parent process. See the rdb_* fields under info Persistence.
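Step 4 (write a temporary file, then replace the original) is a general pattern for never leaving a half-written snapshot on disk. The sketch below shows the pattern only. It does not produce the rdb format: write_snapshot and the key=value layout are made up for illustration.

```python
import os
import tempfile


def write_snapshot(data: dict, target_path: str) -> None:
    """Write data to a temporary file, then atomically replace target_path."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for key, value in sorted(data.items()):
                tmp.write(f"{key}={value}\n")  # toy format, not real rdb
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, target_path)  # readers see old or new, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies while the temporary file is being written, the previous snapshot is still intact, which is the point of writing to a separate file first.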

Advantages and disadvantages of RDB persistence

Advantages:

1. The file is a snapshot of the data at a point in time, a full backup, which makes the data easy to transfer. For example, to move a backup from server A to server B, we simply copy the rdb file.

2. The file uses a compressed binary format, so loading it when the service restarts is faster than loading an aof file.

Disadvantages:

1. rdb files are stored in a specific binary format (binary, not encrypted). The format has changed across Redis versions, so an rdb file may not be usable by a different Redis version, in particular an older one.

2. Timeliness is poor, which easily leaves the data incomplete. Because rdb is not a real-time backup, if the Redis service fails, everything written since the last snapshot is lost and cannot be recovered.
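The version problem in the first disadvantage can be checked before copying a file between servers. An rdb file begins with the ASCII magic string REDIS followed by a four-digit format version. The sketch below reads that header; the function names and the max_supported_version parameter are hypothetical, and the value to pass depends on the Redis build that will load the file.

```python
def read_rdb_version(header: bytes) -> int:
    """Return the format version stored in the first 9 bytes of an rdb file."""
    if len(header) < 9 or header[:5] != b"REDIS":
        raise ValueError("not an rdb file: missing REDIS magic string")
    version_field = header[5:9]
    if not version_field.isdigit():
        raise ValueError("not an rdb file: version field is not numeric")
    return int(version_field.decode("ascii"))


def can_load(header: bytes, max_supported_version: int) -> bool:
    """True if a server supporting up to max_supported_version can read the file."""
    return read_rdb_version(header) <= max_supported_version


# Usage sketch: read only the header of a dump file (the path is an example).
# with open("./6379-rdb.rdb", "rb") as f:
#     print(read_rdb_version(f.read(9)))
```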

Common ways to deal with RDB files

1. When the disk is full, you can use the following command to switch to a directory on another disk.

// dirName is the new storage directory name (this method also applies to aof files)
config set dir dirName

2. File compression. Compression costs CPU, but it reduces the file size on disk and also reduces the cost of transferring the file, for example during master-slave replication.

// enable or disable compression
config set rdbcompression yes | no

3. Detecting corruption in rdb backup files. You can check an rdb file with the redis-check-rdb tool, which is located in the /usr/local/bin/ directory by default.

[root@syncd redis-data] # /usr/local/bin/redis-check-rdb ./6379-rdb.rdb
[offset 0] Checking RDB file ./6379-rdb.rdb
[offset 26] AUX FIELD redis-ver = '5.0.3'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1552061947'
[offset 67] AUX FIELD used-mem = '852984'
[offset 83] AUX FIELD aof-preamble = '0'
[offset 85] Selecting DB ID 0
[offset] Checksum OK
[offset] \o/ RDB looks OK! \o/
[info] 1 keys read
[info] 0 expires
[info] 0 already expired

AOF persistent storage

What is AOF persistent storage

AOF persistent storage writes the commands collected in the aof_buf buffer to disk in the form of a log. In short, it keeps an operation log: every write command that Redis executes is recorded, and when data needs to be recovered, Redis re-executes the commands in the log file.

How to configure persistent storage

# Change no to yes to enable aof
appendonly no
# Name of the aof file; it is stored in the directory set by the dir configuration item
appendfilename "appendonly.aof"
# Three synchronization strategies (enable only one)
# always: sync to disk immediately after every write command
# appendfsync always
# everysec: sync the file to disk once per second
appendfsync everysec
# no: leave synchronization to the operating system, which usually syncs within 30s at most
# appendfsync no

AOF persistent storage implementation principle

Persisting data by appending to the aof log goes through the following four stages: command write -> file synchronization -> file rewrite -> file reload.

1. Command write: every write command is appended to the aof_buf buffer.

2. File synchronization: the data in the buffer is written to the log file according to the configured synchronization strategy.

3. File rewrite: as the aof file grows larger and larger, it is rewritten according to the configured strategy to compress it and reduce its size.

4. File reload: when Redis restarts, it loads the aof file to recover the data.

Command write

The main purpose of command writing is to record the commands executed by Redis so that they end up in the log file. The log uses the Redis text protocol format. The following example shows how the command set hello world is stored in the aof log file.

*3\r\n$3\r\nset\r\n$5\r\nhello\r\n$5\r\nworld\r\n
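To make the format concrete, here is a minimal sketch of an encoder and decoder for it: *N gives the number of arguments, and each argument is written as $length followed by its bytes. The function names are made up for illustration, and the decoder assumes well-formed input consisting only of commands.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as an array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def decode_commands(data: bytes) -> list:
    """Decode a concatenation of encoded commands back into argument lists."""
    commands = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        line_end = data.index(b"\r\n", pos)
        argc = int(data[pos + 1:line_end])
        pos = line_end + 2
        args = []
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            line_end = data.index(b"\r\n", pos)
            length = int(data[pos + 1:line_end])
            start = line_end + 2
            args.append(data[start:start + length].decode("utf-8"))
            pos = start + length + 2  # skip the trailing \r\n
        commands.append(args)
    return commands


print(encode_command("set", "hello", "world"))
# b'*3\r\n$3\r\nset\r\n$5\r\nhello\r\n$5\r\nworld\r\n'
```

Because every argument carries its own length, values may contain spaces or line breaks without any escaping.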

Why does aof use the text protocol format? According to the available material, the reasons are as follows.

1. The text protocol has good compatibility. As mentioned earlier, rdb files are binary, and different versions may be incompatible; a text protocol avoids this. It also reduces many problems caused by cross-platform use.

2. It is highly readable. Because aof writes commands to a file, we can read the commands directly and even edit the log file.

3. Once aof is enabled, every write command involves an append operation, and using the protocol format directly avoids secondary processing overhead. (I don't quite understand this point. Because aof saves commands, loading the file executes every command again, which should be time-consuming when the file is large, especially without a good rewrite strategy, when many redundant commands are re-executed. The binary rdb format needs no such conversion, so it seems to be rdb that really reduces the secondary overhead.)

File write

File write means writing the commands in the aof_buf buffer to the file. There are three ways to do this, selected by the appendfsync configuration item:

always: after a command is written to the aof_buf buffer, the system fsync operation is called to synchronize it to the aof file. The thread returns once fsync completes.

everysec: after a command is written to the aof_buf buffer, the system write operation is called, and the thread returns once write completes. The fsync operation is called once per second by a dedicated thread.

no: after a command is written to the aof_buf buffer, the system write operation is called, and no fsync is performed on the aof file. Synchronization to disk is left to the operating system, usually within 30s at most.

About the write and fsync system calls:

write triggers the delayed write mechanism. Linux provides a page cache in the kernel to improve disk IO performance, and write returns as soon as the data is in this system buffer. When the data actually reaches the disk depends on the system's scheduling, for example when the buffer pages fill up or a certain time period elapses. If the system goes down before the file is synchronized, the data in the buffer is lost.

fsync forces synchronization to disk for a single file (such as the aof file). It blocks until the data has been written to disk, which guarantees persistence.
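The difference between write and fsync, and how the three policies use them, can be sketched with Python's os module. This is an illustration under simplifying assumptions: AofWriter is a made-up class, and for everysec it calls fsync inline when at least one second has passed instead of using a separate thread.

```python
import os
import time


class AofWriter:
    """Append-only writer that applies an appendfsync-style policy."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.clock = clock
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = clock()
        self.fsync_count = 0

    def append(self, data: bytes) -> None:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(self.fd, view)  # data lands in the OS page cache
            view = view[written:]
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and self.clock() - self.last_fsync >= 1.0:
            self._fsync()
        # policy "no": never call fsync, the OS decides when to flush

    def _fsync(self) -> None:
        os.fsync(self.fd)  # block until the data is on disk
        self.last_fsync = self.clock()
        self.fsync_count += 1

    def close(self) -> None:
        os.close(self.fd)
```

With always, every append pays for a disk flush. With no, appends return as soon as the page cache has the data, and anything not yet flushed is lost if the machine goes down.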

Analysis of file writing strategy

Configured as always. The aof file is synchronized on every write. On a typical SATA hard disk, Redis can then sustain only a few hundred write TPS, which clearly runs counter to Redis's high-performance design. This configuration is not recommended.

Configured as no. Because the interval at which the operating system synchronizes the aof file is not controllable, and each synchronization has to flush more data, performance improves but data safety cannot be guaranteed.

Configured as everysec. This is the recommended strategy and the default configuration, balancing performance and data safety. In theory, only about 1 second of data is lost if the system suddenly goes down.

File rewrite

1. Why rewrite the file?

Because aof appends to a log, the aof file keeps growing as Redis keeps executing write commands. Redis therefore introduced an aof rewrite mechanism to reduce the size of the aof file. An aof rewrite converts the data inside the Redis process into write commands and saves them to a new aof file. (Why the data in the process is converted into commands is not entirely clear to me and needs further study. My personal understanding is that the contents of the old aof file are optimized according to the rewrite strategy to produce a new aof file.)

2. What are the benefits of file rewriting?

File rewriting reduces the size of the file and removes invalid operations, which in turn speeds up loading the file. The main optimizations are as follows.

a. Data that is no longer valid in the process, such as expired keys, is not written to the new file.

b. Invalid commands in the old file, such as del key1, are dropped.

c. Multiple operations are merged. For example, lpush list a and lpush list b can be simplified to lpush list a b.
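One way to see why a rewrite starts from the data in memory rather than from the old file is to model it: replay the log into a dataset, then emit one command per surviving key. The sketch below is a toy model that supports only set, del, lpush and rpush. The function names are made up, and it emits rpush with the elements in head-to-tail order, which yields the same list as the merged lpush described in item c.

```python
def replay(commands):
    """Apply a list of commands to an empty in-memory dataset."""
    data = {}
    for name, *args in commands:
        name = name.lower()
        if name == "set":
            key, value = args
            data[key] = value
        elif name == "del":
            for key in args:
                data.pop(key, None)
        elif name in ("lpush", "rpush"):
            key, *items = args
            lst = data.setdefault(key, [])
            if not isinstance(lst, list):
                raise ValueError(f"key {key!r} does not hold a list")
            for item in items:
                if name == "lpush":
                    lst.insert(0, item)  # each item becomes the new head
                else:
                    lst.append(item)
        else:
            raise ValueError(f"unsupported command: {name}")
    return data


def rewrite(commands):
    """Return a minimal command list that rebuilds the same dataset."""
    rewritten = []
    for key, value in replay(commands).items():
        if isinstance(value, list):
            if value:
                rewritten.append(("rpush", key, *value))
        else:
            rewritten.append(("set", key, value))
    return rewritten


log = [
    ("set", "k1", "v1"), ("set", "k1", "v2"),
    ("set", "k2", "x"), ("del", "k2"),
    ("lpush", "list", "a"), ("lpush", "list", "b"),
]
print(rewrite(log))
# [('set', 'k1', 'v2'), ('rpush', 'list', 'b', 'a')]
```

Six logged commands shrink to two, and overwritten values, deleted keys and repeated pushes leave no trace in the new file.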

3. How is a file rewrite triggered?

A file rewrite can be triggered manually or automatically.

Manual trigger: call the bgrewriteaof command directly. The command blocks only while the child process is being forked.

Automatic trigger, controlled by two configuration items:

auto-aof-rewrite-min-size: the minimum size the aof file must reach before it is rewritten. The default is 64MB.

auto-aof-rewrite-percentage: the growth of the current aof file size (aof_current_size) relative to the size of the aof file after the last rewrite (aof_base_size).

Automatic trigger condition = aof_current_size > auto-aof-rewrite-min-size && (aof_current_size - aof_base_size) / aof_base_size >= auto-aof-rewrite-percentage

aof_current_size and aof_base_size can be viewed in the info Persistence statistics.
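The trigger condition can be written out as a small function, together with a parser for the key:value lines that info returns. This is a sketch under stated assumptions: parse_info and should_rewrite are made-up names, auto-aof-rewrite-percentage is taken to be expressed in percent (so 100 means the file has doubled), and a base size of 0 is treated as 1 to avoid dividing by zero.

```python
def parse_info(text: str) -> dict:
    """Parse info-style output: 'key:value' lines, '# Section' headers ignored."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def should_rewrite(current_size, base_size, min_size, percentage):
    """Evaluate the automatic aof rewrite condition."""
    if current_size <= min_size:
        return False
    base = base_size or 1
    growth = (current_size - base) * 100 / base  # growth in percent
    return growth >= percentage


MB = 1024 * 1024
# The sample text below is invented for the example, not real server output.
info = parse_info("# Persistence\r\naof_current_size:136314880\r\naof_base_size:67108864\r\n")
print(should_rewrite(int(info["aof_current_size"]), int(info["aof_base_size"]), 64 * MB, 100))
# True: the file grew from 64MB to 130MB, more than 100 percent
```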

4. How is file rewriting implemented?

1. The rewrite command is executed, and Redis checks whether a child process already exists.

If a child process is already performing an aof rewrite, the following message is returned.

ERR Background append only file rewriting already in progress

If a child process is already performing a bgsave, the rewrite is postponed until the bgsave completes, and the following message is returned.

Background append only file rewriting scheduled

2. The parent process forks a child process and is blocked while the fork is in progress.

3. Once the fork completes, the parent is no longer blocked and continues to respond to other commands. New write commands are still synchronized according to the file writing policy, so the existing aof mechanism keeps working correctly (figure 3.1).

4. Because fork uses copy-on-write, the child process shares only the memory data as it was at the moment of the fork and cannot see data written afterwards. Since the parent process keeps responding to commands during the rewrite, Redis stores the new write commands in the aof rewrite buffer (figure 3.2).

5. The child process writes the data to the new aof file according to the rewrite rules. The amount of data written to disk in one batch is limited, controlled by the aof-rewrite-incremental-fsync configuration item, with a default of 32MB, which reduces the disk blocking caused by a single flush.

6. When the child process finishes the rewrite, it notifies the parent process, which updates its statistics. See the aof_* fields under info persistence.

7. The parent process writes the data in the aof rewrite buffer to the new aof file (figure 5.2).

8. The old aof file is replaced with the new aof file.

Steps 3 and 4 are not easy to understand. What puzzled me is why the parent process, while still writing new commands to the old aof file, also needs the aof rewrite buffer. My personal understanding is that the parent process keeps writing new commands to the old aof file according to the normal strategy and, at the same time, writes them to the rewrite buffer; in 5.2 the buffered data is written to the new aof file, which guarantees the integrity of the data.
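The double write discussed above can be modelled in a few lines. This is a toy simulation, not how Redis is implemented: RewriteSimulation is a made-up class, only set-style writes are modelled, and the fork is imitated by copying a dictionary.

```python
class RewriteSimulation:
    """Toy model of the parent process during an aof rewrite."""

    def __init__(self):
        self.data = {}              # live dataset in the parent
        self.old_aof = []           # commands appended to the current aof file
        self.rewrite_buffer = None  # None while no rewrite is running
        self.child_snapshot = None  # what the child sees after the fork

    def execute(self, key, value):
        """Handle one write command in the parent."""
        self.data[key] = value
        command = ("set", key, value)
        self.old_aof.append(command)             # normal aof path keeps working
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # also remember it for the new file

    def start_rewrite(self):
        """Imitate fork: the child sees the data as of this moment only."""
        self.child_snapshot = dict(self.data)
        self.rewrite_buffer = []

    def finish_rewrite(self):
        """Child output plus buffered commands becomes the new aof file."""
        new_aof = [("set", k, v) for k, v in self.child_snapshot.items()]
        new_aof.extend(self.rewrite_buffer)
        self.old_aof = new_aof      # replace the old file with the new one
        self.rewrite_buffer = None
        self.child_snapshot = None
        return new_aof


sim = RewriteSimulation()
sim.execute("a", "1")
sim.execute("a", "2")
sim.start_rewrite()
sim.execute("c", "3")  # arrives while the child is rewriting
print(sim.finish_rewrite())
# [('set', 'a', '2'), ('set', 'c', '3')]
```

In this model both writes are needed. The old file stays complete in case the rewrite fails or the server stops before the swap, and the rewrite buffer carries the commands the child could not see, so the new file is complete once it replaces the old one.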

File reload

File reload means loading the file back into the Redis service, for example to recover data when the Redis service restarts. Redis's reload mechanism is very complete; the specific process is as follows.

Dealing with common problems in AOF files

1. File corruption

When a corrupted file is loaded, the following message may be shown.

Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix

In this case we can repair the file with the redis-check-aof --fix command (remember to back up the file first). After the repair, use diff -u to compare the repaired file with the backup and find out which data was lost.

2. File loading is incomplete

This can happen when the Redis service fails while data is being written, leaving an incomplete command at the end of the file. Redis handles this case through the aof-load-truncated configuration item, which is enabled by default.
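To illustrate what an incomplete tail looks like, the sketch below scans a command-only aof byte string and reports how many leading bytes form complete commands. It is not the algorithm used by redis-check-aof or by aof-load-truncated. The function names are made up, and the sketch assumes a plain file that contains nothing but commands in the text protocol format.

```python
def _read_length(data: bytes, pos: int, marker: bytes):
    """Parse '<marker><number>\\r\\n' at pos and return (number, next_pos)."""
    if data[pos:pos + 1] != marker:
        raise ValueError(f"expected {marker!r} at offset {pos}")
    end = data.find(b"\r\n", pos)
    if end == -1:
        raise ValueError("truncated length line")
    return int(data[pos + 1:end]), end + 2


def complete_prefix_length(data: bytes) -> int:
    """Return the offset just past the last complete command in data."""
    good = 0
    pos = 0
    try:
        while pos < len(data):
            argc, pos = _read_length(data, pos, b"*")
            for _ in range(argc):
                length, pos = _read_length(data, pos, b"$")
                end = pos + length
                if length < 0 or data[end:end + 2] != b"\r\n":
                    raise ValueError("truncated argument")
                pos = end + 2
            good = pos  # one more command parsed completely
    except ValueError:
        pass  # stop at the first incomplete or malformed command
    return good


first = b"*3\r\n$3\r\nset\r\n$1\r\na\r\n$1\r\n1\r\n"
second = b"*2\r\n$3\r\ndel\r\n$1\r\na\r\n"
damaged = first + second[:-5]  # the write of the last command was cut short
print(complete_prefix_length(damaged) == len(first))
# True
```

Everything after the returned offset is the part that would have to be discarded, which is why comparing the repaired file against a backup shows what was lost.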

Advantages and disadvantages of AOF

Advantages:

Multiple file synchronization (fsync) strategies are available.

Data is saved close to real time, and data integrity is strong. Even if some data is lost, with the recommended strategy at most about one second of data is lost.

Strong readability. Because the data is stored in the text protocol format, you can read the recorded commands directly and even edit them by hand.

Disadvantages:

The file is large and loads more slowly than rdb. Because aof records a log of Redis operations, invalid and redundant operations are recorded too, which makes the aof file too large. This can, however, be mitigated by the file rewrite strategy.

Choose AOF or RDB for data persistence

1. Depending on the situation, it is usually recommended to combine the two methods.

2. Use aof when the requirements for data safety and integrity are high.

3. rdb is sufficient for less important data.

4. For full backups of the data, use rdb, which makes backups convenient.

The original text is transferred from the official Wechat account: prodigal son programming goes in all directions.
