Introduction to Redis replication process

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces the Redis replication process: how a slave node synchronizes with its master node, and how the two are kept consistent afterwards.

The replication function of Redis consists of two steps: synchronization (sync) and command propagation (command propagate):

Synchronization is used to update the database state of the slave server to the database state of the master server.

Command propagation is used to bring the master and slave databases back to a consistent state after the master server's database has been modified and the two have diverged.

Synchronization

Redis uses the psync command to complete master-slave data synchronization. The synchronization process is divided into full replication and partial replication.

Full replication: generally used for initial replication. The master node sends all of its data to the slave node in one go. When the amount of data is large, this places a heavy load on the master node and the network.

Partial replication: used to recover data lost when a brief network interruption breaks master-slave replication. When the slave node reconnects to the master node, the master node resends the lost data if conditions permit. Because the resent data is far smaller than the full data set, the heavy overhead of full replication is avoided.

The psync command depends on the following components:

The replication offsets of the master and slave nodes

The replication backlog buffer of the master node

The run ID of the master node

Both the master and slave nodes that participate in replication maintain their own replication offsets. After processing a write command, the master node adds the byte length of the command to its offset, which is shown as master_repl_offset in info replication. After receiving a command sent by the master node, the slave node likewise adds to its own offset, and it reports that offset to the master node every second. By comparing the replication offsets of the master and slave nodes, we can judge whether their data is consistent.
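As a rough illustration of this offset comparison, the sketch below parses text in the format printed by info replication on a master node and computes how many bytes each slave is behind. The helper name replication_lag_bytes and the sample text are made up for illustration, and real output contains more fields than shown here.

```python
def replication_lag_bytes(info_text):
    """Return {slave_name: bytes_behind} from a master's `info replication` text."""
    master_offset = None
    slave_offsets = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            # e.g. slave0:ip=127.0.0.1,port=6380,state=online,offset=1055130,lag=1
            fields = dict(item.split("=", 1) for item in value.split(","))
            slave_offsets[key] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {name: master_offset - off for name, off in slave_offsets.items()}


# Illustrative sample, not captured from a real server.
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1055130,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=1055214,lag=0
master_repl_offset:1055214
"""

lag = replication_lag_bytes(SAMPLE_INFO)
print(lag)
```

A difference of zero means the slave has received everything the master has written so far. A small non-zero value is normal while commands are in flight.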

The replication backlog buffer is a fixed-length queue stored on the master node. Its default size is 1MB, and it is created when the master node has a connected slave node. When the master node responds to a write command, it not only sends the command to the slave node but also writes it to the replication backlog buffer.

Because the replication backlog buffer has a limited size, it holds only the most recent replication data, which is used to recover data during partial replication or when replicated commands are lost.
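The behaviour of a fixed-length backlog can be sketched in a few lines. This is a simplified model rather than Redis's actual implementation: the class name ReplicationBacklog is hypothetical, the buffer is a trimmed bytearray instead of a true circular buffer, and an offset here simply means "number of bytes the slave already has".

```python
class ReplicationBacklog:
    """Fixed-size buffer that keeps only the most recent replicated bytes."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray()
        self.master_offset = 0  # total bytes ever written by the master

    def feed(self, payload):
        self.data += payload
        self.master_offset += len(payload)
        # Drop the oldest bytes once the fixed size is exceeded.
        if len(self.data) > self.size:
            del self.data[: len(self.data) - self.size]

    def first_offset(self):
        """Offset of the oldest byte still held in the buffer."""
        return self.master_offset - len(self.data)

    def read_from(self, offset):
        """Return the bytes after `offset`, or None if they were overwritten."""
        if offset < self.first_offset() or offset > self.master_offset:
            return None
        return bytes(self.data[offset - self.first_offset():])


backlog = ReplicationBacklog(size=16)
backlog.feed(b"SET a 1\r\n")  # 9 bytes
backlog.feed(b"SET b 2\r\n")  # 9 more bytes; only the last 16 are kept

print(backlog.read_from(9))   # data a slave at offset 9 is still missing
print(backlog.read_from(0))   # None: the oldest bytes are gone
```

A slave whose offset has fallen out of the buffer cannot be repaired from the backlog, which is why a larger backlog makes partial replication more likely to succeed.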

After each Redis node starts, it is dynamically assigned a 40-character hexadecimal string as its run ID. The main purpose of the run ID is to uniquely identify the Redis node. For example, the slave node saves the run ID of the master node to identify which master node it is replicating.
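To make the format concrete, the snippet below produces a string of the same shape as a run ID. It only illustrates the format: new_run_id is a hypothetical helper, and Redis generates its run IDs with its own internal routine, not this one.

```python
import secrets


def new_run_id():
    # 20 random bytes rendered as 40 hexadecimal characters.
    return secrets.token_hex(20)


run_id = new_run_id()
print(run_id, len(run_id))
```

Because the ID is regenerated on every start, a restarted master looks like a different node to its slaves.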

Full synchronization

1) the slave node sends the psync command to synchronize data. Since this is its first replication, the slave node has neither a replication offset nor the run ID of the master node, so the command sent is PSYNC ? -1.

2) the master node determines from PSYNC ? -1 that a full replication is required and replies with a +FULLRESYNC response.

3) the slave node receives the response from the master node and saves the master's run ID and offset.

4) the master node executes bgsave to save the RDB file locally. For more information about RDB, please see "Redis RDB persistence details".

5) the master node sends the RDB file to the slave node, and the slave node saves the received RDB file locally and directly as the data file of the slave node. After receiving the RDB, the slave node prints the relevant log, and you can view the amount of data sent by the master node in the log.

Note that extra care is needed for master nodes with a large amount of data, for example when the generated RDB file exceeds 6GB. If transferring the RDB file takes longer than the value configured by repl-timeout, the slave node gives up receiving the RDB file and cleans up the temporary files it has already downloaded, causing the full replication to fail.

6) from the time the master node starts saving the RDB snapshot until the slave node finishes receiving it, the master node still responds to read and write commands, so it saves the write commands from this period in the replication client buffer. After the slave node has loaded the RDB file, the master node sends the data in the buffer to the slave node to keep the master and slave data consistent.

If the master node takes too long to create and transmit the RDB file, the replication client buffer on the master node may overflow. The default configuration is client-output-buffer-limit slave 256MB 64MB 60. If the buffer stays above 64MB for 60 seconds, or exceeds 256MB at any moment, the master node closes the replication client connection and the full synchronization fails.

7) after receiving all the data sent by the master node, the slave node clears its old data.

8) after clearing its data, the slave node loads the RDB file. For a large RDB file this step is still time-consuming. The total time spent loading the RDB file can be determined from the time difference between the corresponding log entries.

9) the master server that receives the SYNC command executes the BGSAVE command to generate an RDB file in the background, and uses a buffer to record all write commands executed from that point on.

10) when the BGSAVE command has finished, the master server sends the RDB file it generated to the slave server. The slave server receives and loads the RDB file, updating its database state to the state the master server had when it executed the BGSAVE command.

11) the master server sends all write commands recorded in the buffer to the slave server. The slave server executes these write commands, updating its database state to the current state of the master server's database.
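The flow in the steps above can be pictured with a small in-memory simulation. It is only a sketch under strong simplifications: databases are plain dicts, the RDB file is a dict copy, only SET and DEL are supported, and the function names are invented for this example.

```python
def apply_write(db, command):
    """Apply a tiny subset of write commands to a dict-based database."""
    name, key, *args = command
    if name == "SET":
        db[key] = args[0]
    elif name == "DEL":
        db.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {name}")


def full_sync(master_db, writes_during_transfer, slave_db):
    # The master snapshots its data (a stand-in for the RDB file from bgsave).
    snapshot = dict(master_db)
    # Writes arriving meanwhile are applied on the master and also buffered.
    buffered = []
    for command in writes_during_transfer:
        apply_write(master_db, command)
        buffered.append(command)
    # The slave clears its old data and loads the snapshot.
    slave_db.clear()
    slave_db.update(snapshot)
    # Finally the buffered writes are replayed on the slave.
    for command in buffered:
        apply_write(slave_db, command)
    return slave_db


master = {"a": "1", "b": "2"}
slave = {"stale": "x"}
full_sync(master, [("SET", "c", "3"), ("DEL", "a")], slave)
print(slave == master, slave)
```

Without the replay of buffered writes, the slave would end up at the snapshot state and miss everything written while the snapshot was being created and transferred.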

Looking at the whole process, full replication is a very time-consuming operation. Its time cost mainly consists of:

Master node bgsave time

RDB file network transfer time

Time for the slave node to clear its data

Time for the slave node to load the RDB file

Full synchronization not only takes a long time but also involves several persistence-related operations and network transfers, which consume a lot of CPU, memory and network resources on the servers hosting the master and slave nodes. Therefore, apart from the first replication, where full synchronization cannot be avoided, other scenarios should avoid full replication and use partial synchronization instead.
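Two of the failure conditions described for full synchronization lend themselves to a back-of-the-envelope check: an RDB transfer that takes longer than repl-timeout, and a replication client buffer that breaks the client-output-buffer-limit rule. The sketch below follows the article's description of both rules. The function names, the 60-second repl-timeout value, the bandwidth figures and the one-sample-per-second buffer readings are assumptions for illustration.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class OutputBufferLimit:
    hard_bytes: int    # exceeding this once closes the connection
    soft_bytes: int    # staying above this for soft_seconds closes it
    soft_seconds: int


def rdb_transfer_seconds(rdb_bytes, bandwidth_bytes_per_sec):
    return rdb_bytes / bandwidth_bytes_per_sec


def exceeds_repl_timeout(rdb_bytes, bandwidth_bytes_per_sec, repl_timeout=60):
    # repl_timeout=60 is an assumed setting; check the actual configuration.
    return rdb_transfer_seconds(rdb_bytes, bandwidth_bytes_per_sec) > repl_timeout


def buffer_overflows(samples, limit):
    """samples: buffer size in bytes, one reading per second."""
    seconds_above_soft = 0
    for size in samples:
        if size > limit.hard_bytes:
            return True
        if size > limit.soft_bytes:
            seconds_above_soft += 1
            if seconds_above_soft >= limit.soft_seconds:
                return True
        else:
            seconds_above_soft = 0
    return False


limit = OutputBufferLimit(hard_bytes=256 * MB, soft_bytes=64 * MB, soft_seconds=60)

print(exceeds_repl_timeout(6 * 1024 * MB, 100 * MB))  # 6GB at 100MB/s
print(buffer_overflows([65 * MB] * 60, limit))        # above soft limit for 60s
print(buffer_overflows([65 * MB] * 59 + [10 * MB], limit))
```

Estimates like these are useful mainly for deciding whether the timeout or the buffer limit needs to be raised before attaching a slave to a large or write-heavy master.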

Partial synchronization

Partial replication is mainly an optimization Redis makes to avoid the heavy overhead of full replication. It is implemented with the psync {runId} {offset} command. If an abnormal condition such as a brief network interruption or command loss occurs while the slave node is replicating the master node, the slave node asks the master node to resend the lost command data. If that data is still in the master node's replication backlog buffer, it is sent directly to the slave node, which keeps the master and slave nodes consistent. The resent data is generally much smaller than the full data set, so the cost is very small.

1) when the network between the master and slave nodes is interrupted for longer than repl-timeout, the master node considers the slave node to have failed and closes the replication connection.

2) while the master-slave connection is down, the master node still responds to commands, but it cannot send them to the slave node because the replication connection is closed. However, the master node has a replication backlog buffer (repl-backlog-buffer) that still holds the write commands from the most recent period. By default it caches at most 1MB.

3) when the network between the master and slave nodes is restored, the slave node connects to the master node again.

4) once the connection is restored, the slave node sends its previously saved replication offset and the run ID of the master node as psync parameters to the master node, requesting partial replication.

5) after receiving the psync command, the master node first checks whether the runId parameter matches its own run ID. If it does, the slave node was previously replicating this master node. The master node then looks up the offset parameter in its replication backlog buffer. If the data after that offset is still in the buffer, it sends a +CONTINUE response to the slave node, indicating that partial replication can proceed.

6) the master node sends the data in the replication backlog buffer from that offset onward to the slave node, which returns master-slave replication to its normal state.
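The master node's decision in step 5 can be sketched as a small function. This is a simplified model rather than Redis source code: handle_psync and its parameters are hypothetical, and offsets are treated as plain byte counts without the off-by-one details of the real protocol.

```python
def handle_psync(master_run_id, backlog_first_offset, master_offset,
                 req_run_id, req_offset):
    """Return the reply line a master would send for psync {runId} {offset}."""
    full = f"+FULLRESYNC {master_run_id} {master_offset}"
    # A different run ID (including "?" on first replication) forces a full copy.
    if req_run_id != master_run_id:
        return full
    # The requested offset must still be covered by the backlog buffer.
    if not (backlog_first_offset <= req_offset <= master_offset):
        return full
    return "+CONTINUE"


RUN_ID = "a" * 40  # placeholder run ID for the example

# First replication: PSYNC ? -1
print(handle_psync(RUN_ID, 100, 500, "?", -1))
# Reconnect after a short interruption, offset still in the backlog
print(handle_psync(RUN_ID, 100, 500, RUN_ID, 300))
# Reconnect after a long interruption, offset already overwritten
print(handle_psync(RUN_ID, 100, 500, RUN_ID, 50))
```

The third case shows why a slave that stays disconnected for too long falls back to full replication even though it presents the correct run ID.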

Heartbeat detection

After the master and slave nodes establish replication, they maintain a long-lived connection and send heartbeat commands to each other.

The master-slave heartbeat judgment mechanism is as follows:

1) the master and slave nodes both have a heartbeat detection mechanism, and each acts as a client of the other when communicating. The client list command shows the replication-related client information: the master node's connection has flags=M and the slave node's connection has flags=S.

2) by default, the master node sends a ping command to the slave node every 10 seconds to check whether the slave node is alive and connected. The sending interval is controlled by the parameter repl-ping-slave-period.

3) the slave node sends the replconf ack {offset} command every second from its main thread to report its current replication offset to the master node.

The replconf command not only monitors the network status between the master and slave nodes in real time but also reports the replication offset of the slave node. The master node uses the reported offset to check whether replicated data has been lost. If the slave node is missing data, the master node pulls that data from its replication backlog buffer and sends it to the slave node.
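How a master might keep track of these acknowledgements can be sketched as follows. The SlaveTracker class is hypothetical and not part of Redis. It records the time and offset of the latest replconf ack per slave, so that both the seconds since the last heartbeat and the number of missing bytes can be derived. Timestamps are passed in explicitly to keep the example deterministic.

```python
class SlaveTracker:
    """Bookkeeping a master could do for `replconf ack {offset}` messages."""

    def __init__(self):
        self.last_ack = {}  # slave name -> (ack time in seconds, acked offset)

    def on_replconf_ack(self, slave, offset, now):
        self.last_ack[slave] = (now, offset)

    def lag_seconds(self, slave, now):
        """Seconds since the slave last reported its offset."""
        ack_time, _ = self.last_ack[slave]
        return now - ack_time

    def missing_bytes(self, slave, master_offset):
        """Bytes written by the master that the slave has not acknowledged."""
        _, offset = self.last_ack[slave]
        return master_offset - offset


tracker = SlaveTracker()
tracker.on_replconf_ack("slave0", offset=1000, now=10.0)

print(tracker.lag_seconds("slave0", now=13.0))
print(tracker.missing_bytes("slave0", master_offset=1200))
```

With acknowledgements arriving every second, a lag of more than a second or two suggests a slow or broken connection.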

Asynchronous replication and command propagation

The master node is responsible not only for reading and writing data but also for synchronizing write commands to the slave nodes. Write commands are sent asynchronously: the master node returns to the client as soon as it has processed a write command and does not wait for the slave nodes to finish replicating it.

This asynchronous process is handled by command propagation, which not only sends write commands to all slave servers but also appends them to the replication backlog buffer.
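The asynchronous behaviour can be illustrated with a toy model in which the master replies immediately and the slave applies queued commands later. The Master and Slave classes below are invented for this sketch, support only SET, and use in-memory queues where Redis uses network connections.

```python
from collections import deque


class Master:
    def __init__(self):
        self.db = {}
        self.backlog = []       # stand-in for the replication backlog buffer
        self.slave_queues = {}  # slave name -> commands not yet applied

    def attach_slave(self, name):
        self.slave_queues[name] = deque()

    def execute_set(self, key, value):
        self.db[key] = value
        command = ("SET", key, value)
        # Propagate: record in the backlog and queue for every slave.
        self.backlog.append(command)
        for queue in self.slave_queues.values():
            queue.append(command)
        return "OK"  # reply to the client without waiting for any slave


class Slave:
    def __init__(self):
        self.db = {}

    def drain(self, queue):
        # Apply whatever the master has propagated so far.
        while queue:
            _, key, value = queue.popleft()
            self.db[key] = value


master_node = Master()
master_node.attach_slave("s1")
slave_node = Slave()

reply = master_node.execute_set("k", "v")
print(reply, slave_node.db)  # the client has its reply; the slave is still empty

slave_node.drain(master_node.slave_queues["s1"])
print(slave_node.db == master_node.db)
```

The window between the reply and the drain is where master and slave briefly disagree, which is the trade-off asynchronous replication accepts in exchange for low write latency.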
