What is the underlying principle of Redis master-slave replication? 07/06 Update SLTechnology News&Howtos

2025-07-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

Many newcomers are not clear about the underlying principle of Redis master-slave replication. To help with this, the following explains it in detail; I hope readers who need it will find it useful.

Principle of replication

1. Replication process

The steps of the replication process are as follows:

1. Execute the slaveof command on the slave node.

2. The slave node only saves the master's address given in the slaveof command; it does not start replication immediately.

3. A periodic (timed) task inside the slave node finds the saved master information and opens a socket connection to the master.

4. Once the connection is established, the slave sends a ping command and expects a pong reply; if it gets none, it drops the connection and reconnects.

5. If the master requires a password (requirepass), the slave must authenticate (using masterauth); if authentication fails, replication is terminated.

6. After authentication succeeds, the data set is synchronized. This is the most time-consuming step: the master sends all of its data to the slave.

7. Once the master has synchronized its current data to the slave, replication is established. From then on, the master continuously sends write commands to the slave to keep master and slave data consistent.
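The setup steps above can be modeled as a small slave-side loop. The sketch below is a toy, not Redis's implementation: FakeMaster, its scripted pongs list and setup_replication are hypothetical names, there is no real networking, and max_attempts exists only to keep the loop finite (a real slave keeps retrying).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeMaster:
    """Hypothetical stand-in for a master; no real sockets involved."""
    requirepass: Optional[str] = None
    # Scripted replies to successive PINGs; None models a missing reply.
    pongs: List[Optional[str]] = field(default_factory=lambda: ["PONG"])

    def ping(self) -> Optional[str]:
        return self.pongs.pop(0) if self.pongs else None


def setup_replication(master: FakeMaster, masterauth: Optional[str] = None,
                      max_attempts: int = 3) -> List[str]:
    """Walk through steps 2-7 and return a log of what happened."""
    log = ["saved master address"]                    # step 2: slaveof only stores it
    for attempt in range(1, max_attempts + 1):
        log.append(f"timed task: connect attempt {attempt}")  # step 3
        if master.ping() != "PONG":                   # step 4: no PONG -> reconnect
            log.append("no PONG, reconnecting")
            continue
        if master.requirepass is not None and masterauth != master.requirepass:
            log.append("auth failed, replication terminated")  # step 5
            return log
        log.append("full data sync")                  # step 6
        log.append("streaming write commands")        # step 7
        return log
    log.append("gave up (toy limit)")
    return log


print(setup_replication(FakeMaster(pongs=[None, "PONG"])))
```

The first ping in the demo gets no reply, so the log shows one reconnect before the sync proceeds.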

2. Data synchronization

One step of the replication process above is synchronizing the data set; this section covers that step, data synchronization, in more detail.

Redis has two synchronization commands, sync and psync. sync is the synchronization command used before Redis 2.8; psync was introduced in Redis 2.8 as an optimized redesign of sync. This section focuses on psync.

psync relies on three components:

1. The replication offsets of the master and slave nodes

2. The replication backlog buffer on the master node

3. The master node's run ID

Replication offsets of the master and slave nodes:

1. Every master and slave taking part in replication maintains its own replication offset.

2. After processing a write command, the master adds the command's length in bytes to its offset. The value is shown as master_repl_offset in the output of info replication.

3. The slave reports its replication offset to the master every second, so the master also keeps track of each slave's offset.

4. After receiving commands from the master, the slave likewise adds their length to its own offset, which is also shown in info replication (as slave_repl_offset).

5. Comparing the replication offsets of master and slave tells you whether their data is consistent.
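As a rough illustration of offset bookkeeping, the sketch below encodes commands in the RESP wire format (the protocol Redis speaks on the replication link) and adds their byte lengths to per-node counters. resp_encode and Node are hypothetical helpers; the real replication stream also carries commands such as PING and SELECT that advance the offset, which this sketch ignores.

```python
def resp_encode(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


class Node:
    """Hypothetical node that only tracks its replication offset."""

    def __init__(self) -> None:
        self.repl_offset = 0

    def account(self, payload: bytes) -> None:
        # Add the byte length of each propagated or received command.
        self.repl_offset += len(payload)


master, slave = Node(), Node()
stream = [resp_encode("SET", "k", "v"), resp_encode("INCR", "n")]
for payload in stream:
    master.account(payload)          # master counts every command it propagates
slave.account(stream[0])             # slave has only received the first one so far
lag_bytes = master.repl_offset - slave.repl_offset
print(master.repl_offset, slave.repl_offset, lag_bytes)   # 48 27 21
```

A non-zero difference means the slave has not yet applied everything the master has propagated.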

The master's replication backlog buffer:

1. The replication backlog buffer is a fixed-size first-in-first-out queue kept on the master. Its default size is 1MB (set by repl-backlog-size).

2. The queue is created when a slave connects. When the master handles a write command, it sends the command to the slave and also writes it into the backlog buffer.

3. It is used for partial replication and to recover replicated commands that were lost. The related statistics can be seen in info replication.
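The backlog's job can be sketched as a bounded byte FIFO that remembers the total number of bytes ever written, so it can answer "what has a slave at offset N missed?". ReplBacklog and its methods are hypothetical; this version simply deletes from the front of a bytearray, whereas a production implementation would more likely use a circular buffer.

```python
from typing import Optional


class ReplBacklog:
    """Hypothetical model of the master's replication backlog (a FIFO of bytes)."""

    def __init__(self, size: int = 1024 * 1024):   # 1MB, the default from the text
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0                       # total bytes ever fed in

    def feed(self, payload: bytes) -> None:
        """Record a propagated write command; evict the oldest bytes when full."""
        self.buf += payload
        self.master_offset += len(payload)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]

    @property
    def first_byte_offset(self) -> int:
        # 1-based offset of the oldest byte still held in the buffer.
        return self.master_offset - len(self.buf) + 1

    def since(self, slave_offset: int) -> Optional[bytes]:
        """Bytes after slave_offset, or None if they are gone (full resync needed)."""
        if slave_offset > self.master_offset or slave_offset + 1 < self.first_byte_offset:
            return None
        start = slave_offset - (self.master_offset - len(self.buf))
        return bytes(self.buf[start:])


backlog = ReplBacklog(size=10)                  # tiny size so eviction is visible
backlog.feed(b"abcdef")
backlog.feed(b"ghijkl")                        # "ab" is evicted
print(backlog.since(5))                        # b'fghijkl'
print(backlog.since(1))                        # None: byte 2 was evicted
```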

The master's run ID:

1. Every Redis instance generates a 40-character hexadecimal run ID when it starts.

2. The run ID uniquely identifies a Redis node. Identifying the master by ip+port alone is not safe: if the master restarts after its RDB/AOF data has been modified, it is unsafe for the slave to continue replicating from its old offset. Therefore, when the run ID changes, the slave performs a full copy. In other words, by default a restart of the master causes its slaves to replicate in full.

What if you want to restart without changing the run ID?

1. The debug reload command reloads the RDB while keeping the run ID unchanged, which avoids an unnecessary full replication.

2. Its drawback is that debug reload blocks the main thread of the current Redis node, so use it cautiously on masters holding a lot of data or on nodes that cannot tolerate blocking. In general, this problem can instead be handled through the failover mechanism.

Using the psync command:

The command format is psync {runId} {offset}

runId: the run ID of the master that the slave is replicating

offset: the replication offset the slave has reached so far

psync execution flow: the slave sends a psync command to the master. runId is the run ID of the target master saved by the slave; if none is saved, it defaults to ?. offset is the replication offset saved by the slave; for the first replication it defaults to -1.

The master decides how to reply based on runId and offset:

1. If it replies +FULLRESYNC {runId} {offset}, the slave starts the full replication process.

2. If it replies +CONTINUE, the slave starts partial replication.

3. If it replies with an error (-ERR), the master is older than 2.8 and does not support psync, so the slave uses sync to perform a full copy.
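The request and reply rules above can be sketched as three small functions. This is a simplified model under stated assumptions: psync_request, master_reply and slave_action are hypothetical names, the backlog is summarized only by the offset of its oldest byte, and the exact text of an old master's error reply is not modeled. A run ID is 40 hexadecimal characters; secrets.token_hex(20) produces a string of that shape for the demo.

```python
import secrets
from typing import Optional


def psync_request(saved_run_id: Optional[str], saved_offset: int = -1) -> str:
    """Slave side: build the psync command from what it has saved."""
    if saved_run_id is None:
        return "psync ? -1"            # first replication: nothing saved yet
    return f"psync {saved_run_id} {saved_offset}"


def master_reply(request: str, run_id: str, master_offset: int,
                 backlog_first_byte: int, supports_psync: bool = True) -> str:
    """Master side: choose between partial and full resync."""
    if not supports_psync:
        return "-ERR"                  # pre-2.8 master: psync is unknown
    _, req_run_id, req_offset = request.split()
    offset = int(req_offset)
    same_master = req_run_id == run_id
    # The missing bytes start at offset + 1; they must still be in the backlog.
    in_backlog = backlog_first_byte <= offset + 1 and offset <= master_offset
    if same_master and in_backlog:
        return "+CONTINUE"
    return f"+FULLRESYNC {run_id} {master_offset}"


def slave_action(reply: str) -> str:
    if reply.startswith("+FULLRESYNC"):
        return "full replication"
    if reply.startswith("+CONTINUE"):
        return "partial replication"
    return "fall back to sync"         # error reply: full copy with sync


run_id = secrets.token_hex(20)         # 40 hex characters
print(slave_action(master_reply(psync_request(None), run_id, 5000, 4000)))          # full replication
print(slave_action(master_reply(psync_request(run_id, 4500), run_id, 5000, 4000)))  # partial replication
```

A mismatched run ID (for example after a master restart) or an offset that has already fallen out of the backlog both lead to +FULLRESYNC.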

That completes data synchronization. The section is fairly long because it focuses mainly on the psync command.

3. Full replication

Full replication is the earliest replication mode supported by Redis, and every master-slave pair must go through it when replication is first established. Full replication is triggered by either sync or psync. As mentioned earlier, version 2.8 is the dividing line between the two commands: before 2.8, sync is used and only full synchronization is possible; from 2.8 on, both full and partial synchronization are supported.

The full replication process consists of these steps:

1. The slave sends a psync command (psync ? -1).

2. Based on the command, the master replies +FULLRESYNC.

3. The slave records the master's run ID and offset.

4. The master runs bgsave and saves an RDB file locally.

5. The master sends the RDB file to the slave.

6. The slave receives the RDB file and loads it into memory.

7. While the slave is receiving data, the master saves new write commands in the replication client buffer and sends them to the slave once the slave has loaded the RDB. (If the slave takes too long, this buffer, whose limit is set by client-output-buffer-limit slave, overflows and the full synchronization fails.)

8. The slave flushes its old data before loading the RDB file. If the RDB file is very large, this step is also time-consuming, and clients reading from the slave during it may see inconsistent data. Setting slave-serve-stale-data to no stops the slave from serving such requests.

9. After the slave has loaded the RDB successfully, it immediately runs bgrewriteaof if AOF is enabled.
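Steps 4 to 9 above hinge on one detail: writes that arrive while the RDB is in flight must be buffered and replayed after the slave loads the snapshot. The toy model below shows that ordering. FullResyncMaster, slave_load and buffer_limit are hypothetical; the limit counts commands rather than bytes, loosely standing in for the real client-output-buffer-limit setting, and the "RDB" is just a dict copy.

```python
from typing import Dict, List, Tuple


class FullResyncMaster:
    def __init__(self, data: Dict[str, int], buffer_limit: int):
        self.data = dict(data)
        self.buffer_limit = buffer_limit      # hypothetical, counted in commands
        self.pending: List[Tuple[str, int]] = []
        self.sync_failed = False

    def start_full_resync(self) -> Dict[str, int]:
        # Step 4: bgsave takes a point-in-time snapshot (here a plain copy).
        self.pending = []                     # step 7: per-slave buffer starts empty
        return dict(self.data)

    def write(self, key: str, value: int) -> None:
        self.data[key] = value                # clients keep writing to the master
        if self.sync_failed:
            return
        self.pending.append((key, value))
        if len(self.pending) > self.buffer_limit:
            self.sync_failed = True           # buffer overflow: full sync fails
            self.pending = []

    def drain(self) -> List[Tuple[str, int]]:
        cmds, self.pending = self.pending, []
        return cmds


def slave_load(slave_data: Dict[str, int], snapshot: Dict[str, int],
               buffered: List[Tuple[str, int]]) -> Dict[str, int]:
    slave_data.clear()                        # step 8: flush old data
    slave_data.update(snapshot)               # load the "RDB"
    for key, value in buffered:               # step 7: replay buffered writes
        slave_data[key] = value
    return slave_data


master = FullResyncMaster({"a": 1}, buffer_limit=2)
snapshot = master.start_full_resync()
master.write("b", 2)                          # arrives during the transfer
print(slave_load({"stale": 0}, snapshot, master.drain()) == master.data)  # True
```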

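The time-consuming parts of the whole synchronization are the master's bgsave, the RDB transfer over the network, and the slave's flushing of old data, loading of the RDB, and possible AOF rewrite.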

Note:

1. If the RDB file is larger than about 6GB and the network card is gigabit, the transfer can exceed Redis's default replication timeout (repl-timeout, 60 seconds) and the full copy fails. Increasing repl-timeout solves this.

2. Redis also supports diskless replication (repl-diskless-sync), in which the master sends the RDB directly to the slave over the network instead of writing it to disk first. However, the feature is not yet mature, so use it with caution in production.
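The 6GB threshold in note 1 comes from simple arithmetic: transfer time is RDB size divided by effective network throughput. The sketch below assumes roughly 100 MiB/s of usable throughput on a gigabit link (an assumption; the raw line rate is 125 MB/s before protocol overhead), and suggested_repl_timeout with its 1.5x margin is a hypothetical rule of thumb, not a Redis recommendation.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3
DEFAULT_REPL_TIMEOUT = 60   # seconds, Redis default for repl-timeout


def transfer_seconds(rdb_bytes: int, throughput_bytes_per_s: float) -> float:
    return rdb_bytes / throughput_bytes_per_s


def suggested_repl_timeout(rdb_bytes: int, throughput_bytes_per_s: float,
                           margin: float = 1.5) -> int:
    # Hypothetical rule of thumb: expected transfer time plus a safety margin.
    return math.ceil(transfer_seconds(rdb_bytes, throughput_bytes_per_s) * margin)


t = transfer_seconds(6 * GiB, 100 * MiB)     # assumed ~100 MiB/s effective
print(f"{t:.2f}s, exceeds default: {t > DEFAULT_REPL_TIMEOUT}")  # 61.44s, exceeds default: True
print(suggested_repl_timeout(6 * GiB, 100 * MiB))                # 93
```

The estimate ignores the bgsave time that precedes the transfer, so a real deployment would leave more headroom.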

4. Partial replication

If a network glitch or similar failure occurs while the slave is replicating from the master, the slave asks the master to resend the command data it missed. The master only needs to send that data from its replication backlog buffer to restore consistency, which costs much less than full replication.

The steps are as follows:

1. When the network between the slave and the master is interrupted for longer than repl-timeout, the master closes the replication connection.

2. While the connection is down, the master keeps writing new write commands into the replication backlog buffer, which holds up to 1MB by default.

3. When the network recovers and the slave reconnects to the master, it sends the master's run ID and its own offset to the master.

4. The master verifies them; if the data after that offset is still in the buffer, it replies +CONTINUE, indicating that partial replication can proceed.

5. The master sends the missing data from the buffer to the slave, and master-slave replication returns to normal.
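Partial replication only works if the backlog still holds every byte written during the outage, so a useful back-of-the-envelope check is write rate times outage length versus backlog size. The helpers below are hypothetical, and the write rate is assumed constant; real traffic is bursty, which is why the sizing helper multiplies by an arbitrary safety factor.

```python
import math

KiB = 1024
MiB = 1024 ** 2


def max_outage_seconds(backlog_bytes: int, write_bytes_per_s: float) -> float:
    """Longest disconnection the backlog can cover at a steady write rate."""
    return backlog_bytes / write_bytes_per_s


def can_partial_resync(outage_s: float, write_bytes_per_s: float, backlog_bytes: int) -> bool:
    return outage_s * write_bytes_per_s <= backlog_bytes


def suggested_backlog_bytes(write_bytes_per_s: float, outage_s: float,
                            safety_factor: float = 2.0) -> int:
    return math.ceil(write_bytes_per_s * outage_s * safety_factor)


rate = 100 * KiB                                  # assumed steady write traffic
print(max_outage_seconds(1 * MiB, rate))          # 10.24 with the 1MB default
print(can_partial_resync(30, rate, 1 * MiB))      # False: 30 s needs about 3 MB
print(suggested_backlog_bytes(rate, 30))          # 6144000
```

A value computed this way would be applied through repl-backlog-size.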

5. Heartbeat

After master and slave establish replication, they keep a long-lived connection and send heartbeat commands to each other.

The key heartbeat mechanisms are:

1. Master and slave each have a heartbeat mechanism and act as each other's client. The client list command shows the replication-related client connections: the connection to the master has flags=M and the connection to a slave has flags=S.

2. By default, the master sends a ping command to its slaves every 10 seconds; the repl-ping-slave-period setting controls this interval.

3. Every second, the slave's main thread sends replconf ack {offset} to report its current replication offset to the master.

4. The master uses these replconf messages to judge slave timeouts: if none arrives within repl-timeout (60 seconds by default), the slave is considered offline.
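On the master side, the timeout rule amounts to remembering when each slave last sent replconf ack and comparing the elapsed time with repl-timeout. HeartbeatMonitor below is a hypothetical sketch; the clock is injectable so the behavior can be checked without waiting 60 seconds, and its lag value is similar in spirit to the per-slave lag shown in info replication.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical master-side tracker of slave heartbeats."""

    def __init__(self, repl_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.repl_timeout = repl_timeout
        self.clock = clock
        self.last_ack: Dict[str, float] = {}    # slave id -> time of last replconf ack
        self.acked_offset: Dict[str, int] = {}  # slave id -> offset it reported

    def on_replconf_ack(self, slave_id: str, offset: int) -> None:
        self.last_ack[slave_id] = self.clock()
        self.acked_offset[slave_id] = offset

    def lag(self, slave_id: str) -> float:
        # Seconds since the slave last reported in.
        return self.clock() - self.last_ack[slave_id]

    def offline_slaves(self) -> List[str]:
        return [s for s in self.last_ack if self.lag(s) > self.repl_timeout]


now = [0.0]                                  # fake clock for the demo
mon = HeartbeatMonitor(clock=lambda: now[0])
mon.on_replconf_ack("slave-1", 100)
mon.on_replconf_ack("slave-2", 100)
now[0] = 30.0
mon.on_replconf_ack("slave-2", 150)          # slave-2 keeps acking, slave-1 goes quiet
now[0] = 61.0
print(mon.offline_slaves(), mon.lag("slave-2"))  # ['slave-1'] 31.0
```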

Note: to reduce master-slave lag, master and slave nodes are usually deployed in the same data center, or in data centers in the same city, so that network delay or network partitions do not interrupt the heartbeat.

6. Asynchronous replication

The master not only serves reads and writes but also propagates write commands to its slaves. Write commands are sent asynchronously: the master replies to the client as soon as it has processed a write command and does not wait for the slaves to finish replicating it.

The steps for asynchronous replication are simple, as follows:

1. The master receives and processes a command.

2. The master returns the result to the client.

3. If the command modifies data, it is sent asynchronously to the slaves, which execute the replicated command in their main thread.
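The asynchronous behavior can be illustrated with asyncio: the master applies a write and replies immediately, while a separate slave task applies the queued command later. Master, slave_loop and the queue standing in for the replication link are hypothetical; Redis itself does not use asyncio.

```python
import asyncio
from typing import Dict, Tuple


class Master:
    """Hypothetical master: applies writes locally and queues them for a slave."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.repl_queue: "asyncio.Queue[Tuple[str, str]]" = asyncio.Queue()

    async def handle_write(self, key: str, value: str) -> str:
        self.data[key] = value                      # process the command
        self.repl_queue.put_nowait((key, value))    # queue it for the slave, no waiting
        return "OK"                                 # reply to the client right away


async def slave_loop(master: Master, replica: Dict[str, str]) -> None:
    while True:
        key, value = await master.repl_queue.get()
        replica[key] = value                        # slave applies the command later
        master.repl_queue.task_done()


async def demo() -> Tuple[str, bool, Dict[str, str]]:
    master, replica = Master(), {}
    consumer = asyncio.create_task(slave_loop(master, replica))
    reply = await master.handle_write("k", "v")
    # handle_write never awaits, so the slave task has not run yet.
    applied_at_reply = "k" in replica
    await master.repl_queue.join()                  # wait until the slave catches up
    consumer.cancel()
    return reply, applied_at_reply, dict(replica)


print(asyncio.run(demo()))    # ('OK', False, {'k': 'v'})
```

The client receives OK before the slave has the key, which is exactly the window in which a slave can serve stale data.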
