How to Analyze the High-Availability Solutions in Redis

Redis offers three mechanisms for high availability: master-slave replication, Sentinel, and Cluster. This article walks through how each of them works.
Master-slave replication
You can use the SLAVEOF command, or the slaveof configuration option, to have one server replicate another. The server being replicated is called the master server, and the server doing the replicating is called the slave server. This way you can write key-value pairs on the master server and read them from the slave server.
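A minimal sketch of setting up replication with redis-cli (the addresses and ports here are illustrative):

    redis-cli -p 6380 SLAVEOF 127.0.0.1 6379   # make the server on 6380 a slave of the one on 6379
    redis-cli -p 6379 SET msg "hello"          # write on the master
    redis-cli -p 6380 GET msg                  # read the replicated value on the slave

(Since redis 5.0 the same command is also available under the name REPLICAOF.)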
The process of replication is divided into two steps: synchronization and command propagation.
Synchronization
Synchronization brings the slave server's database state up to date with the master server's current database state.
When a client sends the SLAVEOF command to the slave server, the slave server sends the SYNC command to the master server. Synchronization proceeds as follows:
The slave server sends the SYNC command to the master server.
On receiving SYNC, the master server executes the BGSAVE command, generating an RDB file in the background, and uses a buffer to record every write command executed from that moment on.
When BGSAVE finishes, the master server sends the generated RDB file to the slave server; the slave server receives and loads it, bringing its database state up to the master server's state as of the time BGSAVE was executed.
The master server then sends all the write commands in the buffer to the slave server; the slave server executes them, bringing its database state up to the master server's current state.
Command propagation
After synchronization completes, the database states of the master and slave servers are consistent. But whenever the master server receives a write command from a client, the master and slave databases diverge again; consistency is restored through command propagation: the master forwards every write command it executes to the slave.
PSYNC: an optimization of synchronization
Synchronization before version 2.8 was always a full synchronization. But if a slave server was only disconnected for a short while, there is no need to start the synchronization from scratch: it only needs the data written during the disconnection. So from version 2.8 on, redis uses the PSYNC command instead of SYNC.
PSYNC covers two cases: full resynchronization, which handles the first-time sync, and partial resynchronization, which handles reconnection after a disconnect.
Implementation of partial resynchronization
Partial resynchronization relies mainly on the following three parts:
The replication offsets of the master server and the slave server
The master server's replication backlog buffer
The server run ID
Replication offset
Replication offset of the master server: every time the master propagates N bytes of data to the slaves, it adds N to its own replication offset. Replication offset of the slave server: every time a slave receives N bytes of data propagated by the master, it adds N to its own replication offset. If the master and slave are in a consistent state, their offsets are always equal; if the offsets differ, they are in an inconsistent state.
Replication backlog buffer
The replication backlog buffer is a fixed-length FIFO queue maintained by the master server, 1MB by default. When the queue reaches its maximum length, the oldest element is popped to make room for new ones.
When the master server propagates a command, it sends it not only to the slave servers but also writes it into the replication backlog buffer.
When a slave server reconnects to the master server, it sends its replication offset to the master via the PSYNC command, and the master decides between partial and full resynchronization based on that offset: if the data after the offset is still in the replication backlog buffer, partial resynchronization is used; otherwise, full resynchronization is used.
(The book does not say how this is judged. Presumably the master subtracts the slave's replication offset from its own: if the difference exceeds the backlog size, 1MB by default, some of the missing data is no longer in the backlog buffer.)
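A worked example under that reading: suppose the master's replication offset is 10100 and a reconnecting slave reports offset 10000 in its PSYNC command. The 100 missing bytes still sit inside the 1MB backlog, so the master only resends those 100 bytes (partial resynchronization). If the slave instead reported 5000 and the bytes after offset 5000 had already been evicted from the backlog, the master would fall back to full resynchronization.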
Running ID of the server
When a server starts, it generates a 40-character random hexadecimal string as its run ID.
When a slave server replicates a master for the first time, the master sends its run ID to the slave, and the slave saves it. When the slave disconnects and reconnects, it sends the saved run ID back to the master: if the saved run ID is the same as the current master's run ID, the master attempts a partial resynchronization; if they differ, a full resynchronization is performed.
The overall process of PSYNC
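The exchange can be sketched as follows (offsets and run IDs are placeholders; the reply names follow the PSYNC protocol):

    slave  -> master: PSYNC ? -1                    (first synchronization: no run ID or offset saved)
    master -> slave : +FULLRESYNC <runid> <offset>  (full resynchronization: an RDB file follows)

    slave  -> master: PSYNC <saved_runid> <offset>  (reconnection after a brief disconnect)
    master -> slave : +CONTINUE                     (partial resynchronization: only the missing bytes follow)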
Heartbeat detection
During the command propagation phase, the slave server by default sends the following command to the master server once per second:
REPLCONF ACK <replication_offset>
where replication_offset is the slave server's current replication offset. Sending the REPLCONF ACK command serves three functions for the master and slave servers:
Detecting the network connection status of the master and slave servers.
Assisting the implementation of the min-slaves options.
Detecting lost commands.
Detecting the network connection status of the master and slave servers
By sending and receiving REPLCONF ACK commands, the master and slave servers can check whether the network connection between them is working: if the master has not received a REPLCONF ACK command from a slave for more than one second, it knows that something is wrong with the connection to that slave.
Auxiliary implementation of min-slaves options
The min-slaves-to-write and min-slaves-max-lag options of redis prevent the master server from executing write commands in unsafe situations.

    min-slaves-to-write 3
    min-slaves-max-lag 10

With the configuration above, the master server refuses to execute write commands if there are fewer than 3 slave servers, or if the lag of all three slave servers is greater than or equal to 10 seconds.
Detecting lost commands
If a write command propagated from the master server to a slave server is lost in transit due to a network failure, then when the slave sends the REPLCONF ACK command, the master will notice that the slave's replication offset is smaller than its own. Using the slave's replication offset, the master can locate the missing data in the replication backlog buffer and resend it to the slave.
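For example, if the master's replication offset is 200 but a slave's REPLCONF ACK reports offset 100, the master knows the bytes from offset 100 to 200 were lost, finds them in the replication backlog buffer, and resends them to the slave.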
Master-slave replication summary
In essence, master-slave replication is about keeping extra copies of the data: even with RDB and AOF persistence, the entire machine hosting the master server may die. With master-slave replication, the master and slave can be deployed on two different machines, so that even if the master's machine dies, you can manually switch over to the slave server and continue serving.
Sentinel
Although master-slave replication backs up the data, when the master server goes down, a slave must be switched to master manually. Sentinel automates this: when the master server dies, it switches a slave server over to master automatically.
The Sentinel system monitors all master and slave servers. Suppose server1 goes offline: when server1's offline duration exceeds the user-configured limit, the Sentinel system performs a failover on server1:
First, the Sentinel system selects one of the slave servers under server1 and promotes it to be the new master server.
Then the Sentinel system sends new replication commands to all of server1's slave servers, making them slaves of the new master. When every slave has begun replicating the new master, the failover operation is complete.
In addition, Sentinel keeps monitoring the offline server1 and, when it comes back online, sets it up as a slave of the new master server.
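A minimal sentinel.conf sketch for such a setup (the master name, address, and thresholds are illustrative):

    sentinel monitor mymaster 127.0.0.1 6379 2      # quorum: 2 sentinels must agree the master is down
    sentinel down-after-milliseconds mymaster 30000 # no valid reply for 30s => subjectively down
    sentinel parallel-syncs mymaster 1              # slaves reconfigured to the new master at a time
    sentinel failover-timeout mymaster 180000       # maximum time allowed for one failover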
Initializing the Sentinel state
struct sentinelState {
    char myid[CONFIG_RUN_ID_SIZE+1];
    // current epoch, used to implement failover
    uint64_t current_epoch;
    // all master servers monitored by this sentinel
    // the dictionary key is the master server's name
    // the dictionary value is a pointer to a sentinelRedisInstance structure
    dict *masters;
    // whether TILT mode has been entered
    int tilt;
    // number of scripts currently executing
    int running_scripts;
    // time at which TILT mode was entered
    mstime_t tilt_start_time;
    // time the time handler was last executed
    mstime_t previous_time;
    // a FIFO queue containing all user scripts waiting to be executed
    list *scripts_queue;
    char *announce_ip;
    int announce_port;
    unsigned long simfailure_flags;
    int deny_scripts_reconfig;
    char *sentinel_auth_pass;
    char *sentinel_auth_user;
    int resolve_hostnames;
    int announce_hostnames;
} sentinel;
Initialize the masters property of the sentinel state
masters records information about all the master servers monitored by the Sentinel: the dictionary key is a monitored server's name, and the value is the sentinelRedisInstance structure corresponding to that server. A sentinelRedisInstance represents an instance monitored by the Sentinel server, and it can be a master server, a slave server, or another Sentinel.
typedef struct sentinelRedisInstance {
    // identity flags, recording the type of the instance
    // and the instance's current state
    int flags;
    // instance name
    // a master server's name is set in the configuration file
    // slave server and sentinel names are set automatically by sentinel, in ip:port format
    char *name;
    // run ID
    char *runid;
    // configuration epoch, used to implement failover
    uint64_t config_epoch;
    // the instance's address
    sentinelAddr *addr;
    // after how many milliseconds without a response the instance
    // is judged subjectively down
    mstime_t down_after_period;
    // number of supporting votes needed to judge this instance objectively down
    unsigned int quorum;
    // when performing a failover, the number of slave servers that may
    // be synchronized to the new master server at the same time
    int parallel_syncs;
    // maximum time allowed for refreshing the failover state
    mstime_t failover_timeout;
    // other sentinels monitoring this master server, besides this one
    // the dictionary key is a sentinel's name in ip:port format
    // the dictionary value is that sentinel's instance structure
    dict *sentinels;
    // ...
} sentinelRedisInstance;
Creating network connections to the master server
The final step of initializing Sentinel is to create network connections to the monitored master servers; Sentinel creates two connections to each master.
Command connection: used exclusively to send commands to the master server and receive command replies.
Subscription connection: used exclusively to subscribe to the master server's __sentinel__:hello channel.
Get master server information
By default, Sentinel sends the INFO command to the monitored master server over the command connection every 10 seconds and obtains the master's current information from the reply. The reply provides:
The master server's run_id
Information about all the slave servers under the master server
With this information, Sentinel can update the name dictionary and the runid field in the corresponding sentinelRedisInstance.
Get slave server information
Sentinel also creates a command connection and a subscription connection to each slave server.
By default, Sentinel sends the INFO command to the slave server over the command connection every 10 seconds and obtains the slave's current information from the reply. The reply includes:
The slave server's run ID (run_id)
The slave server's role (role)
The master server's ip address and port (master_host, master_port)
The state of the connection to the master server (master_link_status)
The slave server's priority (slave_priority)
The slave server's replication offset (slave_repl_offset)
Based on the INFO reply, Sentinel can update the slave server's instance structure.
Sending information to the master and slave servers
By default, Sentinel sends a message to the monitored master and slave servers every 2 seconds, carrying the following fields (the command's shape is sketched after the list):
s_ip: the Sentinel's ip address
s_port: the Sentinel's port number
s_runid: the Sentinel's run ID
s_epoch: the Sentinel's current configuration epoch
m_name: the name of the master server
m_ip: the ip address of the master server
m_port: the port number of the master server
m_epoch: the current configuration epoch of the master server
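Concretely, the message is published with a command of this shape (the field values are placeholders):

    PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"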
Messages published to the __sentinel__:hello channel are also received by all other Sentinels monitoring the same server (including the sender itself).
Create command connections to other Sentinels
Sentinels also create command connections to one another; the Sentinels monitoring the same master server thus form an interconnected network.
No subscription connections are created between Sentinels.
Detect subjective offline status
Once per second, Sentinel sends the PING command to all instances it has command connections with (master servers, slave servers, and other Sentinels) and judges from the reply whether each instance is online.
Valid replies: +PONG, -LOADING, or -MASTERDOWN.
Invalid replies: anything other than those three, or no reply within the specified time.
If an instance keeps returning invalid replies for down-after-milliseconds milliseconds, Sentinel modifies the instance's structure, turning on the SRI_S_DOWN flag in its flags attribute to indicate that the instance has entered the subjectively down state. (down-after-milliseconds can be configured in Sentinel's configuration file.)
Detect the objective offline status
When Sentinel judges a master server to be subjectively down, in order to confirm whether the master is really down, it asks the other Sentinels that also monitor the master whether they too consider it offline. When the number of Sentinels that agree exceeds a configured threshold, the master server is judged objectively down.
Sentinel asks the others whether they agree that the master server is down by sending:
SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>
where ip and port identify the master being asked about, current_epoch is the asking Sentinel's configuration epoch, and runid is either * (a plain down-state query) or the asking Sentinel's run ID (used during leader election, described below).
Receive SENTINEL is-master-down-by-addr command
When a Sentinel receives the SENTINEL is-master-down-by-addr command, it checks, based on the master's ip and port, whether it considers that master to be down, and then returns a Multi Bulk reply containing three parameters: down_state, leader_runid, and leader_epoch.
The asking Sentinel counts how many other Sentinels agree that the master is down; when the count reaches the configured quorum, it turns on the SRI_O_DOWN flag in the flags attribute of the master's instance structure, indicating that the master server has entered the objectively down state.
Electing the leader Sentinel
When a master server is judged objectively down, the Sentinels monitoring it negotiate to elect a leader Sentinel, which performs the failover operation.
After confirming that the master server is objectively down, each Sentinel sends the SENTINEL is-master-down-by-addr command again, this time to elect the leader Sentinel.
Election rules
Every online Sentinel monitoring the same master server is eligible to become the leader Sentinel.
After every leader election, the configuration epoch value of all Sentinels is incremented, whether or not the election succeeded. (The configuration epoch is essentially a counter.)
Within one configuration epoch, every Sentinel has one chance to set some Sentinel as its local leader; once set, the choice cannot be changed within that epoch.
Every Sentinel that finds the master objectively down asks the other Sentinels to set it as their local leader Sentinel, i.e. it sends the SENTINEL is-master-down-by-addr command to try to get the others to choose it.
When one Sentinel sends SENTINEL is-master-down-by-addr to another and the runid parameter is not * but the source Sentinel's run ID, it is asking the target Sentinel to set the source as its local leader.
A Sentinel sets its local leader on a first-come-first-served basis: the first request is granted, and all subsequent requests are rejected.
After receiving the command, the target Sentinel returns a reply whose leader_runid and leader_epoch parameters record the run ID and configuration epoch of the target Sentinel's local leader.
When the source Sentinel receives the reply, it checks whether the returned configuration epoch matches its own; if so, it checks whether the returned local leader's run ID matches its own run ID. If both match, the target Sentinel has set the source as its local leader.
If some Sentinel is set as local leader by more than half of the Sentinels, it becomes the leader Sentinel.
Because the leader needs more than half of the votes, and each Sentinel can vote only once per configuration epoch, there will be at most one leader Sentinel per epoch.
If no Sentinel is elected leader within a given time (no one gets more than half the votes), the Sentinels wait a while and hold the election again, until a leader Sentinel emerges.
Fail-over
Failover consists of the following three steps:
Among all the slave servers of the offline master, select one and convert it from a slave server into a master server.
Make all the other slave servers of the offline master replicate the new master server.
Make the offline master a slave of the new master server: when the old master comes back online, it becomes a slave server of the new master.
Selecting the new master server
From all the slave servers of the offline master, one slave is selected and sent the SLAVEOF no one command, which converts it into a master server.
Rules for selecting the new master server
The leader Sentinel saves all the slave servers of the offline master into a list and then filters the list to select the new master:
Delete from the list all slaves that are offline or disconnected.
Delete from the list all slaves that have not replied to the leader Sentinel's INFO command within the last five seconds.
Delete all slaves whose connection to the offline master was broken for more than down-after-milliseconds * 10 milliseconds.
The remaining slaves in the list are then sorted by slave priority, and the one with the highest priority is selected.
If several slaves share the highest priority, they are sorted by replication offset and the slave with the largest offset is selected (the largest replication offset also means it holds the newest data).
If the replication offsets are also equal, the slaves are sorted by run ID and the one with the smallest run ID is selected.
After sending the SLAVEOF no one command, the leader Sentinel sends the INFO command to the promoted slave once per second (instead of the usual once every ten seconds). When the role in the reply changes from slave to master, the leader Sentinel knows the slave has been promoted to master.
Modifying the replication targets of the slave servers
The SLAVEOF command is used to make the remaining slaves replicate the new master server. When Sentinel detects that the old master has come back online, it also sends it a SLAVEOF command, making it a slave of the new master.
Sentinel summary
Sentinel is essentially a monitoring system. After the Sentinels detect that a master server is offline, a leader Sentinel is chosen through an election, and the leader switches one of the offline master's slave servers over to master, so that no manual switch is needed.
Cluster
Although Sentinel mode switches master and slave automatically, there is still only one master server handling write operations (Sentinel can monitor several masters, but the client then has to distribute the load across them itself). Redis also provides an official way to build a cluster.
Node
Each redis server instance is a node; multiple connected nodes form a cluster.
CLUSTER MEET <ip> <port>
Sending the CLUSTER MEET command to a node makes that node shake hands with the target node; if the handshake succeeds, the target node is added to the current cluster.
Start the node
When a redis server starts, it decides whether to enable cluster mode based on whether the cluster-enabled configuration option is set to yes.
Cluster data structure
Each node uses a clusterNode structure to record its own state, and creates a clusterNode structure for every other node in the cluster to record that node's state.
typedef struct clusterNode {
    // the time the node was created
    mstime_t ctime;
    // the node's name
    char name[CLUSTER_NAMELEN];
    // node flags
    // the flag values record the node's role (e.g. master or slave)
    // and the node's current state (online or offline)
    int flags;
    // the node's current configuration epoch, used to implement failover
    uint64_t configEpoch;
    // the node's ip address
    char ip[NET_IP_STR_LEN];
    // information about the connection established with the node
    clusterLink *link;
    // list of offline reports about this node
    list *fail_reports;
    // ...
} clusterNode;
clusterLink holds the information needed for the connection with a node.
typedef struct clusterLink {
    // ...
    // the time the connection was created
    mstime_t ctime;
    // the node associated with this connection, or NULL if none
    struct clusterNode *node;
    // ...
} clusterLink;
Each node also keeps a clusterState structure, which records the current state of the cluster from the current node's point of view, such as whether the cluster is online or offline and how many nodes it contains.
typedef struct clusterState {
    // pointer to the current node's own clusterNode structure
    clusterNode *myself;
    // the cluster's current configuration epoch, used to implement failover
    uint64_t currentEpoch;
    // the cluster's current state: online or offline
    int state;
    // the number of nodes handling at least one slot
    int size;
    // the list of cluster nodes (including the myself node)
    // the dictionary key is a node's name, the dictionary value is
    // the node's corresponding clusterNode structure
    dict *nodes;
} clusterState;
Implementation of CLUSTER MEET command
CLUSTER MEET <ip> <port>
Node A creates a clusterNode structure for node B and adds that structure to its own clusterState.nodes dictionary.
Node A then sends a MEET message to node B based on the IP address and port number given by the CLUSTER MEET command.
If all goes well, node B receives the MEET message from node A; node B then creates a clusterNode structure for node A and adds that structure to its own clusterState.nodes dictionary.
After that, node B will return a PONG message to node A.
If all goes well, node A receives the PONG message returned by node B; through this PONG message, node A knows that node B has successfully received its MEET message.
After that, node A will return a PING message to node B.
If all goes well, node B receives the PING message returned by node A; through this PING message, node B knows that node A has successfully received its PONG message, and the handshake is complete.
Slot assignment
The cluster's entire database is divided into 16384 slots; every key belongs to one of these slots, and each node in the cluster handles between 0 and 16384 of them. When every slot has a node handling it, the cluster is online; otherwise it is offline.
CLUSTER ADDSLOTS
CLUSTER ADDSLOTS <slot> [slot ...]
The CLUSTER ADDSLOTS command assigns the specified slots to the current node. For example, CLUSTER ADDSLOTS 0 1 2 3 4 assigns slots 0 through 4 to the current node.
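A sketch of splitting all 16384 slots evenly across three nodes with redis-cli (the ports are illustrative):

    redis-cli -p 7000 CLUSTER ADDSLOTS $(seq 0 5460)
    redis-cli -p 7001 CLUSTER ADDSLOTS $(seq 5461 10922)
    redis-cli -p 7002 CLUSTER ADDSLOTS $(seq 10923 16383)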
Record the slot assignment information of the node
The slots and numslots attributes of the clusterNode structure record which slots the node is responsible for handling:
typedef struct clusterNode {
    // bitmap of the slots this node is responsible for
    unsigned char slots[CLUSTER_SLOTS/8];
    // number of slots this node handles
    int numslots;
    // ...
} clusterNode;
slots is a binary array containing 16384 bits in total: a bit value of 1 means the node is responsible for the corresponding slot, and 0 means it is not. numslots records the number of slots the node handles, i.e. the number of bits in slots whose value is 1.
Propagate slot assignment information for nodes
Besides recording its own slots in clusterNode, a node also sends its slots array to the other nodes in the cluster, to tell them which slots it is currently responsible for.
typedef struct clusterState {
    // ...
    clusterNode *slots[CLUSTER_SLOTS];
    // ...
} clusterState;
slots contains 16384 entries, each a pointer to a clusterNode indicating which node the slot is assigned to; if a slot is not assigned to any node, the pointer is NULL.
Implementation of CLUSTER ADDSLOTS command
Execute commands in a cluster
When a client sends a database-related command to a node, the node receiving the command computes which slot the key to be handled belongs to, and checks whether that slot is assigned to itself.
If the slot is assigned to itself, the node executes the command directly. If not, the node returns a MOVED error to the client, directing the client to the correct node, where the command is sent again.
Computing which slot a key belongs to
CRC16(key) computes the CRC-16 checksum of the key, and & 16383 keeps the remainder, producing an integer between 0 and 16383 that is used as the key's slot number: slot = CRC16(key) & 16383.
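A minimal sketch of this computation in Python, assuming the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0) that redis cluster uses; real cluster clients additionally honor hash tags ({...} substrings), which are omitted here:

    def crc16(data: bytes) -> int:
        # CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                if crc & 0x8000:
                    crc = ((crc << 1) ^ 0x1021) & 0xFFFF
                else:
                    crc = (crc << 1) & 0xFFFF
        return crc

    def key_slot(key: str) -> int:
        # slot = CRC16(key) & 16383
        return crc16(key.encode()) & 16383

    print(key_slot("foo"))  # an integer between 0 and 16383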
Determine whether the slot is handled by the current node
Having computed the slot number i that a key belongs to, the node can determine whether that slot is handled by itself.
If clusterState.slots[i] equals clusterState.myself, the slot is the node's own responsibility and it can execute the command directly.
If it is not equal, the node reads the ip and port of the clusterNode that clusterState.slots[i] points to, and returns a MOVED error to the client, directing the client to the node responsible for the slot.
In cluster mode (for example, redis-cli -c), MOVED errors are not printed; the client follows the redirect automatically.
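What the two behaviors look like from redis-cli (slot numbers and addresses are illustrative):

    127.0.0.1:7000> GET msg                # plain mode: the error is returned to the client
    (error) MOVED 6257 127.0.0.1:7001

    $ redis-cli -c -p 7000                 # cluster mode (-c): the redirect is followed
    127.0.0.1:7000> GET msg
    -> Redirected to slot [6257] located at 127.0.0.1:7001
    "hello"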
Resharding
Resharding a redis cluster means reassigning any number of slots already assigned to one node to another node, moving the key-value pairs belonging to those slots from the source node to the target node.
Resharding is carried out online: the cluster does not have to go offline, and both the source node and the target node can continue to process command requests. Resharding of a redis cluster is performed by redis-trib, in the following steps (a consolidated sketch follows the list):
redis-trib sends a CLUSTER SETSLOT <slot> IMPORTING <source_id> command to the target node, preparing the target node to import the key-value pairs of the slot from the source node.
redis-trib sends a CLUSTER SETSLOT <slot> MIGRATING <target_id> command to the source node, preparing the source node to migrate the key-value pairs belonging to the slot to the target node.
redis-trib sends a CLUSTER GETKEYSINSLOT <slot> <count> command to the source node to obtain the names of up to count keys belonging to the slot.
For each key name obtained in step 3, redis-trib sends a MIGRATE <target_ip> <target_port> <key> 0 <timeout> command to the source node, migrating the selected key from the source node to the target node.
Repeat steps 3 and 4 until all the key-value pairs belonging to the slot have been migrated from the source node to the target node.
redis-trib sends a CLUSTER SETSLOT <slot> NODE <target_id> command to any node in the cluster, assigning the slot to the target node. This assignment is eventually propagated to the entire cluster by messages.
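Put together, moving slot 8 from a source node (port 7000) to a target node (port 7001) looks roughly like this; the node IDs, key name, and timeout are placeholders:

    redis-cli -p 7001 CLUSTER SETSLOT 8 IMPORTING <source_node_id>   # on the target node
    redis-cli -p 7000 CLUSTER SETSLOT 8 MIGRATING <target_node_id>   # on the source node
    redis-cli -p 7000 CLUSTER GETKEYSINSLOT 8 100                    # names of up to 100 keys in slot 8
    redis-cli -p 7000 MIGRATE 127.0.0.1 7001 <key> 0 5000            # repeated once per key
    redis-cli -p 7000 CLUSTER SETSLOT 8 NODE <target_node_id>        # finalize the assignment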
CLUSTER SETSLOT IMPORTING command implementation
typedef struct clusterState {
    // ...
    clusterNode *importing_slots_from[CLUSTER_SLOTS];
    // ...
} clusterState;
importing_slots_from records the slots the current node is importing from other nodes: if importing_slots_from[i] is not NULL, it points to the clusterNode structure of the source node given in the CLUSTER SETSLOT IMPORTING command.
CLUSTER SETSLOT MIGRATING command implementation
typedef struct clusterState {
    // ...
    clusterNode *migrating_slots_to[CLUSTER_SLOTS];
    // ...
} clusterState;
migrating_slots_to records the slots the current node is migrating to other nodes: if migrating_slots_to[i] is not NULL, it points to the clusterNode structure of the target node.
ASK error
During resharding, while a slot is being migrated from the source node to the target node, it can happen that some of the key-value pairs belonging to the slot are still stored on the source node while the rest are already stored on the target node.
Suppose a client sends a command involving a key in that slot to the source node while the migration is in progress:
The source node first looks for the key in its own database and, if it is found, executes the command directly.
If it is not found, the node checks migrating_slots_to[i] to see whether the slot is being migrated; if it is, the node returns an ASK error, directing the client to the target node.
ASKING
When the client receives an ASK error, it executes the ASKING command before re-sending the original command to the target node. The ASKING command turns on the REDIS_ASKING flag of the client that sent it. Normally, when a node is asked about a key in a slot it is not responsible for, it returns a MOVED error (during migration the slot has only partially moved, so it does not yet belong to the target node); however, the node also checks importing_slots_from[i], and if that entry shows the node is importing slot i and the client carries the REDIS_ASKING flag, the node makes a one-time exception and executes the command.
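From the client's point of view the exchange looks roughly like this (the slot number and addresses are illustrative):

    127.0.0.1:7000> GET user:1
    (error) ASK 8 127.0.0.1:7001    # this key has already moved: retry once on the target

    127.0.0.1:7001> ASKING          # turn on the one-shot REDIS_ASKING flag
    OK
    127.0.0.1:7001> GET user:1
    "alice"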
Cluster failover
The effect of a cluster failover is similar to that of Sentinel mode: a slave node is promoted to master, and when the old master node comes back online, it becomes a slave of the new master node.
Fault detection
Each node in the cluster periodically sends PING messages to the other nodes to check whether they are online. If the corresponding PONG message does not arrive within the specified time, the other node is marked as probably offline (PFAIL): the node finds that node's clusterNode structure in its clusterState.nodes dictionary and turns on the REDIS_NODE_PFAIL flag in its flags attribute.
The nodes in a cluster also exchange messages about the state of the other nodes. For example, when master node A learns that master node B considers master node C probably offline, A finds C's clusterNode structure in its clusterState.nodes dictionary and appends B's offline report to that structure's fail_reports list.
Each offline report is represented by a clusterNodeFailReport structure:
typedef struct clusterNodeFailReport {
    // the node that reported the target node offline
    struct clusterNode *node;
    // the last time the offline report was received
    mstime_t time;
} clusterNodeFailReport;
If more than half of the master nodes responsible for handling slots in the cluster report a master node X as probably offline, X is marked as offline (FAIL). The node that marks X offline broadcasts a FAIL message about X to the cluster, and every node that receives this FAIL message marks X as offline.
Fail-over
When a slave node finds that the master node it is replicating has entered the offline state, the slave node begins a failover of the offline master:
From all the slave nodes replicating the offline master, one is selected.
The selected slave node executes the SLAVEOF no one command and becomes the new master node.
The new master node revokes all slot assignments to the offline master and assigns all of those slots to itself.
The new master node broadcasts a PONG message to the cluster, letting the other nodes know immediately that this node has changed from slave to master and has taken over the slots handled by the offline node.
The new master node begins accepting command requests related to the slots it is responsible for, and the failover is complete.
Elect a new master node
The new master node is elected as follows:
The cluster's configuration epoch is a self-incrementing counter with an initial value of 0.
When some node in the cluster begins a failover operation, the cluster's configuration epoch is incremented by one.
For each configuration epoch, every master node responsible for handling slots in the cluster has one vote, and the first slave node to ask a master for its vote receives it.
When a slave node finds that the master node it is replicating has entered the offline state, the slave broadcasts a CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST message to the cluster, asking all master nodes that receive the message and have the right to vote to vote for it.
If a master node has the right to vote (it is responsible for handling slots) and has not yet voted for another slave node, it returns a CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK message to the requesting slave, indicating that it supports this slave becoming the new master node.
Each slave node participating in the election counts the CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK messages it receives to know how many master nodes support it.
If there are N voting master nodes in the cluster, a slave node is elected the new master once it collects N/2 + 1 or more supporting votes.
Because each voting master node can vote only once per configuration epoch, if N master nodes vote, at most one slave node can collect N/2 + 1 or more supporting votes; this guarantees that there will be only one new master node.
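For example, with N = 3 voting masters, a slave needs at least 3/2 + 1 = 2 votes (integer division); since only 3 votes exist per epoch, two different slaves can never both reach that threshold.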
If no slave node collects enough supporting votes within a configuration epoch, the cluster enters a new configuration epoch and holds another election, until a new master node is elected.
The process of electing the master node is very similar to that of electing the leader sentinel.
Data loss
Data loss under master-slave replication
Master-slave replication is asynchronous, so it is possible that part of the data on the master has not yet been synchronized to the slave when the master dies; that data is then lost.
Split-brain
Split-brain means that the master's machine suddenly drops out of the normal network and can no longer reach the slave machines, while the master process itself is actually still running. The Sentinels may then decide that the master is down, start an election, and switch another slave over to master. At that point there are two masters in the cluster, which is what split-brain refers to.
Although a slave has been switched to master, clients may not yet have switched over to the new master and may keep writing data to the old master.
When the old master recovers, it is attached to the new master as a slave: its own data is emptied and re-copied from the new master, so the data written to it in the meantime is lost.
Configuration to reduce data loss
    min-slaves-to-write 1
    min-slaves-max-lag 10

With this configuration, if no slave server has sent the master an ACK within the last 10 seconds, i.e. fewer than one slave is keeping up, the master stops executing write requests.
Inconsistency between master and slave data
When a slave delays executing synchronized commands, because of network problems or because it is blocked on high-complexity commands, data synchronization lags behind and the master and slave databases become inconsistent.