
How to understand Redis Sentinel Technology

2025-04-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains how to understand Redis Sentinel technology. The editor finds it very practical and shares it here; I hope you get something out of it.

This article introduces Sentinel, which is built on Redis master-slave replication. Its main function is to automate failure recovery of the master node, further improving the high availability of the system.

Note: the content is based on Redis version 3.0.

I. Function and structure

1. Role

Before introducing Sentinel, let's review the technologies related to the high availability of Redis from a macro point of view. They include persistence, replication, sentinels and clusters, and the main roles and problems to be solved are:

Persistence: persistence is the simplest highly available method (sometimes not even classified as a highly available means). Its main function is to back up the data, that is, to store the data on the hard disk, ensuring that the data will not be lost due to the exit of the process.

Replication: replication is the basis of highly available Redis, and both Sentinels and clusters achieve high availability on the basis of replication. Replication mainly realizes the multi-machine backup of data, as well as load balancing and simple fault recovery for read operations. The defect is that the fault recovery cannot be automated, the write operation cannot be load balanced, and the storage capacity is limited by a single machine.

Sentinel: on the basis of replication, Sentinel achieves automated fault recovery. The defect is that the write operation cannot be load balanced, and the storage capacity is limited by a single machine.

Cluster: through clustering, Redis solves the problem that write operations cannot be load balanced and storage capacity is limited by a single machine, and achieves a relatively perfect high availability solution.


Let's go back to the sentinels.

Redis Sentinel was introduced in version 2.8 of Redis. The core function of Sentinel is automatic failover of the master node. The following is a description of Sentinel's functions from the official Redis documentation:

Monitoring (Monitoring): the Sentinel constantly checks whether the master node and slave node are functioning properly.

Automatic failover (Automatic Failover): when the master node is not working properly, the Sentinel starts an automatic failover operation, which promotes one of the failed master's slaves to be the new master node and makes the other slave nodes replicate from the new master.

Configuration provider (Configuration Provider): the client obtains the primary node address of the current Redis service by connecting to the sentinel during initialization.

Notification (Notification): the sentry can send the result of the failover to the client.

Among them, the monitoring and automatic failover functions enable the sentry to detect the failure of the primary node in time and complete the transfer, while the configuration provider and notification functions need to be reflected in the interaction with the client.

Here is a note on the use of the word "client" in this article: previously, anything that accesses the Redis server through its API was called a client, including redis-cli, the Java client Jedis, and so on. For ease of distinction, "client" in this article does not include redis-cli, but refers to something more sophisticated: redis-cli uses the low-level interfaces provided by Redis, whereas a client encapsulates these interfaces to make full use of the Sentinel's configuration-provider and notification features.

2. Architecture

A typical Sentinel architecture diagram is as follows:

It consists of two parts:

Sentinel node: the Sentinel system consists of one or more Sentinel nodes, which are special Redis nodes and do not store data.

Data nodes: both master and slave nodes are data nodes.

II. Deployment

This section will deploy a simple Sentinel system consisting of 1 master node, 2 slave nodes and 3 Sentinel nodes. For convenience, all of these nodes are deployed on a single machine (LAN IP:192.168.92.128), distinguished by port numbers, and the configuration of the nodes is as simple as possible.

1. Deploy master and slave nodes

The master-slave node in the Sentinel system is the same as the ordinary master-slave node configuration and does not require any additional configuration. The following are the configuration files for master node (port=6379) and 2 slave nodes (port=6380/6381), both of which are relatively simple and will not be described in detail:

# redis-6379.conf
port 6379
daemonize yes
logfile "6379.log"
dbfilename "dump-6379.rdb"

# redis-6380.conf
port 6380
daemonize yes
logfile "6380.log"
dbfilename "dump-6380.rdb"
slaveof 192.168.92.128 6379

# redis-6381.conf
port 6381
daemonize yes
logfile "6381.log"
dbfilename "dump-6381.rdb"
slaveof 192.168.92.128 6379

After the configuration is complete, start the master node and the slave node in turn:

redis-server redis-6379.conf

redis-server redis-6380.conf

redis-server redis-6381.conf

After the nodes are started, connect to the master node and check that the master-slave status is normal (for example with the info replication command), as shown in the following figure:

2. Deploy Sentinel nodes

The Sentinel node is essentially a special Redis node.

The configuration of the three Sentinel nodes is almost identical; the main difference is the port number (26379 / 26380 / 26381). The configuration and startup of a node are described below, taking node 26379 as an example. The configuration is kept as simple as possible; more options will be described later:

# sentinel-26379.conf
port 26379
daemonize yes
logfile "26379.log"
sentinel monitor mymaster 192.168.92.128 6379 2

Among them, the sentinel monitor mymaster 192.168.92.128 6379 2 line means that this Sentinel node monitors the master node at 192.168.92.128:6379 and names it mymaster. The final 2 relates to failure judgment of the master node: at least 2 Sentinel nodes must agree before the master node is judged failed and a failover is performed.
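To make the shape of this directive concrete, the sketch below parses a sentinel monitor line into its four fields. The class and method names (MonitorConfig, parseMonitor) are invented for this example and are not part of Redis or Jedis.

```java
// Illustrative parser for a "sentinel monitor <name> <ip> <port> <quorum>" line.
// MonitorConfig / parseMonitor are hypothetical names, not Redis or Jedis API.
class MonitorConfig {
    final String masterName;
    final String ip;
    final int port;
    final int quorum;

    MonitorConfig(String masterName, String ip, int port, int quorum) {
        this.masterName = masterName;
        this.ip = ip;
        this.port = port;
        this.quorum = quorum;
    }

    // Parses e.g. "sentinel monitor mymaster 192.168.92.128 6379 2"
    static MonitorConfig parseMonitor(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 6 || !parts[0].equals("sentinel") || !parts[1].equals("monitor")) {
            throw new IllegalArgumentException("not a sentinel monitor line: " + line);
        }
        return new MonitorConfig(parts[2], parts[3],
                Integer.parseInt(parts[4]), Integer.parseInt(parts[5]));
    }
}
```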

There are two ways to start a Sentinel node; the two are functionally identical:

redis-sentinel sentinel-26379.conf

redis-server sentinel-26379.conf --sentinel

After configuring and starting as above, the entire Sentinel system is up. You can verify this by connecting to a Sentinel node through redis-cli, as shown in the following figure: the 26379 Sentinel node is already monitoring the mymaster master node (that is, 192.168.92.128:6379) and has discovered its two slave nodes and the other two Sentinel nodes.

At this point, if you look at the configuration file of the Sentinel node, you will find some changes, take 26379 as an example:

Among them, dir merely declares explicitly the directory for data and logs (only logs, in the case of a Sentinel); known-slave and known-sentinel show that the Sentinel has discovered the slave nodes and the other Sentinels; the parameters containing epoch are related to the configuration epoch (a counter starting from 0 that is incremented by 1 every time a leader Sentinel election is held; the leader election is an operation in the failover phase, described later in the principles section).

3. Demonstrate failover

Among the four roles of Sentinel, the configuration provider and notification require the cooperation of the client. This article will describe in detail how the client accesses the Sentinel system in the next chapter. This section demonstrates the monitoring and automatic failover capabilities of the sentry in the event of a failure of the primary node.

Step1: first, kill the primary node using the kill command:

Step2: if you immediately run the info sentinel command on a Sentinel node, you will find that the master node has not yet been switched, because it takes the Sentinel some time to discover the failure and complete the transfer.

Step3: after a while, run info sentinel on the Sentinel node again, and you will find that the master node has been switched to the 6380 node.

At the same time, you can see that the Sentinel still believes the new master node has two slave nodes. This is because, while switching 6380 to master, the Sentinel set the 6379 node as its slave; although 6379 is down, the Sentinel does not objectively offline slave nodes (the meaning of this will be introduced in the principles section), so it considers that slave to still exist. When the 6379 node restarts, it automatically becomes a slave of the 6380 node. Let's verify that.

Step4: restart the 6379 node; you can see that 6379 becomes a slave of the 6380 node.

Step5: during the failover phase, the configuration files of both the Sentinel and the master-slave node are rewritten.

For the master-slave node, it is mainly the change of slaveof configuration: the new master node has no slaveof configuration, and its slave node slaveof the new master node.

For the Sentinel nodes, besides the change of master-slave information, the epoch also changes; you can see in the figure below that the epoch-related parameters have all been incremented by 1.

4. Summary

In the process of building the Sentinel system, there are several points to pay attention to:

The master-slave node in the Sentinel system is no different from the ordinary master-slave node. Fault detection and transfer are controlled and completed by the Sentinel.

The Sentinel node is essentially a Redis node.

For each Sentinel node, only the master node to monitor needs to be configured; the other Sentinel nodes and the slave nodes are discovered automatically.

During the sentinel node startup and failover phase, the configuration files of each node are rewritten (Config Rewrite).

In the example in this chapter, a Sentinel monitors only one master node; in fact, a Sentinel can monitor multiple master nodes, achieved by configuring multiple sentinel monitor directives.

III. Client access to the Sentinel system

The previous chapter demonstrated two of the Sentinel's roles: monitoring and automatic failover. This chapter combines the client side to demonstrate the other two roles: configuration provider and notification.

1. Code example

Before introducing the client principle, take the Java client Jedis as an example to demonstrate its usage. The following code connects to the Sentinel system we just built and performs read and write operations:

public static void testSentinel() throws Exception {
    String masterName = "mymaster";
    Set<String> sentinels = new HashSet<>();
    sentinels.add("192.168.92.128:26379");
    sentinels.add("192.168.92.128:26380");
    sentinels.add("192.168.92.128:26381");
    JedisSentinelPool pool = new JedisSentinelPool(masterName, sentinels); // a lot of work is done during initialization
    Jedis jedis = pool.getResource();
    jedis.set("key1", "value1");
    pool.close();
}

(Note: the code only demonstrates how to connect to the Sentinel; exception handling, resource cleanup, etc. are omitted.)

two。 Client principle

The Jedis client provides good support for Sentinel. As the code above shows, we only need to give Jedis the Sentinel node collection and the masterName to construct a JedisSentinelPool object; then we can use it like an ordinary Redis connection pool: get a connection through pool.getResource() and execute commands.

In the whole process, our code does not need to explicitly specify the address of the master node to connect to the master node; without any indication of failover in the code, we can automatically switch the master node after the Sentinel completes the failover. This can be done because the relevant work has been done in JedisSentinelPool's constructor, which mainly includes the following two points:

Traverse the Sentinel nodes to obtain the master node's information: the client traverses the Sentinel node collection and obtains the master node's address through one of the Sentinel nodes plus the masterName. This is done by calling the Sentinel node's sentinel get-master-addr-by-name command, for example: sentinel get-master-addr-by-name mymaster.

Once the master node's information is obtained, the traversal stops (so usually the loop stops at the first Sentinel node that responds).

Subscribe to Sentinel notifications: in this way, when a failover occurs, the client receives a notification from the Sentinel and switches to the new master node. Concretely, the client uses Redis's publish/subscribe feature, opens a separate thread for each Sentinel node, subscribes to that node's +switch-master channel, and re-initializes the connection pool when a message is received.
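To make the notification mechanism concrete, here is a minimal sketch of parsing a +switch-master message. According to the Redis Sentinel documentation, the payload has the form "<master-name> <old-ip> <old-port> <new-ip> <new-port>"; the SwitchMasterParser class below is an invented helper, not part of Jedis.

```java
// Illustrative handler for a Sentinel +switch-master message.
// Payload format: "<master-name> <old-ip> <old-port> <new-ip> <new-port>".
class SwitchMasterParser {
    // Returns the new master address as "ip:port" if the message concerns masterName,
    // or null if it refers to a different master (or is malformed).
    static String newMasterAddr(String masterName, String message) {
        String[] parts = message.trim().split("\\s+");
        if (parts.length != 5 || !parts[0].equals(masterName)) {
            return null;
        }
        return parts[3] + ":" + parts[4];
    }
}
```

A client such as JedisSentinelPool would re-initialize its connection pool with the returned address when such a message arrives.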

3. Summary

Through the introduction of the client principle, we can deepen the understanding of the Sentinel function, as follows:

Configuration provider: the client can obtain the master node's information through a Sentinel node plus the masterName; here the Sentinel plays the role of configuration provider.

It is important to note that the Sentinel is only a configuration provider, not a proxy. The difference between the two is:

With a configuration provider, the client obtains the master node's address through the Sentinel and then connects directly to the master; subsequent requests (such as set/get) are sent straight to the master node.

With a proxy, every request from the client is sent to the Sentinel, which forwards it to the master node.

An example makes it clear that the Sentinel plays the role of configuration provider rather than proxy. In the Sentinel system deployed earlier, change the following line in the Sentinel node's configuration file:

sentinel monitor mymaster 192.168.92.128 6379 2

Change to

sentinel monitor mymaster 127.0.0.1 6379 2

Then, if you run the client code on another machine in the LAN, you will find that the client cannot connect to the master node. This is because, with the Sentinel as configuration provider, the client queries it for the master address and gets 127.0.0.1:6379, then tries to establish a Redis connection to 127.0.0.1:6379, which naturally fails. If the Sentinel were a proxy, this problem would not arise.

Notification: after the failover is completed, the Sentinel node sends the new master node's information to the client so that the client can switch to the new master in time.

IV. Basic principles

The basic methods of Sentinel deployment and use are introduced earlier, and this part introduces the basic principles of Sentinel implementation.

1. Commands supported by Sentinel nodes

As a Redis node running in a special mode, a Sentinel node supports a different set of commands from an ordinary Redis node. In operations work, we can query or modify the Sentinel system through these commands. More importantly, the Sentinel system cannot implement functions such as failure discovery and failover without communication between Sentinel nodes, and a large part of that communication is carried out through the commands the Sentinel nodes support. The main commands are described below:

Basic query:

Through these commands, you can query the topology, node information, configuration information and so on of the sentinel system.

info sentinel: get basic information about all monitored master nodes.

sentinel masters: get details of all monitored master nodes.

sentinel master mymaster: get details of the monitored master node mymaster.

sentinel slaves mymaster: get details of the slave nodes of the monitored master node mymaster.

sentinel sentinels mymaster: get details of the Sentinel nodes of the monitored master node mymaster.

sentinel get-master-addr-by-name mymaster: get the address of the monitored master node mymaster, as described earlier.

sentinel is-master-down-by-addr: Sentinel nodes use this command to ask one another whether the master node is offline, in order to decide whether it is objectively offline.

Add / remove monitoring of the primary node:

sentinel monitor mymaster2 192.168.92.128 16379 2: exactly the same as the sentinel monitor configuration used when deploying the Sentinel node; not detailed again.

sentinel remove mymaster2: cancels the current Sentinel node's monitoring of the master node mymaster2.

Forced failover:

sentinel failover mymaster: this command forces a failover of mymaster even if the current master is healthy; for example, if the current master's machine is about to be decommissioned, you can fail over in advance with the failover command.

2. Basic principles

The key to the principle of the Sentinel is to understand the following concepts:

Scheduled tasks: each Sentinel node maintains 3 scheduled tasks. It obtains the latest master-slave topology by sending the info command to the master and slave nodes; it obtains information about other Sentinel nodes through publish/subscribe; and it detects whether nodes are offline by sending ping commands to other nodes as heartbeat checks.

Subjective offline: in the heartbeat-check task, if another node fails to reply within a certain time, the Sentinel node marks it subjectively offline. As the name implies, subjective offline means a single Sentinel node "subjectively" judges the node to be offline; it contrasts with objective offline.

Objective offline: after marking the master node subjectively offline, the Sentinel asks other Sentinel nodes about the master's status via the sentinel is-master-down-by-addr command; if the number of Sentinels judging the master offline reaches a certain value, the master node is marked objectively offline.

It should be noted that objective offline applies only to the master node; when a slave node or a Sentinel node fails, after being marked subjectively offline by a Sentinel, there is no subsequent objective offline or failover.

Leader Sentinel election: when the master node is judged objectively offline, the Sentinel nodes negotiate to elect a leader Sentinel, and that leader performs the failover.

Every Sentinel monitoring the master node may be elected leader; the election uses the Raft algorithm. Raft's basic idea: in a round of election, Sentinel A asks B to vote for it as leader; if B has not yet voted for another Sentinel in that round, it votes for A. The details of the election are omitted here; generally the election finishes very quickly, and whichever Sentinel first completes the objective-offline judgment usually becomes the leader.
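The vote-granting rule described above can be sketched as follows; SentinelVoter is an invented illustration of the "grant your vote to the first candidate per epoch" idea, not Redis source code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of Raft-style vote granting: within one epoch, a Sentinel
// grants its vote to the first candidate that asks and refuses later candidates.
class SentinelVoter {
    private final Map<Long, String> votedFor = new HashMap<>(); // epoch -> candidate runid

    // Returns true if the vote is granted to candidateRunId in this epoch.
    boolean requestVote(long epoch, String candidateRunId) {
        String existing = votedFor.putIfAbsent(epoch, candidateRunId);
        return existing == null || existing.equals(candidateRunId);
    }
}
```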

Failover: the elected leader Sentinel starts the failover operation, which proceeds in three steps:

Select a new master among the slave nodes: the rule is to first filter out unhealthy slave nodes; then select the slave with the highest priority (specified by slave-priority); if priority cannot distinguish them, select the slave with the largest replication offset; if still tied, select the slave with the smallest runid.

Update the master-slave status: use the slaveof no one command to make the selected slave node the master, and the slaveof command to point the other slave nodes at it.

Set the offline master node (6379 in our example) as a slave of the new master, so that when 6379 comes back online, it becomes a slave of the new master node.
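The slave-selection rule in the first step can be sketched as a comparator chain. The types below (SlaveCandidate, selectNewMaster) are invented for illustration; note that in redis.conf a smaller slave-priority value means higher priority.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative sketch of new-master selection: filter unhealthy slaves, then prefer
// smaller slave-priority, then larger replication offset, then smaller runid.
// (The special slave-priority value 0, meaning "never promote", is ignored here for brevity.)
class SlaveCandidate {
    final String runId;
    final boolean healthy;
    final int priority;    // slave-priority: smaller value = preferred
    final long replOffset; // larger = more up-to-date data

    SlaveCandidate(String runId, boolean healthy, int priority, long replOffset) {
        this.runId = runId;
        this.healthy = healthy;
        this.priority = priority;
        this.replOffset = replOffset;
    }

    static Optional<SlaveCandidate> selectNewMaster(List<SlaveCandidate> slaves) {
        return slaves.stream()
                .filter(s -> s.healthy)
                .min(Comparator.comparingInt((SlaveCandidate s) -> s.priority)
                        .thenComparingLong((SlaveCandidate s) -> -s.replOffset)
                        .thenComparing((SlaveCandidate s) -> s.runId));
    }
}
```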

Through the above key concepts, we can basically understand the working principle of the Sentinel. For a more vivid illustration, the following figure shows the log of the Sentinel node, including from the node startup to the completion of the failover.

V. Suggestions on configuration and practice

1. Configuration

Several configurations related to Sentinels are described below.

Configuration 1: sentinel monitor {masterName} {masterIp} {masterPort} {quorum}

sentinel monitor is the Sentinel's core configuration, explained earlier when deploying the Sentinel node: masterName specifies the master node's name, masterIp and masterPort its address, and quorum the threshold for judging the master objectively offline; when the number of Sentinels judging the master offline reaches quorum, the master node is objectively offline. The recommended value is half the number of Sentinels plus 1.
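The quorum rule reduces to a simple count, as the sketch below shows; ObjectiveDownJudge is an invented name for illustration, not Redis source code.

```java
// Illustrative quorum check: the master is objectively offline once the number of
// Sentinels that consider it subjectively down reaches the configured quorum.
class ObjectiveDownJudge {
    // votes[i] is true if Sentinel i considers the master subjectively down.
    static boolean objectivelyDown(boolean[] votes, int quorum) {
        int down = 0;
        for (boolean v : votes) {
            if (v) down++;
        }
        return down >= quorum;
    }
}
```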

Configuration 2: sentinel down-after-milliseconds {masterName} {time}

sentinel down-after-milliseconds relates to the subjective-offline judgment: the Sentinel uses ping commands for heartbeat checks on other nodes, and if a node does not reply within the time given by down-after-milliseconds, the Sentinel marks it subjectively offline. This configuration applies to the subjective-offline judgment of master nodes, slave nodes and Sentinel nodes alike.

The default value of down-after-milliseconds is 30000, that is, 30s; it can be adjusted for different network environments and application requirements. The larger the value, the looser the subjective-offline judgment: the chance of misjudgment is smaller, but failure detection and failover take longer, and the client waits longer. For example, if the application demands high availability, lower the value so the transfer completes as soon as possible; if the network environment is relatively poor, raise the threshold to avoid frequent misjudgments.
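The subjective-offline judgment itself reduces to a timestamp comparison, sketched below under the assumption that the Sentinel records the time of the last valid reply; HeartbeatCheck is an invented helper.

```java
// Illustrative subjective-offline check: a node is marked subjectively down when the
// last valid heartbeat reply is older than down-after-milliseconds.
class HeartbeatCheck {
    static boolean subjectivelyDown(long nowMillis, long lastReplyMillis, long downAfterMillis) {
        return nowMillis - lastReplyMillis > downAfterMillis;
    }
}
```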

Configuration 3: sentinel parallel-syncs {masterName} {number}

sentinel parallel-syncs relates to replication by the slave nodes after a failover: it specifies how many slaves initiate replication to the new master node at a time. For example, suppose that after the master switch completes, three slave nodes must replicate from the new master: with parallel-syncs=1, the slaves start replication one by one; with parallel-syncs=3, all three start together.

The higher the value of parallel-syncs, the faster the replication time of the slave node, but the greater the pressure on the network load and hard disk load of the master node; it should be set according to the actual situation. For example, if the load of the master node is low and the slave node has higher requirements for service availability, you can increase the value of parallel-syncs appropriately. The default value for parallel-syncs is 1.
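The effect of parallel-syncs can be pictured as splitting the slaves into waves, as in the sketch below; SyncBatcher is an invented helper, not Redis source code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: with parallel-syncs = n, slaves are pointed at the new master
// in waves of at most n at a time.
class SyncBatcher {
    static List<List<String>> waves(List<String> slaves, int parallelSyncs) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < slaves.size(); i += parallelSyncs) {
            result.add(new ArrayList<>(slaves.subList(i, Math.min(i + parallelSyncs, slaves.size()))));
        }
        return result;
    }
}
```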

Configuration 4: sentinel failover-timeout {masterName} {time}

sentinel failover-timeout relates to failover timeout judgments; this parameter is not used to judge a timeout of the entire failover phase, but of several of its sub-stages. For example, if the promotion of the slave node to master exceeds timeout, or if the slaves' initiation of replication to the new master (excluding the time to copy data) exceeds timeout, the failover is considered to have failed by timeout.

The default value for failover-timeout is 180000, or 180s; if it times out, the value will double next time.

Configuration 5: in addition to the above parameters, there are some other parameters, such as parameters related to security verification, which are not introduced here.

2. Practical suggestions

There should be more than one Sentinel node. On the one hand, this adds redundancy and prevents the Sentinel itself from becoming a high-availability bottleneck; on the other hand, it reduces false offline judgments. In addition, these Sentinel nodes should be deployed on different physical machines.

The number of sentinel nodes should be odd so that sentinels can make "decisions" through voting: election decisions, objective offline decisions, and so on.

The configuration of each sentinel node should be consistent, including hardware, parameters, etc.; in addition, all nodes should use ntp or similar services to ensure accurate and consistent time.

Sentinel's configuration-provider and notification functions can only work with support on the client side, such as the Jedis client mentioned above. If the library a developer uses does not provide such support, it may have to be implemented by the developer.

When the nodes of a Sentinel system are deployed in Docker (or other software that performs port mapping), pay special attention: port mapping may prevent the Sentinel system from working properly, because the Sentinel's work is based on communicating with other nodes, and Docker's port mapping may prevent Sentinels from connecting to them. For example, Sentinels discover each other through the IP and port they declare; if Sentinel A is deployed in a Docker container with port mapping, other Sentinels cannot connect to A using the port A declares.

On the basis of master-slave replication, Sentinel introduces automatic failover of the master node, further improving the high availability of Redis. However, Sentinel's defect is also obvious: it cannot automatically fail over slave nodes. In a read-write separation scenario, a slave node failure makes the read service unavailable, which requires us to do additional monitoring and switching for slave nodes.

The above is how to understand Redis Sentinel technology. The editor believes it covers knowledge points we may see or use in daily work, and hopes you can learn more from this article.
