What is Cluster and why Cluster is needed in Redis

Updated: 2025-04-15. Source: SLTechnology News&Howtos (Database)


This article explains in detail what Redis Cluster is and why Redis needs it.

It breaks down the main aspects of a cluster: nodes, slot assignment, command execution, resharding, redirection, failover, and messaging.

The goal is to understand what Cluster is, how Cluster sharding works, how clients locate data, how failover and master election work, which scenarios call for Cluster, and how to deploy a cluster.

Why do you need Cluster

65 Brother: Brother Ma, ever since I used the Sentinel cluster you described to get automatic failover, I can finally enjoy time with my girlfriend without worrying about Redis going down in the middle of the night.

Recently, however, I hit a nasty problem. Redis needs to store 8 million key-value pairs, which take up 20 GB of memory.

I deployed it on a host with 32 GB of memory, but Redis sometimes responded very slowly. I used the INFO command to check the latest_fork_usec metric (the duration of the most recent fork) and found it was particularly high.

This is mainly caused by the Redis RDB persistence mechanism. Redis forks a child process to perform RDB persistence, and the time the fork takes grows with the amount of data Redis holds.

The fork blocks the main thread while it executes. With this much data, the main thread is blocked for too long, so Redis appears to respond slowly.

65 Brother: As the business grows, the amount of data keeps increasing. With a master-slave architecture it is hard to keep upgrading the hardware of a single instance, and storing a large amount of data leads to slow responses. Is there any way to solve this?

Besides using hosts with more memory, we can also store a large amount of data with a sliced cluster. As the saying goes, "when everyone gathers firewood, the flames rise high": if one machine cannot hold all the data, share it among several machines.

Redis Cluster mainly solves the various slowdowns caused by storing a large amount of data, and it also makes horizontal scaling convenient.

A bigger host and a sliced cluster correspond to the two ways of coping with growth in Redis data: vertical scaling (scale up) and horizontal scaling (scale out).

Vertical scaling: upgrade the hardware of a single Redis instance, for example by adding memory or disk capacity or by using a more powerful CPU.

Horizontal scaling: increase the number of Redis instances, with each node responsible for part of the data.

For example, if you need server resources with 24 GB disk and 150 GB memory, there are two options: provision a single host of that size (vertical scaling), or spread the same capacity over several smaller instances (horizontal scaling).

When serving millions or tens of millions of users, a scaled-out Redis sliced cluster is a very good choice.

65 Brother: What are the advantages and disadvantages of these two approaches?

Vertical scaling is simple to deploy, but when the amount of data is large and persistence uses RDB, the fork can block the main thread and slow responses down. It is also limited by hardware and cost: expanding a single host to, say, 1 TB of memory is too expensive.

Horizontal scaling is easy to extend and is not bound by the hardware and cost limits of a single instance. However, a sliced cluster requires distributed management of multiple instances, so we must decide how to distribute the data sensibly across instances and how to let clients reach the instance that holds the data they want.

What is Redis Cluster

Redis Cluster is a distributed database solution. It manages data through sharding (an application of divide and conquer) and provides replication and failover.

The data is divided into 16384 slots, and each node is responsible for a subset of the slots. Every node stores the slot assignment information.

It is decentralized. Consider a cluster of three Redis nodes: each node is responsible for part of the cluster's data, and the amount each node holds may differ.

The three nodes connect to each other to form a peer-to-peer cluster and exchange cluster information through the Gossip protocol, so that eventually every node stores the slot allocation of the other nodes.

A word of advice first

Technology is not omnipotent, and programmers are not the greatest people on earth. We must be clear about this and never think "I am the best in the world". Once we fall into that mindset, it can hinder our growth.

Technology exists to solve problems. If a technology cannot solve a problem, it is worthless.

Don't show off your skills; it is pointless.

Cluster installation

For installation steps, see the article "Redis 6.x Cluster Building".

A Redis cluster usually consists of multiple nodes. At first each node is independent and belongs to a cluster that contains only itself. To build a working cluster, we must connect the independent nodes into a cluster that contains multiple nodes.

Nodes are connected with the CLUSTER MEET command: CLUSTER MEET <ip> <port>.

Sending CLUSTER MEET to a node makes that node perform a handshake with the node specified by ip and port. When the handshake succeeds, the node adds the node specified by ip and port to its current cluster.

It is as if the node said: "Hey, brother at ip = xx, port = xx, come and join the cluster. If you are a brother, come with me!"

For official details on Redis Cluster, see: redis.io/topics/clus …

Principle of Cluster implementation

65 Brother: After the data is sliced, it has to be distributed across different instances. How is data matched to instances?

Redis has provided an official sliced-cluster solution, Redis Cluster, since version 3.0. It defines the rules that map data to instances. Redis Cluster uses hash slots (Hash Slot, which I will simply call Slot) to handle the mapping between data and instances.


Divide the data into multiple parts and store them on different instances

The cluster's entire database is divided into 16384 slots. Every key in the database belongs to exactly one of these slots, and each node in the cluster can handle anywhere from 0 to 16384 slots.

The mapping from a key to a hash slot takes two steps:

First, the CRC16 algorithm is applied to the key of the key-value pair to produce a 16-bit value.

Second, the 16-bit value is taken modulo 16384, which yields a number from 0 to 16383: the hash slot for the key.

Cluster also lets users force a key into a specific slot by embedding a hash tag in the key string. When part of the key is wrapped in braces, such as {tag}, only that part is hashed, so the key lands in the same slot as the tag.
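The two mapping steps and the hash tag rule can be sketched in a few lines of Python. This is a minimal illustration, not Redis source code: the function names are made up for this example, and it relies on the standard library's binascii.crc_hqx, which with an initial value of 0 computes the CRC16 variant (polynomial 0x1021) that Redis Cluster uses.

```python
import binascii

SLOT_COUNT = 16384


def hash_tag_or_key(key: bytes) -> bytes:
    """Return the bytes that get hashed: the hash tag if one exists, else the whole key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Step 1: CRC16 of the key (or its hash tag). Step 2: modulo 16384."""
    data = hash_tag_or_key(key.encode("utf-8"))
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    print(key_slot("foo"))
    # Both keys share the tag "user1000", so they land in the same slot.
    print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

On a running cluster, the result for a given key can be compared with the output of the CLUSTER KEYSLOT command.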

Mapping between Hash slot and Redis instance

65 Brother: How do hash slots map to Redis instances?

In the sample deployment, the cluster is created with cluster create (redis-cli --cluster create), and Redis automatically distributes the 16384 hash slots evenly over the instances: with N nodes, each node gets about 16384 / N hash slots.

Alternatively, you can join nodes 7000, 7001, and 7002 into a cluster with the CLUSTER MEET command, but the cluster remains offline because none of the three instances handles any hash slots yet.

You can then use the cluster addslots command to specify which hash slots each instance handles.

65 Brother: Why would you assign them manually?

Let the capable do more. The Redis instances in a cluster may run on hosts with different configurations. If every host had to bear the same load, the weak machines would struggle, so let the powerful machines take on more.

For a cluster of three instances, assign hash slots with the following commands: instance 1 is responsible for hash slots 0-5460, instance 2 for 5461-10922, and instance 3 for 10923-16383. CLUSTER ADDSLOTS expects individual slot numbers, so the ranges are written here with shell brace expansion.

redis-cli -h 172.16.19.1 -p 6379 cluster addslots {0..5460}
redis-cli -h 172.16.19.2 -p 6379 cluster addslots {5461..10922}
redis-cli -h 172.16.19.3 -p 6379 cluster addslots {10923..16383}
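The three ranges above (0-5460, 5461-10922, 10923-16383) are what you get when 16384 slots are split as evenly as possible over three nodes. The sketch below shows one way to compute such ranges for N nodes. The function name and the rounding scheme are assumptions chosen for illustration; they reproduce the ranges in this article but are not claimed to be the exact code Redis uses.

```python
from typing import List, Tuple

SLOT_COUNT = 16384


def even_slot_ranges(node_count: int) -> List[Tuple[int, int]]:
    """Split slots 0..16383 into contiguous, near-equal (first, last) ranges."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    slots_per_node = SLOT_COUNT / node_count
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(node_count):
        last = round(cursor + slots_per_node - 1)
        # The final node always takes whatever is left, up to slot 16383.
        if index == node_count - 1 or last > SLOT_COUNT - 1:
            last = SLOT_COUNT - 1
        ranges.append((first, last))
        first = last + 1
        cursor += slots_per_node
    return ranges


if __name__ == "__main__":
    for first, last in even_slot_ranges(3):
        print(f"{first}-{last}")
```

For an uneven split that gives stronger machines more slots, the same idea applies with per-node weights instead of a fixed 16384 / N share.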

The mapping from key-value pairs to hash slots and from hash slots to Redis instances works as follows:

The keys "codebytes" and "awesome" are each run through CRC16, the result is taken modulo the total number of hash slots (16384), and the resulting slots map to instance 1 and instance 2 respectively.

Keep in mind that the Redis cluster works properly only when all 16384 slots have been allocated.

Replication and failover

65 Brother: How does a Redis cluster achieve high availability? Do Master and Slave still separate reads from writes?

The Master handles slots, and Slave nodes synchronize the master's data through the mechanism described in "Redis master-slave architecture data synchronization".

When a Master goes offline, a Slave takes over and continues to process requests. There is no read-write separation between master and slave nodes; the Slave serves only as a high-availability backup in case the Master goes down.

Redis Cluster lets you configure several slave nodes for each master node. When a master node fails, the cluster automatically promotes one of its slave nodes to master.

If a master node has no slave node, the whole cluster becomes unavailable when that master fails.

However, Redis provides the cluster-require-full-coverage parameter. Setting it to no lets the cluster tolerate the loss of some nodes, so the remaining nodes keep serving the slots they own.

For example, if master node 7000 goes down, its slave 7003 becomes the Master and continues to provide service. When node 7000 comes back online, it becomes a slave of 7003.

Fault detection

65 Brother: From "Redis High availability: the principle of Sentinel Sentinel Cluster", I know that Sentinel achieves automatic failover by monitoring, automatically switching the master, and notifying clients. How does Cluster achieve automatic failover?

One node believing that another node is unreachable does not mean that all nodes believe it. The cluster decides that a master-slave switch is needed only when a majority of the nodes responsible for slots (the masters) agree that the node has gone offline.

Redis cluster nodes use the Gossip protocol to broadcast their own state and their view of the cluster to all other nodes. For example, when a node finds that another node is unreachable (PFail), it broadcasts this to the entire cluster, so the other nodes also receive the report.

For more on the Gossip protocol, see the article by Gokong: "virus invasion depends entirely on distribution."

When a node sees that the number of PFail reports for a node (PFail Count) has reached a majority of the cluster, it marks that node as definitely offline (Fail) and broadcasts this to the entire cluster. The other nodes are forced to accept that the node is offline, and a master-slave switch for the failed node starts immediately.
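The step from PFail (one node's suspicion) to Fail (a majority agrees) can be sketched as a simple vote count. This is a simplified model under stated assumptions: the class and method names are hypothetical, only masters count as reporters, and real reports also expire after a while, which is omitted here.

```python
from typing import Dict, Set


class FailureDetector:
    """Collects PFail reports and decides when a node counts as Fail (hypothetical helper)."""

    def __init__(self, master_ids: Set[str]) -> None:
        self.master_ids = set(master_ids)
        self.pfail_reports: Dict[str, Set[str]] = {}

    def report_pfail(self, reporter: str, suspect: str) -> bool:
        """Record that `reporter` cannot reach `suspect`; return True once `suspect` is Fail."""
        if reporter in self.master_ids:
            self.pfail_reports.setdefault(suspect, set()).add(reporter)
        return self.is_failed(suspect)

    def is_failed(self, suspect: str) -> bool:
        # A majority of masters must agree before PFail is promoted to Fail.
        needed = len(self.master_ids) // 2 + 1
        return len(self.pfail_reports.get(suspect, set())) >= needed


if __name__ == "__main__":
    detector = FailureDetector({"7000", "7001", "7002"})
    print(detector.report_pfail("7001", "7000"))  # one report out of three masters
    print(detector.report_pfail("7002", "7000"))  # second report reaches the majority
```

Counting distinct reporters in a set means that a node repeating the same report does not push the count toward the majority.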

Failover

When a Slave discovers that its master node has entered the offline state, it begins a failover for the offline master.

One of the offline Master's Slave nodes is selected to become the new master node.

The new master node revokes all slot assignments of the offline master node and assigns these slots to itself.

The new master node broadcasts a PONG message to the cluster, which lets the other nodes know immediately that it has changed from a slave to a master and has taken over the slots previously handled by the offline node.

The new master node starts accepting command requests for the slots it now handles, and the failover is complete.

Master election process

65 Brother: How is the new master node elected?

The cluster's configuration epoch is an auto-incrementing counter with an initial value of 0. It is incremented by 1 for each failover.

A slave node that detects that its master has gone offline broadcasts a CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST message to the cluster, asking every master node that receives the message and has the right to vote to vote for it.

If a master node has not yet voted for another slave node, it returns a CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK message to the requesting slave, indicating that it supports that slave becoming the new master.

The slave nodes taking part in the election collect CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK messages. If a slave collects at least (N / 2) + 1 votes, where N is the number of master nodes with voting rights, it is elected as the new master.

If no slave node collects enough votes within a configuration epoch, the cluster enters a new configuration epoch and holds another election, until a new master is chosen.

As with Sentinel, the election follows the leader-election approach of the Raft algorithm.
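The voting rule (each master votes at most once per configuration epoch, and a slave needs at least N / 2 + 1 votes) can be sketched as follows. This is a toy model, not the Redis implementation: MasterVoter and run_election are hypothetical names, and message passing is replaced by direct method calls.

```python
from typing import List


class MasterVoter:
    """A master that grants at most one vote per configuration epoch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_vote_epoch = -1

    def request_vote(self, epoch: int) -> bool:
        # Refuse if this master already voted in this epoch (or a later one).
        if epoch <= self.last_vote_epoch:
            return False
        self.last_vote_epoch = epoch
        return True


def run_election(epoch: int, reachable_masters: List[MasterVoter], total_masters: int) -> bool:
    """Return True if the requesting slave collects at least N / 2 + 1 votes."""
    votes = sum(1 for master in reachable_masters if master.request_vote(epoch))
    return votes >= total_masters // 2 + 1


if __name__ == "__main__":
    # Three masters in total; one has failed, so two can still vote.
    voters = [MasterVoter("7001"), MasterVoter("7002")]
    print(run_election(1, voters, total_masters=3))  # first slave asks in epoch 1
    print(run_election(1, voters, total_masters=3))  # second slave asks in the same epoch
    print(run_election(2, voters, total_masters=3))  # retry in a new epoch
```

Tying each vote to an epoch is what prevents two slaves from both winning the same round: the second request in epoch 1 finds that every master has already voted.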

Is it feasible to store the relationship between key-value pairs and instances in a table?

65 Brother, let me test you: "Redis Cluster assigns key-value pairs to instances by means of hash slots. This requires computing the CRC of each key and taking it modulo the total number of hash slots to find the instance. If a table recorded the relationship between key-value pairs and instances directly (for example, key-value pair 1 on instance 2, key-value pair 2 on instance 1), there would be no need to compute the hash slot for a key; a table lookup would be enough. Why doesn't Redis do that?"

With a global table, the table has to be modified whenever the relationship between key-value pairs and instances changes (resharding, adding or removing instances). If the table is updated by a single thread, all operations are serialized and performance is too slow.

With multiple threads, locking is required. In addition, when the number of key-value pairs is very large, the table that records which instance holds each pair takes up a great deal of storage.

With hash slots, the relationship between hash slots and instances still has to be recorded, but there are far fewer hash slots, only 16384, so the overhead is very small.

How the client locates the instance where the data resides

65 Brother: How does the client determine which instance holds the data it wants to access?

Redis instances send their own hash slot information to the other instances in the cluster through the Gossip protocol, which spreads the hash slot allocation across the cluster.

In this way, every instance in the cluster has the complete mapping between hash slots and instances.

When data is sliced, the key is run through CRC16 and the result is taken modulo 16384 to obtain the Slot. The client can perform this calculation itself when it sends a request.

After locating the slot, however, the client still needs to find the Redis instance where the Slot resides.

When the client connects to any instance, the instance returns the mapping between hash slots and instances, and the client caches this mapping locally.

When the client makes a request, it computes the hash slot for the key, looks up the instance for that hash slot in its local cache, and sends the request to that instance.
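The client-side lookup described above can be sketched with a plain list that plays the role of the local cache. This is an illustrative sketch, not a real client library: the function names are made up, the slot ranges and addresses are the three-instance example used earlier in this article, and hash tags are ignored to keep the example short.

```python
import binascii
from typing import Dict, List, Optional, Tuple

SLOT_COUNT = 16384
Node = Tuple[str, int]  # (host, port)


def key_slot(key: str) -> int:
    """CRC16 of the key modulo 16384 (hash tags are ignored in this sketch)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def build_slot_cache(ranges: Dict[Tuple[int, int], Node]) -> List[Optional[Node]]:
    """Expand (first, last) slot ranges into a per-slot lookup table."""
    cache: List[Optional[Node]] = [None] * SLOT_COUNT
    for (first, last), node in ranges.items():
        for slot in range(first, last + 1):
            cache[slot] = node
    return cache


def locate(cache: List[Optional[Node]], key: str) -> Optional[Node]:
    """Return the instance the request for `key` should be sent to."""
    return cache[key_slot(key)]


# Mapping as it might be returned by the first instance the client connects to.
SLOT_CACHE = build_slot_cache({
    (0, 5460): ("172.16.19.1", 6379),
    (5461, 10922): ("172.16.19.2", 6379),
    (10923, 16383): ("172.16.19.3", 6379),
})

if __name__ == "__main__":
    print(locate(SLOT_CACHE, "foo"))
```

A flat list of 16384 entries keeps every lookup constant-time, which is affordable precisely because the number of slots is small and fixed.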

65 Brother: What if the mapping between hash slots and instances changes because new instances are added or slots are redistributed for load balancing?

The instances in the cluster exchange messages through the Gossip protocol to learn the latest hash slot allocation, but the client is not aware of the change.

Redis Cluster provides a redirection mechanism: when a client sends a request to an instance that does not hold the corresponding data, that instance tells the client to send the request to another instance.

65 Brother: How does Redis tell the client to redirect to the new instance?

There are two situations: MOVED error and ASK error.

MOVED error

MOVED error (after load rebalancing, the data has already been migrated to another instance): when a client sends an operation on a key-value pair to an instance that is not responsible for the key's slot, the instance returns a MOVED error that directs the client to the node in charge of the slot.

GET "official account: codebyte"
(error) MOVED 16330 172.17.18.2:6379

This response indicates that hash slot 16330, which holds the requested key-value pair, has moved to the instance at 172.17.18.2, port 6379. The client can then connect to 172.17.18.2:6379 and send the GET request there.

At the same time, the client updates its local cache so that the slot maps to the correct Redis instance.

ASK error

65 Brother: What if a slot holds a lot of data, and part of it has been migrated to a new instance while the rest has not?

While a slot migration is still in progress, for example when the Slot of the requested key is being migrated from instance 1 to instance 2, instance 1 first looks for the key locally. If it finds the key, it executes the command directly. Otherwise it returns an ASK error, which tells the client that the requested key is being migrated to instance 2: the client should first send an ASKING command to instance 2 and then send the actual command.

GET "official account: codebyte"
(error) ASK 16330 172.17.18.2:6379

For example, suppose the client requests key = "official account: codebyte", which belongs to slot 16330 on instance 172.17.18.1. If node 1 can find the key, it executes the command directly; otherwise it responds with the ASK error and directs the client to the migration target, 172.17.18.2.

Note: an ASK error does not update the hash slot allocation cached by the client.

So when the client requests data in Slot 16330 again, it still sends the request to instance 172.17.18.1 first, and that node responds with an ASK error asking the client to send the request to the new instance.

A MOVED error, by contrast, updates the client's local cache so that subsequent commands go to the new instance.
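The difference between the two redirections can be sketched as a small handler on the client side. This is a simplified illustration with hypothetical names (Redirect, parse_redirect, handle_redirect); it assumes the error text has the form "MOVED <slot> <host>:<port>" or "ASK <slot> <host>:<port>" as in the examples above, and it does not open real connections.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Node = Tuple[str, int]  # (host, port)


class Redirect(NamedTuple):
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'MOVED 16330 172.17.18.2:6379' style errors; return None for anything else."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def handle_redirect(slot_cache: Dict[int, Node], error: str) -> Tuple[Node, List[str]]:
    """Return the node to retry on and any commands to send before the retry."""
    redirect = parse_redirect(error)
    if redirect is None:
        raise ValueError(f"not a redirection error: {error!r}")
    target = (redirect.host, redirect.port)
    if redirect.kind == "MOVED":
        # The slot has a new owner: remember it for all later requests.
        slot_cache[redirect.slot] = target
        return target, []
    # ASK: one-off redirect during migration; the cache stays unchanged.
    return target, ["ASKING"]


if __name__ == "__main__":
    cache = {16330: ("172.17.18.1", 6379)}
    print(handle_redirect(cache, "ASK 16330 172.17.18.2:6379"), cache[16330])
    print(handle_redirect(cache, "MOVED 16330 172.17.18.2:6379"), cache[16330])
```

Returning the "commands to send first" separately keeps the ASKING step explicit: it applies only to the single retried request, which is why the cache must not be touched on ASK.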

How large can a cluster be?

65 Brother: With Redis Cluster I no longer fear large amounts of data. Can I scale out indefinitely?

The answer is no. The officially stated upper limit for a Redis Cluster is 1000 instances.

65 Brother: What exactly limits the size of the cluster?

The key is the communication overhead between instances. Each instance in the cluster stores the complete mapping between hash slots and instances (the Slot-to-node table) as well as its own state information.

Instances propagate node data within the cluster through the Gossip protocol, which works roughly as follows:

Each instance randomly selects some instances from the cluster and sends them PING messages at a certain frequency to check their status and exchange information. A PING message carries the sender's own state, the state of some other instances, and the Slot-to-instance mapping table.

An instance that receives a PING message replies with a PONG message, which contains the same kind of information as the PING.

Through this gossip, every instance obtains the state of all other instances after a period of time.

So when new nodes join, nodes fail, or the Slot mapping changes, the cluster state is synchronized to every instance through the propagation of PING and PONG messages.

Gossip message

The information about each instance carried in a message is described by the clusterMsgDataGossip structure:

typedef struct {
    char nodename[CLUSTER_NAMELEN];  // 40 bytes
    uint32_t ping_sent;              // 4 bytes
    uint32_t pong_received;          // 4 bytes
    char ip[NET_IP_STR_LEN];         // 46 bytes
    uint16_t port;                   // 2 bytes
    uint16_t cport;                  // 2 bytes
    uint16_t flags;                  // 2 bytes
    uint32_t notused1;               // 4 bytes
} clusterMsgDataGossip;

So the gossip entry describing one instance takes 104 bytes. A PING message carries entries for roughly one tenth of the instances in the cluster, so in a cluster of 1000 instances one PING message carries about 100 entries, approximately 10 KB.

In addition, to propagate the Slot mapping table between instances, each message contains a Bitmap that is 16384 bits long.

Each bit corresponds to one Slot, and a value of 1 means that the Slot belongs to the sending instance. The Bitmap occupies 2 KB, so a PING message is about 12 KB.

A PONG message has the same content as a PING, so one PING plus its PONG adds up to about 24 KB. As the cluster grows, more and more heartbeat messages consume the cluster's network bandwidth and reduce its throughput.
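The figures above can be checked with a few lines of arithmetic. The sketch assumes, as stated in its comments, that a PING carries gossip entries for about one tenth of the instances; that fraction is what makes the 10 KB figure for 1000 instances work out. The constant and function names are made up for this example.

```python
# Field sizes of clusterMsgDataGossip in bytes, as listed in the struct above.
GOSSIP_ENTRY_BYTES = 40 + 4 + 4 + 46 + 2 + 2 + 2 + 4

# One bit per slot: 16384 bits = 2048 bytes (2 KB).
SLOT_BITMAP_BYTES = 16384 // 8


def ping_size_bytes(cluster_size: int) -> int:
    """Approximate PING size, assuming entries for one tenth of the instances."""
    entries = cluster_size // 10  # assumption: about 10% of instances per message
    return entries * GOSSIP_ENTRY_BYTES + SLOT_BITMAP_BYTES


if __name__ == "__main__":
    size = ping_size_bytes(1000)
    print(GOSSIP_ENTRY_BYTES)        # bytes per gossip entry
    print(size, round(size / 1024))  # bytes and approximate KB for one PING
    print(round(2 * size / 1024))    # PING + PONG in approximate KB
```

Message headers and other fields are left out, so the result is a lower bound that matches the rough 12 KB and 24 KB figures in the text.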

Communication frequency of the instance

65 Brother: Brother Ma, the frequency of sending PING messages also affects cluster bandwidth, right?

Once a Redis Cluster instance is running, by default it randomly selects five instances from its local instance list every second, finds the one among these five that it has gone longest without communicating with, and sends a PING message to that instance.

65 Brother: Five are picked at random, so there is no guarantee that the chosen instance is the one that has gone longest without communication in the whole cluster. Some instances might never receive a message, so the cluster information they hold would be long out of date. What then?

Good question. A Redis Cluster instance scans its local instance list every 100 ms. When it finds an instance whose last PONG was received more than cluster-node-timeout / 2 ago, it immediately sends a PING message to that instance to refresh the cluster state held for that node.

As the cluster grows, network latency between instances increases as well, which can cause even more PING messages to be sent.
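The two rules above (sample five instances and ping the stalest one; ping anything whose last PONG is older than cluster-node-timeout / 2) can be sketched as two small functions. This is a simplified model: the function names and the last_pong dictionary (instance name mapped to the time of its last PONG, in seconds) are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def pick_ping_target(last_pong: Dict[str, float], sample_size: int = 5) -> Optional[str]:
    """Sample up to `sample_size` instances and return the one with the oldest PONG."""
    if not last_pong:
        return None
    candidates = random.sample(sorted(last_pong), min(sample_size, len(last_pong)))
    return min(candidates, key=lambda node: last_pong[node])


def overdue_nodes(last_pong: Dict[str, float], now: float, node_timeout: float) -> List[str]:
    """Return instances whose last PONG is older than node_timeout / 2."""
    return [node for node, seen in last_pong.items() if now - seen > node_timeout / 2]


if __name__ == "__main__":
    pongs = {"7001": 10.0, "7002": 2.0, "7003": 7.0}
    # With only three known instances, all of them are sampled.
    print(pick_ping_target(pongs))
    # With a 15 second timeout, anything silent for more than 7.5 seconds is overdue.
    print(overdue_nodes(pongs, now=12.0, node_timeout=15.0))
```

The second function is the safety net for the first: random sampling alone can miss an instance for a long time, while the timeout check bounds how stale any instance's information can get.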

Reduce the communication overhead between instances

Each instance sends PING messages every second. Lowering this frequency might keep the state of the instances from propagating through the cluster in time.

Every 100 ms, an instance checks whether the last PONG from each other instance is older than cluster-node-timeout / 2. This is the default frequency of the periodic task in a Redis instance, and we should not change it lightly.

Therefore, the only setting left to adjust is cluster-node-timeout, the heartbeat timeout used to decide whether an instance has failed. The default is 15 seconds.

To keep heartbeat messages from taking up too much cluster bandwidth, you can raise cluster-node-timeout to 20 or 30 seconds, which reduces the number of PINGs triggered by PONG timeouts.

However, do not set it too high. Otherwise, when an instance really fails, the cluster has to wait for cluster-node-timeout to elapse before it detects the fault, which disrupts normal service.

Summary

A Sentinel cluster provides automatic failover, but when the amount of data is too large, generating an RDB takes too long. The fork blocks the main thread while it executes, and with a large amount of data the main thread is blocked long enough that Redis appears to respond slowly.

Redis Cluster mainly solves the various slowdowns caused by storing a large amount of data, and it also makes horizontal scaling convenient. When serving millions or tens of millions of users, a scaled-out Redis sliced cluster is a very good choice.

The cluster's entire database is divided into 16384 slots. Every key in the database belongs to exactly one of these slots, and each node in the cluster can handle anywhere from 0 to 16384 slots.

Redis cluster nodes use the Gossip protocol to broadcast their own state and their view of the cluster to all other nodes.

After a client connects to any instance of the cluster, the instance sends it the mapping between hash slots and instances. The client saves this mapping and uses it to locate the node for each key.

A cluster cannot grow indefinitely. Because instance information is propagated through the Gossip protocol, communication overhead is the main factor that limits cluster size, and it can be tuned mainly by modifying cluster-node-timeout.
