What does Redis cluster mean?

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

The editor shares here what a Redis cluster is. Most people know little about it, so this article is offered for your reference; I hope you learn a lot from reading it. Let's get into it!

This article introduces the cluster from the following angles.

I. Cluster introduction

Topics covered: a brief introduction to the cluster, cluster functions and configuration, manual and automatic failover, and failover principles. Environment: CentOS 7.3, Redis 4.0, Redis working directory /usr/local/redis. All operations are simulated in a virtual machine.

Clusters solve the memory limits and concurrency problems of master-slave replication. Suppose your cloud server has 256GB of memory; once Redis reaches that limit it can no longer serve requests. At that data volume, writes are also very heavy, which easily overflows the replication buffer and forces slave nodes into endless full resynchronization, breaking master-slave replication.

So we change the single-machine master-slave mode into a many-to-many mode, where all master nodes are connected and communicate with each other. This not only pools the memory of the individual machines but also spreads requests across them, improving the availability of the system.

As shown in the figure: when a large number of write requests arrive, commands are no longer sent to a single master node but are distributed across the master nodes, sharing the memory load and avoiding a flood of requests to one machine.

So how are commands routed and stored? For that we need to look at the cluster storage structure.

II. The role of the cluster

The cluster distributes storage across machines instead of relying on a single machine's capacity, and makes expansion easy. It also spreads the access requests that would hit a single machine, improving the availability of the system.

How should we understand "improving the availability of the system"? Look at the following figure: when master1 goes down, the impact on the system is limited, and normal service can still be provided.

At this point someone will ask how the cluster works once master1 is down. That question is answered in the failover section below, and the details are covered in the principles section.

III. Cluster storage structure

1. Storage structure

With standalone storage, when a user issues a request the key is stored directly in that machine's own memory. The cluster's storage structure is not that simple; first let's see what has to happen after a user issues a command for a key.

CRC16(key) computes a checksum, and taking that value modulo 16384 yields the slot number. Suppose the result is 28: that value is the location where the key will be saved.
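The slot computation just described can be sketched in Python. This is a minimal sketch assuming the CRC16-CCITT (XMODEM) variant, which is the checksum Redis Cluster uses, plus the modulo-16384 step and {hash tag} handling; `key_slot` and `crc16` are my own names, not a Redis API.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
        crc &= 0xFFFF  # keep it a 16-bit value
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("foo"))  # the official Redis cluster tutorial shows 'foo' -> slot 12182
```

Keys sharing the same `{tag}` land in the same slot, which is how Redis lets related keys be stored together on one node.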

So now the question is, which redis storage space should this key be stored in?

In fact, once the cluster starts, Redis divides the storage space into 16384 parts, and each host holds a portion of them.

Note that the number I assign to each Redis storage space refers to a small storage unit (the technical term is "hash slot"). Think of it as a room number in a building: the building is the entire Redis storage space, and each room number is one storage unit. Each unit stores a certain range of keys, not just the single position shown in the figure above.

The arrow pointing to 28 means key 28 is stored in this area; the same room may also hold 29, 30, 31, and so on.

Now the question: what happens if we add or remove a machine? Let the figures do the talking, and keep the words to a minimum.

After a new machine is added, a certain number of slots are taken from the other three storage spaces and allocated to the new machine. You can configure how many slots the new machine receives.

Similarly, after a machine is removed, its slots are reassigned to the remaining machines; just as with new nodes, the node that receives the slots can be specified.
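The add/remove behavior described above is just a reassignment of slot ownership. A hypothetical model (all names are mine, not Redis internals):

```python
# Hypothetical model of slot ownership: slot number -> node name.
def initial_slots():
    return {s: f"master{s % 3 + 1}" for s in range(16384)}  # 3 masters

def move_slots(slots, to_node, count, from_nodes):
    """Reassign `count` slots to `to_node`, taken from the first
    matching slots currently owned by any node in `from_nodes`."""
    moved = 0
    for s in range(16384):
        if moved == count:
            break
        if slots[s] in from_nodes:
            slots[s] = to_node
            moved += 1
    return moved

slots = initial_slots()
# Add a 4th master: give it 4096 slots taken from the existing three.
move_slots(slots, "master4", 4096, {"master1", "master2", "master3"})
# Remove master4 again: its slots go back to an existing node.
move_slots(slots, "master1", 4096, {"master4"})
```

The data living in a moved slot migrates with it; this sketch only tracks ownership, which is the part "adding or removing nodes" changes.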

So adding or removing nodes is just changing where the slots are stored. Now that we understand the cluster's storage structure, there is another problem to explain: how does the cluster design its internal communication? A value is stored, a key is fetched — which node should the data come from? Let's look at the next question.

2. Communication design

Each node in the cluster sends ping messages to the other nodes at regular intervals, and the other nodes respond with pong. After a while, every node knows the slot information of all nodes in the cluster.

The following figure has three nodes, so the 16384 hash slots will be divided into three parts.

They are 0-5500, 5501-11000 and 11001-16383 respectively.
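The three ranges above can be modeled as a small routing table (a sketch; the node names are mine):

```python
# Slot ranges as described above: (start, end, node).
RANGES = [(0, 5500, "node1"), (5501, 11000, "node2"), (11001, 16383, "node3")]

def node_for_slot(slot: int) -> str:
    """Find which node owns a given hash slot."""
    for start, end, node in RANGES:
        if start <= slot <= end:
            return node
    raise ValueError(f"slot {slot} out of range")

print(node_for_slot(5798))  # slot 5798 (seen again later in this article) -> node2
```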

When a user initiates a key request, how does the cluster handle the request?

The black box in the following figure represents the slot information of every node in the cluster, along with a lot of other metadata.

As shown in the figure, when a user issues a request for a key, the Redis node that receives it computes the key's slot and finds the node responsible for that slot.

If the slot belongs to the receiving node itself, the data for the key is returned directly.

Otherwise it replies with a MOVED redirection error that tells the client the correct node.

The client then resends the command to that node.
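The MOVED handshake can be sketched as a toy client loop (all class and function names here are hypothetical; real clients such as redis-cli -c do this for you):

```python
class MovedError(Exception):
    """Stands in for a 'MOVED <slot> <address>' reply."""
    def __init__(self, slot, address):
        self.slot, self.address = slot, address

class ToyNode:
    """A toy node: it owns some slots and knows where the others live."""
    def __init__(self, address, owned, routing, store):
        self.address, self.owned = address, owned
        self.routing, self.store = routing, store

    def get(self, slot, key):
        if slot in self.owned:
            return self.store.get(key)          # slot is local: answer directly
        raise MovedError(slot, self.routing[slot])  # otherwise: MOVED redirect

def cluster_get(nodes, start_addr, slot, key):
    """Send to one node; on MOVED, resend once to the correct node."""
    try:
        return nodes[start_addr].get(slot, key)
    except MovedError as e:
        return nodes[e.address].get(slot, key)

nodes = {
    "a": ToyNode("a", set(), {5798: "b"}, {}),
    "b": ToyNode("b", {5798}, {}, {"name": "kaka"}),
}
print(cluster_get(nodes, "a", 5798, "name"))  # redirected via MOVED, prints kaka
```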

IV. Configuring the cluster

1. Modify the configuration file

Only the circled configuration items in the figure matter here.

cluster-enabled yes: enable cluster mode
cluster-config-file nodes-6379.conf: the cluster configuration file
cluster-node-timeout 10000: the node timeout, set to 10s for testing purposes

2. Build the configuration files for 6 nodes and start them all

Kaka provides a command that makes generating the files easy: sed 's/6379/6380/g' 6379-redis.conf > 6380-redis.conf.

Create configuration files for six different ports in this way.

Open any of the configuration files to check that the replacement succeeded. To make the logs easy to view, start all instances in the foreground, then confirm the services started normally with ps -ef | grep redis.

You can see an extra cluster flag after startup, which marks each process as a cluster node. All nodes are now started. The cluster-creation tooling is based on Ruby (Kaka uses Redis 4.0), so next we install it.

3. Install ruby

Execute the command wget https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.1.tar.gz

Decompress: tar -xvzf ruby-2.7.1.tar.gz (adjust for the version you downloaded).

Install: run ./configure, make, and make install, one after another.

Check the ruby and gem versions: ruby -v and gem -v.

4. Start the cluster

The cluster-creation script lives at /usr/local/redis/src/redis-trib.rb.

Note that to use the redis-trib.rb command directly you need to ln it into a bin directory; otherwise you must invoke it as ./redis-trib.rb.

If you follow these steps, gem install redis will fail at first. Unfortunately there are errors to work through here: you need to run yum install zlib-devel and yum install openssl-devel.

After those are installed, go to /ruby-2.7.1/ext/openssl and /ruby-2.7.1/ext/zlib respectively, and in each run ruby extconf.rb, then make and make install.

Then run gem install redis again; once it succeeds, execute: ./redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384. "Information interpretation"

Creating the cluster assigns hash slots to the 6 nodes: the last three become slaves of the first three, and the output displays each node's hash slot ranges and node ID. In the final step you must type yes. Afterwards, go to the data directory to see how the configuration files changed; the key information there is the slot range each master node owns. "View the running log of the master node"

The key message here is cluster status changed: ok — the cluster state is normal.

5. Setting and getting data in the cluster

Setting data directly reports an error: the name key hashes to slot 5798, and the error gives the IP address and port of the node that owns it. You need to connect with redis-cli -c instead.

When setting the value, the client is told it is being redirected to slot 5798; when getting the data afterwards, the client switches nodes automatically.

V. Failover

1. Taking a cluster slave node offline

According to the cluster startup information above, it is known that port 6383 is the slave node of 6379.

The next step is to take 6383 offline to view the log information of 6379.

6379 reports that the connection to 6383 is lost and marks it fail, meaning unavailable. The cluster continues to work normally at this point.

"Summary: taking a slave node offline has no effect on the cluster." When port 6383 comes back online, all nodes clear the fail mark.

2. Taking a cluster master node offline

Manually take master node 6379 offline and view the log of slave node 6383.

At this point node 6383 keeps trying to reconnect to 6379, 10 times in total. Why 10? It follows from the cluster-node-timeout 10000 parameter we configured: there is one connection attempt per second.

When that timeout expires, failover begins.

6383 wins the failover election and is promoted from slave to master. Now look at the cluster's node information with the command cluster nodes.

You will find four master nodes here, one of which is offline. "Bringing the original master node 6379 back online"

After 6379 comes back online, all nodes likewise clear its fail mark.

The node information also changes: 6379 becomes a slave of 6383.

3. Adding a new master node

Start two new instances on ports 6385 and 6386, then execute: ./redis-trib.rb add-node 127.0.0.1:6385 127.0.0.1:6379. "This is the message printed here"

In the add-node command, the first parameter is the new node's ip:port and the second is any node already in the cluster. The figure below shows the new node is now part of the cluster.

"Note: although 6385 is now a node in the cluster, it differs from the others: it has no data, that is, no hash slots." So we need to assign some of the cluster's hash slots to this new node; only after the allocation is done does it become a real master node.

Execute the command: ./redis-trib.rb reshard 127.0.0.1:6385

It prompts for how many hash slots to move and the ID of the node that will receive them.

The final step asks which nodes to take the slots from; Kaka chooses all.

Check with cluster nodes: node 6385 now owns three ranges of hash slots.

"The master node has been added; next we configure slave node 6386 for master node 6385."

Command: ./redis-trib.rb add-node --slave --master-id dcc0ec4d0c932ac5c35ae76af4f9c5d27a422d9f 127.0.0.1:6386 127.0.0.1:6379

--master-id is the ID of 6385; the first address is the new node's ip:port, and the second is the ip:port of a node already in the cluster.

4. Manual failover

When you want to upgrade a master node in the cluster, you can manually fail over to its slave first, to avoid affecting the availability of the cluster.

Execute the command on the slave node: cluster failover

"execution process"

Looking at the node information, you can see that 6386 has become the master node.

When the cluster failover command is sent to a slave node, the slave sends a CLUSTERMSG_TYPE_MFSTART packet to its master, asking the master to pause client access so that the replication offsets of the two become consistent.

Clients now cannot connect to the retiring master. Meanwhile the master sends its replication offset to the slave; once the slave has caught up to that offset, failover begins and the slave notifies the master to switch configurations. When the pause on the old master is lifted, clients reconnect to the new master node.

VI. Principles of failover

Above we tested failover: after the master node went offline, a slave node became the master. Now let's analyze the process.

1. Fault discovery to confirmation

Each node in the cluster periodically sends ping messages to other nodes, and the receiver replies with pong.

If pings to a node keep failing for the duration of cluster-node-timeout, the sender marks that node pfail, i.e. subjectively offline.

Does this offline state look familiar? Yes, it resembles how Sentinel judges whether a master node is abnormal: when one sentinel finds a problem with the master, it marks it subjectively offline (s_down). I notice I'm straying off topic — embarrassing.

Speaking of Sentinel: when one sentinel thinks the master is abnormal, it marks it subjectively offline, but how do the other sentinels agree? One sentinel's word is not enough. Each tries to connect to the suspect master, and when more than half of the sentinels consider it abnormal, the master is marked objectively offline.

Similarly, the cluster does not declare a node offline just because one node says so. Nodes spread this information through Gossip messages, and each node keeps collecting offline reports about the failed node in a local fault report list. When more than half of the cluster's master nodes have marked it subjectively offline, the state is changed to objectively offline.

Finally, a fail message is broadcast to the cluster, informing all nodes to mark the failed node as objective offline.

For example, node A pings node B, the communication fails, and A marks B as pfail. A then pings node C, carrying B's pfail state, and C saves B's fault in its offline report list. When the number of offline reports exceeds half the number of masters holding hash slots, an attempt is made to mark B objectively offline.
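The report-counting step described above can be sketched directly (a simplification of the Gossip mechanism; function and node names are mine):

```python
def is_objectively_offline(reports: set, masters_with_slots: int) -> bool:
    """A node is marked fail when MORE THAN HALF of the slot-holding
    masters have reported it as pfail (subjectively offline)."""
    return len(reports) > masters_with_slots // 2

# Example: 3 masters hold slots; A and C have both reported B as pfail.
reports_about_b = {"A", "C"}
print(is_objectively_offline(reports_about_b, 3))  # 2 > 1 -> True
```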

2. Fault recovery (slave promoted to master)

When the failed node is judged objectively offline, its slave nodes take on the responsibility of recovery.

Fault recovery means that when a slave discovers, via a scheduled task, that its master is objectively offline, it executes the fault recovery process.

"1. Qualification check. "

All slave nodes check the time of their last connection to the master. If the disconnection time exceeds cluster-node-timeout * cluster-slave-validity-factor, the slave is not eligible for failover.
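The eligibility check is a single comparison. A sketch, where the defaults mirror this article's 10s timeout and Redis's default validity factor of 10 (the function name is mine):

```python
def eligible_for_failover(disconnect_seconds: float,
                          cluster_node_timeout: float = 10.0,
                          cluster_slave_validity_factor: float = 10.0) -> bool:
    """A slave disconnected from its master for longer than
    cluster-node-timeout * cluster-slave-validity-factor cannot fail over,
    because its data is considered too stale."""
    limit = cluster_node_timeout * cluster_slave_validity_factor
    return disconnect_seconds <= limit

print(eligible_for_failover(30))   # 30s  <= 100s -> True
print(eligible_for_failover(500))  # 500s >  100s -> False
```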

"2. Prepare for the election. "

Let's first talk about why there is a time to prepare for the election.

Several slaves may pass the qualification check, so different election delays are used to express priority. Priority here is based on the replication offset: the larger a slave's offset (the less data it is missing from the failed master), the smaller its delay, and the better its chance of replacing the master.

The main purpose is to ensure that the node with the most consistent data gets to initiate the election first.
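The delayed-election idea maps to a concrete formula in the Redis Cluster specification: DELAY = 500 ms + a random 0-500 ms + RANK * 1000 ms, where rank 0 is the replica with the largest replication offset. A sketch (function names are mine):

```python
import random

def election_delay_ms(rank: int) -> int:
    """Per the Redis Cluster spec: a fixed 500ms, random jitter up to
    500ms, plus 1 second per rank. Rank 0 = most up-to-date replica."""
    return 500 + random.randint(0, 500) + rank * 1000

def rank_replicas(offsets: dict) -> dict:
    """Rank replicas by replication offset: largest offset -> rank 0."""
    ordered = sorted(offsets, key=offsets.get, reverse=True)
    return {name: i for i, name in enumerate(ordered)}

ranks = rank_replicas({"slaveA": 1000, "slaveB": 800})
# slaveA (bigger offset) gets rank 0 and thus the smaller delay window.
```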

"3. Vote in the election. "

The Redis cluster's voting mechanism does not have the slave nodes elect a leader among themselves; don't confuse it with Sentinel. In the cluster, voting is done by the master nodes that hold slots.

The failed node's slaves broadcast a FAILOVER_AUTH_REQUEST packet to all slot-holding masters to request votes.

Once a master replies with a FAILOVER_AUTH_ACK vote, it cannot vote for any other slave within NODE_TIMEOUT * 2.

A slave that obtains votes from more than half of the masters proceeds to the failover phase.

"4. Failover "

The successfully elected slave cancels replication and becomes a master.

It revokes the failed node's slots and assigns those slots to itself.

It broadcasts its own pong message to the cluster, announcing that the master has changed and that it has taken over the failed node's slots.

The SSH-client background you asked for!

The Redis Sentinel article took two whole nights to finish, but your attention wasn't on the article itself — ah! The editor's heart aches.

To meet your requests, Kaka will explain how to set up that eye-catching background. The tool Kaka uses is Xshell: open the tool's options, find the window-transparency setting, and adjust the transparency. That's right — what you see is just the desktop background showing through. Ready to set it up? Will you come back and finish the article afterwards? Kaka also welcomes the experts to add technical points and point out mistakes.

That is all of what a Redis cluster refers to. Thank you for reading! I believe everyone now has some understanding. I hope this content helps you; if you want to learn more, welcome to follow the industry information channel!
