This article introduces how Redis hash sharding works. These are situations many people run into in practice, so let's walk through the concepts and a hands-on cluster setup. I hope you read it carefully and get something out of it!
Cluster sharding mode
If Redis relies only on master-slave replication, then once the data set grows huge, a single machine may not be able to hold it all, especially since both master and slave must each keep a complete copy of the data. In that situation, data sharding is a very good solution.
Redis Cluster is designed to solve this problem. It provides two main functions:
Automatically sharding the data across the nodes
Continuing to process commands even when some nodes in the cluster fail or become unreachable
As for the second point, it works much like Sentinel's failover (see the earlier Sentinel article), so it is not covered in detail here. Below we take a detailed look at Redis's slot-based sharding. Before that, understanding the simple hash algorithm and the consistent hash algorithm used in distributed systems will help clarify the role of slots.
Simple hash algorithm
Suppose there are three machines, and the algorithm deciding which machine a piece of data falls on is
C = Hash(key) % 3
For example, if the hash of key A is 4, then 4 % 3 = 1 and it falls on the second machine; if the hash of key ABC is 11, then 11 % 3 = 2 and it falls on the third machine.
With such an algorithm, suppose the amount of data grows so large that an extra machine is needed. Key A originally fell on the second machine, but now, since 4 % 4 = 0, the algorithm sends it to the first machine, which does not hold A at all. An algorithm like this causes massive cache penetration and avalanches whenever machines are added or removed.
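A minimal Python sketch of this remapping effect (the keys and node counts are arbitrary illustrations, and Python's built-in hash stands in for a real cache hash):

# Sketch: simple modulo sharding, and how many keys move when a node is added.
def node_for(key: str, num_nodes: int) -> int:
    return hash(key) % num_nodes  # C = Hash(key) % N

keys = [f"user:{i}" for i in range(10000)]
before = {k: node_for(k, 3) for k in keys}  # 3 machines
after = {k: node_for(k, 4) for k in keys}   # a 4th machine is added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # expect roughly 75% to move

Since a key stays put only when hash % 3 equals hash % 4, about three quarters of all keys land on a different machine after just one node is added.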
Consistent hash algorithm
In 1997, Karger et al. at the Massachusetts Institute of Technology proposed the consistent hash algorithm to solve this problem for distributed caches.
In the consistent hash algorithm, the whole hash space is a virtual ring.
Suppose there are four nodes, Node A, B, C, and D; after hashing each node's IP address, each node lands at a position on the ring.
There are also four objects to store, Object A, B, C, and D; after hashing each object's key, each object likewise lands at a position on the ring.
Each object's real storage location is the first node found moving clockwise from the object's position. For example, the first node Object A reaches clockwise is Node A, so Node A stores Object A; Object B is stored on Node B, and so on.
That is the gist of the consistent hash algorithm. So how fault-tolerant and scalable is it?
Suppose Node C goes down. The next node Object C finds clockwise is now Node D, so its data is served from there. In other words, when Node C dies, only the data in the range from Node B to Node C is affected, and that data is transferred to Node D.
By the same token, suppose the data volume grows and you need to add a node, Node X, whose position falls between Node B and Node C. Then only the data between Node B and Node X is affected; that data is remapped onto Node X.
The consistent hash algorithm therefore supports fault tolerance and scalability very well. However, it also has a serious problem: data skew.
If a sharded cluster has too few nodes and they are unevenly distributed on the ring, consistent hashing ends up with some nodes holding too much data and others too little. In other words, there is no way to control how much data each node stores; most of the data may land on A while B holds very little.
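A minimal sketch of a consistent-hash ring in Python, assuming MD5 for placement (the node and object names are the ones from the example above); raising replicas places each node at several virtual points on the ring, which is the usual remedy for the data skew just described:

import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, replicas=1):
        # replicas > 1 scatters each node over several virtual points,
        # smoothing out data skew on small clusters.
        self.replicas = replicas
        self._points = []  # sorted (hash, node) pairs on the ring
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [(h, n) for h, n in self._points if n != node]

    def get(self, key):
        h = ring_hash(key)
        idx = bisect.bisect(self._points, (h, ""))  # first point clockwise
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]

objects = ["Object A", "Object B", "Object C", "Object D"]
ring = ConsistentHashRing(["Node A", "Node B", "Node C", "Node D"])
before = {k: ring.get(k) for k in objects}
ring.remove("Node C")  # only keys that lived on Node C move, to the next node clockwise
after = {k: ring.get(k) for k in objects}
print([k for k in objects if before[k] != after[k]])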
Hash slot
Redis Cluster does not use the consistent hashing described above, but instead adopts the concept of slots. The main reason, as mentioned, is that consistent hashing offers poor control over data distribution and node placement.
First of all, "hash slot" actually covers two concepts. The first is the hash algorithm: Redis Cluster does not use a simple hash(), but CRC16, a checksum algorithm.
The other is the slot, the rule for dividing up the hash space. In essence, hash slots and consistent hashing are very similar; the difference lies in how the hash space is defined. Consistent hashing's space is a ring, and node placement is based on positions on that ring, so data distribution cannot be controlled well. Redis Cluster's slot space, by contrast, is allocated by the user, much like Windows disk partitions, whose size and location can be customized.
Redis Cluster contains 16384 hash slots. After the calculation, each key falls into a specific slot, and which node a slot belongs to is up to the user to define and allocate. For example, a machine with a small disk can be assigned fewer slots and one with a large disk more; if all the nodes' disks are the same, the slots can be distributed evenly. The slot concept therefore neatly fixes the shortcoming of consistent hashing.
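Concretely, the mapping is slot = CRC16(key) mod 16384. A minimal Python sketch (this follows the CRC16/XMODEM variant described in the Redis cluster spec, and ignores the hash-tag rule where only the {...} part of a key is hashed):

# Sketch: computing a key's hash slot the way Redis Cluster does.
# CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    return crc16(key) % 16384

print(key_slot(b"user:1000"))  # an integer in 0..16383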
In terms of fault tolerance and scalability, the behavior is the same as consistent hashing: the affected data is transferred. With hash slots, it is essentially the slots themselves that move: the slots a failed node was responsible for are transferred to other healthy nodes, and likewise, when scaling out, slots are transferred from existing nodes to the new node.
Note, however, that Redis Cluster does not transfer and reassign slots automatically; that requires manual operation. The high availability of a Redis cluster therefore rests on master-slave replication of the nodes and the automatic failover between master and slave.
Cluster building
Using the simplest possible example, and setting aside high availability and master-slave failover, let's focus on how a Redis cluster is built and how slots are allocated, to deepen the understanding of the principles and concepts behind Redis Cluster.
redis.conf configuration
First find redis.conf and enable the cluster function.
cluster-enabled yes is off by default; to enable clustering and make this Redis instance part of a cluster, you need to turn it on manually.
Then configure the cluster configuration file.
Each cluster node has a cluster configuration file, mainly used to record node information; it is generated and managed automatically by the program and needs no human intervention. The only thing to note is that if you run multiple nodes on the same machine, each needs a different file name for this setting.
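For example, the cluster-related lines of each instance's redis.conf might look like this (the port and file name vary per instance; the names here are just one possible layout):

port 6379
cluster-enabled yes
cluster-config-file nodes-6379.conf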
To keep the setup simple, all the Redis instances run on the same machine. After giving each copy a different cluster config file name, duplicate redis.conf three times and start three cluster instances (a cluster needs at least three master nodes).
Cluster association
> redis-server /usr/local/etc/redis/redis-6379.conf --port 6379 &
> redis-server /usr/local/etc/redis/redis-6380.conf --port 6380 &
> redis-server /usr/local/etc/redis/redis-6381.conf --port 6381 &
The & makes each command run in the background, though the program's log output is still printed to the console. You can also set daemonize yes in redis.conf to run Redis in the background.
Connect to the 6379 instance and inspect the cluster membership with the cluster nodes command.
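For example:

> redis-cli -p 6379 cluster nodes

At this point the output lists only the node itself (the command prints one line per known node).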
The other instances behave the same. At this point 6379, 6380, and 6381 each sit in their own cluster, each containing only itself.
On 6379, use the cluster meet command to establish links with 6380 and 6381.
127.0.0.1:6379> cluster meet 127.0.0.1 6380
127.0.0.1:6379> cluster meet 127.0.0.1 6381
You can see that the cluster now contains the 6379, 6380, and 6381 nodes, and logging in to the other nodes shows the same result. Even though 6380 and 6381 were never manually associated with each other, once a node in the cluster discovers an unassociated node, it automatically shakes hands with it.
Slot allocation
Check the cluster's status with the cluster info command.
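Before any slots are assigned, the relevant lines of the output look roughly like this (the values are illustrative for this three-node setup):

cluster_state:fail
cluster_slots_assigned:0
cluster_known_nodes:3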
The state field shows fail, so the cluster is not yet up. The official documentation explains why.
A node can accept requests only when state is ok. If even one slot is unassigned, the state is fail; all 16384 slots must be allocated before the cluster can work properly.
Next, assign slots 0-5000 to 6379, slots 5001-10000 to 6380, and slots 10001-16383 to 6381.
> redis-cli -c -p 6379 cluster addslots {0..5000}
> redis-cli -c -p 6380 cluster addslots {5001..10000}
> redis-cli -c -p 6381 cluster addslots {10001..16383}
Look at cluster info again.
state is now ok and all 16384 slots are allocated; the cluster is working properly.
Effect test
Log in to any instance, remembering to add the -c flag to enable the client's cluster mode; otherwise it will not work properly.
> redis-cli -c -p 6380
Try set and get operations.
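An illustrative session (the key, value, and redirect target are made up for this walkthrough, and the slot number, elided here, depends on the key):

> redis-cli -c -p 6380
127.0.0.1:6380> set somekey somevalue
-> Redirected to slot [...] located at 127.0.0.1:6381
OK
127.0.0.1:6381> get somekey
"somevalue"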
As you can see, the Redis cluster computes which slot the key falls into, then forwards the command to the node responsible for that slot for execution.
Use the cluster keyslot command to compute which slot a key falls into, which in turn tells you which node it will be routed to.
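For example (the key name is arbitrary; the reply is the slot number, elided here):

127.0.0.1:6380> cluster keyslot somekey
(integer) ...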
This concludes "what is the principle of Redis hash sharding". Thank you for reading!