What is hash slot?

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Today I would like to talk about what a hash slot is; many people may not know much about it. To help you understand it better, the editor has summarized the following content, and I hope you get something out of this article.

In a distributed cluster, how do we ensure that the same request always falls on the same machine, that the machines in the cluster share requests as evenly as possible, and that scaling up or down has as little impact as possible on the existing cluster?

Modulo algorithm: data is mapped directly to real nodes by taking the hash mod the node count, which ties the data tightly to the number of nodes and leaves no flexibility for later expansion.

Consistent hashing algorithm: adds an extra virtual mapping layer; data is mapped to virtual nodes, and virtual nodes are remapped to real nodes.

In practice, consistent hashing or hash slots are generally used. Consistent hashing implementations such as the ketama algorithm need to recompute node positions when the cluster is scaled up or down, which can disturb part of the existing allocation. Hash slots can be introduced instead: a range of hash slots corresponds to one machine, and when scaling up or down only certain slot ranges are reassigned, so the change is relatively small and has less impact on the previous allocation.

A virtual bucket is a compromise between modulo mapping and consistent hashing.

A fixed number of buckets avoids the inflexibility of plain modulo mapping.

A configurable bucket-to-node mapping avoids some of the drawbacks of consistent hashing.

Let's take a look at the basic model of hash slot:

A virtual bucket layer is introduced between records and physical machines. First, a record is mapped to a virtual bucket through a hash function, so the relationship between records and virtual buckets is many-to-one. The second layer is the mapping between virtual buckets and physical machines, which is also many-to-one: one physical machine holds multiple virtual buckets, and this layer is maintained through an in-memory table. In terms of the abstract model, key-to-partition is implemented by the hash function, and partition-to-machine by the in-memory table. Note: Couchbase uses this technique (it calls the buckets vBuckets).

Key to virtual bucket layer

The virtual bucket layer uses a preset fixed number, for example 1024. This means the distributed cluster can later grow to at most 1024 nodes. The advantage is that the divisor of the mod operation is constant (very important), which guarantees that the first-layer mapping is unaffected by changes to the actual nodes. The maximum number can be pre-defined according to the needs of the deployment.

Virtual bucket to actual node

For example, configure node mapping at the beginning of the project:

Redis Server1 is responsible for buckets numbered 0 to 499.

Redis Server2 is responsible for buckets numbered 500 to 1023.

After the amount of cached data grows, new nodes need to be added, and the virtual buckets assigned to nodes must be redistributed before the addition. For example, suppose server3 is added and configured to own buckets 400 to 600; this has no effect at all on the key-to-virtual-bucket layer. However, the data for buckets 400 to 600 still physically lives on the other two nodes, so requests routed to the new node would miss. Therefore, before adding the new node, the data for buckets 400 to 600 must be copied from the other two servers to the new node in the background, and only after that completes is the new mapping added to the configuration. Once new requests hit the new node, the bucket 400-600 data on the other two servers is no longer used and needs to be deleted. This approach can largely ensure that dynamic expansion has no impact on the cache system; the implementation details are worth further study.

This idea is also used in the design of redis cluster.

Instead of using traditional consistent hashing to distribute data, Redis Cluster distributes data in another way, called hash slots. Redis Cluster allocates 16384 slots by default. When we set a key, the CRC16 algorithm is used to determine which slot it belongs to, and the key is then assigned to the node responsible for that slot. The specific formula is: CRC16(key) % 16384.

So let's assume that three nodes have formed a cluster: A, B, and C. They can be three ports on one machine or three different servers. If the 16384 slots are distributed across them, the slot ranges carried by the three nodes are:

Node A covers 0-5460

Node B covers 5461-10922

Node C covers 10923-16383

This practice of distributing hash slots to different nodes makes it easy for users to add or remove nodes from the cluster. For example:

If the user adds a new node D to the cluster, the cluster only needs to move some slots in nodes A, B, C to node D.

For example, if I want to add a new node D, Redis Cluster's approach is to take part of the slots from the front of each existing node and move them to D. It would look something like this:

Node A covers 1365-5460

Node B covers 6827-10922

Node C covers 12288-16383

Node D covers 0-1364, 5461-6826, 10923-12287.

Similarly, if the user wants to remove node A from the cluster, the cluster only needs to move all the hash slots in node A to nodes B and C, and then remove node A, which is now empty (holds no hash slots).

Because moving hash slots from one node to another does not block either node, neither adding a new node, nor removing an existing node, nor changing the number of hash slots a node holds will cause the cluster to go offline.

In addition, there is another question: why is the number of hash slots fixed at 16384? (https://github.com/antirez/redis/issues/2576)

The CRC16 algorithm can produce 2^16 = 65536 distinct values (0 to 65535), so why is the number of hash slots set to only 16384?

Normal heartbeat packets carry the full configuration of a node, that can be replaced in an idempotent way with the old in order to update an old config. This means they contain the slots configuration for a node, in raw form, that uses 2k of space with 16k slots, but would use a prohibitive 8k of space using 65k slots.

At the same time it is unlikely that Redis Cluster would scale to more than 1000 master nodes because of other design tradeoffs.

So 16k was in the right range to ensure enough slots per master with a max of 1000 masters, but a small enough number to propagate the slot configuration as a raw bitmap easily. Note that in small clusters the bitmap would be hard to compress because when N is small the bitmap would have slots/N bits set that is a large percentage of bits set.

To sum up:

1. The heartbeat message of a Redis node needs to carry that node's full configuration. With 16K slots the slot bitmap consumes 2K of memory, but with 65K slots it would reach 8K, making heartbeat messages very large.

2. It is almost impossible to have more than 1000 master nodes in a Redis cluster.

3. In the configuration of a Redis master node, the hash slots it is responsible for are stored as a bitmap. During transmission the bitmap is compressed, but if the fill rate of the bitmap (slots / N, where N is the number of nodes) is high, the compression ratio is low. So if the number of nodes is small and the number of hash slots is large, the bitmap compresses poorly. 16K slots with at most 1000 master nodes is just right: it ensures each node has enough hash slots while keeping the bitmap efficient to use.

4. 16384 was chosen because CRC16 outputs a 16-bit result, which can be regarded as a number distributed between 0 and 2^16 - 1. The author of Redis found in testing that taking the value mod 2^14 distributes keys evenly across 0 to 2^14 - 1, so he chose that value.

Finally, here is the source code in Redis that computes the hash slot, to see the effect.

#include "crc16.h"

/* From the Redis source (cluster.c): compute the hash slot of a key.
 * If the key contains a hash tag of the form {...}, only the part
 * between the braces is hashed, so related keys can share a slot. */
unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key, keylen) & 0x3FFF;

    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s + 1) return crc16(key, keylen) & 0x3FFF;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key + s + 1, e - s - 1) & 0x3FFF;
}
