What is the principle of consistent Hash 07/09 Update SLTechnology News&Howtos

What is the principle of consistent Hash

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article mainly explains "what is the principle of consistent Hash". Friends who are interested may wish to take a look at it. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "what is the principle of consistent Hash"?

I. Preface

When solving the problem of load balancing in distributed systems, the Hash algorithm can be used to make a fixed part of the requests fall on the same server, so that each server regularly processes part of the requests (and maintains the information of these requests), which plays the role of load balancing.

However, the ordinary remainder hash (hash (for example, user id)% server machines) algorithm has poor scalability, and the mapping relationship between user id and server will fail massively when new server machines are added or offline. Consistent hash is improved by using hash ring.

II. Overview of consistent Hash

In order to intuitively understand the principle of consistent hash, this paper illustrates it with a simple example, assuming that there are four servers with the address of ip1,ip2,ip3,ip4.

Consistent hash is to first calculate the hash values corresponding to four ip addresses.

Hash (ip1), hash (ip2), hash (ip3), hash (ip3). The calculated hash value is a direct value of the 0 ~ largest positive integer. These four values are shown in the following figure on the consistent hash ring:

Image.png

The hash ring starts clockwise from the integer 0 to the largest positive integer, and the hash value calculated based on the four ip must fall to some point on the hash ring, so we map the four ip of the server to the consistent hash ring.

When the user makes a request on the client, first calculate the routing rule (hash value) according to hash (user id), then see where the hash value falls on the hash ring, and find the nearest ip clockwise as the routing ip according to the position of the hash value on the hash ring.

Image.png

As shown in the figure above, the user1,user2 request will be processed by the server ip2, the User3 request will be processed by the server ip3, the user4 request will be processed by the server ip4, and the user5,user6 request will be processed by the server ip1.

Let's consider what happens when the ip2 server goes down.

When the server of ip2 goes down, the consistent hash ring looks like the following figure:

Image.png

According to the clockwise rule, the request of user1,user2 will be processed by the server ip3, while the processing server corresponding to the request of other users remains unchanged, that is, only the mapping relationship of some users previously processed by ip2 has been destroyed, and the request it is responsible for processing is delegated by the next node clockwise.

Let's consider what happens when new machines are added.

When an ip5 server is added, the consistent hash ring is roughly as shown below:

Image.png

According to the clockwise rule, the previous user5 request should be handled by the ip1 server, but now it is handled by the new ip5 server, and the request processing server of other users remains the same, that is, some of the requests from the clockwise nearest server of the new server will be replaced by the new server.

Third, the characteristics of consistent hash

Monotonicity (Monotonicity), monotonicity means that if some requests have been hashed to the corresponding server for processing, and when a new server is added to the system, it should be guaranteed that the original request can be mapped to the original or new server, but not to the original server. By adding the server ip5 above, it can be proved that after the addition of ip5, the user6 that was originally processed by ip1 is still processed by ip1, and the user5 that was processed by ip1 is now processed by the new ip5.

Spread: in a distributed environment, the client may not be aware of the existence of all servers when requesting, and may only know some of them. In the client's view, some of the servers he sees will form a complete hash ring. If multiple clients treat part of the server as a complete hash ring, it may result in requests from the same user being routed to different servers for processing. This situation should obviously be avoided because it does not guarantee that requests from the same user will fall on the same server. The so-called dispersion refers to the severity of the above situation. A good hash algorithm should avoid minimizing dispersion as much as possible. Consistent hash has very low dispersion.

Balance: balance means load balancing, which means that requests from the client after hash should be able to be distributed to different servers. Consistency hash allows each server to process requests, but there is no guarantee that each server handles roughly the same number of requests, as shown in the following figure

Server ip1,ip2,ip3 falls on the consistent hash ring after hash. From the distribution of hash values in the figure, we can see that ip1 is responsible for processing about 80% of requests, while ip2 and ip3 are only responsible for processing about 20% of requests. Although all three machines are processing requests, it is obvious that the load of each machine is uneven, which is called the tilt of consistent hash. The emergence of virtual nodes is to solve this problem.

5. Virtual node

The problem of consistent hash tilting mentioned in the previous section will occur when there are few server nodes. One solution is to add more machines, but there is a cost to add machines, so add virtual nodes, such as the above three machines. The figure of the consistent hash ring after each machine introduces one virtual node is as follows:

Image.png

Ip1-1 is the virtual node of ip1, ip2-1 is the virtual node of ip2, and ip3-1 is the virtual node of ip3.

It is known that when the number of physical machines is M and the virtual node is N, the number of nodes on the actual hash ring is M N. For example, when the hash value calculated by the client is between ip2 and ip3 or between ip2-1 and ip3-1, the ip3 server is used for processing.

6. Uniform hash

In the previous section, the graph after using the virtual node looks more balanced, but if the algorithm for generating the virtual node is not good enough, you will probably get the following ring:

Image.png

It can be seen that after the introduction of one virtual node to each service node, the situation is better than that before the introduction of the virtual node, but it is not balanced.

The consistent hash of equilibrium should be as follows:

Image.png

The goal of uniform hash is that if there are N servers and M hash values for clients, then each server should handle about N users. That is, the load of each server should be balanced as much as possible.

VII. Summary

Consistent hash plays an important role in distributed systems. Both distributed cache and load balancing strategy of distributed Rpc framework are used.

The server handles the pseudo code of the client write request:

Public void handleWrite (WriteMessage wm) {

/ / get key from write message

String key=wm.getKey ()

/ / calculate the node number that actually processes the written message

Int handlerID=key.hashCode () ClusterNodeNum

/ / determine whether the written message is processed by this server

If (MyClusterID==handleID) {

/ / the write message is processed by this server and written to disk

WriteMessage (wm)

} else {

/ / send the write message to the corresponding server for processing

SendMessage (handleID,wm)

}

/ / notify the client that the write was successful

SendtoClient ("write success")

}

The server handles the pseudo code of the client read request:

Public void handleRead (ReadMessage rm) {

/ / get key from read message

String key=rm.getKey ()

/ / calculate the node number that actually processes the read message

Int handlerID=key.hashCode () ClusterNodeNum

/ / used to store read data

String readData= ""

/ / determine whether the read message is processed by this server

If (MyClusterID==handleID) {

/ / the read message is processed by this server and written to disk

ReadData=readMessage (rm)

} else {

/ / send the read message to the corresponding server for processing

ReadData=readMessage (handleID,rm)

}

/ / notify the client that the read was successful

SendtoClient (readData)

}

The client transfers the pseudo code on the server side:

Public static void main (String [] args) {

Random rnd=new Random ()

/ / list of all servers

String [] servers=new String [] ("192.168.10.0", "192.168.10.1", "192.168.10.2", "192.168.10.3", "192.168.10.4")

/ / Random algorithm selects a server for load balancing

Client client = new Client (servers.nextInt (servers.length)])

/ / write data to the server

Client.write (new WriteMessage ("key1", "value1"))

/ / read data from the server

String value=client.read (new ReadMessage ("key1"))

}

At this point, I believe you have a deeper understanding of "what is the principle of consistent Hash". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.