This article introduces how to implement a consistent hash algorithm in Java. The situations it covers come up often in real projects, so I hope you read it carefully and take something away from it!
Access Model of a Distributed Cache Cluster
Since Redis is commonly used for distributed caching these days, let's take Redis as the example:
Suppose our system's business is growing quickly and there is a lot of data to cache, so we have built a highly available setup of three Redis clusters, each using master-slave replication. How do we route requests to the different Redis clusters? These are the routing algorithms we commonly need to consider:
Random algorithm: each request is sent to a randomly chosen Redis cluster. The advantage is that requests are evenly distributed across the clusters. The disadvantage is just as obvious: because requests are distributed at random, the same data must exist in every cluster to keep the cache hit rate up, which leads to data redundancy and wasted storage.
Hash algorithm: to address the problem of the random algorithm, we can use a hash algorithm instead. For example, with three Redis clusters, we take the hash value of each cache key modulo the cluster count: index = hash(key) % 3, where index identifies one of the three clusters. This guarantees that the same key is always routed to the same Redis cluster, no data redundancy is needed, and the shortcoming of the random algorithm is solved.
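For illustration, a minimal sketch of this modulo routing (the cluster names are made up for the example):

import java.util.List;

public class ModuloRouter {
    // Three Redis clusters, matching the example above
    private final List<String> clusters = List.of("redis-1", "redis-2", "redis-3");

    // index = hash(key) % 3: the same key always routes to the same cluster
    public String route(String key) {
        int index = Math.floorMod(key.hashCode(), clusters.size());
        return clusters.get(index);
    }
}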
However, the hash algorithm has drawbacks of its own: poor fault tolerance and poor scalability. For example, if one of our three Redis clusters goes down, the number of available clusters becomes 2 and the formula becomes index = hash(key) % 2, so the cached location of almost every key changes and the cache hit rate plummets.
Likewise, when we add a new Redis cluster, the formula becomes index = hash(key) % 4 and a large number of keys are relocated to other servers, again causing the hit rate to drop.
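A quick toy experiment (not from the original article) makes the scale of the problem concrete: count how many keys keep the same index when the cluster count grows from 3 to 4.

public class RemapDemo {
    public static void main(String[] args) {
        int total = 100_000, unchanged = 0;
        for (int i = 0; i < total; i++) {
            int hash = ("key" + i).hashCode() & Integer.MAX_VALUE;  // non-negative hash
            if (hash % 3 == hash % 4) {
                unchanged++;  // this key stays on the same index after scaling out
            }
        }
        // For uniform hashes only ~25% of keys keep their index,
        // i.e. roughly three quarters of the cache is invalidated at once
        System.out.println("unchanged: " + unchanged + " / " + total);
    }
}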
The consistent hash algorithm was created to solve exactly these fault-tolerance and scalability problems of the plain hash algorithm.
How the Consistent Hash Algorithm Works
First construct an integer ring covering the range 0 to 2^32 - 1 (called the consistent hash ring). Then give each Redis cluster a name and, based on the hash value of that name, compute where each cluster sits on the ring.
The hash value of each cached key is computed the same way, so the keys are also distributed around the consistent hash ring. Suppose there are five pieces of data to cache, with keys key1, key2, key3, key4, and key5; after hashing, they land at different positions on the ring.
Then walk clockwise around the hash ring from each key's position and store the data on the nearest Redis cluster.
In the end, key4 and key5 are stored on cluster 2, key1 and key3 on cluster 1, and key2 on cluster 3.
Fault tolerance
Continuing with the example above, let's look at the fault tolerance of the consistent hash algorithm. If cluster 1 goes down, only the data for key1 and key3 is affected; the storage locations of all other data are untouched. The next time key1 and key3 are cached, the clockwise search stores them on cluster 3.
Scalability
If we now add a fourth Redis cluster, cluster 4, its name hashes to a position between cluster 1 and cluster 2 on the ring.
After cluster 4 joins, only the data for key1 is affected; all other data stays where it is.
Data skew problem
Fault tolerance and scalability check out, so the consistent hash algorithm does solve the problems of the plain hash algorithm. But is it perfect as it stands? Look again at the fault-tolerance example: when cluster 1 goes down, all the data that originally landed on cluster 1 now lands directly on cluster 3. If every Redis cluster has the same hardware configuration, the load on cluster 3 rises, and the uneven distribution of data causes data skew.
How do we fix that?
The well-known fix is to add a virtual layer. Suppose each cluster is assigned two virtual nodes:
Cluster      Virtual nodes
cluster 1    vnode1, vnode2
cluster 2    vnode3, vnode4
cluster 3    vnode5, vnode6
The next step is to place the virtual nodes on the consistent hash ring. When caching data, search clockwise for a virtual node, then store the data on the Redis cluster that the virtual node maps to. This spreads the data evenly across the clusters.
Now, if one Redis cluster fails, the keys on that cluster are redistributed fairly evenly across the remaining clusters.
From these results: the more virtual nodes each cluster has, the more evenly the data is distributed across the physical clusters, and the smaller the impact of adding or removing a physical cluster. However, too many virtual nodes hurt lookup performance, while too few leave the data unevenly distributed. So how many is appropriate? A common rule of thumb, based on practitioners' experience, is about 150 virtual nodes per physical node.
Implementing the Consistent Hash Algorithm in Java
The idea of the implementation: every time a physical node is added, generate virtual node names from the physical node's name, compute the hash value of each virtual node name, and store the hash as the key and the physical node as the value in a map. A TreeMap is the right choice here because the keys need to be kept in sorted order. To work out which physical node a data key belongs to, first compute the hash of the key, then call TreeMap.tailMap() to get the submap whose keys are greater than or equal to that hash. If the submap is empty, wrap around and return the first entry of the TreeMap; otherwise take the first entry of the submap.
Enough talk. Show me the code.
Core code:
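A minimal sketch of the class just described; the virtual node naming scheme and the FNV1 32-bit hash are illustrative choices (String.hashCode() alone spreads virtual nodes poorly), not the author's original listing:

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash {

    // Virtual nodes per physical node; ~150 is the rule of thumb mentioned above
    private final int virtualCount;

    // The hash ring: hash of a virtual node name -> physical node name
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public ConsistentHash(int virtualCount) {
        this.virtualCount = virtualCount;
    }

    // Add a physical node: derive virtual node names and place them on the ring
    public void addNode(String node) {
        for (int i = 0; i < virtualCount; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove a physical node together with all of its virtual nodes
    public void removeNode(String node) {
        for (int i = 0; i < virtualCount; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Find the node responsible for a key: the first virtual node clockwise
    public String getNode(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        // tailMap returns the entries whose position is >= the key's hash
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        // An empty tail means we passed the top of the ring: wrap to the first entry
        Integer nodeHash = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(nodeHash);
    }

    // FNV1 32-bit hash, a common choice for consistent hashing demos
    private int hash(String s) {
        final int p = 16777619;
        int h = (int) 2166136261L;
        for (int i = 0; i < s.length(); i++) {
            h = (h ^ s.charAt(i)) * p;
        }
        h += h << 13;
        h ^= h >>> 7;
        h += h << 3;
        h ^= h >>> 17;
        h += h << 5;
        return h & Integer.MAX_VALUE;  // clear the sign bit so ring positions are non-negative
    }
}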
Test the code:
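A plausible sketch of such a test, checking how evenly 100,000 keys spread across four nodes (the node and key names are assumptions for the demo):

import java.util.HashMap;
import java.util.Map;

public class DistributionTest {
    public static void main(String[] args) {
        ConsistentHash ch = new ConsistentHash(150);
        for (int i = 1; i <= 4; i++) {
            ch.addNode("node" + i);
        }

        // Route 100,000 keys and count how many land on each node
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            counts.merge(ch.getNode("key" + i), 1, Integer::sum);
        }
        counts.forEach((node, count) -> System.out.println(node + ": " + count));
    }
}

With 150 virtual nodes per physical node, the four counts should come out roughly equal.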
Test the removal of node node3, and compare the impact on the hit rate by adding the following code:
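A sketch of such a test under the same assumptions (nodes node1 through node4, 150 virtual nodes each): record where every key lives, remove node3, then count how many keys still route to the same node.

import java.util.HashMap;
import java.util.Map;

public class RemoveNodeTest {
    public static void main(String[] args) {
        ConsistentHash ch = new ConsistentHash(150);
        for (int i = 1; i <= 4; i++) {
            ch.addNode("node" + i);
        }

        // Remember each key's node before the change
        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            before.put("key" + i, ch.getNode("key" + i));
        }

        ch.removeNode("node3");

        // A key still "hits" if it routes to the same node as before
        long hits = before.entrySet().stream()
                .filter(e -> e.getValue().equals(ch.getNode(e.getKey())))
                .count();
        System.out.println("hit rate after removing node3: " + hits / 100_000.0);
    }
}

Only the roughly one quarter of keys that lived on node3 have to move, so the hit rate should stay near 75% rather than collapsing the way the modulo scheme does.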
Execution result:
Test the addition of node node5, and compare the impact on the hit rate by adding the following code:
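The structure mirrors the removal test (same assumed names): record the routing for node1 through node4, add node5, and recompute the hit rate.

import java.util.HashMap;
import java.util.Map;

public class AddNodeTest {
    public static void main(String[] args) {
        ConsistentHash ch = new ConsistentHash(150);
        for (int i = 1; i <= 4; i++) {
            ch.addNode("node" + i);
        }

        // Remember each key's node before the change
        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            before.put("key" + i, ch.getNode("key" + i));
        }

        ch.addNode("node5");

        long hits = before.entrySet().stream()
                .filter(e -> e.getValue().equals(ch.getNode(e.getKey())))
                .count();
        System.out.println("hit rate after adding node5: " + hits / 100_000.0);
    }
}

Only the keys that the new node takes over (about one fifth of them) move, so the hit rate should stay near 80%.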
Execution result:
Other usage scenarios
Another scenario: when Nginx distributes requests across application servers, we may want requests with the same URL or URL parameters forwarded to the same application server, so that each server's local cache hit rate stays as high as possible. The consistent hash algorithm is a good choice here too. For the specific configuration, refer to the official documentation: https://www.nginx.com/resources/wiki/modules/consistent_hash/
That concludes "how to implement a consistent hash algorithm with Java". Thanks for reading, and I hope you got something out of it!