Rule processing of distributed cache load balancing: improvement of consistent hash in virtual nodes using fixed hash algorithm to balance load 07/08 Update SLTechnology News&Howtos

Rule processing of distributed cache load balancing: improvement of consistent hash in virtual nodes using fixed hash algorithm to balance load

2025-07-08 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

In the large-scale cache application, the distributed cache system arises at the historic moment. How is key-value evenly distributed into the cluster? The most conventional way is to take the mold by hash. For example, if the appropriate number of machines available in the cluster is N, then the data request with a key value of K should simply be routed to the machine corresponding to hash (K) mod N. However, in some rapidly developing web systems, this solution still has some defects. With the increase of system access pressure, the cache system has to increase the corresponding speed and data carrying capacity of the cluster by adding machine nodes. Adding machines means that according to the way of hash modeling, at this moment when adding machine nodes, a large number of cache misses, the cache data needs to be re-established, or even the overall cache data migration, which will instantly bring extremely high system load to DB, and the setting will lead to DB server downtime.

If it is not cached data, but persistent data, then when the capacity is expanded, most of the data will have to be migrated (the cardinality N of the module has changed), which is also unbearable.

Consistent hash load balancing

A consistent hash is introduced to solve the problem of instantaneous increase of load caused by the above increase or decrease of machines.

By being responsible for each area within an integer range, the load of the node responsible area will not migrate on a large scale with the increase or decrease of nodes.

But the simplest consistent hash, when increasing or decreasing physical machines, seems to have to double the number of nodes or subtract half of the nodes to ensure the load balance of each node.

Improvement of consistent Hash by Virtual Node

For the problem of uneven load distribution of consistent hash, it is proposed that the virtual node improves the consistent hash.

Four physical nodes can become many virtual nodes, each of which supports a segment on a continuous hash ring. At this time, if a physical node is added, many virtual nodes will be added accordingly, and these new virtual nodes are relatively evenly inserted into the whole hash ring, so that the pressure of the existing physical nodes can be well shared. If a physical node is reduced, many of the corresponding virtual nodes will fail, so there will be many remaining virtual nodes to undertake the work of the previous virtual nodes, but for the physical nodes, the increased load is relatively balanced.

Therefore, the problem of load imbalance when adding or decreasing nodes can be solved by means of a physical node corresponding to a large number of virtual nodes, and the virtual nodes of the same physical node are evenly distributed as far as possible.

As for how many virtual nodes a physical node corresponds to in order to achieve a better equilibrium effect, there is a graph

The x-axis represents the multiple of virtual nodes (scale) that need to be expanded for each physical server, and the y-axis is the actual number of physical servers. You can see that when the number of physical servers is very small, larger virtual nodes are needed, otherwise, fewer nodes are needed. As can be seen from the figure, when there are 10 physical servers, almost 100,200 virtual nodes need to be added for each server to achieve real load balancing.

Custom calculation method for mapping tables and rules

Mapping represents the method of determining the data source according to the look-up table method of the value of the sub-library and sub-table field, which is generally used for the special processing of hot data, or to supplement the rules that do not completely conform to the law in some scenarios.

The final sub-library can be calculated through a custom function implementation. For example, suppose the id module is divided into four libraries, but for some hot id, we want to separate them to another library, so this can be done with an expression similar to the following:

If (id in hotset) {

Return nodes

}

Return hash (id)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.