2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
Today I will talk about how to analyze the consistent HASH algorithm. Many people may not know much about it, so I have summarized the following contents; I hope you gain something from this article.
Research on the Consistent HASH Algorithm

1. Introduction
While studying Ceph's CRUSH algorithm, I came across an article describing it as a special kind of consistent HASH algorithm, so I studied the consistent HASH algorithm first as preparation. The concepts turned out to be very close; the main difference is how virtual nodes are mapped to physical nodes. Since CRUSH is the core algorithm of Ceph, this groundwork is important.
2. Background of consistent HASH and its advantages
In distributed systems, HASH algorithms are often used for data distribution, in order to spread data evenly across the nodes and share the load, especially in cache systems. A typical design is as follows:
2.1 Scenario description
To improve system performance and relieve the database's performance bottleneck, N cache servers are placed between the application and the database, each responsible for 1/N of the database's data. When the application accesses data, it computes a hash value from the key and, following some rule, finds the cache node holding the data.
2.2 The simplest model
The simplest way is to use hash(key) mod N to compute which server a key falls on; the "find cache server" step is then just a modulo operation. The problems are:
When the cluster adds machines, the formula becomes hash(key) mod (N + newAddedCount);
When machines leave the cluster, the formula becomes hash(key) mod (N - removedCount).
As the modulus changes, the computed value changes too, so lookups land on the wrong node and the cache is effectively invalidated. We need a scheme in which scaling up, or removing failed machines from the cluster, invalidates only a small amount of data, so that caches can be rebuilt quickly on the healthy or new machines.
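The scale of the problem with plain modulo placement can be seen with a small sketch (illustrative code, not from the article; the key format is arbitrary): count how many keys land on a different server after the cluster grows from N to N + 1 machines.

```cpp
#include <cassert>
#include <functional>
#include <string>

// With hash(key) mod N placement, count how many of `total` keys map to
// a different server after the cluster grows from n to n + 1 nodes.
// Every moved key is a cache miss after the resize.
int remappedKeys(int total, int n) {
    std::hash<std::string> h;
    int moved = 0;
    for (int i = 0; i < total; ++i) {
        std::size_t v = h("key-" + std::to_string(i));
        if (v % n != v % (n + 1)) ++moved;  // placement changed
    }
    return moved;
}
```

For a uniform hash, a value keeps its placement only when v mod (N(N+1)) falls in a narrow band, so going from 5 to 6 servers moves roughly 5/6 of all keys; almost the entire cache is lost.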
2.3 consistent HASH model
Based on this idea, researchers made the requirements explicit and formalized them in the consistent hashing paper (Karger et al.).
2.3.1 Basic design ideas of consistent HASH
Consistent hashing constructs a hash ring: server nodes and objects are both mapped into the same space (0 ~ 2^32 - 1) on the ring. For each object, search clockwise for the first node whose hash is >= hash(obj); that node is the target system where the object is stored.
In the example above, according to the consistent HASH rule, the allocation result is as follows:
obj 1~obj 3 belong to Cache Server B;
obj 4~obj 7 belong to Cache Server C;
obj 8~obj 11 belong to Cache Server A.
However, there is a problem with uneven data distribution. As you can see in the figure below, a lot of data will fall on server A, but only a small amount of data will fall on server B and C. The data distribution is clearly unbalanced.
To solve the data-imbalance problem, consistent HASH introduces virtual nodes: objects are mapped evenly onto virtual nodes, and the virtual nodes are in turn mapped to physical nodes.
By setting up a large number of virtual nodes, data is distributed evenly across the virtual nodes, and therefore evenly across the physical nodes. The key point is that configuring virtual nodes for each physical node lets all virtual nodes spread evenly around the hash ring; how even the result is depends both on the hash function used and on the number of virtual nodes configured per physical node.
In summary, a consistent hash is one that meets these expectations:
1. Balance: the hashed results are evenly distributed. For example, in storage, data is spread evenly across the nodes; there is no situation where some nodes hold very little data while others hold a great deal.
2. Monotonicity: when a node is added or removed, each object is either mapped to its original location or mapped to the new node; it is never mapped to an invalid node.
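The clockwise search described above maps naturally onto a sorted map: `std::map::lower_bound` finds the first node whose hash is >= hash(obj), wrapping around to the ring's first entry. A minimal sketch (class and method names are illustrative, not the article's code):

```cpp
#include <functional>
#include <map>
#include <string>

// Minimal hash ring (no virtual nodes yet): node hashes are kept in a
// sorted map; lower_bound() is the "clockwise search for the first node
// whose hash is >= hash(obj)", wrapping around to the ring's start.
class HashRing {
public:
    void addNode(const std::string& name) {
        ring_[hasher_(name)] = name;
    }
    std::string locate(const std::string& obj) const {
        auto it = ring_.lower_bound(hasher_(obj));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }
private:
    std::hash<std::string> hasher_;
    std::map<std::size_t, std::string> ring_;
};
```

Because the lookup depends only on the ring contents and the object's hash, the same key always resolves to the same node as long as the ring is unchanged.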
2.3.2 Data Distribution Equilibrium Test
In order to test the characteristics of consistent hash algorithm and the influence of virtual nodes on data distribution balance, I implemented a consistent hash algorithm in C++ and carried out statistical experiments.
1. Under the same test data and the same physical nodes, test the data distribution for different numbers of virtual nodes. Test sample: 10,000 URL records, used as object names, i.e. as input to the hash function. The hash function is C++11's default std::hash. The "virtual nodes" figure in the data below is the number of virtual nodes configured per physical node.
| Virtual nodes | 101.71.4.31:80 | 101.71.4.32:80 | 101.71.4.33:80 | 101.71.4.34:80 | 101.71.4.35:80 |
| ------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
| 1    | 764  | 2395 | 1478 | 786  | 4577 |
| 10   | 1139 | 4862 | 1484 | 1243 | 1272 |
| 100  | 2646 | 2576 | 1260 | 705  | 2813 |
| 512  | 2015 | 2038 | 1948 | 2128 | 1871 |
| 1024 | 2165 | 1389 | 2045 | 2123 | 2278 |
| 2048 | 1976 | 1939 | 1892 | 2164 | 2029 |
| 4096 | 1972 | 2139 | 2095 | 1879 | 1915 |
From the data we can see that as the number of virtual nodes grows, the data distribution becomes more and more balanced, but beyond a certain number the improvement plateaus; absolute balance is never achieved.
2. Add or remove nodes and check how data moves. Removing a node: with 1024 virtual nodes per physical node, node 101.71.4.32 is disabled, giving the following statistics:
101.71.4.31:80 2392
101.71.4.33:80 2374
101.71.4.34:80 2635
101.71.4.35:80 2599
Comparing the log files, the data originally on 31, 33, 34, and 35 was unchanged; the extra data, which had belonged to 32, was spread evenly across the four machines that remained valid.
Adding a node: with 1024 virtual nodes per physical node, node 101.71.4.36 is added, giving the following results:
101.71.4.31:80 1726
101.71.4.32:80 1271
101.71.4.33:80 1776
101.71.4.34:80 1743
101.71.4.35:80 1768
101.71.4.36:80 1716
Comparing the log files, the data remaining on 31, 32, 33, 34, and 35 had belonged to those machines before; part of each machine's data was migrated to the new machine 36.
Both experiments illustrate the monotonicity of consistent hashing.
2.3.3 Implementation of the consistent HASH algorithm

```cpp
// Construct the hash ring with virtual nodes: configure several virtual
// nodes for each real physical node, then sort the ring.
for (const RNode& node : rnodes) {
    for (int i = 1; i <= vnodeCount; ++i) {
        ring.push_back(VNode(node, i));
    }
}
std::sort(ring.begin(), ring.end());
```
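A self-contained sketch of the construction described above (illustrative, not the article's exact C++ code): each physical node gets `vnodes` virtual nodes named `"<node>#<i>"`; all virtual nodes live on one sorted ring, and lookup walks clockwise. The vnode naming scheme is an assumption.

```cpp
#include <functional>
#include <map>
#include <string>

// Consistent hash with virtual nodes. Each physical node contributes
// `vnodes_` entries to the ring; lookup finds the first vnode hash
// >= hash(obj), wrapping around, and returns its physical node.
class ConsistentHash {
public:
    explicit ConsistentHash(int vnodes) : vnodes_(vnodes) {}

    void addNode(const std::string& node) {
        for (int i = 0; i < vnodes_; ++i)
            ring_[hasher_(node + "#" + std::to_string(i))] = node;
    }
    void removeNode(const std::string& node) {
        for (int i = 0; i < vnodes_; ++i)
            ring_.erase(hasher_(node + "#" + std::to_string(i)));
    }
    std::string locate(const std::string& obj) const {
        auto it = ring_.lower_bound(hasher_(obj));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }
private:
    int vnodes_;
    std::hash<std::string> hasher_;
    std::map<std::size_t, std::string> ring_;  // vnode hash -> physical node
};
```

Note the monotonicity property falls out directly: adding a node only inserts new ring entries, so an object's clockwise successor either stays the same or becomes one of the new node's virtual nodes.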