Example Analysis of CRUSH algorithm in Ceph 07/01 Update SLTechnology News&Howtos

Example Analysis of CRUSH algorithm in Ceph

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the example analysis of CRUSH algorithm in Ceph, which is very detailed and has certain reference value. Friends who are interested must finish it!

1. Introduction of bucket data structure.

Struct crush_bucket {

The ID number of _ _ S32 id; / / bucket. All bucket numbers in crushmap are negative.

Type of _ U16 type; / / bucket (uniform/list/tree/straw/straw2)

Algorithm used by _ _ U8 alg; / / bucket

Which hash algorithm is used in _ _ U8 hash; / / bucket

_ _ U32 weight; / / bucket weight value

The number of items contained in _ _ U32 size; / / bucket

_ _ S32 * items content contained in items; / / bucket

_ _ U32 perm_x; / / caches randomly arranged input values

_ _ U32 perm_n; / / cache the number available in the random arrangement result perm

_ _ U32 * perm; / / A pair of cache random arrangement results of items in bucket

}

Second, uniform type.

1. The data structure of uniform type bucket.

Struct crush_bucket_uniform {

Struct crush_bucket h; / / General bucket definition

Weight values of all items in _ _ U32 item_weight; / / uniform bucket (bucket of uniform type whose weight values of all items are the same)

}

2. Analysis of uniform type bucket algorithm.

Enter parameters:

1) crush_bucket_uniform structure

2) the input value x of the operation to be performed

3) enter the copy position r of the value x

Output parameters:

1) item in bucket calculated by uniform algorithm

Algorithm analysis:

1) for the first execution of the uniform algorithm, a random number is calculated by the hash () function, and then a random item position is obtained by taking a module from the bucket.h.size. Then the random position is written to bucket.h.perm [0] and the bucket.h.item specified by the random number is returned.

2) for the subsequent execution of the uniform algorithm, because there are already random numbers in bucket.h.perm [] and the number of random numbers available in bucket.h.perm [] is saved in bucket.h.perm_n, for copy position r, if r%bucket.h.size=bucket.h.perm_n, it is necessary to execute the hash () function to calculate a random number and then bucket.h.size-bucket.h.perm_n to get I. Then the contents of the bucket.h.perm_n+i and bucket.h.perm_n locations in bucket.h.perm [] are exchanged, and then the bucket.h.perm_n++, executes repeatedly until r%bucket.h.size

< bucket.h.perm_n。最后从bucket.h.perm[r%bucket.h.size]获取随机数，之后返回该随机数指定的bucket.h.item；

Flow chart of uniform type algorithm

3. Applicable conditions of uniform algorithm.

1) the weights of all items in bucket are the same, that is to say, uniform algorithm does not consider weights.

2) the uniform algorithm is suitable for the situation where the item in bucket does not change frequently. If it changes frequently, the bucket.h.size will change, resulting in the need for bucket.h.perm_n recalculation and large-scale data migration.

Third, list type.

1. The data structure of list type bucket.

Struct crush_bucket_list {

Struct crush_bucket h; / / General bucket definition

Weight value of each item in _ _ U32 * item_weight; / / bucket

The sum of the weights from the first item to the I item in _ _ U32 * sum_weight; / / bucket

}

2. Analysis of list type bucket algorithm.

Enter parameters:

1) crush_bucket_list structure

2) the input value x of the operation to be performed

3) enter the copy position r of the value x

Output parameters:

1) item in bucket calculated by list algorithm

Algorithm analysis:

1) starting from the last item in bucket, traverse the entire items in reverse, and execute the hash () function to calculate a random number w

2) multiply the random number w by bucket.sum _ weight [I], and then move the result 16 bits to the right

3) judge the result and bucket.item_weight [I] after moving 16 bits to the right. If the result is less than bucket.item_weight [I], return bucket.h.items [I] directly, otherwise reverse traverse the next item in bucket.

4) if all the results after moving 16 bits to the right have heavy rain bucket.sum_weight [I], then bucket.h.items [0] will be returned finally.

List type algorithm schematic

Flow chart of list type algorithm

3. Applicable conditions of list algorithm.

1) it is suitable for cluster expansion type. When item is added, optimal data movement is produced. Because when an item node is added to the list bucket, it will be added to the head section, so the sum_weight of the other nodes will not change, just add the sum of sum_weight and weight on the old_head to the sum_weight of the new_head. In this way, there is no need to move data between other item, and the data on other item only needs to be compared with head. If the calculated w value is less than the weight of head, it needs to be moved to head, otherwise it will be saved on the original item. In this way, we can get the best and least data movement.

2) A disadvantage of list bucket is that when searching item nodes, the time complexity of sequential search is O (n).

Fourth, tree type.

1. The data structure of tree type bucket.

Struct crush_bucket_tree {

Struct crush_bucket h; / / General bucket definition

_ _ U8 num_nodes; / / record the number of all nodes in the node_weights (including binary tree intermediate nodes and leaf nodes)

_ _ U32 * node_weights; / / in addition to the weight value of item in bucket, node_weights also contains the weight value of a binary tree structure, where item in bucket is the leaf node of the tree, and the weight value of the middle node of the binary tree is the sum of the weight values of the left and right bytes.

}

2. Analysis of tree type bucket algorithm.

Enter parameters:

1) crush_bucket_tree structure

2) the input value x of the operation to be performed

3) enter the copy position r of the value x

Output parameters:

1) item in bucket calculated by tree algorithm

Algorithm analysis:

1) find the root node of the binary tree, that is, n=bucket.num_nodes > > 1

2) determine whether the current node is a leaf node, if not, get the weight value of the corresponding intermediate node on the binary tree from bucket.node_ weights [n], then execute the hash () function to a random number, then multiply the random number by the weight value of the intermediate node, and then move 32 bits to the right. Compare the adjusted weight value with the weight value of the left subtree of the intermediate node. If it is less than the weight value of the left subtree, the traversal starts from the left subtree, otherwise the traversal starts from the grapefruit tree.

3) if the current node reaches the leaf node, the item specified by the leaf node is returned, that is, bucket.h.items [n > > 1]

Tree type algorithm schematic

Flow chart of tree type algorithm

3. Applicable conditions of tree algorithm.

1) the time complexity of positioning data using tree algorithm is O (logn), which makes it suitable for managing much larger number of devices or nested buckets.

2) Tree buckets is a kind of buckets suitable for any situation, which has both high performance and excellent recombination efficiency.

Fifth, straw type.

1. The data structure of straw type bucket.

Struct crush_bucket_straw {

Struct crush_bucket h; / / General bucket definition

_ _ U32 * item_weights; / / Save the weight values of all item in bucket

_ _ U32 * straws; / / saves the weight value calculated based on the item weight value

2. Analysis of straw type bucket algorithm.

Enter parameters:

1) crush_bucket_straw structure

2) the input value x of the operation to be performed

3) enter the copy position r of the value x

Output parameters:

1) item in bucket calculated by straw algorithm

Algorithm analysis:

1) iterate through all the items in backet sequentially, execute hash () algorithm to calculate a random number for each item in bucket, and then multiply the random number with bucket.staws [I] to get a modified random value.

2) compare the modified random values of all items in bucket calculated by hash () algorithm and find the maximum modified random values.

3) return the item where the maximum modified random value is located, that is, bucket.h.item [high]

Straw algorithm schematic diagram

3. Analysis of the generation process of straw array.

1) according to the bucket.item_ weights [bucket.h.size] array, generate an array reverse [bucket.h.size] sorted by items weight values from smallest to largest, and reverse [bucket.h.size] holds the subscript of item weight values in order.

2) initialize the variables used to calculate the straw algorithm.

Numleft=bucket.h.size

Straw=1.0

Lastw=0

Wbelow=0

3) iterate through the entire items and calculate the straw value corresponding to items according to the following algorithm.

A) for bucket.item_ weights [claims [I]] = 0, then Straw [weights [I]] = 0

B) set the strain value. Bucket.straw [cake [I]] = straw*0x1000

C) variable item1

D) calculate the wbelowvalue. Wbelow= (bucket.item_ weights [I-1])-lastw) * numleft

E) variable numleft-1

F) calculate the wnextvalue. Wnext=numleft* (bucket.item_ weights [I]-bucket.item_ weights [Revese [I-1])

G) calculate the pbelows. Pbelow=wbelow/ (wbelow+wnext)

H) calculate the strain value. Straw=pow (1xxxxxxxxxxxxxx)

I) calculate the lastw value. Lastw=bucket.item_ weights [I-1]

For all items in bucket, the greater the weight, the greater the calculated straw value.

From the point of view of the algorithm, calculating the straw value of item is related to the weight value of item and the weight value before item, so when modifying the weight value of an item in bucket, it will affect the straw value of the item and the items before and after the item.

4. The applicable conditions of straw algorithm.

1) consider the influence of weight on data distribution, that is, for item with high weight value, the probability of being selected is greater, and the more data fall on item with high weight value.

2) buckets of straw type can provide the best solution for data movement between subtrees.

6. Straw2 type.

1. The data structure of straw2 type bucket.

Struct crush_bucket_straw2 {

Struct crush_bucket h; / / General bucket definition

_ _ U32 * item_weights; / / Save the weight values of all item in bucket

}

2. Analysis of straw2 type bucket algorithm.

Enter parameters:

1) crush_bucket_straw2 structure

2) the input value x of the operation to be performed

3) enter the copy position r of the value x

Output parameters:

1) item in bucket calculated by straw2 algorithm

Algorithm analysis:

1) iterate through all the items of the whole bucket to get the corresponding weight value of items

2) for the non-zero weight value, first execute the hash () function to calculate a random number, then use the random number as a parameter to call the result of the minimum exponential random variable distribution algorithm, and then subtract 0x1000000000000, and finally divide the result by the item weight value to get the corresponding final value of item (draw). For a zero weight value, the final value (draw) is the minimum

3) compare the final value (draw) calculated by all the items in the whole bucket, and take the item with the highest final value, that is, bucket.h.items [high]

From the point of view of the algorithm, calculating the straw value of item is only related to the weight value of item and has nothing to do with the weight value of other items in bucket. Therefore, when you modify the weight value of an item in a bucket, it will not affect the straw value of other items in that bucket.

3. Applicable conditions of straw2 algorithm.

2) buckets of straw type can provide the best solution for data movement between subtrees.

3) suitable for scenarios with dynamic changes of items weights in bucket

The above is all the content of the article "sample Analysis of CRUSH algorithm in Ceph". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.