Example Analysis of ceph-pg Hash

2025-07-09 Update. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)05/31 Report--

This article shares an example analysis of the ceph-pg hash. Many readers may not be familiar with it, so it is offered here for reference.

Preface

Introduction to ceph

Ceph is a unified distributed storage system designed for performance, reliability and scalability. The underlying RADOS distributed storage is the core of Ceph, and Ceph builds its object, block and file storage interfaces on top of RADOS. A client can connect directly to the RADOS cluster through the librados library provided by Ceph. The machine that should hold an object is obtained by computation, and the data is then sent to that machine for storage. Because placement is entirely computed, RADOS does not need to maintain a central table of object locations. This computational approach is fast and also saves resources on the nodes. The following figure shows the overall framework of Ceph:

How data is mapped

Ceph pools all storage resources in the cluster, and objects are mapped directly to the underlying OSDs by computation. To manage and map data, Ceph uses the following concepts:

- object: the object that the user needs to store, such as a document, video or audio file. The user must specify a unique object name when storing the object.
- pool: a resource pool, which is a virtual concept. A cluster can be divided into multiple pools or use a single pool, but at least one pool must exist. Users can specify different CRUSH rules or different data redundancy policies for different pools.
- pg: short for placement group. A pool corresponds to multiple PGs. An object is hashed and stored in a specific PG of the pool. Plan the number of PGs when you create a pool. In older Ceph releases the number of PGs could only be increased, not reduced; since the Nautilus release it can also be reduced.
- osd: the data is finally stored on a specific OSD. Normally, one OSD manages one disk.
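The object-to-PG step described above can be sketched as follows. This is only an illustration under stated assumptions: demo_hash is a stand-in 32-bit FNV-1a hash, not the hash Ceph uses, object_to_pg is a hypothetical helper name, and the PG count is assumed to be a power of two so that a plain mask is enough.

```c
#include <stdint.h>

/* Stand-in string hash (32-bit FNV-1a). This is NOT Ceph's hash;
 * it only serves to turn an object name into a 32-bit key. */
static uint32_t demo_hash(const char *name)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (; *name != '\0'; name++) {
        h ^= (unsigned char)*name;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: map an object name to a PG index in [0, pg_num).
 * Assumes pg_num is a power of two, so pg_num - 1 is an all-ones mask. */
static uint32_t object_to_pg(const char *name, uint32_t pg_num)
{
    return demo_hash(name) & (pg_num - 1);
}
```

Calling object_to_pg("my-object", 8) always yields the same index between 0 and 7 for the same name, which is why no central lookup table is required for this step.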

The logical architecture of the object mapping is as follows:

PG hash algorithm

PG hash algorithm code:

    /*
     * x: object key value
     * b: pg number
     * bmask: mask
     */
    static inline int ceph_stable_mod(int x, int b, int bmask)
    {
        if ((x & bmask) < b)
            return x & bmask;
        else
            return x & (bmask >> 1);
    }
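The function takes the mask as a parameter, so a caller has to derive it from the PG count first. Below is a minimal sketch of one way to do that. The helper name pg_mask is hypothetical and does not come from the article; it returns 2^n - 1 for the smallest n such that 2^n is at least the PG count. ceph_stable_mod is repeated so the snippet compiles on its own.

```c
/* Same function as in the article: x is the object key,
 * b the PG count, bmask the mask derived from b. */
static inline int ceph_stable_mod(int x, int b, int bmask)
{
    if ((x & bmask) < b)
        return x & bmask;
    else
        return x & (bmask >> 1);
}

/* Hypothetical helper (not from the article): returns 2^n - 1 for the
 * smallest n with 2^n >= pg_num. Assumes pg_num >= 1. */
static int pg_mask(int pg_num)
{
    int mask = 0;
    while (mask + 1 < pg_num)
        mask = (mask << 1) | 1;   /* grow the mask one bit at a time */
    return mask;
}
```

With 12 PGs the mask is 15. A key of 9 maps to PG 9 because 9 & 15 is below 12, while a key of 13 falls back to 13 & 7 = 5, the same PG it would have with 8 PGs.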

Objects are mapped to PGs by hashing, and Ceph uses a bit mask instead of a modulo operation. If the PG count is a power of two, 2^n, its mask is 2^n - 1, and an object is mapped to a PG by computing object & (2^n - 1). This bitwise AND simply takes the last n bits of the object's key as the PG number, which is both simple and fast.

One may ask where n comes from. It is determined by the highest 1 bit in the binary representation of the PG count: if the PG count is set to a, the program takes the smallest n such that 2^n >= a. When a is not a power of two, x & bmask can be greater than or equal to a; in that case ceph_stable_mod falls back to x & (bmask >> 1), that is, the last n - 1 bits.

This makes it possible to expand in steps. Take an expansion from 8 PGs to 16 with 12 as an intermediate step. According to the figure, when the PG count goes from 8 -> 12, 1/4 of the data in the cluster is migrated: half of the data in PGs 0 to 3 moves to PGs 8 to 11, while the data in PGs 4 to 7 does not move (after re-hashing it still falls in the range 4 to 7). After that migration completes, the PG count can be changed from 12 to 16, and another 1/4 of the data is migrated (half of the data in PGs 4 to 7 moves to PGs 12 to 15). The total amount migrated in two steps is the same as in a single step from 8 to 16, namely 1/2 of the data, but migrating step by step spreads the load over a longer period.
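The migration fractions for 8 -> 12 -> 16 can be checked with a small counting sketch. It assumes that consecutive integers are an acceptable stand-in for uniformly distributed object keys, and moved_keys is a hypothetical helper name, not part of Ceph.

```c
/* Same function as in the article. */
static inline int ceph_stable_mod(int x, int b, int bmask)
{
    if ((x & bmask) < b)
        return x & bmask;
    else
        return x & (bmask >> 1);
}

/* Hypothetical helper: among the keys 0 .. nkeys-1, count how many land
 * in a different PG when the PG count changes from old_b to new_b.
 * The masks must match the respective PG counts (7 for 8, 15 for 12 or 16). */
static int moved_keys(int nkeys, int old_b, int old_mask,
                      int new_b, int new_mask)
{
    int moved = 0;
    for (int x = 0; x < nkeys; x++) {
        if (ceph_stable_mod(x, old_b, old_mask) !=
            ceph_stable_mod(x, new_b, new_mask))
            moved++;
    }
    return moved;
}
```

For 1600 keys, moved_keys(1600, 8, 7, 12, 15) and moved_keys(1600, 12, 15, 16, 15) each give 400 (1/4 of the keys), and the direct step moved_keys(1600, 8, 7, 16, 15) gives 800 (1/2).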

If the expansion is not a doubling, the situation may be more complicated, but the same approach still applies: accept a longer overall migration time in exchange for less data movement in any short period.

The disadvantages of a direct modulo

Some people may ask why a simple modulo is not used. The reason is that the amount of data migrated during expansion cannot be controlled that way.

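Let the current PG count be a, the PG count after expansion be b, the least common multiple of a and b be d, and the key value of an object be q. The following two equations can be obtained:

    q = c*a + e, where e < a and q < d
    q = u*b + v, where v < b and q < d

Here e is the PG number before expansion and v is the PG number after expansion. Since e = q - c*a and v = q - u*b, e == v holds only if c*a == u*b. Because d is the least common multiple and q < d, c*a != u*b whenever c and u are not both 0, so e != v. Only when c == u == 0 (that is, q < a) does e == v hold. The proportion of data that must be migrated is therefore (d - a) / d.

It follows that a direct modulo cannot control the amount of migrated data. For example, when the PG count goes from 8 -> 12, a direct modulo requires migrating (24 - 8) / 24 = 2/3 of the data.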
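The (d - a) / d result can be verified by brute force over one full period of d keys. The sketch below is only an illustration; gcd, lcm and moved_keys_mod are hypothetical helper names, and the keys 0 .. d-1 stand in for uniformly distributed object keys.

```c
/* Greatest common divisor (Euclid's algorithm), for positive inputs. */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive integers. */
static int lcm(int a, int b)
{
    return a / gcd(a, b) * b;
}

/* Hypothetical helper: among the keys 0 .. nkeys-1, count how many change
 * PG when placement is q % a before expansion and q % b afterwards. */
static int moved_keys_mod(int nkeys, int a, int b)
{
    int moved = 0;
    for (int q = 0; q < nkeys; q++) {
        if (q % a != q % b)
            moved++;
    }
    return moved;
}
```

For a = 8 and b = 12, lcm(8, 12) is 24 and moved_keys_mod(24, 8, 12) is 16, which matches (24 - 8) / 24 = 2/3.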

That concludes this example analysis of the ceph-pg hash. Thank you for reading!
