How to use Ceph erasure codes

2025-04-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to use Ceph erasure coding. The editor finds it quite practical and shares it here for reference. Let's take a look.

I. Principle of erasure code

Erasure coding (EC) is a fault-tolerant coding technique that was first used in the communications industry to cope with partial data loss during transmission. The basic principle is to split the transmitted signal into segments, add check information, and relate the segments to one another, so that even if part of the signal is lost in transit, the receiver can still compute the complete message. In data storage, erasure coding splits data into fragments, encodes them together with redundant blocks, and stores the results in different locations, such as disks, storage nodes, or other geographic sites. Strictly speaking, codes can be divided into three types according to their error-control function: error detection, error correction, and erasure.

·An error detection code can only identify errors; it cannot correct them.

·An error correction code can both identify errors and correct them.

·An erasure code can identify and correct errors, and when the errors exceed its correction range, it also deletes the information that cannot be corrected.

In its basic form, an erasure code has a structure of k data blocks plus m check blocks, where k and m can be set according to certain rules. This is expressed by the formula n=k+m. The variable k is the number of original data blocks (symbols). The variable m is the number of additional, redundant symbols added to protect against failures. The variable n is the total number of symbols produced by the encoding process. As long as no more than m blocks (data blocks or check blocks) are damaged, the complete data can be recomputed from the remaining blocks, and no data is lost.

Taking k=2, m=1 as an example, consider how an object named cat.jpg is stored in Ceph with erasure coding, assuming the content of the object is ABCDEFGH. After the client uploads cat.jpg to Ceph, the primary OSD invokes the erasure code algorithm to encode the data: the original ABCDEFGH is split into two fragments, corresponding to stripe fragment 1 (ABCD) and stripe fragment 2 (EFGH) in Figure 11-2, and a check fragment, stripe fragment 3 (WXYZ), is then computed. Following the rules specified in the CRUSH map, the three fragments are placed on three different OSDs, which completes the storage of the object, as shown in the figure.

Now let's look at how data is read with erasure coding, again using cat.jpg as the example. After the client requests cat.jpg, the primary OSD of the PG that holds the object sends read requests to the other OSDs involved; in the figure, the primary OSD is OSD1. Suppose that when the requests reach OSD2 and OSD3, OSD2 fails to respond, so that only the fragments on OSD1 (ABCD) and OSD3 (WXYZ) can be obtained. As the primary, OSD1 then runs the erasure code decoding operation on the fragments from OSD1 and OSD3 and computes the fragment held by OSD2 (EFGH). It reassembles the content of cat.jpg (ABCDEFGH) and returns the result to the client. The whole process is shown in the figure.
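The write and read paths described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain XOR parity, which is the simplest code that works for m=1, and not the Reed-Solomon technique that Ceph's jerasure plugin applies. The function names encode and decode are hypothetical, and the parity bytes it produces are not the symbolic WXYZ used in the text.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 2) -> List[bytes]:
    """Split data into k equal data chunks and append one XOR parity chunk (m=1)."""
    if len(data) % k != 0:
        raise ValueError("this sketch expects len(data) to be a multiple of k")
    size = len(data) // k
    chunks = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)  # all zeros, the identity element for XOR
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]


def decode(chunks: List[Optional[bytes]], k: int = 2) -> bytes:
    """Return the original data when at most one of the k+1 chunks is None."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one loss")
    repaired = list(chunks)
    if missing:
        # XOR of all surviving chunks reproduces the missing one.
        survivors = [chunk for chunk in chunks if chunk is not None]
        rebuilt = bytes(len(survivors[0]))
        for chunk in survivors:
            rebuilt = xor_bytes(rebuilt, chunk)
        repaired[missing[0]] = rebuilt
    return b"".join(repaired[:k])


shards = encode(b"ABCDEFGH")   # ABCD, EFGH, and one parity chunk
shards[1] = None               # the OSD holding EFGH does not respond
restored = decode(shards)
print(restored)                # b'ABCDEFGH'
```

With more than one check block (m greater than 1), XOR alone is no longer sufficient, which is why real deployments rely on codes such as Reed-Solomon.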

Erasure codes provide data reliability similar to that of replicas while reducing the overhead of redundant data, so they increase the usable space of the storage devices overall. However, they also bring considerable extra overhead, mainly heavy computation and high network load. In particular, when a hard disk fails, rebuilding the data consumes a lot of CPU, and computing a single data block requires reading a large amount of data and transferring it over the network. Compared with replica recovery, erasure code recovery places a far greater burden on the network. Erasure coding is therefore demanding on hardware performance, which should be kept in mind. Also note that RBD block devices cannot be created directly in a storage pool built with erasure codes; such a pool can only hold the image data, and only after overwrites are enabled on it, while the image metadata stays in a replicated pool.

After Ceph is installed, a default rule exists. By default this rule keeps three replicas, read and written with the host as the failure domain. The advantages of replication are high reliability, excellent read and write performance, and fast recovery. However, replication is costly: with three replicas, the cost per TB of data is more than 3 times that of the raw disk capacity (once the shared CPU and memory overhead of the nodes is included). Erasure codes offer availability similar to replication and reduce the redundant data overhead, at the price of heavy computation and high network load.
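To make the capacity trade-off concrete, here is a small Python sketch that compares usable capacity under three replicas with a k=3, m=2 erasure code. The 300 TB raw figure and the function names are hypothetical, and the calculation deliberately ignores CPU, memory, and any space Ceph reserves for itself.

```python
def usable_with_replicas(raw_tb: float, copies: int = 3) -> float:
    """Usable capacity when every object is stored `copies` times."""
    return raw_tb / copies


def usable_with_ec(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity when k data chunks are stored alongside m check chunks."""
    return raw_tb * k / (k + m)


raw = 300.0  # hypothetical raw capacity in TB
replica_usable = usable_with_replicas(raw)   # 100.0
ec_usable = usable_with_ec(raw, k=3, m=2)    # 180.0
print(f"3 replicas: {replica_usable:.0f} TB usable")
print(f"EC k=3 m=2: {ec_usable:.0f} TB usable")
```

Both layouts survive the loss of two blocks of an object, yet the erasure-coded layout stores each byte 5/3 times instead of 3 times.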

II. Erasure Code Practice

Erasure coding is used by creating Ceph pools of type erasure. These pools are created from an erasure code profile, which defines the erasure code parameters. We will now create an erasure code profile and then create an erasure-coded pool from it. The following command creates a profile called Ecprofile with the parameters k=3 and m=2, the numbers of data blocks and check blocks respectively. Each object stored in the pool is therefore divided into 3 (k) data blocks plus 2 (m) additional check blocks, for a total of 5 blocks (k+m). These 5 blocks are then distributed across OSDs in different failure domains.

1. Create an erasure code profile:

# ceph osd erasure-code-profile set Ecprofile crush-failure-domain=osd k=3 m=2

2. View the profiles:

# ceph osd erasure-code-profile ls

Ecprofile

default

# ceph osd erasure-code-profile get Ecprofile

crush-device-class=

crush-failure-domain=osd

crush-root=default

jerasure-per-chunk-alignment=false

k=3

m=2

plugin=jerasure

technique=reed_sol_van

w=8

We'll also look at Ceph's default profile in passing:

# ceph osd erasure-code-profile get default

k=2

m=1

plugin=jerasure

technique=reed_sol_van

3. Create a new Ceph pool of type erasure based on the erasure code profile generated in the previous step:

# ceph osd pool create Ecpool 16 16 erasure Ecprofile

pool 'Ecpool' created

4. Check the state of the newly created pool. You will find that the pool size is 5 (k+m), so the data is written to five different OSDs:

# ceph osd dump | grep Ecpool

pool 8 'Ecpool' erasure size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 16 pgp_num 16 last_change 231 flags hashpspool stripe_width 12288
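As a quick sanity check, the numbers in this line can be compared with the profile. The Python sketch below parses a copy of the dump line (with single spaces between fields) and confirms that size equals k+m. The helper parse_pool_line is hypothetical, and reading 12288 / 3 = 4096 bytes as the per-chunk share of one stripe is an interpretation of this output, not something the article states.

```python
import re

# Copy of the pool line from the dump, with single spaces between fields.
POOL_LINE = (
    "pool 8 'Ecpool' erasure size 5 min_size 4 crush_rule 3 "
    "object_hash rjenkins pg_num 16 pgp_num 16 last_change 231 "
    "flags hashpspool stripe_width 12288"
)


def parse_pool_line(line: str) -> dict:
    """Extract the numeric fields this check needs from one pool dump line."""
    fields = {}
    for key in ("size", "min_size", "pg_num", "pgp_num", "stripe_width"):
        # \b keeps "size" from matching inside "min_size".
        match = re.search(rf"\b{key} (\d+)", line)
        if match:
            fields[key] = int(match.group(1))
    return fields


K, M = 3, 2  # values taken from Ecprofile
info = parse_pool_line(POOL_LINE)
assert info["size"] == K + M          # 5 chunks per object
bytes_per_chunk = info["stripe_width"] // K
print(info["size"], info["min_size"], bytes_per_chunk)  # 5 4 4096
```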

5. Now we create a file and put it in the erasure code pool.

# echo test > test

# ceph osd pool ls

Ecpool

# rados put -p Ecpool object1 test

# rados -p Ecpool ls

object1

6. Check the OSD map of the EC pool and object1. The output of the command clearly shows the OSD ID of each block of the object. As explained in step 1, object1 is divided into 3 (k) data blocks and 2 (m) extra check blocks, and the 5 blocks are stored on different OSDs of the Ceph cluster. In this demo, object1 is stored on these five OSDs: osd.5, osd.1, osd.3, osd.2, osd.4.

# ceph osd map Ecpool object1

osdmap e233 pool 'Ecpool' (8) object 'object1' -> pg 8.bac5debc (8.c) -> up ([5,1,3,2,4], p5) acting ([5,1,3,2,4], p5)

III. Erasure code test

1. First, let's stop an OSD.

# systemctl stop ceph-osd@3

After stopping osd.3, check the OSD map of the EC pool and object1. Note that osd.3 has become NONE, which means osd.3 is unavailable in this pool:

# ceph osd map Ecpool object1

osdmap e235 pool 'Ecpool' (8) object 'object1' -> pg 8.bac5debc (8.c) -> up ([5,1,NONE,2,4], p5) acting ([5,1,NONE,2,4], p5)

2. Now let's stop a second OSD.

# systemctl stop ceph-osd@5

After stopping osd.5, check the OSD map of the EC pool and object1. Note that osd.5 has become NONE, which means osd.5 is unavailable in this pool:

# ceph osd map Ecpool object1

osdmap e237 pool 'Ecpool' (8) object 'object1' -> pg 8.bac5debc (8.c) -> up ([NONE,1,NONE,2,4], p1) acting ([NONE,1,NONE,2,4], p1)
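The mapping output can also be checked programmatically. The following Python sketch, with hypothetical helper names, extracts the acting set from a line of ceph osd map output and counts how many chunks are still reachable. It only compares that count with k; whether Ceph actually serves I/O also depends on the pool's min_size (4 in the dump shown earlier), which this sketch does not model.

```python
import re
from typing import List, Optional


def acting_set(line: str) -> List[Optional[int]]:
    """Return the acting set from one 'ceph osd map' line; NONE becomes None."""
    match = re.search(r"acting\s*\(\[([^\]]*)\]", line)
    if match is None:
        raise ValueError("no acting set found in line")
    return [
        None if token.strip() == "NONE" else int(token)
        for token in match.group(1).split(",")
    ]


def chunks_available(acting: List[Optional[int]]) -> int:
    """Count the chunks whose OSD is still present in the acting set."""
    return sum(osd is not None for osd in acting)


LINE = (
    "osdmap e237 pool 'Ecpool' (8) object 'object1' -> pg 8.bac5debc (8.c) "
    "-> up ([NONE,1,NONE,2,4], p1) acting ([NONE,1,NONE,2,4], p1)"
)
K = 3  # data chunks in Ecprofile

acting = acting_set(LINE)
available = chunks_available(acting)
print(acting)                     # [None, 1, None, 2, 4]
print(available, available >= K)  # 3 True
```

Three chunks remain, which equals k, so the data can in principle still be decoded from the surviving chunks.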

3. Download the object from the erasure code pool:

# rados get -p Ecpool object1 /tmp/wyl
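To confirm that the downloaded copy matches what was uploaded, the two files can be compared by checksum. The Python sketch below is one possible way to do this. The helpers sha256_of and same_content are hypothetical names, and the demonstration uses two temporary files standing in for the uploaded file and the downloaded copy, so that it runs without a cluster.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(path_a, path_b) -> bool:
    """True when both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


# Stand-ins for the file that was uploaded and the copy that was downloaded.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "test"
    downloaded = Path(tmp) / "wyl"
    original.write_bytes(b"test\n")
    downloaded.write_bytes(b"test\n")
    matches = same_content(original, downloaded)

print(matches)  # True
```

On the host used in this walkthrough, the equivalent call would be same_content("test", "/tmp/wyl"), assuming it is run from the directory where the test file was created.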
