Example Analysis of EC for Ceph distributed Storage of erasure codes 11/02 Update SLTechnology News&Howtos


2025-11-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article walks through an example analysis of erasure coding (EC) as used in Ceph distributed storage. It is shared here as a practical reference.

1. Replicas or EC?

Is it difficult to put a file on disk? Not at all. Just write it.

What if it is a very important document? Still not difficult. Just keep more copies.

Remember how you treated your graduation thesis? One copy on the computer, one on a flash drive, another on a network drive, maybe even on several network drives, all for fear that months of hard work would go to waste. That is replica storage.

Suppose our storage keeps three replicas. The actual capacity utilization is then only 1/3, about 33.3%, which is very low. Is there a better way? Yes: the erasure code scheme.

Storing a file with erasure codes takes three steps:

Divide the file into K data blocks.

Combine the K data blocks in a defined way to generate M check blocks.

When some data blocks are lost, use the check blocks to recompute the missing data blocks.

As an example, with K = 5 and M = 3 the utilization of the erasure code scheme is K / (K + M) = 5/8 = 62.5%. While also tolerating the loss of three data blocks, its capacity utilization is nearly twice that of the three-replica scheme.

The key points are how to compute the check blocks and how to use them for data recovery. The rest of this article focuses on these two parts, which together make up the principle of EC.
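The utilization figures above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example and are not part of Ceph.

```python
def ec_utilization(k: int, m: int) -> float:
    """Fraction of raw capacity that holds user data with k data + m check blocks."""
    return k / (k + m)


def replica_utilization(copies: int) -> float:
    """Fraction of raw capacity that holds user data when every object is stored `copies` times."""
    return 1 / copies


ec = ec_utilization(5, 3)      # K = 5, M = 3 from the example above
rep = replica_utilization(3)   # three replicas

print(f"EC 5+3: {ec:.1%}, 3 replicas: {rep:.1%}, ratio: {ec / rep:.3f}")
# prints: EC 5+3: 62.5%, 3 replicas: 33.3%, ratio: 1.875
```

The ratio of 1.875 is where the "nearly twice" figure comes from.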

Suppose we have three pieces of data, d1, d2 and d3, and we are worried that one of them may be lost. We then need to compute a new piece of check data from these three. Put bluntly, one of the easiest ways is to store their sum directly: c1 = d1 + d2 + d3. When one piece of data is lost, it can be recovered by subtracting the other two from c1.

At this point, the simplest EC algorithm, one that tolerates the loss of a single piece of data, has been constructed.
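The sum-based scheme can be sketched as follows. The data values 5, 2, 8 and the function names are hypothetical, chosen only for illustration; a lost value is represented by None.

```python
def make_check(data):
    """Check value c1: the plain sum of all data values."""
    return sum(data)


def recover_one(data, c1):
    """Restore the single missing value (marked None) using the check value c1."""
    missing = [i for i, d in enumerate(data) if d is None]
    if len(missing) != 1:
        raise ValueError("this scheme can restore exactly one missing value")
    known_sum = sum(d for d in data if d is not None)
    restored = list(data)
    restored[missing[0]] = c1 - known_sum  # c1 minus the surviving values
    return restored


data = [5, 2, 8]            # hypothetical d1, d2, d3
c1 = make_check(data)       # 15
damaged = [5, None, 8]      # d2 is lost
print(recover_one(damaged, c1))  # prints: [5, 2, 8]
```

With two values missing the function refuses to guess, which is exactly the limitation the next section addresses.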

An EC algorithm that tolerates losing two pieces of data

As with losing one piece of data, it is tempting to simply construct another check block in the same way, such as c2 = d1 + d2 + d3.

But does this really work? On reflection, it is a very foolish thing to do. If two pieces of data are really lost, these two identical equations cannot determine them, because there are two unknowns but only one effective equation. So what is the relationship between the coefficient vectors of these two equations? Here we borrow a definition from linear algebra: they are linearly dependent, that is, one of them can be expressed as a linear combination of the other.

Background: linear dependence. Readers who do not remember it should go back and review it on their own.

So we can change the coefficients of the c2 equation, picking any values that make it linearly independent of the c1 equation, for example: c2 = 7*d1 + 6*d2 + 3*d3.

Clearly, the two missing values can now be recovered easily by solving a system of two linear equations in two unknowns, the method learned in the first year of junior high school.
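A minimal Python sketch of the two-loss case follows. It assumes c1 is the plain sum and that c2 uses the coefficients 7, 6, 3 (one reading of the values the author mentions later); the data values and function names are hypothetical. The two unknowns are found with Cramer's rule, using exact fractions to avoid rounding.

```python
from fractions import Fraction

A1 = (1, 1, 1)   # coefficients of the c1 equation (plain sum)
A2 = (7, 6, 3)   # assumed coefficients of the c2 equation


def encode(d):
    """Return the two check values (c1, c2) for three data values."""
    c1 = sum(a * x for a, x in zip(A1, d))
    c2 = sum(a * x for a, x in zip(A2, d))
    return c1, c2


def recover_two(d, c1, c2):
    """Restore exactly two missing values (marked None) from c1 and c2."""
    i, j = [idx for idx, x in enumerate(d) if x is None]
    # Move the surviving terms to the right-hand side of each equation.
    r1 = c1 - sum(A1[t] * x for t, x in enumerate(d) if x is not None)
    r2 = c2 - sum(A2[t] * x for t, x in enumerate(d) if x is not None)
    det = A1[i] * A2[j] - A1[j] * A2[i]
    if det == 0:
        raise ValueError("equations are linearly dependent; cannot recover")
    restored = list(d)
    restored[i] = Fraction(r1 * A2[j] - A1[j] * r2, det)
    restored[j] = Fraction(A1[i] * r2 - r1 * A2[i], det)
    return restored


data = [5, 2, 8]                       # hypothetical d1, d2, d3
c1, c2 = encode(data)                  # (15, 71)
print(recover_two([None, None, 8], c1, c2) == data)  # prints: True
```

If A2 were set to (1, 1, 1) as in the naive attempt, the determinant would be zero and the function would raise, which is the linear dependence problem described above.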

Generalizing the EC algorithm

If we write the above equations as a matrix multiplication, the check values are the product of a coefficient matrix and the data vector.

From the analysis above we can summarize, using only very simple mathematics: as long as the row vectors of the generator matrix (that is, the coefficient matrix of the check equations) are linearly independent, lost data can be recovered from the generated check data. Recalling freshman linear algebra, this amounts to requiring that the matrix used for recovery be invertible.

Background: inverse matrix.

If the product of two matrices is the identity matrix, they are called inverses of each other.
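To make the definition concrete, here is a small Gauss-Jordan inversion in Python using exact fractions. It is a sketch for illustration only: the helper names and the 2x2 coefficient values are assumptions, not something taken from Ceph.

```python
from fractions import Fraction


def identity(n):
    """n x n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination; raise if it is singular."""
    n = len(m)
    # Augment the matrix with the identity: [m | I].
    a = [[Fraction(x) for x in row] + e for row, e in zip(m, identity(n))]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular (rows are linearly dependent)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]   # right half is now the inverse


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


G = [[1, 1], [7, 6]]                          # assumed independent coefficient rows
print(matmul(G, inverse(G)) == identity(2))   # prints: True
```

Calling inverse([[1, 1], [1, 1]]) raises ValueError, matching the earlier observation that two identical check equations cannot be solved.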

Just now I casually took the coefficients of c2 to be 7, 6, 3 (actually the suffix of my UM account). They happen to work, but with more check blocks the generator matrix needs to be built by a regular rule that guarantees the matrix is invertible. For example, we can use the following rule:

F = [0 0 0 1 0; 0 0 0 0 1; 1 1 1 1 1; 1 2 3 4 5; 1 4 9 16 25];
F_INV = inv(F);   % compute the inverse of F
C = [7; 0; 22; 61; 197];
D = F_INV * C

D =
    5.0000
    2.0000
    8.0000
    7.0000
         0

As you can see, the lost data has been perfectly restored.
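The same recovery can be sketched in plain Python. The assumptions here: the check rows are built from powers 0, 1, 2 of the column index (a Vandermonde-style rule), the data values are the 5, 2, 8, 7, 0 shown in the output above, and the first three of them are the lost ones. All function names are hypothetical.

```python
from fractions import Fraction


def check_rows(k, m):
    """m check rows for k data blocks: row p holds (j + 1) ** p for each column j."""
    return [[(j + 1) ** p for j in range(k)] for p in range(m)]


def encode(data, m):
    """Compute the m check values for the given data values."""
    return [sum(c * d for c, d in zip(row, data)) for row in check_rows(len(data), m)]


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system is singular; data cannot be recovered")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def recover(survivors, checks, k):
    """survivors maps data index -> surviving value; checks holds the check values."""
    rows = [[int(j == idx) for j in range(k)] for idx in sorted(survivors)]
    rhs = [survivors[idx] for idx in sorted(survivors)]
    need = k - len(survivors)           # one check equation per lost value
    if need > len(checks):
        raise ValueError("more values lost than check blocks available")
    rows += check_rows(k, len(checks))[:need]
    rhs += checks[:need]
    return solve(rows, rhs)


data = [5, 2, 8, 7, 0]
checks = encode(data, 3)                       # [22, 61, 197]
print(recover({3: 7, 4: 0}, checks, 5) == data)  # prints: True
```

Production erasure code libraries do this arithmetic in a finite field rather than over ordinary numbers, so block contents stay fixed-size; exact fractions are used here only to mirror the arithmetic in the article.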
