This article introduces EC (erasure coding), a new feature of HDFS 3.x.
EC introduction
EC is short for Erasure Coding (erasure code).
EC (erasure coding) is an encoding technique. Before HDFS adopted it, its most widespread use was in RAID (Redundant Array of Independent Disks; see the earlier article on big data prerequisites: storage disks and RAID). RAID applies EC through striping, a technique that automatically balances I/O load across multiple physical disks. The principle is to split a contiguous piece of data into many small pieces and store them on different disks, so that multiple processes can access different parts of the data at the same time without disk contention (which can occur when several processes access the same disk simultaneously), and sequential access to the data achieves maximum I/O parallelism, giving very good performance. In HDFS, each of these small pieces of contiguous data is called a striping cell (stripe unit). For the stripe units of the original data, a certain number of parity units are computed and stored; this computation is called encoding. An error in any stripe unit can then be recovered by a decoding computation based on the remaining data units and the parity units.
Redundant Storage Strategy of HDFS data
The HDFS storage strategy is the replica mechanism, which improves data safety but also brings extra overhead. The default 3-replica scheme of HDFS incurs 200% additional overhead in storage space and other resources (such as network bandwidth). For data with relatively low I/O activity, however, the extra block replicas are rarely accessed during normal operation, yet they consume the same resources as the first copy. A major improvement in HDFS 3.x is therefore the use of erasure coding (EC) in place of the replica mechanism: erasure coding provides the same fault tolerance as replication with much less storage space. In a typical erasure coding (EC) setup, the storage overhead does not exceed 50%.
Implementation principle of EC algorithm
There are many algorithms that implement EC. One of the more common ones is Reed-Solomon (RS), which has two parameters, written RS(k, m), where k is the number of data blocks and m is the number of parity blocks. Of the k + m blocks (data blocks plus parity blocks), up to m may be lost and the data can still be recovered; in other words, the number of lost blocks that can be tolerated equals the number of parity blocks. The following example explains the principle:
We use RS(3, 2), meaning 3 raw data blocks and 2 parity blocks.
Example: there are three original data values, 7, 8 and 9, and two parity values, 50 and 122, are computed from them by matrix multiplication. The original data plus the parity data gives five values in total: 7, 8, 9, 50, 122. Any two of them can be lost and then recovered by the decoding algorithm.
Matrix multiplication (the figure shows GT × Data = Parity):
GT is the generator matrix; the generator matrix of RS(k, m) has m rows and k columns.
Data represents the original data; 7, 8 and 9 are the original data blocks.
Parity represents the parity data; 50 and 122 are the parity data blocks.
So for three original data blocks with two parity blocks, EC encoding occupies five blocks of disk space in total, and provides fault tolerance comparable to the six blocks occupied by a 2-replica scheme. A minimal numeric sketch of the encode/recover process is shown below.
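Below is a minimal Python sketch of this RS(3, 2) example. The parity rows of the generator matrix, [1, 2, 3] and [4, 5, 6], are an assumption inferred from the numbers above (they do reproduce 50 and 122 for the data 7, 8, 9); real erasure codecs, including the ones HDFS uses, work over the Galois field GF(2^8) rather than ordinary arithmetic, so treat this only as an illustration of encoding and recovery by matrix algebra.

```python
# Minimal sketch of RS(3, 2)-style encoding and recovery with ordinary arithmetic.
# Assumption: parity rows [1,2,3] and [4,5,6] (they reproduce 50 and 122);
# real HDFS codecs use GF(2^8) arithmetic instead of floats.
import numpy as np

k, m = 3, 2
data = np.array([7, 8, 9], dtype=float)        # original data "blocks"

GT = np.array([[1, 2, 3],                      # parity row 1 (assumed)
               [4, 5, 6]], dtype=float)        # parity row 2 (assumed), m x k
parity = GT @ data                             # -> [ 50. 122.]

# Full picture: the 5 stored values are the 3 data values plus 2 parity values.
G = np.vstack([np.eye(k), GT])                 # (k+m) x k
stored = G @ data                              # [7, 8, 9, 50, 122]

# Lose any two of the five, e.g. indices 1 (data=8) and 3 (parity=50).
surviving = [0, 2, 4]

# Any k rows of this G are linearly independent, so the k x k system formed
# by the surviving rows can be solved to recover the original data.
recovered = np.linalg.solve(G[surviving], stored[surviving])
print(parity)          # [ 50. 122.]
print(recovered)       # [7. 8. 9.]
```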
Application scenarios of EC
Integrating EC into HDFS can improve storage efficiency while still providing data durability comparable to traditional replica-based HDFS deployments. For example, a 3-replica file with 6 blocks consumes 6 × 3 = 18 blocks of disk space, whereas stored with EC (6 data, 3 parity) it consumes only 9 blocks.
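As a quick sanity check of these figures (the 200% versus at most 50% overhead mentioned earlier, and the 18 versus 9 blocks here), a small Python sketch:

```python
# Storage cost sketch: n-replica vs. RS(k, m) erasure coding.
def replica_blocks(data_blocks, replicas=3):
    return data_blocks * replicas

def ec_blocks(data_blocks, k, m):
    # Each group of k data blocks adds m parity blocks
    # (assumes data_blocks is a multiple of k for simplicity).
    return data_blocks + (data_blocks // k) * m

print(replica_blocks(6, 3))      # 18 blocks for a 6-block file with 3 replicas
print(ec_blocks(6, k=6, m=3))    # 9 blocks for the same file with RS(6, 3)

# Overhead relative to the raw data: 3-replica = 200%, RS(6, 3) = 50%.
print((replica_blocks(6, 3) - 6) / 6)    # 2.0 -> 200%
print((ec_blocks(6, 6, 3) - 6) / 6)      # 0.5 -> 50%
```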
However, EC consumes a lot of CPU during encoding and data reconstruction, and because most of the data involved is read remotely, it also incurs significant network overhead.
Therefore, when CPU resources are tight and storage is cheap, data can be stored with the replica mechanism; when CPU resources are plentiful and storage is expensive, data can be stored with the EC mechanism.
The structure of EC in HDFS
HDFS uses online EC directly (writing data in EC format as it is written), which avoids a conversion phase and saves storage space immediately. Online EC also enhances sequential I/O performance by using multiple disk spindles in parallel; this is especially desirable in clusters with high-end networking. Second, it naturally distributes a small file across multiple DataNodes without needing to bundle several files into one coding group. This greatly simplifies file operations such as deletion, disk quotas, and migration between namespaces.
In typical HDFS clusters, small files can account for over 3/4 of total storage consumption. To better support small files, the first phase of HDFS EC work supports the striped layout (Striping Layout); an EC scheme based on the HDFS contiguous layout (Contiguous Layout) is also in progress.
Striped layout:
Advantages:
Less data is cached on the client
Applies regardless of file size
Disadvantages:
Can hurt the performance of some data-locality-sensitive tasks, because blocks that would otherwise be on one node are scattered across several different nodes
Converting to and from the multi-replica storage strategy is troublesome
Contiguous layout:
Advantages:
Easy to implement
Convenient to convert to and from the multi-replica storage strategy
Disadvantages:
The client needs to cache enough data blocks
Not suitable for storing small files
In traditional mode, the basic unit of a file in HDFS is the block; in EC mode, the basic unit is the block group. Taking RS(3, 2) as an example, each block group contains 3 data blocks and 2 parity blocks. A sketch of how the striped layout fills a block group is shown below.
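The following is a rough illustration of how the striped layout maps a logical file offset onto the data blocks of a block group. The cell size and block size values are assumptions chosen for the example, and the arithmetic ignores details such as a partial final stripe and parity placement; it is not the actual HDFS client code.

```python
# Sketch: where does a logical file offset land under striped layout?
# Assumptions: RS(3, 2), 1 MB cells, 128 MB internal blocks; partial final
# stripes and parity placement are ignored for simplicity.
def locate(offset, k=3, cell_size=1024 * 1024, block_size=128 * 1024 * 1024):
    group_bytes = k * block_size            # logical data covered by one block group
    group_index = offset // group_bytes
    in_group = offset % group_bytes

    cell_index = in_group // cell_size      # cells are written round-robin
    stripe_index = cell_index // k          # which horizontal stripe
    block_index = cell_index % k            # which of the k data blocks
    offset_in_block = stripe_index * cell_size + in_group % cell_size
    return group_index, stripe_index, block_index, offset_in_block

# The first three 1 MB cells of a file go to data blocks 0, 1 and 2 of group 0;
# the fourth cell wraps back to data block 0 in the next stripe.
for off in (0, 1 * 1024 * 1024, 2 * 1024 * 1024, 3 * 1024 * 1024):
    print(locate(off))
```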
The main extensions that HDFS has made to introduce the EC pattern are as follows:
NameNode: an HDFS file is logically composed of block groups, and each block group contains a certain number of internal blocks. To reduce the NameNode memory consumed by these internal blocks, HDFS introduces a new hierarchical block naming protocol: the ID of a block group can be inferred from the ID of any of its internal blocks. This allows management at the block group level rather than at the block level.
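For illustration only, one way such a hierarchical naming scheme can work is to reserve the low-order bits of a block ID for the index of the internal block within its group, so the group ID is recovered by masking those bits off. The 4-bit mask below (allowing up to 16 internal blocks per group) is an assumption about the implementation detail, not something stated in the article.

```python
# Sketch of a hierarchical block naming scheme (assumed: the low 4 bits hold
# the index of the internal block inside its block group).
GROUP_INDEX_BITS = 4
GROUP_INDEX_MASK = (1 << GROUP_INDEX_BITS) - 1   # 0b1111

def block_group_id(internal_block_id):
    # Strip the per-block index to recover the block group's ID.
    return internal_block_id & ~GROUP_INDEX_MASK

def index_in_group(internal_block_id):
    return internal_block_id & GROUP_INDEX_MASK

group = 0x1000                                   # a made-up block group ID
blocks = [group | i for i in range(5)]           # 3 data + 2 parity internal blocks
assert all(block_group_id(b) == group for b in blocks)
print([index_in_group(b) for b in blocks])       # [0, 1, 2, 3, 4]
```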
Client: the client read and write paths have been enhanced to process the multiple internal blocks of a block group in parallel
DataNode: DataNodes run an additional ErasureCodingWorker (ECWorker) task to recover failed erasure-coded blocks in the background. The NameNode detects a failed EC block and selects a DataNode to carry out the recovery, much as it re-replicates a failed replicated block. Reconstruction performs three key tasks (see the sketch after this list):
Read the data from source nodes: input data is read from the source nodes in parallel using a dedicated thread pool. Based on the EC policy, read requests are issued to all sources, and only the minimum number of input blocks is read for reconstruction.
Decode the data and generate output data: the new data and parity blocks are decoded from the input data. All missing data and parity blocks are decoded together.
Transfer the generated blocks to the target nodes: once decoding is complete, the recovered blocks are transferred to the target DataNodes.
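Putting the three steps together, here is a schematic of the recovery flow, not the actual ECWorker code. The "cluster", read_block and write_block are hypothetical stand-ins for the DataNode's network I/O, and the decode step reuses the matrix-solve idea from the RS sketch earlier; a real implementation streams data in cell-sized chunks and uses a native GF(2^8) codec.

```python
# Schematic of EC block recovery: parallel reads -> decode -> transfer.
# The "cluster" is an in-memory dict; read_block/write_block are hypothetical
# stand-ins for DataNode I/O.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

cluster = {}                                   # (node, block_index) -> value

def read_block(node, idx):
    return cluster[(node, idx)]

def write_block(node, idx, value):
    cluster[(node, idx)] = value

def recover(sources, lost, targets, G):
    """sources: {block_index: node} for survivors, lost: [block_index]."""
    k = G.shape[1]
    picked = dict(list(sources.items())[:k])   # only k survivors are needed

    # 1. Read input data from the source nodes in parallel.
    with ThreadPoolExecutor(max_workers=k) as pool:
        futs = {i: pool.submit(read_block, n, i) for i, n in picked.items()}
        survivors = {i: f.result() for i, f in futs.items()}

    # 2. Decode: recover the original data from any k surviving rows of G,
    #    then re-encode the rows of the lost blocks (data or parity).
    idx = sorted(survivors)
    data = np.linalg.solve(G[idx], np.array([survivors[i] for i in idx]))
    rebuilt = {i: float(G[i] @ data) for i in lost}

    # 3. Transfer the reconstructed blocks to the target nodes.
    for (i, value), node in zip(rebuilt.items(), targets):
        write_block(node, i, value)

# Toy run with the RS(3, 2) generator from the earlier sketch:
G = np.vstack([np.eye(3), [[1, 2, 3], [4, 5, 6]]])
for i, v in enumerate([7, 8, 9, 50, 122]):
    write_block(f"dn{i}", i, float(v))         # blocks 1 and 3 will be "lost"
recover(sources={0: "dn0", 2: "dn2", 4: "dn4"}, lost=[1, 3],
        targets=["dn5", "dn6"], G=G)
print(cluster[("dn5", 1)], cluster[("dn6", 3)])   # 8.0 50.0
```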
Erasure coding policy: files and directories in an HDFS cluster are allowed to have different replication and erasure coding policies to accommodate heterogeneous workloads. An erasure coding policy encapsulates how a file is encoded and decoded. Each policy is defined by the following information:
EC schema: the number of data blocks and parity blocks in an EC group (for example, 6+3), and the codec algorithm (for example, Reed-Solomon or XOR).
Cell size: the size of a striping cell. This determines the granularity of striped reads and writes, as well as buffer sizes and the encoding work.
We can define our own EC policies through an XML file, which must contain the following three parts:
layoutversion: the version of the EC policy XML file format.
schemas: all the user-defined EC schemas.
policies: all the user-defined EC policies; each policy consists of a schema id and the size of the striping cell (cellsize).
There is a sample XML file for configuring EC policies in the Hadoop conf directory, named user_ec_policies.xml.template, which you can use as a reference.
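As a hedged sketch of the general shape such a file takes, built from the three parts listed above: the exact element names and values should be checked against the shipped user_ec_policies.xml.template, and the schema and cell size below are made up for the example.

```xml
<?xml version="1.0"?>
<!-- Illustrative only; consult user_ec_policies.xml.template for exact element names. -->
<configuration>
  <layoutversion>1</layoutversion>
  <schemas>
    <!-- a user-defined schema: codec plus the number of data (k) and parity (m) blocks -->
    <schema id="RSk6m3">
      <codec>RS</codec>
      <k>6</k>
      <m>3</m>
    </schema>
  </schemas>
  <policies>
    <!-- a policy = a schema id plus the striping cell size in bytes -->
    <policy>
      <schema>RSk6m3</schema>
      <cellsize>1048576</cellsize>
    </policy>
  </policies>
</configuration>
```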
Cluster hardware configuration
Erasure coding places additional demands on the cluster in terms of CPU and network:
Encoding and decoding work consumes additional CPU on the HDFS client and DataNode.
Erasure-coded files are also spread across racks to achieve rack fault tolerance. This means that when reading and writing striped files, most operations are off-rack, so the network bisection bandwidth is very important.
For rack fault tolerance, it is also important to have at least as many racks as the configured EC stripe width. For the EC policy RS(6, 3), this means a minimum of 9 racks, ideally 10 or 11, to handle planned and unplanned outages. For clusters with fewer racks than the stripe width, HDFS cannot maintain rack fault tolerance, but it will still try to spread a striped file across multiple nodes to preserve node-level fault tolerance.
Finally
By default, all EC policies are disabled in HDFS. We can enable an EC policy with the hdfs ec [-enablePolicy -policy <policyName>] command, based on the size of the cluster and the desired fault-tolerance properties. For example, for a cluster with 9 racks, a policy such as RS-10-4-1024k will not preserve rack-level fault tolerance, while RS-6-3-1024k or RS-3-2-1024k may be more appropriate.
Under the replica mechanism we can set the replication factor to specify the number of replicas, but under an EC policy the replication factor is meaningless: it is always 1 and cannot be changed with the replication-related commands.