There were only two working days left, and I was due to go home for the Spring Festival the next evening, when CDH suddenly raised a missing-blocks alarm. I checked with hdfs fsck /: of roughly 5 million blocks in total, nearly 1 million were missing. Thanks to HDFS's replication mechanism, a block can be recovered as long as not every replica is lost, so most of the blocks were repaired automatically. But more than 3000 blocks had lost all three replicas and stayed in the corrupt state. As a result, our hourly and daily reports were directly affected, and many users' Hive and Impala queries failed outright.
I hurried to find the cause. First I looked at the HDFS alarms in CDH and found an alarm from the failover controller. Then I went to examine the missing blocks and found that the block files were still present on disk, but the NameNode could no longer perceive that they existed. At that point I did not know why.
Based on the output of hdfs fsck /, we first looked at the files containing the corrupt blocks and found that most of them were smaller than 128 MB. Since our block size is set to 128 MB, a file smaller than 128 MB occupies only one block, and that block file on disk has exactly the same content as the source file. To reduce the impact on users, we first moved the files that had lost blocks to another directory on HDFS and recorded all of their metadata (permission list, owner, group, etc.). Then, with a script, we copied each such block file, renamed the copy to the original file name, and re-put it into HDFS; afterwards we used the split command to run chown and chmod in batches across multiple processes (a sketch follows). All of this text processing is tedious: each command is simple, but wrapped in loops it is easy to make mistakes if you are not familiar with it, and these are high-risk operations. After that, this part of the data was usable again.
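A rough sketch of that small-file repair, assuming a prepared list file corrupt_small_files.txt (a placeholder name) that maps each lost file to its recorded metadata and its surviving block file; the real script was messier:

# Each line of corrupt_small_files.txt is assumed to hold:
#   <hdfs_path> <owner> <group> <mode> <local_block_file>
while read -r hdfs_path owner group mode blk_file; do
  tmp="/tmp/$(basename "$hdfs_path")"
  cp "$blk_file" "$tmp"                        # for a sub-128 MB file, the block file IS the file content
  hdfs dfs -put -f "$tmp" "$hdfs_path"         # re-upload under the original name
  hdfs dfs -chown "$owner:$group" "$hdfs_path"
  hdfs dfs -chmod "$mode" "$hdfs_path"
  rm -f "$tmp"
done < corrupt_small_files.txt
# For parallelism, cut the list into chunks with split -l and run one loop per chunk.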
Because we only moved these files away, fsck still reports corrupt blocks, but that is fine: the paths it now reports are the post-move paths. The files larger than 128 MB, however, were not yet repaired, and that is more troublesome: each of them has been split into multiple blocks on disk, and repairing one means manually assembling those block files back into a single file and uploading it again (a rough sketch follows). We glanced at one of the remaining files: nearly 100 GB and 700 to 800 blocks spread over many block files, so it would still be very laborious.
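For reference, a minimal sketch of reassembling one such file, assuming the block order has been taken from hdfs fsck <path> -files -blocks and the block files have already been copied from the DataNodes into ./blocks/ (all names below are placeholders):

out="reassembled_file"
: > "$out"
while read -r blk; do
  cat "./blocks/$blk" >> "$out"                # blocks must be concatenated in their original order
done < blocks.list
hdfs dfs -put -f "$out" /recovered/path/of/file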
So let's go back to the cause, and first review the relevant principles.
Here is the key part, step by step:
1. The DataNode first scans the directories configured by the parameter dfs.datanode.data.dir for block files. Normally each directory contains many block files, and each block has a corresponding meta file, for example blk_1141150594 and blk_1141150594_67410808.meta.
2. Based on the meta files it finds, the DataNode records the corresponding block information and reports it to the NameNode.
3. After receiving the report from the DataNode, the NameNode records the information in its BlocksMap.
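For instance, you can peek at what a DataNode would scan in one of those directories (assuming /data1/dfs/dn is among the paths in dfs.datanode.data.dir; the layout is site-specific):

find /data1/dfs/dn -name 'blk_*' | head
# Each block should show up as a pair, e.g.:
#   .../blk_1141150594
#   .../blk_1141150594_67410808.meta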
As the manager of the file system namespace and block allocation in HDFS, the NameNode holds two critical mappings:
file name => data blocks
data block => list of DataNodes
The file name => data blocks mapping is persisted on disk, but the data block => DataNode list mapping is not saved on the NameNode; it is rebuilt from the DataNodes' block reports.
Refer to this introduction to the BlocksMap (easy to follow): http://www.cnblogs.com/ggjucheng/archive/2013/02/04/2889386.html
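Both mappings can be inspected for a concrete file with fsck (the path below is only an example):

hdfs fsck /some/hdfs/file -files -blocks -locations
# For each block of the file, the output shows the block ID together with the
# DataNodes currently holding its replicas, i.e. file => blocks => DataNode list.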
In the evening I read a blog post describing the same error: it had triggered an active/standby switch of the NameNode, and then this problem appeared.
When a DataNode sends an incremental block report (block => DataNode) to the newly active NameNode before that NameNode has finished replaying the edit log from the JournalNodes, the NameNode already has the block => DataNode mapping (from the report it just received) but does not yet have the block => file mapping (which comes from the edits). It therefore concludes that the block does not belong to any file and marks it as an invalidate block. This can be seen in the NameNode's background log: before the standby has fully become the active NameNode, log lines mentioning the invalidate block appear.
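If you want to check for this case, a rough way is to grep the NameNode log around the failover time; the log path below is an assumption and differs between installations:

grep -i "invalidate" /var/log/hadoop-hdfs/*NAMENODE*.log*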
That is the problem someone else ran into online; how could we verify whether we had the same one?
Ask the DataNodes to report again, using this command: hdfs dfsadmin -triggerBlockReport [-incremental] <datanode_host:ipc_port>
For example, hdfs dfsadmin -triggerBlockReport -incremental datanode1:50020 triggers an incremental report; without -incremental it is a full report.
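To cover the whole cluster, the command can be wrapped in a small loop; datanodes.txt and the 50020 IPC port below are assumptions for this sketch:

while read -r dn; do
  hdfs dfsadmin -triggerBlockReport "${dn}:50020"   # full report; add -incremental before host:port for incremental
done < datanodes.txt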
We executed this command, but the number of corrupt blocks still did not decrease, so that was not our cause.
Finally, we found that one of the directories configured in dfs.datanode.data.dir on our DataNodes, /data1, was gone. It had probably been deleted by mistake, so the DataNode never scanned /data1 for block files at all, which caused this problem.
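A quick sanity check that could have caught this; a sketch, assuming the value can be read with hdfs getconf and is a comma-separated list of local paths, possibly with a [DISK]-style storage prefix:

for dir in $(hdfs getconf -confKey dfs.datanode.data.dir | tr ',' ' '); do
  d="${dir#\[*\]}"          # strip an optional [DISK]-style storage-type prefix
  d="${d#file://}"          # strip an optional file:// prefix
  [ -d "$d" ] && echo "OK      $d" || echo "MISSING $d"
done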
After this incident, I realized that when troubleshooting backwards from the symptom it is easy to skip steps and never locate the root cause of such problems; checking forward from the basics might have uncovered it.
We skipped one step yesterday: finding the on-disk path of every corrupt block file and doing some statistics on it. Had we seen that they all lived under /data1, we would certainly have found the problem (a sketch of such a check follows).
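A sketch of that forward check; corrupt_blocks.txt and the /data*/dfs/dn layout are placeholders, and the block IDs can be taken from hdfs fsck / -list-corruptfileblocks:

while read -r blk; do
  find /data*/dfs/dn -name "$blk" 2>/dev/null    # locate the block file on the local disks
done < corrupt_blocks.txt |
  awk -F/ '{print "/" $2}' | sort | uniq -c      # tally block files per top-level mount, e.g. /data1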
Thinking about it now, though, in the process of solving the problem I really learned a lot about the internal mechanisms of HDFS.
The formatting is a little messy; I'll revise it later. Heading home for the Spring Festival this evening.