I. Summary
Recently, our Storm data-cleaning jobs reported occasional HDFS anomalies that prevented data from being written to HDFS. In other Spark jobs that bulk-load data into HDFS, the client side threw various "All datanodes are bad..." errors and the server side reported assorted timeouts. Notably, the load on each datanode was not high at all!
II. Fault Analysis
When we first saw the various timeouts and "waiting for reading"-style messages on the HDFS client, our immediate question was: why can't the datanodes that form the write pipeline receive the packets sent by the upstream datanode? At this point someone usually suggests raising the client timeout, which clearly cannot be the answer here, because the load on every node is very low. And when an "All datanodes are bad..." error appears, two explanations usually come to mind: first, none of the datanodes can provide service; second, the DFSClient holds a connection to the datanode's DataXceiverServer thread but transfers no packets for a long time, so the HDFS server's protection mechanism kicks in and closes the connection, which then produces the error.
For this particular "All datanodes are bad..." problem, the second case can basically be ruled out. Digging further, we examined the datanodes' thread dumps and heartbeat metrics in our monitoring platform and noticed a problem:
We reproduced the abnormal situation and observed the thread dumps and heartbeats of all the datanodes:
This is alarming: the heartbeat interval reached 30 s!
Analyzing further, we reproduced the fault again and used the jstack -l command to look at the datanodes' thread dumps in detail. We found that the createTemporary and checkDirs methods of FsDatasetImpl were being called at very high density:
Because these frequently called methods run under the coarse-grained FsDatasetImpl object lock, the heartbeat-sending thread and the DataXceiver threads end up blocked (they contend for the same coarse-grained FsDatasetImpl lock). The thread dump shows the DataXceiver threads, which handle the DataNode's requests, in the BLOCKED state:
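To make the contention concrete, here is a minimal, self-contained Java simulation of the pattern described above. The CoarseLockDemo and Dataset names and the 5-second "disk work" are hypothetical illustrations, not Hadoop code: one writer thread holds a single coarse object lock while doing slow work, and a heartbeat-style thread that needs the same lock just to read a statistic is blocked for the whole duration.

```java
// Hypothetical, minimal simulation of the blocking pattern (not Hadoop source code).
public class CoarseLockDemo {

    // Stand-in for FsDatasetImpl: every operation synchronizes on the same object.
    static class Dataset {
        synchronized void createTemporary() throws InterruptedException {
            Thread.sleep(5_000); // simulated slow volume selection / disk scan under the lock
        }

        synchronized long getDfsUsed() {
            return 42L; // trivially fast, but must still wait for the object lock
        }
    }

    public static void main(String[] args) throws Exception {
        Dataset ds = new Dataset();

        // Writer thread: grabs the coarse lock and holds it for a long time.
        Thread writer = new Thread(() -> {
            try {
                ds.createTemporary();
            } catch (InterruptedException ignored) {
            }
        }, "writer-like-DataXceiver");

        // Heartbeat-style thread: only wants a statistic, but blocks on the same lock.
        Thread heartbeat = new Thread(() -> {
            long start = System.nanoTime();
            ds.getDfsUsed();
            long waitedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("heartbeat waited " + waitedMs + " ms for the dataset lock");
        }, "heartbeat-sender");

        writer.start();
        Thread.sleep(100); // let the writer acquire the lock first
        heartbeat.start();

        writer.join();
        heartbeat.join();
    }
}
```

Run as a plain Java program; the heartbeat thread reports a wait of roughly five seconds, which is exactly the kind of delay we saw between datanode heartbeats.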
The heartbeat-sending thread was blocked as well:
The heartbeat thread gets blocked mainly because, when a datanode sends a heartbeat to the namenode, it must first gather the node's resource statistics via getDfsUsed, getCapacity, getAvailable, getBlockPoolUsed and similar methods (see the FsDatasetImpl code):
These methods also fall within the scope of the FsDatasetImpl object lock, which is why the heartbeat thread is blocked. Take a look at the getDfsUsed source code:
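As a rough picture of what that code does, here is a simplified, hypothetical sketch (the FsDatasetLockSketch and Volume classes are mine, not the actual Hadoop 2.6 source): the statistics getters walk the volume list while holding the same dataset-wide object lock as the write path.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the 2.6-era pattern (not the verbatim Hadoop source):
// the heartbeat statistics are computed by walking every volume while holding
// the same dataset-wide object lock used by the write path.
class FsDatasetLockSketch {

    static class Volume {
        private final long dfsUsed;
        private final long capacity;

        Volume(long dfsUsed, long capacity) {
            this.dfsUsed = dfsUsed;
            this.capacity = capacity;
        }

        long getDfsUsed()   { return dfsUsed; }
        long getCapacity()  { return capacity; }
        long getAvailable() { return capacity - dfsUsed; }
    }

    private final List<Volume> volumes = new CopyOnWriteArrayList<>();

    // These getters run under the object lock, so whenever createTemporary/checkDirs
    // holds that lock, the heartbeat thread shows up as BLOCKED in the thread dump.
    synchronized long getDfsUsed()   { return volumes.stream().mapToLong(Volume::getDfsUsed).sum(); }
    synchronized long getCapacity()  { return volumes.stream().mapToLong(Volume::getCapacity).sum(); }
    synchronized long getAvailable() { return volumes.stream().mapToLong(Volume::getAvailable).sum(); }
}
```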
With this round of analysis, the cause of the failure is basically clear. A large number of files are written to HDFS simultaneously, a large number of DataXceiver threads and the heartbeat-sending thread get blocked, and heartbeats are occasionally delayed by tens of seconds. With so many DataXceiver threads blocked, they cannot serve each DFSClient's DataStreamer thread (which sends packets to the datanodes) or ResponseProcessor thread (which receives acks from the datanodes in the pipeline), and the datanodes' BlockReceiver threads cannot work properly either, so the client times out. Alternatively, when the DFSClient writes a packet and none of the datanodes in the pipeline can respond, the client starts pipeline recovery, but every datanode it tries is equally unable to provide service because its DataXceiver threads are blocked, so the client eventually reports "All datanodes are bad..." and the server side reports timeouts as well.
In other words, this is a serious bug in HDFS.
It is a bug in Hadoop 2.6: the code uses a very coarse-grained object lock (on FsDatasetImpl), which causes severe lock contention during large-scale write operations. The bug exists in versions 2.5 and 2.6 (our new cluster runs 2.6) and has been fixed in 2.6.1 and 2.7.0. The official patch information:
https://issues.apache.org/jira/browse/HDFS-7489
https://issues.apache.org/jira/browse/HDFS-7999
In essence, the fix breaks this coarse-grained object lock into several finer-grained locks and decouples the heartbeat that the datanode sends to the namenode from that lock.
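As a rough illustration of that direction (a hedged sketch of the general idea only, not the actual HDFS-7489/HDFS-7999 patches; the statsLock field and pickVolumeSlowly helper are hypothetical), the heartbeat statistics can be guarded by their own small lock while the expensive disk work stays out of the dataset-wide critical section:

```java
// Hedged, illustrative sketch of the fix direction (not the actual HDFS-7489 /
// HDFS-7999 patches): heartbeat statistics get their own small lock, and the
// expensive disk work is kept out of the dataset-wide critical section.
class FsDatasetAfterFixSketch {

    private final Object statsLock = new Object(); // dedicated lock for heartbeat stats
    private long dfsUsed;

    long getDfsUsed() {
        // Heartbeat path: contends only on the tiny statsLock, never on the write path.
        synchronized (statsLock) {
            return dfsUsed;
        }
    }

    void createTemporary() {
        // Write path: do the slow volume selection / disk I/O outside any lock...
        long bytesWritten = pickVolumeSlowly();

        // ...then take locks only for the short in-memory bookkeeping.
        synchronized (this) {
            // update the replica map, etc.
        }
        synchronized (statsLock) {
            dfsUsed += bytesWritten;
        }
    }

    private long pickVolumeSlowly() {
        // placeholder for the expensive disk work that used to run under the big lock
        return 1L;
    }
}
```

With this split, a long-running write or disk check no longer delays the heartbeat, which only ever touches the cheap statsLock.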
To further confirm that this is a Hadoop 2.6 bug, I upgraded the test cluster to 2.7.1 (a version containing the fix) and, while writing multiple batches of files at large scale, compared the blocking behavior and heartbeat intervals of the heartbeat thread and the DataXceiver threads. Here is how Hadoop 2.7.1 performed:
After the upgrade to Hadoop 2.7.1, large-scale writes of multiple batches of files to HDFS produced no client-side timeouts or "All datanodes are bad..." exceptions, and no timeout exceptions on the server side. Comparing this with the Hadoop 2.6 charts shown above confirms that the bug is resolved in 2.7.1.
III. Fault Handling
The impact of this failure on our existing business is roughly as follows:
A. Data being written to HDFS through Storm at the moment the anomaly occurs is affected.
B. If a job submission happens to coincide with the HDFS anomaly, the job's attachment files cannot be uploaded to HDFS and the submission fails.
C. If the HDFS anomaly lasts long enough, an MR job's fault-tolerance retries may be triggered more than 3 times, causing the job to fail.
Specific solution: a rolling upgrade to Hadoop 2.7.1 with no downtime.
For the specific upgrade steps, follow http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html.