
How to Deal with HDFS Problems


This article summarizes a series of HDFS problems encountered in day-to-day operation, together with simple and practical methods for handling each of them.

1. Periodic block scans cause the datanode heartbeat to time out and the node to leave the cluster

HDFS has a directory scanning mechanism: by default, every block is scanned once every 6 hours to check whether it is consistent with the in-memory blockMap. Reference:

https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
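These scans are governed by datanode-side knobs in hdfs-site.xml. A minimal sketch of the relevant properties (values are illustrative; the throttle property only exists in newer 2.x/3.x releases, so check your version):

dfs.datanode.directoryscan.interval=21600      # seconds; 21600 is the default 6-hour full-scan period
dfs.datanode.directoryscan.threads=1           # scanner threads per datanode
dfs.datanode.directoryscan.throttle.limit.ms.per.sec=500   # ms of scanner work allowed per second; 1000 (the default) means no throttling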

In the case of a large number of small files, the symptom during a scan is obvious: disk iops are very high while throughput is very low. The scan itself, however, is not what causes the datanode heartbeat to time out; the real cause is the handling of the scan results. For example, if 20,000 inconsistent blocks are found after the comparison, the lock on the FsDatasetImpl object is held continuously while these blocks are repaired. On a slow disk this can take 5 or even 10 minutes, blocking the read, write and heartbeat threads the whole time.

You can learn more in HDFS-14476 (lock too long when fix inconsistent blocks between disk and in-memory), which records the symptoms, the evidence and the block repair logic.

Our solution was to add a patch (for 2.10 and 3.x): while handling the abnormal blocks, take a 2-second break so that normal requests can be processed and the datanode does not get stuck or even go offline.

The effect of the fix is obvious as well: the datanode heartbeat becomes much smoother.

2. After namenode migration and decommissioning, the client cannot write

The method for migrating a namenode online is summarized in our article on non-stop-service migration and decommissioning of HDFS namenode nodes. The idea is to keep the namenode hostname unchanged and migrate the standby in a rolling fashion.

However, in our migration practice we found that after the namenode migration completed the cluster itself was normal, but hdfs client access was not. In long-running task scenarios such as yarn, file reads and writes keep failing until the yarn nodemanager is restarted.

The specific problems are as follows:

The client uses ConfiguredFailoverProxyProvider. When the client starts, it creates namenode proxies for nn1 and nn2 based on the inet socket addresses resolved at that moment, and these proxies are never recreated, whatever network anomaly occurs later.

The client's updateAddress method can detect a change of the namenode ip. Had the exception been caught there, the next cycle would simply use the correct namenode ip; instead the exception propagates and the client reconnects to the namenode. But the namenode proxy still holds the old address, so SetupConnection fails again, enters the updateAddress check again, returns true, tries to establish the connection again, and the client loops forever.

Reproduction steps (a consolidated shell sketch follows the list):

Open an hdfs client and start a long-running write with hdfs put

Update the hostname-to-ip mapping of the new hdfs namenode

Stop old nn2, start new nn2

Update the client's namenode hostname-to-ip mapping (the client is still writing the file)

Switch to the new namenode: hdfs haadmin -failover nn1 nn2

At this point you will find that the client keeps reporting errors.

For as long as that yarn client process lives, even writing a new file still fails.
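A minimal shell sketch of the reproduction above; hostnames, file paths and the hosts-file edit are illustrative assumptions:

# 1. start a long-running write from a client machine
hdfs dfs -put /data/bigfile.dat /tmp/bigfile.dat &

# 2. point the namenode hostname at the new machine's ip (assuming /etc/hosts is used for resolution)
echo "10.0.0.22  nn2" >> /etc/hosts

# 3. stop the namenode on the old nn2 machine, start it on the new nn2 machine
hadoop-daemon.sh stop namenode    # on the old nn2
hadoop-daemon.sh start namenode   # on the new nn2

# 4. fail over so the relocated namenode becomes active, then watch the client start erroring
hdfs haadmin -failover nn1 nn2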

We applied a patch to ConfiguredFailoverProxyProvider so that after a client failover it also runs the updateAddress check and calls createProxy again if the ip has changed. We verified that this patch works too. Catching the exception uniformly on the client side is better, however, because other HaProvider implementations may have the same problem.

The patch for this problem has been merged into Apache Hadoop 3.4; see HADOOP-17068 (client fails forever when namenode ipaddr changed). The version we use, 2.6.0-cdh6.4.11, has also integrated it.

Besides fixing the root cause, during a namenode migration you can also enable port forwarding on the old node and then restart yarn node by node, to avoid triggering a large-scale failure.

3. File writes fail because of datanode imbalance in the cluster

Phenomenon: when the cluster was close to full, a batch of machines was added to relieve the space pressure. After running for 2 weeks, clients suddenly started reporting file write failures.

Reason: when some of the datanodes in hdfs are full, block placement should in theory automatically pick other nodes with free space. Because dfs.datanode.du.reserved was configured improperly, the full nodes were still being selected. Specifically, if dfs.datanode.du.reserved is smaller than the filesystem's reserved blocks on the partition, the following appears once the disk fills up:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /kafka/xxxtmp.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 14 datanode(s) running and no node(s) are excluded in this operation.

Resolution:

After the expansion, run rebalance.

Modify the filesystem's reserved blocks on the disk partition so that they are smaller than dfs.datanode.du.reserved (see the sketch after this list and the article "hdfs datanode Non DFS Used and Remaining").

Add a capacity alarm for individual datanodes
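A minimal sketch of the check, assuming an ext4 data partition on /dev/sdb1 and illustrative numbers:

# filesystem-level reserve on the data partition ("Reserved block count" x "Block size")
tune2fs -l /dev/sdb1 | egrep "Reserved block count|Block size"

# datanode-level reserve in hdfs-site.xml, in bytes; keep it larger than the filesystem reserve
#   dfs.datanode.du.reserved=10737418240    (10 GB, illustrative)

# confirm what the datanodes actually report afterwards
hdfs dfsadmin -report | egrep "Name:|DFS Remaining|Non DFS Used"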

4. Rebalance runs very slowly

Start rebalancing with ./start-balancer.sh -threshold 10. If you need more speed, you can raise the bandwidth limit: hdfs dfsadmin -setBalancerBandwidth 52428800.

However, the number of block moves a datanode will accept concurrently cannot be raised online (it can only be lowered). To raise it, adjust the default balancer parameters in hdfs-site.xml and restart:

dfs.balancer.moverThreads=1000
dfs.balancer.dispatcherThreads=200
dfs.datanode.balance.max.concurrent.moves=50

If you start the balancer with a higher concurrency than the datanodes allow, the datanode decides it does not have enough threads to receive blocks and reports: IOException: Got error, status message Not able to copy block ... because threads quota is exceeded.

When moves fail, the migration speed drops drastically, because by default a failed block move sleeps for a while before retrying.

./start-balancer.sh -threshold 5 \
  -Ddfs.datanode.balance.max.concurrent.moves=20 \
  -Ddfs.datanode.balance.bandwidthPerSec=150000000 \
  -Ddfs.balancer.moverThreads=500 \
  -Ddfs.balancer.dispatcherThreads=100

5. Add disks to datanode online

On Tencent Cloud, a new disk can be attached directly to an existing datanode machine to quickly expand hdfs capacity.

Disks can be added without restarting the datanode (provided dfs.datanode.fsdataset.volume.choosing.policy is set to AvailableSpaceVolumeChoosingPolicy).

After mounting the disk, first create the hadoop data directory and fix its permissions

Add the new directory to dfs.datanode.data.dir in hdfs-site.xml

Use the reconfig command to apply it without a restart: hdfs dfsadmin -reconfig datanode dn-x-x-x-x:50020 start. A consolidated sketch of the whole procedure follows.
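A minimal sketch, assuming the new disk is /dev/vdc, the new directory is /data2/hadoopdata and the datanode runs as user hdfs:

# format and mount the new disk, then create the data directory (device, paths and user are assumptions)
mkfs.ext4 /dev/vdc
mkdir -p /data2 && mount /dev/vdc /data2
mkdir -p /data2/hadoopdata && chown -R hdfs:hdfs /data2/hadoopdata

# append /data2/hadoopdata to dfs.datanode.data.dir in hdfs-site.xml on this datanode, then:
hdfs dfsadmin -reconfig datanode dn-x-x-x-x:50020 start
hdfs dfsadmin -reconfig datanode dn-x-x-x-x:50020 status   # check that the reconfiguration has finished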

6. Namenode HA is configured, but failover does not succeed when a failure occurs

Symptom: the active namenode had a memory failure, and the active/standby switchover failed

Reason: dfs.ha.fencing.methods is set to the ssh method, but zkfc could not log in to the other namenode to execute the fence

Solution: generate an ssh key and set up passwordless login, or change the method to shell(/bin/true) and force the switch. Note that after changing the fencing method you need to restart zkfc.
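A minimal sketch of both options, assuming the namenodes are nn1/nn2 and the hdfs service user runs zkfc:

# option 1: passwordless ssh from the zkfc user on each namenode to the other one
sudo -u hdfs ssh-keygen -t rsa -N "" -f /var/lib/hadoop-hdfs/.ssh/id_rsa   # key path is an assumption
sudo -u hdfs ssh-copy-id hdfs@nn2
sudo -u hdfs ssh hdfs@nn2 hostname      # must succeed without a password prompt

# option 2: fall back to shell fencing in hdfs-site.xml (always reports success), then restart zkfc
#   dfs.ha.fencing.methods = shell(/bin/true)
hadoop-daemon.sh stop zkfc && hadoop-daemon.sh start zkfc   # on both namenodes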

7. HDFS client input/output error

Symptom: hdfs client commands fail with input/output error; trying to copy the hadoop/jdk installation directory also reveals corrupted files, and sometimes the jvm dumps core.

Cause: the disk has bad blocks, and a jar library of hdfs or the jdk happens to sit on them. Watching the system messages log, we found sda IO Input/Output Error entries.

Using badblocks -s -v -o bb.log /dev/sda you can see which sectors of the disk are damaged.

Solution: copy an intact copy of the installation from another machine.

8. HDFS mistook the system disk for a data disk

The system disk was mistakenly configured in dfs.datanode.data.dir, and after running for a while this partition tends to fill up first.

This is a configuration problem. Once you understand how the datanode works, you can quickly move the blocks on this partition to the correct disk partition.

The way to do it is to stop the datanode, copy the blocks under /data to another partition, and remove /data from the configuration. Because the datanode reports the locations of its blocks to the namenode on every start, a plain physical copy is enough.

You can use the copy command cp -a /data/hadoopdata/current/BP-*-*/current/finalized/* /data1/hadoopdata/current/BP-*-*/current/finalized/, but you cannot copy the entire hadoopdata directory, because the storageID in the VERSION file differs.
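A minimal sketch of the whole move, assuming the wrongly used volume is /data and the correct one is /data1:

# stop the datanode before touching block files
hadoop-daemon.sh stop datanode

# copy only the finalized block files; do not copy the whole hadoopdata directory (VERSION/storageID differs)
cp -a /data/hadoopdata/current/BP-*-*/current/finalized/* /data1/hadoopdata/current/BP-*-*/current/finalized/

# remove /data/hadoopdata from dfs.datanode.data.dir in hdfs-site.xml, then restart;
# the datanode re-reports its block locations to the namenode on startup
hadoop-daemon.sh start datanode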

9. Client read/write exceptions while retiring datanodes with the decommission method

Phenomenon: datanodes were added to the exclude file and retired through normal decommissioning. The application layer reported that some spark tasks failed with Unable to close file because the last block does not have enough number of replicas, while other file read/write tasks in the cluster were normal.
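For reference, a minimal sketch of that decommission flow, assuming dfs.hosts.exclude points at /etc/hadoop/conf/dfs.exclude:

# add the datanodes to be retired to the exclude file referenced by dfs.hosts.exclude
echo "dn-9-4-xxx-yy" >> /etc/hadoop/conf/dfs.exclude

# tell the namenode to re-read the include/exclude files and start decommissioning
hdfs dfsadmin -refreshNodes

# watch progress; the nodes stay in "Decommission in progress" until their blocks are re-replicated
hdfs dfsadmin -report | grep "Decommission Status"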

Reason: the spark tasks frequently create and delete application directories. During decommissioning, some nodes with poor disk performance get even busier disks, which shows up as long heartbeat intervals in last contact.

Solution: we verified that simply killing the datanode processes directly does not affect the spark tasks. But they must be killed one at a time, otherwise missing blocks will appear. (This may not be the best way to solve the problem, but it does work.)

10. Namenode editlog has not been checkpointed for a long time

One of the standby namenode's jobs is to periodically merge the editlog pulled from the journalnodes into a new fsimage, which it then pushes to the active namenode.

When the standby namenode is abnormal, for example the process exits or hits a software bug (we have encountered IOException: No image directories available!), the editlog goes unmerged for a long time. Once you then need to switch over or restart a namenode, startup can take far too long, and in serious cases merging the accumulated editlog needs more memory than the namenode has, so it cannot start at all.

If memory runs out, one solution is to merge the editlog on a temporary machine with plenty of memory (the steps are consolidated in the sketch after this list):

Stop the standby and copy the hdfs software and configuration files to the high-memory machine

Copy the latest usable fsimage_xxx from the dfs.namenode.name.dir directory together with all edits_xxx-xxx segments after it

Start the namenode process on the temporary machine; it automatically loads the fsimage from the corresponding directory and merges the editlog
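A minimal sketch of the procedure, assuming dfs.namenode.name.dir is /data/nn and the temporary machine is called tmp-nn (both illustrative):

# on the broken standby: stop the namenode and ship software, config and metadata over
hadoop-daemon.sh stop namenode
rsync -a /opt/hadoop/ tmp-nn:/opt/hadoop/                                          # software + conf; path is an assumption
rsync -a /data/nn/current/fsimage_0000000000012345678* tmp-nn:/data/nn/current/    # latest usable fsimage (plus its .md5)
rsync -a /data/nn/current/edits_* tmp-nn:/data/nn/current/                         # all edit segments after that fsimage

# on tmp-nn (with a large heap): start the namenode; it replays the editlog and writes a new fsimage,
# which can then be copied back to the real namenode's name.dir
hadoop-daemon.sh start namenode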

Prevention is more important than remedy: be sure to monitor TransactionsSinceLastCheckpoint on the namenode. Our alarm threshold is 5000000.

11. HDFS 3.x datanode has a large number of CLOSE-WAIT

This problem, HDFS-15402, occurs when the datanode jmx endpoint http://127.0.0.1:50075/jmx is probed regularly; five of our hadoop 3.1.3 clusters hit it, while hadoop 2.x is fine.

The effect of too many close-wait connections on port 50075 is that normal webhdfs requests start returning 504 Gateway Timeout.

[root@dn-9-4-xxx-yy /tmp]# ss -ant | grep :50075 | grep CLOSE-WAIT | wc -l
16464
[root@dn-9-4-xxx-yy /tmp]# ss -ant | grep :50075 | grep CLOSE-WAIT | head -3
CLOSE-WAIT 123 0 9.4.xxx.yy:50075 9.4.xxx.yy:39706
CLOSE-WAIT 123 0 9.4.xxx.yy:50075 9.4.xxx.yy:51710
CLOSE-WAIT 123 0 9.4.xxx.yy:50075 9.4.xxx.yy:47475

lsof -i:39706
COMMAND    PID USER  FD   TYPE    DEVICE SIZE/OFF NODE NAME
java    134304 hdfs 307u  IPv4 429yy7315      0t0  TCP dn-9-4-xxx-yy:50075->dn-9-4-xxx-yy:39706 (CLOSE_WAIT)

Proto Recv-Q Send-Q Local Address          Foreign Address        State       PID/Program name
tcp      123      0 9.4.xxx.yy:50075       9.4.xxx.yy:39706       CLOSE_WAIT  134304/java

CLOSE-WAIT means the client (curl) has initiated closing of the tcp connection and the server (datanode) has received and acknowledged the FIN, but never finishes closing its own socket. The normal sequence is for the server to finish closing the socket and then send its own FIN to the client.

So the problem lies in the datanode server and has nothing to do with the knox or haproxy client. Tuning os kernel parameters does not help; unless the datanode is killed, the close-wait connections stay forever. With the kill_close_wait_connections.pl script found on the Internet you can clean up these close-wait connections, after which webhdfs requests recover.

At present the workaround is to stop probing datanode jmx for monitoring and to obtain hdfs metrics only from the namenode; on the datanodes only os-level metrics are collected.
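A minimal sketch of pulling the metrics from the namenode jmx endpoint instead (the host is an assumption; qry is the standard Hadoop jmx filter parameter):

# MissingBlocks, TransactionsSinceLastCheckpoint and friends live under FSNamesystem
curl -s "http://nn1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"

# datanode liveness can also be read from the namenode instead of probing every datanode
curl -s "http://nn1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState" | grep -E "NumLiveDataNodes|NumDeadDataNodes"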

12. Knox cannot upload 8G files

We raised this issue in the official jira KNOX-2139. When uploading a file of exactly 8589934592 bytes through knox with webhdfs, curl fails with (55) Send failure: Broken pipe and only an empty file appears in hdfs. It reproduces every time on knox 1.1 and 1.2, and is fine on 0.8.
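For context, a minimal sketch of the kind of upload involved; the knox host, topology name and credentials are assumptions:

# upload through the knox gateway; knox proxies the webhdfs CREATE redirect internally
curl -k -u hdfs_user -L -X PUT -T /data/8g.bin "https://knox-host:8443/gateway/default/webhdfs/v1/tmp/8g.bin?op=CREATE&overwrite=true"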

Debugging the code briefly, knox reads a request contentLength of -1 for this upload instead of a normal value (0 or the actual 8 GB length).

We later replaced knox with haproxy, which solved both knox's slow upload speed and this 8 GB file problem. Our implementation is described in "Upload optimization in the backup system: from knox to haproxy".

However, in the latest version, 1.4, the 8 GB problem has disappeared again. According to the official reply, it may be related to the jetty upgrade.

13. Unable to load native-hadoop library for your platform

Unable to load native-hadoop library for your platform... Using builtin-java classes

This warning often shows up when running hdfs client commands; it is in fact a cliché.

To put it simply, the system did not find the native hadoop libraries (libhadoop.so / libhdfs.so), which are written in C and perform better. Their absence does not affect usage, because hadoop falls back to its pure-java implementation.

I have summed up three causes:

Cause 1: the native libraries are not included in the hadoop installation package.

This accounts for a large share of cases. Check the directory ${HADOOP_HOME}/lib/native/ to see whether libhdfs.so, libhdfs.a, libhadoop.so and libhadoop.a are present. If not, build a complete binary package and copy its lib/native directory out for use.

A normal result looks like this:

./bin/hadoop checknative
20-05-14 20:13:39 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
20-05-14 20:13:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /data1/hadoop-hdfs/hadoop-dist/target/hadoop-2.6.0-cdh6.4.11-tendata/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /data1/hadoop-hdfs/hadoop-dist/target/hadoop-2.6.0-cdh6.4.11-tendata/lib/native/libsnappy.so.1
lz4:     true revision:10301
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so

If the libraries are missing, just compile them on your own os:

mvn clean package -Pdist,native -DskipTests -Dtar -Dbundle.snappy -Dsnappy.lib=/usr/local/lib

Cause 2: the .so files exist, but the path is incorrect.

In current versions the .so libraries are found under the default path. Most write-ups on the "Unable to load native-hadoop library for your platform" warning teach you how to set a library path, but a wrong path is rarely the real reason. The reliable answer is https://stackoverflow.com/a/30927689, which corresponds to our cause 3.

Cause 3: the compiled libraries depend on libraries that are incomplete on our os.

What we hit was a glibc version that is too old:

$ ldd lib/native/libhadoop.so
lib/native/libhadoop.so: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by lib/native/libhadoop.so)
        linux-vdso.so.1 => (0x00007ffd1db6d000)
        /$LIB/libonion.so => /lib64/libonion.so (0x00007f5bfd37d000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f5bfce40000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5bfcc23000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5bfc88f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5bfd266000)

$ strings /lib64/libc.so.6 | grep GLIBC_

You can see which versions of glibc are supported by the current system.

However, installing or upgrading glibc is risky, so be sure to test first if you want to move to version 2.14.

14. Dealing with missing blocks

A missing block in an hdfs cluster simply means the namenode still records the block's metadata but all replicas are gone. This can happen when several machines go down at the same time, or when disks on several machines are damaged.

I have encountered two cases of artificially produced missing blocks:

Killing a datanode process, after which missing blocks appeared

Setting the replication of all files to 1, then setting it back to 2 shortly afterwards

Both cases can be regarded as bugs, and in both the affected files really could not be fetched with get. The first case turned out to be fine: after going through the logs we found that the "missing" blocks had actually received delete commands, and after a while such missing blocks usually disappear on their own. The second case is a real accidental loss of blocks and is quite serious: do not casually set replication to 1, because changing it back may lose blocks.

If you confirm that these missing blocks can be discarded, you can handle them manually with the fsck command:

# if the number of missing blocks is not large, delete the affected files one by one
hdfs fsck file_name -delete

# if there are too many missing blocks, first pull the list of corrupt files from the namenode
hdfs fsck / -list-corruptfileblocks -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" > missing_blocks.txt

15. Alarms worth paying attention to

There are in fact many more problems, such as users' supergroup permissions and a missing rack-aware.sh file, which are not listed here for lack of space.

Problems will keep appearing, but if most scenarios are monitored in time you can catch them in advance. Here are the key alarm indicators we have sorted out and put online:

Datanode lastcontact

Monitors the datanode-to-namenode heartbeat. A long heartbeat interval means the datanode is not responding; by default, after 10m30s without a response the datanode is removed from the cluster.

Namenode and datanode web probe

Probe namenode port 50070 and datanode port 50075 from the outside; datanodes are added to or removed from the probe list automatically according to the addresses in the include file. We use a modified telegraf http_response plug-in that supports reading urls dynamically, e.g. exec bash get_datanode_urls.sh.

Directory max files

Alarm on the number of files in a single directory. By default, hdfs limits the maximum number of files in a single directory to 1 million, which is determined by the configuration item dfs.namenode.fs-limits.max-directory-items.

This indicator comes from our fsimage directory-profile analysis.

Transactions not merged

The number of editlog transactions the standby has not yet checkpointed. Going too long without a checkpoint makes the next namenode startup consume too much memory or even fail to start.

Missing blocks

Alarm when the number of missing blocks becomes abnormal

Test write file

On the two namenode machines, periodically write and read files with hdfs put/get, and alarm on failure (a minimal probe sketch follows).
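A minimal sketch of such a probe, with illustrative paths; the alerting side only needs to watch the exit code:

#!/bin/bash
# hdfs write/read canary; run from cron on both namenode machines
set -e
f=/tmp/hdfs_canary.$(hostname).$$
echo "canary $(date)" > /tmp/canary.local
hdfs dfs -put -f /tmp/canary.local "$f"   # write test
hdfs dfs -cat "$f" > /dev/null            # read test
hdfs dfs -rm -skipTrash "$f"              # clean up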

Non-active namenode

An hdfs cluster should have exactly one active namenode and exactly one standby; alarm on any other situation

Cluster capacity

Monitoring the overall capacity of the cluster

Node usage, ioutil

Warn on a single datanode's disk space usage, and warn when ioutil stays above 95% for 5 minutes.

Failover occurs

Alarm when an hdfs namenode failover occurs

Namenode heap size

Namenode heap size usage ratio. The more blocks, the more memory is used.

That concludes this walkthrough of how to deal with HDFS problems. Pairing the explanations above with hands-on practice is the best way to make them stick.
