Common problems in using Hadoop and their Solutions

This article looks at common problems encountered when using Hadoop and how to solve them. The fixes described here are simple, fast, and practical, so readers who are interested may wish to follow along.

1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. Answer:

The job needs to open many files for analysis. The system default is usually 1024 (check it with ulimit -a), which is enough for normal use but far too low for this program. Fix: modify two files.

Edit /etc/security/limits.conf (vi /etc/security/limits.conf) and add:

* soft nofile 102400
* hard nofile 409600

$ cd /etc/pam.d/
$ sudo vi login

Add the line: session required /lib/security/pam_limits.so
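As a quick check (a minimal sketch; the limit values are the ones used above, and pam_limits only applies them to new sessions), verify the new limits after logging out and back in:

# log out and back in so pam_limits picks up the new values, then:
ulimit -n                    # should now report the soft limit, 102400
ulimit -Hn                   # should report the hard limit, 409600
cat /proc/sys/fs/file-max    # kernel-wide ceiling, for reference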

2: Too many fetch-failures. Answer:

The main cause of this problem is incomplete connectivity between the nodes (see the example below).

1) Check /etc/hosts: the local IP must map to the server's hostname, and the file must list the IP + hostname of every server in the cluster.

2) Check .ssh/authorized_keys: it must contain the public keys of all servers (including the node itself).
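As an illustration (a minimal sketch with made-up hostnames and addresses), /etc/hosts on every node might look like this, and passwordless SSH can be spot-checked from each node:

# /etc/hosts on every node (example addresses and hostnames)
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2

# from each node, confirm passwordless SSH to every other node (and to itself)
ssh master hostname
ssh slave1 hostname
ssh slave2 hostname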

3: Processing is particularly slow; map finishes quickly, but reduce is slow and repeatedly falls back to reduce=0%. Answer: apply the checks from point 2, and then

modify conf/hadoop-env.sh to set export HADOOP_HEAPSIZE=4000
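For reference, a minimal sketch of that edit (4000 is the value used in this article; on Hadoop 1.x-style installations the setting is in megabytes):

# conf/hadoop-env.sh
# maximum heap, in MB, given to each Hadoop daemon (default is 1000)
export HADOOP_HEAPSIZE=4000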

4: The DataNode can be started, but it cannot be accessed and cannot be stopped. Answer:

When reformatting a distributed file system, you need to delete the local path configured as dfs.name.dir on the NameNode (where the NameNode stores its persistent namespace and transaction logs), and delete the local path configured as dfs.data.dir on each DataNode (where the DataNode stores its block data). With the configuration used here, that means:

delete /home/hadoop/NameData on the NameNode, and delete /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that when Hadoop formats a new distributed file system, each stored namespace is stamped with the version of its creation time (you can see the VERSION file under /home/hadoop/NameData/current, which records this version information). So before reformatting, it is best to delete the NameData directory first, and the dfs.data.dir on every DataNode must be deleted as well. Only then will the version information recorded by the NameNode and the DataNodes match.

Note: deletion is a very dangerous operation. Never delete anything without confirming it first, and make a backup of every file before you delete it!
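A minimal sketch of the whole procedure, assuming the paths above and a Hadoop 1.x-style layout with bin/stop-all.sh and bin/start-all.sh (back everything up first, as the note says):

# 1. stop the cluster
bin/stop-all.sh

# 2. on the NameNode: back up, then remove dfs.name.dir
tar czf ~/NameData-backup.tar.gz /home/hadoop/NameData
rm -rf /home/hadoop/NameData

# 3. on every DataNode: back up, then remove dfs.data.dir
tar czf ~/DataNode-backup.tar.gz /home/hadoop/DataNode1 /home/hadoop/DataNode2
rm -rf /home/hadoop/DataNode1 /home/hadoop/DataNode2

# 4. back on the NameNode: reformat and restart
bin/hadoop namenode -format
bin/start-all.sh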

5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log

When this happens, most of the DataNodes are down or have lost their connections.
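A quick way to confirm this (a minimal sketch using standard HDFS tooling; the path is the one from the error above):

# list live and dead DataNodes as the NameNode sees them
bin/hadoop dfsadmin -report

# check the affected file for missing or corrupt blocks
bin/hadoop fsck /user/hive/warehouse/src_20090724_log -files -blocks -locations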

6: java.lang.OutOfMemoryError: Java heap space

This exception is clearly caused by insufficient JVM memory, so you need to increase the JVM heap of every DataNode, e.g. -Xms1024m -Xmx4096m.

In general, the JVM's maximum heap should be about half of the machine's total memory. Our machines have 8 GB of RAM, so we set 4096m; this value may still not be optimal.
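One place to apply this (a sketch, assuming a Hadoop 1.x-style conf/hadoop-env.sh where per-daemon JVM options can be passed through HADOOP_DATANODE_OPTS):

# conf/hadoop-env.sh
# give every DataNode a 1 GB initial and 4 GB maximum heap
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"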

7: NameNode is in safe mode. Solution:

bin/hadoop dfsadmin -safemode leave
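Before forcing it to leave, it can be worth checking whether the NameNode is simply still starting up (a minimal sketch using the same dfsadmin tool):

# query the current safe mode state
bin/hadoop dfsadmin -safemode get

# or block until the NameNode leaves safe mode on its own
bin/hadoop dfsadmin -safemode wait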

8: java.net.NoRouteToHostException: No route to host. Solution:

sudo /etc/init.d/iptables stop
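If the firewall turns out to be the cause, it may also need to be disabled persistently (a sketch assuming a RHEL/CentOS-style init system, which is what /etc/init.d/iptables implies):

# check whether the firewall is running and what it blocks
sudo /etc/init.d/iptables status

# stop it now and keep it from coming back on reboot
sudo /etc/init.d/iptables stop
sudo chkconfig iptables off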

9: After changing the NameNode, running a select in Hive still points to the previous NameNode address. Cause: when you create a table, Hive actually stores the table's location (e.g. hdfs://ip:port/user/root/...) in the SDS and DBS tables of its metastore. So when you bring up a new cluster whose master has a new IP, Hive's metastore is still pointing at locations inside the old cluster. You could update the metastore with the new IP every time you bring up a cluster, but the easier and simpler solution is to use an elastic IP for the master.

In other words, replace all of the old NameNode addresses stored in the metastore with the current NameNode address.
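If you do need to rewrite the stored locations directly, a minimal sketch for a MySQL-backed metastore might look like the following (the database name, credentials, and both addresses are placeholders; the SDS.LOCATION and DBS.DB_LOCATION_URI columns are those of the usual Hive metastore schema; back up the metastore first):

# back up the metastore database before touching it (names and credentials are placeholders)
mysqldump -u hive -p hive_metastore > metastore-backup.sql

# rewrite old NameNode addresses to the new one in the location columns
mysql -u hive -p hive_metastore -e "
  UPDATE SDS SET LOCATION = REPLACE(LOCATION, 'hdfs://old-ip:9000', 'hdfs://new-ip:9000');
  UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://old-ip:9000', 'hdfs://new-ip:9000');
"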

At this point, I believe you have a deeper understanding of the common problems and solutions encountered when using Hadoop. You might as well try these fixes out in practice.
