This article explains how to solve two Hadoop configuration problems encountered when querying HDFS data with Hive. The walkthrough is detailed and should be a useful reference; interested readers are encouraged to read on.
When querying HDFS data with Hive, a simple statement such as select * from logs where time = '2014-10-16' caused Hive to report an error with the following message:
Error: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
    ... 11 more
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
    ... 15 more
A search online suggests the likely cause: when Hadoop runs the MapReduce job, multiple tasks read HDFS through the same cached FileSystem instance. If one of them closes that instance (because of a network problem or some other reason) while the others are still using the cached handle, this IOException is triggered. Two solutions are suggested (see http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201207.mbox/%3CCAL=yAAE1mM-JRb=eJGkAtxWQ7AJ3e7WJCT9BhgWq7XDTNxrwfw@mail.gmail.com%3E):
1. Turn off JVM reuse (not tried here).
2. Turn off HDFS FileSystem caching by adding the following property to the Hadoop configuration file core-site.xml (a short sketch of the caching behaviour follows the snippet):
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
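To illustrate why disabling the cache helps, here is a minimal Java sketch (not from the original article; the namenode URI is a placeholder) of the default FileSystem caching behaviour and of the effect of fs.hdfs.impl.disable.cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI nn = new URI("hdfs://namenode:8020");   // placeholder namenode URI

        // By default FileSystem.get() returns a cached, shared instance:
        FileSystem fs1 = FileSystem.get(nn, conf);
        FileSystem fs2 = FileSystem.get(nn, conf);
        System.out.println(fs1 == fs2);              // true: same cached object

        fs1.close();
        // Any later read through fs2 now fails with
        // "java.io.IOException: Filesystem closed", as in the stack trace above.

        // With caching disabled, every get() returns an independent instance:
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);
        FileSystem fs3 = FileSystem.get(nn, conf);
        FileSystem fs4 = FileSystem.get(nn, conf);
        System.out.println(fs3 == fs4);              // false: separate objects
    }
}

In the failing job, the record readers shared one cached instance, so one close broke the others; with the cache disabled, each reader gets its own connection at the cost of a little extra overhead.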
Restart the Hadoop cluster and rerun the SQL statement. The IOException is gone, but a new problem appears:
Container [pid=820,containerID=container_1413449189065_0001_01_000008] is running beyond virtual memory limits. Current usage: 182.7 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1413449189065_0001_01_000008:
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 820 319 820 820 (bash) 0 1 9695232 297 /bin/bash -c /opt/tools/jdk1.8.0_05/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/opt/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1413449189065_0001/container_1413449189065_0001_01_000008/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 172.17.0.10 38988 attempt_1413449189065_0001_m_000001_3 8 1>/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008/stdout 2>/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008/stderr
    |- 826 820 820 820 (java) 2007 146 2414481408 46477 /opt/tools/jdk1.8.0_05/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/opt/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1413449189065_0001/container_1413449189065_0001_01_000008/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 172.17.0.10 38988 attempt_1413449189065_0001_m_000001_3 8
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 6.02 sec   HDFS Read: 282824   HDFS Write: 8022   FAIL
Total MapReduce CPU Time Spent: 6 seconds 20 msec
This error means the container exceeded its virtual memory limit even though it used only about 180 MB of physical memory; the 2.1 GB virtual limit is the container's 1 GB of physical memory multiplied by the default yarn.nodemanager.vmem-pmem-ratio of 2.1. The fix used here is to turn off the virtual (and physical) memory checks in the configuration file yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
Restart Hadoop and the problem is resolved. A quick way to double-check the effective settings is sketched below.
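As a sanity check (not part of the original fix), a small Java program can print the memory-enforcement settings that a Configuration built from the local yarn-site.xml resolves to; run it on the NodeManager host, and note that the fallback values passed to the getters are simply the stock YARN defaults:

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintMemChecks {
    public static void main(String[] args) {
        // Reads yarn-default.xml plus the yarn-site.xml found on this node's classpath.
        YarnConfiguration conf = new YarnConfiguration();

        // Defaults if the properties are absent: both checks on, ratio 2.1.
        System.out.println("vmem check enabled: "
                + conf.getBoolean("yarn.nodemanager.vmem-check-enabled", true));
        System.out.println("pmem check enabled: "
                + conf.getBoolean("yarn.nodemanager.pmem-check-enabled", true));
        System.out.println("vmem/pmem ratio:    "
                + conf.getFloat("yarn.nodemanager.vmem-pmem-ratio", 2.1f));
    }
}

A gentler alternative to switching the checks off entirely would be to raise yarn.nodemanager.vmem-pmem-ratio so the 1 GB container gets more virtual headroom, but the approach above of disabling the checks also works.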
That is all for "how to solve two Hadoop configuration problems encountered when Hive queries HDFS data". Thank you for reading! I hope the content shared here is helpful; for more related knowledge, feel free to follow the industry information channel.