New features of Hadoop3.x

2025-04-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Hadoop 3.x Overview

Hadoop 3.x brings many enhancements and improvements over Hadoop 2.x. Note that Hadoop 3.x no longer supports JDK 1.7 and requires JDK 1.8 or later. Hadoop 2.0 was based on JDK 1.7, which stopped receiving updates in April 2015, so the Hadoop community released a new version based on JDK 1.8, which is Hadoop 3.x. Hadoop 3.x also adjusts the architecture so that MapReduce processes data using memory, IO, and disk together.

Hadoop 3.x introduces a number of important features and optimizations, including HDFS erasure coding, multiple NameNode support, MR Native Task optimization, YARN cgroup-based memory and disk IO isolation, YARN container resizing, and more.

The official Hadoop 3.x documentation is available at:

http://hadoop.apache.org/docs/r3.0.1/

Hadoop Common improvements in Hadoop 3.x

Hadoop Common improvements:

Streamlining the Hadoop kernel: removing outdated APIs and implementations, and replacing default component implementations with the most efficient ones (for example, changing the default FileOutputCommitter implementation to v2, retiring hftp in favor of webhdfs, and removing Hadoop's own serialization library, org.apache.hadoop.Records).

Classpath isolation, to prevent conflicts between different versions of jar packages. For example, when Hadoop, HBase, and Spark are used together, conflicting versions of Google Guava are a common problem. (https://issues.apache.org/jira/browse/HADOOP-11656)

Shell script refactoring. Hadoop 3.0 refactored Hadoop's administrative scripts, fixing a number of bugs, adding new features, and adding support for dynamic commands. Usage remains consistent with previous versions. (https://issues.apache.org/jira/browse/HADOOP-9902)

HDFS improvements in Hadoop 3.x

The biggest change in Hadoop 3.x is in HDFS. Computation follows the data-locality principle: work is scheduled on the nearest block, the local block is loaded into memory and computed first, intermediate data is exchanged through IO and a shared in-memory computation area, and the final result is then formed quickly.

HDFS supports erasure coding of data, which allows HDFS to save roughly half the storage space compared with three-way replication without reducing reliability. (https://issues.apache.org/jira/browse/HDFS-7285)

Multi-NameNode support, that is, support for deploying one active and multiple standby NameNodes in a cluster. Note: support for multiple ResourceManagers was already available in Hadoop 2.0. (https://issues.apache.org/jira/browse/HDFS-6440)
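The "half the storage space" figure can be checked with simple arithmetic. The sketch below is only an illustration: it assumes a Reed-Solomon layout with 6 data units and 3 parity units (the layout described in the HDFS erasure coding documentation) and compares it with three-way replication. The function names are hypothetical and are not part of any Hadoop API.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage beyond the logical data, as a fraction of the data size."""
    return float(replicas - 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage for a striped layout with the given data/parity units."""
    return parity_units / data_units


def raw_storage(logical_size: float, overhead: float) -> float:
    """Raw capacity consumed to store logical_size of user data."""
    return logical_size * (1 + overhead)


if __name__ == "__main__":
    logical_tb = 100  # illustrative data set size, in TB

    # Assumption: three-way replication versus a 6 data + 3 parity layout.
    replicated = raw_storage(logical_tb, replication_overhead(3))
    encoded = raw_storage(logical_tb, erasure_coding_overhead(6, 3))

    print(f"3x replication: {replicated} TB raw")
    print(f"RS(6,3) coding: {encoded} TB raw")
    print(f"saving: {1 - encoded / replicated:.0%}")
```

Under these assumptions the erasure-coded layout tolerates the loss of any three units per stripe, while three-way replication tolerates the loss of two replicas, which is why the saving does not come at the cost of reliability.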

Official documentation on these two features is available at:

http://hadoop.apache.org/docs/r3.0.1/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html

http://hadoop.apache.org/docs/r3.0.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

YARN improvements in Hadoop 3.x

Memory isolation and disk IO isolation based on cgroups (https://issues.apache.org/jira/browse/YARN-2619)

RM leader election with Curator (https://issues.apache.org/jira/browse/YARN-4438)

Container resizing (https://issues.apache.org/jira/browse/YARN-1197)

Timeline Server next generation (https://issues.apache.org/jira/browse/YARN-2928)

Official documentation:

http://hadoop.apache.org/docs/r3.0.1/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html

MapReduce improvements in Hadoop 3.x

Native Task optimization. A C/C++ map output collector implementation (including Spill, Sort, IFile, and so on) has been added to MapReduce and can be enabled by adjusting job-level parameters. For shuffle-intensive applications, performance can improve by about 30%. (https://issues.apache.org/jira/browse/MAPREDUCE-2841)

Automatic inference of MapReduce memory parameters. In Hadoop 2.0, setting memory parameters for MapReduce jobs is cumbersome and involves two parameters: mapreduce.{map,reduce}.memory.mb and mapreduce.{map,reduce}.java.opts. If they are set inconsistently, memory is badly wasted. For example, if the former is set to 4096 MB but the latter is "-Xmx2g", the remaining 2 GB cannot actually be used by the Java heap. (https://issues.apache.org/jira/browse/MAPREDUCE-5785)

Other new features in Hadoop 3.x

New hadoop-client-api and hadoop-client-runtime components have been added, each packaged as a separate jar, to resolve dependency incompatibility issues. (https://issues.apache.org/jira/browse/HADOOP-11804)

Support for the Microsoft Azure distributed file system and the Alibaba Aliyun distributed file system.
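Returning to the MapReduce memory parameters discussed above, the following sketch illustrates how one of the two values might be derived from the other when only one is set. It is not Hadoop's implementation: the 0.8 heap ratio is assumed to match the default of mapreduce.job.heap.memory-mb.ratio, the rounding rules are assumptions, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: fraction of the container given to the Java heap.
HEAP_RATIO = 0.8


def parse_xmx_mb(java_opts: str) -> Optional[int]:
    """Extract the -Xmx value from a JVM options string, in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "m":
        return value
    if unit == "k":
        return value // 1024
    return value // (1024 * 1024)  # no suffix means bytes


def infer_memory(
    memory_mb: Optional[int],
    java_opts: Optional[str],
    ratio: float = HEAP_RATIO,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (container_mb, heap_mb), filling in whichever one is missing."""
    heap_mb = parse_xmx_mb(java_opts) if java_opts else None
    if memory_mb is not None and heap_mb is None:
        # Only the container size is known: derive the heap from it.
        heap_mb = int(memory_mb * ratio)
    elif memory_mb is None and heap_mb is not None:
        # Only the heap is known: size the container to leave headroom.
        memory_mb = round(heap_mb / ratio)
    return memory_mb, heap_mb


if __name__ == "__main__":
    print(infer_memory(4096, None))      # container set, heap inferred
    print(infer_memory(None, "-Xmx2g"))  # heap set, container inferred
    print(infer_memory(4096, "-Xmx2g"))  # both set: the mismatch from the text
```

When both values are supplied, the sketch leaves them untouched, so the 4096 MB / "-Xmx2g" case still shows the 2 GB gap that automatic inference is meant to avoid.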
