What are the new features of the Hadoop 3.x version

2025-07-09 Update From: SLTechnology News&Howtos

This article walks through the main new features of the Hadoop 3.x version and what each one means for existing deployments.

Apache Hadoop 3.x

Apache Hadoop 3.x incorporates many significant improvements over the previous major release line (hadoop-2.x).

1. Minimum required Java version increased from Java 7 to Java 8

All Hadoop JARs are now compiled to target a Java 8 runtime. Users who are still on Java 7 or earlier must upgrade to Java 8.

2. Support for erasure coding in HDFS

Erasure coding is a method of storing data durably while saving a significant amount of space. Compared with the 3x overhead of standard HDFS replication, standard encodings such as Reed-Solomon (10,4) have a 1.4x space overhead.
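The 1.4x figure follows directly from the code's shape: a (10,4) code stores 10 data units plus 4 parity units for every 10 units of user data. A minimal sketch of that arithmetic (the function names are illustrative and are not part of any Hadoop API):

```python
def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Raw units stored per unit of user data for a (data, parity) erasure code."""
    return (data_units + parity_units) / data_units


def replication_overhead(replicas: int) -> float:
    """Raw units stored per unit of user data when every block is copied `replicas` times."""
    return float(replicas)


rs_10_4 = erasure_coding_overhead(10, 4)
three_way = replication_overhead(3)
print(f"Reed-Solomon (10,4): {rs_10_4:.1f}x, 3-way replication: {three_way:.1f}x")
```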

Because erasure coding adds overhead during reconstruction and reads are remote in most cases, it has traditionally been used to store colder, less frequently accessed data.

Take the network and CPU overhead of erasure coding into account when deploying this feature.

For a more detailed introduction to erasure coding in HDFS, see my earlier article, "An in-depth analysis of the new feature of HDFS 3.x: erasure codes".

3. Shell script rewrite

The Hadoop shell scripts have been rewritten to fix many long-standing bugs and to add some new features. Although the Hadoop developers aimed for compatibility, some changes may break existing installations.

4. Native task-level optimization in MapReduce

MapReduce adds support for a native implementation of the map output collector. For shuffle-intensive jobs, this can improve performance by 30% or more.

5. Support for more than two NameNodes

The initial implementation of HDFS high availability provided for one active NameNode and one standby NameNode. By replicating edits to a quorum of three JournalNodes, that architecture can tolerate the failure of any one node in the system.

However, some deployments require higher fault tolerance. Hadoop 3.x enables this by allowing users to run multiple standby NameNodes. For example, by configuring three NameNodes and five JournalNodes, the cluster can tolerate two node failures rather than just one.
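The node counts in this example can be checked with a little arithmetic. The sketch below assumes that edits must reach a majority of JournalNodes and that at least one NameNode must survive; the helper names are hypothetical and only illustrate the reasoning:

```python
def journalnode_failures_tolerated(journal_nodes: int) -> int:
    """Edits need a majority of JournalNodes, so only a minority may fail."""
    return (journal_nodes - 1) // 2


def namenode_failures_tolerated(name_nodes: int) -> int:
    """At least one NameNode must remain to take over as active."""
    return name_nodes - 1


def cluster_failures_tolerated(name_nodes: int, journal_nodes: int) -> int:
    """The weaker of the two limits bounds what the HA setup survives."""
    return min(
        namenode_failures_tolerated(name_nodes),
        journalnode_failures_tolerated(journal_nodes),
    )


for nn, jn in [(2, 3), (3, 5)]:
    print(f"{nn} NameNodes + {jn} JournalNodes tolerate "
          f"{cluster_failures_tolerated(nn, jn)} failure(s)")
```

Under these assumptions, adding a third NameNode without also growing the JournalNode quorum from three to five would leave the tolerance at one failure.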

6. Default ports of multiple services have changed

Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000). This meant that at startup, a service would sometimes fail to bind to its port because of a conflict with another application.

These conflicting ports have been moved out of the ephemeral range. The specific port changes are as follows:

NameNode ports: 50070 -> 9870, 8020 -> 9820, 50470 -> 9871 (the 8020 -> 9820 change applied to Hadoop 3.0.0 and was reverted in 3.0.1, so later releases use 8020 again)

Secondary NameNode ports: 50091 -> 9869, 50090 -> 9868

DataNode ports: 50020 -> 9867, 50010 -> 9866, 50475 -> 9865, 50075 -> 9864

Hadoop KMS port: 16000 -> 9600 (the HBase HMaster port and the Hadoop KMS port conflicted because both used 16000, so Hadoop KMS was changed to 9600).
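The port list above can be checked against the 32768-61000 range mentioned earlier. The following sketch simply encodes the mappings as listed in this article and tests each port; the dictionary and function names are made up for illustration:

```python
# Linux ephemeral (temporary) port range quoted in the article, inclusive.
EPHEMERAL_PORTS = range(32768, 61001)

# Old default port -> new default port, as listed in this article.
PORT_CHANGES = {
    "NameNode": {50070: 9870, 8020: 9820, 50470: 9871},
    "Secondary NameNode": {50091: 9869, 50090: 9868},
    "DataNode": {50020: 9867, 50010: 9866, 50475: 9865, 50075: 9864},
    "Hadoop KMS": {16000: 9600},
}


def old_ports_in_ephemeral_range(changes):
    """Old defaults that could collide with ports the OS hands out dynamically."""
    return sorted(
        old for ports in changes.values() for old in ports if old in EPHEMERAL_PORTS
    )


def new_ports_in_ephemeral_range(changes):
    """New defaults that still sit inside the ephemeral range (expected: none)."""
    return sorted(
        new for ports in changes.values() for new in ports.values()
        if new in EPHEMERAL_PORTS
    )


print("old ports inside the range:", old_ports_in_ephemeral_range(PORT_CHANGES))
print("new ports inside the range:", new_ports_in_ephemeral_range(PORT_CHANGES))
```

Note that 8020 and 16000 were never inside the range; the KMS port moved because of the HBase conflict rather than an ephemeral-port collision.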

7. File system connectors for Microsoft Azure Data Lake and Aliyun Object Storage System

Hadoop now supports integration with Microsoft Azure Data Lake and Aliyun Object Storage System as alternative Hadoop-compatible file systems.

8. Intra-DataNode balancer

A single DataNode can manage multiple disks. During normal write operations, the disks fill up evenly. However, adding or replacing disks can cause significant skew within a DataNode. The existing HDFS balancer cannot handle this situation because it deals with skew between DataNodes, not within one. HDFS now includes an intra-DataNode balancing function, which is invoked through the hdfs diskbalancer CLI.
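To see what this skew looks like in numbers, consider a DataNode with two nearly full disks and one newly added empty disk. The sketch below is a simplified illustration only, not the algorithm the disk balancer actually uses; the `Disk` class and the sizes are invented for the example:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Disk:
    name: str
    capacity_gb: float
    used_gb: float


def node_utilization(disks: List[Disk]) -> float:
    """Fraction of the DataNode's total capacity currently in use."""
    return sum(d.used_gb for d in disks) / sum(d.capacity_gb for d in disks)


def rebalance_deltas(disks: List[Disk]) -> Dict[str, float]:
    """GB each disk should gain (+) or shed (-) to match the node-wide utilization."""
    target = node_utilization(disks)
    return {d.name: target * d.capacity_gb - d.used_gb for d in disks}


# Hypothetical DataNode: two older disks plus one freshly added, empty disk.
disks = [
    Disk("disk1", 1000, 800),
    Disk("disk2", 1000, 800),
    Disk("disk3", 1000, 0),
]
for name, delta in rebalance_deltas(disks).items():
    print(f"{name}: {delta:+.1f} GB")
```

In this simplified model the gains and losses always sum to zero, because data only moves between disks of the same DataNode.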

9. HDFS Router-Based Federation

HDFS Router-Based Federation adds an RPC routing layer that provides a federated view of multiple HDFS namespaces. This simplifies access to federated clusters for existing HDFS clients.
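One way to picture a routing layer like this is a table that maps path prefixes to namespaces, with each client path resolved by its longest matching prefix. The sketch below assumes such a table; the mount points and namespace names (`ns0` to `ns3`) are hypothetical, and the code is a simplification rather than the Router's actual implementation:

```python
from typing import Dict, Tuple

# Hypothetical mount table: path prefix -> namespace that owns it.
MOUNT_TABLE: Dict[str, str] = {
    "/": "ns0",
    "/data": "ns1",
    "/data/logs": "ns2",
    "/user": "ns3",
}


def resolve(path: str, mount_table: Dict[str, str]) -> Tuple[str, str]:
    """Return (namespace, mount point) for the longest mount point that prefixes `path`."""
    best = None
    for mount in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        matches = mount == "/" or path == mount or path.startswith(mount + "/")
        if matches and (best is None or len(mount) > len(best)):
            best = mount
    if best is None:
        raise LookupError(f"no mount point covers {path}")
    return mount_table[best], best


for p in ["/data/logs/app.log", "/data/table1", "/database/x", "/user"]:
    print(p, "->", resolve(p, MOUNT_TABLE))
```

The client only ever sees the single federated path; which namespace serves it is decided by the lookup.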

10. YARN resource types

The YARN resource model has been generalized to support user-defined countable resource types beyond CPU and memory. For example, a cluster administrator can define resources such as GPUs, software licenses, or locally attached storage. YARN tasks can then be scheduled based on the availability of these resources.
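Conceptually, a generalized resource model treats each request and each node as a set of named, countable quantities. The sketch below illustrates that idea with plain dictionaries; the resource names and functions are illustrative assumptions and do not reflect YARN's actual classes or configuration keys:

```python
from typing import Dict

Resources = Dict[str, int]


def fits(request: Resources, available: Resources) -> bool:
    """A request fits only if every named resource is available in sufficient quantity."""
    return all(available.get(name, 0) >= amount for name, amount in request.items())


def allocate(request: Resources, available: Resources) -> Resources:
    """Return what remains on the node after granting the request."""
    if not fits(request, available):
        raise ValueError("insufficient resources for request")
    return {name: amount - request.get(name, 0) for name, amount in available.items()}


# Hypothetical node offering memory and CPU plus two administrator-defined types.
node: Resources = {"memory-mb": 65536, "vcores": 16, "gpu": 2, "license": 1}

task: Resources = {"memory-mb": 8192, "vcores": 4, "gpu": 1}
print("fits:", fits(task, node))
print("remaining:", allocate(task, node))
```

A request for a resource type the node does not define, or for more GPUs than it has, simply fails the `fits` check, which is the behavior scheduling on resource availability implies.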
