
A detailed explanation of Apache Hadoop versions

Updated 2025-02-25. From: SLTechnology News&Howtos (shulou)


Because Hadoop's versioning is chaotic and changes often, choosing a Hadoop version has long troubled beginners. This article summarizes how the Apache Hadoop and Cloudera Hadoop versions evolved and offers some advice on choosing a version.

1. Apache Hadoop

Apache version evolution

As of December 23, 2012, Apache Hadoop versions fall into two generations. We call the first generation Hadoop 1.0 and the second generation Hadoop 2.0. The first generation consists of three major lines, 0.20.x, 0.21.x and 0.22.x. Of these, 0.20.x eventually evolved into 1.0.x and became the stable version, while 0.21.x and 0.22.x add major new features such as NameNode HA. The second generation contains two lines, 0.23.x and 2.x, which are completely different from Hadoop 1.0: they are a new architecture built on HDFS Federation and YARN. Compared with 0.23.x, 2.x adds two major features, NameNode HA and wire-compatibility.

As the overview above suggests, Hadoop versions are distinguished by their major features. The features used to tell Hadoop versions apart are:
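The generation rules above can be sketched as a small lookup. This is only an illustration of the article's classification as of December 2012; the function name `hadoop_generation` is hypothetical and is not part of any Hadoop tool.

```python
def hadoop_generation(version: str) -> str:
    """Map an Apache Hadoop version string to the generation used in this article.

    Hypothetical helper: it encodes only the lines named in the text
    (0.20.x, 0.21.x, 0.22.x, 1.x, 0.23.x, 2.x).
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    # First generation: 0.20.x (which became 1.0.x), 0.21.x and 0.22.x.
    if (major == 0 and minor in (20, 21, 22)) or major == 1:
        return "Hadoop 1.0"

    # Second generation: 0.23.x and 2.x (HDFS Federation and YARN).
    if (major == 0 and minor == 23) or major == 2:
        return "Hadoop 2.0"

    raise ValueError(f"version {version!r} is not covered by this article")


for v in ("0.20.2", "1.0.4", "0.23.5", "2.0.0"):
    print(v, "->", hadoop_generation(v))
```

Versions outside the lines the article discusses raise an error rather than being guessed at.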

(1) Append: supports appending to files; required if you want to use HBase.

(2) RAID: while preserving data reliability, introduces parity codes to reduce the number of data blocks that must be stored. Details:

https://issues.apache.org/jira/browse/HDFS/component/12313080

(3) Symlink: supports symbolic links to HDFS files. For more information, see https://issues.apache.org/jira/browse/HDFS-245.

(4) Security: Hadoop security. For more information, see https://issues.apache.org/jira/browse/HADOOP-4487.

(5) NameNode HA: for more information, see https://issues.apache.org/jira/browse/HDFS-1064.

(6) HDFS Federation and YARN

Note that Hadoop 2.0 is developed mainly by Hortonworks, a company spun off from Yahoo.

Download the Apache version

(1) Release notes for each version: http://hadoop.apache.org/releases.html.

(2) Download the stable version: pick a mirror and download the version under the stable folder.

(3) The most complete set of Hadoop versions: http://svn.apache.org/repos/asf/hadoop/common/branches/, which can be imported directly into Eclipse.

2. Cloudera Hadoop

CDH version evolution

Apache's version management is currently quite chaotic, and new versions keep appearing, which leaves many beginners at a loss. By contrast, Cloudera's Hadoop version management is much cleaner.

Hadoop is released under the Apache open source license, so anyone can use and modify it for free. As a result there are many Hadoop distributions on the market, one of which is Cloudera's, which we call CDH (Cloudera Distribution Hadoop). So far CDH has had four versions. The first two are no longer updated. The latest two, CDH3 (based on Apache Hadoop 0.20.2) and CDH4 (based on Apache Hadoop 2.0.0), correspond to Apache's Hadoop 1.0 and Hadoop 2.0, respectively, and are updated at regular intervals.

Cloudera distinguishes minor versions by patch level. For example, a patch level of 923.142 means that 1065 patches have been applied on top of the original Apache Hadoop 0.20.2 (these patches are contributed by various companies and individuals and are all recorded in the Hadoop JIRA). Of these, 923 were added by the last beta release and 142 were added after the stable release. The higher the patch level, the more complete the functionality and the more bugs have been fixed.
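The arithmetic behind a patch level can be checked with a few lines of code. This is a minimal sketch that assumes the patch level is always two integers separated by a dot, as in the 923.142 example; the function name `parse_patch_level` and the dictionary keys are hypothetical, not a Cloudera API.

```python
def parse_patch_level(patch_level: str) -> dict:
    """Split a CDH patch level such as "923.142" into its two counts.

    Assumption: the first number counts patches added up to the last beta
    release, the second counts patches added after the stable release.
    """
    beta_part, post_stable_part = patch_level.split(".")
    beta = int(beta_part)
    post_stable = int(post_stable_part)
    return {
        "beta_patches": beta,
        "post_stable_patches": post_stable,
        "total_patches": beta + post_stable,
    }


info = parse_patch_level("923.142")
print(info["total_patches"])  # 923 + 142 = 1065
```

The patch level is kept as a string because parsing it as a decimal number would lose the meaning of the two separate counts.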

Cloudera's versioning is clearer, and Cloudera provides Hadoop installation packages for a variety of operating systems. These can be installed directly with the apt-get or yum commands, which makes installation easier.

Download the CDH version

(1) What the version numbers mean:

https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information

(2) Features of each version:

https://ccp.cloudera.com/display/DOC/CDH+Packaging+Information+for+Previous+Releases

(3) Download each version:

CDH3: http://archive.cloudera.com/cdh/3/

CDH4: http://archive.cloudera.com/cdh5/cdh/4/

Note that the Hadoop packages sit at the top level of the directories behind these two links, not inside a subfolder. Many people fail to find the installation package after following the link!

3. How to choose a Hadoop version

Hadoop's versions are currently confusing and leave many users at a loss. In fact, there are only two versions of Hadoop: Hadoop 1.0 and Hadoop 2.0. Hadoop 1.0 consists of the distributed file system HDFS and the offline computing framework MapReduce. Hadoop 2.0 consists of an HDFS that supports NameNode scale-out, the resource management system YARN, and the offline computing framework MapReduce running on YARN. Compared with Hadoop 1.0, Hadoop 2.0 is more powerful, scales and performs better, and supports a variety of computing frameworks.

When deciding whether to adopt a piece of software in an open source environment, we usually consider the following factors:

(1) Whether it is open source, that is, whether it is free to use.

(2) Whether there is a stable release; the software's official website usually says so.

(3) Whether it has been proven in practice; you can check whether larger companies run it in production.

(4) Whether it has strong community support, so that when a problem arises a solution can be found quickly through the community, forums, and other online resources.

With these factors in mind, let's look at Hadoop. Hadoop 2.0 is currently unstable and cannot be used in a production environment, so if you are preparing to deploy Hadoop now, you can only choose a Hadoop 1.0 release. As of December 23, 2012, the latest stable releases from Apache and Cloudera are Hadoop 1.0.4 and CDH3U4, respectively, so you can choose either of them.
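The selection reasoning above can be written as a simple checklist filter. This is only a sketch: the `Candidate` class and its field names are hypothetical, and the True/False values encode this article's assessment as of December 23, 2012 (the production and community values for Hadoop 1.0.4 and CDH3U4 are assumptions inferred from the recommendation, not measured facts).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One software option scored against the four factors (hypothetical type)."""
    name: str
    open_source: bool
    has_stable_release: bool
    proven_in_production: bool
    community_support: bool


def passes_checklist(c: Candidate) -> bool:
    # A candidate is acceptable only if it satisfies all four factors.
    return all((c.open_source, c.has_stable_release,
                c.proven_in_production, c.community_support))


# Values reflect the article's view in December 2012; some are assumptions.
CANDIDATES = [
    Candidate("Apache Hadoop 1.0.4", True, True, True, True),
    Candidate("CDH3U4", True, True, True, True),
    Candidate("Hadoop 2.0", True, False, False, True),  # unstable at the time
]

recommended = [c.name for c in CANDIDATES if passes_checklist(c)]
print(recommended)
```

Under these inputs only the two Hadoop 1.0 based releases remain, which matches the recommendation in the text.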

Summary

That concludes this detailed explanation of Apache Hadoop versions. I hope it is helpful. If you have any questions, please leave a message and the editor will reply promptly.
