Hadoop Primer: Introduction to Hadoop Distributions and How to Choose One

Updated 2025-01-30. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

I. Introduction to Hadoop distributions

There are currently many Hadoop distributions, including those from Intel, Huawei, Cloudera (CDH) and Hortonworks, all derived from Apache Hadoop. So many exist because of Apache Hadoop's open source license: anyone may modify the code and release or sell the result as an open source or commercial product.

There are currently three main free Hadoop distributions, all from vendors outside China:

Apache Hadoop (the original version, on which all other distributions are based)

Cloudera's version (Cloudera's Distribution Including Apache Hadoop, "CDH" for short)

Hortonworks' version (Hortonworks Data Platform, "HDP" for short)

Most users in China choose CDH. Cloudera's CDH differs from Apache Hadoop in the following ways:

(1) CDH divides its Hadoop versions very clearly. So far there have been five CDH versions. The first three are no longer updated; the latest two are CDH4 and CDH5. CDH4 is based on Hadoop 2.0, and CDH5 is based on Hadoop 2.2/2.3/2.5/2.6. By contrast, Apache's version numbering is much more confusing. The CDH distribution is also much more compatible, secure and stable than Apache Hadoop.

(2) CDH3, the third version of CDH, is based on Apache Hadoop 0.20.2 with the latest patches incorporated; CDH4 is based on Apache Hadoop 2.0.0. CDH consistently applies the latest bug-fix and feature patches, ships a given feature earlier than the corresponding Apache Hadoop release, and is updated faster than the official Apache version.

(3) CDH supports Kerberos security authentication, whereas Apache Hadoop by default uses simple username-matching authentication.

(4) The CDH documentation is thorough and clear. Many users of the Apache version also read the documents Cloudera provides, including the installation and upgrade documents.

(5) CDH can be installed from yum/apt packages, RPM packages, tar packages or through Cloudera Manager, while Apache Hadoop supports installation only from a tar package.
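
Point (3) can be checked on a given cluster by reading its configuration. The following is a minimal Python sketch, assuming the standard hadoop.security.authentication property in core-site.xml, whose value is "simple" when nothing is configured and "kerberos" when Kerberos is enabled. The sample XML and the function name auth_mode are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content, for illustration only.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""


def auth_mode(core_site_xml: str) -> str:
    """Return the configured authentication mode, or "simple" if unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "hadoop.security.authentication":
            value = prop.findtext("value")
            # An empty <value/> is treated like a missing property.
            return (value or "simple").strip()
    return "simple"


print(auth_mode(SAMPLE_CORE_SITE))
```

In practice the XML would be read from the cluster's own core-site.xml file rather than from an inline string.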

II. Introduction to the CDH distribution

First of all, CDH is 100% open source, released under the Apache license, and is developed from Apache Hadoop and related projects. It supports batch processing, interactive SQL queries, real-time queries and role-based access control. It is the most widely used Hadoop distribution in enterprises.

Cloudera has refined CDH and provides tools for deploying, configuring, managing, monitoring and diagnosing Hadoop. Its official website offers several integrated editions, as follows:

1. CDH only. The latest version at the time of writing is CDH 5.8.2, which can be downloaded freely and used without restriction.

2. Cloudera Express, free to download and use, includes CDH and Cloudera Manager (CM for short). CM provides cluster management functions such as automatic deployment, centralized management, monitoring and diagnosis. CM is not open source, and Cloudera Express offers only a limited set of its functions. It was previously limited to managing 50 data nodes; that restriction has been lifted, and data nodes can now be added without limit.

3. Cloudera Enterprise is the official paid product. The full-featured version can be tried free for 60 days. After the trial expires, a registration code is needed to keep using it; otherwise it reverts to Cloudera Express, which includes CDH and Cloudera Manager. The two editions offer the same functions for deployment, configuration, management, monitoring, diagnosis and integration. They differ only in the advanced management functions, which Cloudera Enterprise has and Cloudera Express does not.

III. Download addresses for the CDH distribution

You can go to the download page on the official website, http://www.cloudera.com/downloads.html, or download specific versions from the following addresses:

http://archive.cloudera.com/cdh/

http://archive.cloudera.com/cdh5/

http://archive.cloudera.com/cdh6/
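
A small Python sketch of choosing among the archive addresses above. The mapping covers only the URLs listed in this article. The function name archive_root is hypothetical, falling back to the unversioned address for other major versions is an assumption, and nothing here checks that an address is still reachable.

```python
# Archive roots as listed in the article, keyed by CDH major version.
ARCHIVE_ROOTS = {
    5: "http://archive.cloudera.com/cdh5/",
    6: "http://archive.cloudera.com/cdh6/",
}

# The address listed without a major version; used as a fallback (assumption).
DEFAULT_ROOT = "http://archive.cloudera.com/cdh/"


def archive_root(cdh_version: str) -> str:
    """Pick the archive root for a CDH version string such as "5.8.2"."""
    major = int(cdh_version.split(".")[0])
    return ARCHIVE_ROOTS.get(major, DEFAULT_ROOT)


print(archive_root("5.8.2"))
```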

IV. CDH and operating system dependencies

The relationship between the CDH distribution and the operating system is as follows:

Recommendations from experience:

For hadoop-2.3.0-cdh5.1.5 and earlier versions, the recommended Linux operating system is CentOS 6.x or later.

For hadoop-2.5.0-cdh5.2.0 and later versions, the recommended Linux operating system is CentOS 7.x or later (CentOS 7.1/7.2; CentOS 7.0 is not supported).
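
The two recommendations above can be sketched as a small lookup. This Python example is an illustration only: it keys the decision on the Hadoop base version in the release name (2.3.0 and earlier versus 2.5.0 and later), which is a simplifying assumption, and the function name recommended_centos is hypothetical.

```python
import re
from typing import Optional

# Matches release names such as "hadoop-2.6.0-cdh5.8.2".
_NAME = re.compile(
    r"^hadoop-(\d+)\.(\d+)\.(\d+)-cdh(\d+)\.(\d+)\.(\d+)$", re.IGNORECASE
)


def recommended_centos(release: str) -> Optional[str]:
    """Map a CDH release name to the CentOS line recommended in the text."""
    match = _NAME.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognized release name: {release!r}")
    # The first three groups are the Hadoop base version.
    hadoop_base = tuple(int(part) for part in match.groups()[:3])
    if hadoop_base <= (2, 3, 0):
        return "CentOS 6.x or later"
    if hadoop_base >= (2, 5, 0):
        return "CentOS 7.x or later (7.0 not supported)"
    return None  # between the two thresholds the article gives no guidance


print(recommended_centos("hadoop-2.6.0-cdh5.8.2"))
```

Returning None for releases that fall between the two thresholds keeps the sketch from guessing where the article is silent.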
