How to realize Hadoop under Cloudera 04/24 Update SLTechnology News&Howtos


2025-04-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to deploy Hadoop using Cloudera's distribution. Many readers may not know much about the topic, so it is offered here as a reference, and we hope you find it useful.

Preface

Hadoop is an open-source framework for distributed parallel programming that implements the MapReduce computing model. MapReduce originated at Google. It is a programming model that simplifies parallel computing and is suited to processing massive data on large clusters, and its most successful application so far has been the distributed search engine. With the appearance of Apache Hadoop, an open-source Java implementation of the model, at the end of 2007, programmers can easily write distributed parallel programs and run them on computer clusters to process massive amounts of data.

In the past two years, and especially this year, adoption of the MapReduce model has grown steadily in China and abroad. Telecom companies such as NTT, KDDI and China Mobile use it to analyze user information and optimize network configuration; the US power supply bureau uses it to analyze the state of the power grid; financial companies including VISA and JP Morgan use it to analyze stock data. Retailers and e-commerce companies, including Amazon and eBay, have also begun to adopt it, and some biotechnology companies even use it for DNA sequencing and analysis.

However, Hadoop is hard to install, deploy and manage, which drives many users away. Fortunately, this has improved quickly: Cloudera offers a Hadoop distribution that makes installation, deployment and management very simple, and as a result about 75% of new Hadoop users choose Cloudera. The rest of this article describes a concrete plan for implementing Hadoop with Cloudera.
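To make the map, shuffle and reduce stages concrete, here is a minimal in-memory word-count sketch in Python. It only imitates the data flow of the MapReduce model on a single machine. It does not use Hadoop, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key, which the framework
    # itself does between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine every value that shares the same key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three stages: map every line, group by word, then reduce.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop implements MapReduce", "MapReduce simplifies parallel computing"]
    print(word_count(docs))
```

On a real Hadoop cluster, the framework runs many map and reduce tasks in parallel on different nodes and performs the shuffle itself. The programmer supplies only the map and reduce logic.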

Planning

Operation modes

Hadoop can run in three modes: standalone (non-distributed) mode, pseudo-distributed mode and fully distributed mode. The first two do not exercise Hadoop's distributed-computing advantages and have little practical value in production (although they are very helpful for testing and debugging programs), so this deployment uses the fully distributed mode found in real environments.
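The three modes differ mainly in a few configuration values. The sketch below assumes the Hadoop 0.20-era property names `fs.default.name`, `mapred.job.tracker` and `dfs.replication` and returns the values that typically distinguish each mode. The ports 9000/9001 and the function name `mode_properties` are illustrative assumptions, not values taken from this article.

```python
def mode_properties(mode: str, master: str = "localhost",
                    replication: int = 3) -> dict[str, str]:
    """Return example settings that distinguish Hadoop's three run modes.

    Assumes Hadoop 0.20-era property names; ports 9000/9001 are
    illustrative choices only.
    """
    if mode == "standalone":
        # Local file system and an in-process job runner: no daemons at all.
        return {"fs.default.name": "file:///", "mapred.job.tracker": "local"}
    if mode == "pseudo-distributed":
        # Every daemon on one machine, so only one copy of each block fits.
        return {
            "fs.default.name": "hdfs://localhost:9000",
            "mapred.job.tracker": "localhost:9001",
            "dfs.replication": "1",
        }
    if mode == "distributed":
        # Daemons spread across hosts; every node points at the master.
        return {
            "fs.default.name": f"hdfs://{master}:9000",
            "mapred.job.tracker": f"{master}:9001",
            "dfs.replication": str(replication),
        }
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for m in ("standalone", "pseudo-distributed", "distributed"):
        print(m, mode_properties(m, master="Hadoop-01", replication=2))
```

For the plan in this article, a replication factor of 2 would be a reasonable choice, because HDFS cannot store more replicas of a block than there are DataNodes, and the plan starts with only two.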

Host planning

Three hosts will be used to build the Hadoop environment. Because we later want to test how adding and removing hosts, and adding a host on a different network segment, affect the Hadoop environment, the hosts are planned as follows:

Hadoop-01    10.137.253.201
Hadoop-02    10.137.253.202
Hadoop-03    10.137.253.203

Test host to be added later: Hadoop-04    10.137.253.204
Cross-segment test host to be added later: Firehare-303    10.10.3.30
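Writing the host plan down as data makes the cross-segment case easy to check. This Python sketch assumes a /24 netmask for the 10.137.253.x segment, which the article does not state. The `stage` labels and the helper names are hypothetical. The sketch also renders /etc/hosts-style lines, a common way to let every node resolve the others by name.

```python
import ipaddress

# Host plan from the text: name -> (address, stage).
# The "initial"/"later" stage labels are introduced for this sketch only.
HOSTS = {
    "Hadoop-01": ("10.137.253.201", "initial"),
    "Hadoop-02": ("10.137.253.202", "initial"),
    "Hadoop-03": ("10.137.253.203", "initial"),
    "Hadoop-04": ("10.137.253.204", "later"),
    "Firehare-303": ("10.10.3.30", "later"),
}

# Assumption: the cluster segment is a /24 network.
CLUSTER_NET = ipaddress.ip_network("10.137.253.0/24")


def initial_hosts(hosts=HOSTS):
    """Names of the hosts that form the first cluster."""
    return [name for name, (_, stage) in hosts.items() if stage == "initial"]


def cross_segment_hosts(hosts=HOSTS, net=CLUSTER_NET):
    """Names of hosts whose address lies outside the cluster segment."""
    return [name for name, (ip, _) in hosts.items()
            if ipaddress.ip_address(ip) not in net]


def etc_hosts_lines(hosts=HOSTS):
    """One 'address<TAB>name' line per host, in /etc/hosts style."""
    return [f"{ip}\t{name}" for name, (ip, _) in hosts.items()]


if __name__ == "__main__":
    print("initial:", initial_hosts())
    print("cross-segment:", cross_segment_hosts())
    print("\n".join(etc_hosts_lines()))
```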

Hadoop environment planning

Hadoop has two main components: the distributed file system HDFS and the MapReduce computing model. From the HDFS point of view, nodes are divided into a NameNode and DataNodes; there is only one NameNode, but there can be many DataNodes. From the MapReduce point of view, nodes are divided into a JobTracker and TaskTrackers; there is only one JobTracker, but there can be many TaskTrackers. A real Hadoop environment therefore usually has two master nodes, one acting as the NameNode (managing HDFS) and the other as the JobTracker (managing MapReduce jobs), while the remaining nodes are slaves, each acting as both a DataNode and a TaskTracker. The NameNode and JobTracker can also run on a single master node. Because test machines are limited, Hadoop-01 serves here as both NameNode and JobTracker, and the other hosts serve as DataNodes and TaskTrackers. (In an environment with many hosts, it is still advisable to put the NameNode and JobTracker on separate hosts to improve performance.)
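The role assignment above can be sketched as a small Python script that maps each host to its daemons and renders matching configuration files. It assumes the Hadoop 0.20-era layout (`core-site.xml`, `mapred-site.xml`, and a `slaves` file listing the worker hosts). The ports 9000/9001 and all function names are illustrative assumptions rather than Cloudera defaults.

```python
import xml.etree.ElementTree as ET

MASTER = "Hadoop-01"                  # runs both NameNode and JobTracker
WORKERS = ["Hadoop-02", "Hadoop-03"]  # each runs a DataNode and a TaskTracker


def roles(master=MASTER, workers=WORKERS):
    """Map each host to the daemons it runs under this plan."""
    plan = {master: ["NameNode", "JobTracker"]}
    for host in workers:
        plan[host] = ["DataNode", "TaskTracker"]
    return plan


def site_xml(props):
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def cluster_files(master=MASTER, workers=WORKERS):
    """Contents of the files that tell every node where the masters are.

    Ports 9000 (NameNode) and 9001 (JobTracker) are assumed example values.
    """
    return {
        "core-site.xml": site_xml({"fs.default.name": f"hdfs://{master}:9000"}),
        "mapred-site.xml": site_xml({"mapred.job.tracker": f"{master}:9001"}),
        "slaves": "\n".join(workers) + "\n",
    }


if __name__ == "__main__":
    for host, daemons in roles().items():
        print(host, "->", ", ".join(daemons))
    for filename, text in cluster_files().items():
        print(f"--- {filename}\n{text}")
```

Under this sketch, adding Hadoop-04 later amounts to appending it to `WORKERS` and distributing the regenerated files to every node.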

That is all for the article "How to realize Hadoop under Cloudera". Thank you for reading; we hope it has given you a working understanding of the topic.

