
How to install Hadoop2.2.0 Cluster under RHEL6.2


The editor will share with you how to install a Hadoop 2.2.0 cluster under RHEL 6.2. Most people are not very familiar with this, so this article is shared for your reference; I hope you will gain a lot from reading it. Let's get started!

In the process of building the cluster, we mainly ran into two problems:

(1) The first: the DataNode started (the process was visible with jps), but it did not show up on the NameNode web UI (http://192.168.1.10:50070). It took about three hours to track down. The logs in the logs directory showed "org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:9000". Searching online, most people who hit this had not turned off the firewall, but mine was already off. After trying all sorts of things, the cause turned out to be /etc/hosts: besides 192.168.1.10, I had also mapped master to 127.0.0.1. After removing that entry and restarting, everything worked.

(2) The second is a common problem: if the DataNode fails to start because the NameNode has been formatted multiple times, deleting /home/hadoop/dfs/data/current/VERSION fixes it (a command sketch follows this list).

(3) Read the logs carefully; they will almost always lead you to the cause of a problem.
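As a reference for problem (2), a minimal sketch of the fix, assuming the directory layout used later in this article (/home/hadoop/dfs/data) and that HDFS is controlled from the master:

# on each DataNode that fails to start, remove the stale VERSION file
rm /home/hadoop/dfs/data/current/VERSION
# then restart HDFS from the Hadoop installation directory on the master
./sbin/stop-dfs.sh
./sbin/start-dfs.sh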

What is Hadoop?

Hadoop was created by Doug Cutting, the founder of Lucene. It is a basic framework for distributed file storage and large-scale data analysis and computation, modeled on Google's published designs, and it includes the MapReduce programming model, the HDFS file system, and more.

Terminology

(1) Hadoop: Apache's open-source distributed framework.

(2) HDFS: the Hadoop distributed file system.

(3) NameNode: the HDFS metadata master server. It keeps the metadata describing the files stored on the DataNodes. This server is a single point of failure.

(4) JobTracker: Hadoop's Map/Reduce scheduler. It communicates with the TaskTrackers to assign computing tasks and track their progress. This server is also a single point of failure.

(5) DataNode: a Hadoop data node, responsible for storing data.

(6) TaskTracker: the Hadoop worker-side scheduler, responsible for starting and running Map and Reduce tasks.

Cluster deployment structure diagram of Hadoop1


Yarn architecture diagram of Hadoop2


Install the RHEL environment

Install the virtual machines using VMware Workstation:

http://blog.csdn.net/puma_dong/article/details/17889593#t0

http://blog.csdn.net/puma_dong/article/details/17889593#t1

Install the Java environment:

http://blog.csdn.net/puma_dong/article/details/17889593#t10

After installation, the IP addresses and hostnames of the four virtual machines are as follows:

192.168.1.10 master

192.168.1.11 node1

192.168.1.12 node2

192.168.1.13 node3

These can be checked with vim /etc/hosts. Note: in /etc/hosts, do not also map the machine name to 127.0.0.1; doing so prevents the data nodes from connecting to the name node, with the following error (a correct example follows):

org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:9000
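For reference, a correct /etc/hosts on every machine might look like the sketch below (based on the addresses listed above); the key point is that master maps only to 192.168.1.10 and not to 127.0.0.1:

127.0.0.1      localhost
192.168.1.10   master
192.168.1.11   node1
192.168.1.12   node2
192.168.1.13   node3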

After installation, Java is located at /usr/jdk1.6.0_45, which can be verified with echo $JAVA_HOME.

Configure the Hadoop environment

Create a Hadoop account

(1) create a Hadoop user group: groupadd hadoop

(2) create a Hadoop user: useradd -g hadoop hadoop

(3) set the Hadoop user's password: passwd hadoop (enter the new password when prompted)

(4) give the hadoop account sudo permission: vim /etc/sudoers and add the line: hadoop ALL=(ALL) ALL

Note: the steps above should be performed on every machine; a consolidated sketch is shown below.
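A consolidated sketch of the account setup to run on each machine (visudo is an equally valid way to edit /etc/sudoers):

groupadd hadoop              # create the hadoop group
useradd -g hadoop hadoop     # create the hadoop user in that group
passwd hadoop                # set the password when prompted
visudo                       # add the line: hadoop ALL=(ALL) ALL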

Create a password-less login from master to slave

(1) switch to the hadoop user and go to its home directory: su hadoop, then cd /home/hadoop/

(2) generate the public and private keys: ssh-keygen -q -t rsa -N "" -f /home/hadoop/.ssh/id_rsa

(3) check the key contents: cd /home/hadoop/.ssh, then cat id_rsa.pub

(4) copy the id_rsa.pub public key to the authorized_keys file: cat id_rsa.pub > authorized_keys

(5) modify the permissions of the public key file on master: chmod 644 /home/hadoop/.ssh/authorized_keys

(6) copy the authorized_keys file on the master machine to the node1 node:

scp /home/hadoop/.ssh/authorized_keys node1:/home/hadoop/.ssh/

If the .ssh directory does not exist on node1/node2/node3, create it and run chmod 700 /home/hadoop/.ssh (a verification sketch follows).
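To copy the key to the remaining nodes and verify that password-less login works, something like the following can be run on master (a sketch; node2 and node3 are handled the same way as node1 above):

scp /home/hadoop/.ssh/authorized_keys node2:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys node3:/home/hadoop/.ssh/
ssh node1 hostname    # should print "node1" without asking for a password
ssh node2 hostname
ssh node3 hostname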

Install Hadoop

Installation directory

Hadoop installation directory: / home/hadoop/hadoop-2.2.0

Data directories: /home/hadoop/dfs/name, /home/hadoop/dfs/data, /home/hadoop/tmp

Installation steps

Note: the following steps are performed using the hadoop account.

(1) go to the /home/hadoop directory: cd /home/hadoop

(2) download Hadoop: wget http://mirror.esocc.com/apache/h... hadoop-2.2.0.tar.gz

(3) extract Hadoop into the planned installation location: tar zxvf hadoop-2.2.0.tar.gz

(4) create the data directories: mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data /home/hadoop/tmp

(5) modify 7 configuration files located in /home/hadoop/hadoop-2.2.0/etc/hadoop/: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml

Configuration file hadoop-env.sh

If the JAVA_HOME environment variable is already set system-wide, this file does not need to be modified; otherwise, change ${JAVA_HOME} to /usr/jdk1.6.0_45 (see the example below).
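For example, the JAVA_HOME line in hadoop-env.sh would then read as follows (a sketch, using the JDK path from this article):

export JAVA_HOME=/usr/jdk1.6.0_45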

Configuration file yarn-env.sh

If the JAVA_HOME environment variable is already set system-wide, this file does not need to be modified; otherwise, change ${JAVA_HOME} to /usr/jdk1.6.0_45.

Configuration file slaves

vim /home/hadoop/hadoop-2.2.0/etc/hadoop/slaves and change the content to the machine names of all DataNodes, one per line. The configuration used in this article is:

node1

node2

node3

Configuration file core-site.xml

vim /home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml and modify the configuration as follows:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
</configuration>

Configuration file hdfs-site.xml

vim /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml and modify the configuration as follows:

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Configuration file mapred-site.xml

mv /home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml.template /home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml

vim /home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml and modify the configuration as follows:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

Configuration file yarn-site.xml

vim /home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml and modify the configuration as follows:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

Copy Hadoop to other nodes

(1) scp -r /home/hadoop/hadoop-2.2.0 hadoop@node1:~/

(2) scp -r /home/hadoop/hadoop-2.2.0 hadoop@node2:~/

(3) scp -r /home/hadoop/hadoop-2.2.0 hadoop@node3:~/

Start Hadoop

(1) switch to hadoop user: su hadoop

(2) enter the installation directory: cd ~/hadoop-2.2.0/

(3) format the NameNode: ./bin/hdfs namenode -format

(4) start HDFS: ./sbin/start-dfs.sh

(5) check with jps. On master you should see the NameNode and SecondaryNameNode processes; on node1/node2/node3, the DataNode process.

(6) start YARN: ./sbin/start-yarn.sh

(7) check with jps again. On master you should see NameNode, SecondaryNameNode and ResourceManager; on node1/node2/node3, DataNode and NodeManager.

(8) check the cluster status: ./bin/hdfs dfsadmin -report

(9) view the file block composition: ./bin/hdfs fsck / -files -blocks

(10) view the HDFS web UI: http://192.168.1.10:50070

(11) view the ResourceManager web UI: http://192.168.1.10:8088

HADOOP_HOME environment variable

For convenience, we set a HADOOP_HOME environment variable and add it to PATH. The steps are as follows (a quick verification sketch follows the list):

(1) vim /etc/profile.d/java.sh # since Hadoop requires Java anyway, we can simply reuse this file.

(2) add content:

export HADOOP_HOME=/home/hadoop/hadoop-2.2.0

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
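To make the new variables take effect in the current shell and confirm the setup, a quick check might look like this (hadoop version simply prints the installed version):

source /etc/profile.d/java.sh
hadoop version    # should report 2.2.0 if PATH is set correctly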

Run the Hadoop computing task

WordCount

(1) under the /home/hadoop directory, there are two text files, file01.txt and file02.txt, with the following contents:

file01.txt:

Kongxianghe

Kong

Yctc

Hello World

file02.txt:

eleven

2222

Kong

Hello

Yctc

(2) put these two files into HDFS:

hadoop fs -ls /    # view the HDFS root directory

hadoop fs -mkdir -p input

hadoop fs -put /home/hadoop/file*.txt input

hadoop fs -cat input/file01.txt    # view the uploaded file

(3) calculate and view the results:

hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output

hadoop fs -ls output

hadoop fs -cat output/part-r-00000

You can see that the words have been counted (a sample of the expected output follows).
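Assuming file01.txt and file02.txt contain exactly the lines shown above (WordCount splits on whitespace and is case-sensitive), the contents of part-r-00000 should look roughly like this:

2222    1
Hello    2
Kong    2
Kongxianghe    1
World    1
Yctc    2
eleven    1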

Run sorting calculation

The following jobs first generate random data on each node and then sort the result:

(1) ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter rand

(2) ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort rand sort-rand

The first command generates unsorted data in the rand directory. The second command reads that data, sorts it, and writes the result to the sort-rand directory.

Common mistakes

(1) Name node is in safe mode

If a Hadoop job terminates abnormally and you then try to add or delete files in HDFS, a "Name node is in safe mode" error may occur:

rmr: org.apache.hadoop.dfs.SafeModeException: Cannot delete /user/hadoop/input. Name node is in safe mode

Command to resolve it (a status check sketch follows):

hadoop dfsadmin -safemode leave    # turn off safe mode
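Before forcing safe mode off, it can be worth checking the current state (dfsadmin also accepts a get sub-command):

hadoop dfsadmin -safemode get    # prints whether safe mode is ON or OFF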

(2) DataNode cannot be started

I have encountered two situations in which the DataNode cannot start. The first: the machine name in /etc/hosts was mapped to 127.0.0.1 in addition to its IP, so the DataNode could not connect to port 9000 on the NameNode. The second: formatting the NameNode multiple times caused the clusterID values of the NameNode and the DataNodes to become inconsistent; comparing the VERSION files of the NameNode and a DataNode confirmed that they were indeed different (see the sketch below).
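A quick way to confirm the clusterID mismatch, assuming the directory layout configured earlier in this article (the NameNode keeps its VERSION file under the name directory, the DataNodes under the data directory):

# on master
grep clusterID /home/hadoop/dfs/name/current/VERSION
# on node1/node2/node3
grep clusterID /home/hadoop/dfs/data/current/VERSION
# the two values must match; if they do not, clear the DataNode data directory
# (or copy the NameNode's clusterID into the DataNode's VERSION file) and restart HDFS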

These are all the contents of the article "How to install a Hadoop 2.2.0 cluster under RHEL 6.2". Thank you for reading! I hope the content shared here is helpful to you; if you want to learn more, welcome to follow the industry information channel.
