How to install and use hadoop-0.20.2 easily


This article explains in detail how to install hadoop-0.20.2 and make simple use of it. It is shared here as a reference; after reading it, you should have a working understanding of the relevant steps.

The installation steps are as follows:

1.1 Machine Description

Four machines in total: sc706-26, sc706-27, sc706-28, sc706-29

IP addresses: 192.168.153.89, 192.168.153.90, 192.168.153.91, 192.168.153.92

Operating system: Fedora 12 (Linux)

JDK version: jdk-6u19-linux-i586

Hadoop version: hadoop-0.20.2

sc706-26 serves as the NameNode and JobTracker; the other three serve as DataNodes and TaskTrackers.

1.2 Ping machines by machine name

Log in as root and modify the /etc/hosts file on the NameNode and every DataNode, adding the IP addresses and machine names of all four machines, as follows:

192.168.153.89 sc706-26

192.168.153.90 sc706-27

192.168.153.91 sc706-28

192.168.153.92 sc706-29

After saving, verify that the machines can ping each other, by machine name or by IP address, e.g. ping sc706-27 or ping 192.168.153.90.
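To check all four machines at once, a small convenience loop (not in the original steps) can be used:

[hadoop@sc706-26 ~]$ for h in sc706-26 sc706-27 sc706-28 sc706-29; do ping -c 1 $h; done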

1.3 Create a hadoop user

Hadoop requires that the hadoop deployment directory structure be the same on all machines, and that every machine have an account with the same username. My home path is /home/hadoop.
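The article does not show how this account is created; as a minimal sketch, assuming root access and the standard Fedora tools, it could be done on each machine like this:

[root@sc706-26 ~]# useradd hadoop (creates the hadoop user with home directory /home/hadoop)

[root@sc706-26 ~]# passwd hadoop (set its password; repeat on all four machines)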

1.4 Set up ssh and close the firewall (requires root; use su -)

1) On Fedora, the sshd service is started by default after installation. If you are unsure, check with [root@sc706-26 hadoop]# service sshd status.

If it is not running, start it: [root@sc706-26 hadoop]# service sshd start

Create a passwordless ssh login on the NameNode: [hadoop@sc706-26 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Two files will be generated in ~/.ssh/: id_dsa and id_dsa.pub, which form a key pair. Append the id_dsa.pub file to authorized_keys on each DataNode.

[hadoop@sc706-26 ~]$ scp id_dsa.pub sc706-27:/home/hadoop/ (note that there is no space between the colon after the target machine and the destination path, i.e. no space between sc706-27: and /home/hadoop/)

scp id_dsa.pub sc706-28:/home/hadoop/

scp id_dsa.pub sc706-29:/home/hadoop/

Log in to each DataNode and append the key: [hadoop@sc706-27 ~]$ cat id_dsa.pub >> ~/.ssh/authorized_keys. Do the same on the other two DataNodes, and add it on the NameNode as well. Note: after appending, you must fix the permissions of .ssh and authorized_keys on the NameNode and the DataNodes with chmod, mode 755, as sketched below. When finished, test with e.g. ssh sc706-27; if you can log in without a password, ssh is set up correctly.
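A sketch of the permission fix, using the 755 mode the article names (run as the hadoop user on the NameNode and every DataNode):

[hadoop@sc706-26 ~]$ chmod 755 ~/.ssh

[hadoop@sc706-26 ~]$ chmod 755 ~/.ssh/authorized_keys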

2) Turn off the firewall (it must be turned off on both the NameNode and the DataNodes)

[root@sc706-26 ~]# service iptables stop

Note: this does not persist across reboots, so the firewall must be turned off again each time a machine restarts, before Hadoop is started.
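To avoid stopping it by hand after every reboot, the service can also be disabled permanently. A sketch using Fedora's chkconfig (not part of the original steps):

[root@sc706-26 ~]# chkconfig iptables off (keeps the firewall off across reboots)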

1.5 Install JDK 1.6 (the same on every machine)

Download jdk-6u19-linux-i586.bin from http://java.sun.com and install it: [root@sc706-26 java]# chmod +x jdk-6u19-linux-i586.bin, then [root@sc706-26 java]# ./jdk-6u19-linux-i586.bin. My installation path is /usr/java/jdk1.6.0_19. After installation, add the following lines to /etc/profile:

export JAVA_HOME=/usr/java/jdk1.6.0_19

export JRE_HOME=/usr/java/jdk1.6.0_19/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
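A quick check that the JDK settings took effect:

[root@sc706-26 ~]# source /etc/profile

[root@sc706-26 ~]# java -version (should report java version "1.6.0_19")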

1.6 Install hadoop

Download hadoop-0.20.2.tar.gz from http://apache.etoak.com//hadoop/core/

[hadoop@sc706-26 ~]$ tar xzvf hadoop-0.20.2.tar.gz

Add the hadoop installation path to /etc/profile:

export HADOOP_HOME=/home/hadoop/hadoop-0.20.2

export PATH=$HADOOP_HOME/bin:$PATH

To make /etc/profile take effect, source it: [hadoop@sc706-26 ~]$ source /etc/profile
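A quick check that hadoop is now on the PATH:

[hadoop@sc706-26 ~]$ hadoop version (should report Hadoop 0.20.2)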

1.7 Configure hadoop

Hadoop's configuration files are in the conf/ directory.

1) Configure Java environment

[hadoop@sc706-26 ~]$ vim hadoop-0.20.2/conf/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_19

2) Configure conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml files

[hadoop@sc706-26 ~]$ vim hadoop-0.20.2/conf/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://sc706-26:9000</value>
  </property>
</configuration>

[hadoop@sc706-26 ~]$ vim hadoop-0.20.2/conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://sc706-26:9001</value>
  </property>
</configuration>

Note: I am not sure whether the hdfs:// prefix is needed before sc706-26:9001; I have two clusters, and one of them works without it.

[hadoop@sc706-26 ~]$ vim hadoop-0.20.2/conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Note: if dfs.replication is set to 1, there is only one copy of the data; if the DataNode holding it has a problem, the whole job will fail.

3) Copy the complete hadoop directory from the NameNode to each DataNode. You can compress it first and scp it directly, or copy it with a disk; a sketch of the scp variant follows.
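A minimal sketch of the compress-and-scp variant, assuming the same /home/hadoop layout on every machine:

[hadoop@sc706-26 ~]$ tar czf hadoop-0.20.2.tar.gz hadoop-0.20.2

[hadoop@sc706-26 ~]$ scp hadoop-0.20.2.tar.gz sc706-27:/home/hadoop/ (repeat for sc706-28 and sc706-29)

Then unpack on each DataNode: [hadoop@sc706-27 ~]$ tar xzf hadoop-0.20.2.tar.gz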

4) Configure conf/masters and conf/slaves on NameNode

masters:

192.168.153.89

slaves:

192.168.153.90

192.168.153.91

192.168.153.92

1.8 Run Hadoop

1) Format file system

[hadoop@sc706-26 hadoop-0.20.2]$ hadoop namenode -format

Note: when formatting, take care that the NameNode's namespace ID does not become inconsistent with the DataNodes' namespace IDs. Each format regenerates the record information under the name, data, and tmp directories, so formatting many times produces mismatched IDs and Hadoop will not run.
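One common way to avoid the mismatch (an assumption on my part, not one of the original steps) is to clear the old directories on every machine before re-formatting:

[hadoop@sc706-26 ~]$ rm -rf /home/hadoop/name /home/hadoop/data /home/hadoop/tmp (repeat on each DataNode, then re-run hadoop namenode -format)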

2) Start Hadoop

[hadoop@sc706-26 hadoop-0.20.2]$ bin/start-all.sh

3) Use the jps command to view the process. The results on NameNode are as follows:

25325 NameNode

25550 JobTracker

28210 Jps

25478 SecondaryNameNode

4) View cluster status

[hadoop@sc706-26 hadoop-0.20.2]$ hadoop dfsadmin -report

Make sure the correct number of DataNodes is running (mine is 3); the report also shows which DataNodes are not running.

5) View the cluster through hadoop's web interface

[hadoop@sc706-26 hadoop-0.20.2]$ links http://192.168.153.89:50070 (192.168.153.89 is the master)

1.9 Run the Wordcount.java program

1) Create two files f1 and f2 on the local disk first

[hadoop@sc706-26 ~]$ echo "hello Hadoop goodbye hadoop" > f1

[hadoop@sc706-26 ~]$ echo "hello bye hadoop hadoop" > f2

2) Create an input directory on hdfs

[hadoop@sc706-26 ~]$ hadoop dfs -mkdir input

3) Copy f1 and f2 to the input directory of hdfs

[hadoop@sc706-26 ~]$ hadoop dfs -copyFromLocal /home/hadoop/f* input

4) Check whether there is an input directory on hdfs

[hadoop@sc706-26 ~]$ hadoop dfs -ls

5) Check whether f1 and f2 were copied successfully into the input directory

[hadoop@sc706-26 ~]$ hadoop dfs -ls input

6) Execute wordcount (make sure there is no output directory on hdfs already)

[hadoop@sc706-26 hadoop-0.20.2]$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output

7) When the run completes, view the results

[hadoop@sc706-26 hadoop-0.20.2]$ hadoop dfs -cat output/part-r-00000
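Given the contents of f1 and f2 above, the output should look like the following (word and count, tab-separated, with keys sorted uppercase before lowercase):

Hadoop 1

bye 1

goodbye 1

hadoop 3

hello 2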

That covers how to install hadoop-0.20.2 and make simple use of it. I hope the above content is of some help to everyone. If you think the article is good, you can share it so that more people can see it.
