
How to Install Hadoop in Fully Distributed Mode


In this article the editor shares how to install Hadoop in fully distributed mode. Since most readers are not very familiar with it, the article is shared for your reference; I hope you learn a lot from reading it.

Hadoop fully distributed mode installation steps

Introduction to Hadoop modes

Stand-alone (local) mode: easy to install, requires almost no configuration, but is only useful for debugging.

Pseudo-distributed mode: the five daemons (namenode, datanode, jobtracker, tasktracker, secondary namenode) all run on a single node, simulating a distributed cluster on one machine.

Fully distributed mode: a real Hadoop cluster made up of multiple nodes, each performing its own role.

Installation environment

Operating platform: VMware

Operating system: Oracle Linux 5.6

Software versions: hadoop-0.20.2, jdk-6u18

Cluster architecture: 3 nodes; master node (gc), slave nodes (rac1, rac2)

Installation steps

1. Download Hadoop and the JDK

For example: hadoop-0.20.2.tar.gz and jdk-6u18-linux-x64-rpm.bin (the packages used below)

2. Configure the hosts file

Modify /etc/hosts on all nodes (gc, rac1, rac2) so that the hostnames resolve to the correct IP addresses on every node.

[root@gc ~]# cat /etc/hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

192.168.2.101 rac1.localdomain rac1

192.168.2.102 rac2.localdomain rac2

192.168.2.100 gc.localdomain gc
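An optional quick check (not part of the original steps): confirm on each node that the other hostnames resolve, for example:

for h in gc rac1 rac2; do ping -c 1 $h; done    # every host should answer from its 192.168.2.x address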

3. Set up the Hadoop running account

Create the Hadoop running account on all nodes.

[root@gc ~]# groupadd hadoop

[root@gc ~]# useradd -g hadoop grid    -- note: the group must be specified here, otherwise SSH mutual trust may fail to be established

[root@gc ~]# id grid

uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)

[root@gc ~] # passwd grid

Changing password for user grid.

New UNIX password:

BAD PASSWORD: it is too short

Retype new UNIX password:

passwd: all authentication tokens updated successfully.
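If you prefer to drive this from gc in one pass, a minimal sketch (assuming root SSH access to rac1 and rac2) could look like the following; the password is still set interactively on each node:

for h in rac1 rac2; do
  ssh root@$h "groupadd hadoop && useradd -g hadoop grid"
  ssh -t root@$h "passwd grid"    # -t allocates a terminal so the interactive passwd prompt works
done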

4. Configure ssh password-free connection

Note: log in as the Hadoop running account and work in that account's home directory.

Run the same steps on every node:

[hadoop@gc ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Created directory '/home/hadoop/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

54:80:fd:77:6b:87:97:ce:0f:32:34:43:d1:d2:c2:0d hadoop@gc.localdomain

[hadoop@gc ~]$ cd .ssh

[hadoop@gc .ssh]$ ls

id_rsa  id_rsa.pub

Append each node's public key to a common authorized_keys file and distribute it to every node; after that, the nodes can ssh to one another without a password.

The whole operation can be done from one of the nodes (gc):

[hadoop@gc .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[hadoop@gc .ssh]$ ssh rac1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The authenticity of host 'rac1 (192.168.2.101)' can't be established.

RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'rac1,192.168.2.101' (RSA) to the list of known hosts.

hadoop@rac1's password:

[hadoop@gc .ssh]$ ssh rac2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The authenticity of host 'rac2 (192.168.2.102)' can't be established.

RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'rac2,192.168.2.102' (RSA) to the list of known hosts.

hadoop@rac2's password:

[hadoop@gc .ssh]$ scp ~/.ssh/authorized_keys rac1:~/.ssh/authorized_keys

hadoop@rac1's password:

authorized_keys                              100% 1213     1.2KB/s   00:00

[hadoop@gc .ssh]$ scp ~/.ssh/authorized_keys rac2:~/.ssh/authorized_keys

hadoop@rac2's password:

authorized_keys                              100% 1213     1.2KB/s   00:00

[hadoop@gc .ssh]$ ll

total 16

-rw-rw-r-- 1 hadoop hadoop 1213 10-30 09:18 authorized_keys

-rw------- 1 hadoop hadoop 1675 10-30 09:05 id_rsa

-rw-r--r-- 1 hadoop hadoop 403 10-30 09:05 id_rsa.pub

-- Test the connection to each node separately

[grid@gc .ssh]$ ssh rac1 date

Sunday, November 18th, 2012, 01:35:39 CST

[grid@gc .ssh]$ ssh rac2 date

Tuesday, October 30, 2012, 09:52:46 CST

-- As you can see, this step is the same as establishing SSH user equivalence when configuring Oracle RAC.
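A quick optional check, run from each node in turn, is to loop over all hosts and make sure no password prompt appears (a first-time host-key confirmation is still expected):

for h in gc rac1 rac2; do ssh $h hostname; done    # should print each hostname without asking for a password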

5. Extract the hadoop installation package

-- You can extract and configure it on one node first, then copy it to the others.

[grid@gc ~]$ ll

total 43580

-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz

[grid@gc ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz

[grid@gc ~]$ ll

total 43584

drwxr-xr-x 12 grid hadoop     4096 2010-02-19 hadoop-0.20.2

-rw-r--r--  1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz

-- Install the JDK on each node

[root@gc ~]# ./jdk-6u18-linux-x64-rpm.bin

6. Edit the Hadoop configuration files

Configure hadoop-env.sh

[root@gc conf]# pwd

/root/hadoop-0.20.2/conf

-- Modify the JDK installation path

[root@gc conf]# vi hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_18
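Before moving on, it is worth confirming that the same JDK path exists on every node; a minimal check (assuming the rpm installed into /usr/java/jdk1.6.0_18 on each host) is:

for h in gc rac1 rac2; do ssh $h /usr/java/jdk1.6.0_18/bin/java -version; done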

Configure the NameNode: modify the site files

-- Modify the core-site.xml file

[grid@gc conf]$ vi core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.2.100:9000</value>  <!-- note: the IP address must be used here in fully distributed mode, same as below -->
  </property>
</configuration>

Note: fs.default.name is the IP address and port of the NameNode.

-- modify the hdfs-site.xml file

[grid@gc conf]$ vi hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/grid/hadoop-0.20.2/data</value>  <!-- note: this directory must already exist and be readable/writable -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>


-- modify the mapred-site.xml file

[grid@gc conf]$ vi mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.100:9001</value>
  </property>
</configuration>


Configure the masters and slaves files

[grid@gc conf]$ vi masters

gc

[grid@gc conf]$ vi slaves

rac1

rac2

Copy Hadoop to each node

-- Copy the configured hadoop directory from the gc host to each node.

-- Note: after copying to another node, adjust any node-specific settings (such as the node's IP) in the configuration files if needed.

[grid@gc conf]$ scp -r ~/hadoop-0.20.2 rac1:/home/grid/

[grid@gc conf]$ scp -r ~/hadoop-0.20.2 rac2:/home/grid/
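A minimal sanity check (assuming the same /home/grid layout on the slaves) that the copy landed where expected:

for h in rac1 rac2; do ssh $h ls -d /home/grid/hadoop-0.20.2/conf; done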

7. Format namenode

-- formatting on each node

[grid@rac2 bin]$ pwd

/home/grid/hadoop-0.20.2/bin

[grid@gc bin]$ ./hadoop namenode -format

12-10-31 08:03:31 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG: host = gc.localdomain/192.168.2.100

STARTUP_MSG: args = [-format]

STARTUP_MSG: version = 0.20.2

STARTUP_MSG: build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

12-10-31 08:03:31 INFO namenode.FSNamesystem: fsOwner=grid,hadoop

12-10-31 08:03:31 INFO namenode.FSNamesystem: supergroup=supergroup

12-10-31 08:03:31 INFO namenode.FSNamesystem: isPermissionEnabled=true

12-10-31 08:03:32 INFO common.Storage: Image file of size 94 saved in 0 seconds.

12-10-31 08:03:32 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.

12-10-31 08:03:32 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at gc.localdomain/192.168.2.100

************************************************************/

8. Start hadoop

-- start the hadoop daemon on the master node

[grid@gc bin]$ pwd

/home/grid/hadoop-0.20.2/bin

[grid@gc bin]$ ./start-all.sh

starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-gc.localdomain.out

rac2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac2.localdomain.out

rac1: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac1.localdomain.out

The authenticity of host 'gc (192.168.2.100)' can't be established.

RSA key fingerprint is 8e:47:42:44:bd:e2:28:64:10:40:8e:b5:72:f9:6c:82.

Are you sure you want to continue connecting (yes/no)? yes

gc: Warning: Permanently added 'gc,192.168.2.100' (RSA) to the list of known hosts.

gc: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-gc.localdomain.out

starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-gc.localdomain.out

rac2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac2.localdomain.out

rac1: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac1.localdomain.out

9. Use jps to verify that each daemon started successfully

-- View the processes on the master node

[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps

27462 NameNode

29012 Jps

27672 JobTracker

27607 SecondaryNameNode

-- View the processes on the slave nodes

[grid@rac1 conf]$ /usr/java/jdk1.6.0_18/bin/jps

16722 Jps

16672 TaskTracker

16577 DataNode

[grid@rac2 conf]$ /usr/java/jdk1.6.0_18/bin/jps

31451 DataNode

31547 TaskTracker

31608 Jps
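Beyond jps, the cluster can also be inspected from the master node. Two standard checks for the 0.20 release (ports 50070 and 50030 are the defaults) are the dfsadmin report and the built-in web consoles:

./hadoop dfsadmin -report    # run from the bin directory; it should list both datanodes as live

HDFS web console:      http://192.168.2.100:50070
MapReduce web console: http://192.168.2.100:50030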

10. Problems encountered during installation

1) SSH mutual trust cannot be established

If no group is specified when the user is created, SSH mutual trust cannot be established. For example, after the following steps:

[root@gc ~] # useradd grid

[root@gc ~] # passwd grid

Resolution:

Create a user group first and specify it when creating the user:

[root@gc ~]# groupadd hadoop

[root@gc ~]# useradd -g hadoop grid

[root@gc ~]# id grid

uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)

[root@gc ~] # passwd grid

2) After starting hadoop, the slave nodes have no datanode process

Symptom:

After hadoop is started on the master node, the master's processes are normal, but the slave nodes have no datanode process.

-- Master node is normal

[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps

29843 Jps

29703 JobTracker

29634 SecondaryNameNode

29485 NameNode

-- Checking the processes on the two slave nodes again shows that there is still no datanode process.

[grid@rac1 bin]$ /usr/java/jdk1.6.0_18/bin/jps

5528 Jps

3213 TaskTracker

[grid@rac2 bin]$ /usr/java/jdk1.6.0_18/bin/jps

30518 TaskTracker

30623 Jps

Reason:

-- Looking back at the output from starting hadoop on the master node, then checking the datanode log on the slave nodes:

[grid@rac2 logs]$ pwd

/home/grid/hadoop-0.20.2/logs

[grid@rac1 logs]$ more hadoop-grid-datanode-rac1.localdomain.log

/************************************************************

STARTUP_MSG: Starting DataNode

STARTUP_MSG: host = rac1.localdomain/192.168.2.101

STARTUP_MSG: args = []

STARTUP_MSG: version = 0.20.2

STARTUP_MSG: build =; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

2012-11-18 07:43:15 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: can not create directory: /usr/hadoop-0.20.2/data

2012-11-18 07:43:15 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.

2012-11-18 07:43:15 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down DataNode at rac1.localdomain/192.168.2.101

************************************************************/

-- This shows that the data directory configured in dfs.data.dir in hdfs-site.xml had not been (and could not be) created.

Resolution:

Create the HDFS data directory on each node and update the dfs.data.dir parameter in hdfs-site.xml:

[grid@gc ~]$ mkdir -p /home/grid/hadoop-0.20.2/data

[grid@gc conf]$ vi hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/grid/hadoop-0.20.2/data</value>  <!-- note: this directory must already exist and be readable/writable -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
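Since dfs.data.dir must exist on every datanode, a short sketch to create the directory on all three hosts from gc (assuming the passwordless SSH set up earlier):

for h in gc rac1 rac2; do ssh $h mkdir -p /home/grid/hadoop-0.20.2/data; done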

-- After restarting hadoop, the processes on the slave nodes are normal.

[grid@gc bin]$ ./stop-all.sh

[grid@gc bin]$ ./start-all.sh
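After the restart, a quick way to confirm from gc that the datanode daemons came up on both slaves:

for h in rac1 rac2; do ssh $h /usr/java/jdk1.6.0_18/bin/jps; done    # expect DataNode and TaskTracker on each slave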

That is the full content of "How to Install Hadoop in Fully Distributed Mode". Thank you for reading! I hope the article has given you a good understanding and is of some help; if you want to learn more, welcome to follow the industry information channel.
