2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article walks through installing Hadoop in fully distributed mode. Many readers may not be familiar with the process, so the full steps are shared below for reference; I hope you find them useful.
Hadoop fully distributed mode installation steps
Introduction to Hadoop mode
Stand-alone (local) mode: easy to install, requires almost no configuration, but is only useful for debugging.
Pseudo-distributed mode: starts all five daemons (namenode, datanode, jobtracker, tasktracker, secondary namenode) on a single node, simulating the nodes of a distributed cluster.
Fully distributed mode: a normal Hadoop cluster in which multiple nodes each perform their own role.
Installation environment
Operating platform: VMware
Operating system: Oracle Linux 5.6
Software versions: hadoop-0.20.2, jdk-6u18
Cluster architecture: 3 nodes; master node (gc), slave nodes (rac1, rac2)
Installation steps
1. Download Hadoop and the JDK:
e.g. hadoop-0.20.2 and jdk-6u18
2. Configure the hosts file
All nodes (gc, rac1, rac2) modify /etc/hosts so that the hostnames can be resolved to each other's IP addresses.
[root@gc ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
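The hosts mapping above can be sanity-checked with a short script. This is a minimal sketch: the hosts table is inlined with this article's example IPs so it runs anywhere; on a real node you would read /etc/hosts instead.

```shell
#!/bin/sh
# Verify that every cluster hostname maps to the expected IP.
# The table below mirrors the /etc/hosts entries from this article.
hosts_table="
192.168.2.100 gc.localdomain gc
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
"
check() {  # usage: check <hostname> <expected-ip>
    echo "$hosts_table" | grep -w "$1" | grep -qw "$2" \
        && echo "$1 -> $2 OK" \
        || echo "$1 MISSING"
}
check gc   192.168.2.100
check rac1 192.168.2.101
check rac2 192.168.2.102
```

Run the same check on every node; a `MISSING` line means a node will not be able to resolve a peer by name.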
3. Set up a Hadoop run account
Create the Hadoop run account on every node.
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid    # the group must be specified here, otherwise SSH mutual trust may fail to establish
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
4. Configure passwordless SSH
Be careful to log in as the Hadoop run account and operate in that user's home directory.
Do the following on every node:
[hadoop@gc ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
54:80:fd:77:6b:87:97:ce:0f:32:34:43:d1:d2:c2:0d hadoop@gc.localdomain
[hadoop@gc ~]$ cd .ssh
[hadoop@gc .ssh]$ ls
id_rsa id_rsa.pub
Append every node's public key into one shared authorized_keys file and distribute it; after that, the nodes can ssh to each other without passwords.
This can all be done from one of the nodes (gc):
[hadoop@gc .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@gc .ssh]$ ssh rac1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'rac1 (192.168.2.101)' can't be established.
RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rac1,192.168.2.101' (RSA) to the list of known hosts.
hadoop@rac1's password:
[hadoop@gc .ssh]$ ssh rac2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'rac2 (192.168.2.102)' can't be established.
RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rac2,192.168.2.102' (RSA) to the list of known hosts.
hadoop@rac2's password:
[hadoop@gc .ssh]$ scp ~/.ssh/authorized_keys rac1:~/.ssh/authorized_keys
hadoop@rac1's password:
authorized_keys 100% 1213 1.2KB/s 00:00
[hadoop@gc .ssh]$ scp ~/.ssh/authorized_keys rac2:~/.ssh/authorized_keys
hadoop@rac2's password:
authorized_keys 100% 1213 1.2KB/s 00:00
[hadoop@gc .ssh]$ ll
total 16
-rw-rw-r-- 1 hadoop hadoop 1213 10-30 09:18 authorized_keys
-rw------- 1 hadoop hadoop 1675 10-30 09:05 id_rsa
-rw-r--r-- 1 hadoop hadoop  403 10-30 09:05 id_rsa.pub
-- Test the connections:
[grid@gc .ssh]$ ssh rac1 date
Sun Nov 18 01:35:39 CST 2012
[grid@gc .ssh]$ ssh rac2 date
Tue Oct 30 09:52:46 CST 2012
-- As you can see, this step is the same as establishing user equivalence with SSH when configuring Oracle RAC.
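The key-distribution steps above can be sketched as one loop. This is a hedged sketch, not the article's exact session: with DRYRUN=1 (the default here) it only prints the commands it would run, so you can review them before unsetting DRYRUN on a real cluster. The node names are this article's example hosts.

```shell
#!/bin/sh
# Print (or, with DRYRUN=0, execute) the commands that merge every
# node's public key into one authorized_keys file and push it back out.
DRYRUN=${DRYRUN:-1}
run() { [ "$DRYRUN" = 1 ] && echo "$*" || eval "$*"; }

NODES="rac1 rac2"
# Start with this node's own key.
run "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
# Pull each remote node's key into the local file.
for n in $NODES; do
    run "ssh $n cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
done
# Push the merged file to every node.
for n in $NODES; do
    run "scp ~/.ssh/authorized_keys $n:~/.ssh/authorized_keys"
done
```

After running it for real, `ssh rac1 date` should succeed without a password prompt from any node.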
5. Extract the Hadoop installation package
-- You can extract and configure on one node first.
[grid@gc ~]$ ll
total 43580
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
[grid@gc ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz
[grid@gc ~]$ ll
total 43584
drwxr-xr-x 12 grid hadoop     4096 2010-02-19 hadoop-0.20.2
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
-- Install the JDK on every node:
[root@gc ~]# ./jdk-6u18-linux-x64-rpm.bin
6. Hadoop configuration files
* Configure hadoop-env.sh
[root@gc conf]# pwd
/root/hadoop-0.20.2/conf
-- Set the JDK installation path:
[root@gc conf]# vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
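Editing hadoop-env.sh by hand on every node is error-prone; the change can also be scripted. This sketch assumes GNU sed's `-i` flag and works on a throwaway copy so it is safe to try; point CONF at the real conf directory to use it. The JDK path matches this article's install.

```shell
#!/bin/sh
# Replace the commented-out JAVA_HOME template line in hadoop-env.sh
# with the real JDK path, then show the result.
CONF=$(mktemp -d)
cat > "$CONF/hadoop-env.sh" <<'EOF'
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
EOF
sed -i 's|^# *export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.6.0_18|' "$CONF/hadoop-env.sh"
grep '^export JAVA_HOME' "$CONF/hadoop-env.sh"
```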
* Configure the NameNode: edit the site files
-- Edit core-site.xml:
[grid@gc conf]$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.2.100:9000</value>  <!-- in fully distributed mode an IP must be used here; same below -->
  </property>
</configuration>
Note: fs.default.name is the NameNode's IP address and port.
-- Edit hdfs-site.xml:
[grid@gc conf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/grid/hadoop-0.20.2/data</value>  <!-- this directory must already exist and be readable/writable -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
-- Edit mapred-site.xml:
[grid@gc conf]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.100:9001</value>
  </property>
</configuration>
* Configure the masters and slaves files
[grid@gc conf]$ vi masters
gc
[grid@gc conf]$ vi slaves
rac1
rac2
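Before copying the configuration to the other nodes, the files above can be sanity-checked with a short script. This is a minimal sketch that builds a throwaway copy of the files (with this article's example values) so the check itself is reproducible; run the same `value` helper against your real conf directory.

```shell
#!/bin/sh
# Extract each <value> from a site file and confirm masters/slaves
# are populated.
CONF=$(mktemp -d)
cat > "$CONF/core-site.xml" <<'EOF'
<configuration><property>
  <name>fs.default.name</name><value>hdfs://192.168.2.100:9000</value>
</property></configuration>
EOF
printf 'gc\n'         > "$CONF/masters"
printf 'rac1\nrac2\n' > "$CONF/slaves"

value() { sed -n 's|.*<value>\(.*\)</value>.*|\1|p' "$1"; }
value "$CONF/core-site.xml"   # prints hdfs://192.168.2.100:9000
wc -l < "$CONF/slaves"        # prints the slave count
```

If `value` prints nothing for one of the site files, a tag is malformed and the daemons will fall back to defaults silently.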
* Copy Hadoop to every node
-- Copy the configured Hadoop directory from the gc host to each node.
-- Note: after copying to another node, adjust any node-specific IP settings in the configuration files.
[grid@gc conf]$ scp -r hadoop-0.20.2 rac1:/home/grid/
[grid@gc conf]$ scp -r hadoop-0.20.2 rac2:/home/grid/
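With more slaves, the copy step is easier driven by the slaves file itself. This sketch only prints the scp commands (pipe the output to `sh`, or drop the `echo`, on a real cluster); the slaves file is inlined here so the loop is runnable anywhere, and the paths match this article's layout.

```shell
#!/bin/sh
# Generate one "scp -r" command per node listed in the slaves file.
SLAVES=$(mktemp)
printf 'rac1\nrac2\n' > "$SLAVES"
while read -r node; do
    echo "scp -r /home/grid/hadoop-0.20.2 $node:/home/grid/"
done < "$SLAVES"
```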
7. Format the NameNode
-- Format on the master (NameNode) node:
[grid@gc bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@gc bin]$ ./hadoop namenode -format
12-10-31 08:03:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = gc.localdomain/192.168.2.100
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12-10-31 08:03:31 INFO namenode.FSNamesystem: fsOwner=grid,hadoop
12-10-31 08:03:31 INFO namenode.FSNamesystem: supergroup=supergroup
12-10-31 08:03:31 INFO namenode.FSNamesystem: isPermissionEnabled=true
12-10-31 08:03:32 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12-10-31 08:03:32 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.
12-10-31 08:03:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gc.localdomain/192.168.2.100
************************************************************/
8. Start Hadoop
-- Start the Hadoop daemons on the master node:
[grid@gc bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@gc bin]$ ./start-all.sh
starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-gc.localdomain.out
rac2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac2.localdomain.out
rac1: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac1.localdomain.out
The authenticity of host 'gc (192.168.2.100)' can't be established.
RSA key fingerprint is 8e:47:42:44:bd:e2:28:64:10:40:8e:b5:72:f9:6c:82.
Are you sure you want to continue connecting (yes/no)? yes
gc: Warning: Permanently added 'gc,192.168.2.100' (RSA) to the list of known hosts.
gc: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-gc.localdomain.out
starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-gc.localdomain.out
rac2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac2.localdomain.out
rac1: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac1.localdomain.out
9. Use jps to verify that each daemon started successfully
-- View the daemons on the master node:
[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps
27462 NameNode
29012 Jps
27672 JobTracker
27607 SecondaryNameNode
-- View the daemons on the slave nodes:
[grid@rac1 conf]$ /usr/java/jdk1.6.0_18/bin/jps
16722 Jps
16672 TaskTracker
16577 DataNode
[grid@rac2 conf]$ /usr/java/jdk1.6.0_18/bin/jps
31451 DataNode
31547 TaskTracker
31608 Jps
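The jps check above can be automated so a missing daemon is reported by name. This sketch inlines a sample of the master's jps output so the logic is testable as-is; on a real node, replace the assignment with `jps_out=$(/usr/java/jdk1.6.0_18/bin/jps)` and adjust the expected daemon list for slaves (DataNode, TaskTracker).

```shell
#!/bin/sh
# Report any expected master daemon absent from jps output.
jps_out="27462 NameNode
29012 Jps
27672 JobTracker
27607 SecondaryNameNode"

missing=0
for d in NameNode JobTracker SecondaryNameNode; do
    echo "$jps_out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" = 0 ] && echo "master daemons OK"
```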
10. Problems encountered during installation
1) SSH mutual trust could not be established
If no group is specified when the user is created, SSH mutual trust cannot be established. The offending steps:
[root@gc ~]# useradd grid
[root@gc ~]# passwd grid
Resolution:
Create a group first, and specify it when creating the user:
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
2) After starting Hadoop, the slave nodes have no datanode process
Symptom:
After Hadoop is started on the master node, the master's processes are normal, but the slave nodes have no datanode process.
-- The master node is normal:
[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps
29843 Jps
29703 JobTracker
29634 SecondaryNameNode
29485 NameNode
-- Checking the two slave nodes shows that neither has a datanode process:
[grid@rac1 bin]$ /usr/java/jdk1.6.0_18/bin/jps
5528 Jps
3213 TaskTracker
[grid@rac2 bin]$ /usr/java/jdk1.6.0_18/bin/jps
30518 TaskTracker
30623 Jps
Cause:
-- Looking back at the output from starting Hadoop on the master, check the datanode startup log on the slave nodes:
[grid@rac1 logs]$ pwd
/home/grid/hadoop-0.20.2/logs
[grid@rac1 logs]$ more hadoop-grid-datanode-rac1.localdomain.log
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = rac1.localdomain/192.168.2.101
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2012-11-18 07:43:15 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: can not create directory: /usr/hadoop-0.20.2/data
2012-11-18 07:43:15 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
2012-11-18 07:43:15 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at rac1.localdomain/192.168.2.101
************************************************************/
-- The data directory configured in hdfs-site.xml had not been created.
Resolution:
Create the HDFS data directory on every node, and fix the hdfs-site.xml parameter accordingly:
[grid@gc ~]$ mkdir -p /home/grid/hadoop-0.20.2/data
[grid@gc conf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/grid/hadoop-0.20.2/data</value>  <!-- this directory must already exist and be readable/writable -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
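The fix above can be wrapped in a small script that creates the data directory and verifies it is writable before Hadoop is restarted. This sketch uses a temporary root so it is safe to run anywhere; on the real nodes the directory is /home/grid/hadoop-0.20.2/data.

```shell
#!/bin/sh
# Create the dfs.data.dir and confirm the run account can write to it.
ROOT=$(mktemp -d)                 # stand-in for /home/grid on a real node
DATA="$ROOT/hadoop-0.20.2/data"
mkdir -p "$DATA"
chmod 755 "$DATA"
if [ -d "$DATA" ] && [ -w "$DATA" ]; then
    echo "data dir ready: $DATA"
else
    echo "data dir NOT writable: $DATA" >&2
fi
```

Run the equivalent on every slave node; the datanode refuses to start if any directory in dfs.data.dir is missing or unwritable.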
-- After restarting Hadoop, the slave processes are normal:
[grid@gc bin]$ ./stop-all.sh
[grid@gc bin]$ ./start-all.sh
That covers the full process of installing Hadoop in fully distributed mode. Thank you for reading! I hope the walkthrough was helpful; to learn more, follow the industry information channel.