Construction and Management of Hadoop Environment on CentOS
Please download the attachment (linked at the end of this article).
Editor date: September 1, 2015
Experimental requirements:
Complete the installation and deployment of the Hadoop platform, test the function and performance of the Hadoop platform, record the experimental process, and submit the experimental report.
1) Master the Hadoop installation process
2) Understand how Hadoop works
3) Test the scalability of the Hadoop system
4) Test the stability of the Hadoop system
I. Prerequisites
Ensure that all required software is installed on each node in the cluster: JDK, ssh and Hadoop (2.6.0).
1) JDK must be installed (version 1.7 or above). It is recommended to use the Java version released by Sun.
2) ssh must be installed and sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons. A quick check is sketched below.
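A quick check on each node (an addition to the original text; the commands are the usual CentOS 6 ones and may differ on other systems) confirms both prerequisites:
[root@node1 ~]# java -version            # should report version 1.7 or above
[root@node1 ~]# service sshd status      # sshd should be running
[root@node1 ~]# ssh localhost 'echo ok'  # confirms the ssh client and server work together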
II. Installation and configuration of Hadoop
When the dfs and yarn services are started on the Master node, the corresponding services on the Slave nodes must be started automatically, so the Master must be able to reach the Slave machines over ssh. Because HDFS spans multiple servers that together form a distributed system, the node machines need password-free access to one another. The tasks of this section are to set up ssh, create users, set the Hadoop parameters, and complete the construction of the HDFS distributed environment.
Task implementation:
The task in this section requires four node machines to form a cluster, each with the CentOS-6.5-x86_64 system installed. The IP addresses used by the four node machines are 192.168.23.111, 192.168.23.112, 192.168.23.113 and 192.168.23.114, and the corresponding host names are node1, node2, node3 and node4. The node machine node1 is used as the NameNode, and the others as DataNodes.
On the node1 host
Edit /etc/hosts (vi /etc/hosts) and add the following:
192.168.23.111 node1
192.168.23.112 node2
192.168.23.113 node3
192.168.23.114 node4
Edit /etc/sysconfig/network (vi /etc/sysconfig/network) and modify:
HOSTNAME=node1
Turn off the firewall
chkconfig iptables off
service iptables stop
Do the same on the other node hosts, changing the value of HOSTNAME to the corresponding hostname; an example for node2 is sketched below.
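For example, on node2 the equivalent operations (an illustration, not an extra step) would be:
[root@node2 ~]# vi /etc/hosts                # add the same four node1-node4 entries
[root@node2 ~]# vi /etc/sysconfig/network    # set HOSTNAME=node2
[root@node2 ~]# chkconfig iptables off
[root@node2 ~]# service iptables stop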
Step 1
Create the hadoop user. On each of the four node machines, create a user hadoop with uid=660, using the passwords h2111, h3222, h4333 and h5444 respectively. Log in to the node1 node machine, create the hadoop user and set the password. The operation commands are as follows.
[root@node1 ~]# useradd -u 660 hadoop
[root@node1 ~]# passwd hadoop
The operation on the other node machines is the same; see the sketch below for a non-interactive variant.
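If you prefer not to type the password interactively on each machine, CentOS's passwd --stdin can set it in one line. This variant is an addition to the original procedure; the example below is for node2, whose password in the list above is h3222:
[root@node2 ~]# useradd -u 660 hadoop
[root@node2 ~]# echo 'h3222' | passwd --stdin hadoop    # non-interactive password set (CentOS/RHEL only)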
Step 2
Set up password-free ssh login from the master node machine to the slave node machines.
(1) On the node1 node, log in as the hadoop user or use su - hadoop to switch to the hadoop user. The operation command is as follows.
[root@node1 ~]# su - hadoop
(2) Use ssh-keygen to generate the key pair. The operation command is as follows.
[hadoop@node1 ~]$ ssh-keygen -t dsa
(3) Use ssh-copy-id to copy the public key to the node1, node2, node3 and node4 nodes. The operation commands are as follows.
[hadoop@node1 ~]$ cd ~
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node1
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node2
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node3
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node4
(4) On the node1 node machine, use ssh to test password-free login to node1 itself. The operation commands are as follows.
[hadoop@node1 ~]$ ssh node1
Last login: Mon Dec 22 08:42:38 2014 from node1
[hadoop@node1 ~]$ exit
logout
Connection to node1 closed.
The output above indicates that the operation was successful.
From the node1 node machine, continue to use ssh to test password-free login to the node2, node3 and node4 node machines with the following commands.
[hadoop@node1 ~]$ ssh node2
[hadoop@node1 ~]$ ssh node3
[hadoop@node1 ~]$ ssh node4
After testing and logging in to each node machine, enter exit to exit.
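As a final sanity check (an addition to the original steps), a short loop confirms that every node can be reached without a password prompt; the hostnames are the ones defined in /etc/hosts:
[hadoop@node1 ~]$ for h in node1 node2 node3 node4; do ssh $h hostname; done    # should print the four hostnames with no password prompts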
Step 3
Upload or download the hadoop-2.6.0.tar.gz package to the root user's home directory on the node1 node machine. If the hadoop package was compiled on the node1 node, copy the compiled package to that directory. First find the address of the required software package: http://mirror.bit.edu.cn/apache/hadoop/common/
Then use the wget command or another tool to download the required software package, for example:
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Step 4
Extract and install the files. The operation commands are as follows.
[root@node1 ~]# cd
[root@node1 ~]# tar xvzf hadoop-2.6.0.tar.gz
[root@node1 ~]# cd hadoop-2.6.0
[root@node1 hadoop-2.6.0]# mv * /home/hadoop/
Step 5
Modify the Hadoop configuration files. The main Hadoop configuration files are hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. They are located in the /home/hadoop/etc/hadoop/ directory; change into that directory to edit them. The operation command is as follows.
[root@node1 hadoop-2.6.0]# cd /home/hadoop/etc/hadoop/
(1) Modify hadoop-env.sh
If you have not already installed Java, install it first:
yum -y install java-1.7.0-openjdk*
Problems with the installation can be dealt with by referring to the tutorial below.
http://jingyan.baidu.com/article/4853e1e51d0c101909f72607.html
Check whether /etc/profile on each host contains the JAVA_HOME variable; if not, add the following at the end:
JAVA_HOME=/usr/lib/jvm/java-1.7.0
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH
export CLASSPATH
export HADOOP_HOME=/home/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
After saving and exiting, execute: source /etc/profile
In the hadoop-env.sh file, change the line export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0
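After saving /etc/profile and hadoop-env.sh, a quick check (not part of the original text) confirms that the Java settings are visible to the shell; the path below simply echoes the value configured above:
[root@node1 hadoop]# source /etc/profile
[root@node1 hadoop]# echo $JAVA_HOME                  # should print /usr/lib/jvm/java-1.7.0
[root@node1 hadoop]# $JAVA_HOME/bin/java -version     # should report a 1.7.x JVM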
(2) Modify slaves, which registers the hostnames of the DataNode nodes, and add the hostnames of the three nodes node2, node3 and node4, as shown below.
[root@node1 hadoop]# vi slaves
node2
node3
node4
(3) Modify core-site.xml and change the content of the file to the following.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
Here node1 is the NameNode (Master) node machine of the cluster; node1 can also be replaced by its IP address.
(4) Modify hdfs-site.xml and change the content of the file to the following.
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
To simplify teaching, the secondary NameNode also uses the node1 node machine. The data generated by the NameNode is stored in the /home/hadoop/dfs/name directory, the data generated by the DataNodes is stored in the /home/hadoop/dfs/data directory, and three replicas are kept.
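As a quick way to confirm that the configuration files are being read as intended, hdfs getconf can print individual keys; this check is an addition to the original procedure and assumes the hdfs command is on the PATH set in /etc/profile:
[hadoop@node1 ~]$ hdfs getconf -confKey dfs.replication    # expect 3
[hadoop@node1 ~]$ hdfs getconf -confKey fs.defaultFS       # expect hdfs://node1:9000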
(5) Rename the file mapred-site.xml.template to mapred-site.xml. The operation is as follows.
[root@node1 hadoop]# mv mapred-site.xml.template mapred-site.xml
Change the content of the file to the following.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
  </property>
</configuration>
(6) Modify yarn-site.xml and change the content of the file to the following.
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.23.111</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
  </property>
</configuration>
Step 6
Modify the owner and group of the /home/hadoop/ directory as follows.
[root@node1 hadoop]# chown -R hadoop:hadoop /home/hadoop
Step 7
Copy the configured Hadoop system to the other node machines as follows.
[root@node1 hadoop]# cd /home/hadoop
[root@node1 hadoop]# scp -r hadoop-2.6.0 hadoop@node2:/home/hadoop
[root@node1 hadoop]# scp -r hadoop-2.6.0 hadoop@node3:/home/hadoop
[root@node1 hadoop]# scp -r hadoop-2.6.0 hadoop@node4:/home/hadoop
Step 8
Log in to the node2, node3 and node4 node machines and modify the owner and group of the /home/hadoop/ directory.
[root@node2 ~]# chown -R hadoop:hadoop /home/hadoop
[root@node3 ~]# chown -R hadoop:hadoop /home/hadoop
[root@node4 ~]# chown -R hadoop:hadoop /home/hadoop
At this point, the whole Hadoop distributed system has been built.
III. Management of Hadoop
1. Format a new distributed file system
Format a new distributed file system first
$ cd /home/hadoop
$ bin/hadoop namenode -format
If the operation succeeds, the output includes a line stating that the name directory (here /home/hadoop/dfs/name) has been successfully formatted.
Check the output to make sure that the distributed file system was formatted successfully.
After execution, you can see the /home/hadoop/dfs/name directory on the master machine.
2. Start the distributed file service
sbin/start-all.sh
or
sbin/start-dfs.sh
sbin/start-yarn.sh
Use a browser to open http://192.168.23.111:50070 on the Master node machine to view the NameNode status and browse the Datanodes page.
Use a browser to open http://192.168.23.111:8088 on the Master node machine to view all applications.
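Before relying on the web pages, it can help to confirm that the expected daemons are actually running. This jps check is an addition to the original text and assumes the JDK's jps tool is available on each node:
[hadoop@node1 ~]$ jps              # on node1, expect NameNode, SecondaryNameNode and ResourceManager
[hadoop@node1 ~]$ ssh node2 jps    # on node2-node4, expect DataNode and NodeManager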
3. Turn off the distributed file service
sbin/stop-all.sh
4. File management
Create a swvtc directory in hdfs with the following command.
[hadoop@node1 ~]$ hdfs dfs -mkdir /swvtc    # similar to mkdir /swvtc
To list the root directory of hdfs, the operation command is as follows.
[hadoop@node1 ~]$ hdfs dfs -ls /    # similar to ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-23 10:07 /swvtc
To edit the file jie.txt on the local system, the command is as follows.
[hadoop@node1 ~]$ vi jie.txt
Add content:
Hi,Hadoop!
Upload the file jie.txt to the / swvtc directory of hdfs with the following command.
[hadoop@node1 ~]$ hdfs dfs -put jie.txt /swvtc
Download the file from hdfs. Operation command:
[hadoop@node1 ~]$ hdfs dfs -get /swvtc/jie.txt
Check the contents of /swvtc/jie.txt in hdfs with the following command:
[hadoop@node1 ~]$ hdfs dfs -text /swvtc/jie.txt
Hi,Hadoop!
hadoop dfs -get in getin    gets the file in from HDFS and renames it getin; like -put, it works on files and directories.
hadoop dfs -rmr out    deletes the specified file or directory out from HDFS.
hadoop dfs -cat in/*    views the contents of the files under the in directory on HDFS.
hadoop dfsadmin -report    shows the basic statistics of HDFS.
hadoop dfsadmin -safemode leave    exits safe mode.
hadoop dfsadmin -safemode enter    enters safe mode.
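To test the function of the platform end to end (one of the experimental requirements), a bundled MapReduce example can be run against the uploaded file. This step is an addition to the original text; the examples jar path below is the usual location in a Hadoop 2.6.0 distribution, and the /swvtc-out output directory is just an illustrative name:
[hadoop@node1 ~]$ cd /home/hadoop
[hadoop@node1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /swvtc /swvtc-out    # word count over /swvtc
[hadoop@node1 hadoop]$ hdfs dfs -cat /swvtc-out/part-r-00000    # view the resulting word counts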
5. Add nodes
Scalability is an important feature of HDFS. First install Hadoop on the newly added node, then modify the $HADOOP_HOME/conf/master file to add the NameNode hostname, then modify the $HADOOP_HOME/conf/slaves file on the NameNode node to add the new node's hostname, and finally establish a password-free ssh connection to the new node.
Run the startup command:
./start-all.sh
You can then view the newly added DataNode at http://(Master node hostname):50070.
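If a full cluster restart is undesirable, the new node's daemons can instead be started individually on that node. This variant is not in the original text, and node5 below is a hypothetical name for the newly added machine, assumed to use the same Hadoop layout under /home/hadoop:
[hadoop@node5 ~]$ /home/hadoop/sbin/hadoop-daemon.sh start datanode     # register with the NameNode
[hadoop@node5 ~]$ /home/hadoop/sbin/yarn-daemon.sh start nodemanager    # register with the ResourceManager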
6. Load balancing
Run the command:
./start-balancer.sh
This rebalances the distribution of data blocks across the DataNodes according to the balancer's selection policy.
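start-balancer.sh also accepts a threshold argument, the allowed deviation (in percent) of each DataNode's utilization from the cluster average; the value 10 below is only an illustrative choice:
./start-balancer.sh -threshold 10    # rebalance until every DataNode is within 10% of the average utilization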
Attachment: http://down.51cto.com/data/2366095