Construction and Management of Hadoop Environment on CentOS
Please download the attachment (linked at the end of this article).
Editor date: September 1, 2015
Experimental requirements:
Complete the installation and deployment of the Hadoop platform, test the function and performance of the Hadoop platform, record the experimental process, and submit the experimental report.
1) Master the Hadoop installation process
2) Understand how Hadoop works
3) Test the scalability of the Hadoop system
4) Test the stability of the Hadoop system
I. Prerequisites
Ensure that all required software is installed on each node in the cluster: JDK, ssh and Hadoop (2.6.0).
1) JDK must be installed (version 1.7 or above). It is recommended to use the Java version released by Sun.
2) ssh must be installed and sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons. A quick check is sketched below.
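A quick check on each node (an addition to the original text; the commands are the usual CentOS 6 ones and may differ on other systems) confirms both prerequisites:
[root@node1 ~]# java -version            # should report version 1.7 or above
[root@node1 ~]# service sshd status      # sshd should be running
[root@node1 ~]# ssh localhost 'echo ok'  # confirms the ssh client and server work together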
II. Installation and configuration of Hadoop
When the dfs and yarn services are started on the Master node, the corresponding services on the Slave nodes must be started automatically, so the Master must be able to reach the Slave machines over ssh. Because HDFS spans multiple servers that together form a distributed system, the node machines need password-free access to one another. The tasks of this section are to set up ssh, create users, set the Hadoop parameters, and complete the construction of the HDFS distributed environment.
Task implementation:
The task in this section requires four node machines to form a cluster, each with the CentOS-6.5-x86_64 system installed. The IP addresses used by the four node machines are 192.168.23.111, 192.168.23.112, 192.168.23.113 and 192.168.23.114, and the corresponding host names are node1, node2, node3 and node4. The node machine node1 is used as the NameNode, and the others as DataNodes.
On the node1 host
Edit /etc/hosts (vi /etc/hosts) and add the following:
192.168.23.111 node1
192.168.23.112 node2
192.168.23.113 node3
192.168.23.114 node4
Edit /etc/sysconfig/network (vi /etc/sysconfig/network) and modify:
HOSTNAME=node1
Turn off the firewall
chkconfig iptables off
service iptables stop
Do the same on the other node hosts, changing the value of HOSTNAME to the corresponding hostname; an example for node2 is sketched below.
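For example, on node2 the equivalent operations (an illustration, not an extra step) would be:
[root@node2 ~]# vi /etc/hosts                # add the same four node1-node4 entries
[root@node2 ~]# vi /etc/sysconfig/network    # set HOSTNAME=node2
[root@node2 ~]# chkconfig iptables off
[root@node2 ~]# service iptables stop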
Step 1
Create the hadoop user. On each of the four node machines, create a user hadoop with uid=660, using the passwords h2111, h3222, h4333 and h5444 respectively. Log in to the node1 node machine, create the hadoop user and set the password. The operation commands are as follows.
[root@node1 ~]# useradd -u 660 hadoop
[root@node1 ~]# passwd hadoop
The operation on the other node machines is the same; see the sketch below for a non-interactive variant.
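If you prefer not to type the password interactively on each machine, CentOS's passwd --stdin can set it in one line. This variant is an addition to the original procedure; the example below is for node2, whose password in the list above is h3222:
[root@node2 ~]# useradd -u 660 hadoop
[root@node2 ~]# echo 'h3222' | passwd --stdin hadoop    # non-interactive password set (CentOS/RHEL only)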
Step 2
Set up password-free ssh login from the master node machine to the slave node machines.
(1) On the node1 node, log in as the hadoop user or use su - hadoop to switch to the hadoop user. The operation command is as follows.
[root@node1 ~]# su - hadoop
(2) Use ssh-keygen to generate the key pair. The operation command is as follows.
[hadoop@node1 ~]$ ssh-keygen -t dsa
(3) Use ssh-copy-id to copy the public key to the node1, node2, node3 and node4 nodes. The operation commands are as follows.
[hadoop@node1 ~]$ cd ~
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node1
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node2
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node3
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node4
(4) On the node1 node machine, use ssh to test password-free login to node1 itself. The operation commands are as follows.
[hadoop@node1 ~]$ ssh node1
Last login: Mon Dec 22 08:42:38 2014 from node1
[hadoop@node1 ~]$ exit
logout
Connection to node1 closed.
The output above indicates that the operation was successful.
From the node1 node machine, continue to use ssh to test password-free login to the node2, node3 and node4 node machines with the following commands.
[hadoop@node1 ~]$ ssh node2
[hadoop@node1 ~]$ ssh node3
[hadoop@node1 ~]$ ssh node4
After testing and logging in to each node machine, enter exit to exit.
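As a final sanity check (an addition to the original steps), a short loop confirms that every node can be reached without a password prompt; the hostnames are the ones defined in /etc/hosts:
[hadoop@node1 ~]$ for h in node1 node2 node3 node4; do ssh $h hostname; done    # should print the four hostnames with no password prompts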
Step 3
Upload or download the hadoop-2.6.0.tar.gz package to the root user's home directory on the node1 node machine. If the hadoop package was compiled on the node1 node, copy the compiled package to that directory. First find the address of the required software package: http://mirror.bit.edu.cn/apache/hadoop/common/
Then use the wget command or another tool to download the required software package, for example:
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Step 4
Extract and install the files. The operation commands are as follows.
[root@node1 ~]# cd
[root@node1 ~]# tar xvzf hadoop-2.6.0.tar.gz
[root@node1 ~]# cd hadoop-2.6.0
[root@node1 hadoop-2.6.0]# mv * /home/hadoop/
Step 5
Modify the Hadoop configuration files. The main Hadoop configuration files are hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. They are located in the /home/hadoop/etc/hadoop/ directory; change into that directory to edit them. The operation command is as follows.
[root@node1 hadoop-2.6.0]# cd /home/hadoop/etc/hadoop/
(1) Modify hadoop-env.sh
If you have not already installed Java, install it first:
yum -y install java-1.7.0-openjdk*
Problems with the installation can be dealt with by referring to the tutorial below.
http://jingyan.baidu.com/article/4853e1e51d0c101909f72607.html
Check whether /etc/profile on each host contains the JAVA_HOME variable; if not, add the following at the end:
JAVA_HOME=/usr/lib/jvm/java-1.7.0
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH
export CLASSPATH
export HADOOP_HOME=/home/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
After saving and exiting, execute: source /etc/profile
In the hadoop-env.sh file, change the line export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0
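After saving /etc/profile and hadoop-env.sh, a quick check (not part of the original text) confirms that the Java settings are visible to the shell; the path below simply echoes the value configured above:
[root@node1 hadoop]# source /etc/profile
[root@node1 hadoop]# echo $JAVA_HOME                  # should print /usr/lib/jvm/java-1.7.0
[root@node1 hadoop]# $JAVA_HOME/bin/java -version     # should report a 1.7.x JVM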
(2) Modify slaves, which registers the hostnames of the DataNode nodes, and add the hostnames of the three nodes node2, node3 and node4, as shown below.
[root@node1 hadoop]# vi slaves
node2
node3
node4
(3) Modify core-site.xml and change the content of the file to the following.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
Here node1 is the NameNode (Master) node machine of the cluster; node1 can also be replaced by its IP address.
(4) Modify hdfs-site.xml and change the content of the file to the following.
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
To simplify teaching, the secondary NameNode also uses the node1 node machine. The data generated by the NameNode is stored in the /home/hadoop/dfs/name directory, the data generated by the DataNodes is stored in the /home/hadoop/dfs/data directory, and three replicas are kept.
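As a quick way to confirm that the configuration files are being read as intended, hdfs getconf can print individual keys; this check is an addition to the original procedure and assumes the hdfs command is on the PATH set in /etc/profile:
[hadoop@node1 ~]$ hdfs getconf -confKey dfs.replication    # expect 3
[hadoop@node1 ~]$ hdfs getconf -confKey fs.defaultFS       # expect hdfs://node1:9000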
(5) Rename the file mapred-site.xml.template to mapred-site.xml. The operation is as follows.
[root@node1 hadoop]# mv mapred-site.xml.template mapred-site.xml
Change the content of the file to the following.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
  </property>
</configuration>
(6) Modify yarn-site.xml and change the content of the file to the following.
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.23.111</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
  </property>
</configuration>
Step 6
Modify the owner and group of the /home/hadoop/ directory as follows.
[root@node1 hadoop]# chown -R hadoop:hadoop /home/hadoop
Step 7
Copy the configured Hadoop system to the other node machines as follows.
[root@node1 hadoop]# cd /home/hadoop
[root@node1 hadoop]# scp -r hadoop-2.6.0 hadoop@node2:/home/hadoop
[root@node1 hadoop]# scp -r hadoop-2.6.0 hadoop@node3:/home/hadoop
[root@node1 hadoop]# scp -r hadoop-2.6.0 hadoop@node4:/home/hadoop
Step 8
Log in to the node2, node3 and node4 node machines and modify the owner and group of the /home/hadoop/ directory.
[root@node2 ~]# chown -R hadoop:hadoop /home/hadoop
[root@node3 ~]# chown -R hadoop:hadoop /home/hadoop
[root@node4 ~]# chown -R hadoop:hadoop /home/hadoop
At this point, the whole Hadoop distributed system has been built.
III. Management of Hadoop
1. Format a new distributed file system
Format a new distributed file system first
$ cd /home/hadoop
$ bin/hadoop namenode -format
If the operation succeeds, the output includes a line stating that the name directory (here /home/hadoop/dfs/name) has been successfully formatted.
Check the output to make sure that the distributed file system was formatted successfully.
After execution, you can see the /home/hadoop/dfs/name directory on the master machine.
2. Start the distributed file service
sbin/start-all.sh
or
sbin/start-dfs.sh
sbin/start-yarn.sh
Use a browser to open http://192.168.23.111:50070 on the Master node machine to view the NameNode status and browse the Datanodes page.
Use a browser to open http://192.168.23.111:8088 on the Master node machine to view all applications.
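Before relying on the web pages, it can help to confirm that the expected daemons are actually running. This jps check is an addition to the original text and assumes the JDK's jps tool is available on each node:
[hadoop@node1 ~]$ jps              # on node1, expect NameNode, SecondaryNameNode and ResourceManager
[hadoop@node1 ~]$ ssh node2 jps    # on node2-node4, expect DataNode and NodeManager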
3. Turn off the distributed file service
sbin/stop-all.sh
4. File management
Create a swvtc directory in hdfs with the following command.
[hadoop@node1 ~]$ hdfs dfs -mkdir /swvtc    # similar to mkdir /swvtc
To list the root directory of hdfs, the operation command is as follows.
[hadoop@node1 ~]$ hdfs dfs -ls /    # similar to ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-23 10:07 /swvtc
To edit the file jie.txt on the local system, the command is as follows.
[hadoop@node1 ~]$ vi jie.txt
Add content:
Hi,Hadoop!
Upload the file jie.txt to the / swvtc directory of hdfs with the following command.
[hadoop@node1 ~]$ hdfs dfs -put jie.txt /swvtc
Download the file from hdfs. Operation command:
[hadoop@node1 ~]$ hdfs dfs -get /swvtc/jie.txt
Check the contents of /swvtc/jie.txt in hdfs with the following command:
[hadoop@node1 ~]$ hdfs dfs -text /swvtc/jie.txt
Hi,Hadoop!
hadoop dfs -get in getin    gets the file in from HDFS and renames it getin; like -put, it works on files and directories.
hadoop dfs -rmr out    deletes the specified file or directory out from HDFS.
hadoop dfs -cat in/*    views the contents of the files under the in directory on HDFS.
hadoop dfsadmin -report    shows the basic statistics of HDFS.
hadoop dfsadmin -safemode leave    exits safe mode.
hadoop dfsadmin -safemode enter    enters safe mode.
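To test the function of the platform end to end (one of the experimental requirements), a bundled MapReduce example can be run against the uploaded file. This step is an addition to the original text; the examples jar path below is the usual location in a Hadoop 2.6.0 distribution, and the /swvtc-out output directory is just an illustrative name:
[hadoop@node1 ~]$ cd /home/hadoop
[hadoop@node1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /swvtc /swvtc-out    # word count over /swvtc
[hadoop@node1 hadoop]$ hdfs dfs -cat /swvtc-out/part-r-00000    # view the resulting word counts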
5. Add nodes
Scalability is an important feature of HDFS. First install Hadoop on the newly added node, then modify the $HADOOP_HOME/conf/master file to add the NameNode hostname, then modify the $HADOOP_HOME/conf/slaves file on the NameNode node to add the new node's hostname, and finally establish a password-free ssh connection to the new node.
Run the startup command:
./start-all.sh
You can then view the newly added DataNode at http://(Master node hostname):50070.
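If a full cluster restart is undesirable, the new node's daemons can instead be started individually on that node. This variant is not in the original text, and node5 below is a hypothetical name for the newly added machine, assumed to use the same Hadoop layout under /home/hadoop:
[hadoop@node5 ~]$ /home/hadoop/sbin/hadoop-daemon.sh start datanode     # register with the NameNode
[hadoop@node5 ~]$ /home/hadoop/sbin/yarn-daemon.sh start nodemanager    # register with the ResourceManager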
6. Load balancing
Run the command:
./start-balancer.sh
This rebalances the distribution of data blocks across the DataNodes according to the balancer's selection policy.
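start-balancer.sh also accepts a threshold argument, the allowed deviation (in percent) of each DataNode's utilization from the cluster average; the value 10 below is only an illustrative choice:
./start-balancer.sh -threshold 10    # rebalance until every DataNode is within 10% of the average utilization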
Attachment: http://down.51cto.com/data/2366095