Introduction to the distributed environment
Previously, we introduced how to build a pseudo-distributed Hadoop environment on a single machine. In practice, however, a real deployment is a distributed cluster spanning multiple machines and nodes, so this article briefly describes how to build a Hadoop distributed environment across several machines.
I have prepared three machines here, with the following IP addresses:
192.168.77.128
192.168.77.130
192.168.77.134
First, edit the /etc/hosts configuration file on all three machines, set each machine's hostname, and add entries for the hostnames of the other machines:
[root@localhost ~]# vim /etc/hosts    # all three machines need to do this
192.168.77.128 hadoop000
192.168.77.130 hadoop001
192.168.77.134 hadoop002
[root@localhost ~]# reboot
The roles of the three machines in the cluster:
hadoop000: NameNode, DataNode, ResourceManager, NodeManager
hadoop001: DataNode, NodeManager
hadoop002: DataNode, NodeManager
Configure SSH password-free login
The machines in the cluster need to communicate with each other, so we first configure password-free login. Run the following command on each of the three machines to generate a key pair:
[root@hadoop000 ~]# ssh-keygen -t rsa    # all three machines need to execute this command to generate a key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
0d:00:bd:a3:69:b7:03:d5:89:dc:a8:a2:ca:28:d6:06 root@hadoop000
The key's randomart image is:
+--[ RSA 2048]----+
(randomart omitted)
+-----------------+
[root@hadoop000 ~]# ls .ssh/
authorized_keys  id_rsa  id_rsa.pub  known_hosts
[root@hadoop000 ~]#
Starting from hadoop000, execute the following commands to copy the public key to each machine (including hadoop000 itself):
[root@hadoop000 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
[root@hadoop000 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
[root@hadoop000 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
Note: the other two machines also need to execute the above three commands.
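For example, on hadoop001 the same three commands would be (hadoop002 is analogous):
[root@hadoop001 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
[root@hadoop001 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
[root@hadoop001 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002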
After the copy is complete, test whether password-free login works:
[root@hadoop000 ~]# ssh hadoop000
Last login: Mon Apr  2 17:20:02 2018 from localhost
[root@hadoop000 ~]# ssh hadoop001
Last login: Tue Apr  3 00:49:59 2018 from 192.168.77.1
[root@hadoop001 ~]# logout
Connection to hadoop001 closed.
[root@hadoop000 ~]# ssh hadoop002
Last login: Tue Apr  3 00:50:03 2018 from 192.168.77.1
[root@hadoop002 ~]# logout
Connection to hadoop002 closed.
[root@hadoop000 ~]# logout
Connection to hadoop000 closed.
[root@hadoop000 ~]#
As shown above, hadoop000 can now log in to the other two machines without a password, so the configuration is successful.
Install JDK
Go to the official Oracle website to get the download link for the JDK. I am using JDK 1.8 here, available at the following address:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Use the wget command to download the JDK into the /usr/local/src/ directory; I have already downloaded it here:
[root@hadoop000 ~]# cd /usr/local/src/
[root@hadoop000 /usr/local/src]# ls
jdk-8u151-linux-x64.tar.gz
[root@hadoop000 /usr/local/src]#
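If the package is not on the machine yet, a command along the following lines can be used; the URL here is only a placeholder, since the real link must be copied from the Oracle download page (and may require accepting the license agreement):
[root@hadoop000 ~]# cd /usr/local/src/
[root@hadoop000 /usr/local/src]# wget -O jdk-8u151-linux-x64.tar.gz "<JDK download link from the Oracle page>"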
Extract the downloaded package and move the extracted directory to the /usr/local/ directory:
[root@hadoop000 /usr/local/src]# tar -zxvf jdk-8u151-linux-x64.tar.gz
[root@hadoop000 /usr/local/src]# mv ./jdk1.8.0_151 /usr/local/jdk1.8
Edit the /etc/profile file to configure the environment variables:
[root@hadoop000 ~]# vim /etc/profile    # add the following content
JAVA_HOME=/usr/local/jdk1.8/
JAVA_BIN=/usr/local/jdk1.8/bin
JRE_HOME=/usr/local/jdk1.8/jre
PATH=$PATH:/usr/local/jdk1.8/bin:/usr/local/jdk1.8/jre/bin
CLASSPATH=/usr/local/jdk1.8/jre/lib:/usr/local/jdk1.8/lib:/usr/local/jdk1.8/jre/lib/charsets.jar
export PATH=$PATH:/usr/local/mysql/bin/
Use the source command to load the configuration file so it takes effect, then run the java -version command to check the JDK version:
[root@hadoop000 ~]# source /etc/profile
[root@hadoop000 ~]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
[root@hadoop000 ~]#
After installing the JDK on hadoop000, use the rsync command to synchronize the JDK and the configuration file to the other machines:
[root@hadoop000 ~]# rsync -av /usr/local/jdk1.8 hadoop001:/usr/local
[root@hadoop000 ~]# rsync -av /usr/local/jdk1.8 hadoop002:/usr/local
[root@hadoop000 ~]# rsync -av /etc/profile hadoop001:/etc/profile
[root@hadoop000 ~]# rsync -av /etc/profile hadoop002:/etc/profile
After the synchronization completes, source the configuration file on both machines to make the environment variables take effect, then run the java -version command to verify that the JDK was installed successfully.
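On hadoop001 and hadoop002 this step looks roughly like the following (a minimal sketch of the verification):
[root@hadoop001 ~]# source /etc/profile
[root@hadoop001 ~]# java -version    # should report java version "1.8.0_151"
[root@hadoop002 ~]# source /etc/profile
[root@hadoop002 ~]# java -version    # should report java version "1.8.0_151"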
Hadoop configuration and Distribution
Download the tar.gz package of Hadoop 2.6.0-cdh5.7.0 and extract it:
[root@hadoop000 ~]# cd /usr/local/src/
[root@hadoop000 /usr/local/src]# wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
[root@hadoop000 /usr/local/src]# tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /usr/local/
Note: if the download is slow on Linux, you can open this link in a download manager such as Thunder (Xunlei) on Windows and then upload the file to Linux, which is usually faster.
After extracting, enter the extracted directory; the directory structure of Hadoop looks like this:
[root@hadoop000 /usr/local/src]# cd /usr/local/hadoop-2.6.0-cdh5.7.0/
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]# ls
bin  cloudera  examples  include  libexec  NOTICE.txt  sbin  src
bin-mapreduce1  etc  examples-mapreduce1  lib  LICENSE.txt  README.txt  share
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]#
A brief description of what is stored in several of these directories:
bin   - stores executable files
etc   - stores configuration files
sbin  - stores the commands that start and stop the services
share - stores jar packages and documentation
At this point Hadoop is installed; the next step is to edit its configuration files, starting with JAVA_HOME:
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]# cd etc/
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc]# cd hadoop
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8/    # modify according to your own environment
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
Then add the Hadoop installation directory to the environment variables so its commands are easier to use later:
[root@hadoop000 ~]# vim ~/.bash_profile    # add the following content
export HADOOP_HOME=/usr/local/hadoop-2.6.0-cdh5.7.0/
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[root@hadoop000 ~]# source !$
source ~/.bash_profile
[root@hadoop000 ~]#
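As a quick optional sanity check, you can ask Hadoop for its version; if the PATH was loaded correctly, the command resolves without typing the full path:
[root@hadoop000 ~]# hadoop version    # should print Hadoop 2.6.0-cdh5.7.0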
Then edit the core-site.xml and hdfs-site.xml configuration files respectively:
[root@hadoop000 ~]# cd $HADOOP_HOME
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]# cd etc/hadoop
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim core-site.xml    # add the following content
<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop000:8020</value>    <!-- specify the default access address and port number -->
</property>
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim hdfs-site.xml    # add the following content
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/app/tmp/dfs/name</value>    <!-- directory where the namenode files are stored -->
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/app/tmp/dfs/data</value>    <!-- directory where the datanode files are stored -->
</property>
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# mkdir -p /data/hadoop/app/tmp/dfs/name
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# mkdir -p /data/hadoop/app/tmp/dfs/data
Next, you need to edit the yarn-site.xml configuration file:
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim yarn-site.xml    # add the following content
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop000</value>
</property>
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
Copy and edit the configuration file for MapReduce:
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim !$    # add the following content
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
Finally, configure the hostnames of the slave nodes in the slaves file (use IP addresses if you have not configured hostnames):
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim slaves
hadoop000
hadoop001
hadoop002
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
So far, we have built the Hadoop environment on the master node, hadoop000, but the other two machines, which act as slave nodes, have not yet been configured. Next, we distribute the Hadoop installation directory and the environment variable configuration file from hadoop000 to the other two machines by executing the following commands:
[root@hadoop000 ~]# rsync -av /usr/local/hadoop-2.6.0-cdh5.7.0/ hadoop001:/usr/local/hadoop-2.6.0-cdh5.7.0/
[root@hadoop000 ~]# rsync -av /usr/local/hadoop-2.6.0-cdh5.7.0/ hadoop002:/usr/local/hadoop-2.6.0-cdh5.7.0/
[root@hadoop000 ~]# rsync -av ~/.bash_profile hadoop001:~/.bash_profile
[root@hadoop000 ~]# rsync -av ~/.bash_profile hadoop002:~/.bash_profile
After the distribution is complete, execute the source command on both machines and create the temporary directories:
[root@hadoop001 ~]# source .bash_profile
[root@hadoop001 ~]# mkdir -p /data/hadoop/app/tmp/dfs/name
[root@hadoop001 ~]# mkdir -p /data/hadoop/app/tmp/dfs/data
[root@hadoop002 ~]# source .bash_profile
[root@hadoop002 ~]# mkdir -p /data/hadoop/app/tmp/dfs/name
[root@hadoop002 ~]# mkdir -p /data/hadoop/app/tmp/dfs/data
Hadoop formatting and startup/shutdown
To format the NameNode, you only need to execute the following command on hadoop000:
[root@hadoop000 ~]# hdfs namenode -format
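If you want to confirm that the format succeeded (an optional check), the NameNode metadata directory configured in hdfs-site.xml should now contain an initial fsimage:
[root@hadoop000 ~]# ls /data/hadoop/app/tmp/dfs/name/current/    # fsimage and VERSION files should be present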
After the formatting is complete, you can start the Hadoop cluster:
[root@hadoop000 ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/04/02 20:10:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop000]
hadoop000: starting namenode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-namenode-hadoop000.out
hadoop000: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-datanode-hadoop000.out
hadoop001: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-datanode-hadoop001.out
hadoop002: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-datanode-hadoop002.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 4d:5a:9d:31:65:75:30:47:a3:9c:f5:56:63:c4:0f:6a.
Are you sure you want to continue connecting (yes/no)? yes    # enter yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-secondarynamenode-hadoop000.out
18/04/02 20:11:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-resourcemanager-hadoop000.out
hadoop001: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop001.out
hadoop002: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop002.out
hadoop000: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop000.out
[root@hadoop000 ~]# jps    # check whether the following processes exist
6256 Jps
5538 DataNode
5843 ResourceManager
5413 NameNode
5702 SecondaryNameNode
5945 NodeManager
[root@hadoop000 ~]#
Check the processes on the other two machines:
hadoop001:
[root@hadoop001 ~]# jps
3425 DataNode
3538 NodeManager
3833 Jps
[root@hadoop001 ~]#
hadoop002:
[root@hadoop002 ~]# jps
3171 DataNode
3273 NodeManager
3405 Jps
[root@hadoop002 ~]#
After checking the processes on each machine and confirming everything is in order, open port 50070 of the master node in a browser, for example: 192.168.77.128:50070. You will see the following page:
Click "Live Nodes" to view the surviving nodes:
If port 50070 is accessible as shown above, HDFS in the cluster is working normally.
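If you prefer to verify HDFS from the command line rather than the browser, the dfsadmin report gives the same information (a sketch; the exact output depends on your cluster):
[root@hadoop000 ~]# hdfs dfsadmin -report    # the report should list three live datanodes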
Next, visit port 8088 of the master node, which is the web port of the YARN service, for example: 192.168.77.128:8088. As follows:
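The YARN side can also be checked from the command line, and when you are finished, the whole cluster can be stopped with the companion stop script (stop-dfs.sh and stop-yarn.sh also work individually):
[root@hadoop000 ~]# yarn node -list    # should list the three NodeManagers
[root@hadoop000 ~]# stop-all.sh        # stops the HDFS and YARN daemons on all nodes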