Construction of Hadoop cluster environment


Part I: Prepare the Linux environment

The steps for creating a virtual machine and installing the operating system are not covered here; for details, please see my other articles.

Start the prepared virtual machine.

1. Modify Hostname

1. Temporarily modify hostname

hostname bigdata-01.liu.com

This change is lost after the system reboots.

2. Permanently modify hostname

vim /etc/sysconfig/network

After opening it, edit the following

NETWORKING=yes # using the network

HOSTNAME=bigdata-01.liu.com # set hostname

2. Configure Host

Vim / etc/hosts

Add the following

172.18.74.172 bigdata-01.liu.com

3. Turn off the firewall

View firewall status

service iptables status

Temporarily turn off the firewall

service iptables stop

Permanently shut down the firewall (reboot is required to take effect)

chkconfig iptables off

4. Disable SELinux

SELinux is a security subsystem of Linux; it can be disabled in a learning environment.

vim /etc/sysconfig/selinux

Set SELINUX to disabled

SELINUX=disabled
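After the next reboot, the change can be verified with getenforce, which should then report Disabled:

getenforce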

5. Install JDK

Check if jdk is installed on the system

java -version

If it shows an OpenJDK, uninstall it before installing Oracle's JDK (some Hadoop commands do not work properly with other JDK builds).

rpm -qa | grep java

Uninstall the openjdk packages; the .noarch packages may be kept or removed, either is fine.

rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.191-2.6.15.4.el7_5.x86_64

rpm -e --nodeps java-1.7.0-openjdk-1.7.0.191-2.6.15.4.el7_5.x86_64

Then run rpm -qa | grep java again to confirm that openjdk has been removed; if any packages remain, uninstall them as well.
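If several openjdk packages remain, a one-line sketch to remove them in bulk (a convenience, not part of the original steps; check what grep matches before running):

rpm -qa | grep java-1.7.0-openjdk | xargs rpm -e --nodeps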

Connect to the virtual machine with a remote tool such as Xshell, and install the lrzsz package

yum -y install lrzsz

Upload the JDK package and extract it to the /opt/modules directory

rz

tar -zxvf jdk-8u181-linux-x64.tar.gz -C /opt/modules

Add environment variables

To set the JAVA_HOME environment variable for the JDK, modify the configuration file /etc/profile and append the following:

export JAVA_HOME="/opt/modules/jdk1.8.0_181"

export PATH=$JAVA_HOME/bin:$PATH

After the modification is complete, run source /etc/profile to make it take effect.

Run java -version again to confirm that the installation is complete.
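With jdk-8u181 installed, the first line of the output would be expected to look roughly like this (the exact build string may differ):

java version "1.8.0_181"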

Part II: Hadoop installation

A fully distributed deployment uses multiple Linux hosts to run Hadoop; the cluster is planned so that each Hadoop module runs on a different machine.

I. Environment preparation

1. Clone a virtual machine

In VMware, select the machine to clone in the panel on the left (here the original BigData01 machine is cloned), then open the Virtual Machine menu and choose the Clone command under the Manage submenu.

Select "create full Clone", the virtual machine name is BigData02, select the virtual machine file save path, and clone.

Repeat to clone another virtual machine named BigData03.

2. Configure the network

Modify the name of the network card

Edit the network card information on the BigData02 and BigData03 machines by running sudo vim /etc/udev/rules.d/70-persistent-net.rules. Because these machines are cloned from BigData01, the file still contains BigData01's eth0 entry, and a new eth2 entry has been added. The eth0 entry's MAC address is identical to BigData01's, and duplicate MAC addresses are not allowed, so delete the eth0 entry, keep only eth2, and rename eth2 to eth0. Copy the MAC address of the renamed eth0 and update the HWADDR property in the network-scripts file (see the ifcfg-eth0 sketch below).

vim /etc/sysconfig/network-scripts/ifcfg-eth0

Modify network parameters:

BigData02 machine IP changed to 172.18.74.173

BigData03 machine IP changed to 172.18.74.174
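A rough sketch of what ifcfg-eth0 on BigData02 might look like after these edits; the HWADDR, NETMASK, GATEWAY and DNS values below are illustrative placeholders, not taken from this article:

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
HWADDR=00:0C:29:XX:XX:XX
IPADDR=172.18.74.173
NETMASK=255.255.255.0
GATEWAY=172.18.74.1
DNS1=172.18.74.1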

!! If you are building the Hadoop environment on real servers, prepare two more machines by repeating the steps above; cloning a host is not practical on physical servers!

3. Configure Hostname and hosts

BigData02 configures hostname to bigdata-02.liu.com

BigData03 configures hostname to bigdata-03.liu.com
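For example, on BigData02, using the same method as in Part I:

vim /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=bigdata-02.liu.com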

The hosts of BigData01, BigData02 and BigData03 machines are all configured as follows:

172.18.74.172 bigdata-01.liu.com

172.18.74.173 bigdata-02.liu.com

172.18.74.174 bigdata-03.liu.com

4. Configure the SSH client on Windows

In the SSH client on the local Windows machine, add SSH connections to the BigData02 and BigData03 machines.

II. Server function planning

bigdata-01.liu.com    bigdata-02.liu.com    bigdata-03.liu.com
NameNode              ResourceManager       SecondaryNameNode
DataNode              DataNode              DataNode
NodeManager           NodeManager           NodeManager
HistoryServer

III. Install a new Hadoop on the first machine

Create a hadoop directory

mkdir -p /opt/modules/app

Import the Hadoop package and extract it to the Hadoop directory

rz

tar -zxf /opt/sofeware/hadoop-2.7.4-with-centos-6.7.tar.gz -C /opt/modules/app/

Change to the /opt/modules/app/hadoop-2.7.4/etc/hadoop directory and configure the JDK path for Hadoop by modifying the JAVA_HOME setting in the hadoop-env.sh, mapred-env.sh and yarn-env.sh files:

export JAVA_HOME="/opt/modules/jdk1.8.0_181"

Configure core-site.xml

vim core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata-01.liu.com:8020</value>
</property>

<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/app/hadoop-2.7.4/data/tmp</value>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
</property>

fs.defaultFS is the address of the NameNode.

hadoop.tmp.dir is the path of the Hadoop temporary directory; by default, the NameNode and DataNode data files are stored in subdirectories of this directory. Make sure this directory exists; if it does not, create it first (see the example below).
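For example, the directory used above can be created in advance:

mkdir -p /opt/modules/app/hadoop-2.7.4/data/tmp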

Configure hdfs-site.xml

vim hdfs-site.xml

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>bigdata-03.liu.com:50090</value>
</property>

dfs.namenode.secondary.http-address specifies the HTTP address and port of the SecondaryNameNode. Since the plan places the SecondaryNameNode on BigData03, it is set here to bigdata-03.liu.com:50090.

Configure slaves

vim slaves

bigdata-01.liu.com

bigdata-02.liu.com

bigdata-03.liu.com

The slaves file specifies which nodes run DataNodes in HDFS.

Configure yarn-site.xml

vim yarn-site.xml

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bigdata-02.liu.com</value>
</property>

<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>

<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
</property>

According to the plan, yarn.resourcemanager.hostname designates the ResourceManager server, pointing it to bigdata-02.liu.com.

yarn.log-aggregation-enable configures whether log aggregation is enabled.

yarn.log-aggregation.retain-seconds configures how long aggregated logs are kept on HDFS.
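Once log aggregation is enabled and a job has finished, the aggregated logs can be fetched with the yarn command line; for example (the application ID below is only a placeholder):

yarn logs -applicationId application_1532000000000_0001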

Configure mapred-site.xml

Copy a mapred-site.xml file from mapred-site.xml.template

cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>bigdata-01.liu.com:10020</value>
</property>

<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bigdata-01.liu.com:19888</value>
</property>

mapreduce.framework.name sets MapReduce jobs to run on YARN.

mapreduce.jobhistory.address sets the MapReduce history server, which is installed on the BigData01 machine.

mapreduce.jobhistory.webapp.address sets the web UI address and port of the history server.

IV. Set up passwordless SSH login

The machines in a Hadoop cluster access each other over SSH. Typing a password for every access is impractical, so passwordless SSH login between the machines must be configured.

Generate a public key on BigData01

ssh-keygen -t rsa

Press Enter all the way to accept the defaults; the public key file (id_rsa.pub) and private key file (id_rsa) are then generated in the .ssh directory under the current user's home directory. The .ssh directory is hidden, so use ls -a in the home directory to see it.

Distribute the public key

ssh-copy-id bigdata-01.liu.com

ssh-copy-id bigdata-02.liu.com

ssh-copy-id bigdata-03.liu.com
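To verify that passwordless login works, try logging in from BigData01; no password prompt should appear:

ssh bigdata-02.liu.com

exit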

V. Distribute the Hadoop files

First create a directory for Hadoop on the other two machines

mkdir -p /opt/modules/app

Distribute with scp. First check the size of Hadoop's documentation directory:

du -sh /opt/modules/app/hadoop-2.7.4/share/doc
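Because the doc directory is large, as an optional step (not part of the original instructions) it could be removed before copying to speed up the transfer:

rm -rf /opt/modules/app/hadoop-2.7.4/share/doc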

scp -r /opt/modules/app/hadoop-2.7.4/ bigdata-02.liu.com:/opt/modules/app

scp -r /opt/modules/app/hadoop-2.7.4/ bigdata-03.liu.com:/opt/modules/app

Format NameNode

Perform formatting on the NameNode machine bigdata-01:

/opt/modules/app/hadoop-2.7.4/bin/hdfs namenode -format

After formatting, a current directory containing a series of files is generated under /opt/modules/app/hadoop-2.7.4/data/tmp/dfs/name/.

Note:

If you need to reformat the NameNode, you must first delete all the files under the original NameNode and DataNode directories, otherwise an error will be reported. These directories are configured by the hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir properties in core-site.xml.

This is because each format creates a new cluster ID by default and writes it to the VERSION files of the NameNode and DataNode (located in dfs/name/current and dfs/data/current respectively). Reformatting generates a new cluster ID; if the original directories are not deleted, the NameNode's VERSION file holds the new cluster ID while the DataNodes keep the old one, and the mismatch causes an error.
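As a rough sketch based on the paths used in this article, the cluster IDs can be compared and the data directories cleared on every machine before reformatting:

cat /opt/modules/app/hadoop-2.7.4/data/tmp/dfs/name/current/VERSION

cat /opt/modules/app/hadoop-2.7.4/data/tmp/dfs/data/current/VERSION

rm -rf /opt/modules/app/hadoop-2.7.4/data/tmp/*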

Start the cluster

Change to the /opt/modules/app/hadoop-2.7.4 directory

Start HDFS

/opt/modules/app/hadoop-2.7.4/sbin/start-dfs.sh

Run jps to view the services that have started.
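Based on the server function planning above, jps on bigdata-01 would be expected to show roughly the following after HDFS starts (the process IDs here are illustrative, not real output):

jps

2983 NameNode
3061 DataNode
3487 Jps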

Start YARN

/opt/modules/app/hadoop-2.7.4/sbin/start-yarn.sh

You can also use this command to get there in one step.

/opt/modules/app/hadoop-2.7.4/sbin/start-all.sh

Start ResourceManager on BigData02:

sbin/yarn-daemon.sh start resourcemanager

Start the log server

Since mapred-site.xml points the MapReduce history service at bigdata-01.liu.com, start the history server on BigData01

/opt/modules/app/hadoop-2.7.4/sbin/mr-jobhistory-daemon.sh start historyserver

View the HDFS Web page

http://bigdata-01.liu.com:50070/

If the domain name does not resolve, you can enter the IP and port in the address bar instead, for example:

172.18.74.172:50070

View the YARN Web page

http://bigdata-02.liu.com:8088/cluster

172.18.74.173:8088/cluster

That is all for building the Hadoop cluster environment. When setting up a Hadoop environment, it helps to first understand what the various components are; this makes it much easier to complete the setup smoothly. Some of my classmates learned this relatively early, when hardly anyone understood what Hadoop was, and they told me it took them ten days to half a month to get it working. I gradually learned a bit about Hadoop under their influence, and it took me a little more than a day to build. Plenty of unexpected problems can come up during the build, so if nothing unexpected happens and you have prepared in advance, it can be done in half a day. Never give up when you run into difficulties; persist and you will succeed!

... Once upon a time, carriages and horses were slow, letters took long to arrive, and a lifetime was enough to love just one person!
