

How to Install a Hadoop Cluster on CentOS 7


Today I will talk to you about how to install a Hadoop cluster on CentOS 7. Many people may not know much about this, so I have summarized the steps below; I hope you get something out of this article.

I. Hardware Environment

The hardware I use is a mini-cloud device from Yunchuang. It consists of three nodes (each with 8 GB of memory, a 128 GB SSD, and three 3 TB SATA drives) and a gigabit switch.

II. Preparation Before Installation

1. Create a new hadoop user on CentOS 7. It is officially recommended that hadoop, mapreduce, and yarn be installed under different users; to save trouble, I installed them all under the single hadoop user.
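
If the hadoop user does not exist yet, a minimal sketch of creating it (run as root; passwd prompts for the new password):

useradd hadoop
passwd hadoop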

2. Download the installation packages:

1) JDK: jdk-8u112-linux-x64.rpm

Download address: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

2) Hadoop 2.7.3: hadoop-2.7.3.tar.gz

Download address: http://archive.apache.org/dist/hadoop/common/stable2/

3. Uninstall the OpenJDK that comes with CentOS 7 (under root permissions)

1) First check which OpenJDK packages are already on the system:

rpm -qa | grep jdk

See the following results:

[hadoop@localhost Desktop]$ rpm -qa | grep jdk
java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64
java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.2.el7_2.x86_64

2) Uninstall the OpenJDK packages found above:

yum -y remove java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64
yum -y remove java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
yum -y remove java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
yum -y remove java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.2.el7_2.x86_64
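
To confirm that nothing was missed, rerun the query from step 1); it should now produce no output:

rpm -qa | grep jdk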

4. Install Oracle JDK (under root permissions)

rpm -ivh jdk-8u112-linux-x64.rpm

After installation, the path to the JDK is /usr/java/jdk1.8.0_112.

Then add the installed JDK's path to the system environment variables:

vi /etc/profile

Add the following at the end of the file:

export JAVA_HOME=/usr/java/jdk1.8.0_112
export JRE_HOME=/usr/java/jdk1.8.0_112/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Save and close the profile file, then execute the following command to make the configuration take effect:

source /etc/profile

At this point, we can check whether the JDK path is configured successfully with the java -version command, as shown below:

[root@localhost jdk1.8.0_112]# java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

5. Turn off the firewall (under root permission)

Execute the following command to turn off the firewall:

systemctl stop firewalld.service
systemctl disable firewalld.service

The terminal output looks like this:

[root@localhost Desktop]# systemctl stop firewalld.service
[root@localhost Desktop]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
[root@localhost Desktop]#
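
To double-check, query the service status; it should be reported as inactive (dead):

systemctl status firewalld.service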

6. Modify the hostname and configure the related network (under root permission)

1) modify the hostname

On the master host

hostnamectl set-hostname master

On the slave1 host

hostnamectl set-hostname slave1

On the slave2 host

hostnamectl set-hostname slave2
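
You can confirm the new name on each node with:

hostnamectl status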

2) configure the network

Take the master host as an example to demonstrate how to configure a static network and host files.

Each of my nodes has two network cards; I configure one of them with a static IP for internal cluster communication.

vi /etc/sysconfig/network-scripts/ifcfg-enp7s0

(Note: on my master machine, the network card to configure is enp7s0, hence the file name ifcfg-enp7s0.)

The original content of ifcfg-enp7s0 is as follows:

TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp7s0
UUID=914595f1-e6f9-4c9b-856a-c4bd79ffe987
DEVICE=enp7s0
ONBOOT=no

Modified to:

TYPE=Ethernet
ONBOOT=yes
DEVICE=enp7s0
UUID=914595f1-e6f9-4c9b-856a-c4bd79ffe987
BOOTPROTO=static
IPADDR=59.71.229.189
GATEWAY=59.71.229.254
DEFROUTE=yes
IPV6INIT=no
IPV4_FAILURE_FATAL=yes
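
For the static address to take effect, restart networking (on my CentOS 7 nodes the classic network service is in use; adjust if your setup relies on NetworkManager instead):

systemctl restart network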

3) Modify the /etc/hosts file

vi /etc/hosts

Add the following:

59.71.229.189 master
59.71.229.190 slave1
59.71.229.191 slave2

Perform the above network configuration and hosts file configuration for all nodes in the cluster.
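
Once all nodes are configured, verify name resolution and connectivity from any node, for example:

ping -c 3 slave1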

7. Configure cluster node SSH password-free login (under hadoop permission)

Here, for convenience, the cluster is configured so that any node can log in to any other node via SSH without a password. The specific steps are as follows:

1) On each machine, execute the following command as the hadoop user:

ssh-keygen -t rsa -P ''

Press Enter at every prompt to accept the defaults.

2) On each machine, first append its own public key to authorized_keys (in /home/hadoop/.ssh/) to ensure that ssh localhost logs in without a password:

cat id_rsa.pub >> authorized_keys
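
A quick check that this worked (if you are still prompted for a password, check the permissions on ~/.ssh and authorized_keys; they are tightened in step 4 below):

ssh localhost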

3) Then add your own public key to the authorized_keys of each other machine; you will need to enter the other machine's password during the copy:

Master:

scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/.ssh/id_rsa_master.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave2:/home/hadoop/.ssh/id_rsa_master.pub

Slave1:

scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave1.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave2:/home/hadoop/.ssh/id_rsa_slave1.pub

Slave2:

scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave2.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/.ssh/id_rsa_slave2.pub

4) Go to the /home/hadoop/.ssh/ directory on each host and use cat to append all public keys other than the local one (id_rsa.pub) to authorized_keys. After that, set the permissions on authorized_keys with chmod, and then delete the copied public key files with rm:

Master:

cat id_rsa_slave1.pub >> authorized_keys
cat id_rsa_slave2.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub

Slave1:

cat id_rsa_master.pub >> authorized_keys
cat id_rsa_slave2.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub

Slave2:

cat id_rsa_master.pub >> authorized_keys
cat id_rsa_slave1.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub

By completing the above steps, you can log in from any machine to any other machine via the ssh command without a password.
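
For example, from master (the first connection may ask you to confirm the host key):

ssh slave1 hostname

This should print slave1 without any password prompt.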

III. Install and Configure Hadoop (the following steps are performed as the hadoop user)

1. Extract hadoop-2.7.3.tar.gz to the /home/hadoop/ directory. In this article the archive sits on the Desktop of the hadoop account; first extract it where it is:

tar -zxvf hadoop-2.7.3.tar.gz

Then copy the extracted hadoop-2.7.3 directory to /home/hadoop/ and delete the copy left on the Desktop:

cp -r /home/hadoop/Desktop/hadoop-2.7.3 /home/hadoop/

2. Specific configuration process:

1) On master, first create the following directories under /home/hadoop/:

mkdir -p /home/hadoop/hadoopdir/name
mkdir -p /home/hadoop/hadoopdir/data
mkdir -p /home/hadoop/hadoopdir/temp
mkdir -p /home/hadoop/hadoopdir/logs
mkdir -p /home/hadoop/hadoopdir/pids

2) Then copy the hadoopdir directory to the other nodes with scp:

scp -r /home/hadoop/hadoopdir hadoop@slave1:/home/hadoop/
scp -r /home/hadoop/hadoopdir hadoop@slave2:/home/hadoop/

3) Go to the /home/hadoop/hadoop-2.7.3/etc/hadoop directory and modify the following files:

hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_112
export HADOOP_LOG_DIR=/home/hadoop/hadoopdir/logs
export HADOOP_PID_DIR=/home/hadoop/hadoopdir/pids

mapred-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_112
export HADOOP_MAPRED_LOG_DIR=/home/hadoop/hadoopdir/logs
export HADOOP_MAPRED_PID_DIR=/home/hadoop/hadoopdir/pids

yarn-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_112
YARN_LOG_DIR=/home/hadoop/hadoopdir/logs

slaves file:

# localhost
slave1
slave2

(Note: if localhost is not commented out in the slaves file, the master machine will also serve as a DataNode.)

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop/hadoopdir/temp</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/hadoopdir/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/hadoopdir/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>64m</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

mapred-site.xml (create it from the bundled template first):

cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master:50030</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://master:9001</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

4) On the master machine, copy the entire /home/hadoop/hadoop-2.7.3 directory to the other nodes:

scp -r /home/hadoop/hadoop-2.7.3 hadoop@slave1:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.3 hadoop@slave2:/home/hadoop/

5) Enter the /home/hadoop/hadoop-2.7.3/bin directory and format the file system:

./hdfs namenode -format

Formatting the file system produces a long stream of terminal output. A status of 0 in the last few lines indicates that the format succeeded. If it fails, check the logs carefully to determine the cause of the error.

6) Enter the /home/hadoop/hadoop-2.7.3/sbin directory:

./start-dfs.sh
./start-yarn.sh

The commands above start HDFS and YARN; the Hadoop cluster is now running. To shut it down, execute the following in the sbin directory:

./stop-yarn.sh
./stop-dfs.sh
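
With the cluster up, the jps command that ships with the JDK lists the running Java daemons on each node. On master you should see NameNode, SecondaryNameNode, and ResourceManager; on slave1 and slave2, DataNode and NodeManager:

jps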

7) HDFS startup example

After executing start-dfs.sh, the page at master:50070 shows the cluster information and DataNode details.

After executing start-yarn.sh, the page at master:8088 shows the cluster information.
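
If you prefer the command line over the web pages, an HDFS status report shows the same information (run from the /home/hadoop/hadoop-2.7.3/bin directory):

./hdfs dfsadmin -report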

After reading the above, do you have a better understanding of how to install a Hadoop cluster on CentOS 7? Thank you for your support.
