How to install Hadoop2.7 in CentOS 7


This article shares how to install Hadoop 2.7 on CentOS 7. Most people are probably not very familiar with the process, so it is shared here for your reference; I hope you learn a lot from it. Let's get started!

The general idea: prepare the master and slave servers, configure passwordless SSH login from the master server to the slave servers, extract and install the JDK, extract and install Hadoop, and configure the master-slave relationships for HDFS, MapReduce, and so on.

1. Environment: three 64-bit CentOS 7 machines. Hadoop 2.7 requires 64-bit Linux; the CentOS 7 Minimal ISO is only about 600 MB, so the operating system can be installed in a little over ten minutes.

Master 192.168.0.182

Slave1 192.168.0.183

Slave2 192.168.0.184

2. Passwordless SSH login. Hadoop needs to log in to each node over SSH to operate, so I use the root user: each server generates a key pair, and the public keys are then merged into authorized_keys.

(1) CentOS does not enable SSH public key login by default. On each server, uncomment the following two lines in /etc/ssh/sshd_config:

RSAAuthentication yes

PubkeyAuthentication yes
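To make the change non-interactively, something like the following should work on a stock CentOS 7 sshd_config, where both directives are present but commented out; restart sshd afterwards so the change takes effect:

# uncomment the two directives in place, then restart sshd
sed -i 's/^#RSAAuthentication yes/RSAAuthentication yes/' /etc/ssh/sshd_config
sed -i 's/^#PubkeyAuthentication yes/PubkeyAuthentication yes/' /etc/ssh/sshd_config
systemctl restart sshd.service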

(2) Enter the command ssh-keygen -t rsa to generate the key; do not enter a passphrase, just keep pressing Enter. A .ssh folder will be generated under /root. This needs to be done on each server.

(3) Merge the public keys into the authorized_keys file. On the Master server, enter the /root/.ssh directory and merge them with the following commands:

cat id_rsa.pub >> authorized_keys

ssh root@192.168.0.183 cat ~/.ssh/id_rsa.pub >> authorized_keys

ssh root@192.168.0.184 cat ~/.ssh/id_rsa.pub >> authorized_keys

(4) Copy the authorized_keys and known_hosts of the Master server to the /root/.ssh directory of each Slave server.
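One way to do this from the Master (a sketch; adjust the paths if you are not using the root account):

scp /root/.ssh/authorized_keys /root/.ssh/known_hosts root@192.168.0.183:/root/.ssh/
scp /root/.ssh/authorized_keys /root/.ssh/known_hosts root@192.168.0.184:/root/.ssh/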

(5) After this is done, ssh root@192.168.0.183 and ssh root@192.168.0.184 should no longer prompt for a password.

3. Install the JDK. Hadoop 2.7 requires JDK 7. Since my CentOS installation is the minimal one, there is no OpenJDK; just extract the downloaded JDK and configure the environment variables.

(1) download "jdk-7u79-linux-x64.gz" and put it in the / home/java directory

(2) decompress, enter the command, tar-zxvf jdk-7u79-linux-x64.gz

(3) Edit /etc/profile and append:

export JAVA_HOME=/home/java/jdk1.7.0_79

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin

(4) To make the configuration take effect, enter the command source /etc/profile

(5) Enter the command java -version to verify; if the version is printed, the JDK installation is complete.

4. Install Hadoop 2.7: extract it on the Master server first; it will be copied to the Slave servers later.

(1) download "hadoop-2.7.0.tar.gz" and put it in the / home/hadoop directory

(2) decompress, enter the command, tar-xzvf hadoop-2.7.0.tar.gz

(3) create a folder for data storage under the / home/hadoop directory, tmp, hdfs, hdfs/data, hdfs/name
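For example (note that hdfs-site.xml below points dfs.namenode.name.dir and dfs.datanode.data.dir at /home/hadoop/dfs/name and /home/hadoop/dfs/data, so keep the directory names and the configured paths consistent, whichever naming you choose):

mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/data /home/hadoop/hdfs/name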

5. Configure core-site.xml under the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://192.168.0.182:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>file:/home/hadoop/tmp</value></property>
  <property><name>io.file.buffer.size</name><value>131702</value></property>
</configuration>

6. Configure hdfs-site.xml under the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:

<configuration>
  <property><name>dfs.namenode.name.dir</name><value>file:/home/hadoop/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:/home/hadoop/dfs/data</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>192.168.0.182:9001</value></property>
  <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
</configuration>

7. Configure mapred-site.xml under the /home/hadoop/hadoop-2.7.0/etc/hadoop directory.
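In the 2.7 release this file usually ships only as mapred-site.xml.template, so copy it first if it does not exist yet, then add the properties below:

cp mapred-site.xml.template mapred-site.xml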

<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>192.168.0.182:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>192.168.0.182:19888</value></property>
</configuration>

8. Configure yarn-site.xml under the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:

<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property><name>yarn.resourcemanager.address</name><value>192.168.0.182:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>192.168.0.182:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>192.168.0.182:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>192.168.0.182:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>192.168.0.182:8088</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>768</value></property>
</configuration>

9. Configure JAVA_HOME in hadoop-env.sh and yarn-env.sh under the /home/hadoop/hadoop-2.7.0/etc/hadoop directory; if it is not set, the daemons will not start:

export JAVA_HOME=/home/java/jdk1.7.0_79

10. Configure the slaves file under the /home/hadoop/hadoop-2.7.0/etc/hadoop directory: delete the default localhost and add the two slave nodes:

192.168.0.183

192.168.0.184
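One way to write the file (run from the etc/hadoop directory; this simply overwrites the default contents):

printf '192.168.0.183\n192.168.0.184\n' > slaves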

11. Copy the configured Hadoop to the corresponding location on each node, transferring it with scp:

scp -r /home/hadoop 192.168.0.183:/home/

scp -r /home/hadoop 192.168.0.184:/home/

12. Start Hadoop on the Master server (the slave nodes will be started automatically). Enter the /home/hadoop/hadoop-2.7.0 directory.

(1) Initialize: enter the command bin/hdfs namenode -format

(2) Start everything with sbin/start-all.sh, or run sbin/start-dfs.sh and sbin/start-yarn.sh separately.

(3) To stop, enter the command sbin/stop-all.sh

(4) Enter the command jps to see the running Hadoop processes.
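With this setup you would typically expect roughly the following processes (process IDs omitted; the exact list depends on the node):

On the Master: NameNode, SecondaryNameNode, ResourceManager, Jps

On each Slave: DataNode, NodeManager, Jps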

13. To access the web UIs, open the required ports or simply turn off the firewall.

(1) Enter the command systemctl stop firewalld.service
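Alternatively, to keep firewalld running and only open the web UI ports used in the next two steps (a sketch; depending on where clients connect from, other ports such as 9000 may also need to be opened):

firewall-cmd --permanent --add-port=8088/tcp
firewall-cmd --permanent --add-port=50070/tcp
firewall-cmd --reload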

(2) Open http://192.168.0.182:8088/ (the YARN ResourceManager web UI) in a browser.

(3) Open http://192.168.0.182:50070/ (the HDFS NameNode web UI) in a browser.

14. Installation complete. This is only the beginning for big data applications; the next step is to write programs against the Hadoop interfaces and put HDFS and MapReduce to work for your own scenario.
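As a quick smoke test that the cluster works end to end, you can run the bundled WordCount example from the /home/hadoop/hadoop-2.7.0 directory (a sketch; the example jar name assumes the stock 2.7.0 tarball layout):

# put a small file into HDFS, run wordcount on it, and print the result
bin/hdfs dfs -mkdir -p /input
bin/hdfs dfs -put etc/hadoop/core-site.xml /input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /input /output
bin/hdfs dfs -cat /output/part-r-00000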


Hadoop is a distributed system infrastructure that lets users develop distributed programs without knowing the low-level details of distribution.

The core of Hadoop is HDFS and MapReduce: HDFS is responsible for storage, and MapReduce is responsible for computation.

Here are the key points for installing Hadoop:

In fact, installing Hadoop is not troublesome; it mainly needs the following prerequisites. Once they are in place, it is very easy to get started by following the official configuration guide.

1. A Java runtime environment; the Sun (Oracle) distribution is recommended.

2. SSH public key authentication

With the above environment settled, all that is left is the Hadoop configuration itself. The configuration may differ between versions; please refer to the official documentation for details.

Environment

Virtual machine: VMware 10.0.1 build-1379776

Operating system: CentOS 7, 64-bit

Install the Java environment

Download address: http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk8-downloads-2133151-zhs.html

Select the appropriate download package for your operating system version. If your system supports rpm packages, download the rpm directly or install straight from the rpm URL:

rpm -ivh http://download.oracle.com/otn-pub/java/jdk/8u20-b26/jdk-8u20-linux-x64.rpm

JDK is constantly updated, so to install the latest version of JDK, you need to go to the official website to get the rpm address of the latest installation package.

Configure SSH public key (passwordless) authentication

CentOS ships with openssh-server, openssh-clients and rsync by default. If they are missing on your system, install them yourself.

Create a common account

Create a hadoop account (the name is up to you) on all machines and set its password to hadoop:

useradd -d /home/hadoop -s /usr/bin/bash -g wheel hadoop

passwd hadoop

SSH configuration

vim /etc/ssh/sshd_config

Find the following three configuration items and change them to the settings below. If they are commented out, remove the leading # so the configuration takes effect.

RSAAuthentication yes

PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys

AuthorizedKeysFile .ssh/authorized_keys

.ssh/authorized_keys is the path where the public keys are stored.
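After editing sshd_config, restart sshd (as root) so the new settings take effect; this step is easy to forget:

systemctl restart sshd.service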

Key and public key generation

Log in with your hadoop account.

cd ~

ssh-keygen -t rsa -P ''

Save the generated ~/.ssh/id_rsa.pub file as ~/.ssh/authorized_keys:

cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Use the scp command to copy the .ssh directory to the other machines; the lazy approach is to give all machines the same key pair and share the public key.

scp ~/.ssh/* hadoop@slave1:~/.ssh/

Make sure the permissions on ~/.ssh/id_rsa are 600, which forbids other users from accessing it.
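A sketch of tightening the permissions (the .ssh directory itself is usually 700, and authorized_keys is typically kept at 600 as well):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys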

The above is all the content of the article "How to install Hadoop2.7 in CentOS 7". Thank you for reading! I hope what is shared here helps you.
