How to Configure the Hadoop 2.X Environment
This article explains how to configure the Hadoop 2.X environment. The method described here is simple, fast, and practical; interested readers are encouraged to follow along.
I. Preparation before installation
1.1 Modify the host name
Log in to the Linux system and check the machine's current host name with the hostname command.
[root@localhost ~]# hostname
localhost.localdomain
To change the host name, use one of the following methods.
Example 1: temporarily change the host name to hadoop01 (the change is lost after a reboot):
hostname hadoop01
Example 2: permanently change the host name to hadoop01:
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
After changing the host name, edit the /etc/hosts file to map the host name to an IP address.
vi /etc/hosts, then add the following line to the file:
192.168.1.128 hadoop01    # use the IP address of your own host
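A quick way to verify both changes is to resolve the new name from this machine; a small check, assuming the host name and IP used above:
hostname             # should now print hadoop01
ping -c 1 hadoop01   # should reach 192.168.1.128 via the /etc/hosts mapping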
1.2 Turn off the firewall
1) service iptables stop    # stops the firewall for the current session
2) chkconfig iptables off   # permanently disables the firewall at boot
3) chkconfig iptables --list    # checks whether the firewall starts at boot
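Note that the iptables service commands above apply to CentOS 6. On CentOS 7 and later the default firewall is firewalld, so the equivalent steps would be (an assumption; adjust for your own distribution):
systemctl stop firewalld      # stop the firewall for the current session
systemctl disable firewalld   # keep it from starting at boot
systemctl status firewalld    # confirm it is stopped and disabled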
1.3 Plan the software installation directories
1) Create a directory for the installation packages and a directory for the installed software:
mkdir -p /opt/soft    # holds the software installation packages
mkdir -p /opt/app     # installation path for the software
1.4 Create the hadoop user and grant sudo permissions
1) Create a hadoop user; all subsequent operations are done as this user:
useradd hadoop    # create the hadoop user
passwd hadoop     # set a password for the hadoop user
2) Grant sudo permissions to the hadoop user.
As root, run the visudo command and add the following:
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL
The line above grants sudo permissions to the hadoop user. To also skip the password prompt, use the NOPASSWD form instead:
## Same thing without a password
%wheel  ALL=(ALL)       NOPASSWD: ALL
hadoop  ALL=(ALL)       NOPASSWD: ALL
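Before moving on, it is worth confirming that the grant actually works; a quick check:
su - hadoop     # switch to the hadoop user
sudo -l         # list the sudo rules that apply to this user
sudo whoami     # prints "root" if the grant is in place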
1.5 Install the JDK environment
First upload the JDK installation package to the /opt/soft directory, then install it.
1) Extract the archive:
sudo tar -zxvf jdk-8u181-linux-x64.tar.gz
2) Configure the JDK environment variables.
First get the JDK installation path:
[hadoop@hadoop01 jdk1.8.0_181]$ pwd
/opt/soft/jdk1.8.0_181
Next, open the /etc/profile file to set the environment variables:
vi /etc/profile
Add the JDK path at the end of the profile file:
# JAVA_HOME
export JAVA_HOME=/opt/soft/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
Save and exit with :wq.
The changes do not take effect immediately after saving; use the following command to reload the configuration file:
[hadoop@hadoop01 jdk1.8.0_181]$ source /etc/profile
Then test whether the JDK environment variables are configured correctly; the following command prints the JDK version information:
[hadoop@hadoop01 jdk1.8.0_181]$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
If you see this output, the JDK environment variables are configured correctly.
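Beyond java -version, you can also confirm that the variables themselves point where you expect:
echo $JAVA_HOME    # should print /opt/soft/jdk1.8.0_181
which java         # should resolve to a path under $JAVA_HOME/bin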
1.6 Install the Hadoop environment
1) Go to the directory that holds the installation package:
[hadoop@hadoop01 /]$ cd /opt/soft/
Extract the Hadoop installation package:
[hadoop@hadoop01 soft]$ sudo tar -zxvf hadoop-2.7.2.tar.gz
After extraction, the Hadoop installation directory is laid out as follows:
bin: the most basic Hadoop management and usage scripts. They are the underlying implementations of the management scripts in the sbin directory, and users can call them directly to manage and use Hadoop.
etc: the Hadoop configuration files, including core-site.xml, hdfs-site.xml, and mapred-site.xml inherited from Hadoop 1.0, plus configuration files new in Hadoop 2.0 such as yarn-site.xml.
include: header files provided for external programming (the corresponding dynamic and static libraries are in the lib directory). They are defined in C++ and are typically used by C++ programs that access HDFS or write MapReduce programs.
lib: the dynamic and static programming libraries provided by Hadoop, used together with the header files in the include directory.
libexec: the shell configuration files for each service, which can be used to configure the log output directory, startup parameters (such as JVM options), and other basic information.
sbin: the Hadoop management scripts, mainly the start/stop scripts for the HDFS and YARN services.
share: the compiled jar packages of each Hadoop module.
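The steps below refer to $HADOOP_HOME, which this walkthrough never sets explicitly. A minimal sketch for /etc/profile, mirroring the JDK setup above and assuming the extraction path used here:
# HADOOP_HOME
export HADOOP_HOME=/opt/soft/hadoop-2.7.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Run source /etc/profile afterwards so the variables take effect in the current shell.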
2) Configure the Hadoop environment.
All of the configuration files we need to edit are stored in the $HADOOP_HOME/etc/hadoop directory. Enter that directory first:
[hadoop@hadoop01 hadoop-2.7.2]$ cd etc/hadoop/
Then use the ls command to view the files in the directory:
-rw-r--r--. 1 root root 4436 May 22 2017 capacity-scheduler.xml
-rw-r--r--. 1 root root 1335 May 22 2017 configuration.xsl
-rw-r--r--. 1 root root 318 May 22 2017 container-executor.cfg
-rw-r--r--. 1 root root 774 May 22 2017 core-site.xml
-rw-r--r--. 1 root root 3670 May 22 2017 hadoop-env.cmd
-rw-r--r--. 1 root root 4224 May 22 2017 hadoop-env.sh
-rw-r--r--. 1 root root 2598 May 22 2017 hadoop-metrics2.properties
-rw-r--r--. 1 root root 2490 May 22 2017 hadoop-metrics.properties
-rw-r--r--. 1 root root 9683 May 22 2017 hadoop-policy.xml
-rw-r--r--. 1 root root 775 May 22 2017 hdfs-site.xml
-rw-r--r--. 1 root root 1449 May 22 2017 httpfs-env.sh
-rw-r--r--. 1 root root 1657 May 22 2017 httpfs-log4j.properties
-rw-r--r--. 1 root root 21 May 22 2017 httpfs-signature.secret
-rw-r--r--. 1 root root 620 May 22 2017 httpfs-site.xml
-rw-r--r--. 1 root root 3518 May 22 2017 kms-acls.xml
-rw-r--r--. 1 root root 1527 May 22 2017 kms-env.sh
-rw-r--r--. 1 root root 1631 May 22 2017 kms-log4j.properties
-rw-r--r--. 1 root root 5511 May 22 2017 kms-site.xml
-rw-r--r--. 1 root root 11237 May 22 2017 log4j.properties
-rw-r--r--. 1 root root 951 May 22 2017 mapred-env.cmd
-rw-r--r--. 1 root root 1383 May 22 2017 mapred-env.sh
-rw-r--r--. 1 root root 4113 May 22 2017 mapred-queues.xml.template
-rw-r--r--. 1 root root 758 May 22 2017 mapred-site.xml.template
-rw-r--r--. 1 root root 10 May 22 2017 slaves
-rw-r--r--. 1 root root 2316 May 22 2017 ssl-client.xml.example
-rw-r--r--. 1 root root 2268 May 22 2017 ssl-server.xml.example
-rw-r--r--. 1 root root 2250 May 22 2017 yarn-env.cmd
-rw-r--r--. 1 root root 4567 May 22 2017 yarn-env.sh
-rw-r--r--. 1 root root 690 May 22 2017 yarn-site.xml
Notice that these files are all currently owned by the root user, but we are now working as the hadoop user, which has no ownership of them, so we need to change the ownership first.
Use the chown command to change the owning user and group:
sudo chown -R hadoop:hadoop /opt/soft/hadoop-2.7.2/
After the change, list the files again to confirm it succeeded:
[hadoop@hadoop01 hadoop]$ ll
-rw-r--r--. 1 hadoop hadoop 4436 May 22 2017 capacity-scheduler.xml
-rw-r--r--. 1 hadoop hadoop 1335 May 22 2017 configuration.xsl
-rw-r--r--. 1 hadoop hadoop 318 May 22 2017 container-executor.cfg
-rw-r--r--. 1 hadoop hadoop 774 May 22 2017 core-site.xml
-rw-r--r--. 1 hadoop hadoop 3670 May 22 2017 hadoop-env.cmd
-rw-r--r--. 1 hadoop hadoop 4224 May 22 2017 hadoop-env.sh
-rw-r--r--. 1 hadoop hadoop 2598 May 22 2017 hadoop-metrics2.properties
-rw-r--r--. 1 hadoop hadoop 2490 May 22 2017 hadoop-metrics.properties
-rw-r--r--. 1 hadoop hadoop 9683 May 22 2017 hadoop-policy.xml
-rw-r--r--. 1 hadoop hadoop 775 May 22 2017 hdfs-site.xml
-rw-r--r--. 1 hadoop hadoop 1449 May 22 2017 httpfs-env.sh
-rw-r--r--. 1 hadoop hadoop 1657 May 22 2017 httpfs-log4j.properties
-rw-r--r--. 1 hadoop hadoop 21 May 22 2017 httpfs-signature.secret
-rw-r--r--. 1 hadoop hadoop 620 May 22 2017 httpfs-site.xml
-rw-r--r--. 1 hadoop hadoop 3518 May 22 2017 kms-acls.xml
-rw-r--r--. 1 hadoop hadoop 1527 May 22 2017 kms-env.sh
-rw-r--r--. 1 hadoop hadoop 1631 May 22 2017 kms-log4j.properties
-rw-r--r--. 1 hadoop hadoop 5511 May 22 2017 kms-site.xml
-rw-r--r--. 1 hadoop hadoop 11237 May 22 2017 log4j.properties
-rw-r--r--. 1 hadoop hadoop 951 May 22 2017 mapred-env.cmd
-rw-r--r--. 1 hadoop hadoop 1383 May 22 2017 mapred-env.sh
-rw-r--r--. 1 hadoop hadoop 4113 May 22 2017 mapred-queues.xml.template
-rw-r--r--. 1 hadoop hadoop 758 May 22 2017 mapred-site.xml.template
-rw-r--r--. 1 hadoop hadoop 10 May 22 2017 slaves
-rw-r--r--. 1 hadoop hadoop 2316 May 22 2017 ssl-client.xml.example
-rw-r--r--. 1 hadoop hadoop 2268 May 22 2017 ssl-server.xml.example
-rw-r--r--. 1 hadoop hadoop 2250 May 22 2017 yarn-env.cmd
-rw-r--r--. 1 hadoop hadoop 4567 May 22 2017 yarn-env.sh
-rw-r--r--. 1 hadoop hadoop 690 May 22 2017 yarn-site.xml
The new listing shows that ownership of all the files has changed to hadoop, so we can now edit them as the hadoop user.
With that done, the first file we need to configure is
hadoop-env.sh: the environment variable configuration file for Hadoop
# The java implementation to use.
export JAVA_HOME=/opt/soft/jdk1.8.0_181
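Rather than editing the file by hand, you can patch the line with sed; a minimal sketch, run from the Hadoop root directory and assuming the JDK path above:
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/soft/jdk1.8.0_181|' etc/hadoop/hadoop-env.sh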
Either way, once JAVA_HOME in hadoop-env.sh points at your own JDK path, run the following command from the Hadoop root directory:
bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                            YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
If you see the usage information above, the basic runtime environment has been set up.
II. Hadoop operation modes
Hadoop has the following operation modes:
1) Local mode (the default): no separate daemons need to be started; suitable for running, testing, and development.
2) Pseudo-distributed mode: equivalent to fully distributed, but with only a single node.
3) Fully distributed mode: multiple nodes running together.
2.1 Run the official Hadoop Grep example locally
This example matches a given regular expression against a set of files and counts how many words match it.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
The above is the example code given on the official website.
As the code shows, you first create a directory for the files to be analyzed, but you do not create a directory for the results beforehand. Note: in Hadoop, the output directory must not already exist.
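This also means a job cannot simply be re-run: the second run fails with a FileAlreadyExistsException unless the old result is removed first, e.g.:
rm -rf output    # clear the previous local-mode result; Hadoop refuses to overwrite it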
Example: run the grep case
1) Create a folder named input under the Hadoop root directory:
[hadoop@hadoop01 hadoop-2.7.2]$ mkdir input
2) Copy Hadoop's xml configuration files into input:
[hadoop@hadoop01 hadoop-2.7.2]$ cp etc/hadoop/*.xml input/
3) Execute the MapReduce program from the share directory:
[hadoop@hadoop01 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
4) View the output:
[hadoop@hadoop01 hadoop-2.7.2]$ cat output/*
1 dfsadmin
2.2 Run the official wordcount example
1) Create a wcinput directory in the Hadoop root directory to hold the files to count:
[hadoop@hadoop01 hadoop-2.7.2]$ mkdir wcinput
2) Create a wordcount.txt file in the wcinput directory:
[hadoop@hadoop01 wcinput]$ vi wordcount.txt
hello java world input
hadoop hive zookeeper java
world input hello hadoop
hbase zookeeper sqoop
3) Execute the wordcount example program:
[hadoop@hadoop01 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput
4) View the results:
[hadoop@hadoop01 hadoop-2.7.2]$ cat wcoutput/part-r-00000
hadoop 2
hbase 1
hello 2
hive 1
input 2
java 2
sqoop 1
world 2
zookeeper 2
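In local mode it is easy to cross-check this result with plain shell tools; a quick sketch using the input file above:
tr -s ' ' '\n' < wcinput/wordcount.txt | sort | uniq -c    # prints "count word" pairs; the counts should match part-r-00000 above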
With the steps above, the most basic Hadoop environment is in place, and you can run some of Hadoop's bundled examples.