
Building a Hadoop 2.9.1 pseudo-distributed environment and simple file system operations


1. Prepare

1.1. A virtual machine running CentOS 7 on VMware

1.2. System configuration

Configure the network

# vi /etc/sysconfig/network-scripts/ifcfg-ens33

BOOTPROTO=static

ONBOOT=yes

IPADDR=192.168.120.131

GATEWAY=192.168.120.2

NETMASK=255.255.255.0

DNS1=8.8.8.8

DNS2=4.4.4.4
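After saving the file, one common way to apply the new settings on CentOS 7 is to restart the legacy network service (a reboot works as well):

# systemctl restart network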

1.3. Configure hostname

# hostnamectl set-hostname master1

# hostname master1

1.4. Specify the time zone (if the time zone is not Shanghai)

# ll /etc/localtime

lrwxrwxrwx. 1 root root 35 Jun  4 19:25 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai

If the time zone is wrong, change it as follows:

# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
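On CentOS 7 the same change can also be made, and verified, with timedatectl (assuming systemd manages the clock):

# timedatectl set-timezone Asia/Shanghai

# timedatectl | grep "Time zone"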

1.5. Upload the packages

hadoop-2.9.1.tar

jdk-8u171-linux-x64.tar

2. Start building the environment

2.1. Create users and groups

[root@master1 ~]# groupadd hadoop

[root@master1 ~]# useradd -g hadoop hadoop

[root@master1 ~]# passwd hadoop

2.2. Decompress the package

Switch users

[root@master1 ~]# su hadoop

Create a directory where the package is stored

[hadoop@master1 root]$ cd

[hadoop@master1 ~]$ mkdir src

[hadoop@master1 ~]$ mv *.tar src

Decompression package

[hadoop@master1 ~]$ cd src

[hadoop@master1 src]$ tar -xf jdk-8u171-linux-x64.tar -C ../

[hadoop@master1 src]$ tar -xf hadoop-2.9.1.tar -C ../

[hadoop@master1 src]$ cd

[hadoop@master1 ~]$ mv jdk1.8.0_171 jdk

[hadoop@master1 ~]$ mv hadoop-2.9.1 hadoop

2.3. Configure environment variables

[hadoop@master1 ~]$ vi .bashrc

export JAVA_HOME=/home/hadoop/jdk

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/home/hadoop/hadoop

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Make the configuration file effective

[hadoop@master1 ~]$ source .bashrc

Verification

[hadoop@master1 ~]$ java -version

java version "1.8.0_171"

Java(TM) SE Runtime Environment (build 1.8.0_171-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

[hadoop@master1 ~]$ hadoop version

Hadoop 2.9.1

Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e

Compiled by root on 2018-04-16T09:33Z

Compiled with protoc 2.5.0

From source with checksum 7d6d2b655115c6cc336d662cc2b919bd

This command was run using /home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.9.1.jar

2.4. Modify the Hadoop configuration files

[hadoop@master1 ~]$ cd hadoop/etc/hadoop/

[hadoop@master1 hadoop]$ vi hadoop-env.sh

export JAVA_HOME=/home/hadoop/jdk

[hadoop@master1 hadoop]$ vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.120.131:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/hadoop_tmp_dir</value>
  </property>
</configuration>

Description:

fs.defaultFS: the HDFS address (filesystem URI) that clients and the NameNode use to communicate. It can be a host + port, or a nameservice name (a nameservice can front multiple NameNodes for HA).

hadoop.tmp.dir: the directory where Hadoop stores temporary files while it is running.
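Once the environment variables from step 2.3 are loaded, a quick sanity check (optional, just shown here as a verification idea) is to ask Hadoop which value it actually picked up; this should print the URI configured above:

[hadoop@master1 hadoop]$ hdfs getconf -confKey fs.defaultFS

hdfs://192.168.120.131:9000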

[hadoop@master1 hadoop] $vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Description:

dfs.replication: the HDFS replication factor, i.e. how many copies of each block are kept after a file is split into blocks. The default is 3; for a single-node pseudo-distributed setup, 1 is enough.

Note: in my test, configuring the following two parameters caused the DataNode to fail to start, so I left them unconfigured; I have not yet figured out why.

dfs.namenode.name.dir: the directory where the NameNode stores its data, i.e. the metadata describing the files in HDFS.

dfs.datanode.data.dir: the directory where the DataNode stores its data, i.e. the actual blocks.

The error messages from the log are shown below:

[hadoop@master1 logs]$ pwd

/home/hadoop/hadoop/logs

[hadoop@master1 logs]$ tail -f hadoop-hadoop-datanode-master1.log

2018-06-12 22:30:14,749 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/data/hadoop/hdfs/dn/

java.io.IOException: Incompatible clusterIDs in /data/hadoop/hdfs/dn: namenode clusterID = CID-5bbc555b-4622-4781-9a7f-c2e5131e4869; datanode clusterID = CID-29ec402d-95f8-4148-8d18-f7e4b965be4f

at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:760)

2018-06-12 22:30:14,752 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid f39576ae-b7af-44aa-841a-48ba03b956f4) service to master1/192.168.120.131:9000. Exiting.

java.io.IOException: All specified directories have failed to load.

at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:557)

2018-06-12 22:30:14,753 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid f39576ae-b7af-44aa-841a-48ba03b956f4) service to master1/192.168.120.131:9000

2018-06-12 22:30:14,854 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool (Datanode Uuid f39576ae-b7af-44aa-841a-48ba03b956f4)

2018-06-12 22:30:15 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

2018-06-12 22:30:16 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at master1/192.168.120.131
************************************************************/
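For reference, this "Incompatible clusterIDs" error typically appears when the NameNode has been re-formatted after a DataNode already initialized its storage directory, so the two sides end up with different cluster IDs. On a throw-away test setup, a common remedy (an aside, not part of the configuration used in this article; it destroys the DataNode's blocks) is to wipe the DataNode directory and restart HDFS:

[hadoop@master1 ~]$ stop-dfs.sh

[hadoop@master1 ~]$ rm -rf /data/hadoop/hdfs/dn/*

[hadoop@master1 ~]$ start-dfs.sh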

[hadoop@master1 hadoop]$ cp mapred-site.xml.template mapred-site.xml

[hadoop@master1 hadoop]$ vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Description:

mapreduce.framework.name: specifies that MapReduce jobs run on YARN (in Hadoop 2.x, MapReduce runs as a YARN application).

[hadoop@master1 hadoop]$ vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.120.131</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Description:

yarn.resourcemanager.hostname: the address of the YARN ResourceManager (used to derive its IPC addresses); it can be an IP address or a hostname.

yarn.nodemanager.aux-services: the auxiliary shuffle service that NodeManagers provide for MapReduce programs.

2.5. Create a directory and grant permissions

[hadoop@master1 hadoop]$ exit

[root@master1 ~]# mkdir -p /data/hadoop/hadoop_tmp_dir

[root@master1 ~]# mkdir -p /data/hadoop/hdfs/{nn,dn}

[root@master1 ~]# chown -R hadoop:hadoop /data

3. Format the file system and start the service

3.1. Format the file system

[root@master1 ~]# su hadoop

[hadoop@master1 ~]$ cd hadoop/bin

[hadoop@master1 bin]$ ./hdfs namenode -format

Note:

In a cluster environment, HDFS should be formatted only on the primary (NameNode) node.
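If the format succeeds, the output should end with a line similar to the following (the exact path follows hadoop.tmp.dir, since dfs.namenode.name.dir was not set explicitly):

INFO common.Storage: Storage directory /data/hadoop/hadoop_tmp_dir/dfs/name has been successfully formatted.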

3.2. Start HDFS

[hadoop@master1 bin]$ cd ../sbin

[hadoop@master1 sbin]$ ./start-dfs.sh

Note:

In a cluster environment, this can be run from any node.

If an individual service fails to start and the configuration looks fine, the most likely cause is wrong permissions on the directories created above.
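A quick way to see why a daemon died is to look at its log under /home/hadoop/hadoop/logs (the file name below is just an example for the DataNode; substitute the daemon that actually failed):

[hadoop@master1 sbin]$ tail -n 50 /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-master1.log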

3.3. Start YARN

[hadoop@master1 sbin]$ ./start-yarn.sh

Note:

In a cluster environment, this should be run only on the primary node.

View service status

[hadoop@master1 sbin]$ jps

6708 NameNode

6966 SecondaryNameNode

6808 DataNode

7116 Jps

5791 ResourceManager

5903 NodeManager

3.4. View service status in a browser

View the HDFS status via the web UI by entering the following in a browser:

http://192.168.120.131:50070

View the YARN status via the web UI by entering the following in a browser:

http://192.168.120.131:8088
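If the virtual machine has no graphical browser, the same endpoints can be checked from the shell (assuming curl is installed); a 200 status code means the UI is up:

[hadoop@master1 sbin]$ curl -s -o /dev/null -w '%{http_code}\n' http://192.168.120.131:50070

200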

4. Set up passwordless SSH authentication

Without it, you are prompted for a password every time a service is started, as shown below:

[hadoop@master1 sbin]$ ./start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-master1.out

hadoop@localhost's password:

To start the services without typing a password, configure SSH keys:

[hadoop@master1 sbin]$ cd ~/.ssh/

[hadoop@master1 .ssh]$ ll

total 4

-rw-r--r--. 1 hadoop hadoop 372 Jun 12 18:36 known_hosts

[hadoop@master1 .ssh]$ ssh-keygen

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:D14LpPKZbih0K+kVoTl23zGsKK1xOVlNuSugDvrkjJA hadoop@master1

The key's randomart image is: (omitted)

Just press Enter at every prompt.

[hadoop@master1 .ssh]$ ll

total 12

-rw-------. 1 hadoop hadoop 1675 Jun 12 18:46 id_rsa

-rw-r--r--. 1 hadoop hadoop 396 Jun 12 18:46 id_rsa.pub

-rw-r--r--. 1 hadoop hadoop 372 Jun 12 18:36 known_hosts

[hadoop@master1 .ssh]$ cat id_rsa.pub >> ~/.ssh/authorized_keys

[hadoop@master1 .ssh]$ ll

total 16

-rw-rw-r--. 1 hadoop hadoop 396 Jun 12 18:47 authorized_keys

-rw-------. 1 hadoop hadoop 1675 Jun 12 18:46 id_rsa

-rw-r--r--. 1 hadoop hadoop 396 Jun 12 18:46 id_rsa.pub

-rw-r--r--. 1 hadoop hadoop 372 Jun 12 18:36 known_hosts

If you still have to enter a password when logging in, the cause is the file permissions; fix them like this:

[hadoop@master1 .ssh]$ chmod 600 authorized_keys
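The ~/.ssh directory itself also has to be private, because sshd (with its default StrictModes setting) ignores keys whose parent directories are too permissive, so if the password prompt persists it may also help to run:

[hadoop@master1 .ssh]$ chmod 700 ~/.ssh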

After that, passwordless login works:

[hadoop@master1 .ssh]$ ssh localhost

Last login: Tue Jun 12 18:48:38 2018 from fe80::e961:7d5b:6a72:a2a9%ens33

[hadoop@master1 ~] $

Passwordless login can also be set up in another way: after running ssh-keygen, execute the following command:

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master1

5. Basic file system operations and some problems encountered

5.1. Create a directory

Create a directory in the file system

[hadoop@master1 bin]$ hdfs dfs -mkdir -p /user/hadoop

18-06-12 21:25:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable

List the directories created

[hadoop@master1 bin]$ hdfs dfs -ls /

18-06-12 21:29:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2018-06-12 21:25 /user

5.2. Resolve warning issues

There is a WARN message, but it does not affect normal use of Hadoop.

There are two ways to deal with this warning: recompile the native library from source, or suppress the message in the log configuration. I used the second.

[hadoop@master1 ~]$ cd /home/hadoop/hadoop/etc/hadoop/

[hadoop@master1 hadoop]$ vi log4j.properties

Add

# native WARN

log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

Now the warning no longer appears:

[hadoop@master1 hadoop]$ hdfs dfs -ls /

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2018-06-12 21:25 /user

Upload files to the HDFS file system:

[hadoop@master1 bin]$ hdfs dfs -mkdir -p input

[hadoop@master1 hadoop]$ hdfs dfs -put /home/hadoop/hadoop/etc/hadoop input
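Note that a relative HDFS path such as input resolves against the user's HDFS home directory (/user/hadoop, created in 5.1), so the files end up under /user/hadoop/input. A quick spot-check of the upload (optional):

[hadoop@master1 hadoop]$ hdfs dfs -ls input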

Hadoop ships with a rich set of examples, including wordcount, terasort, join, grep, etc. Run the following command to list them:

[hadoop@master1 bin]$ hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar

An example program must be given as the first argument.

Valid program names are:

aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

dbcount: An example job that count the pageview counts from a database.

distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

grep: A map/reduce program that counts the matches of a regex in the input.

join: A job that effects a join over sorted, equally partitioned datasets

multifilewc: A job that counts words from several files.

pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

randomwriter: A map/reduce program that writes 10GB of random data per node.

secondarysort: An example defining a secondary sort to the reduce.

sort: A map/reduce program that sorts the data written by the random writer.

sudoku: A sudoku solver.

teragen: Generate data for the terasort

terasort: Run the terasort

teravalidate: Checking results of terasort

wordcount: A map/reduce program that counts the words in the input files.

wordmean: A map/reduce program that counts the average length of the words in the input files.

wordmedian: A map/reduce program that counts the median length of the words in the input files.

wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

Pseudo-distributed MapReduce jobs are run the same way as in stand-alone mode, except that in pseudo-distributed mode the input is read from HDFS (you can delete the local input and output folders created in the stand-alone steps to verify this).

[hadoop@master1 sbin]$ hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z]+'

18-06-12 22:57:05 INFO client.RMProxy: Connecting to ResourceManager at /192.168.120.131:8032

18-06-12 22:57:07 INFO input.FileInputFormat: Total input files to process: 30

... (output omitted)

18-06-12 22:57:08 INFO mapreduce.Job: Running job: job_1528815135795_0001

18-06-12 22:57:23 INFO mapreduce.Job: Job job_1528815135795_0001 running in uber mode: false

18-06-12 22:57:23 INFO mapreduce.Job: map 0% reduce 0%

18-06-12 22:58:02 INFO mapreduce.Job: map 13% reduce 0%

... (output omitted)

18-06-12 23:00:17 INFO mapreduce.Job: map 97% reduce 32%

18-06-12 23:00:18 INFO mapreduce.Job: map 100% reduce 32%

18-06-12 23:00:19 INFO mapreduce.Job: map 100% reduce 100%

18-06-12 23:00:20 INFO mapreduce.Job: Job job_1528815135795_0001 completed successfully

18-06-12 23:00:20 INFO mapreduce.Job: Counters: 50

File System Counters

FILE: Number of bytes read=46

FILE: Number of bytes written=6136681

FILE: Number of read operations=0

... (output omitted)

File Input Format Counters

Bytes Read=138

File Output Format Counters

Bytes Written=24

View the result

[hadoop@master1 sbin]$ hdfs dfs -cat output/*

1 dfsmetrics

1 dfsadmin

Fetch the results to the local file system:

[hadoop@master1 sbin]$ hdfs dfs -get output /data

[hadoop@master1 sbin]$ ll /data

total 0

drwxrwxrwx. 5 hadoop hadoop 52 Jun 12 19:20 hadoop

drwxrwxr-x. 2 hadoop hadoop 42 Jun 12 23:03 output

[hadoop@master1 sbin]$ cat /data/output/*

1 dfsmetrics

1 dfsadmin
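As another quick test, any of the other examples can be run the same way. For instance, wordcount counts the words in the uploaded configuration files (wc_output below is just an arbitrary, not-yet-existing directory name, since MapReduce refuses to write into an output directory that already exists):

[hadoop@master1 sbin]$ hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount input wc_output

[hadoop@master1 sbin]$ hdfs dfs -cat wc_output/* | head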

6. Start the history server

The history server is used to view the status of finished jobs in the web UI.

[hadoop@master1 sbin]$ mr-jobhistory-daemon.sh start historyserver

starting historyserver, logging to /home/hadoop/hadoop/logs/mapred-hadoop-historyserver-master1.out

[hadoop@master1 sbin]$ jps

19985 Jps

15778 ResourceManager

15890 NodeManager

14516 NameNode

14827 SecondaryNameNode

19948 JobHistoryServer

14653 DataNode
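With the default settings, the JobHistoryServer web UI listens on port 19888 (mapreduce.jobhistory.webapp.address), so finished jobs can be browsed at:

http://192.168.120.131:19888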

As a beginner, keep the configuration as simple as possible; it makes troubleshooting much easier when something goes wrong.

Reference:

https://www.cnblogs.com/wangxin37/p/6501484.html

https://www.cnblogs.com/xing901022/p/5713585.html
