

Hadoop pseudo-distributed installation


Foreword:

Hadoop: storage and processing platform

HDFS: cluster with NN, SNN, DN // SNN: merges the edit logs and image of HDFS

MapReduce: cluster with a central node: JobTracker / TaskTracker

JT: cluster resource management

TT: runs tasks (map, reduce)

Hadoop 2.0

YARN: cluster resource management, split out of MapReduce

MapReduce: data processing

RM, NM, AM // RM: ResourceManager, NM: NodeManager, AM: ApplicationMaster

Container: runs MR tasks

Tez: execution engine

MR: batch processing

HA: in Hadoop 2.0, NN nodes can also be made highly available

I. Hadoop 2.0 working model

          [NM / Container A / App M (B)]
         /
   [RM] --- [NM / Container B / App M (A)]
         \
          [NM / Container A]

Client --> RM --> node1 / node2 / node n ...

// two applications, A and B: each has exactly one App Master, while its containers are spread across several nodes

ResourceManager: RM is independent

Each node runs [NodeManager + App Master + Container] // NM and AM run on the nodes

NodeManager: NM runs on each node and periodically reports node information to RM

Client requests a job: the ApplicationMaster on a node decides how many mappers and how many reducers to start

Mappers and reducers run as Containers // the tasks run inside the containers

There is only one ApplicationMaster per application: the App M for a given task runs on a single node, while its Containers run on multiple nodes and periodically report their processing status to the App M.

APP M reports the health of the task to RM, and RM shuts down APP M after the task is completed.

When a task fails, it is managed by App M, not by RM

RM is global; NM is unique on each node; there is only one AM for a program, but its containers need to run on multiple nodes
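On a running 2.x cluster this model can be observed directly from the yarn CLI. A quick check (these commands only work after the daemons installed below are up):

yarn node -list          # NodeManagers currently registered with the RM
yarn application -list   # applications and the state reported by their App Masters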

Hadoop 1.0 vs 2.0:

        1.0                          2.0
 ==================       ================================
 Pig/Hive/Others          MR/Pig/Hive [Tez]   RT/Service (HBase)
 [MapReduce]              [YARN]
 [HDFS]                   [HDFS2]

In Hadoop v1, MapReduce is three things at once:

1. A development API

2. A runtime framework

3. A runtime environment

II. Installation of Hadoop

1. Stand-alone mode: for testing

2. Pseudo-distributed mode: all daemons run on a single machine

3. Distributed mode: cluster mode

Hadoop is written in Java, so it depends on a JVM.

Hadoop 2.6.2: JDK 1.6+

Hadoop 2.7: JDK 1.7+

1. Environment

vim /etc/profile.d/java.sh

export JAVA_HOME=/usr

yum install java-1.8.0-openjdk-devel.x86_64
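A quick sanity check that the JDK is usable (assuming the openjdk package above installed cleanly):

java -version    # should report an openjdk 1.8.0 build
javac -version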

Each Java program starts a JVM when it runs, and the JVM's heap memory needs to be configured.

Young generation, old generation, permanent generation // heap regions managed by the garbage collector
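One place to adjust the heap is etc/hadoop/hadoop-env.sh; a sketch, assuming the variables of the stock Hadoop 2.x hadoop-env.sh:

export HADOOP_HEAPSIZE=1000                                   # heap for most daemons, in MB
export HADOOP_NAMENODE_OPTS="-Xmx1g ${HADOOP_NAMENODE_OPTS}"  # per-daemon override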

slaves file:

Each slave node is a DataNode for HDFS and, for YARN, a NodeManager

tar xvf hadoop-2.6.2.tar.gz -C /bdapps/

cd /bdapps/

ln -sv hadoop-2.6.2/ hadoop

cd hadoop

vim /etc/profile.d/hadoop.sh

export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

. /etc/profile.d/hadoop.sh
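Verify that the new environment is in effect:

hadoop version   # should print Hadoop 2.6.2
which hadoop     # should resolve to /bdapps/hadoop/bin/hadoop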

2. Create the users and directories that run the Hadoop processes

[root@node2 hadoop]# groupadd hadoop
[root@node2 hadoop]# useradd -g hadoop yarn
[root@node2 hadoop]# useradd -g hadoop hdfs
[root@node2 hadoop]# useradd -g hadoop mapred

Create the data and log directories:

[root@node2 hadoop]# mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
[root@node2 hadoop]# chown -R hdfs:hadoop /data/hadoop/hdfs/
[root@node2 hadoop]# cd /bdapps/hadoop
[root@node2 hadoop]# mkdir logs
[root@node2 hadoop]# chmod g+w logs
[root@node2 hadoop]# chown -R yarn:hadoop ./*
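A quick check that the users and directories came out with the intended ownership:

id hdfs; id yarn; id mapred
ls -ld /data/hadoop/hdfs/{nn,snn,dn}   # should be owned by hdfs:hadoop
ls -ld /bdapps/hadoop/logs             # should be group-writable for hadoop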

3. Configure hadoop

core-site.xml contains the NameNode host address and listening RPC port. For a pseudo-distributed installation the host address is localhost, and the NameNode's default RPC port is 8020.

vim etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
        <final>true</final>
    </property>
</configuration>

Configure the HDFS-related properties in hdfs-site.xml: the replication factor (the number of copies of each data block), the directories where the NN and DN store their data, and so on. For pseudo-distributed Hadoop the replication factor should be 1.

The directories used by the NN and DN to store data are the paths created for them in the previous step.

[root@node2 hadoop]# vim etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <!-- number of copies of each dfs block -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>

Configure the MapReduce framework in mapred-site.xml; it should specify yarn. Other available values are local and classic.

[root@node2 hadoop]# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

[root@node2 hadoop]# vim etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure the YARN daemons and YARN-related properties. First specify the host and listening port of the ResourceManager daemon: for the pseudo-distributed model the host is localhost and the default port is 8032. Then specify the scheduler used by the ResourceManager and the auxiliary services of the NodeManager.

[root@node2 hadoop]# vim etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>

Configure hadoop-env.sh and yarn-env.sh (the defaults usually suffice).

Every Hadoop daemon depends on JAVA_HOME, and most daemons default to a heap size of 1 GB, which can be adjusted as needed.

slaves: defines the list of slave nodes for hdfs; the default is the local host.

4. Format HDFS

If the dfs.namenode.name.dir directory from hdfs-site.xml does not exist, formatting will create it.

If it already exists, make sure its permissions are set correctly; formatting erases the data inside it and re-creates the file system.

Run the format as the hdfs user:

[root@node2 hadoop]# su - hdfs

hdfs command classification:

User commands

dfs: file system commands: rm, cat, put, get, rmr, ls, cp, du, ...

hdfs dfs -put localfile /user/hadoop/hadoopfile

fetchdt

fsck

version

Administration commands

balancer

datanode

dfsadmin

mover

namenode

secondarynamenode
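For example, two of the administration commands above in action (illustrative; they only work once the daemons are started in step 5):

hdfs dfsadmin -report   # capacity and per-DataNode status
hdfs fsck / -files      # file system health check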

A simple configuration error: files cannot be created under the /data directory.

Resolution steps:

1. Delete all leading whitespace at the head of the configuration files, e.g. with this vim command:

:%s/^[[:space:]]\+//
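The same cleanup can be done non-interactively for all configuration files with sed (an equivalent one-liner, not part of the original steps):

sed -i 's/^[[:space:]]\+//' /bdapps/hadoop/etc/hadoop/*.xml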

[hdfs@localhost ~]$ hdfs namenode -format

/data/hadoop/hdfs/nn has been successfully formatted.

5. Start hadoop // NN, DN, SNN, RM, NM

NameNode: hadoop-daemon.sh start/stop namenode

DataNode: hadoop-daemon.sh start/stop datanode

Secondary NameNode: hadoop-daemon.sh start/stop secondarynamenode

ResourceManager: yarn-daemon.sh start/stop resourcemanager

NodeManager: yarn-daemon.sh start/stop nodemanager
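Hadoop also ships wrapper scripts under ${HADOOP_PREFIX}/sbin that start these daemons together; a sketch of using them instead of the per-daemon commands above (they assume passwordless ssh to the hosts listed in slaves):

${HADOOP_PREFIX}/sbin/start-dfs.sh    # namenode + datanode(s) + secondarynamenode
${HADOOP_PREFIX}/sbin/start-yarn.sh   # resourcemanager + nodemanager(s)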

Start the HDFS service.

HDFS has three daemons: namenode, datanode, and secondarynamenode, all started via hadoop-daemon.sh.

YARN has two daemons: resourcemanager and nodemanager, both started via the yarn-daemon.sh script.

[hdfs@localhost ~]$ hadoop-daemon.sh start namenode

starting namenode, logging to /bdapps/hadoop/logs/hadoop-hdfs-namenode-localhost.out

[hdfs@localhost ~]$ jps   // the ps command for java programs

4215 NameNode

4255 Jps

[hdfs@localhost ~]$ hadoop-daemon.sh start secondarynamenode

starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hdfs-secondarynamenode-localhost.out

[hdfs@localhost ~]$ hadoop-daemon.sh start datanode

starting datanode, logging to /bdapps/hadoop/logs/hadoop-hdfs-datanode-localhost.out

// files can now be uploaded

[hdfs@localhost ~]$ hdfs dfs -mkdir /test   // create a directory

[hdfs@localhost ~]$ hdfs dfs -ls /

Found 1 items

drwxr-xr-x - hdfs supergroup 0 2017-05-13 22:18 /test

[hdfs@localhost ~]$ hdfs dfs -put /etc/fstab /test/

[hdfs@localhost ~]$ hdfs dfs -ls /test

Found 1 items

-rw-r--r-- 1 hdfs supergroup 537 2017-05-13 22:21 /test/fstab

cat /data/hadoop/hdfs/dn/current/BP-1163334701-127.0.0.1-1494676927122/current/finalized/subdir0/subdir0/blk_1073741825

// the block file on the DataNode can be viewed directly

hdfs dfs -cat /test/fstab   // also shows the content

// if a file is too large, it is split into n blocks
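To see how a file maps to blocks without poking at the DataNode directories directly, fsck can print the block list (illustrative):

hdfs fsck /test/fstab -files -blocks -locations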

Note: if you want other users to have write access to hdfs, you need to add the following property definition to hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

[root@node2 test]# su - yarn
[yarn@localhost ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-yarn-resourcemanager-localhost.out
[yarn@localhost ~]$ yarn-daemon.sh start nodemanager
starting nodemanager, logging to /bdapps/hadoop/logs/yarn-yarn-nodemanager-localhost.out
[yarn@localhost ~]$ jps
5191 Jps
5068 NodeManager
4829 ResourceManager

6. Overview of the Web UI

HDFS and YARN ResourceManager each provide Web interfaces.

HDFS-NameNode: http://NameNodeHost:50070/

YARN-ResourceManager: http://ResourceManagerHost:8088

http://192.168.4.105:50070/dfshealth.html#tab-overview

YARN-ResourceManager: listens only on 127.0.0.1
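The same overview data is exposed as JSON over the NameNode's JMX endpoint, which is handy for scripting (a quick check, assuming the 50070 port above):

curl 'http://localhost:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'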

The contents of the EditLog:

NameNode                                SecondaryNameNode

fsimage ------------------------------> \
                                          merged into a new fsimage
EditLog rolls over --> fetched -------> /
                                         |
original fsimage is overwritten <--------+

SecondaryNameNode: merges the EditLog on the NameNode into the fsimage, keeping the persistent image file as complete and current as possible.

The real metadata lives in the NameNode's memory; when data is modified, the change is appended to the editlog.
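The on-disk fsimage and editlog can be inspected with the offline viewers bundled with HDFS (illustrative; the exact file names under /data/hadoop/hdfs/nn/current will differ):

hdfs oiv -p XML -i /data/hadoop/hdfs/nn/current/fsimage_0000000000000000000 -o /tmp/fsimage.xml   # offline image viewer
hdfs oev -p xml -i /data/hadoop/hdfs/nn/current/edits_inprogress_0000000000000000001 -o /tmp/edits.xml   # offline edits viewer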

firefox localhost:8088 &

Application states:

SUBMITTED: job submitted

ACCEPTED: job accepted

RUNNING

FINISHED

FAILED

KILLED

// the SecondaryNameNode fetches its data from the NameNode

7. Run the test program

[root@localhost mapreduce]# su - hdfs

[hdfs@localhost ~]$ cd /bdapps/hadoop/share/hadoop/mapreduce/

yarn jar hadoop-mapreduce-examples-2.6.2.jar   // lists the many test programs

yarn jar hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/fstab.out   // count the number of occurrences of each word

hdfs dfs -ls /test/fstab.out   // view the output directory

hdfs dfs -cat /test/fstab.out/part-r-00000   // view the result
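The examples jar also contains self-contained tests that need no input data, for instance the pi estimator (illustrative):

yarn jar hadoop-mapreduce-examples-2.6.2.jar pi 2 10   # 2 map tasks, 10 samples each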

Summary

1. Configure the environment

java.sh, hadoop.sh

2. Create users and related directories

3. Configure hadoop

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

4. Format HDFS

[hdfs@localhost ~]$ hdfs namenode -format

/data/hadoop/hdfs/nn has been successfully formatted.

5. Start hadoop

NameNode: hadoop-daemon.sh start/stop namenode

DataNode: hadoop-daemon.sh start/stop datanode

Secondary NameNode: hadoop-daemon.sh start/stop secondarynamenode

ResourceManager: yarn-daemon.sh start/stop resourcemanager

NodeManager: yarn-daemon.sh start/stop nodemanager

6. Overview of the Web UI

HDFS and YARN ResourceManager each provide Web interfaces.

HDFS-NameNode: http://NameNodeHost:50070/

YARN-ResourceManager: http://ResourceManagerHost:8088

Help documentation: http://hadoop.apache.org/docs/r2.6.5/
