2025-01-19 Update From: SLTechnology News&Howtos
Foreword:
Hadoop: a storage and processing platform
HDFS: cluster with NN, SNN, DN // SNN: merges the edit logs and image of HDFS
MapReduce: cluster with a central node; JobTracker / TaskTracker
JT: cluster resource management
TT: runs the tasks (map, reduce)
Hadoop 2.0
YARN: cluster resource management, split out of MapReduce
MapReduce: data processing
RM, NM, AM // RM: ResourceManager, NM: NodeManager, AM: ApplicationMaster
Container: runs an MR task
Tez: execution engine
MR: batch processing
HA: the NN and the YARN nodes can both be made highly available
I. The 2.0 working model
=
        A [NM / Container A / App Master (B)]
       /
  [RM]---[NM / Container B / App Master (A)]
       \
        B [NM / Container A]
Client --> RM --> node1 / node2 / node n ...
ResourceManager: the RM is an independent node.
Each worker node runs [NodeManager + App Master + Container] // NM + AM
NodeManager: NM, runs on every node and periodically reports node status to the RM.
Client requests a job: the Application Master on a node decides how many mappers and how many reducers to start.
The mappers and reducers are called Containers // the job runs inside containers.
There is only one Application Master per application; the App Master for a given task runs on a single node, while its Containers run on multiple nodes and periodically report their processing status to the App Master.
The App Master reports the task's health to the RM, and the RM shuts down the App Master after the task completes.
When a task fails, it is handled by the App Master, not by the RM.
RM is global; each node has exactly one NM; a program has only one AM; but containers span multiple nodes.
Hadoop 1.0 vs 2.0
        1.0                              2.0
        ===                              ===
  Pig/Hive/Others         MR/Pig/Hive | [Tez] | RT/Service (HBase)
    [MapReduce]                      [YARN]
      [HDFS]                         [HDFS2]
In Hadoop v1, MapReduce is all three of:
1. The development API
2. The runtime framework
3. The runtime environment
II. Installing Hadoop
1. Stand-alone model: for testing
2. Pseudo-distributed model: runs on a single machine
3. Distributed model: a real cluster
Hadoop is written in Java, so it depends on a JVM:
Hadoop 2.6.2: JDK 1.6+
Hadoop 2.7:   JDK 1.7+
1. Environment
yum install java-1.8.0-openjdk-devel.x86_64
vim /etc/profile.d/java.sh
JAVA_HOME=/usr
Each Java program starts a JVM when it runs, and the JVM's heap memory needs to be configured:
young generation, old generation, permanent generation // managed by the garbage collector
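A minimal sketch of what /etc/profile.d/java.sh could contain for this setup. JAVA_HOME=/usr matches the openjdk package installed above; the heap flags are illustrative values, not ones the original specifies:

```shell
# Hypothetical /etc/profile.d/java.sh; JAVA_HOME=/usr matches the
# java-1.8.0-openjdk-devel package installed via yum above.
export JAVA_HOME=/usr
export PATH=$PATH:$JAVA_HOME/bin
# Example per-JVM heap tuning (initial 256 MB, max 1 GB) - illustrative only:
HEAP_OPTS="-Xms256m -Xmx1024m"
echo "JAVA_HOME=$JAVA_HOME heap=$HEAP_OPTS"
```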
slaves file:
lists the DNs (data nodes); for YARN, each one is also a node manager
tar xvf hadoop-2.6.2.tar.gz -C /bdapps/
cd /bdapps/
ln -sv hadoop-2.6.2/ hadoop
cd hadoop
vim /etc/profile.d/hadoop.sh
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
. /etc/profile.d/hadoop.sh
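A quick sanity check (my own addition, not from the original steps): after sourcing hadoop.sh, confirm the Hadoop bin directory actually landed on PATH before relying on the commands:

```shell
# Verify that the Hadoop bin directory is on PATH after sourcing hadoop.sh.
# HADOOP_PREFIX=/bdapps/hadoop is the layout used throughout this guide.
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
case ":$PATH:" in
  *":${HADOOP_PREFIX}/bin:"*) echo "PATH OK" ;;
  *)                          echo "PATH missing ${HADOOP_PREFIX}/bin" ;;
esac
```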
2. Create the users and directories that run the Hadoop processes
[root@node2 hadoop]# groupadd hadoop
[root@node2 hadoop]# useradd -g hadoop yarn
[root@node2 hadoop]# useradd -g hadoop hdfs
[root@node2 hadoop]# useradd -g hadoop mapred
Create the data and log directories:
[root@node2 hadoop]# mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
[root@node2 hadoop]# chown -R hdfs:hadoop /data/hadoop/hdfs/
[root@node2 hadoop]# cd /bdapps/hadoop
[root@node2 hadoop]# mkdir logs
[root@node2 hadoop]# chmod g+w logs
[root@node2 hadoop]# chown -R yarn:hadoop ./*
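The brace-expansion mkdir above can be tried safely against a throwaway root first; a small demonstration (the temp directory is my own stand-in for /data/hadoop):

```shell
# Demonstrate the brace-expansion mkdir used for the nn/snn/dn directories,
# against a throwaway root so it is safe to run anywhere (requires bash).
ROOT=$(mktemp -d)
mkdir -pv "$ROOT"/hdfs/{nn,snn,dn}
ls "$ROOT"/hdfs
rm -rf "$ROOT"
```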
3. Configure hadoop
core-site.xml contains the NameNode host address and the RPC port it listens on. For a pseudo-distributed install, the host address is localhost and the NameNode's default RPC port is 8020.
vim etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
    <final>true</final>
  </property>
</configuration>
hdfs-site.xml configures HDFS-related properties, such as the replication factor (copies of each data block) and the directories where the NN and DN store data. For pseudo-distributed Hadoop the replication factor should be 1.
The directories the NN and DN use to store data are the paths created for them in the previous step.
[root@node2 hadoop]# vim etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>    <!-- number of copies of each block -->
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
</configuration>
mapred-site.xml configures the MapReduce framework, which should be set to yarn. Other possible values are local and classic.
[root@node2 hadoop]# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[root@node2 hadoop]# vim etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml configures the YARN daemons and YARN-related properties: first the host and listening port of the ResourceManager daemon.
For the pseudo-distributed model the host is localhost and the default port is 8032; then specify the scheduler used by the ResourceManager and the auxiliary services of the NodeManager.
[root@node2 hadoop]# vim etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
hadoop-env.sh and yarn-env.sh can be left at their defaults.
Every Hadoop daemon depends on JAVA_HOME, and most Hadoop daemons default to a 1 GB heap,
which can be adjusted as needed.
slaves: defines the list of slave nodes for HDFS; the default is the local machine.
4. Format HDFS
If the dfs.namenode.name.dir directory from hdfs-site.xml does not exist, formatting creates it.
If it already exists, make sure its permissions are set correctly; formatting will erase its data and re-create the file system.
Run the format as the hdfs user:
[root@node2 hadoop]# su - hdfs
hdfs command classification:
User commands
dfs: file system commands: rm, cat, put, get, rmr, ls, cp, du, ...
hdfs dfs -put localfile /user/hadoop/hadoopfile
fetchdt
fsck
version
Administration commands
balancer
datanode
dfsadmin
mover
namenode
secondarynamenode
A simple configuration error: files cannot be generated under the /data directory.
Resolution steps:
1. Use diff to locate the problem, then delete all leading whitespace in the configuration files
in vim:
:%s/^[[:space:]]\+//
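The same cleanup can be done non-interactively with sed (assumes GNU sed for -i); a sketch against a throwaway file rather than the real configs:

```shell
# Strip leading whitespace from every line of a config file with sed,
# the non-interactive equivalent of the vim substitution above.
TMP=$(mktemp)
printf '  <property>\n\t<name>fs.defaultFS</name>\n' > "$TMP"
sed -i 's/^[[:space:]]\+//' "$TMP"   # GNU sed in-place edit
cat "$TMP"
rm -f "$TMP"
```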
[hdfs@localhost ~]$ hdfs namenode -format
/data/hadoop/hdfs/nn has been successfully formatted.
5. Start hadoop // NN, DN, SNN, RM, NM
NameNode: hadoop-daemon.sh start/stop namenode
DataNode: hadoop-daemon.sh start/stop datanode
Secondary NameNode: hadoop-daemon.sh start/stop secondarynamenode
ResourceManager: yarn-daemon.sh start/stop resourcemanager
NodeManager: yarn-daemon.sh start/stop nodemanager
Start the HDFS services.
HDFS has three daemons:
namenode, datanode and secondarynamenode, all started via hadoop-daemon.sh.
YARN has two daemons:
resourcemanager and nodemanager, both started via the yarn-daemon.sh script.
[hdfs@localhost ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /bdapps/hadoop/logs/hadoop-hdfs-namenode-localhost.o
[hdfs@localhost ~]$ jps    // the ps command for java
4215 NameNode
4255 Jps
[hdfs@localhost ~]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hdfs-secondarynamenode-localhost.out
[hdfs@localhost ~]$ hadoop-daemon.sh start datanode
starting datanode, logging to /bdapps/hadoop/logs/hadoop-hdfs-datanode-localhost.ou
// files can be uploaded at this point
[hdfs@localhost ~]$ hdfs dfs -mkdir /test    // create a directory
[hdfs@localhost ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hdfs supergroup 0 2017-05-13 22:18 /te
[hdfs@localhost ~]$ hdfs dfs -put /etc/fstab /test/
[hdfs@localhost ~]$ hdfs dfs -ls /test
Found 1 items
-rw-r--r-- 1 hdfs supergroup 537 2017-05-13 22:21 /test/fstab
cat /data/hadoop/hdfs/dn/current/BP-1163334701-127.0.0.1-1494676927122/current/finalized/subdir0/subdir0/blk_1073741825
// the raw block can be viewed this way
hdfs dfs -cat /test/fstab    // also shows the content
// if a file is too large, it is cut into n blocks
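As a rough illustration of that cutting: the number of blocks is the file size divided by the block size, rounded up. The 128 MB block size below is the Hadoop 2.x default; the 300 MB file size is an invented example:

```shell
# Ceiling division: blocks needed for a FILE_MB-megabyte file when the
# HDFS block size is BLOCK_MB (128 MB is the Hadoop 2.x default).
FILE_MB=300
BLOCK_MB=128
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
echo "A ${FILE_MB} MB file is stored as ${BLOCKS} blocks"
```

The 537-byte fstab uploaded above fits in a single block; a 300 MB file would occupy 3.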
Note: if you want other users to have write access to HDFS, add a property to hdfs-site.xml:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
[root@node2 test]# su - yarn
[yarn@localhost ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-yarn-resourcemanager-localhost.out
[yarn@localhost ~]$ yarn-daemon.sh start nodemanager
starting nodemanager, logging to /bdapps/hadoop/logs/yarn-yarn-nodemanager-localhost.out
[yarn@localhost ~]$ jps
5191 Jps
5068 NodeManager
4829 ResourceManager
6. Web UI overview
HDFS and YARN's ResourceManager each provide a Web interface:
HDFS-NameNode: http://NameNodeHost:50070/
YARN-ResourceManager: http://ResourceManagerHost:8088/
http://192.168.4.105:50070/dfshealth.html#tab-overview
YARN-ResourceManager: listens only on 127.0.0.1
How the EditLog is handled:
NameNode                          SecondaryNameNode
fsimage ------------------------> merged into a new fsimage
EditLog rolls over -------------> taken out
                                     |
overwrite the original fsimage <-----+
SecondaryNameNode: merges the NameNode's EditLog into the fsimage, ensuring the image file is persisted while keeping as much information as possible.
The real metadata lives in the NameNode's memory; when data is modified, the change is appended to the EditLog.
firefox localhost:8088 &
Application states:
SUBMITTED: the job has been submitted
ACCEPTED: the job has been accepted
RUNNING
FINISHED
FAILED
KILLED
// the SecondaryNameNode fetches its data from the NameNode
7. Run a test program
[root@localhost mapreduce]# su - hdfs
[hdfs@localhost ~]$ cd /bdapps/hadoop/share/hadoop/mapreduce/
yarn jar hadoop-mapreduce-examples-2.6.2.jar    // contains many test programs
hdfs dfs -ls /test/fstab.out    // view the result
hdfs dfs -cat /test/fstab.out/part-r-00000    // view the output
// counts the number of occurrences of each word
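The counting that the example performs can be approximated in plain shell; a sketch over inline sample words (not the real fstab), where tr plays the mapper, sort the shuffle, and uniq -c the reducer:

```shell
# A plain-shell approximation of wordcount: split on whitespace ("map"),
# sort to group identical words ("shuffle"), count duplicates ("reduce").
printf 'defaults ext4 defaults\nswap swap\n' \
  | tr -s ' \t' '\n' | sort | uniq -c | sort -rn
```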
Summary
1. Configure the environment
java.sh, hadoop.sh
2. Create users and related directories
3. Configure hadoop
Core-site.xml
Hdfs-site.xml
Mapred-site.xml
Yarn-site.xml
4. Format HDFS
[hdfs@localhost ~]$ hdfs namenode -format
/ data/hadoop/hdfs/nn has been successfully formatted.
5. Start hadoop
NameNode: hadoop-daemon.sh start/stop namenode
DataNode: hadoop-daemon.sh start/stop datanode
Secondary NameNode: hadoop-daemon.sh start/stop secondarynamenode
ResourceManager: yarn-daemon.sh start/stop resourcemanager
NodeManager: yarn-daemon.sh start/stop nodemanager
6. Web UI overview
HDFS and YARN ResourceManager each provide Web interfaces.
HDFS-NameNode: http://NameNodeHost:50070/
YARN-ResourceManager http://ResourceManagerHost:8088
Help documentation: http://hadoop.apache.org/docs/r2.6.5/