Before we begin:
Daemons in a Hadoop cluster
HDFS:
NameNode (NN)
SecondaryNameNode (SNN)
DataNode (DN)
Data directories: /data/hadoop/hdfs/{nn,snn,dn}
NN: fsimage, edit log   // the namespace image and the edit log
// The HDFS NN keeps the namespace metadata in memory and keeps modifying it as the state of files changes.
// fsimage records the namespace: after a file is split into blocks, which nodes those blocks are stored on.
// Every change to file metadata is written to the edit log and eventually merged into fsimage, so the metadata survives the next NN restart: on startup the NN reads fsimage back into memory and replays the edit log.
// If the NN crashes, recovering this metadata can take a long time.
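For orientation, once the NameNode (formatted later in this walkthrough) has been running for a while, its metadata directory looks roughly like this; the transaction-id suffixes shown are illustrative, not taken from this cluster:
ls /data/hadoop/hdfs/nn/current/
// fsimage_0000000000000000042            the most recent checkpointed namespace image
// edits_inprogress_0000000000000000043   the edit log currently being appended to
// seen_txid, VERSION                     bookkeeping files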
SNN: when the NN crashes, the SNN can be brought online quickly, saving the time it would take to repair the NN and bring it back, but each DataNode still has to report its block state, so some recovery time is still needed.
In normal operation the SNN copies the NN's fsimage and edit log and merges them on the SNN.
Checkpoint: because the NN is changing constantly, the SNN merges up to a specified point in time (the checkpoint).
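Checkpoint frequency is tunable, and a checkpoint can also be forced; a minimal sketch, assuming the standard checkpoint properties in hdfs-site.xml (the values shown are illustrative, not part of this walkthrough):
// dfs.namenode.checkpoint.period = 3600      merge at least once per hour
// dfs.namenode.checkpoint.txns   = 1000000   or after this many edit-log transactions
hdfs secondarynamenode -checkpoint force      // ask the SNN to run a checkpoint immediately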
// Officially it is recommended to build a Hadoop cluster with more than 30 nodes.
Do the data disks need RAID? // No: HDFS already replicates blocks, so there is no need to provide redundancy again.
How hadoop-daemon.sh runs
When you run hadoop-daemon.sh start datanode in cluster mode, the script has to find every DataNode node automatically and then start the daemon on each of them.
How does it find them? In other words, how do we make sure the master node can connect to every slave node automatically and has permission to execute the command there?
This is configured on the master node, as sketched below.
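Conceptually, the cluster start scripts just loop over the hosts listed in etc/hadoop/slaves and invoke the per-node daemon script over ssh; a simplified sketch of the idea (not the literal script shipped with Hadoop):
for host in $(cat ${HADOOP_PREFIX}/etc/hadoop/slaves); do
    ssh "$host" "${HADOOP_PREFIX}/sbin/hadoop-daemon.sh start datanode"
done
This is why the master needs password-less, key-based ssh to every slave, which is configured below.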
YARN:
ResourceManager (RM)
NodeManager (NM)
yarn-daemon.sh start/stop
The actual production model:
      [NN]          [SNN]          [RM]
        |             |              |
  ---------------------------------------------
  [node1/DN+NM]  [node2/DN+NM]  [node3/DN+NM]
On each slave node only the DataNode process and the NodeManager process are started.
The experimental model used here:
              [NN/SNN/RM]
             /     |     \
  [node2/DN+NM]  [node3/DN+NM]  [node4/DN+NM]
On the master node run three processes: NameNode, SecondaryNameNode and ResourceManager.
On the other nodes run the DataNode process and the NodeManager process.
Preparation:
1. Time synchronization (ntpdate)
tzselect
timedatectl                              // view the time zone settings
timedatectl list-timezones               // list all time zones
timedatectl set-local-rtc 1              // keep the hardware clock on local time; 0 keeps it on UTC
timedatectl set-timezone Asia/Shanghai   // set the system time zone to Shanghai
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime   // the simplest solution
2. /etc/hosts entries for cluster communication
172.16.100.67 node1.mt.com node1 master
172.16.100.68 node2.mt.com node2
172.16.100.69 node3.mt.com node3
172.16.100.70 node4.mt.com node4
If you want to start or stop the entire cluster from the master node, the users that run the services on master (for example hdfs and yarn) must be able to log in to the slaves with key-based ssh.
node1:
1. Preliminaries
(1) Configure the environment
vim /etc/profile.d/java.sh
export JAVA_HOME=/usr
yum install java-1.8.0-openjdk-devel.x86_64
scp /etc/profile.d/java.sh node2:/etc/profile.d/
scp /etc/profile.d/java.sh node3:/etc/profile.d/
scp /etc/profile.d/java.sh node4:/etc/profile.d/
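A quick sanity check that each node picked up the Java environment (a minimal sketch; the exact java -version output varies by build):
. /etc/profile.d/java.sh
java -version      // should report an OpenJDK 1.8 runtime
echo $JAVA_HOME    // should print /usr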
vim /etc/profile.d/hadoop.sh
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
. /etc/profile.d/hadoop.sh
scp /etc/profile.d/hadoop.sh node2:/etc/profile.d/
scp /etc/profile.d/hadoop.sh node3:/etc/profile.d/
scp /etc/profile.d/hadoop.sh node4:/etc/profile.d/
(2) Modify the hosts file
vim /etc/hosts
172.16.100.67 node1.mt.com node1 master
172.16.100.68 node2.mt.com node2
172.16.100.69 node3.mt.com node3
172.16.100.70 node4.mt.com node4
scp the file to node2, node3 and node4
(3) Key-based ssh login for the hadoop user
useradd hadoop                        // node2, node3 and node4 also need a hadoop user
echo "hadoop" | passwd --stdin hadoop
useradd -g hadoop hadoop              // a single user is used for everything here; you could also create separate yarn and hdfs users
su - hadoop
ssh-keygen
for i in 2 3 4; do ssh-copy-id -i .ssh/id_rsa.pub hadoop@node${i}; done
Verify:
ssh node2 'date'
ssh node3 'date'
ssh node4 'date'
2. Install and deploy hadoop
(1) Unpack the archive
mkdir -pv /bdapps/ /data/hadoop/hdfs/{nn,snn,dn}   // the dn directory is not needed here, because the master node does not store data, so it may be left uncreated
chown -R hadoop:hadoop /data/hadoop/hdfs
tar xvf hadoop-2.6.2.tar.gz -C /bdapps/
cd /bdapps/
ln -sv hadoop-2.6.2 hadoop
cd hadoop
mkdir logs
chmod g+w logs
chown -R hadoop:hadoop ./*
(2) Modify the configuration files
1. core-site.xml
vim etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
        <final>true</final>
    </property>
</configuration>
// fs.defaultFS is the HDFS access URI and points clients at the NN; if "master" cannot be resolved, an IP address also works.
2. yarn-site.xml
vim etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
// :%s/localhost/master/g   // replace localhost with master in the sample file
// these properties point the NodeManagers at the ResourceManager
3. hdfs-site.xml
vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>
// dfs.replication is the number of copies kept of each block
4. mapred-site.xml
mapred-site.xml is the only file that is not modified here; the framework used is yarn by default.
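Stock Apache 2.6.x tarballs ship only mapred-site.xml.template; if you prefer the setting to be explicit rather than relying on the default, a minimal, optional sketch is:
cd /bdapps/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
// then add a single property between the <configuration> tags:
//     <name>mapreduce.framework.name</name>
//     <value>yarn</value>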
5. slaves
vim etc/hadoop/slaves
node2
node3
node4
// the hosts listed in slaves run the DataNode and NodeManager daemons
(3) Copy the configuration to the other nodes
After node2, node3 and node4 have completed their setup up to the chown -R hadoop:hadoop ./* step (section 4 below):
su - hadoop
scp /bdapps/hadoop/etc/hadoop/* node2:/bdapps/hadoop/etc/hadoop/
scp /bdapps/hadoop/etc/hadoop/* node3:/bdapps/hadoop/etc/hadoop/
scp /bdapps/hadoop/etc/hadoop/* node4:/bdapps/hadoop/etc/hadoop/
3. Format and start
su - hadoop
hdfs namenode -format
Output containing "/data/hadoop/hdfs/nn has been successfully formatted" indicates success.
There are two ways to start hadoop:
1. Start the required services individually on each node
(start the HDFS services as the hdfs user and the YARN services as the yarn user if you created separate users; a single hadoop user works as well)
Master node: the NameNode and ResourceManager services
su - hdfs -c 'hadoop-daemon.sh start namenode'
su - hdfs -c 'yarn-daemon.sh start resourcemanager'
Slave nodes: the DataNode and NodeManager services
su - hdfs -c 'hadoop-daemon.sh start datanode'
su - hdfs -c 'yarn-daemon.sh start nodemanager'
2. Start the entire cluster from the master node
su - hdfs -c 'start-dfs.sh'
su - hdfs -c 'start-yarn.sh'
Older versions used start-all.sh and stop-all.sh to control both hdfs and mapreduce.
Start the services:
su - hdfs -c 'start-dfs.sh'
su - hdfs -c 'stop-dfs.sh'   // stop hdfs
The script prints a message for each daemon it starts on node2, node3 and node4.
su - hdfs -c 'start-yarn.sh'
On master it starts the resourcemanager,
and on the slaves it starts the nodemanager.
Test:
node3: su - hadoop
jps   // you should see the DataNode and NodeManager processes
node1: su - hadoop
jps   // you should see the NameNode and SecondaryNameNode (and ResourceManager) processes
hdfs dfs -mkdir /test
hdfs dfs -put /etc/fstab /test/fstab
hdfs dfs -ls -R /test
hdfs dfs -cat /test/fstab
node3:
ls /data/hadoop/hdfs/dn/current/.../blk_...   // the blocks are stored here
Note: one of node2, node3 and node4 does not store the file, because dfs.replication is defined as 2.
vim etc/hadoop/hdfs-site.xml
    <name>dfs.replication</name>   // the number of block replicas
    <value>2</value>
View the web UI:
172.16.100.67:8088
The memory shows as 24 GB because there are three slave nodes and each node's physical memory is 8 GB.
172.16.100.67:50070
DataNodes: there are three.
A single file smaller than the block size is not split; files larger than the block size (64 MB here; Hadoop 2.x defaults to 128 MB) are split into blocks.
Compressed files can be uploaded directly and are stored in blocks as well.
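To see how a file was actually split and where its block replicas landed, fsck can report them (using the /test/fstab file uploaded above):
hdfs fsck /test/fstab -files -blocks -locations   // list the file's blocks, their replicas and the DataNodes holding them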
Run a job to test:
yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/functions /test/wc
hdfs dfs -cat /test/wc/part-r-00000
4. The other nodes
node2:
useradd hadoop
echo "hadoop" | passwd --stdin hadoop
mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}   // only the dn directory is actually used
chown -R hadoop:hadoop /data/hadoop/hdfs/
tar xvf hadoop-2.6.2.tar.gz -C /bdapps/
cd /bdapps/
ln -sv hadoop-2.6.2 hadoop
cd hadoop
mkdir logs
chmod g+w logs
chown -R hadoop:hadoop ./*
// once the configuration files have been modified on the master, they can be copied directly to node3 and node4, since the configuration is identical.
node3:
useradd hadoop
echo "hadoop" | passwd --stdin hadoop
mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}   // only the dn directory is actually used
chown -R hadoop:hadoop /data/hadoop/hdfs/
tar xvf hadoop-2.6.2.tar.gz -C /bdapps/
cd /bdapps/
ln -sv hadoop-2.6.2 hadoop
cd hadoop
mkdir logs
chmod g+w logs
chown -R hadoop:hadoop ./*
node4:
useradd hadoop
echo "hadoop" | passwd --stdin hadoop
mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}   // only the dn directory is actually used
chown -R hadoop:hadoop /data/hadoop/hdfs/
tar xvf hadoop-2.6.2.tar.gz -C /bdapps/
cd /bdapps/
ln -sv hadoop-2.6.2 hadoop
cd hadoop
mkdir logs
chmod g+w logs
chown -R hadoop:hadoop ./*
========================================
YARN cluster management commands
yarn [--config confdir] COMMAND
resourcemanager -format-state-store   // delete the RMStateStore
resourcemanager                       // run the ResourceManager
nodemanager                           // run the NodeManager on each slave
timelineserver                        // run the timeline server (application history/timeline)
rmadmin                               // ResourceManager administration
version
jar                                   // run a jar file
application                           // display application information; report on or kill an application
applicationattempt                    // reports related to application attempts
container                             // container-related information
node                                  // display node information
queue                                 // report queue information
logs                                  // dump container logs
classpath                             // print the classpath needed when java runs Hadoop programs
daemonlog                             // get/set the log level of a daemon
jar, application, node, logs, classpath and version are the commonly used user commands.
resourcemanager, nodemanager, proxyserver, rmadmin and daemonlog are the commonly used administrative commands.
yarn application [options]
-status ApplicationID   // status information for one application
yarn application -status application_1494685700454_0001
-list                   // list the applications on yarn
    -appTypes: MAPREDUCE, YARN
    -appStates: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED
yarn application -list -appStates ALL
-kill ApplicationID
yarn node [options]
-list                   // list the nodes
    -states: NEW, RUNNING, UNHEALTHY (unhealthy), DECOMMISSIONED (retired), LOST, REBOOTED
-status Node-ID         // display information about one node
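For example (both flags are standard yarn node options):
yarn node -list -states RUNNING   // show only the NodeManagers currently in the RUNNING state
yarn node -list -all              // include nodes in every state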
logs: display the logs of completed YARN applications (whose state is FAILED, KILLED or FINISHED).
To view logs from the command line, yarn-site.xml must set
the yarn.log-aggregation-enable property to true.
yarn logs -applicationId <applicationID> [options]
-applicationId applicationID   // required; used to fetch the application's details from the ResourceManager
-appOwner AppOwner             // optional; defaults to the current user
-nodeAddress NodeAddress -containerId containerId   // get information about the specified container on the specified node; NodeAddress has the same format as a NodeId
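For example, reusing the application id shown earlier (log aggregation must be enabled as noted above):
yarn logs -applicationId application_1494685700454_0001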
classpath:
yarn classpath   // print the classpath used when running java programs
Administrative commands:
rmadmin
nodemanager
timelineserver
rmadmin is a client program for the ResourceManager; it can refresh access control policies, scheduler queues, the nodes registered with the RM, and so on.
After a refresh the changes take effect without a restart.
yarn rmadmin [options]
-help
-refreshQueues: reloads the queues' ACLs, states and scheduler-specific properties; the scheduler is reinitialized from the configuration file.
-refreshNodes: refreshes the RM's host information by re-reading the include and exclude files to update the set of nodes the cluster should include or exclude.
-refreshUserToGroupMappings: refreshes the user-to-group mappings by updating the group cache according to the configured Hadoop security group mapping.
-refreshSuperUserGroupsConfiguration: refreshes the superuser proxy group mappings, updating the proxy hosts and proxy groups defined by the hadoop.proxyuser properties in core-site.xml.
-refreshAdminAcls: refreshes the RM's administrative ACL from the yarn.admin.acl property in yarn-site.xml or the default configuration.
-refreshServiceAcl: reloads the service-level authorization policy file; the RM re-reads the policy file, checks whether Hadoop security authorization is enabled, and refreshes the ACLs for the IPC server, ApplicationMaster, client and resource tracker.
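For example, after editing the include/exclude files or the scheduler queue configuration, the changes can be picked up without restarting the RM:
yarn rmadmin -refreshNodes    // re-read the include/exclude files
yarn rmadmin -refreshQueues   // reload the queue/scheduler configuration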
daemonlog: view or set the log level of a daemon.
It works against the daemon's HTTP interface: http://host:port/logLevel?log=name
yarn daemonlog [options] args
-getlevel host:port name         // display the log level of the named logger in the specified daemon
-setlevel host:port name level   // set the log level of the named logger in the specified daemon
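A hedged example: query the ResourceManager through its web port (8088 as configured above); the logger name here is only an illustration:
yarn daemonlog -getlevel master:8088 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager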
Running a YARN application
A YARN application can be a shell script, a MapReduce job, or any other type of job.
Steps:
1. Application initialization and submission        // done by the client
2. Allocate resources (memory) and start the AM     // done by the RM
3. AM registration and resource allocation          // done by the AM, running on a NodeManager
4. Start and monitor containers                     // the AM asks the NMs to launch them; the NMs report to the RM
5. Application progress reporting                   // done by the AM
6. Application completion
Using Ambari to deploy a Hadoop cluster
https://www.ibm.com/developerworks/cn/opensource/os-cn-bigdata-ambari/
https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.5.0
IBM official technical forum: https://www.ibm.com/developerworks/cn/opensource/
Ambari 2.2.2 download resources
http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0
http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo
http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari-2.2.2.0-centos7.tar.gz
HDP 2.4.2 download resources
http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0
http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/hdp.repo
http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/HDP-2.4.2.0-centos7-rpm.tar.gz
http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7
http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7/HDP-UTILS-1.1.0.20-centos7.tar.gz