Because a Spark environment needs to be deployed, a Hadoop cluster was reinstalled for testing. The steps are recorded as follows:
Hardware environment: four virtual machines, hadoop1~hadoop4, each with 3 GB of memory, a 60 GB hard disk, and a 2-core CPU
Software environment: CentOS 6.5, hadoop-2.6.0-cdh5.8.2, JDK 1.7
Deployment planning:
hadoop1 (192.168.0.3): namenode (active), resourcemanager
hadoop2 (192.168.0.4): namenode (standby), journalnode, datanode, nodemanager, historyserver
hadoop3 (192.168.0.5): journalnode, datanode, nodemanager
hadoop4 (192.168.0.6): journalnode, datanode, nodemanager
HDFS HA uses the QJM (journalnode) approach.
I. System preparation
1. Disable SELinux on each machine
# vi /etc/selinux/config
SELINUX=disabled
2. Turn off the firewall on each machine (be sure to turn it off; otherwise formatting HDFS will fail with errors about not being able to connect to the journalnodes)
# chkconfig iptables off
# service iptables stop
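A quick way to confirm both settings took effect (a sketch; these are standard CentOS 6 commands, and getenforce only reports Disabled after a reboot):
# getenforce
# service iptables status
# chkconfig --list iptables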
3. Install jdk1.7 on each machine
# cd /software
# tar -zxf jdk-7u65-linux-x64.gz -C /opt/
# cd /opt
# ln -s jdk1.7.0_65 java
# vi /etc/profile
export JAVA_HOME=/opt/java
export PATH=$PATH:$JAVA_HOME/bin
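To confirm each machine picks up the new JDK (a minimal check, assuming the symlink and profile entries above):
# source /etc/profile
# java -version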
4. Create the Hadoop user on each machine and set up SSH mutual trust
# useradd grid
# passwd grid
(the detailed steps for setting up mutual trust are omitted; a sketch follows below)
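One common way to set up passwordless SSH for grid is sketched below; run it as grid on every node (ssh-copy-id is assumed to be available):
$ ssh-keygen -t rsa
$ ssh-copy-id grid@hadoop1
$ ssh-copy-id grid@hadoop2
$ ssh-copy-id grid@hadoop3
$ ssh-copy-id grid@hadoop4
$ ssh hadoop2 date
The last command should return without prompting for a password.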
5. Create the required directories on each machine
# mkdir -p /hadoop_data/hdfs/name
# mkdir -p /hadoop_data/hdfs/data
# mkdir -p /hadoop_data/hdfs/journal
# mkdir -p /hadoop_data/yarn/local
# chown -R grid:grid /hadoop_data
II. Hadoop deployment
The main point of configuring HDFS HA is to specify the nameservice (without HDFS federation there is only one nameservice ID), the two namenodes under that nameservice ID, and their addresses. The nameservice name here is set to hadoop-spark.
1. Extract the Hadoop package on each machine
# cd /software
# tar -zxf hadoop-2.6.0-cdh5.8.2.tar.gz -C /opt/
# cd /opt
# chown -R grid:grid hadoop-2.6.0-cdh5.8.2
# ln -s hadoop-2.6.0-cdh5.8.2 hadoop
2. Switch to the grid user for the remaining steps
# su - grid
$ cd /opt/hadoop/etc/hadoop
3. Configure hadoop-env.sh (actually only configure JAVA_HOME)
$ vi hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/opt/java
4. Set hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-spark</value>
    <description>Comma-separated list of nameservices.</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.hadoop-spark</name>
    <value>nn1,nn2</value>
    <description>Comma-separated list of namenodes for the given nameservice.</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop-spark.nn1</name>
    <value>hadoop1:8020</value>
    <description>RPC address for namenode1 of hadoop-spark.</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop-spark.nn2</name>
    <value>hadoop2:8020</value>
    <description>RPC address for namenode2 of hadoop-spark.</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop-spark.nn1</name>
    <value>hadoop1:50070</value>
    <description>Address and base port on which the namenode1 web UI listens.</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop-spark.nn2</name>
    <value>hadoop2:50070</value>
    <description>Address and base port on which the namenode2 web UI listens.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///hadoop_data/hdfs/name</value>
    <description>Where on the local filesystem the namenode stores the name table (fsimage). A comma-delimited list of directories replicates the name table to all of them for redundancy.</description>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop2:8485;hadoop3:8485;hadoop4:8485/hadoop-spark</value>
    <description>Shared edits directory written by the active namenode and read by the standby to keep the namespaces synchronized. It should be left empty in a non-HA cluster.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///hadoop_data/hdfs/data</value>
    <description>Where on the local filesystem a datanode stores its blocks. A comma-delimited list stores data in all named directories, typically on different devices; directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop-spark</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>false</value>
    <description>Whether automatic failover is enabled. See the HDFS high availability documentation for details.</description>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop_data/hdfs/journal</value>
  </property>
</configuration>
5. Configure core-site.xml (set fs.defaultFS to the HA nameservice name)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-spark</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
  </property>
</configuration>
6. Configure mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop2:10020</value>
    <description>MapReduce JobHistory Server IPC host:port.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop2:19888</value>
    <description>MapReduce JobHistory Server web UI host:port.</description>
  </property>
</configuration>
7. Configure yarn-site.xml
<configuration>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <description>Fair scheduler configuration file location.</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An application's localized file directory is found under ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}; individual containers' work directories (container_${contid}) are subdirectories of this.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop_data/yarn/local</value>
  </property>
  <property>
    <description>Whether to enable log aggregation.</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <description>Number of CPU cores that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <description>The valid service name should only contain a-zA-Z0-9_ and cannot start with numbers.</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
8. Configure slaves
hadoop2
hadoop3
hadoop4
9. Configure fairscheduler.xml
<?xml version="1.0"?>
<allocations>
  <!-- the queue name below is illustrative; the flattened source preserved only the values -->
  <queue name="default">
    <minResources>0 mb, 0 vcores</minResources>
    <maxResources>6144 mb, 6 vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>grid</aclSubmitApps>
  </queue>
</allocations>
10. Synchronize configuration files to each node
$ cd /opt/hadoop/etc
$ scp -r hadoop hadoop2:/opt/hadoop/etc/
$ scp -r hadoop hadoop3:/opt/hadoop/etc/
$ scp -r hadoop hadoop4:/opt/hadoop/etc/
III. Start the cluster (format the file system)
1. Set up environment variables
$ vi ~/.bash_profile
export HADOOP_HOME=/opt/hadoop
export YARN_HOME_DIR=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop/etc/hadoop
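Reload the profile and make sure the Hadoop binaries resolve (a quick sanity check):
$ source ~/.bash_profile
$ $HADOOP_HOME/bin/hadoop version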
2. Start HDFS
Start the journalnodes first. On hadoop2~hadoop4:
$ cd /opt/hadoop/
$ sbin/hadoop-daemon.sh start journalnode
Format HDFS and start the first namenode. On hadoop1:
$ bin/hdfs namenode -format
$ sbin/hadoop-daemon.sh start namenode
Synchronize the metadata to the other namenode and start it. On hadoop2:
$ bin/hdfs namenode -bootstrapStandby
$ sbin/hadoop-daemon.sh start namenode
At this point both namenodes are in standby state, so switch hadoop1 to active (hadoop1 corresponds to nn1 in hdfs-site.xml):
$ bin/hdfs haadmin -transitionToActive nn1
Start the datanodes. On hadoop1 (the active namenode):
$ sbin/hadoop-daemons.sh start datanode
Note: after this initial setup, sbin/start-dfs.sh is enough to start HDFS. However, since ZooKeeper-based automatic failover is not configured, HA can only be switched manually, so every time HDFS is started you must run $ bin/hdfs haadmin -transitionToActive nn1 to make the namenode on hadoop1 active.
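A few commands to verify the HDFS state after start-up (a sketch, run from /opt/hadoop on hadoop1):
$ bin/hdfs haadmin -getServiceState nn1
$ bin/hdfs haadmin -getServiceState nn2
$ bin/hdfs dfsadmin -report
nn1 should report active, nn2 standby, and the report should list three live datanodes (hadoop2~hadoop4).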
3. Start YARN
On hadoop1 (the resourcemanager):
$ sbin/start-yarn.sh
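The deployment plan above also places the MapReduce historyserver on hadoop2; it can be started there and the cluster checked roughly as follows (a sketch):
On hadoop2:
$ sbin/mr-jobhistory-daemon.sh start historyserver
Back on hadoop1:
$ bin/yarn node -list
The node list should show hadoop2~hadoop4 as RUNNING nodemanagers.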
-
The HDFS HA configured above does not fail over automatically. To configure automatic failover for HDFS, add the following steps (stop the cluster first):
1. Deploy ZooKeeper on hadoop2, hadoop3 and hadoop4 and start it (steps omitted)
2. Add to hdfs-site.xml:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
For a detailed explanation, see the official documentation. This configuration sets the fencing method to SSH into the previously active namenode and kill the process listening on its service port; it only requires that the two namenodes can SSH to each other without a password.
There is another way to configure fencing:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>
This configuration uses a shell script to fence the previously active node. If you do not want any real fencing action to be taken, dfs.ha.fencing.methods can be set to shell(/bin/true).
3. Add to core-site.xml:
<property>
  <name>ha.zookeeper.quorum</name>
  <value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>
4. Initialize ZKFC (execute on a namenode)
$ bin/hdfs zkfc -formatZK
5. Start the cluster
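With automatic failover enabled, a start-up sketch (assuming ZooKeeper is already running on hadoop2~hadoop4) looks like this; start-dfs.sh now also brings up the journalnodes and ZKFC daemons, and one namenode is elected active without a manual transition:
On hadoop1:
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
$ bin/hdfs haadmin -getServiceState nn1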
-
zkfc: runs on each namenode as a ZooKeeper client and is responsible for automatic failover
zk: an odd number of nodes; maintains consistency locks and elects the active node
journalnode: an odd number of nodes used for edit-log synchronization between the active and standby namenodes; the active namenode writes to them and the standby reads from them
-
Changing to resourcemanager HA:
Choose hadoop2 as the second RM node
1. Set up mutual trust between hadoop2 and the other nodes
2. Edit yarn-site.xml and synchronize it to the other machines
3. Copy fairscheduler.xml to hadoop2
4. Start the RM
5. Start the other RM (a start/verify sketch follows below)
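A start/verify sketch for the two resourcemanagers, assuming the RM IDs in the edited yarn-site.xml are rm1 (hadoop1) and rm2 (hadoop2); adjust to the IDs you actually configure:
On hadoop1:
$ sbin/start-yarn.sh
On hadoop2 (start-yarn.sh does not start the second RM):
$ sbin/yarn-daemon.sh start resourcemanager
On either node:
$ bin/yarn rmadmin -getServiceState rm1
$ bin/yarn rmadmin -getServiceState rm2
One should report active and the other standby.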