This article introduces how to deploy a hadoop 2.7.3 + HA + YARN + zookeeper high-availability cluster. These are steps that people often struggle with in practice, so follow along and learn how to handle each situation. I hope you read it carefully and get something out of it!
1. Installation versions
JDK: 1.8.0_111-b14
hadoop: hadoop-2.7.3
zookeeper: zookeeper-3.5.2
2. Installation steps:
The installation of the JDK and the cluster's prerequisite environment configuration are not covered here.
1. Hadoop configuration
Hadoop configuration mainly involves four files: hdfs-site.xml, core-site.xml, mapred-site.xml, and yarn-site.xml. The configuration of each file is described in detail below.
Configuration of core-site.xml:
fs.defaultFS = hdfs://cluster1 — the logical name of the HDFS namenode, that is, the namenode HA nameservice; this value corresponds to dfs.nameservices in hdfs-site.xml.
hadoop.tmp.dir = /usr/hadoop/tmp — the default placement path for namenode and datanode data in hdfs; it can also be specified in hdfs-site.xml.
ha.zookeeper.quorum = master:2181,slave1:2181,slave2:2181 — the addresses and ports of the zookeeper cluster; the number of nodes in a zookeeper cluster must be odd.
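For reference, these properties might be written in core-site.xml roughly as follows (a minimal sketch assembled from the values above; adjust hostnames and paths to your own environment):

<configuration>
  <!-- sketch: values taken from the property list above -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>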
Configuration of hdfs-site.xml (key configuration):
dfs.name.dir = /usr/hadoop/hdfs/name — namenode data placement directory.
dfs.data.dir = /usr/hadoop/hdfs/data — datanode data placement directory.
dfs.replication = 4 — number of data block replicas; the default is 3.
dfs.nameservices = cluster1 — the logical name of the HDFS nameservice, that is, namenode HA.
dfs.ha.namenodes.cluster1 = ns1,ns2 — the namenode logical names under the nameservice.
dfs.namenode.rpc-address.cluster1.ns1 = master:9000 — the rpc address and port of namenode ns1.
dfs.namenode.http-address.cluster1.ns1 = master:50070 — the web address and port of namenode ns1.
dfs.namenode.rpc-address.cluster1.ns2 = slave1:9000 — the rpc address and port of namenode ns2.
dfs.namenode.http-address.cluster1.ns2 = slave1:50070 — the web address and port of namenode ns2.
dfs.namenode.shared.edits.dir = qjournal://master:8485;slave1:8485;slave2:8485/cluster1 — the URI of the JournalNode (JNs) group that the NameNodes read and write. The active NN writes the editlog to these JournalNodes, while the standby NameNode reads the editlog and applies it to the directory tree in its own memory.
dfs.journalnode.edits.dir = /usr/hadoop/journal — a directory on the node where the JournalNode runs, used to store the editlog and other state information.
dfs.ha.automatic-failover.enabled = true — enables automatic failover. Automatic failover relies on the zookeeper cluster and the ZKFailoverController (ZKFC), a zookeeper client that monitors NN status; every node running a NN must also run a zkfc.
dfs.client.failover.proxy.provider.cluster1 = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider — the java class the HDFS client uses to connect to the active NameNode.
dfs.ha.fencing.methods = sshfence — solves the HA cluster split-brain problem (two masters providing service at the same time, leaving the system in an inconsistent state). In HDFS HA the JournalNodes only allow one NameNode to write data, so two active NameNodes cannot occur; however, during a failover the previously active NameNode may still be handling client RPC requests, so a fencing mechanism is needed to kill the previously active NameNode. The commonly used fence method is sshfence, which requires the ssh private key specified below.
dfs.ha.fencing.ssh.private-key-files = /home/hadoop/.ssh/id_rsa — the private key used for ssh communication by sshfence.
dfs.ha.fencing.ssh.connect-timeout = 30000 — the ssh connection timeout.
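As an illustration, the HA-related subset of these properties could look roughly like this in hdfs-site.xml (a partial sketch using the values above; the remaining properties follow the same <property>/<name>/<value> pattern):

<configuration>
  <!-- sketch: HA-related properties only, values from the list above -->
  <property>
    <name>dfs.nameservices</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster1</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.ns1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.ns2</name>
    <value>slave1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave1:8485;slave2:8485/cluster1</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
</configuration>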
Configuration of mapred-site.xml:
mapreduce.framework.name = yarn — specifies that mapreduce runs on yarn, which is completely different from hadoop1.
mapreduce.jobhistory.address = master:10020 — the address of the MR JobHistory Server that manages the logs of completed jobs.
mapreduce.jobhistory.webapp.address = master:19888 — the web address for viewing the Mapreduce job records kept by the history server; this service must be started for the records to be visible.
mapreduce.jobhistory.done-dir = /data/hadoop/done — where the logs managed by the MR JobHistory Server are stored. Default: /mr-history/done.
mapreduce.jobhistory.intermediate-done-dir = hdfs://mycluster-pha/mapred/tmp — where the logs generated by MapReduce jobs are stored. Default: /mr-history/tmp.
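A sketch of how mapred-site.xml might look with the values above (the jobhistory entries only matter if you run the JobHistory Server):

<configuration>
  <!-- sketch: values taken from the property list above -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/data/hadoop/done</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>hdfs://mycluster-pha/mapred/tmp</value>
  </property>
</configuration>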
Configuration of yarn-site.xml:
yarn.nodemanager.aux-services = mapreduce_shuffle — default.
yarn.nodemanager.auxservices.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.resourcemanager.address = master:8032
yarn.resourcemanager.scheduler.address = master:8030
yarn.resourcemanager.resource-tracker.address = master:8031
yarn.resourcemanager.admin.address = master:8033
yarn.resourcemanager.webapp.address = master:8088
yarn.nodemanager.resource.memory-mb = 1024 — if this value is configured below 1024 the NM cannot start and reports: NodeManager from slavenode2 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
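For illustration, a partial yarn-site.xml sketch with a few representative values from above (the remaining resourcemanager addresses follow the same pattern):

<configuration>
  <!-- sketch: representative properties only, values from the list above -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
  </property>
</configuration>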
2. zookeeper configuration
The zookeeper configuration mainly involves the zoo.cfg and myid files.
conf/zoo.cfg configuration: first copy zoo_sample.cfg to zoo.cfg: cp zoo_sample.cfg zoo.cfg
vi zoo.cfg (dataDir: the placement path of the data; dataLogDir: the placement path of the logs)
initLimit=10
syncLimit=5
clientPort=2181
tickTime=2000
dataDir=/usr/zookeeper/tmp/data
dataLogDir=/usr/zookeeper/tmp/log
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Create a new file named myid in the dataDir directory on each of the master, slave1, and slave2 nodes: vi myid
On the master node, write: 1
On the slave1 node, write: 2
On the slave2 node, write: 3
As follows:
[hadoop@master data]$ vi myid
3. Start the cluster
1. zookeeper cluster startup
1. Start the zookeeper cluster; on all three nodes run: bin/zkServer.sh start
2. Check the zookeeper cluster status with zkServer.sh status; there should be one leader and two followers.
[hadoop@master hadoop-2.7.3]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.5.2-alpha/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[hadoop@slave1 root]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.5.2-alpha/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[hadoop@slave2 root]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.5.2-alpha/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
3. Verify zookeeper (optional): run zkCli.sh and confirm that the client connects.
[hadoop@slave1 root]$ zkCli.sh
Connecting to localhost:2181
... (client environment and connection log omitted) ...
Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x200220f5fe30060, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
2. hadoop cluster startup
1. First-time configuration startup
1.1 Start the JournalNode daemons on all three nodes; after that, jps should show the JournalNode process.
sbin/hadoop-daemon.sh start journalnode
jps
JournalNode
1.2 Format the namenode on master (either namenode host), and then start the namenode on that node.
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
1.3 On the other namenode node, slave1, synchronize the metadata from master.
bin/hdfs namenode -bootstrapStandby
1.4 Stop all HDFS services.
sbin/stop-dfs.sh
1.5 Initialize zkfc.
bin/hdfs zkfc -formatZK
1.6 Start hdfs.
sbin/start-dfs.sh
1.7 Start yarn.
sbin/start-yarn.sh
2. Startup when it is not the first time
2.1 Just start hdfs and yarn directly; namenode, datanode, journalnode, and DFSZKFailoverController will all start automatically.
sbin/start-dfs.sh
2.2 Start yarn.
sbin/start-yarn.sh
4. Check the processes on each node
4.1 master
[hadoop@master hadoop-2.7.3]$ jps
26544 QuorumPeerMain
25509 JournalNode
25704 DFSZKFailoverController
26360 Jps
25306 DataNode
25195 NameNode
25886 ResourceManager
25999 NodeManager
4.2 slave1
[hadoop@slave1 root]$ jps
2289 DFSZKFailoverController
9400 QuorumPeerMain
2601 Jps
2060 DataNode
2413 NodeManager
2159 JournalNode
1983 NameNode
4.3 slave2
[hadoop@slave2 root]$ jps
11984 DataNode
12370 Jps
2514 QuorumPeerMain
12083 JournalNode
12188 NodeManager
That is how to deploy a hadoop 2.7.3 + HA + YARN + zookeeper high-availability cluster. Thank you for reading. If you want to learn more, keep following the site; more practical articles will be published for you!