Shulou(Shulou.com)06/03 Report--
Hadoop HA
Data types
Structured data: relational databases (RDBMS).
Unstructured data: processed with the help of algorithms, e.g. Google PageRank.
Semi-structured data: processed through tags, e.g. XML and JSON.
In general, parallel processing improves data-processing efficiency, but loading massive data from a single machine is very slow. With a distributed file system, each node only needs to load from its local disk, so loading is fast.
MapReduce jobs for a Hadoop cluster must be written by the user.
In a Hadoop HA cluster, two different machines usually act as NameNode. Only one machine is Active at any moment; the other is Standby.
This article covers two highly available services: HDFS and YARN.
In this document, server1 and server5 act as NameNode (NN) and ResourceManager (RM); server2, server3 and server4 act as DataNode (DN) and ZooKeeper (ZK) nodes, and also run the JournalNodes.
This document starts with zookeeper. For basic configuration, please refer to http://12237658.blog.51cto.com.
Errors encountered in this lab
1. Allocate enough memory: with 5 virtual machines running at once, server1 and server5 should each have at least 768 MB.
2. Before the first format, the JournalNode processes must be started with sbin/hadoop-daemon.sh start journalnode:
[hadoop@server2/3/4 hadoop]$ sbin/hadoop-daemon.sh start journalnode
3. After a reboot, ZooKeeper may fail to start the service; this is usually caused by a wrong myid or an occupied port.
4. The NameNode or DataNode process may be missing after starting the HDFS service. In that case, check the log files and fix whatever they report.
5. Avoid restarting the services repeatedly, because it can leave processes in an inconsistent state; for example, the real ZooKeeper process id may no longer match the one recorded in its pid file:
[hadoop@server3 zookeeper]$ cat zookeeper_server.pid
1236
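Error 5 above can be detected mechanically. A minimal sketch, using the pid value and file path from the transcript above (availability of `ps -p` is assumed):

```shell
# Recreate the situation from error 5: a pid file left behind by ZooKeeper.
pidfile=/tmp/zookeeper/zookeeper_server.pid
mkdir -p /tmp/zookeeper
echo "1236" > "$pidfile"              # value taken from the example above

recorded=$(cat "$pidfile")
# Check whether the recorded pid still belongs to a live process.
if ps -p "$recorded" > /dev/null 2>&1; then
  echo "pid $recorded is alive"
else
  echo "stale pid file: $recorded is not running"
fi
```

If the pid file is stale, remove it before restarting ZooKeeper.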
Startup of ZooKeeper
Download the tar package and extract it into hadoop's home directory.
[hadoop@server3 ~]$ ls
hadoop  java  zookeeper  zookeeper-3.4.9.tar.gz
hadoop-2.7.3  jdk1.7.0_79  zookeeper-3.4.9
Modify the configuration file:
[hadoop@server3 conf]$ cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# (Every time value in ZK is expressed in this unit; heartbeats and timeouts
# are derived from it, in milliseconds.)
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# (During startup, a Follower must synchronize all the latest data from the
# Leader within this limit before it starts serving clients.)
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# (During operation, the Leader checks the liveness of every machine in the
# ZK cluster, e.g. via heartbeats, within this limit.)
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# (Snapshot files are stored here; by default transaction logs are too. It is
# recommended to also set dataLogDir, because transaction-log write performance
# directly affects ZK performance. In production this should be a stable
# directory, not /tmp, which is cleared periodically.)
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=172.25.33.2:2888:3888
server.2=172.25.33.3:2888:3888
server.3=172.25.33.4:2888:3888
The number after server. must match the myid of that node; the myid file lives under dataDir:
echo "1" > /tmp/zookeeper/myid    # on server.1; use 2 and 3 on the other nodes
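The per-node myid bookkeeping can be scripted. A minimal sketch, assuming dataDir=/tmp/zookeeper as configured above (run once per node with that node's id):

```shell
# Write this node's id into dataDir/myid.
id=2                       # 1 on server.1, 2 on server.2, 3 on server.3
mkdir -p /tmp/zookeeper
echo "$id" > /tmp/zookeeper/myid
cat /tmp/zookeeper/myid    # confirm what was written
```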
Start ZooKeeper on servers 2, 3 and 4:
[hadoop@server3 zookeeper]$ echo "2" > /tmp/zookeeper/myid
[hadoop@server3 zookeeper]$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server3 zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Configure HDFS for hadoop.
[hadoop@server1 hadoop]$ cat core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://masters</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>172.25.33.2:2181,172.25.33.3:2181,172.25.33.4:2181</value>
  </property>
</configuration>
[hadoop@server1 hadoop]$ cat hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>masters</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.masters</name>
    <value>h2,h3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.masters.h2</name>
    <value>172.25.33.1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.masters.h2</name>
    <value>172.25.33.1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.masters.h3</name>
    <value>172.25.33.5:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.masters.h3</name>
    <value>172.25.33.5:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://172.25.33.2:8485;172.25.33.3:8485;172.25.33.4:8485/masters</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/tmp/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.masters</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
[hadoop@server1 hadoop]$ cat slaves
172.25.33.3
172.25.33.4
172.25.33.2
At this point the highly available configuration of HDFS is complete.
The JournalNode service on server2, server3 and server4 must be started before formatting HDFS, because formatting communicates with port 8485 on those three nodes.
[hadoop@server2 hadoop]$ sbin/hadoop-daemon.sh start journalnode
Starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server2.example.com.out
[hadoop@server2 hadoop]$ jps
2629 JournalNode
2398 QuorumPeerMain
2667 Jps
When the JournalNode process shows up in jps on all three nodes, HDFS can be formatted.
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop/ 172.25.33.5:/tmp
fsimage_0000000000000000000        100%  353   0.3KB/s   00:00
fsimage_0000000000000000000.md5    100%   62   0.1KB/s   00:00
seen_txid                          100%    2   0.0KB/s   00:00
VERSION
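After copying the metadata to the standby, a checksum file can confirm the fsimage arrived intact. A sketch with stand-in file contents (the real files live under /tmp/hadoop-hadoop on 172.25.33.5, and Hadoop's own .md5 format may differ from plain md5sum output):

```shell
# Create a stand-in fsimage plus checksum, then verify it, the same way one
# would verify the real copy on the standby node.
tmp=$(mktemp -d)
echo "fsimage-bytes" > "$tmp/fsimage_0000000000000000000"
cd "$tmp"
md5sum fsimage_0000000000000000000 > fsimage_0000000000000000000.md5
md5sum -c fsimage_0000000000000000000.md5
```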
Format ZooKeeper (execute only on h2):
[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK
Then the service can be started.
Password-less ssh must be in place at this point, so set up ssh-keygen now. It is sometimes also necessary to ssh to each node once to accept the host-key prompt (answer yes).
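A minimal sketch of the key setup. The key is generated into a temporary directory here so a real ~/.ssh/id_rsa is not overwritten, and the ssh-copy-id line is illustrative (host addresses taken from this document's topology):

```shell
# Generate an unencrypted RSA key pair (illustration only: a temp dir, not ~/.ssh).
tmp=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$tmp/id_rsa" -q
ls "$tmp"
# On the real cluster you would instead use ~/.ssh/id_rsa and then:
#   ssh-copy-id hadoop@172.25.33.2   # repeat for every node, answering "yes" once
```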
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Check with jps on server1 and server5:
[hadoop@server1/5 hadoop]$ jps
11212 DFSZKFailoverController
11482 Jps
10883 NameNode
Access through port 50070 of the web page is normal:
http://172.25.33.1:50070
To test failover, kill the active NameNode on server1:
[hadoop@server1 hadoop]$ jps
11212 DFSZKFailoverController
11482 Jps
10883 NameNode
[hadoop@server1 hadoop]$ kill -9 10883
The standby takes over, and the web page is now served at:
http://172.25.33.5:50070
This verifies that HDFS high availability works. Next, configure high availability for YARN.
Apache Hadoop YARN (Yet Another Resource Negotiator) is the Hadoop resource manager: a general resource-management system that provides unified resource management and scheduling for upper-level applications. Its introduction greatly improves cluster utilization, unified resource management and data sharing.
[hadoop@server1 hadoop]$ cat mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
[hadoop@server1 hadoop]$ cat yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>RM_CLUSTER</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>172.25.33.1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>172.25.33.5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>172.25.33.2:2181,172.25.33.3:2181,172.25.33.4:2181</value>
  </property>
</configuration>
Start the YARN service once the configuration files are complete.
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ jps
13244 ResourceManager
11212 DFSZKFailoverController
13330 Jps
12814 NameNode
[hadoop@server5 ~]$ jps
1598 NameNode
1904 Jps
1695 DFSZKFailoverController
The RM on the other node is not started automatically and must be started manually:
[hadoop@server5 hadoop]$ sbin/yarn-daemon.sh start resourcemanager
Starting resourcemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-server5.example.com.out
[hadoop@server5 hadoop]$ jps
1598 NameNode
2018 Jps
1695 DFSZKFailoverController
1979 ResourceManager
http://172.25.33.1:8088
Failover test:
[hadoop@server1 hadoop]$ jps
13244 ResourceManager
11212 DFSZKFailoverController
12814 NameNode
13923 Jps
[hadoop@server1 hadoop]$ kill -9 13244
http://172.25.33.5:8088
Use the ZooKeeper command line for testing:
[hadoop@server3 zookeeper]$ bin/zkCli.sh -server 127.0.0.1:2181
[zk: 127.0.0.1:2181(CONNECTED) 0] get /yarn-leader-election/RM_CLUSTER/ActiveBreadCrumb
RM_CLUSTERrm2
cZxid = 0x100000020
ctime = Tue Mar 07 22:05:16 CST 2017
mZxid = 0x100000053
mtime = Tue Mar 07 22:09:58 CST 2017
pZxid = 0x100000020
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 17
numChildren = 0
[zk: 127.0.0.1:2181(CONNECTED) 1] quit
Quitting...
After the RM on server5 is killed, the active service switches back to rm1:
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /yarn-leader-election/RM_CLUSTER
[ActiveBreadCrumb, ActiveStandbyElectorLock]
[zk: 127.0.0.1:2181(CONNECTED) 1] get /yarn-leader-election/RM_CLUSTER/ActiveBreadCrumb
RM_CLUSTERrm1
cZxid = 0x100000020
ctime = Tue Mar 07 22:05:16 CST 2017
mZxid = 0x10000008b
mtime = Tue Mar 07 22:15:14 CST 2017
pZxid = 0x100000020
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 17
numChildren = 0
[zk: 127.0.0.1:2181(CONNECTED) 2]
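The first line printed by get above concatenates the cluster-id and the active rm-id ("RM_CLUSTERrm1"), so the active RM can be extracted with plain shell string handling. A sketch, assuming that printed value is captured in a variable:

```shell
# Stand-in for the first line printed by `get .../ActiveBreadCrumb`.
data="RM_CLUSTERrm1"
# Strip the cluster-id prefix to leave the active rm-id.
active="${data#RM_CLUSTER}"
echo "$active"    # rm1
```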
The successful switch indicates that YARN high availability is complete.