HA of Hadoop




1. Why build HA?

Before Hadoop 2.x, the NameNode was a single point of failure (SPOF: Single Point of Failure) in an HDFS cluster. In a cluster with only one NameNode, if the NameNode machine fails (for example, it goes down, or needs a software or hardware upgrade), the entire cluster becomes unavailable until the NameNode is restarted. This is absolutely not allowed in a production environment.

HA of HDFS solves the above problem by configuring two Active/Standby NameNodes, providing a hot backup of the NameNode within the cluster. When a failure occurs, such as a machine crash, or when a machine needs to be taken down for upgrades and maintenance, the NameNode role can be switched quickly to the other machine.

2. How does HA work?

Explanation (data consistency and persistence issues):

When the NameNode starts and HDFS clients connect to it, the edits log (the record of namespace operations) is written to the JN (JournalNode) cluster. A write is considered successful once more than half of the JN cluster acknowledges it back to the NameNode. The standby NameNode then retrieves the metadata from the JN cluster, merges it into a new fsimage, and pushes that back to the active NameNode. When the standby NameNode requests data from the JNs, it first checks whether any JN machines are down, again applying the more-than-half mechanism (when some servers in the cluster are down, the cluster treats the majority of available machines as primary and takes the rest out of service) before the JNs serve data to the standby. For block information, each DataNode reports its blocks to both NameNodes, so the standby always has current block locations.
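For a concrete sense of the quorum, the JN cluster in this article is addressed by a single qjournal URI (as configured in hdfs-site.xml in section 3):

qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/myha01

With three JournalNodes, an edits write commits once two of them (a majority) acknowledge it, so the cluster tolerates the loss of any single JournalNode.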

Explanation (solving the problem of switching between active and standby): this requires another cluster, ZooKeeper, which is itself a highly available cluster. Its mechanism: one server provides service to the outside while the other machines stand by; when the host goes down, the ZooKeeper cluster elects a new master by voting (based on logical clock, server ID, and how up-to-date each server's data is). On top of this, a ZKFC (failover controller) process holds the ZooKeeper cluster with one hand and a NameNode with the other. The ZKFC on the active NameNode controls that service, monitors its state, and reports it to the ZooKeeper cluster in real time. The ZKFC on the standby NameNode receives notifications from the ZooKeeper cluster; if a notification says the active NameNode's service has stopped, it immediately invokes its callback to make the standby NameNode the new active and keep the service running. The role of ZKFC is similar to keepalived. The difference when using ZooKeeper: if a ZKFC process itself exits abnormally, the standby would be switched to active through the ZooKeeper cluster while the old host may still be active, resulting in two active NameNodes. ZooKeeper solves this as follows: when a ZKFC process exits abnormally, an "invisible hand" (the surviving ZKFC) connects to the other NameNode and, within a controllable window, fences the NameNode it cannot control into standby while making its own NameNode active.
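Once the cluster below is running, this coordination can be observed in ZooKeeper itself. A minimal sketch, assuming the myha01 nameservice and Hadoop's default HA znode layout:

[hadoop@hadoop01 ~]$ zkCli.sh -server hadoop01:2181
[zk: hadoop01:2181(CONNECTED) 0] ls /hadoop-ha/myha01
[ActiveBreadCrumb, ActiveStandbyElectorLock]

ActiveStandbyElectorLock is the ephemeral lock the two ZKFCs compete for: whichever ZKFC holds it keeps its NameNode in the active state, and if that ZKFC's session dies, the lock is released and the other ZKFC wins the next election.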

3. How to build a HA cluster?

Preparation before setting up a cluster: https://blog.51cto.com/14048416/2341450

Building the ZooKeeper cluster: https://blog.51cto.com/14048416/2336178

1) Cluster planning
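A layout consistent with the configuration files below (a sketch inferred from the nn1/nn2, rm1/rm2, JournalNode, and slaves settings, not the original planning table):

hadoop01: NameNode (nn1), ZKFC, ResourceManager (rm1), DataNode, NodeManager, JournalNode, ZooKeeper
hadoop02: NameNode (nn2), ZKFC, ResourceManager (rm2), DataNode, NodeManager, JournalNode, ZooKeeper
hadoop03: DataNode, NodeManager, JournalNode, ZooKeeper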

2) Specific installation steps:

1) Upload the installation package hadoop-2.6.5-centos-6.7.tar.gz

2) Decompress it to the corresponding installation directory

[hadoop@hadoop01]$ tar -zxvf hadoop-2.6.5-centos-6.7.tar.gz -C /home/hadoop/apps/

3) Modify the configuration files

hadoop-env.sh:

Add: export JAVA_HOME=/usr/local/jdk1.8.0_73

core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myha01/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data/hadoopdata/</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>myha01</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myha01</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myha01.nn1</name>
    <value>hadoop01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myha01.nn1</name>
    <value>hadoop01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myha01.nn2</name>
    <value>hadoop02:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myha01.nn2</name>
    <value>hadoop02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/myha01</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/data/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.myha01</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop02:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop02:19888</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>

slaves:

hadoop01
hadoop02
hadoop03

4) Distribute the installation package to the other machines

[hadoop@hadoop01 apps]$ scp -r hadoop-2.6.5 hadoop@hadoop02:$PWD

[hadoop@hadoop01 apps]$ scp -r hadoop-2.6.5 hadoop@hadoop03:$PWD

5) Configure the environment variables on each machine

[hadoop@hadoop01 apps]$ vi ~/.bashrc

Add two lines:

export HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

[hadoop@hadoop01 apps]$ source ~/.bashrc
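To confirm the environment variables took effect (the exact output depends on your build):

[hadoop@hadoop01 apps]$ hadoop version
Hadoop 2.6.5
...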

6) Cluster initialization operation

Start the zookeeper cluster first:

Launch: zkServer.sh start

Check whether the startup is normal: zkServer.sh status
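On a healthy ensemble, zkServer.sh status ends by printing each node's role; one machine should report leader and the others follower (exact output varies with the ZooKeeper version):

[hadoop@hadoop01 ~]$ zkServer.sh status
...
Mode: follower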

Start the journalnode process:

[hadoop@hadoop01 ~]$ hadoop-daemon.sh start journalnode

[hadoop@hadoop02 ~]$ hadoop-daemon.sh start journalnode

[hadoop@hadoop03 ~]$ hadoop-daemon.sh start journalnode

Then use the jps command on each node to check that the journalnode process has started, as in the sketch below.
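A sketch of what jps should show at this point (PIDs will differ; QuorumPeerMain is the ZooKeeper process):

[hadoop@hadoop01 ~]$ jps
2674 QuorumPeerMain
3127 JournalNode
3218 Jps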

Perform a format operation on the first namenode:

[hadoop@hadoop01 ~]$ hadoop namenode -format

Formatting generates cluster metadata in the temporary directory configured in core-site.xml (hadoop.tmp.dir = /home/hadoop/data/hadoopdata/). This directory must be copied to the same path on the second namenode, so that both namenode nodes have an identical data structure in it.

[hadoop@hadoop01]$ scp -r ~/data/hadoopdata/ hadoop02:~/data

Or run on the other namenode node: hadoop namenode -bootstrapStandby

Format ZKFC (this only needs to be done on one machine in the cluster):

[hadoop@hadoop01 ~]$ hdfs zkfc -formatZK

Start HDFS:

[hadoop@hadoop01 ~]$ start-dfs.sh

Start YARN:

[hadoop@hadoop01 ~]$ start-yarn.sh

If the resourcemanager of the standby node is not started, start it manually:

[hadoop@hadoop02 ~]$ yarn-daemon.sh start resourcemanager

7) Additional notes:

View the status of each master node

HDFS:

hdfs haadmin -getServiceState nn1

hdfs haadmin -getServiceState nn2

YARN:

yarn rmadmin -getServiceState rm1

yarn rmadmin -getServiceState rm2
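Each command prints the queried node's current state as a single word. For example, assuming nn1 currently holds the active role:

[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn1
active
[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn2
standby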

4. Testing the cluster after HA is set up

1. Manually kill the active namenode and observe the cluster status.

2. Manually kill the active resourcemanager and observe the cluster status.

3. While uploading a file, kill the active namenode and check the cluster status.

4. While a task is running, kill the active resourcemanager and check the cluster status.
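A sketch of the first test, assuming nn1 on hadoop01 is currently active (fill in the PID placeholder from the jps output):

[hadoop@hadoop01 ~]$ jps | grep NameNode
[hadoop@hadoop01 ~]$ kill -9 <NameNode PID>
[hadoop@hadoop01 ~]$ hdfs haadmin -getServiceState nn2

If automatic failover works, the last command reports active within a few seconds and HDFS continues to serve requests; the same pattern applies to the resourcemanager tests, using yarn rmadmin -getServiceState to check the surviving node.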
