
How to configure HDFS High availability Environment in Hadoop Framework

2025-04-06 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article shows how to configure an HDFS high-availability (HA) environment in the Hadoop framework. It is written as a step-by-step walkthrough that is easy to follow; I hope it helps resolve your doubts. Let's study "how to configure HDFS high availability environment in Hadoop framework" together.

1. High availability of HDFS

1. Basic description

The goal is for the cluster to keep providing service when a single node (or a small number of nodes) fails. The HDFS high-availability mechanism eliminates the NameNode single point of failure by configuring two NameNodes in an Active/Standby pair, giving the cluster a hot standby for the NameNode. If the Active node fails, the NameNode role can be switched quickly to the other node.

2. Detailed explanation of the mechanism.

High availability is based on two NameNodes and depends on shared edits files (JournalNodes) and a ZooKeeper cluster:

Each NameNode node runs a ZKFailoverController (ZKFC) process that monitors the health of its NameNode.

Each NameNode maintains a persistent session with the ZooKeeper cluster.

If the Active node fails, ZooKeeper notifies the Standby NameNode.

After the ZKFC detects and confirms that the failed node is no longer working, it tells the Standby NameNode to switch to the Active state and continue serving.

ZooKeeper is very important in big data systems: it coordinates the work of the different components and maintains and transfers shared state. For example, automatic failover in the HA setup depends on ZooKeeper.
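In spirit, the ZKFC decision reduces to: if the old Active is no longer healthy and a healthy Standby exists, promote the Standby. A minimal shell sketch of that rule (purely illustrative — the real ZKFC is Java code inside Hadoop; the state strings here just mirror what `hdfs haadmin -getServiceState` prints):

```shell
#!/bin/sh
# Illustrative only: mimics the ZKFC failover rule, not the real implementation.
should_promote_standby() {
  active_state="$1"   # state of the node that was Active ("active", or anything else if failed)
  standby_state="$2"  # state of the hot-standby node
  if [ "$active_state" != "active" ] && [ "$standby_state" = "standby" ]; then
    return 0  # old Active is gone/unhealthy and a Standby is ready: fail over
  fi
  return 1    # otherwise leave the cluster as it is
}

should_promote_standby "dead" "standby" && echo "failover"   # prints: failover
should_promote_standby "active" "standby" || echo "no-op"    # prints: no-op
```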

2. HDFS high availability configuration

1. Overall configuration

Service list:

host     HDFS file   YARN scheduling   single service      shared file   Zk cluster
hop01    DataNode    NodeManager       NameNode            JournalNode   ZK-hop01
hop02    DataNode    NodeManager       ResourceManager     JournalNode   ZK-hop02
hop03    DataNode    NodeManager       SecondaryNameNode   JournalNode   ZK-hop03

2. Configure JournalNode

Create a directory

[root@hop01 opt]# mkdir hopHA

Copy the Hadoop directory

cp -r /opt/hadoop2.7/ /opt/hopHA/

Configure core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hopHA/hadoop2.7/data/tmp</value>
</property>

Configure hdfs-site.xml and add the following

<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hop01:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hop02:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hop01:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hop02:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hop01:8485;hop02:8485;hop03:8485/mycluster</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hopHA/hadoop2.7/data/jn</value>
</property>
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Start the journalnode service in turn

[root@hop01 hadoop2.7]# pwd
/opt/hopHA/hadoop2.7
[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start journalnode

Delete data under hopHA

[root@hop01 hadoop2.7]# rm -rf data/ logs/

NN1 formats and starts NameNode

[root@hop01 hadoop2.7]# pwd
/opt/hopHA/hadoop2.7
[root@hop01 hadoop2.7]# bin/hdfs namenode -format
[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start namenode

NN2 synchronizes NN1 data

[root@hop02 hadoop2.7]# bin/hdfs namenode -bootstrapStandby

NN2 starts NameNode

[root@hop02 hadoop2.7]# sbin/hadoop-daemon.sh start namenode

View current status

Start all DataNode on NN1

[root@hop01 hadoop2.7]# sbin/hadoop-daemons.sh start datanode

NN1 switches to Active state

[root@hop01 hadoop2.7]# bin/hdfs haadmin -transitionToActive nn1
[root@hop01 hadoop2.7]# bin/hdfs haadmin -getServiceState nn1
active
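To see both NameNodes at once, the two `getServiceState` calls can be wrapped in a small helper. The wrapper below is a hypothetical convenience of my own, not a Hadoop tool; only the quoted `haadmin` invocations are real, and the parsing is split out so it runs without a cluster:

```shell
#!/bin/sh
# pick_active: given "<id>:<state>" pairs, print the id whose state is "active".
# The states would normally come from "bin/hdfs haadmin -getServiceState <id>".
pick_active() {
  for pair in "$@"; do
    id=${pair%%:*}
    state=${pair##*:}
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  echo "none"
  return 1
}

pick_active "nn1:active" "nn2:standby"   # prints: nn1

# On a live cluster (untested here), one might run:
#   pick_active "nn1:$(bin/hdfs haadmin -getServiceState nn1)" \
#               "nn2:$(bin/hdfs haadmin -getServiceState nn2)"
```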

3. Failover configuration

Configure hdfs-site.xml, add the following content, and synchronize it across the cluster

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>

Configure core-site.xml, add the following content, and synchronize it across the cluster

<property>
    <name>ha.zookeeper.quorum</name>
    <value>hop01:2181,hop02:2181,hop03:2181</value>
</property>

Shut down all HDFS services

[root@hop01 hadoop2.7]# sbin/stop-dfs.sh

Start the Zookeeper cluster

/opt/zookeeper3.4/bin/zkServer.sh start
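zkServer.sh must be run on each of hop01–hop03. Assuming passwordless SSH between the nodes (already required above for sshfence), the three starts can be scripted; the DRY_RUN switch is my own convention, not part of ZooKeeper, so the loop can be previewed before actually running it:

```shell
#!/bin/sh
# Start ZooKeeper on every node of the quorum over SSH.
# DRY_RUN=1 only prints the commands instead of executing them.
zk_start_all() {
  for host in hop01 hop02 hop03; do
    cmd="/opt/zookeeper3.4/bin/zkServer.sh start"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh $host $cmd"
    else
      ssh "$host" "$cmd"
    fi
  done
}

DRY_RUN=1 zk_start_all
```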

hop01 initializes the HA state in ZooKeeper

[root@hop01 hadoop2.7]# bin/hdfs zkfc -formatZK

Hop01 starts the HDFS service

[root@hop01 hadoop2.7]# sbin/start-dfs.sh

Start ZKFC on the NameNode nodes

Whichever of hop01 and hop02 starts ZKFC first becomes the Active node; here hop02 is started first.

[hadoop2.7]# sbin/hadoop-daemon.sh start zkfc

End the NameNode process of hop02

kill -9 14422

Wait a moment, then check the status of hop01

[root@hop01 hadoop2.7]# bin/hdfs haadmin -getServiceState nn1
active

3. High availability of YARN

1. Basic description

The basic flow and ideas are similar to the HDFS mechanism and likewise rely on the ZooKeeper cluster. When the Active node fails, the Standby node switches to the Active state and continues to serve.

2. Detailed explanation of configuration

The environment is also demonstrated based on hop01 and hop02.

Configure yarn-site.xml and synchronize it across the cluster

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn01</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hop01</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hop02</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hop01:2181,hop02:2181,hop03:2181</value>
</property>
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

Restart the journalnode node

sbin/hadoop-daemon.sh start journalnode

Format and start NameNode on NN1

[root@hop01 hadoop2.7]# bin/hdfs namenode -format
[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start namenode

Synchronize NN1 metadata on NN2

[root@hop02 hadoop2.7]# bin/hdfs namenode -bootstrapStandby

Start DataNode under the cluster

[root@hop01 hadoop2.7]# sbin/hadoop-daemons.sh start datanode

Set NN1 to the Active state

Start ZKFC on hop01 first, and then on hop02 (the first to start becomes Active).

[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start zkfc

Hop01 starts yarn

[root@hop01 hadoop2.7]# sbin/start-yarn.sh

Hop02 starts ResourceManager

[root@hop02 hadoop2.7]# sbin/yarn-daemon.sh start resourcemanager

View status

[root@hop01 hadoop2.7]# bin/yarn rmadmin -getServiceState rm1
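Besides `rmadmin`, each ResourceManager also reports its HA state in the `haState` field of its REST endpoint `/ws/v1/cluster/info` (on the web port, 8088 by default). A small extraction sketch, run against a canned response so it works without a cluster; the crude `sed` parse is only suitable for this flat field:

```shell
#!/bin/sh
# Pull the "haState" field out of the cluster-info JSON.
extract_ha_state() {
  sed -n 's/.*"haState" *: *"\([A-Z]*\)".*/\1/p'
}

# Canned response, shaped like the real /ws/v1/cluster/info payload:
sample='{"clusterInfo":{"haState":"ACTIVE","rmStateStoreName":"ZKRMStateStore"}}'
echo "$sample" | extract_ha_state   # prints: ACTIVE

# Against a live ResourceManager one might run:
#   curl -s http://hop01:8088/ws/v1/cluster/info | extract_ha_state
```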

That covers "how to configure HDFS High availability Environment in Hadoop Framework". Thank you for reading! I hope this content has been helpful; if you want to learn more, welcome to follow the industry information channel!
