HDFS NameNode HA (High Availability) Scheme


2025-07-02 Update From: SLTechnology News&Howtos (Shulou.com)


Monday, 2019-2-18


1. How a Hadoop HA cluster works

HA stands for high availability: the service runs 24/7 without interruption. Hadoop 2.x ships with a built-in HA scheme.

The key to high availability is eliminating single points of failure.

Strictly speaking, Hadoop HA is not a single mechanism but a set of HA mechanisms, one per component (for example, HDFS HA and YARN HA).

Tip:

The SecondaryNameNode used before the HA mechanism existed is very different from a standby NameNode.

The SecondaryNameNode only merges fsimage and edits (checkpointing) and cannot take over for the NameNode, whereas a standby NameNode can fully replace the active NameNode.

HA technical essentials: metadata management, state management of the two NameNodes, and preventing split-brain.

HA Mechanism of HDFS

HDFS eliminates the single point of failure by running two NameNodes.

Key points for coordinating the two NameNodes:

A. Metadata management must change:

Each NameNode keeps its own copy of the metadata in memory.

There is only one edits log, and only the NameNode in the Active state may write to it.

Both NameNodes can read the edits log.

The shared edits log is kept in shared storage; the two mainstream implementations are qjournal (the Quorum Journal Manager) and NFS.
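The single-writer rule above can be sketched as a toy model. `SharedEditsLog` is a hypothetical in-memory stand-in for qjournal or NFS, not a Hadoop API: any NameNode may read, but only the current writer may append.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedEditsLog:
    """Toy stand-in for shared edits storage (hypothetical, not a Hadoop class)."""
    entries: list = field(default_factory=list)
    writer: Optional[str] = None  # id of the NameNode currently in Active state

    def append(self, namenode_id: str, op: str) -> int:
        # Only the Active NameNode may write to the edits log.
        if namenode_id != self.writer:
            raise PermissionError(f"{namenode_id} is not the active writer")
        self.entries.append(op)
        return len(self.entries)  # 1-based transaction id of the new entry

    def read_from(self, txid: int) -> list:
        # Both NameNodes may read: return every entry after the given txid.
        return self.entries[txid:]
```

With `writer="nn1"`, a call such as `log.append("nn2", "mkdir /c")` raises `PermissionError`, while `nn2` can still call `read_from` to follow along.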

1. Clients access the metadata (fsimage) through the active NameNode. At the beginning, the metadata of the two NameNodes is identical (and empty).

2. On a write, the active NameNode updates its in-memory metadata and, in real time, also writes the operation to the edits log on the qjournal cluster.

3. The standby periodically reads the edits log and applies the new entries to its in-memory metadata, keeping its lag behind the active as small as possible.

4. Periodically, the standby merges fsimage and edits into a new fsimage and keeps it locally.
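Steps 3 and 4 can be sketched as follows, assuming a toy metadata model where each edit is the string `"mkdir <path>"` or `"delete <path>"`. `StandbyNameNode` and its methods are hypothetical names; the real standby tails edits from the JournalNodes and writes fsimage files to disk.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyNameNode:
    """Toy standby that tails a shared edits list (hypothetical model)."""
    namespace: set = field(default_factory=set)  # in-memory metadata
    applied_txid: int = 0                         # number of edits applied so far
    checkpoint: tuple = (frozenset(), 0)          # local (fsimage, txid)

    def apply(self, op: str) -> None:
        action, path = op.split(" ", 1)
        if action == "mkdir":
            self.namespace.add(path)
        elif action == "delete":
            self.namespace.discard(path)
        else:
            raise ValueError(f"unknown op: {op}")

    def tail_edits(self, shared_edits: list) -> int:
        # Step 3: replay only the edits written since the last poll.
        new_ops = shared_edits[self.applied_txid:]
        for op in new_ops:
            self.apply(op)
        self.applied_txid += len(new_ops)
        return len(new_ops)

    def save_checkpoint(self) -> None:
        # Step 4: merge the current state into a new "fsimage" kept locally.
        self.checkpoint = (frozenset(self.namespace), self.applied_txid)
```

Because `applied_txid` records how far the standby has read, each poll replays only the new entries instead of the whole log.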

Note: the edits log belongs to neither the active nor the standby NameNode; it relies on a third-party qjournal cluster that is completely independent of both.

Suppose the active NameNode goes down. The standby lags behind the active, but only slightly. It quickly replays the old active's latest operations from the edits log, so its metadata becomes exactly the same as the old active's, and it can then quickly take over serving clients.
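The catch-up before takeover can be sketched with the same toy edit format (`"mkdir <path>"` / `"delete <path>"`). `catch_up_and_promote` is a hypothetical helper: it replays whatever the standby has not applied yet and only then reports the node as active.

```python
def catch_up_and_promote(standby_ns: set, applied_txid: int,
                         shared_edits: list) -> tuple:
    """Replay unapplied edits, then return (namespace, txid, new_state)."""
    ns = set(standby_ns)  # work on a copy of the standby's metadata
    for op in shared_edits[applied_txid:]:
        action, path = op.split(" ", 1)
        if action == "mkdir":
            ns.add(path)
        elif action == "delete":
            ns.discard(path)
        else:
            raise ValueError(f"unknown op: {op}")
    # Only after replaying everything is the metadata identical to the old active's.
    return ns, len(shared_edits), "active"
```

For example, a standby that applied only the first 2 of 4 edits finishes the remaining 2 and ends with the same namespace the old active had.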

B. A state management module is required:

A ZKFailoverController (zkfc) process runs on each host where a NameNode runs.

Each zkfc monitors its own NameNode and uses ZooKeeper to record its state.

When a state switch is needed, the zkfc carries it out.

The switch must prevent split-brain.
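How ZooKeeper decides which zkfc wins can be sketched with an ephemeral "lock" znode: whoever creates it first becomes active, and it disappears when the owner's session ends. `FakeZooKeeper`, `Zkfc` and the lock path are hypothetical stand-ins, not the real ZooKeeper client or Hadoop's elector; real zkfc processes also use watches to be notified instead of polling.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper ephemeral znodes (not a real client)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owner session)

    def create_ephemeral(self, path, data, session):
        # As in ZooKeeper, creating an existing znode fails.
        if path in self.nodes:
            return False
        self.nodes[path] = (data, session)
        return True

    def expire_session(self, session):
        # Ephemeral znodes vanish when the owning session ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


LOCK = "/hadoop-ha/mycluster/lock"  # hypothetical lock path


class Zkfc:
    def __init__(self, nn_id, zk):
        self.nn_id, self.zk, self.state = nn_id, zk, "standby"

    def try_become_active(self):
        # The zkfc that creates the lock znode first makes its NameNode active.
        if self.zk.create_ephemeral(LOCK, self.nn_id, session=self.nn_id):
            self.state = "active"
        return self.state
```

The design point is that the lock is tied to a session: if the active side's zkfc dies, its znode goes away and the other zkfc can take the lock, although it must still fence the old NameNode before switching.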

1. The zkfc on the active host monitors the health of its own NameNode in real time.

2. If an exception occurs, the standby's zkfc is notified.

3. On receiving the notification, the standby's zkfc runs kill -9 on the active NameNode.

4. If the standby's zkfc does not get a successful return value after the kill -9, it runs the user-specified script to kill the active NameNode (here, /bin/true).

5. Once the active NameNode has been killed, a successful return value is obtained.

6. The standby's zkfc then tells the standby NameNode to become active and serve clients.
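The six steps above can be sketched as one function. The fencing actions are passed in as plain callables returning `True` on success; `failover` and its parameters are hypothetical names, and the real zkfc logic is more involved. The sketch keeps the property that matters: the standby is never promoted unless fencing reported success.

```python
from typing import Callable, List


def failover(active_healthy: bool,
             kill_via_ssh: Callable[[], bool],
             run_fence_script: Callable[[], bool],
             log: List[str]) -> str:
    """Walk through steps 1-6; return the standby's resulting role."""
    # Step 1: the active side's zkfc checks its NameNode's health.
    if active_healthy:
        log.append("active healthy, nothing to do")
        return "standby"
    # Step 2: the standby's zkfc learns about the failure.
    log.append("standby zkfc notified")
    # Step 3: try to kill -9 the old active NameNode.
    fenced = kill_via_ssh()
    log.append(f"kill -9 via ssh succeeded: {fenced}")
    # Step 4: no successful return value, so fall back to the script.
    if not fenced:
        fenced = run_fence_script()
        log.append(f"fence script succeeded: {fenced}")
    # Step 5: promote only once fencing has reported success.
    if not fenced:
        log.append("fencing failed; refusing to promote")
        return "standby"
    # Step 6: the standby NameNode becomes active.
    log.append("standby promoted to active")
    return "active"
```

For example, `failover(False, lambda: False, lambda: True, log)` models an ssh kill that fails, a script that succeeds, and a successful promotion.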

What is zkfc? A failover controller based on ZooKeeper.

How is split-brain avoided during a state switch?

Split-brain: when the active NameNode stops working properly, its zkfc writes data to ZooKeeper indicating the failure, and the zkfc on the standby NameNode reads it and switches the standby to active. However, if the old active NameNode is not really dead but only in a "fake death" (it hangs for a while and then recovers), two NameNodes end up working at the same time. This is called split-brain.

Solution: when the standby senses that the active node has a problem, it does not switch state immediately. Its zkfc first logs in to the active node over ssh and kills the NameNode process remotely (kill -9 <pid>). If the standby does not receive confirmation that the kill succeeded within a certain time, it runs a custom script, which ensures that split-brain cannot occur. Hadoop calls this mechanism fencing; it provides two safeguards: sending the kill command over ssh and running a custom script.
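The "fake death" case can be made concrete with a tiny hypothetical simulation: `nn1` is the old active, `nn2` the standby. The function only counts which NameNodes believe they are active after `nn2` takes over; it is an illustration of the reasoning, not Hadoop code.

```python
def active_namenodes(old_active_really_dead: bool, fencing: bool) -> list:
    """Return the NameNodes that consider themselves active after failover."""
    roles = {"nn1": "active", "nn2": "standby"}
    if old_active_really_dead:
        roles["nn1"] = "dead"
    # nn2's zkfc sees nn1 as unhealthy either way; fencing kills nn1 first.
    if fencing:
        roles["nn1"] = "dead"
    roles["nn2"] = "active"
    # A merely paused nn1 that was not fenced resumes and still acts as active.
    return sorted(nn for nn, role in roles.items() if role == "active")
```

Without fencing, a paused `nn1` leads to `["nn1", "nn2"]`, which is exactly the split-brain described above; with fencing the result is always `["nn2"]`.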

As the solution shows, when the active node crashes, Hadoop performs the following two actions:

1) kill the active node's NameNode process over ssh

2) if the kill is not confirmed, execute the custom script
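The two actions behave like an ordered list of fencing methods where the first success wins. The sketch below assumes each method is a callable returning `True` on success; `fence` and the method names are illustrative. An exception (for example an ssh timeout) is treated as a failure, and if every method fails the failover must not proceed.

```python
from typing import Callable, List, Tuple


def fence(methods: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Try fencing methods in order; return the name of the first that succeeds."""
    for name, method in methods:
        try:
            if method():
                return name
        except Exception:
            # A method that errors out (e.g. ssh timeout) counts as a failure.
            continue
    raise RuntimeError("all fencing methods failed; do not fail over")
```

For example, if the ssh kill raises `TimeoutError`, calling `fence([("sshfence", ssh_kill), ("shell(/bin/true)", lambda: True)])` falls through to the script and returns `"shell(/bin/true)"`.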

Original: https://blog.csdn.net/qq_22310551/article/details/85700978

If the success of the kill is not confirmed in time, a user-specified shell script is called.

[root@hadoop-node01 bin]# ls -l /bin/true      // the script is /bin/true, under /bin

-rwxr-xr-x. 1 root root 21112 Oct 15  2014 /bin/true

In CDH, this program is used in the following settings:

HDFS High Availability fencing methods

dfs.ha.fencing.methods

List of fencing methods used for service fencing. shell(./cloudera_manager_agent_fencer.py) is a fencing method designed to use the Cloudera Manager Agent. The sshfence method uses SSH. If you use a custom fencer (which may communicate with shared storage, power units, or network switches), invoke it using shell.
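The value of this property is a list of methods such as `sshfence` and `shell(./cloudera_manager_agent_fencer.py)`. A minimal parser sketch follows, assuming one method per line with an optional argument in parentheses; `parse_fencing_methods` is a hypothetical helper, not Hadoop's own parser, and it does not validate method names.

```python
import re
from typing import List, Optional, Tuple

# Assumed format: a method name, optionally followed by "(argument)".
_METHOD = re.compile(r"^(\w+)(?:\((.*)\))?$")


def parse_fencing_methods(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a fencing-methods value into (name, argument) pairs."""
    methods = []
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = _METHOD.match(line)
        if m is None:
            raise ValueError(f"cannot parse fencing method: {line!r}")
        methods.append((m.group(1), m.group(2)))  # argument is None if absent
    return methods
```

The order of the returned list matters: it is the order in which the methods would be tried.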

Timeout for the Cloudera Manager fencing strategy

dfs.ha.fencing.cloudera_manager.timeout_millis    10000

Timeout (in milliseconds) used by the Cloudera Manager agent-based fencer.
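How a millisecond timeout like this one can bound a fencing command is sketched below. `run_fencer` is a hypothetical helper, not the Cloudera Manager agent code; it mirrors the 10000 ms default above and treats either a non-zero exit code or a timeout as "fencing not confirmed".

```python
import subprocess
import sys
from typing import List


def run_fencer(cmd: List[str], timeout_millis: int = 10000) -> bool:
    """Run a fencing command; a non-zero exit or a timeout counts as failure."""
    try:
        result = subprocess.run(cmd, timeout=timeout_millis / 1000.0,
                                capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; fencing is not confirmed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Portable stand-in for a script like /bin/true: exits 0 immediately.
    print(run_fencer([sys.executable, "-c", "pass"]))
```

Treating a timeout as a failure is the conservative choice: a fencer that hangs must not be mistaken for one that succeeded.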

The role of ZooKeeper in the HA mechanism

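1. ZooKeeper provides the coordination service for the NameNode pair (the qjournal cluster itself keeps the edits consistent through its own quorum protocol and does not depend on ZooKeeper).

2. ZooKeeper records which NameNode is active and which is standby.

3. zkfc implements its failover controller on top of ZooKeeper.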

