Why HDFS Has High Availability in Hadoop 2.2.0

2025-04-26 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains why HDFS is highly available in Hadoop 2.2.0.

Before Hadoop 2.0.0, the NameNode (NN) was a single point of failure (SPOF) in an HDFS cluster. Each cluster had exactly one NameNode, and if the machine hosting it failed, the entire cluster was unavailable until the NN was restarted or brought up on another host.

This affected the availability of HDFS in two main ways:

(1) Unplanned events: if the machine hosting the NN crashed, the entire cluster was unavailable until the NN was restarted.

(2) Planned events: maintenance such as a hardware or software upgrade on the NN machine caused cluster downtime.

HDFS high availability (HA) addresses both problems by running two NNs (an Active NN and a Standby NN) in the same cluster. If the machine crashes or must be taken down for maintenance, the cluster can quickly fail over to the other NN.

In a typical HA cluster, two separate machines are configured as NNs. At any point in time exactly one of them is in the Active state and the other is in the Standby state. The Active NN handles all client operations in the cluster, while the Standby NN acts as a hot spare: it maintains enough state to provide a fast failover when necessary.

To keep the Standby NN's state (that is, the namespace metadata) synchronized with the Active NN, both nodes communicate with a group of separate daemons called JournalNodes (JNs). Whenever the Active NN modifies the namespace, it durably logs the change to the edits log on a majority (more than half) of the JNs. The Standby NN constantly watches the JNs for changes to the edits log, reads the new edits, and applies them to its own namespace. When the Active NN fails, the Standby NN makes sure it has read all of the edits from the JNs before promoting itself to the Active state. This guarantees that its namespace is fully synchronized with the Active NN's before the failover takes place.
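The majority-write rule and the Standby's tailing of the edits log can be sketched roughly as follows. This is a toy model in Python, not Hadoop code: `JournalNode`, `log_edit`, and `tail_edits` are hypothetical names invented for illustration, and the recovery of partially written edits that a real journal protocol needs is left out.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for a JournalNode: stores edits keyed by transaction id."""
    up: bool = True
    edits: dict = field(default_factory=dict)

    def write(self, txid: int, op: str) -> bool:
        if not self.up:
            return False  # an unreachable JN never acknowledges
        self.edits[txid] = op
        return True


def log_edit(journals: list, txid: int, op: str) -> bool:
    """Active NN side: the edit counts as durable only if a majority of JNs ack."""
    acks = sum(1 for jn in journals if jn.write(txid, op))
    return acks > len(journals) // 2


def tail_edits(journals: list, last_applied: int) -> list:
    """Standby NN side: collect edits newer than last_applied, in txid order.

    Simplification: this reads whatever any reachable JN holds and does not
    distinguish committed edits from partially written ones.
    """
    seen = {}
    for jn in journals:
        if jn.up:
            seen.update(jn.edits)
    return [(txid, seen[txid]) for txid in sorted(seen) if txid > last_applied]


# Three JNs, one of them down: 2 of 3 acks is still a majority.
journals = [JournalNode(), JournalNode(), JournalNode(up=False)]
committed = log_edit(journals, 1, "mkdir /data")
pending = tail_edits(journals, last_applied=0)
```

With one of three JNs down the write still succeeds, and the Standby can pick up the edit from the remaining JNs. If a second JN went down, `log_edit` would return `False` because only one acknowledgement would arrive.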

To provide a fast failover, the Standby NN also needs up-to-date information about the location of every block in the cluster. To achieve this, all DataNodes are configured with the addresses of both the Active NN and the Standby NN, and they send block location information and heartbeats to both.
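A minimal sketch of this dual reporting, assuming a much simplified model: the `NameNode` and `DataNode` classes below and their method names are hypothetical and only illustrate that every DataNode reports to both NNs, so the Standby already holds the block map when it takes over.

```python
class NameNode:
    """Toy NameNode that only tracks liveness and block locations."""

    def __init__(self, name: str):
        self.name = name
        self.live_datanodes = set()
        self.block_locations = {}  # block id -> set of DataNode ids

    def receive_heartbeat(self, dn_id: str) -> None:
        self.live_datanodes.add(dn_id)

    def receive_block_report(self, dn_id: str, block_ids: list) -> None:
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(dn_id)


class DataNode:
    """Toy DataNode configured with the addresses of both NameNodes."""

    def __init__(self, dn_id: str, block_ids: list, namenodes: list):
        self.dn_id = dn_id
        self.block_ids = block_ids
        self.namenodes = namenodes

    def report(self) -> None:
        # Heartbeat and block report go to every configured NN,
        # regardless of which one is currently Active.
        for nn in self.namenodes:
            nn.receive_heartbeat(self.dn_id)
            nn.receive_block_report(self.dn_id, self.block_ids)


active, standby = NameNode("nn1"), NameNode("nn2")
datanodes = [
    DataNode("dn1", ["blk_1", "blk_2"], [active, standby]),
    DataNode("dn2", ["blk_2", "blk_3"], [active, standby]),
]
for dn in datanodes:
    dn.report()
```

After the reports, `active.block_locations` and `standby.block_locations` are identical, which is what lets the Standby start serving reads without waiting for a fresh round of block reports.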

It is vital that only one NN in the cluster is Active at any time. Otherwise the namespace state would quickly diverge between the two NNs (a "split-brain" scenario), risking data loss or other incorrect results. To prevent this, the JNs allow only one NN to be a writer at a time. During a failover, the NN that is about to become Active takes over the writer role, which prevents the other NN from continuing to act as Active.
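One common way to enforce a single writer is to tag each writer with an increasing epoch number and have the journal reject writes from any older epoch. The sketch below illustrates that idea only. The article itself does not describe the mechanism, so treat the `FencedJournal` class and its methods as hypothetical names and the epoch scheme as an assumption, not as Hadoop's actual API.

```python
class FencedJournal:
    """Toy journal that accepts writes only from the current epoch's writer."""

    def __init__(self):
        self.current_epoch = 0
        self.edits = []

    def new_epoch(self, epoch: int) -> bool:
        """A NN that wants to become the writer must claim a strictly newer epoch."""
        if epoch <= self.current_epoch:
            return False
        self.current_epoch = epoch
        return True

    def write(self, epoch: int, op: str) -> bool:
        """Reject writes from any NN that does not hold the current epoch."""
        if epoch != self.current_epoch:
            return False
        self.edits.append(op)
        return True


jn = FencedJournal()

# nn1 becomes the writer with epoch 1 and logs an edit.
jn.new_epoch(1)
first_write = jn.write(1, "mkdir /a")

# Failover: nn2 claims epoch 2. nn1 still believes it is Active,
# but its next write carries the stale epoch and is refused.
jn.new_epoch(2)
stale_write = jn.write(1, "mkdir /b")
new_write = jn.write(2, "mkdir /c")
```

Here `first_write` and `new_write` succeed while `stale_write` is rejected, so the journal ends up holding only the edits from the legitimate writer of each period.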

To deploy an HA cluster, you need to prepare the following:

(1) NameNode machines: the machines running the Active NN and the Standby NN should have the same hardware configuration.

(2) JournalNode machines: the machines running the JNs. The JN daemon is relatively lightweight, so it can be co-located with other Hadoop daemons, such as the NN or the YARN ResourceManager. A cluster needs at least 3 JN daemons, because every edit must be written to a majority of JNs; with 3 JNs the system tolerates the failure of one. You can run more than 3 JNs, but to actually increase fault tolerance you should run an odd number (3, 5, 7, etc.). With N JNs, the system tolerates at most (N - 1) / 2 JN failures and continues to function normally.
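The arithmetic behind the odd-number recommendation follows from the majority-write rule: an edit needs acknowledgements from more than half of the JNs, so the cluster survives only as many JN failures as still leave a majority running. A small sketch (the function names are made up for this example):

```python
def quorum_size(n: int) -> int:
    """Smallest number of JNs that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """JNs that may fail while a majority is still reachable; equals (n - 1) // 2."""
    return n - quorum_size(n)


# Compare cluster sizes: adding a 4th or 6th JN buys no extra tolerance.
for n in (3, 4, 5, 6, 7):
    print(f"{n} JNs: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Three and four JNs both tolerate a single failure, and five and six both tolerate two, which is why an even count adds a machine without adding fault tolerance.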

In an HA cluster, the Standby NN also performs checkpoints of the namespace state, so there is no need to run a Secondary NN, CheckpointNode, or BackupNode. In fact, running any of these daemons in an HA cluster is an error.

