Analysis of the principle of High availability of NameNode+ResourceManager in Hadoop2.7.1 configuration

2025-07-13 Update From: SLTechnology News&Howtos

NameNode high availability (HA) is configured in two files: core-site.xml and hdfs-site.xml.
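
As a hedged illustration of the core-site.xml side, the fragment below points the default file system at a logical nameservice instead of a single NameNode host. The nameservice ID mycluster is an assumption made for this sketch, not a value taken from the original configuration.

```xml
<?xml version="1.0"?>
<!-- core-site.xml (sketch): "mycluster" is a hypothetical nameservice ID -->
<configuration>
  <property>
    <!-- Clients address the logical nameservice, not one NameNode host -->
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
</configuration>
```

Clients resolve the logical name to whichever NameNode is currently Active, so the same ID has to be declared as a nameservice in hdfs-site.xml.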

ResourceManager high availability is configured in one file: yarn-site.xml.
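
A minimal sketch of what the ResourceManager HA part of yarn-site.xml might look like on Hadoop 2.7.x. The cluster ID, the ResourceManager IDs rm1 and rm2, and all example.com host names are placeholders assumed for illustration.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml (sketch): all IDs and host names are hypothetical -->
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- Identifies this cluster so the two ResourceManagers can be told apart from other clusters -->
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <!-- Logical IDs of the two ResourceManagers -->
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm1.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm2.example.com</value>
  </property>
  <property>
    <!-- ZooKeeper ensemble used for ResourceManager leader election -->
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
</configuration>
```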

Logical structure:

How NameNode-HA works:

In a typical HA cluster, two separate machines are configured as NameNodes. At any time, exactly one NameNode is in the Active state and the other is in the Standby state. The Active NameNode handles all client operations in the cluster. This follows from the underlying design of HDFS: only one writer may hold a file at a time, and if several writers were allowed, file offsets would become confused and the data would end up unusable. The Standby NameNode simply acts as a slave, so that whenever the Active NameNode dies it can immediately take over and become the master NameNode, which gives the effect of a hot backup.

In the HA architecture, the cold-backup role of the SecondaryNameNode no longer exists. To keep the metadata of the Standby NameNode consistent with that of the Active NameNode, the two communicate through a group of lightweight daemon processes called JournalNodes. Whenever a modification is performed on the Active NameNode, it also writes the edit log record to a majority (more than half) of the JournalNodes. When the Standby NameNode detects a change in the edit log on the JournalNodes, it reads the new edits and applies them to its own directory image tree.

When a failure occurs and the Active NameNode goes down, the Standby NameNode reads all outstanding edits from the JournalNodes before it becomes Active. This ensures that its directory image tree matches that of the failed NameNode, so it can seamlessly take over serving client requests, which is how high availability is achieved.
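
The JournalNode mechanism described above is wired up in hdfs-site.xml. The following is a sketch only: the three JournalNode hosts, the nameservice ID mycluster at the end of the URI, and the local directory path are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (sketch): JournalNode hosts, nameservice ID and path are hypothetical -->
<configuration>
  <property>
    <!-- Where the Active NameNode writes edits and the Standby reads them -->
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
  </property>
  <property>
    <!-- Local directory on each JournalNode machine where edits are stored -->
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/journal</value>
  </property>
</configuration>
```

With three JournalNodes, an edit is committed once two of them have acknowledged it, so the cluster can tolerate the loss of one JournalNode.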

So that failover is fast, the Standby NameNode also receives the block reports sent by the DataNodes, which keeps its view of block locations up to date. The description so far covers only how NameNode fault tolerance works. The following explains how, once ZooKeeper is introduced, NameNode HA can fail over automatically and unattended.

What ZooKeeper contributes to active/standby switching:

(1) Failure detection. Each NameNode machine maintains a persistent session in ZooKeeper once it has started. If the NameNode goes down, its session expires. When ZooKeeper detects this, it notifies the standby NameNode that it is time to take over.

(2) Election mechanism. ZooKeeper provides a simple exclusive lock for electing the master. The NameNode that finds it holds the lock is the one that becomes Active.

In practice, Hadoop provides the ZKFailoverController role (zkfc for short), which runs on each NameNode node. Its main responsibilities are as follows:

(1) Health monitoring. zkfc periodically sends a health-check command to the NameNode it monitors to determine whether that NameNode is healthy. If the machine goes down and the heartbeat fails, zkfc marks the NameNode as unhealthy.

(2) Session management. As long as the NameNode is healthy, zkfc keeps a session open in ZooKeeper. If the NameNode is also in the Active state, zkfc additionally holds an ephemeral lock znode in ZooKeeper. When the NameNode dies, the session expires and the znode is deleted. The standby NameNode then acquires the lock, is promoted to primary NameNode, and marks its state as Active. When the failed NameNode restarts, it registers with ZooKeeper again, finds that the lock znode is already held, and automatically enters the Standby state. This cycle ensures high reliability. Note that at most two NameNodes are supported.

(3) Master election. As described above, which NameNode is Active is decided by a preemptive lock, implemented by maintaining an ephemeral znode in ZooKeeper.
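
The zkfc behaviour listed above depends on a few settings that live in core-site.xml. The fragment below is a hedged sketch: the ZooKeeper host names are placeholders, and the two timing values are illustrative figures that should be checked against the core-default.xml of the release in use instead of being treated as recommendations.

```xml
<?xml version="1.0"?>
<!-- core-site.xml (sketch): host names are hypothetical, timings are illustrative -->
<configuration>
  <property>
    <!-- ZooKeeper ensemble that zkfc connects to -->
    <name>ha.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <!-- How long ZooKeeper waits before expiring a zkfc session and removing its ephemeral znode -->
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>5000</value>
  </property>
  <property>
    <!-- How often zkfc sends a health check to its local NameNode -->
    <name>ha.health-monitor.check-interval.ms</name>
    <value>1000</value>
  </property>
</configuration>
```

A shorter session timeout makes failover start sooner after a crash, at the cost of a higher chance of a spurious failover during a long pause on the Active NameNode.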

In hdfs-site.xml

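
A minimal sketch of the NameNode identity settings that HA typically needs in hdfs-site.xml on Hadoop 2.7.x. It is an illustration under assumed names (nameservice mycluster, NameNode IDs nn1 and nn2, example.com hosts), not the author's original listing. It is also incomplete: a working setup needs further settings, such as the shared edits directory and a fencing method.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (sketch): nameservice ID, NameNode IDs and hosts are hypothetical -->
<configuration>
  <property>
    <!-- Logical name of the HA nameservice -->
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- The two NameNodes that belong to the nameservice -->
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>nn1.example.com:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>nn2.example.com:50070</value>
  </property>
  <property>
    <!-- Class that HDFS clients use to find the currently Active NameNode -->
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <!-- Enables automatic failover through zkfc and ZooKeeper -->
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```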
