How does the enterprise solve the HDFS single point problem 07/15 Update SLTechnology News&Howtos


2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how enterprises solve the HDFS single point of failure problem, covering the options for sharing NameNode metadata and how automatic failover works.

Preface

Early versions of Hadoop had no solution for the HDFS single point of failure: when the NameNode server went down, the whole cluster was paralyzed, which is very dangerous. As Hadoop evolved, Hadoop HA (High Availability) was introduced to solve the NameNode single point problem. Let's walk through it.

To solve the HDFS single point problem, you can deploy two NameNodes, with only one of them serving clients at any time. But if you deploy two NameNodes, they need to share the metadata between them. Otherwise, when one NameNode fails, the other takes over with metadata that is out of sync, and there will be problems.

Following the approaches proposed in the Apache community, there are three options:

Option 1. Directory sharing

Directory sharing was proposed in the Apache community but is rarely used now. The shared directory is itself a single point of failure: if the shared directory dies, HDFS goes down with it. That is why some enterprises abandoned it.

Option 2. Use the JournalNode scheme

If we use JournalNodes (JN) to store the metadata edits, there is no single point of failure, because JN is itself a cluster. When we deploy JN, we usually choose an odd number of nodes, such as 3, 5, 7, or 9. JN follows a majority policy: as long as more than half of the nodes are alive, the service works normally.
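The majority rule above can be checked with a little arithmetic. The following is a minimal sketch, not Hadoop code; the function names are made up for illustration. It shows why odd cluster sizes are preferred: going from 3 to 4 nodes does not let you survive any additional failure.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest number of live JournalNodes that is strictly more than half."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many JournalNodes can fail while a majority is still alive."""
    return total_nodes - quorum_size(total_nodes)


def is_serving(total_nodes: int, alive_nodes: int) -> bool:
    """The JN cluster serves as long as more than half of its nodes survive."""
    return alive_nodes >= quorum_size(total_nodes)


if __name__ == "__main__":
    # Print cluster size, required quorum, and tolerated failures.
    for n in (3, 4, 5, 6, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

With 3 nodes the cluster tolerates 1 failure, and with 4 nodes it still tolerates only 1, so the extra node adds cost without adding fault tolerance.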

Note: do not solve the NameNode single point problem with a component that is itself a single point of failure. That solves nothing at all.

The information on every JN is the same, so why does one NameNode write data to JN while the other only reads it?

The reason is that the NameNodes are divided into active and standby. In a highly available architecture, only one NameNode actually serves clients, and users only ever deal with the active NameNode. The active NameNode writes its edits to JN, and the standby NameNode reads them to keep its own metadata in sync. An analogy: suppose we have two managers at the same level at work. We ask for leave, one of them approves and the other refuses. Is the leave granted or not? It would be a mess. In other words, in a highly available architecture only one node is in charge at a time.
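The write/read split between the two NameNodes can be sketched in a few lines. This is a toy model under stated assumptions, not how Hadoop implements it: `SharedEditLog` stands in for the JN cluster, and the `NameNode` class, its methods, and the single `create` operation are all hypothetical.

```python
class SharedEditLog:
    """Toy stand-in for the JournalNode cluster: an ordered list of edits."""

    def __init__(self):
        self.edits = []

    def append(self, edit):
        self.edits.append(edit)

    def read_from(self, index):
        return self.edits[index:]


class NameNode:
    def __init__(self, name, journal, active=False):
        self.name = name
        self.journal = journal
        self.active = active
        self.namespace = set()  # paths known to this NameNode
        self.applied = 0        # number of journal edits already applied

    def create_file(self, path):
        """Client-facing write; only the active NameNode may accept it."""
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and does not serve clients")
        self.journal.append(("create", path))
        self.catch_up()

    def catch_up(self):
        """Replay edits not yet seen from the journal into the local namespace."""
        for op, path in self.journal.read_from(self.applied):
            if op == "create":
                self.namespace.add(path)
            self.applied += 1


if __name__ == "__main__":
    journal = SharedEditLog()
    nn1 = NameNode("nn1", journal, active=True)
    nn2 = NameNode("nn2", journal)
    nn1.create_file("/data/a.txt")
    nn2.catch_up()  # the standby only reads from the journal
    print(nn1.namespace == nn2.namespace)
```

Because there is exactly one writer, the journal has a single ordered history, and the standby can reach the same state simply by replaying it.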

Option 3. Use the ZooKeeper scheme

In fact, many enterprises use ZooKeeper instead. Think about what problems JN solves: nothing more than data consistency and the single point of failure. ZooKeeper offers both properties as well, so some enterprises have modified the source code in order to use ZooKeeper for this purpose.

Overall architecture

Do the options above fully solve the NameNode single point problem? Suppose the active NameNode goes down in the small hours of the morning. Do we have to switch over manually? If the switch is not made in time, the entire cluster becomes unavailable. So next, let's look at how automatic failover is achieved.

After startup, both NameNodes try to register with ZooKeeper, where a lock (a znode) is kept. Whichever NameNode registers the lock first becomes active. When the other NameNode tries to register, it finds that the lock is already taken and becomes standby.
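The registration race can be illustrated with an in-memory toy. This sketch does not use the real ZooKeeper client API; `ToyZooKeeper`, `try_register`, and `start_namenode` are hypothetical names, and the lock is just a single field rather than an actual znode.

```python
class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper holding at most one lock owner."""

    def __init__(self):
        self.lock_owner = None

    def try_register(self, name):
        """Take the lock if it is free; return True when `name` holds it."""
        if self.lock_owner is None:
            self.lock_owner = name
        return self.lock_owner == name


def start_namenode(zk, name):
    """First NameNode to register becomes active, any later one standby."""
    return "active" if zk.try_register(name) else "standby"


if __name__ == "__main__":
    zk = ToyZooKeeper()
    print(start_namenode(zk, "nn1"))
    print(start_namenode(zk, "nn2"))
```

The important property is that registration is a single check-and-set on one shared place, so the two NameNodes can never both conclude that they are active.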

Each NameNode is deployed together with a ZKFC (ZKFailoverController) that monitors the health of its NameNode. When the active NameNode fails, the active ZKFC deletes the ephemeral znode, which releases the lock. The ZKFC on the standby side subscribes to changes of this ephemeral znode through ZooKeeper. When the znode disappears, the standby ZKFC immediately notifies the standby NameNode. The standby NameNode logs in remotely to the host of the active NameNode and runs kill -9 on the active NameNode process. The standby NameNode then tells the standby ZKFC to register the znode on ZooKeeper, and once registration succeeds it switches to the active state. In this way the switch happens automatically.
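The failover sequence described above can be walked through with a small simulation. This is a sketch under stated assumptions, not Hadoop or ZooKeeper code: `ToyZooKeeper` and `Node` are hypothetical classes, each `Node` folds a NameNode and its ZKFC into one object, the watch is modeled as a one-shot callback, and `fence` merely flips a flag where the real procedure would log in remotely and run kill -9.

```python
class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper: one lock plus one-shot watchers."""

    def __init__(self):
        self.lock_owner = None
        self.watchers = []

    def create_lock(self, owner):
        if self.lock_owner is None:
            self.lock_owner = owner
            return True
        return False

    def watch_lock(self, callback):
        self.watchers.append(callback)

    def delete_lock(self):
        self.lock_owner = None
        pending, self.watchers = self.watchers, []  # watches fire once
        for callback in pending:
            callback()


class Node:
    """A NameNode together with its ZKFC (merged for brevity)."""

    def __init__(self, name, zk):
        self.name = name
        self.zk = zk
        self.peer = None
        self.alive = True
        self.state = "standby"

    def start(self):
        if self.zk.create_lock(self.name):
            self.state = "active"
        else:
            self.zk.watch_lock(self.on_lock_released)

    def health_check(self):
        """Active-side ZKFC: release the lock when its NameNode has failed."""
        if self.state == "active" and not self.alive:
            self.zk.delete_lock()

    def fence(self, peer):
        """Placeholder for the remote login and kill -9 of the old active."""
        peer.alive = False
        peer.state = "stopped"

    def on_lock_released(self):
        """Standby side: fence the old active, then take the lock."""
        self.fence(self.peer)
        if self.zk.create_lock(self.name):
            self.state = "active"


if __name__ == "__main__":
    zk = ToyZooKeeper()
    nn1, nn2 = Node("nn1", zk), Node("nn2", zk)
    nn1.peer, nn2.peer = nn2, nn1
    nn1.start()
    nn2.start()
    nn1.alive = False   # simulate a crash of the active NameNode
    nn1.health_check()  # its ZKFC notices and releases the lock
    print(nn1.state, nn2.state, zk.lock_owner)
```

Fencing happens before the standby takes the lock, so the old active is guaranteed to be stopped by the time a new active exists. That ordering is what prevents two NameNodes from serving clients at once.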
