
What is the role of the Hadoop JournalNode?

Shulou (Shulou.com), 05/31 report. Updated 2025-02-23; source: SLTechnology News&Howtos.


Sharing data between NameNodes: NFS or Quorum JournalNodes (the more commonly used option)


The NameNode is the heart of a Hadoop cluster: it is critical and must never stop working. In Hadoop 1 there was only one NameNode, so if its data was lost or it stopped working, the entire cluster could not be recovered. This single point of failure was a well-known reliability weakness of Hadoop 1, as shown in Figure 1. Hadoop 2 solved this problem.

Figure 1

HDFS high availability (HA) in Hadoop 2.2.0 means that two NameNodes can run at the same time: one is active and the other is on standby. When the server hosting the active NameNode goes down, you can switch to the other NameNode, manually or automatically, and it continues to provide service without losing data.

The two NameNodes share edit data so that their namespace state stays consistent. The shared storage can be either a Network File System (NFS) mount or the Quorum Journal Node mechanism. NFS is a shared file system provided by Linux, so it is configured at the operating-system level; Quorum Journal Node is part of Hadoop itself and is configured in Hadoop's own settings.

This article covers the Quorum Journal Node setup with manual failover.
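In the Quorum Journal Node setup, both NameNodes point at the same list of JournalNodes through the `dfs.namenode.shared.edits.dir` property in `hdfs-site.xml`, whose value is a `qjournal://` URI. The Python sketch below only assembles that URI string. The host names and the journal ID `mycluster` are hypothetical, 8485 is assumed as the JournalNode RPC port (the usual default), and the "at least 3" check mirrors this article's recommendation rather than anything Hadoop enforces when parsing the URI.

```python
def qjournal_uri(journal_nodes, journal_id, default_port=8485):
    """Build a value for dfs.namenode.shared.edits.dir.

    journal_nodes: "host" or "host:port" strings (hypothetical hosts).
    journal_id: conventionally the HDFS nameservice ID.
    """
    if len(journal_nodes) < 3:
        raise ValueError("a quorum needs at least 3 JournalNodes")
    endpoints = []
    for node in journal_nodes:
        # Append the assumed default JournalNode port when none is given.
        endpoints.append(node if ":" in node else f"{node}:{default_port}")
    # Endpoints are separated by semicolons, followed by /<journal id>.
    return "qjournal://" + ";".join(endpoints) + "/" + journal_id


print(qjournal_uri(["jn1.example.com", "jn2.example.com", "jn3.example.com:8485"], "mycluster"))
# qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster
```

The resulting string is what both NameNodes would carry as the value of `dfs.namenode.shared.edits.dir`.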

When the cluster starts, both NameNodes can be started at the same time, but only one of them is active; the other is in standby. The active NameNode serves all requests. The standby NameNode serves no clients: it only synchronizes data and stays ready to take over at any moment, as shown in Figure 2.

Figure 2
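The active/standby arrangement and a manual switch can be modeled as a tiny state machine. The Python sketch below is a toy model, not Hadoop code, and the NameNode IDs `nn1` and `nn2` are hypothetical. On a real cluster the manual switch is issued with the `hdfs haadmin` tool, for example `hdfs haadmin -failover nn1 nn2`.

```python
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class HAPair:
    """Toy model of a manually switched NameNode pair (IDs are hypothetical)."""

    def __init__(self, first="nn1", second="nn2"):
        # Both NameNodes are running, but only one starts as active.
        self.states = {first: HAState.ACTIVE, second: HAState.STANDBY}

    def active(self):
        return [nn for nn, s in self.states.items() if s is HAState.ACTIVE]

    def failover(self, from_nn, to_nn):
        """Manual switch: demote from_nn first, then promote to_nn."""
        if self.states[from_nn] is not HAState.ACTIVE:
            raise ValueError(f"{from_nn} is not active")
        if self.states[to_nn] is not HAState.STANDBY:
            raise ValueError(f"{to_nn} is not in standby")
        self.states[from_nn] = HAState.STANDBY  # stop serving clients
        self.states[to_nn] = HAState.ACTIVE     # take over client requests
        assert len(self.active()) == 1          # invariant: exactly one active


pair = HAPair()
print(pair.active())  # ['nn1']
pair.failover("nn1", "nn2")
print(pair.active())  # ['nn2']
```

Demoting before promoting keeps the "one active NameNode" invariant true at every step of the switch.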

Architecture

In a typical HA cluster, each NameNode runs on a separate server. At any given time exactly one NameNode is active and the other is in standby. The active NameNode handles all client operations, while the standby acts as a follower: it maintains an up-to-date copy of the namespace state so that it can take over at any time.

To keep their state synchronized, the two NameNodes communicate through a group of separate daemons called JournalNodes (JNs). Whenever the active NameNode modifies the namespace, it durably records the change in the edit log on a majority of the JNs. The standby NameNode reads these edits from the JNs: it continuously watches the edit log for changes and applies them to its own namespace. In the event of a failover, the standby makes sure it has read all edits from the JNs before it becomes active, so its namespace is fully synchronized at the moment of the switch, as shown in Figure 3.

Figure 3
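The quorum rule described above can be illustrated with a small in-memory model: the active side treats an edit as committed only when a majority of JournalNodes acknowledge it, and the standby side replays edits in transaction-ID order. This is a simplified Python sketch under those assumptions; the class names are hypothetical, and it ignores networking, log segments, and recovery.

```python
class JournalNode:
    """In-memory stand-in for one JournalNode daemon (hypothetical model)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = []  # (txid, operation) pairs this JN has acknowledged

    def journal(self, txid, op):
        if not self.up:
            return False  # an unreachable JN cannot acknowledge the write
        self.edits.append((txid, op))
        return True


class ActiveWriter:
    """Active NameNode side: an edit is committed once a majority of JNs ack it."""

    def __init__(self, journal_nodes):
        self.jns = journal_nodes
        self.next_txid = 1

    def log_edit(self, op):
        acks = sum(jn.journal(self.next_txid, op) for jn in self.jns)
        if acks <= len(self.jns) // 2:
            # The sketch simply stops here; it does not attempt any recovery.
            raise RuntimeError("edit not acknowledged by a majority of JournalNodes")
        self.next_txid += 1


class StandbyReader:
    """Standby NameNode side: tails edits from the JNs and replays them in order."""

    def __init__(self, journal_nodes):
        self.jns = journal_nodes
        self.last_applied = 0
        self.namespace = []  # operations applied so far, in txid order

    def catch_up(self):
        # Gather every not-yet-applied edit held by any reachable JN.
        pending = {}
        for jn in self.jns:
            if jn.up:
                for txid, op in jn.edits:
                    if txid > self.last_applied:
                        pending[txid] = op
        # Apply edits strictly in txid order and stop at the first gap.
        txid = self.last_applied + 1
        while txid in pending:
            self.namespace.append(pending[txid])
            self.last_applied = txid
            txid += 1


jns = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", up=False)]
active, standby = ActiveWriter(jns), StandbyReader(jns)
active.log_edit("mkdir /data")
active.log_edit("create /data/a.txt")
standby.catch_up()
print(standby.namespace)  # ['mkdir /data', 'create /data/a.txt']
```

With one of three JournalNodes down, each edit still reaches a majority (2 of 3), so the standby catches up fully; with two of three down, `log_edit` raises instead.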

For failover to be fast, the standby NameNode must also have up-to-date information about the location of every block in the cluster. To achieve this, every DataNode is configured with the addresses of both NameNodes and sends block location reports and heartbeats to both of them.
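The dual reporting described above can be sketched as follows. The classes, block IDs, and DataNode names are hypothetical, and real block reports replace a DataNode's previous report rather than being merged in; the point is only that both NameNodes receive identical information.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeView:
    """What one NameNode knows about DataNodes and blocks (hypothetical model)."""
    name: str
    block_locations: dict = field(default_factory=dict)  # block id -> set of DataNodes
    last_heartbeat: dict = field(default_factory=dict)   # DataNode -> timestamp

    def receive_heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def receive_block_report(self, datanode, block_ids):
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)


def report_to_all(datanode, blocks, namenodes, now):
    """A DataNode sends the same heartbeat and block report to every configured NameNode."""
    for nn in namenodes:
        nn.receive_heartbeat(datanode, now)
        nn.receive_block_report(datanode, blocks)


active, standby = NameNodeView("nn1"), NameNodeView("nn2")
report_to_all("dn1", ["blk_1", "blk_2"], [active, standby], now=100.0)
report_to_all("dn2", ["blk_2"], [active, standby], now=101.0)
# The standby already holds the same block map as the active NameNode.
print(standby.block_locations == active.block_locations)  # True
```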

In an HA cluster, it is critical that only one NameNode is active at a time. Otherwise, the namespace state of the two NameNodes would diverge, which could lose data or produce incorrect results. To guarantee this, the JNs allow only one NameNode at a time to write to them.
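The Quorum Journal Manager enforces this single-writer rule with epoch numbers: a NameNode that wants to write must first get a majority of JNs to accept a new, higher epoch, and the JNs then reject writes carrying an older epoch. The Python sketch below is a simplified model of that idea, not Hadoop's implementation; all names are hypothetical.

```python
class FencedJournalNode:
    """Simplified JournalNode that rejects writers with a stale epoch."""

    def __init__(self):
        self.promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        # Promise to reject every writer with a smaller epoch from now on.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch, op):
        if epoch < self.promised_epoch:
            return False  # stale writer: fenced off
        self.edits.append(op)
        return True


def become_writer(jns, epoch):
    """A NameNode may write only after a majority of JNs accept its epoch."""
    accepted = sum(jn.new_epoch(epoch) for jn in jns)
    return accepted > len(jns) // 2


def write(jns, epoch, op):
    acks = sum(jn.journal(epoch, op) for jn in jns)
    return acks > len(jns) // 2


jns = [FencedJournalNode() for _ in range(3)]
assert become_writer(jns, epoch=1)  # nn1 becomes the writer
print(write(jns, 1, "mkdir /a"))    # True
assert become_writer(jns, epoch=2)  # nn2 takes over with a higher epoch
print(write(jns, 1, "delete /a"))   # False: nn1 is fenced off
print(write(jns, 2, "mkdir /b"))    # True
```

Because a new writer needs a majority and any two majorities overlap, the old writer can never again collect a majority of acknowledgements once the new epoch is accepted.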

Hardware resources

To deploy an HA cluster, you should prepare the following:

* NameNode servers: the machines that run the active and standby NameNodes should have equivalent hardware.

* JournalNode servers: the JournalNode daemon is lightweight, so it can run on servers that also host other Hadoop daemons. Note: there must be at least 3 JournalNodes, because every edit log change has to be written to a majority of them. You can run more, but the count should be odd (3, 5, 7, 9, and so on), because an even count tolerates no more failures than the odd count just below it. With N JournalNodes (N at least 3), the system can tolerate at most (N - 1) / 2 failed nodes, rounded down, and keep working normally.
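The (N - 1) / 2 rule follows from the majority requirement: each edit needs floor(N / 2) + 1 acknowledgements, and the remaining nodes are the ones allowed to fail. The short Python sketch below tabulates both numbers and shows why an even count adds no tolerance.

```python
def majority(n):
    """Smallest number of JournalNodes that must acknowledge each edit."""
    return n // 2 + 1


def tolerated_failures(n):
    """JournalNode failures a group of n nodes survives: floor((n - 1) / 2)."""
    if n < 3:
        raise ValueError("run at least 3 JournalNodes")
    return (n - 1) // 2


for n in range(3, 8):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
# 7 4 3
```

Going from 3 to 4 nodes raises the required majority from 2 to 3 but still tolerates only one failure, which is why odd counts are recommended.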

In an HA cluster, the standby NameNode also performs checkpoints of the namespace, so there is no need to run a Secondary NameNode, CheckpointNode, or BackupNode. In fact, configuring one anyway results in an error.
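A checkpoint saves the namespace built from the edits replayed so far as an image, so that a restart does not have to replay the whole edit log. The Python sketch below models when the standby might decide to checkpoint. The two thresholds correspond in meaning to the `dfs.namenode.checkpoint.period` (seconds) and `dfs.namenode.checkpoint.txns` settings, but the class, its simplified trigger logic, and the small example values are assumptions for illustration.

```python
import copy


class StandbyCheckpointer:
    """Sketch of checkpointing on the standby NameNode (simplified, hypothetical)."""

    def __init__(self, period_s=3600, max_txns=1_000_000):
        self.period_s = period_s            # like dfs.namenode.checkpoint.period
        self.max_txns = max_txns            # like dfs.namenode.checkpoint.txns
        self.namespace = {}                 # path -> metadata, built by replaying edits
        self.last_applied_txid = 0
        self.last_checkpoint_txid = 0
        self.last_checkpoint_time = 0.0
        self.images = []                    # saved (txid, namespace snapshot) pairs

    def apply_edit(self, txid, path, meta):
        self.namespace[path] = meta
        self.last_applied_txid = txid

    def maybe_checkpoint(self, now):
        pending = self.last_applied_txid - self.last_checkpoint_txid
        due = now - self.last_checkpoint_time >= self.period_s
        if pending == 0 or (not due and pending < self.max_txns):
            return False
        # Snapshot the namespace as of last_applied_txid; edits up to this
        # txid no longer need to be replayed when loading from this image.
        self.images.append((self.last_applied_txid, copy.deepcopy(self.namespace)))
        self.last_checkpoint_txid = self.last_applied_txid
        self.last_checkpoint_time = now
        return True


cp = StandbyCheckpointer(period_s=60, max_txns=3)
cp.apply_edit(1, "/a", {"type": "dir"})
print(cp.maybe_checkpoint(now=10))  # False: too few edits and too soon
for txid, path in [(2, "/b"), (3, "/c")]:
    cp.apply_edit(txid, path, {"type": "dir"})
print(cp.maybe_checkpoint(now=20))  # True: 3 uncheckpointed edits reached max_txns
print(cp.images[-1][0])             # 3
```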


© 2024 shulou.com SLNews company. All rights reserved.
