What is Hadoop's Secondary NameNode?

2025-04-19 Update From: SLTechnology News&Howtos

This article explains what Hadoop's Secondary NameNode is. Many people who work with Hadoop day to day are unsure what it actually does, so the sections below walk through its role step by step.

From its name, the Secondary NameNode gives the impression of being a backup of the NameNode. It is not. So what role does the Secondary NameNode play in HDFS?

Both names contain "NameNode", so the two are clearly related. To see how, let's first look at what the NameNode does.

NameNode

The NameNode mainly stores HDFS metadata, such as namespace information and block information. While the NameNode is running, this information is held in memory. It is also persisted to disk in two files:

fsimage: a snapshot of the entire file system taken at NameNode startup.

edits: the sequence of changes made to the file system after NameNode startup.

The edits file is merged into the fsimage file only when the NameNode restarts, which produces an up-to-date snapshot of the file system. However, the NameNode is rarely restarted in a production cluster, so when it runs for a long time the edits file becomes very large. This causes the following problems:

The edits file can grow very large, which makes it hard to manage.

A NameNode restart can take a long time, because many changes in the edits file have to be merged into the fsimage file.

If the NameNode fails, many changes can be lost, because the fsimage file is badly out of date.

The Secondary NameNode helps solve these problems. Its responsibility is to merge the NameNode's edits into the fsimage.
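The relationship between fsimage and edits can be illustrated with a small sketch. This is a toy model, not HDFS code: the snapshot is a plain dictionary, the log is a list, and the operation names ("create", "delete", "rename") are hypothetical stand-ins chosen for illustration.

```python
# Toy model: the "fsimage" is a dict snapshot of the namespace and
# "edits" is an ordered list of changes made since that snapshot.
# The operation names below are hypothetical, not HDFS opcodes.

def apply_edits(fsimage, edits):
    """Return a new snapshot with every logged change replayed in order."""
    namespace = dict(fsimage)  # copy so the old snapshot is left untouched
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            namespace[value] = namespace.pop(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/a.txt": ["blk_1"]}
edits = [
    ("create", "/b.txt", ["blk_2", "blk_3"]),
    ("rename", "/a.txt", "/c.txt"),
    ("delete", "/b.txt", None),
]

merged = apply_edits(fsimage, edits)
print(merged)  # {'/c.txt': ['blk_1']}
```

The replay loop visits every logged change, so the longer the edits list, the longer the merge takes. That is the restart cost described above.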

Secondary NameNode

Writes to the HDFS file system are not applied directly to the fsimage file. They are recorded in the edits file, and the Secondary NameNode is responsible for merging the two.

The checkpoint process is as follows:

1. The Secondary NameNode asks the NameNode to stop using the edits file and to log new writes temporarily to a new file, such as edits.new.

2. The Secondary NameNode fetches the fsimage and edits files from the NameNode (using HTTP GET).

3. The Secondary NameNode loads the fsimage file into memory, applies the operations in the edits file one by one, and creates a new fsimage file.

4. The Secondary NameNode sends the new fsimage file back to the NameNode (using HTTP POST).

5. The NameNode replaces the old fsimage file with the one received from the Secondary NameNode, and replaces the old edits file with the edits.new file created in step 1 (that is, it renames edits.new). It also updates the fstime file to record when the checkpoint was performed.
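The checkpoint steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, the HTTP transfers are replaced by plain method calls, and fstime is modelled as a simple counter rather than a timestamp.

```python
# Toy in-memory model of the checkpoint steps; class and method names are
# hypothetical and the HTTP transfers are replaced by plain method calls.

def replay(fsimage, edits):
    """Apply (path, value) edits to a copy of the snapshot; None deletes."""
    namespace = dict(fsimage)
    for path, value in edits:
        if value is None:
            namespace.pop(path, None)
        else:
            namespace[path] = value
    return namespace


class ToyNameNode:
    def __init__(self):
        self.fsimage = {}       # last persisted snapshot
        self.edits = []         # changes since that snapshot
        self.edits_new = None   # exists only while a checkpoint is running
        self.fstime = 0         # completed checkpoints (stand-in for a timestamp)

    def log(self, path, value):
        # New writes go to edits.new while a checkpoint is in progress.
        target = self.edits if self.edits_new is None else self.edits_new
        target.append((path, value))

    def roll_edits(self):                 # step 1
        self.edits_new = []

    def get_image_and_edits(self):        # step 2 (HTTP GET in the article)
        return dict(self.fsimage), list(self.edits)

    def put_image(self, new_fsimage):     # steps 4 and 5
        self.fsimage = new_fsimage
        self.edits, self.edits_new = self.edits_new, None
        self.fstime += 1


def checkpoint(namenode):
    """Play the Secondary NameNode's part of the protocol."""
    namenode.roll_edits()
    fsimage, edits = namenode.get_image_and_edits()
    new_fsimage = replay(fsimage, edits)  # step 3
    namenode.put_image(new_fsimage)


nn = ToyNameNode()
nn.log("/a.txt", "blk_1")
nn.log("/b.txt", "blk_2")
checkpoint(nn)
nn.log("/a.txt", None)  # logged after the checkpoint, so not yet in fsimage
print(nn.fsimage)  # {'/a.txt': 'blk_1', '/b.txt': 'blk_2'}
print(nn.edits)    # [('/a.txt', None)]
```

After the checkpoint the edits list holds only the changes logged since the roll, which is why the log stays small when checkpoints run regularly.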

Note: Starting with Hadoop 0.21.0, the Secondary NameNode has been deprecated and replaced by the Checkpoint node, which provides the same functionality. That release also introduces a new kind of node called the Backup node.

The whole purpose of the Secondary NameNode is to perform checkpoints in HDFS. It is just a helper node for the NameNode.

We can now see that the Secondary NameNode checkpoints the file system so that the NameNode can work better. It does not replace the NameNode, nor is it a backup of the NameNode.

When the Secondary NameNode starts a checkpoint is controlled by two configuration parameters:

fs.checkpoint.period specifies the maximum time interval between two consecutive checkpoints. The default is 1 hour.

fs.checkpoint.size defines the size of the edits log file that forces a checkpoint even if the maximum time interval has not yet elapsed. The default is 64 MB.
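How the two parameters combine can be expressed as a short decision function. This is a sketch of the rule as described above, not Hadoop's implementation: the function name is hypothetical, and treating each limit as "reached or exceeded" is an assumption.

```python
# Defaults as stated in the article: 1 hour and 64 MB.
DEFAULT_PERIOD_SECONDS = 3600
DEFAULT_SIZE_BYTES = 64 * 1024 * 1024


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=DEFAULT_PERIOD_SECONDS,
                      size=DEFAULT_SIZE_BYTES):
    """True when either the time limit or the edits size limit is reached.

    Using >= for both comparisons is an assumption made for this sketch.
    """
    return seconds_since_last >= period or edits_size_bytes >= size


MB = 1024 * 1024
print(should_checkpoint(600, 10 * MB))   # False: neither limit reached
print(should_checkpoint(3600, 10 * MB))  # True: the period has elapsed
print(should_checkpoint(600, 80 * MB))   # True: the edits log is too large
```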

When are changes written to the NameNode's edits log?

The write is triggered by a write operation on a DataNode. When a file is written to a DataNode, the DataNode communicates with the NameNode and reports which blocks of the file it holds. The NameNode then writes this metadata to the edits log.
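The flow just described can be pictured with a minimal sketch. It models only the article's description, not HDFS internals: the class name, the method name, and the "add_block" record are hypothetical, and appending to the log before updating memory is an assumption made for illustration.

```python
class LoggingNameNode:
    """Toy metadata holder that records each reported block in a log."""

    def __init__(self):
        self.namespace = {}   # in-memory metadata: path -> list of block ids
        self.edits = []       # stand-in for the on-disk edits log

    def block_written(self, path, block_id):
        """Called when a DataNode reports that it stored a block of `path`."""
        # Assumption: record the change in the log first, then update memory.
        self.edits.append(("add_block", path, block_id))
        self.namespace.setdefault(path, []).append(block_id)


nn = LoggingNameNode()
nn.block_written("/logs/app.log", "blk_1")
nn.block_written("/logs/app.log", "blk_2")
print(nn.namespace)   # {'/logs/app.log': ['blk_1', 'blk_2']}
print(len(nn.edits))  # 2
```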

That concludes this look at Hadoop's Secondary NameNode. Theory sticks best when paired with practice, so go and try it on a cluster of your own.
