This article mainly introduces the role of the SecondaryNameNode in Hadoop. It should serve as a useful reference for interested readers; I hope you learn a lot from it. Now let the editor walk you through it.
Role
1. The main function of the SecondaryNameNode is to periodically merge the NameNode's namespace image file (fsimage) with the edit log (edits), preventing the log file from growing too large.
2. It also acts as an image backup for the NameNode: if the NameNode goes down, data can be recovered from the SecondaryNameNode, but some data will be lost, namely the edits written between the last time the SecondaryNameNode read the image and edit log and the moment of the failure (a recovery sketch follows).
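As a rough illustration of that recovery path, here is a sketch that assumes the Hadoop 2.x property and command names (on 1.x the keys are fs.checkpoint.dir and dfs.name.dir, and the command is hadoop namenode -importCheckpoint); the steps and paths are placeholders, not part of the original article.

# Hedged recovery sketch: rebuild a lost NameNode from the SecondaryNameNode's checkpoint.
# 1. Copy the latest checkpoint from the SecondaryNameNode host into the directory
#    named by dfs.namenode.checkpoint.dir on the (new) NameNode host.
# 2. Make sure the directory named by dfs.namenode.name.dir is empty.
# 3. Start the NameNode so that it imports the checkpoint as its initial fsimage:
hdfs namenode -importCheckpoint

Everything written to the edit log after that checkpoint is gone, which is exactly the data-loss window described above.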
Working principle
The NameNode stores HDFS metadata, such as namespace information and block information. For efficiency, the NameNode loads this information into memory at startup; it also persists it to disk, which typically produces two kinds of files: the namespace image file (fsimage) and the edit log file (edits). The layout of the NameNode's metadata directory is outlined below.
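As a sketch only, assuming the classic Hadoop 1.x layout where dfs.name.dir points at the metadata directory (newer releases add transaction IDs to the file names), the directory looks roughly like this:

${dfs.name.dir}/current/
    VERSION    (layout version, namespaceID and similar identifiers)
    fsimage    (the namespace image, i.e. the last checkpoint of the metadata)
    edits      (the log of changes made since that checkpoint)
    fstime     (the time at which the last checkpoint was taken)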
When the NameNode starts, it reads the fsimage file and merges in the edits file. In practice, however, the NameNode is rarely restarted in a cluster, so the edits file keeps growing, which leads to various problems: the edits file becomes hard to manage, restarts become slow, and a corrupted edits file means losing a large amount of data.
Therefore, Hadoop uses the SecondaryNameNode to merge the fsimage and edits files, which reduces the NameNode's workload and improves the reliability of the Hadoop cluster.
The workflow of SecondaryNameNode is as follows:
1. The SecondaryNameNode tells the NameNode to roll its edit log: the NameNode creates a new log file, and all subsequent edits are written to that new file.
2. The SecondaryNameNode fetches the fsimage file and the old log file from the NameNode over HTTP GET.
3. The SecondaryNameNode loads the fsimage file into memory, replays the operations in the log file, and generates a new fsimage file.
4. The SecondaryNameNode sends the new fsimage file back to the NameNode over HTTP POST.
5. The NameNode replaces the old fsimage file with the new one and the old log file with the new log file created in step 1, then updates the fstime file to record the time of the checkpoint.
In this way, the fsimage file on the NameNode always holds the metadata as of the latest checkpoint, and the edit log starts over from scratch instead of growing without bound.
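If you want to watch this workflow happen instead of waiting for the timer, the checkpoint can be forced by hand. The sketch below uses the Hadoop 2.x command form (on 1.x it is hadoop secondarynamenode -checkpoint force); if a SecondaryNameNode daemon is already running on that host, you may need to stop it first, since the command starts its own instance.

# Force a checkpoint right now, regardless of the configured period or edits size.
hdfs secondarynamenode -checkpoint force

Afterwards the NameNode's metadata directory should contain a fresh fsimage and a newly started, much smaller edits file.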
The SecondaryNameNode's checkpoint process is triggered and controlled by two configuration parameters:
fs.checkpoint.period (dfs.namenode.checkpoint.period in newer versions) specifies the maximum time interval between two consecutive checkpoints. The default value is 1 hour (3600 seconds).
fs.checkpoint.size defines the maximum size of the edits log file; once it is exceeded, a checkpoint is forced even if the checkpoint period has not yet elapsed. The default value is 64 MB. (Newer versions replace this with dfs.namenode.checkpoint.txns, which counts uncheckpointed transactions rather than bytes; its default is 1,000,000.)
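For reference, these are ordinary Hadoop configuration keys. The snippet below is only a sketch of hdfs-site.xml using the Hadoop 2.x key names and their default values; on Hadoop 1.x the corresponding keys fs.checkpoint.period and fs.checkpoint.size live in core-site.xml.

<!-- sketch: checkpoint tuning in hdfs-site.xml (Hadoop 2.x names, default values) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>      <!-- seconds between checkpoints -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>   <!-- force a checkpoint after this many uncheckpointed transactions -->
</property>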
Running the SecondaryNameNode on a machine other than the NameNode
By default, the SecondaryNameNode process runs on the same machine as the NameNode. If that machine fails and goes down, recovering the HDFS file system becomes a disaster. A better approach is to configure the SecondaryNameNode process to run on another machine (a configuration sketch follows the two points below). There are two main reasons for running the SNN process on a non-NameNode machine:
Scalability: creating a new HDFS checkpoint requires loading all of the NameNode's metadata into memory, so the SecondaryNameNode needs roughly as much memory as the NameNode. Because the memory given to the NameNode process effectively limits the size of the HDFS file system, on a very large file system the NameNode machine's memory may already be fully occupied by the NameNode process.
Fault tolerance: when the SNN creates a checkpoint, it produces an additional copy of the metadata. Keeping that copy on a different machine also adds fault tolerance to the distributed file system.
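A minimal configuration sketch, assuming Hadoop 2.x key names and placeholder hostnames (on Hadoop 1.x the usual approach is instead to list the SecondaryNameNode host in the conf/masters file and point dfs.http.address at the NameNode):

<!-- sketch: hdfs-site.xml entries for running the SNN on a separate host -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>snn-host.example.com:50090</value>   <!-- where the SecondaryNameNode listens -->
</property>
<property>
  <name>dfs.namenode.http-address</name>
  <value>namenode-host.example.com:50070</value>   <!-- the SNN fetches fsimage and edits from here -->
</property>

With that in place, the daemon can be started on snn-host with hadoop-daemon.sh start secondarynamenode.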
Thank you for reading this article carefully. I hope this introduction to the use of the SecondaryNameNode in Hadoop has been helpful to you.