This article explains the role of the Hadoop Secondary NameNode. The topic is quite practical, so it is shared here as a reference.
Judging from the name alone, beginners easily assume that the SecondaryNameNode (snn) is a hot-standby process for the NameNode (nn). It is not. The snn is part of the HDFS architecture, but its real purpose is often misunderstood because of its name. Its actual job is to keep a backup of the HDFS metadata held by the namenode and to reduce the time it takes to restart the namenode. Some configuration work is still required to use the snn correctly. Hadoop's default configuration runs the snn process on the namenode machine, but if that machine fails, recovering the HDFS file system becomes far more difficult; it is better to configure the snn process to run on a separate machine.
In Hadoop, the namenode is responsible for persisting the HDFS metadata and for handling clients' interactions with HDFS. To keep these interactions fast, the HDFS metadata is loaded into the memory of the namenode machine, and the in-memory data is also saved to disk for persistence. To prevent this persistence from becoming a bottleneck, Hadoop does not persist a snapshot of the current file system on every change; instead, the most recent HDFS operations are appended to a file on the namenode called the EditLog. When the namenode restarts, it loads the fsImage and then replays the HDFS operations recorded in the EditLog to restore the state HDFS was in before the restart.
The SecondaryNameNode, for its part, periodically merges the HDFS operations recorded in the EditLog into a checkpoint and then empties the EditLog. On restart, the namenode loads the latest checkpoint and replays the operations in the EditLog; since the EditLog only records the operations since the last checkpoint, it stays relatively small. Without the snn's periodic merging, every namenode restart would take a long time. The periodic merge keeps restart times short and also helps preserve the integrity of the HDFS metadata.
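As a rough illustration of the pieces described above, on a Hadoop 0.20/1.x-style cluster the namenode's metadata directory and the snn's checkpoint directory look roughly like the sketch below (the paths are the ones used later in this article; the exact file names can vary between versions):

# On the namenode machine: dfs.name.dir (here /data/work/hdfs/name)
ls /data/work/hdfs/name/current
# fsimage   - the persisted metadata snapshot loaded at startup
# edits     - the EditLog of operations since the last checkpoint
# fstime  VERSION

# On the secondarynamenode machine: fs.checkpoint.dir (here /data/work/hdfs/namesecondary)
ls /data/work/hdfs/namesecondary/current
# fsimage  edits  fstime  VERSION   - the merged checkpoint kept by the snn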
That is what the SecondaryNameNode does. The snn therefore does not share the namenode's load for interactive HDFS operations. However, when the namenode machine goes down or the namenode process fails, the metadata can be copied manually from the snn to restore the HDFS file system.
As for why the snn process should run on a machine other than the namenode, there are two main considerations:
Scalability: creating a new HDFS snapshot requires loading all the metadata into memory, which needs roughly as much memory as the namenode itself uses. Because the memory allocated to the namenode process effectively limits the size of the HDFS file system, on a very large file system the namenode machine's memory may already be fully occupied by the namenode process.
Fault tolerance: when the snn creates a checkpoint, it keeps several copies of the metadata. Running this operation on another machine also adds a degree of fault tolerance to the distributed file system.
Configure SecondaryNameNode to run on another machine
An HDFS instance is started by running the $HADOOP_HOME/bin/start-dfs.sh (or start-all.sh) script on the namenode machine. The script starts the namenode process on the machine where it is run and starts a DataNode process on each slave machine; the list of slave machines is kept in the conf/slaves file, one machine per line. It also starts an snn process on the machine(s) specified by the conf/masters file. Note that the machines listed in conf/masters are not where the jobtracker or namenode processes run; those processes run on whatever machine launches bin/start-dfs.sh or bin/start-mapred.sh (or start-all.sh). The file name masters is therefore quite confusing; secondaries would be a more appropriate name. The configuration is then done through the following steps:
1. Write all the machines on which the secondarynamenode process should run into the masters file, one per line.
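For example, assuming the snn is to run on the host slave-001 (the host name used in the recovery example later in this article), the masters file would simply contain:

# conf/masters - one secondarynamenode host per line
slave-001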
2. On the machine(s) listed in the masters file, modify the conf/hadoop-site.xml file by adding the following option:
<property>
<name>dfs.http.address</name>
<value>namenode.hadoop-host.com:50070</value>
</property>
In core-site.xml there are two further parameters that can be configured, although they are usually left at their defaults. fs.checkpoint.period controls how often a checkpoint (mirror) of HDFS is recorded; the default is 1 hour. fs.checkpoint.size controls how large the edit log may grow before a checkpoint is triggered regardless of the period; the default is 64 MB.
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/data/work/hdfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>
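If waiting for fs.checkpoint.period to elapse is inconvenient while testing the configuration, Hadoop 0.20/1.x also allows a checkpoint to be triggered by hand from the secondarynamenode machine (this is only a testing aid, and it may conflict with an snn daemon that is already running):

hadoop secondarynamenode -checkpoint force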
3. Configuration check. After the configuration is complete, verify that it works by inspecting the file layout on the machine running the secondarynamenode. First run jps and check that a SecondaryNameNode process is present. If it is, check whether a checkpoint has been written to the corresponding directory, which normally lives under ${hadoop.tmp.dir}/dfs/namesecondary/.
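A minimal check, assuming the fs.checkpoint.dir configured above, might look like this:

# on the secondarynamenode machine
jps | grep SecondaryNameNode
# after at least one checkpoint interval has passed:
ls /data/work/hdfs/namesecondary/current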
IV. Recovery
Simulate a namenode failure
1) Kill the namenode process
[root@master name]# jps
11749 NameNode
12339 Jps
11905 JobTracker
[root@master name]# kill 11749
2) Delete the folder pointed to by dfs.name.dir, in this case /data/work/hdfs/name
[root@master name]# rm -rf *
This deletes everything under the name directory, but make sure the name directory itself still exists.
3) Remotely copy the namesecondary directory from the secondarynamenode machine into the namenode's namesecondary location
[root@master hdfs]# scp -r slave-001:/data/work/hdfs/namesecondary/ ./
4) Start the namenode with the checkpoint import option
[root@master /data]# hadoop namenode -importCheckpoint
After a normal startup, a large amount of log output is printed to the screen, and the namenode can then be accessed normally.
5) Check
Use the hadoop fsck command to check the integrity of the file blocks:
hadoop fsck /
6) Stop the namenode with Ctrl+C or by ending the session
7) Delete the files in the namesecondary directory (to keep it clean)
[root@master namesecondary]# rm -rf *
8) Start the namenode normally
[root@master bin]# ./hadoop-daemon.sh start namenode
The recovery is now complete. Check the HDFS data.
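To check the recovered data, commands along these lines (available in Hadoop 0.20/1.x) can be used; the exact listing obviously depends on what the cluster contained:

hadoop dfsadmin -report   # configured capacity and the datanodes that have registered
hadoop fs -ls /           # the top-level directories should be visible again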
9) balancer
When start-balancer.sh is used with its defaults, data is moved at 1 MB/s (1048576 bytes per second), which is very slow.
Modify the hdfs-site.xml configuration; here 20 MB/s is used:
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>20971520</value>
<description>Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in terms of the number of bytes per second.</description>
</property>
The result, however, was that job execution became unstable: some maps ran unexpectedly long and some reduces took longer (with the whole cluster under load plus the 20 MB/s balancing traffic). Taobao is said to use 10 MB/s; the value needs to be tuned based on observed behaviour.
hadoop balancer -threshold 5
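Equivalently, the balancer can be started in the background through the helper script, which passes its arguments on to the balancer:

$HADOOP_HOME/bin/start-balancer.sh -threshold 5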
V. Summary
1. Multiple secondarynamenodes can be configured; simply add more lines to the masters file.
2. Remember that to recover the data it must be copied to the namenode machine manually; this is not automatic (see the recovery procedure above).
3. The checkpoint interval can be changed. If an hourly backup is not frequent enough, shorten it by lowering the fs.checkpoint.period value in core-site.xml.
This concludes the discussion of the role of the Hadoop Secondary NameNode. Hopefully the above is helpful.