What is the function of Secondary NameNode

2025-01-20 Update From: SLTechnology News&Howtos

This article explains the function of Secondary NameNode in Hadoop. It is fairly detailed and should be a useful reference for anyone interested in the topic.

Recently, a friend asked me whether Secondary NameNode is a backup of NameNode, meant to guard against NameNode being a single point of failure. When you first come into contact with Hadoop, it is indeed easy to regard Secondary NameNode as a backup node. This is a misunderstanding: the name cannot be taken literally, and the official documentation makes clear that this is not the case. Let's go over what Secondary NameNode actually does.

In Hadoop, some components are poorly named, and Secondary NameNode is a typical example. Judging from its name, it sounds like a backup node for NameNode, but it is not. Many Hadoop beginners wonder what role Secondary NameNode plays in HDFS. Let me explain.

As its name suggests, it does have something to do with NameNode, so before we delve into Secondary NameNode, let's take a look at what NameNode does.

2.1 NameNode

NameNode is mainly used to store HDFS metadata, such as namespace information and block information. While it runs, this information is kept in memory, but it is also persisted to disk, as shown in the following figure:

The figure above shows how NameNode saves metadata to disk. Two different files are involved:

fsimage: a snapshot of the entire file system taken when NameNode starts.

edits: the sequence of changes made to the file system after NameNode starts.
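The relationship between the two files can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the snapshot is a plain dictionary, and the operation names ("mkdir", "create", "delete") are made up for illustration and are not HDFS's real edit log opcodes.

```python
# Toy model: fsimage is a snapshot of the namespace,
# edits is an ordered log of the changes made after that snapshot.
# Operation names are illustrative, not real HDFS opcodes.

def apply_edits(fsimage, edits):
    """Replay edits in order on a copy of the snapshot; return the new namespace."""
    namespace = dict(fsimage)  # never modify the original snapshot
    for op, path in edits:
        if op == "mkdir":
            namespace[path] = "dir"
        elif op == "create":
            namespace[path] = "file"
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/": "dir", "/tmp": "dir"}
edits = [("mkdir", "/data"), ("create", "/data/a.txt"), ("delete", "/tmp")]

current = apply_edits(fsimage, edits)
print(sorted(current))  # ['/', '/data', '/data/a.txt']
```

The current state of the file system is always the snapshot plus every logged change replayed in order, which is why a long edits file makes recovery slow.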

On its own, NameNode merges the edits into the fsimage file only when it restarts, producing an up-to-date snapshot of the file system. However, NameNode is rarely restarted in a production cluster, which means that after NameNode has been running for a long time, the edits file becomes very large. In this case, the following problems arise:

The edits file becomes very large and is hard to manage.

Restarting NameNode takes a long time, because many changes have to be merged into the fsimage file.

If NameNode goes down and the edits file cannot be recovered, we lose a lot of changes, because the fsimage file is relatively old.

To overcome these problems, we need an easy-to-manage mechanism that keeps the edits file small and the fsimage file up to date, which also reduces the pressure on NameNode. Secondary NameNode was introduced to solve exactly these problems: its responsibility is to merge the edits of NameNode into the fsimage file, as shown in the figure:

The working principle shown in the figure above is as follows:

First, it periodically fetches the edits from NameNode and merges them into fsimage.

Once it has a new fsimage file, it copies it back to NameNode.

NameNode uses this new fsimage file the next time it restarts, which shortens the restart time.
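The three steps above can be sketched as a small simulation. This is a minimal illustration, not Hadoop's implementation: the class ToyNameNode, the functions merge and checkpoint, and the "add"/"remove" operations are all hypothetical names, and the network transfer between the two nodes is reduced to plain function calls.

```python
class ToyNameNode:
    """Hypothetical stand-in for NameNode: a snapshot plus a log of changes."""

    def __init__(self):
        self.fsimage = {}  # last persisted snapshot: path -> type
        self.edits = []    # changes logged since that snapshot

    def log(self, op, path):
        self.edits.append((op, path))

    def roll_edits(self):
        """Hand over the accumulated edits and start a fresh, empty log."""
        rolled, self.edits = self.edits, []
        return rolled

    def install_fsimage(self, new_image):
        self.fsimage = new_image


def merge(fsimage, edits):
    """Apply the edits in order to a copy of the snapshot."""
    image = dict(fsimage)
    for op, path in edits:
        if op == "add":
            image[path] = "file"
        elif op == "remove":
            image.pop(path, None)
    return image


def checkpoint(namenode):
    """One checkpoint round, as the helper node would perform it."""
    image = dict(namenode.fsimage)       # step 1: fetch the current fsimage
    edits = namenode.roll_edits()        #         and the edits
    new_image = merge(image, edits)      #         then merge them
    namenode.install_fsimage(new_image)  # step 2: copy the result back
    return new_image


nn = ToyNameNode()
nn.log("add", "/a")
nn.log("add", "/b")
nn.log("remove", "/a")

checkpoint(nn)
print(nn.fsimage)  # {'/b': 'file'}
print(nn.edits)    # []
```

After a checkpoint the edits log is empty again and the snapshot is current, so a later restart has very little left to replay.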

The whole purpose of Secondary NameNode is to provide checkpoints in HDFS. The official documentation makes clear that it is only a helper node for NameNode, which is why the community also refers to it as Checkpoint Node.

Now we understand that what Secondary NameNode does is create checkpoints of the file system to help NameNode work better; it is neither a replacement for NameNode nor a backup of it.

The start of the checkpoint process on Secondary NameNode is controlled by two configuration parameters. The names below are the ones used by older Hadoop releases; the official documentation quoted at the end of this article uses the newer dfs.namenode.checkpoint.* names.

fs.checkpoint.period specifies the maximum time interval between two consecutive checkpoints. The default value is 1 hour.

fs.checkpoint.size defines the size of the edits log file at which a checkpoint is forced, even if the maximum interval between checkpoints has not been reached. The default value is 64 MB.
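How the two parameters combine can be expressed as a simple predicate. This is a sketch of the rule as described above, not code from Hadoop: the function name should_checkpoint and its arguments are hypothetical, and whether the real comparison is "greater than" or "greater than or equal" is an assumption.

```python
# Defaults taken from the text above: 1 hour and 64 MB.
DEFAULT_PERIOD_SECONDS = 60 * 60
DEFAULT_SIZE_BYTES = 64 * 1024 * 1024


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=DEFAULT_PERIOD_SECONDS,
                      size=DEFAULT_SIZE_BYTES):
    """Return True when either threshold has been reached.

    Assumption: thresholds are inclusive (>=); the real boundary may differ.
    """
    return seconds_since_last >= period or edits_size_bytes >= size


MB = 1024 * 1024
print(should_checkpoint(600, 10 * MB))  # False: neither limit reached
print(should_checkpoint(600, 80 * MB))  # True: edits log is too large
print(should_checkpoint(3600, 0))       # True: the period has elapsed
```

The size limit acts as a safety valve: on a busy cluster the edits log can grow quickly, so waiting a full period could leave far too much to merge.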

If all other copies of the image and edits files on NameNode are lost and only the latest checkpoint remains, NameNode can import this latest checkpoint. The following steps achieve this:

Create an empty directory at the location specified by the configuration parameter dfs.name.dir.

Set the configuration parameter fs.checkpoint.dir to the location of the checkpoint directory.

Start NameNode with the -importCheckpoint option.

NameNode reads the checkpoint from the fs.checkpoint.dir directory and saves it in the dfs.name.dir directory. If the dfs.name.dir directory already contains a valid image file, NameNode will fail to start. NameNode checks the consistency of the image file in the fs.checkpoint.dir directory, but does not modify it.
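The rules of the import (refuse if an image already exists, copy otherwise, never touch the checkpoint) can be mimicked with ordinary files. This is purely an illustration and not how Hadoop is implemented: the function import_checkpoint is hypothetical, and a single file named fsimage in a flat directory is an assumed layout that is much simpler than the real storage directories.

```python
import shutil
import tempfile
from pathlib import Path

IMAGE_NAME = "fsimage"  # assumed flat layout; real directories are more structured


def import_checkpoint(name_dir: Path, checkpoint_dir: Path) -> None:
    """Copy the checkpoint image into name_dir, mimicking the rules above."""
    if (name_dir / IMAGE_NAME).exists():
        # Mirrors "NameNode will fail to start" when an image is already present.
        raise RuntimeError(f"{name_dir} already contains an image; refusing to import")
    source = checkpoint_dir / IMAGE_NAME
    if not source.is_file():
        raise FileNotFoundError(f"no checkpoint image in {checkpoint_dir}")
    name_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, name_dir / IMAGE_NAME)  # copy only; the checkpoint stays untouched


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    checkpoint_dir = root / "checkpoint"  # plays the role of fs.checkpoint.dir
    name_dir = root / "name"              # plays the role of dfs.name.dir
    checkpoint_dir.mkdir()
    name_dir.mkdir()                      # step 1: an empty directory
    (checkpoint_dir / IMAGE_NAME).write_text("snapshot-v42")

    import_checkpoint(name_dir, checkpoint_dir)
    imported = (name_dir / IMAGE_NAME).read_text()
    still_there = (checkpoint_dir / IMAGE_NAME).read_text()

    try:
        import_checkpoint(name_dir, checkpoint_dir)  # an image now exists
        second_attempt = "imported"
    except RuntimeError:
        second_attempt = "refused"

print(imported, still_there, second_attempt)
```

Refusing to overwrite an existing image is the important safety property: the import is meant only for the case where the NameNode's own metadata is gone.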

Note: when does NameNode write changes to the edit log? Whenever an operation modifies the file system metadata. For example, when a client writes a file to HDFS, the data goes to DataNodes, and the DataNodes tell NameNode which blocks of the file they hold; NameNode writes the resulting metadata changes to the edit log at that point.

The official documentation is attached below:

The NameNode stores modifications to the file system as a log appended to a native file system file, edits. When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. Since NameNode merges fsimage and edits files only during start up, the edits log file could get very large over time on a busy cluster. Another side effect of a larger edits file is that next restart of NameNode takes longer.

The secondary NameNode merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode.

The start of the checkpoint process on the secondary NameNode is controlled by two configuration parameters.

* dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
* dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.

The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode's directory. So that the check pointed image is always ready to be read by the primary NameNode if necessary.
