How fsimage and edits Are Merged in HDFS

This article explains how HDFS merges the fsimage and edits files, covering both Hadoop 1.x and Hadoop 2.x.
While the NameNode is running, every update to HDFS is appended to the edits log, so over time the edits file grows very large. This has no effect while the NameNode is running, but on restart the NameNode first loads the entire fsimage into memory and then replays the records in edits one by one. When the edits file is very large, startup becomes very slow, and during that time HDFS sits in safe mode, which is clearly not what users want. Can the edits file be kept small while the NameNode is running? It can. This article first explains how Hadoop 1.x merges the edits and fsimage files; Hadoop 2.x handles the merge differently, as described later.
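Both files live on the NameNode's local disk, under its metadata directory. A minimal hdfs-site.xml sketch of where they are kept (the local path is an assumed example; in Hadoop 1.x the equivalent property is dfs.name.dir):

<property>
  <name>dfs.namenode.name.dir</name>
  <!-- Assumed example path; the fsimage and edits files sit under its current/ subdirectory -->
  <value>/data/hadoop/name</value>
</property>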
Hadoop 1.x
Anyone who has used Hadoop has probably noticed the SecondaryNameNode process, and from its name it is easy to assume it is a hot backup of the NameNode. In fact, it is not. The SecondaryNameNode is an integral part of the HDFS architecture whose job is to keep a backup of the NameNode's HDFS metadata and to reduce the time a NameNode restart takes. It usually runs on a separate machine. So how does the SecondaryNameNode shorten the NameNode's restart time? Let's look at how it works:
(1) The SecondaryNameNode periodically asks the NameNode to stop appending to the edits file and to temporarily write new operations to a fresh file, edits.new. This switch is instantaneous, and the layers above the log notice no difference.
(2) The SecondaryNameNode fetches the fsimage and edits files from the NameNode via HTTP GET and saves them to a local directory.
(3) The SecondaryNameNode loads the downloaded fsimage into memory and replays each update operation in the edits file so that the in-memory fsimage is up to date; this replay is the merge of edits and fsimage.
(4) After step (3), the SecondaryNameNode sends the new fsimage file back to the NameNode via HTTP POST.
(5) The NameNode replaces its old fsimage with the new one received from the SecondaryNameNode and replaces edits with edits.new. Through this cycle, edits becomes small again!
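How often this checkpoint cycle runs in Hadoop 1.x is controlled by two core-site.xml properties; a minimal sketch with the stock 1.x defaults (checkpoint every hour, or sooner once edits reaches 64 MB):

<property>
  <name>fs.checkpoint.period</name>
  <!-- Seconds between checkpoints (default 3600, i.e. one hour) -->
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <!-- Force an early checkpoint once the edits log reaches this many bytes (default 64 MB) -->
  <value>67108864</value>
</property>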
As the description above shows, the SecondaryNameNode is not a hot backup of the NameNode at all; it only merges fsimage and edits. The fsimage it holds is never fully up to date, because by the time it downloads the fsimage and edits files from the NameNode, new update operations are already being written to edits.new, and those updates are never synchronized to the SecondaryNameNode. Of course, if the fsimage on the NameNode is ever corrupted, you can replace it with the fsimage from the SecondaryNameNode; although it is not the latest copy, it minimizes the loss.
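That recovery path relies on the SecondaryNameNode's local checkpoint directory. A sketch of the Hadoop 1.x core-site.xml property that sets it (the path is an assumed example); with it in place, starting the NameNode with hadoop namenode -importCheckpoint loads the checkpoint back in:

<property>
  <name>fs.checkpoint.dir</name>
  <!-- Assumed example path; the SecondaryNameNode stores downloaded images and merged checkpoints here -->
  <value>/data/hadoop/namesecondary</value>
</property>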
Hadoop 2.x
Hadoop 2.x solves the NameNode single point of failure, and the SecondaryNameNode is no longer used. In Hadoop 1.x, the SecondaryNameNode merged fsimage and edits to keep the edits file small and thereby shorten NameNode restarts; since Hadoop 2.x drops it, how is the merge done now? First, Hadoop 2.x provides an HA mechanism (to remove the NameNode single point of failure), which you enable by configuring an odd number of JournalNodes; how to configure it is beyond the scope of this article. The HA mechanism runs two NameNodes (an active NN and a standby NN) in the same cluster. At any moment only one is in the Active state and the other is in Standby. The Active NN serves all client operations in the cluster, while the Standby NN simply maintains enough state to provide fast failover when needed.
To keep the Standby NN's state (that is, the metadata) synchronized with the Active NN, both of them communicate with the JournalNode daemons. Whenever the Active NN makes any change to the namespace, it must persist the change to a majority of the JournalNodes (as edit log records). The Standby NN watches the edit log for changes, reads new edits from the JNs, and applies them to its own namespace. If the Active NN fails, the Standby NN makes sure it has read all remaining edits from the JNs before switching to the Active state; by reading all the edits, it guarantees that its namespace is fully synchronized with the Active NN's before the failover completes. In an HA deployment it is therefore the Standby NN that periodically merges fsimage and edits into a new checkpoint and ships the resulting fsimage to the Active NN, taking over the merge role that the SecondaryNameNode played in Hadoop 1.x.
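For context, a minimal hdfs-site.xml sketch of the shared edits directory backed by a quorum of three JournalNodes (the nameservice ID mycluster and the hosts jn1, jn2, jn3 are assumed examples; 8485 is the default JournalNode RPC port):

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <!-- Assumed example quorum: an edit is committed once a majority (2 of 3) of JNs have persisted it -->
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <!-- Assumed example path where each JournalNode keeps its copy of the edits -->
  <value>/data/hadoop/journal</value>
</property>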
Configuration Properties
Configure the following properties in hdfs-site.xml:
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
  <description>The SecondaryNameNode and CheckpointNode will poll the NameNode every 'dfs.namenode.checkpoint.check.period' seconds to query the number of uncheckpointed transactions.</description>
</property>
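A checkpoint can also be triggered by transaction count rather than elapsed time; a sketch of the dfs.namenode.checkpoint.txns property with its stock default:

<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <!-- Checkpoint once this many uncheckpointed transactions accumulate, even if the period has not elapsed -->
  <value>1000000</value>
</property>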