HDFS: edit log & fsimage

2025-03-31 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

The ${dfs.namenode.name.dir}/current directory on the NameNode holds several kinds of files, notably fsimage files, edit log segments prefixed with edits_, and seen_txid.

Database systems use a log to record write operations and rely on that log for backup, recovery, and similar tasks. Examples include the transaction logs of relational databases and the WALs (write-ahead logs) of HBase.

HDFS uses a similar mechanism. In HDFS, every file write operation is treated as a transaction, for example creating a file, appending to a file, or moving a file. From this point of view, the fsimage files play the role of the database's data files for HDFS metadata, while the edit log plays the role of the operation log.
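The snapshot-plus-log idea can be illustrated with a minimal sketch. This is a toy model, not HDFS code: the class MetadataStore, its method names, and the JSON record layout are all hypothetical and chosen only for illustration.

```python
import json


class MetadataStore:
    """Toy namespace: a snapshot (like fsimage) plus an append-only log (like the edit log)."""

    def __init__(self):
        self.snapshot = {}   # last checkpointed state: path -> metadata
        self.log = []        # serialized transactions recorded since the snapshot
        self.next_txid = 1

    def write(self, op, path, **meta):
        # Record the transaction in the log only; the snapshot is not rewritten.
        txn = {"txid": self.next_txid, "op": op, "path": path, "meta": meta}
        self.log.append(json.dumps(txn))
        self.next_txid += 1
        return txn["txid"]

    def current_state(self):
        # Current state = snapshot + replay of every logged transaction, in order.
        state = dict(self.snapshot)
        for line in self.log:
            txn = json.loads(line)
            if txn["op"] == "create":
                state[txn["path"]] = txn["meta"]
            elif txn["op"] == "delete":
                state.pop(txn["path"], None)
        return state


store = MetadataStore()
store.write("create", "/a", replication=3)
store.write("create", "/b", replication=2)
store.write("delete", "/a")
print(store.current_state())
```

Each write costs one small append, while the snapshot stays untouched until something merges the log into it.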

Fsimage:

Each fsimage file contains the inodes of all directories and files in the entire file system. Each inode is the internal HDFS representation of the metadata of one file or directory. If an inode represents a file, it includes the replication level, modification time, access time, access permissions, block size, the blocks that make up the file, and other information. If an inode represents a directory, it includes the modification time, permissions, and other related metadata.
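As a rough sketch, the two kinds of inode can be modeled as plain records. The class and field names below are illustrative assumptions, not the actual HDFS classes or the on-disk fsimage layout; the children field in particular is only one guess at what "other related metadata" might cover.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileInode:
    """Metadata the text lists for a file (names are hypothetical)."""
    name: str
    replication: int          # replication level
    modification_time: int    # milliseconds since the epoch (assumed unit)
    access_time: int
    permission: str           # e.g. "rw-r--r--"
    block_size: int           # bytes
    blocks: List[int] = field(default_factory=list)  # IDs of the file's blocks


@dataclass
class DirectoryInode:
    """Metadata the text lists for a directory (names are hypothetical)."""
    name: str
    modification_time: int
    permission: str
    children: List[str] = field(default_factory=list)  # assumption for illustration


f = FileInode("data.csv", replication=3, modification_time=0, access_time=0,
              permission="rw-r--r--", block_size=128 * 1024 * 1024, blocks=[1001, 1002])
d = DirectoryInode("logs", modification_time=0, permission="rwxr-xr-x",
                   children=["data.csv"])
print(f)
print(d)
```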

Edit log:

Logically, the edit log is a single entity (it can be thought of as one object), but physically it is made up of multiple files. Each file is called a segment; its name is prefixed with edits_ and followed by transaction IDs.

For example, a segment named edits_0000000000011403382-0000000000011403408 holds the transactions whose IDs fall between 0000000000011403382 and 0000000000011403408. When a client performs a write operation, the NameNode records the transaction in the edit log before it returns a success code to the client.
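The naming scheme can be decoded with a few lines of code. This is a small sketch: the function names are hypothetical, and treating the range as inclusive at both ends is an assumption made here for the transaction count.

```python
import re

# Matches finalized segment names of the form edits_<first txid>-<last txid>.
SEGMENT_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def parse_segment(name):
    """Return (first_txid, last_txid) encoded in a segment file name."""
    match = SEGMENT_RE.match(name)
    if match is None:
        raise ValueError(f"not a finalized edit log segment: {name!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        raise ValueError(f"transaction range is reversed in {name!r}")
    return first, last


def transaction_count(name):
    """Number of transactions in the segment, assuming an inclusive range."""
    first, last = parse_segment(name)
    return last - first + 1


name = "edits_0000000000011403382-0000000000011403408"
print(parse_segment(name))
print(transaction_count(name))
```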

Why not write the transaction information directly to fsimage after processing?

If the fsimage file were updated at the end of every write operation, each write would be slow: in a large file system the fsimage file can grow very large, possibly to gigabytes.

If writes go to the edit log first, how do they get merged into fsimage?

The solution is to run a Secondary NameNode. Its job is to build, in its own memory, a merged fsimage file for the Primary NameNode. The process goes like this:

1. Secondary tells Primary to roll its in-progress edits file, so that new write operations go to a new edits file. Primary also updates the seen_txid file.

2. Secondary obtains the latest fsimage and edits files from Primary through HTTP GET.

3. Secondary loads fsimage into memory, then reads each transaction from the edits files and applies it, producing a new merged fsimage file.

4. Secondary sends the newly merged fsimage file to Primary via HTTP PUT, and Primary saves it as a temporary .ckpt file.

5. Primary renames the temporary file, making the new fsimage available.
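The five steps above can be sketched as an in-memory simulation. Everything here is a simplification under stated assumptions: the Primary class, its method names, and the tuple format of a transaction are hypothetical; method calls stand in for the HTTP GET and PUT; and the value kept in seen_txid is only a guess at what the real file records.

```python
class Primary:
    """Toy Primary NameNode state (hypothetical names, no real I/O)."""

    def __init__(self):
        self.fsimage = {}        # last checkpointed namespace: path -> metadata
        self.fsimage_txid = 0    # last transaction already merged into fsimage
        self.finalized = []      # closed edit segments, each a list of transactions
        self.in_progress = []    # segment currently receiving writes
        self.next_txid = 1
        self.seen_txid = 1       # assumption: first txid of the current segment
        self.ckpt = None         # temporary checkpoint, like the .ckpt file

    def write(self, op, path):
        self.in_progress.append((self.next_txid, op, path))
        self.next_txid += 1

    def roll_edits(self):
        # Step 1: close the in-progress segment and start a new one.
        if self.in_progress:
            self.finalized.append(self.in_progress)
        self.in_progress = []
        self.seen_txid = self.next_txid

    def get_image_and_edits(self):
        # Step 2: stands in for the HTTP GET of fsimage and edits files.
        return dict(self.fsimage), [list(seg) for seg in self.finalized]

    def put_image(self, image, last_txid):
        # Step 4: stands in for the HTTP PUT; kept as a temporary checkpoint.
        self.ckpt = (image, last_txid)

    def finalize_checkpoint(self):
        # Step 5: "rename" the temporary checkpoint so it becomes the fsimage.
        self.fsimage, self.fsimage_txid = self.ckpt
        self.ckpt = None
        # Simplification: forget segments that are fully covered by the new fsimage.
        self.finalized = [s for s in self.finalized if s[-1][0] > self.fsimage_txid]


def apply_txn(image, txn):
    _, op, path = txn
    if op == "create":
        image[path] = {}
    elif op == "delete":
        image.pop(path, None)


def checkpoint(primary):
    """What the Secondary does, driving steps 1 to 5 against the Primary."""
    primary.roll_edits()                               # step 1
    image, segments = primary.get_image_and_edits()    # step 2
    last = primary.fsimage_txid
    for segment in segments:                           # step 3: merge in memory
        for txn in segment:
            if txn[0] > last:
                apply_txn(image, txn)
                last = txn[0]
    primary.put_image(image, last)                     # step 4
    primary.finalize_checkpoint()                      # step 5


p = Primary()
p.write("create", "/a")
p.write("create", "/b")
p.write("delete", "/a")
checkpoint(p)
print(p.fsimage, p.fsimage_txid)
```

Note that the merge in step 3 works on a full copy of the namespace, so the Secondary holds as much metadata in memory as the Primary does.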

This process also explains why Secondary needs a memory configuration similar to Primary's, since it loads the whole fsimage into memory, and why it should be deployed on a separate machine.

Why doesn't the NameNode do the merge itself, instead of leaving it to the Secondary NameNode?

It's not that the NameNode never merges; it just does so only at startup.

First, all writes are processed by the NameNode, so the NameNode's memory always holds the same content that an up-to-date fsimage would contain. During run time there is therefore no need to merge in order to stay consistent with memory.

Second, the NameNode performs the merge only at startup.

These two reasons are why the merge is handed to the Secondary NameNode. Otherwise, after the NameNode has been running for a long time, a large edit log accumulates while fsimage still reflects the state merged at the last startup, and the next restart of the NameNode would require a long merge.
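The restart cost can be made concrete with a short sketch that counts how many transactions must be replayed on top of a given fsimage. The function names and the representation of segments as inclusive (first_txid, last_txid) pairs are assumptions for illustration, and the segment ranges in the example are made up.

```python
def segments_to_replay(image_txid, segments):
    """Segments that contain at least one transaction newer than the fsimage."""
    return [seg for seg in sorted(segments) if seg[1] > image_txid]


def replay_backlog(image_txid, segments):
    """Number of transactions that must be applied on top of the fsimage."""
    total = 0
    for first, last in segments_to_replay(image_txid, segments):
        # Skip transactions already covered by the fsimage.
        total += last - max(first, image_txid + 1) + 1
    return total


segments = [(1, 100), (101, 250), (251, 400)]  # made-up ranges

# With a recent checkpoint, only the newest segment is replayed at startup.
print(replay_backlog(250, segments))

# Without any checkpoint since startup, every transaction is replayed.
print(replay_backlog(0, segments))
```

Regular checkpoints keep the first number small no matter how long the NameNode has been running.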
