NameNode and DataNode in HDFS

2025-07-09 Update From: SLTechnology News&Howtos

An HDFS cluster runs in master-slave mode and has two main types of nodes: a single NameNode (the master) and multiple DataNodes (the workers). The NameNode manages the file system namespace: it maintains the file system tree and the metadata for every file and directory in that tree.

[Figure: HDFS architecture diagram]

NameNode:

The NameNode manages the file system namespace. It maintains the file system tree and the metadata for all files and directories in the tree. This information is held in two files, the namespace image and the edit log (the operation log). Both are cached in RAM and also stored persistently on the local disk. The NameNode also knows, for each file, which DataNodes hold each of its blocks, but it does not persist these block locations, because they are rebuilt from the DataNodes' reports every time the system starts.
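The split between persisted namespace metadata and in-memory block locations can be illustrated with a small toy model. This is only a sketch and not Hadoop code: the class `ToyNameNode`, the file names `fsimage.json` and `edits.log`, and the JSON format are all assumptions made for the example.

```python
import json
import os
import tempfile


class ToyNameNode:
    """Toy model: the namespace is persisted, block locations live only in RAM."""

    def __init__(self, meta_dir):
        # Hypothetical file names chosen for this sketch.
        self.image_path = os.path.join(meta_dir, "fsimage.json")
        self.edits_path = os.path.join(meta_dir, "edits.log")
        self.namespace = {}        # path -> list of block ids (persisted)
        self.block_locations = {}  # block id -> set of DataNode ids (RAM only)
        self._load()

    def _load(self):
        # Start from the last saved image, then replay the edit log on top.
        if os.path.exists(self.image_path):
            with open(self.image_path) as f:
                self.namespace = json.load(f)
        if os.path.exists(self.edits_path):
            with open(self.edits_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, edit):
        if edit["op"] == "create":
            self.namespace[edit["path"]] = edit["blocks"]
        elif edit["op"] == "delete":
            self.namespace.pop(edit["path"], None)

    def _log(self, edit):
        # Persist the edit first, then update the in-memory namespace.
        with open(self.edits_path, "a") as f:
            f.write(json.dumps(edit) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(edit)

    def create(self, path, blocks):
        self._log({"op": "create", "path": path, "blocks": blocks})

    def delete(self, path):
        self._log({"op": "delete", "path": path})

    def block_report(self, datanode_id, block_ids):
        # Locations are never written to disk; DataNodes report them again.
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)


def demo():
    with tempfile.TemporaryDirectory() as meta_dir:
        nn = ToyNameNode(meta_dir)
        nn.create("/logs/a.txt", ["blk_1", "blk_2"])
        nn.block_report("dn1", ["blk_1", "blk_2"])

        restarted = ToyNameNode(meta_dir)  # simulate a restart
        namespace_after = dict(restarted.namespace)
        locations_before_report = dict(restarted.block_locations)
        restarted.block_report("dn1", ["blk_1", "blk_2"])
        return namespace_after, locations_before_report, restarted.block_locations
```

In `demo()`, the restarted instance recovers the namespace from disk but starts with an empty block-location map, which fills in again only once the DataNode reports its blocks.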

[Figure: abstract diagram of the NameNode structure]

The client accesses the file system on behalf of the user by talking to the NameNode and the DataNodes. It exposes a set of file system interfaces, so application code rarely needs to know anything about the NameNode or the DataNodes to get its work done.
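A rough way to picture how a client hides both node types is the toy read path below. It is a sketch under assumptions, not the Hadoop client API: `ToyClient`, `LookupNameNode`, `StorageNode`, and their methods are hypothetical names invented for the illustration.

```python
class StorageNode:
    """Stands in for a DataNode: holds block contents by block id."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


class LookupNameNode:
    """Stands in for a NameNode: knows file -> blocks and block -> nodes."""

    def __init__(self):
        self.files = {}      # path -> ordered list of block ids
        self.locations = {}  # block id -> list of node ids

    def get_block_locations(self, path):
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


class ToyClient:
    """Hides the NameNode and DataNode lookups behind a single read() call."""

    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = datanodes  # node id -> StorageNode

    def read(self, path):
        data = b""
        # Step 1: ask the NameNode which blocks make up the file and where they are.
        for block_id, node_ids in self.namenode.get_block_locations(path):
            # Step 2: fetch each block from the first node that actually has it.
            for node_id in node_ids:
                node = self.datanodes[node_id]
                if block_id in node.blocks:
                    data += node.read_block(block_id)
                    break
            else:
                raise OSError(f"no replica found for {block_id}")
        return data


def demo():
    dn1, dn2 = StorageNode("dn1"), StorageNode("dn2")
    dn1.blocks["blk_1"] = b"hello "
    dn2.blocks["blk_2"] = b"world"

    nn = LookupNameNode()
    nn.files["/greeting.txt"] = ["blk_1", "blk_2"]
    # dn2 is listed first for blk_1 but does not hold it, so the client falls back to dn1.
    nn.locations = {"blk_1": ["dn2", "dn1"], "blk_2": ["dn2"]}

    client = ToyClient(nn, {"dn1": dn1, "dn2": dn2})
    return client.read("/greeting.txt")
```

The caller of `ToyClient.read()` only supplies a path; which nodes are contacted is decided inside the client.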

DataNode:

DataNodes are the worker nodes of the file system. They store and retrieve blocks when asked to by clients or the NameNode, and they periodically send the NameNode a list of the blocks they are storing.
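The periodic block list can be sketched as follows. This is a simplified model rather than the real HDFS protocol: `ToyDataNode`, `ToyBlockMap`, and the choice to let each full report replace the previous view of that node are assumptions made for the example.

```python
class ToyDataNode:
    """Stores blocks and can list what it currently holds."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._blocks = {}  # block id -> bytes

    def write_block(self, block_id, data):
        self._blocks[block_id] = data

    def read_block(self, block_id):
        return self._blocks[block_id]

    def delete_block(self, block_id):
        self._blocks.pop(block_id, None)

    def block_report(self):
        # The full list of block ids this node stores right now.
        return sorted(self._blocks)


class ToyBlockMap:
    """NameNode-side view of which nodes hold which blocks."""

    def __init__(self):
        self._by_node = {}  # node id -> set of block ids

    def receive_report(self, node_id, block_ids):
        # A full report replaces whatever was previously known about the node.
        self._by_node[node_id] = set(block_ids)

    def locations(self, block_id):
        return sorted(n for n, blocks in self._by_node.items() if block_id in blocks)


def demo():
    dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
    dn1.write_block("blk_1", b"aaa")
    dn2.write_block("blk_1", b"aaa")
    dn2.write_block("blk_2", b"bbb")

    block_map = ToyBlockMap()
    for dn in (dn1, dn2):
        block_map.receive_report(dn.node_id, dn.block_report())
    first = block_map.locations("blk_1")

    # dn1 drops a block; the next report brings the NameNode's view up to date.
    dn1.delete_block("blk_1")
    block_map.receive_report(dn1.node_id, dn1.block_report())
    second = block_map.locations("blk_1")
    return first, second
```

Between two reports the NameNode-side map can be stale, which is one reason the reports are sent periodically.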

NameNode fault tolerance mechanisms:

HDFS cannot work without the NameNode. In fact, if the machine running the NameNode is lost, all the files in the file system are effectively lost, because there is no other way to reconstruct files from the blocks scattered across the DataNodes. The NameNode's fault tolerance is therefore very important, and Hadoop provides two mechanisms for it.

The first way is to back up the files that make up the persistent state of the file system metadata. Hadoop can be configured so that the NameNode writes its persistent state to multiple file systems. These writes are synchronous and atomic. A common configuration is to write the persistent state to the local disk and also to a remotely mounted network file system (NFS).
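In Hadoop 2 and later this is typically configured by giving the `dfs.namenode.name.dir` property a comma-separated list of directories. The general idea of a synchronous, atomic write to several locations can be sketched in a few lines. This is an illustration of the technique under assumptions, not how Hadoop implements it: `write_metadata` is a hypothetical helper, and the atomicity shown is per directory (write to a temporary file, then rename), not across all directories at once.

```python
import os
import tempfile


def write_metadata(directories, filename, payload):
    """Write payload to every directory; return only after every copy is on disk."""
    for directory in directories:
        final_path = os.path.join(directory, filename)
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=filename + ".")
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(payload)
                tmp.flush()
                os.fsync(tmp.fileno())  # synchronous: force the bytes to disk
            # Atomic within the directory: readers see the old file or the new one.
            os.replace(tmp_path, final_path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise


def demo():
    # Two temporary directories stand in for a local disk and an NFS mount.
    with tempfile.TemporaryDirectory() as local_dir, \
            tempfile.TemporaryDirectory() as nfs_dir:
        write_metadata([local_dir, nfs_dir], "fsimage", b"namespace v1")
        copies = []
        for directory in (local_dir, nfs_dir):
            with open(os.path.join(directory, "fsimage"), "rb") as f:
                copies.append(f.read())
        leftovers = sorted(os.listdir(local_dir))
        return copies, leftovers
```

Because the function returns only after every directory has been written, a caller that sees it return knows both copies exist, at the cost of waiting for the slowest location.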

The second way is to run a Secondary NameNode. Despite its name, the Secondary NameNode cannot act as a NameNode in real time. Its main job is to periodically merge the namespace image with the edit log so that the edit log does not grow too large. The Secondary NameNode usually runs on a separate physical machine, because the merge needs plenty of CPU and as much memory as the NameNode, and it keeps a copy of the merged namespace image that can be used if the NameNode fails. However, the Secondary NameNode always lags behind the NameNode, so if the NameNode fails completely, some data loss is almost certain. The usual course of action in that case is to take the NameNode metadata files from the remotely mounted network file system (NFS) described in the first way, copy them to the Secondary NameNode, and run it as the new NameNode.
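The merge step and the resulting lag can be sketched with plain dictionaries. This is a toy model, not the Secondary NameNode's actual code: the tuple-based edit format and the functions `apply_edit` and `checkpoint` are assumptions made for the example.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> block ids)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Merge an image with an edit log and return the new image."""
    merged = dict(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged


def demo():
    image = {"/a": ["blk_1"]}
    edit_log = [("create", "/b", ["blk_2"]), ("delete", "/a")]

    # Checkpoint: the merged image is kept as a copy, and the log starts over.
    secondary_copy = checkpoint(image, edit_log)
    edit_log = []

    # The NameNode keeps accepting changes after the checkpoint.
    edit_log.append(("create", "/c", ["blk_3"]))
    primary_state = checkpoint(secondary_copy, edit_log)
    return secondary_copy, primary_state
```

In `demo()`, the copy produced at checkpoint time does not contain `/c`. That gap is the lag described above, and it is why the more recent metadata on NFS is preferred when recovering.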
