2025-01-18 Update. From: SLTechnology News & Howtos > Development
Shulou (Shulou.com) 06/03 report
This article explains in detail, with sample analysis, the HDFS architecture principles of Hadoop. It is quite practical, so it is shared here for reference; I hope you gain something from reading it.
1. HDFS architecture
Several services related to HDFS appeared in article 3 of this series. Which services are started depends on the configuration files written during Hadoop setup. With the configuration from article 2 of this series, the following services are started: namenode, journalnode, datanode, and zkfc. Their relationship is shown below:
As the diagram shows, the namenode is the absolute central node; all other nodes interact with it. There are two namenodes, one active and one standby. The active namenode provides the namenode service normally, while the standby does not serve requests externally. Instead, the standby is responsible for synchronizing the active namenode's data in a timely manner and for converting to active, so that service continues, when the active namenode fails.
Below the namenodes are three datanodes. Datanodes are responsible for storing the cluster's data and for reporting back to the namenode how that data is stored.
On either side of the namenodes are two zkfc processes. zkfc is responsible for namenode failover: when the active namenode fails, zkfc converts the standby namenode to active. Connected above zkfc is ZooKeeper; zkfc relies on ZooKeeper to perform namenode failover.
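The failover decision described above can be sketched as a small simulation. This is an illustrative sketch only, not real ZKFC code: the dictionaries and field names here are hypothetical, chosen to show the promote-on-failure logic.

```python
# Illustrative sketch (not real ZKFC code): the failover decision zkfc makes.
# When the active namenode is unhealthy, a healthy standby is promoted to
# active so the cluster keeps serving requests.

def failover(namenodes):
    """Promote a healthy standby namenode if the active one is unhealthy."""
    active = next((n for n in namenodes if n["state"] == "active"), None)
    if active is not None and active["healthy"]:
        return namenodes            # active is fine; nothing to do
    if active is not None:
        active["state"] = "failed"  # mark the dead active
    for n in namenodes:
        if n["state"] == "standby" and n["healthy"]:
            n["state"] = "active"   # standby takes over
            break
    return namenodes

nodes = [
    {"name": "nn1", "state": "active", "healthy": False},
    {"name": "nn2", "state": "standby", "healthy": True},
]
failover(nodes)
print([(n["name"], n["state"]) for n in nodes])
# → [('nn1', 'failed'), ('nn2', 'active')]
```

The real zkfc reaches this decision through ZooKeeper leader election rather than a direct health-check loop, but the outcome is the same: exactly one healthy namenode ends up active.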
Above the namenodes is a cluster of three journalnodes. The journalnodes are responsible for storing the namenode's log files. The active namenode writes its logs to the journalnodes; the standby namenode does not write logs to them, but mainly reads log files from them.
Note that the log file here is not an ordinary run log but a namenode operation log. For example, when a client uploads a file to HDFS, the namenode performs a series of operations to complete the upload, and these operations, together with their content, are written to the operation log on the journalnodes. By replaying this log, the upload operation can be reproduced.
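The replay idea above can be sketched in a few lines. This is a toy simulation, not the real edit-log encoding: the operation tuples and a flat set of paths are hypothetical stand-ins for the namenode's namespace.

```python
# Illustrative sketch: how an operation log lets the namenode rebuild state.
# Each metadata-changing operation is appended to the log; replaying the log
# from the beginning reproduces the same namespace.
# (Hypothetical record format, not the real edit-log encoding.)

def apply_op(namespace, op):
    kind, path = op
    if kind == "create":
        namespace.add(path)
    elif kind == "delete":
        namespace.discard(path)
    return namespace

journal = [
    ("create", "/user/data/a.txt"),
    ("create", "/user/data/b.txt"),
    ("delete", "/user/data/a.txt"),
]

namespace = set()
for op in journal:          # replay every operation in order
    apply_op(namespace, op)
print(sorted(namespace))    # → ['/user/data/b.txt']
```

Because each operation is durable the moment it is logged, the namenode can crash at any point and still recover its exact metadata state by replaying the journal.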
2. namenode introduction
As the core of HDFS, the namenode's main role is to manage file metadata.
Metadata falls into three main categories: the file namespace, the file-to-block mapping, and the block storage locations.
The "block" in the file-to-block mapping exists because HDFS does not store an entire file on a single datanode; instead, it cuts the file into a number of blocks of a specified size and distributes them across the cluster.
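The splitting itself is simple fixed-size chunking, sketched below. HDFS's default block size is 128 MB; a tiny block size is used here purely so the example is easy to see.

```python
# Illustrative sketch: splitting file data into fixed-size blocks, as HDFS
# does before distributing them to datanodes. The real default block size
# is 128 MB; a tiny size is used here so the example stays readable.

def split_into_blocks(data: bytes, block_size: int):
    """Cut data into consecutive chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

data = b"x" * 300                 # a 300-byte "file"
blocks = split_into_blocks(data, block_size=128)
print([len(b) for b in blocks])   # → [128, 128, 44]
```

As the output shows, only the last block may be smaller than the block size; on disk HDFS likewise stores a short final block without padding it out.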
Because the namenode manages all HDFS metadata, every HDFS operation must interact with it. The namenode therefore cannot afford to be slow, so it keeps the metadata in memory. Memory alone is not safe, however, so the data must also be persisted to disk.
For persistence, the namenode uses a snapshot plus a log, with the log serving to speed things up. The log is the operation log mentioned above; the snapshot is the in-memory data state serialized directly to disk. A snapshot file named fsimage is created when the namenode is formatted during cluster installation. While running, the namenode writes the log to the folder containing the fsimage file; the exact write path varies with the configuration. With the configuration from article 2 of this series, the log is also written to the journalnodes.
Finally, a program reads the snapshot file and the log file, restores the data to its latest state, and then replaces the old snapshot file with an updated one. On the next restore, only the most recent snapshot and log files need to be read. Which program does this depends on the configuration; with the configuration from article 2 of this series, it is the standby namenode. Why not let the active namenode update the fsimage file directly, instead of having the standby read the active's log, replay the operations, and write the new fsimage? Because updating fsimage is a time-consuming operation: if the active namenode performed it, the entire cluster would be unavailable for the duration.
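The checkpoint the standby performs can be sketched as "load snapshot, replay log, write new snapshot". This is a toy model with hypothetical record formats, not the actual fsimage/edits file formats.

```python
# Illustrative sketch of the checkpoint the standby namenode performs:
# load the last snapshot (fsimage), replay the operation log on top of it,
# and write the result out as the new fsimage.
# (Hypothetical record formats, not the real on-disk encoding.)

def checkpoint(fsimage: set, edit_log: list) -> set:
    """Return a new snapshot: the old snapshot plus replayed log entries."""
    state = set(fsimage)          # start from the last persisted snapshot
    for kind, path in edit_log:   # replay every logged operation in order
        if kind == "create":
            state.add(path)
        elif kind == "delete":
            state.discard(path)
    return state                  # this becomes the new fsimage

old_fsimage = {"/a", "/b"}
edits = [("create", "/c"), ("delete", "/a")]
new_fsimage = checkpoint(old_fsimage, edits)
print(sorted(new_fsimage))   # → ['/b', '/c']
```

Once the new fsimage is written, the replayed portion of the edit log is no longer needed for recovery, which keeps both the log and future restart times from growing without bound. Doing this work on the standby keeps the active namenode free to serve requests.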
That concludes this article on "Hadoop HDFS architecture principle example analysis". I hope the content above is of some help and lets you learn more; if you found the article good, please share it so more people can see it.
© 2024 shulou.com SLNews company. All rights reserved.