
What is the HDFS architecture?

2025-07-04 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article explains the HDFS architecture in detail. The editor finds it very practical and shares it here as a reference, in the hope that you will get something out of it.

The Hadoop architecture as a whole relies on HDFS for its underlying distributed storage and on MapReduce (MR) for distributed parallel task processing.

HDFS adopts a master/slave structure: an HDFS cluster is made up of one NameNode and several DataNodes. (Configurations with multiple NameNodes are supported as of Hadoop 2.2; some large companies had previously achieved this by modifying the Hadoop source code.) The NameNode acts as the master server, managing the file system namespace and client access to files. The DataNodes manage the stored data. HDFS exposes data in the form of files.

Internally, a file is divided into several blocks, which are stored on a set of DataNodes. The NameNode performs the namespace operations of the file system, such as opening, closing, and renaming files or directories, and is also responsible for mapping data blocks to specific DataNodes. The DataNodes handle read and write requests from file system clients, and create, delete, and replicate data blocks under the unified scheduling of the NameNode. The NameNode is the manager of all HDFS metadata, and user data never passes through the NameNode.
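The block division and block-to-DataNode mapping described above can be sketched in a few lines of Python. This is a toy model, not the Hadoop API: the function names, the round-robin assignment, and the 128 MiB block size are assumptions chosen for illustration (the block size is configurable in a real cluster).

```python
import math

# Assumed block size for illustration (128 MiB); configurable in a real cluster.
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    count = math.ceil(file_size / block_size)
    return [
        (i * block_size, min(block_size, file_size - i * block_size))
        for i in range(count)
    ]


def build_block_map(path, file_size, datanodes, block_size=BLOCK_SIZE):
    """Toy NameNode metadata: block id -> DataNode name, assigned round-robin."""
    blocks = split_into_blocks(file_size, block_size)
    return {
        f"{path}#blk_{i}": datanodes[i % len(datanodes)]
        for i in range(len(blocks))
    }


# A 300 MiB file needs three blocks: 128 MiB, 128 MiB, and 44 MiB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))
print(build_block_map("/logs/a.txt", 300 * 1024 * 1024, ["dn1", "dn2"]))
```

Only the mapping lives in this metadata; the block contents themselves would be held by the DataNodes.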

Three roles are involved in the figure: NameNode, DataNode, and Client. The NameNode is the manager, the DataNodes are the file stores, and the Client is the application that needs to access files in the distributed file system.

File write:

1) Client sends a file write request to NameNode.

2) Based on the file size and the block configuration, NameNode returns to Client information about the DataNodes it manages.

3) Client divides the file into multiple blocks and, using the DataNode addresses it received, writes the blocks to the DataNodes in sequence.

File read:

1) Client sends a file read request to NameNode.

2) NameNode returns information about the DataNodes that store the file.

3) Client reads the file data from those DataNodes.
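The write and read steps above can be illustrated with a small in-memory simulation. It is only a sketch under stated assumptions: the classes `NameNode`, `DataNode`, and `Client` and their methods are hypothetical stand-ins, not Hadoop classes, the block size is tiny so the example stays readable, and replication is left out. Note that the toy NameNode only hands out and records block locations; the data itself goes directly between Client and DataNode.

```python
class DataNode:
    """Stores block contents keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Holds metadata only: which blocks make up a file and where each lives."""

    def __init__(self, datanodes, block_size):
        self.datanodes = datanodes
        self.block_size = block_size
        self.files = {}  # path -> list of (block_id, DataNode)

    def allocate(self, path, file_size):
        # Write step 2: choose a DataNode for each block (round-robin here).
        count = -(-file_size // self.block_size)  # ceiling division
        plan = [
            (f"{path}#blk_{i}", self.datanodes[i % len(self.datanodes)])
            for i in range(count)
        ]
        self.files[path] = plan
        return plan

    def locate(self, path):
        # Read step 2: return the block locations recorded for the file.
        return self.files[path]


class Client:
    def __init__(self, namenode):
        self.namenode = namenode

    def write(self, path, data):
        # Write steps 1-2: ask the NameNode where the blocks should go.
        plan = self.namenode.allocate(path, len(data))
        size = self.namenode.block_size
        # Write step 3: send each block straight to its DataNode.
        for i, (block_id, datanode) in enumerate(plan):
            datanode.blocks[block_id] = data[i * size:(i + 1) * size]

    def read(self, path):
        # Read steps 1-2: ask the NameNode where the blocks are.
        plan = self.namenode.locate(path)
        # Read step 3: fetch the blocks from the DataNodes and reassemble.
        return b"".join(datanode.blocks[block_id] for block_id, datanode in plan)


datanodes = [DataNode("dn1"), DataNode("dn2")]
client = Client(NameNode(datanodes, block_size=4))
client.write("/demo.txt", b"hello hdfs")
print(client.read("/demo.txt"))
```

With a block size of 4 bytes, the 10-byte payload is stored as three blocks spread across the two toy DataNodes and reassembled in order on read.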

As a distributed file system, HDFS offers a data management design worth learning from:

File block placement: by default a block has three replicas: one on the DataNode specified by the NameNode, one on a DataNode on a different machine from the specified DataNode, and one on another DataNode in the same rack as the specified DataNode. Replication protects the data; spreading the replicas guards against the failure of a single machine or rack, while keeping one replica in the same rack limits the performance cost of copying data between racks. The exact placement depends on the Hadoop version and the configured placement policy.
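A rack-aware placement of this kind can be sketched as follows. This is an illustrative model, not the HDFS placement policy itself: the function `place_replicas`, the `topology` dictionary, and the rule that the off-machine replica goes to another rack are all assumptions made for the example.

```python
def place_replicas(primary, topology):
    """Pick three DataNodes for one block.

    topology maps rack name -> list of DataNode names (hypothetical layout).
    Returns [specified node, node in another rack, other node in the same rack].
    """
    home_rack = next(rack for rack, nodes in topology.items() if primary in nodes)
    same_rack = [node for node in topology[home_rack] if node != primary]
    other_rack = [
        node
        for rack, nodes in topology.items()
        if rack != home_rack
        for node in nodes
    ]
    if not same_rack or not other_rack:
        raise ValueError("need a second node in the home rack and a node in another rack")
    # Deterministic choice (first candidate) keeps the sketch easy to follow;
    # this is an assumption, a real system would weigh load and free space.
    return [primary, other_rack[0], same_rack[0]]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas("dn1", topology))
```

With this layout, losing all of `rack1` still leaves the copy on `dn3`, while the copy on `dn2` can be made without crossing racks.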

