What is the architecture of HDFS

2026-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces the architecture of HDFS: the NameNode, the DataNodes, and the data block.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but it also differs from them in important ways. HDFS is designed to be highly fault tolerant, to be deployed on low-cost hardware, and to provide high-throughput access to application data, which makes it suitable for applications with large datasets.

HDFS uses a typical master/slave architecture. An HDFS cluster usually contains one NameNode and several DataNodes. A file is split into one or more data blocks, which are stored on a set of DataNodes; the DataNodes can be spread across different racks. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories, and it manages the mapping of data blocks to specific DataNodes. Under the NameNode's direction, the DataNodes serve read and write requests from file system clients and carry out block creation, deletion, and replication.
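The split between metadata and data described above can be illustrated with a small in-memory model. This is only a sketch and not HDFS code: the class `ToyNameNode`, the 128 MiB block size, the replication factor of 3, and the round-robin placement are all assumptions made for the example (real HDFS placement is rack-aware).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size for this sketch (128 MiB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte length of each block for a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


class ToyNameNode:
    """Hypothetical model: holds metadata only, never file contents."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = min(replication, len(self.datanodes))
        self.files = {}            # path -> list of block ids
        self.block_locations = {}  # block id -> list of DataNode names
        self._next_block_id = 0
        self._cursor = 0

    def create(self, path, file_size):
        """Register a file and assign each of its blocks to DataNodes."""
        block_ids = []
        for _ in split_into_blocks(file_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement: a simplification, not the real policy.
            targets = [
                self.datanodes[(self._cursor + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            self._cursor += 1
            self.block_locations[block_id] = targets
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids

    def rename(self, old_path, new_path):
        """A namespace operation: only metadata changes, no block moves."""
        self.files[new_path] = self.files.pop(old_path)

    def locate(self, path):
        """What a client asks for before reading from the DataNodes."""
        return [(b, self.block_locations[b]) for b in self.files[path]]
```

In this model a client would call `locate` to learn where each block lives and then read the blocks from the listed DataNodes directly, so file data never passes through the NameNode.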

NameNode and DataNode

HDFS has a master-slave architecture: the NameNode coordinates the cluster, and the DataNodes store the data blocks and carry out the work the NameNode assigns. The NameNode manages the file system namespace and maintains the directory tree of the entire file system together with the metadata of every file in it. This information is stored on the NameNode's local file system in two forms: the namespace image and the edit log. The NameNode also knows which DataNodes hold each block of each file, but it does not store this location information persistently; it rebuilds it from the DataNodes' reports every time the system starts. The client obtains metadata from the NameNode and then interacts with the DataNodes to access the file system.
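A rough sketch of that startup behaviour follows. It assumes a simplified image (a dictionary from path to block ids) and an edit log made of tuples; the function names and the record format are hypothetical and do not reflect the on-disk formats HDFS uses.

```python
def load_namespace(namespace_image, edit_log):
    """Rebuild the namespace by replaying logged operations over the image."""
    namespace = {path: list(blocks) for path, blocks in namespace_image.items()}
    for op, *args in edit_log:
        if op == "create":
            path, blocks = args
            namespace[path] = list(blocks)
        elif op == "rename":
            old_path, new_path = args
            namespace[new_path] = namespace.pop(old_path)
        elif op == "delete":
            (path,) = args
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


def build_block_map(block_reports):
    """Block locations are not persisted; derive them from DataNode reports."""
    locations = {}
    for datanode, block_ids in block_reports.items():
        for block_id in block_ids:
            locations.setdefault(block_id, []).append(datanode)
    return locations


# Example data, invented for illustration.
image = {"/data/a": [1, 2]}
log = [("create", "/data/b", [3]), ("rename", "/data/a", "/data/c")]
reports = {"dn1": [1, 3], "dn2": [1, 2], "dn3": [2, 3]}

namespace = load_namespace(image, log)
block_map = build_block_map(reports)
```

The point of the two functions is the asymmetry: the namespace comes from persistent state on the NameNode, while the block-to-DataNode map exists only in memory and is reconstructed from what the DataNodes report.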

There is a single NameNode: a master server that manages the file system namespace and regulates client access to files. In addition, there are many DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on.

A DataNode is a worker node of the file system. It stores file blocks and performs the tasks that clients and the NameNode request of it. Through the heartbeat mechanism, each DataNode periodically reports to the NameNode that it is working and sends the list of file blocks it stores.
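The heartbeat idea can be sketched as a small liveness tracker on the NameNode side. The class `HeartbeatMonitor` and the 30-second timeout are hypothetical values chosen for the example, and timestamps are passed in explicitly so that the behaviour is deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical tracker: records the last heartbeat time per DataNode."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # DataNode name -> time of last heartbeat

    def heartbeat(self, datanode, now):
        """Called whenever a DataNode checks in."""
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        """DataNodes heard from within the timeout window."""
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def dead_nodes(self, now):
        """DataNodes whose last heartbeat is older than the timeout."""
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative timeout
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=25)  # dn2 stays silent after t=0
```

With these calls, at `now=40` only `dn1` is still inside the timeout window, so a coordinator built on this sketch would stop directing clients to `dn2`.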

Data block

On a disk, a block is the smallest unit in which data is read or written. Files are stored on disk in blocks, and a file system manipulates data in integral multiples of the block size. Files in HDFS are likewise divided into blocks for storage. The size of a data block in HDFS affects the addressing overhead: the smaller the block, the greater the share of time spent on addressing. If the block is large enough, the time to transfer its data from disk is significantly longer than the time needed to locate the start of the block. The time to transfer a file made of multiple blocks then depends mainly on the disk transfer rate. The user must therefore choose the block size carefully.
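The trade-off can be made concrete with a little arithmetic. The sketch below assumes a seek time of 10 ms and a transfer rate of 100 MB/s; both figures are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def seek_overhead(block_size_bytes, seek_time_s=0.010, transfer_rate_bps=100e6):
    """Fraction of the time to read one block that is spent seeking.

    seek_time_s and transfer_rate_bps are assumed, illustrative values.
    """
    transfer_time_s = block_size_bytes / transfer_rate_bps
    return seek_time_s / (seek_time_s + transfer_time_s)


if __name__ == "__main__":
    for size_mb in (1, 4, 64, 128):
        share = seek_overhead(size_mb * 1e6)
        print(f"{size_mb:>4} MB block: {share:.2%} of read time spent seeking")
```

Under these assumptions a 1 MB block spends half of its read time seeking, while a 128 MB block spends under 1%, which is the reasoning behind choosing large blocks.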

As a distributed system, HDFS gains several advantages from using the block abstraction:

A file can be larger than any single disk in the cluster, because its blocks do not have to be stored on the same disk and the cluster can be scaled out.

Using the block rather than the whole file as the unit of storage simplifies the storage subsystem, and the fixed block size makes it easy to store metadata separately from the file data blocks.

Blocks are a convenient unit for replication, which supports backup and fault tolerance and improves system availability.
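To illustrate the fault tolerance point, the following sketch checks which blocks remain readable and which have lost replicas after some DataNodes fail. The function names, the block map, and the target of three replicas are assumptions made for this example.

```python
def readable_blocks(block_locations, dead_nodes):
    """Blocks that still have at least one replica on a live DataNode."""
    dead = set(dead_nodes)
    return sorted(
        block_id for block_id, nodes in block_locations.items()
        if any(node not in dead for node in nodes)
    )


def under_replicated(block_locations, dead_nodes, target_replication):
    """Return {block_id: number of replicas missing} for affected blocks."""
    dead = set(dead_nodes)
    missing = {}
    for block_id, nodes in block_locations.items():
        live_count = sum(1 for node in nodes if node not in dead)
        if live_count < target_replication:
            missing[block_id] = target_replication - live_count
    return missing


# Example block map, invented for illustration: three replicas per block.
locations = {
    1: ["dn1", "dn2", "dn3"],
    2: ["dn2", "dn3", "dn4"],
    3: ["dn3", "dn4", "dn1"],
}
```

In this example, losing `dn1` leaves every block readable but leaves blocks 1 and 3 one replica short, which is the condition that would prompt copying those blocks to another DataNode.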
