Shulou (Shulou.com) 06/01 Report --
In this article, the editor gives a detailed introduction to "Hadoop Distributed File System HDFS Architecture Analysis". The content is detailed, the steps are clear, and the details are handled properly. I hope this article helps you resolve your doubts.
The Hadoop Distributed File System (HDFS) is a distributed file system written in Java. It is fault-tolerant and scalable, and it can run on commodity or low-cost hardware. HDFS provides the distributed storage layer for Hadoop applications and exposes interfaces that let applications work close to their data.
HDFS architecture
The HDFS architecture is shown in the following figure:
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode and multiple DataNodes.
NameNode: the master server that manages the file system namespace and regulates client access to files, such as opening, closing, and renaming files and directories. It is responsible for the mapping between directories, files, and blocks, and between blocks and DataNodes; it maintains the directory tree and handles user requests. As shown in the following figure:
1. The file metadata is kept in a directory tree.
2. On disk, this metadata is saved as the fsimage and edits files.
3. The block information reported by the DataNodes is also kept and is read into memory when the system starts.
DataNode (data node): manages the storage attached to the node it runs on and serves read and write requests from file system clients. DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
Client: represents the user and accesses the file system by interacting with the NameNode and DataNodes. HDFS exposes a file system namespace and allows user data to be stored as files; the user communicates with HDFS through the client.
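As a small illustration of this interaction, here is a minimal client sketch built on the Java FileSystem API; the NameNode URI hdfs://namenode:8020 and the path /user/demo are placeholders to be replaced with real values:

```java
// A minimal client sketch, assuming the Hadoop client libraries are on the
// classpath and that the NameNode URI and path below are placeholders.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client only asks the NameNode for metadata; block data is later
        // streamed directly from the DataNodes.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // List the files and directories under /user/demo (hypothetical path).
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}
```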
**Block and replication:** We all know that the default disk block size in the Linux operating system is 512 bytes, while the default block size in Hadoop 2.x is 128 MB. So why are HDFS blocks designed to be so large? The purpose is to reduce the cost of seeks: as long as the block is large enough, the time spent transferring the data from disk is significantly longer than the time spent seeking to the start of the block.
So why store a file in blocks instead of as a whole file?
1. A file can be very large, even larger than a single disk; splitting it into blocks means files of any size can be stored.
2. It simplifies the design of the storage system: because blocks have a fixed size, it is much easier to calculate how much a disk can hold.
3. Blocks do not need to sit on one disk; they can be spread across the disks of the cluster, which helps replication, fault tolerance, and data-local computation (see the sketch after this list).
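To see this distribution in practice, the following sketch (again with a hypothetical NameNode URI and file path) asks the NameNode where each block of a file is stored; every block reports the DataNodes that hold a replica of it:

```java
// A sketch that queries block locations for one file, illustrating how a large
// file is split into fixed-size blocks spread over several DataNodes.
// The NameNode URI and the file path are placeholders.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/demo/big.log"));

        // One BlockLocation per block; each lists the DataNodes holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(), String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```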
Blocks and replicas are distributed across the HDFS cluster as shown in the following figure:
Since the NameNode manages the file system namespace and maintains the file system tree and all of the files and directories in it, this information is persisted on the local disk as two files: the namespace image file (fsimage) and the edit log file (edits). DataNodes are the workers of the file system: they store and retrieve blocks as requested and periodically report the list of blocks they hold back to the NameNode. This shows how important the NameNode is: once the NameNode goes down, the entire distributed file system becomes unusable, so fault tolerance for the NameNode is particularly important. Hadoop provides two mechanisms for this:
The first is persisting the metadata that makes up the file system, namely the namespace image file fsimage (the directory tree of the file system) and the edit log file edits (a record of changes made to the file system). The fsimage on disk is a checkpoint, a reference or synchronization point. Once a checkpoint exists, the NameNode works only on the directory image in memory, while also appending to the edits file on disk, until it shuts down. At the next startup, the NameNode loads the directory image fsimage from disk, which is really the old checkpoint (possibly the image saved after the previous startup); every change made to the file system between that startup and the shutdown is recorded in the edits file. Replaying the operations recorded in edits on top of the old image produces the new image, which is written back to disk as the new checkpoint (that is, the new fsimage).

This has a big drawback: if edits is very large, rebuilding the image after startup takes a very long time. The improvement is to create a checkpoint whenever edits grows past a certain size, or at fixed time intervals; but doing this on the NameNode itself adds a heavy load and hurts system performance. Hence the SecondaryNameNode, which acts as the NameNode's assistant and performs checkpoints on its behalf; its load is comparatively light. If the NameNode has a hot standby, the standby can also do the checkpointing part-time, so a dedicated SecondaryNameNode is not strictly needed. The architecture diagram is shown below:
A schematic of how the SecondaryNameNode works:
The SecondaryNameNode is mainly responsible for downloading the fsimage and edits files from the NameNode, merging them into a new fsimage file, and pushing it back to the NameNode. It works as follows:
1. The SecondaryNameNode asks the primary NameNode to stop using the current edits file, so new write operations are temporarily recorded in a new file.
2. The SecondaryNameNode fetches the fsimage and edits files from the primary NameNode (via HTTP GET).
3. The SecondaryNameNode loads fsimage into memory and applies the operations in edits one by one to create a new fsimage file.
4. The SecondaryNameNode sends the new fsimage file back to the primary NameNode (via HTTP POST).
5. The NameNode replaces the old fsimage file with the one received from the SecondaryNameNode, and replaces the old edits file with the new one created in step 1; it also updates the fstime file to record when the checkpoint took place.
6. In the end, the primary NameNode has an up-to-date fsimage file and a smaller edits file.
While the NameNode is in safe mode, an administrator can also create a checkpoint manually with the hadoop dfsadmin -saveNamespace command (a programmatic sketch of this follows below).
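For reference only, a minimal sketch of triggering that manual checkpoint through the Java API instead of the shell, assuming a Hadoop 2.x client, superuser privileges, and a placeholder NameNode URI:

```java
// A sketch of the manual checkpoint mentioned above. Assumes Hadoop 2.x client
// libraries, superuser privileges, and a placeholder NameNode URI.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class SaveNamespaceSketch {
    public static void main(String[] args) throws Exception {
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());

        // saveNamespace() is only allowed while the NameNode is in safe mode.
        dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
        dfs.saveNamespace();   // merge edits into a new fsimage checkpoint
        dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);

        dfs.close();
    }
}
```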
From this process it is clear why the SecondaryNameNode has memory requirements similar to the primary NameNode (it also loads the fsimage file into memory). Therefore, on a large cluster, the SecondaryNameNode needs to run on a dedicated machine.
The trigger conditions for creating a checkpoint are controlled by two configuration parameters. Typically, the SecondaryNameNode creates a checkpoint every hour (set with the fs.checkpoint.period property), or sooner if the edit log reaches 64 MB (set with the fs.checkpoint.size property); the size of the edit log is checked every five minutes.
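As a sketch only, the two properties named above could be overridden programmatically as shown below. In a real cluster they would normally be set in the site configuration files instead, and note that these are the older property names; later Hadoop releases renamed them to dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.

```java
// A minimal sketch of overriding the checkpoint triggers described above,
// using the older property names that appear in the text.
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setLong("fs.checkpoint.period", 3600);            // checkpoint every hour (seconds)
        conf.setLong("fs.checkpoint.size", 64L * 1024 * 1024); // ...or once the edit log reaches 64 MB

        System.out.println("fs.checkpoint.period = " + conf.getLong("fs.checkpoint.period", -1));
        System.out.println("fs.checkpoint.size   = " + conf.getLong("fs.checkpoint.size", -1));
    }
}
```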
HDFS read data flow
The HDFS read data flow is shown in the following figure:
1. The client opens the file it wants to read by calling the open() method on the FileSystem object (a DistributedFileSystem instance).
2. DistributedFileSystem calls the NameNode over RPC to get the locations of the first few blocks of the file. For each block, the NameNode returns the DataNodes that hold a replica of that block, sorted by their distance from the client (according to the cluster's network topology); if the client is itself one of those DataNodes, the data is read from the local DataNode. DistributedFileSystem then returns an FSDataInputStream (an input stream that supports file positioning) to the client to read the data from; FSDataInputStream in turn wraps a DFSInputStream, which manages the I/O to the DataNodes and the NameNode.
3. The client calls read() on this input stream. DFSInputStream, which has stored the DataNode addresses for the first few blocks of the file, connects to the nearest DataNode holding the first block. The client repeatedly calls read() on the stream, which transfers data from the DataNode to the client. When the end of the block is reached, DFSInputStream closes the connection to that DataNode and finds the best-located DataNode for the next block.
As the client reads from the stream, the blocks are read in order, with DFSInputStream opening new connections to DataNodes as needed. It also asks the NameNode for the DataNode locations of the next batch of required blocks when necessary. Once the client has finished reading, it calls close() on the FSDataInputStream.
Note: if DFSInputStream encounters an error while communicating with a DataNode, it tries to read the block from another nearby DataNode that holds a replica. It also remembers the failed DataNode so that it does not needlessly retry it for later blocks. DFSInputStream additionally verifies the data received from a DataNode using checksums; if a corrupt block is found, DFSInputStream reports it to the NameNode before attempting to read a replica of that block from another DataNode.
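Putting the flow above together, here is a minimal read sketch using the public FileSystem API; the NameNode URI and file path are placeholders:

```java
// A short read sketch matching the flow above: open() contacts the NameNode via
// RPC for block locations and returns an FSDataInputStream, whose read() calls
// stream the bytes from the nearest DataNodes. URI and path are placeholders.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());

        // open() asks the NameNode for block locations and wraps the result.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/big.log"))) {
            // read() (via copyBytes) pulls the data block by block from the DataNodes.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } // close() on the stream is called automatically by try-with-resources.
        fs.close();
    }
}
```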
That concludes this article on "HDFS Architecture Analysis of the Hadoop Distributed File System". To really master the points covered here, you still need to practice and apply them yourself. If you want to read more related articles, please follow the industry information channel.