What is the main architecture of HDFS? 04/27 Update SLTechnology News&Howtos

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces the main architecture of HDFS: its core components, what each one does, and how they relate to one another.

Preface

HDFS, the Hadoop Distributed File System, uses a master/slave architecture and is designed for reliability, availability, and scalability, which are also the three key indicators of a distributed system.

I. HDFS architecture

The architecture of HDFS can be introduced from five aspects: the basic framework, the read/write process, high availability, compression, and serialization. This article covers the basic framework.

1. The basic framework of HDFS

1.1 NameNode

The NameNode stores the metadata of the file system (the metadata of files and directories, plus the list of blocks that make up each file) and accepts RPC requests from clients. This metadata is persisted in two files, fsimage and editlog. In other words, the data the NameNode holds in memory consists of two parts: the fsimage plus the steadily growing editlog applied on top of it.

fsimage: the metadata image file of the file system.

editlog: the operation log of the file system (records of changes such as adding or deleting files).
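The relationship between the two files can be illustrated with a toy model. The sketch below is not how the NameNode is implemented; it only assumes that the in-memory namespace is the fsimage snapshot with the logged operations replayed on top. The function name `load_namespace`, the tuple layout of a log entry, and the block IDs are all hypothetical.

```python
def load_namespace(fsimage, editlog):
    """Rebuild the in-memory metadata from a snapshot plus logged operations.

    fsimage: dict mapping a file path to its list of block IDs (the snapshot).
    editlog: list of (operation, path, blocks) tuples recorded after the snapshot.
    """
    # Start from a copy of the snapshot so the fsimage itself is not modified.
    files = {path: list(blocks) for path, blocks in fsimage.items()}

    # Replay every logged operation in order.
    for op, path, blocks in editlog:
        if op == "add":
            files[path] = list(blocks)
        elif op == "delete":
            files.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return files


fsimage = {"/user/a.txt": ["blk_1", "blk_2"]}
editlog = [
    ("add", "/user/b.txt", ["blk_3"]),
    ("delete", "/user/a.txt", []),
]

namespace = load_namespace(fsimage, editlog)
print(namespace)  # {'/user/b.txt': ['blk_3']}
```

The longer the edit log, the more operations have to be replayed to reach the current state, which is why the log should not be allowed to grow without bound.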

1.2 DataNode

The DataNode is where data is actually stored. It is a worker node of the file system (directed by the client or the NameNode) and periodically sends the NameNode a list of the blocks it holds.

The block is the basic unit of storage. A block is a logical concept that makes data quick to locate and store; replicas are the stored instances of a block, and each block has several of them (three by default). The default HDFS block size is 128 MB, so a 1 GB file occupies 8 blocks (1024 / 128 = 8), and each block has multiple replicas distributed across different nodes.

HDFS keeps the number of replicas constant: if a replica is damaged, the system reads a copy from another node and writes it to a healthy machine so that the replica count returns to its normal value.

Each DataNode stays in touch with the NameNode by sending a heartbeat (once every 3 seconds). If the NameNode receives no heartbeat from a DataNode for 10 minutes, it considers that DataNode lost and copies the blocks it held to other DataNodes.
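The arithmetic and the failure handling described above can be sketched in a few lines. This is a simplified model under the numbers given in the text (128 MB blocks, three replicas, a 10-minute timeout); the function names and the node and block identifiers are hypothetical, and a real cluster tracks far more state.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size from the text: 128 MB
REPLICATION = 3                 # default number of replicas per block
DEAD_AFTER_S = 10 * 60          # heartbeat timeout from the text: 10 minutes


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks a file occupies (the last block may be partly filled)."""
    return (file_size_bytes + block_size - 1) // block_size


def dead_datanodes(last_heartbeat, now, timeout_s=DEAD_AFTER_S):
    """Nodes whose most recent heartbeat is older than the timeout."""
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > timeout_s)


def missing_replicas(block_locations, dead, replication=REPLICATION):
    """Map each under-replicated block to the number of copies it still needs."""
    needed = {}
    for block, nodes in block_locations.items():
        live = [node for node in nodes if node not in dead]
        if len(live) < replication:
            needed[block] = replication - len(live)
    return needed


print(block_count(1024 ** 3))  # 8

# Timestamps are seconds; dn1 last reported 601 seconds before `now`.
last_heartbeat = {"dn1": 1000, "dn2": 1590, "dn3": 1598, "dn4": 1599}
dead = dead_datanodes(last_heartbeat, now=1601)
print(dead)  # ['dn1']

block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
print(missing_replicas(block_locations, dead))  # {'blk_1': 1}
```

Here `blk_1` lost one replica together with `dn1`, so one new copy has to be made on a healthy node, while `blk_2` is unaffected.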

2. The relationship between NameNode and SecondaryNameNode

2.1 SecondaryNameNode

SecondaryNameNode is not a backup of the NameNode but an auxiliary to it. Its main job is to merge the edit log into the namespace image (fsimage) at regular intervals (every hour by default) so that the edit log does not grow too large. The SecondaryNameNode typically runs on a separate physical machine, because the merge operation needs about as much CPU time and memory as the NameNode itself.
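The effect of the periodic merge on the size of the edit log can be shown with a small simulation. This is only a sketch: it assumes a constant rate of edits and treats a checkpoint as instantly folding all pending edits into a new fsimage. The function name `largest_editlog` and the edit rate are hypothetical.

```python
def largest_editlog(edits_per_minute, minutes, period_s=3600):
    """Return the largest number of pending edits seen during the run.

    A checkpoint happens whenever period_s seconds have passed since the
    previous one; it folds the pending edits into the image and empties the log.
    """
    pending = 0
    largest = 0
    since_checkpoint = 0
    for _ in range(minutes):
        pending += edits_per_minute
        since_checkpoint += 60
        largest = max(largest, pending)
        if since_checkpoint >= period_s:
            pending = 0            # edits are now part of the new fsimage
            since_checkpoint = 0
    return largest


# Three hours at 100 edits per minute, with the default one-hour period.
print(largest_editlog(100, 180))                         # 6000

# The same workload with no checkpoint at all.
print(largest_editlog(100, 180, period_s=float("inf")))  # 18000
```

With hourly checkpoints the log never holds more than one hour of edits, whereas without them it grows for as long as the NameNode runs.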

2.2 Checkpoint Node

Perhaps because the name Secondary NameNode is easily misunderstood, releases after Hadoop 1.0.4 recommend using the Checkpoint Node instead. The role and configuration of the Checkpoint Node are exactly the same as those of the Secondary NameNode; only the startup command differs: bin/hdfs namenode -checkpoint.

2.3 HDFS Federation

The NameNode keeps the mapping between every file and every data block of the file system in memory. In a cluster with a very large number of files, that memory becomes the bottleneck that limits how far the system can scale out. HDFS Federation, introduced in version 2.x, allows the system to scale by adding NameNodes, each of which manages a portion of the file system namespace. For example, one NameNode may manage all the files under the /user directory, while another NameNode manages all the files under the /share directory.
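The /user and /share example amounts to a lookup from a path prefix to the NameNode responsible for it. The sketch below shows that idea as a longest-prefix match; the table contents, the host names `namenode-1` and `namenode-2`, and the function `resolve_namenode` are hypothetical and do not reflect how a real client is configured.

```python
# Hypothetical mount table: each NameNode owns one part of the namespace.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/share": "namenode-2",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode that manages `path`, using the longest matching prefix."""
    best = None
    for prefix in mount_table:
        # Match the directory itself or anything beneath it, so that
        # "/users/x" is not mistaken for a child of "/user".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode manages {path}")
    return mount_table[best]


print(resolve_namenode("/user/alice/data.csv"))  # namenode-1
print(resolve_namenode("/share/lib.jar"))        # namenode-2
```

Because each NameNode only holds the metadata for its own subtree, adding a NameNode adds memory capacity for metadata without changing the others.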
