
Introduction to the HDFS file system

Updated 2025-04-10. From: SLTechnology News&Howtos (shulou), Internet Technology



This article is an introduction to the HDFS file system. The material is simple and practical, so let's walk through it together.

In the last section, we briefly explained that Hadoop consists of three main parts: a distributed file system (HDFS), a distributed computing framework (MapReduce), and a distributed resource scheduler (YARN). Starting with this lesson, we introduce these technologies in detail, one by one. This lesson covers the distributed file system, HDFS.

I. Background knowledge:

File system: a file system is a system for naming files and for logically storing and retrieving them. Put simply, it is the software that manages how files are named and stored.

Common formats are: Windows: FAT, FAT32, NTFS; Linux: ext2, ext3; Mac OS: HFS

Distributed file system: in a distributed file system (DFS), the physical storage resources managed by the file system are not necessarily attached directly to the local node. Instead, they are connected to the node (which can simply be thought of as a computer) through a computer network.

Common ones are: GFS, HDFS, Lustre, Ceph, GridFS, MogileFS, TFS, FastDFS and so on. Each is suited to a different field. Several of them (for example GridFS, MogileFS, TFS and FastDFS) are application-level distributed file storage services rather than system-level distributed file systems.

File systems are a very broad topic, and this is only a brief introduction.

II. HDFS file system

HDFS is the foundation of the big data stack and provides its basic storage layer. The idea behind HDFS is simple: the NameNode records where data is stored, and the DataNodes store the data itself. To read, the client first asks the NameNode where the data is stored, then fetches it from the corresponding DataNode. Writing works in much the same way: the client asks the NameNode where to write, then stores the data on the corresponding DataNode. The NameNode is therefore the heart of the whole system. If it goes down, the whole system becomes unusable.
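
The read and write flow described above can be sketched as a toy model. This is not the Hadoop client API: the class and function names are invented for illustration, each file is treated as a single block, and replication is ignored. The point is only that the NameNode holds metadata while the DataNodes hold the bytes.

```python
class DataNode:
    """Stores the actual bytes, keyed by a block id (hypothetical toy class)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Stores only metadata: which DataNode holds which file (toy class)."""

    def __init__(self, datanodes):
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.locations = {}  # path -> DataNode name

    def allocate(self, path):
        # Assumed toy policy: pick the DataNode that holds the fewest blocks.
        target = min(self.datanodes.values(), key=lambda dn: len(dn.blocks))
        self.locations[path] = target.name
        return target

    def locate(self, path):
        return self.datanodes[self.locations[path]]


def write_file(namenode, path, data):
    datanode = namenode.allocate(path)  # 1. ask the NameNode where to write
    datanode.store(path, data)          # 2. send the data to that DataNode


def read_file(namenode, path):
    datanode = namenode.locate(path)    # 1. ask the NameNode where the data is
    return datanode.fetch(path)         # 2. read it from that DataNode


nodes = [DataNode("dn1"), DataNode("dn2")]
nn = NameNode(nodes)
write_file(nn, "/logs/a.txt", b"hello")
write_file(nn, "/logs/b.txt", b"world")
print(read_file(nn, "/logs/a.txt"))  # b'hello'
print(nn.locations)
```

Note that if `nn` is lost, the bytes are still on the DataNodes, but nothing records which node holds which file. That is why the NameNode is so critical.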

1. Here are several concepts:

NameNode: the master node, mainly responsible for managing the HDFS cluster and its metadata

DataNode: a slave node, mainly responsible for storing user data

SecondaryNameNode: assists the NameNode in managing metadata and keeps a cold backup of the metadata

Note: what is metadata? The simplest analogy is the information shown when you right-click a file in Windows and choose Properties. In short, metadata is data that describes a file's name, location, size and so on. HDFS uses the NameNode to record where each file is, how large it is, what its permissions are, and so on.

2. Storage space of HDFS

As the figure above shows, the total storage capacity of HDFS is the sum of all the disks in the cluster. For example, in the figure above, HDFS provides a total of 60 TB of space to the upper-layer application systems, and an upper-layer application does not need to know which server a saved file is on.

III. HDFS storage principles

In Hadoop, a file is divided into multiple blocks of a fixed size (only the last block may be smaller), and these blocks are distributed across the nodes of the cluster. Each such piece is called a block. The block size can be set in the hdfs-site.xml configuration file. In Hadoop 1 the default block size is 64 MB; from Hadoop 2 onward, including Hadoop 3, it is 128 MB.

The figure above illustrates several concepts:

1. Block division: a 300 MB file, split at 128 MB boundaries, becomes three blocks: blk1, blk2 and blk3 (128 MB, 128 MB and 44 MB).

2. Replicas: the first block has three replicas, which are stored on different machines in the same cluster. (Note: the number of replicas can be set in the configuration file.)
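
The two concepts above can be worked through in a few lines. The splitting arithmetic follows the 300 MB example directly. The placement function is only a round-robin stand-in chosen for illustration (real HDFS uses its own placement policy), and the node names `dn1` to `dn4` are made up.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size of each block; only the last one may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Toy round-robin placement: a block's replicas land on distinct nodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for i in range(num_blocks):
        placement[f"blk{i + 1}"] = [
            datanodes[(i + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]

placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(placement["blk1"])                # ['dn1', 'dn2', 'dn3']

# With 3 replicas, a 300 MB file consumes 900 MB of raw disk in the cluster.
print(sum(blocks) * 3 // MB)            # 900
```

One consequence worth noticing: with three replicas, the raw disk consumed is three times the size of the user data.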

Note: a file of only 1 KB still occupies one block, but the actual disk space used is still just 1 KB. HDFS is therefore not suited to storing large numbers of small files. For example, the attachments uploaded in our application systems are usually not very large (less than 64 MB), so HDFS is not a good place to store them.

A question to think about: why can't we simply change the configuration file to make HDFS suitable for storing small files? (Hint: it has to do with the NameNode. The amount of metadata the NameNode has to hold can become very large.)
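
One way to reason about this question is to count the entries the NameNode has to track. The sketch below assumes that every file and every block costs the NameNode one in-memory object, and uses a commonly quoted rough figure of about 150 bytes per object. That number is a rule of thumb taken as an assumption here, not something stated in this article or measured.

```python
MB = 1024 * 1024
BYTES_PER_OBJECT = 150  # ASSUMPTION: rough rule of thumb, not a measured value


def namenode_objects(num_files, file_size, block_size=128 * MB):
    """Count the file entries plus block entries the NameNode must track."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    return num_files * (1 + blocks_per_file)


def estimated_heap_bytes(num_files, file_size, block_size=128 * MB):
    """Rough NameNode memory estimate under the assumption above."""
    return namenode_objects(num_files, file_size, block_size) * BYTES_PER_OBJECT


# The same 1 GiB of user data, stored two different ways.
one_large = namenode_objects(1, 1024 * MB)        # 1 file entry + 8 blocks
many_small = namenode_objects(1024 * 1024, 1024)  # 1 KB files, 1 block each
print(one_large, many_small)                      # 9 2097152

print(estimated_heap_bytes(1024 * 1024, 1024) // MB)  # 300
```

Under these assumptions, shrinking the block size does not help: each small file still needs its own file entry and at least one block entry, so the metadata grows with the number of files, not with the amount of data.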

IV. HDFS architecture

First, here is a borrowed diagram. Nearly every explanation on the Internet uses this picture:

An HDFS cluster consists of a NameNode, DataNodes and a Secondary NameNode.

The NameNode manages the metadata of the entire file system, including the HDFS directory tree, which blocks each file consists of, and which DataNodes each block is stored on.

The DataNodes manage the users' file blocks. Each block can be stored as multiple replicas on multiple DataNodes.

The Secondary NameNode is an auxiliary daemon that monitors the state of HDFS and takes snapshots of the HDFS metadata at regular intervals. Its most important function is to assist the NameNode in managing metadata.
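
The article only says that the Secondary NameNode takes periodic snapshots of the metadata. In Hadoop this is usually described as a checkpoint: the log of recent changes (the edit log) is merged into the last saved namespace image (the fsimage). The sketch below is a toy model of that idea only. The operation names, the tuple format and the path-to-size dictionary are all invented for illustration and are not Hadoop's on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> size)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the last snapshot; return the result."""
    merged = dict(fsimage)  # leave the previous snapshot untouched
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged


fsimage = {"/data/a.txt": 300}
edit_log = [
    ("create", "/data/b.txt", 1),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt"),
]

new_fsimage = checkpoint(fsimage, edit_log)
edit_log = []  # after a checkpoint, the log can start over
print(new_fsimage)  # {'/data/c.txt': 300}
```

The design point is that replaying a short log on top of a recent snapshot is much cheaper than replaying every change since the cluster was created.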

By now you should have a deeper understanding of the HDFS file system. The best next step is to try it out in practice.
