2025-01-20 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces the basic usage of the distributed file system HDFS: its core components, its block replication mechanism, and a few common commands.
In a modern enterprise environment, a single machine often cannot store the full volume of data, so data must be stored across machines. A file system that manages storage distributed across a cluster under a unified namespace is called a distributed file system.
HDFS
HDFS (Hadoop Distributed File System) is a subproject of the Apache Hadoop project. Hadoop is well suited to storing large data sets (at the TB and PB scale) and uses HDFS as its storage system. HDFS stores files across multiple machines while providing a unified access interface.
HDFS is designed based on Google's paper "The Google File System".
The four basic components of HDFS: HDFS Client, NameNode, DataNode, and Secondary NameNode.
Client
The Client is the HDFS client. When a file is uploaded to HDFS, the Client splits it into blocks for storage. The Client also provides commands to manage and access HDFS, such as starting or shutting down the cluster.
NameNode
The NameNode is the master: the supervisor and manager of the cluster. It manages HDFS metadata (file paths, sizes, names, permissions, and block information), manages the block replication policy (3 replicas by default), and handles client read and write requests.
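Both the replication factor and the block size (discussed below) are cluster settings in hdfs-site.xml. A minimal sketch, showing the Hadoop 2 default values:

```xml
<!-- hdfs-site.xml: sketch of the two settings discussed in this article
     (the values shown are the Hadoop 2 defaults) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- number of replicas kept for each block -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- block size in bytes: 128 MB -->
  </property>
</configuration>
```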
DataNode
DataNodes are the slaves: the NameNode issues commands, and the DataNodes perform the actual operations. A DataNode stores the actual data blocks, performs read/write operations on them, and regularly reports block information back to the NameNode.
Secondary NameNode
The Secondary NameNode is not a backup of the NameNode: if the NameNode dies, it does not immediately take over and serve requests. Instead, it assists the NameNode and shares part of its workload, and in an emergency it can help recover the NameNode.
Replication mechanism
HDFS is designed to store very large files reliably across the machines of a large cluster. Each file is stored as a sequence of blocks; all blocks are the same size except the last one.
For fault tolerance, every block of a file is replicated. The block size and replication factor are configurable per file.
In Hadoop 2, the default block size is 128 MB (134,217,728 bytes).
For example, when a 300 MB file a.txt is uploaded to HDFS, it is split into 128 MB blocks, and the remaining portion smaller than 128 MB becomes the final block: two 128 MB blocks plus one 44 MB block.
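The arithmetic behind the split can be sketched with shell arithmetic, using the hadoop2 default block size:

```shell
# Sketch: how HDFS splits a 300 MB file with the 128 MB default block size
FILE_SIZE=$((300 * 1024 * 1024))        # 314572800 bytes
BLOCK_SIZE=134217728                    # 128 MB, the hadoop2 default
FULL_BLOCKS=$((FILE_SIZE / BLOCK_SIZE)) # blocks filled completely
LAST_BLOCK=$((FILE_SIZE % BLOCK_SIZE))  # size of the final, smaller block
echo "$FULL_BLOCKS full blocks + 1 block of $((LAST_BLOCK / 1024 / 1024)) MB"
# → 2 full blocks + 1 block of 44 MB
```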
Basic HDFS commands
According to the deployed service, our HDFS root directory is hdfs://192.168.147.128:9820. Let's create a subdirectory user under the root directory with the following command:
[hadoop@node01 ~]$ hadoop fs -mkdir /user
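The new directory can also be verified from the shell (this, like the other commands here, requires a running cluster):

```shell
# List the HDFS root directory to confirm that /user now exists
hadoop fs -ls /
```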
Then open the HDFS file browser on the Hadoop web page; the user folder will now be visible.
Next, upload a 300 MB file to the user folder in HDFS.
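Uploading is done with `hadoop fs -put`; the local filename a.txt below follows the earlier example:

```shell
# Upload the local 300 MB file into the /user directory on HDFS
hadoop fs -put a.txt /user
```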
The uploaded file then appears on the Hadoop page, split into three blocks.
Click Download to retrieve the file.
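The same download can be done from the command line with `hadoop fs -get` (the paths below assume the upload example above):

```shell
# Download /user/a.txt from HDFS into the current local directory
hadoop fs -get /user/a.txt ./a.txt
```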
This concludes our look at the basic usage of the distributed file system HDFS. Pairing theory with practice is the best way to learn, so give it a try!