
Hadoop core components: four steps to understand HDFS


The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It provides high-throughput access to application data and is well suited to applications with large data sets. So how do we operate and use it in practice?

I. Ways to access HDFS:

1. Command line operation

FsShell:

$ hdfs dfs
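Running hdfs dfs with no arguments prints the full list of FsShell subcommands, and -help describes a specific one, for example:

$ hdfs dfs -help ls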

2. Other computing frameworks, such as Spark

A Spark program accesses HDFS through a URI of the form hdfs://nnhost:port/file, which names the protocol, the NameNode host and port, and the file path.
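As an illustration of the URI form (nnhost and the port are placeholders; the NameNode RPC port is commonly 8020), the same fully qualified URI also works from the command line:

$ hdfs dfs -ls hdfs://nnhost:8020/user/fred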

3. Other programs:

(1) The Java API gives programs direct access to HDFS, and many computing frameworks and analysis tools are built on it: for example, Sqoop loads data into HDFS, Flume loads logs into HDFS, and Impala queries data stored on HDFS.

(2) REST API: access HDFS over HTTP (WebHDFS).
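A minimal sketch of REST access, assuming WebHDFS is enabled (nnhost is a placeholder; the NameNode web port is 9870 by default on Hadoop 3, 50070 on Hadoop 2):

$ curl -i "http://nnhost:9870/webhdfs/v1/user/fred?op=LISTSTATUS"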

II. A closer look at the HDFS command line (example commands for each task are sketched after the list):

(1) Copy the file foo.txt from the local disk to the user's home directory in HDFS

- the file will be copied to /user/username/foo.txt

(2) List the contents of the user's home directory

(3) List the root directory of HDFS

(4) Display the contents of the HDFS file /user/fred/bar.txt

(5) Copy that file to the local disk and name it baz.txt

(6) Create a directory called input under the user's home directory

(7) Delete the input_old directory and all its contents
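A minimal sketch of one way to perform each task above with FsShell (paths follow the examples in the list):

$ hdfs dfs -put foo.txt foo.txt              # (1) copy into the home directory
$ hdfs dfs -ls                               # (2) list the home directory
$ hdfs dfs -ls /                             # (3) list the HDFS root directory
$ hdfs dfs -cat /user/fred/bar.txt           # (4) print the file's contents
$ hdfs dfs -get /user/fred/bar.txt baz.txt   # (5) copy to the local disk as baz.txt
$ hdfs dfs -mkdir input                      # (6) create input under the home directory
$ hdfs dfs -rm -r input_old                  # (7) delete input_old and all its contents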

III. Operating through Hue

Through Hue's File Browser, you can browse and manage HDFS directories and files: create, move, rename, upload, download and delete them, and view file contents.

IV. HDFS directory planning recommendations

HDFS is the warehouse for all data, so its directories (log directories, data directories and so on) should be planned and organized sensibly. The best practice is to define a standard directory structure and keep the temporary data of each processing stage separate. An example layout follows (with a sketch of the commands to create it after the list):

(1) /user - home directories, which store data and configuration belonging to individual users

(2) /etl - data in the ETL (extract, transform, load) stage

(3) /tmp - temporary data generated by users, shared among users

(4) /data - data sets used for analysis and processing across the whole organization

(5) /app - non-data files, such as configuration files, JAR files, SQL files, etc.
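A minimal sketch of creating this layout (run as an HDFS superuser; /user and /tmp usually already exist on a standard installation, and -p tolerates that):

$ hdfs dfs -mkdir -p /user /etl /tmp /data /app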

Mastering the above four steps is important for applying HDFS, but proceed step by step according to your own situation and focus on practice in order to keep improving. Working through real cases is a good way to exercise these skills; true knowledge comes from practice, and learning from the experience of others will take you higher and further.
