In this article, the editor shares an example analysis of the HDFS command-line interface. I hope you will get something out of it; let's discuss it together.
Now we will interact with HDFS from the command line. HDFS has many other interfaces, but the command line is the simplest and most familiar to many developers.
When we set up the pseudo-distributed configuration, there were two properties that deserve further explanation. The first is fs.default.name, set to hdfs://localhost/, which sets the default file system for Hadoop. File systems are specified by a URI, and here we have used an hdfs URI to make HDFS the default file system for Hadoop. The HDFS daemons use this property to determine the host and port of the HDFS namenode. We will be running the namenode on localhost, on the default HDFS port, 8020. HDFS clients use the same property to work out where the namenode is running so that they can connect to it.
The second property, dfs.replication, is set to 1 so that HDFS does not replicate file system blocks by the default factor of three. When running with a single datanode, HDFS cannot replicate blocks to three datanodes, so it would continually warn about under-replicated blocks. This setting solves that problem.
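As a quick illustration of these defaults (a minimal sketch added here, not part of the original walkthrough): the fs shell accepts Hadoop's generic options, so the same two properties can be supplied explicitly for a single command, assuming a pseudo-distributed cluster is already running on localhost:
% hadoop fs -D fs.default.name=hdfs://localhost/ -D dfs.replication=1 -ls /
The -D generic option overrides a configuration property for that one invocation only; the values in core-site.xml and hdfs-site.xml remain the cluster-wide defaults.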
Basic file system operation
The file system is ready, and we can perform all the usual file system operations, such as reading files, creating directories, moving files, deleting data, and listing directories. Type the hadoop fs -help command to get detailed help on every command.
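For instance (a small illustration added here; the exact help text varies between Hadoop releases, and asking for help on a single subcommand by name may not be supported in the oldest versions):
% hadoop fs -help
% hadoop fs -help ls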
First copy a file from the local file system to HDFS:
% hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt
This command invokes Hadoop's file system shell command fs, which supports a number of subcommands; here we are running -copyFromLocal. The local file quangle.txt is copied to the file /user/tom/quangle.txt in the HDFS instance running on localhost. In fact, we can omit the scheme and host of the URI and fall back on the default, hdfs://localhost, as specified in core-site.xml:
% hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt
You can also use a relative path and copy the file to your home directory in HDFS, /user/tom:
% hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt
Let's copy the file back to the local file system to see if it is the same:
% hadoop fs -copyToLocal quangle.txt quangle.copy.txt
% md5 input/docs/quangle.txt quangle.copy.txt
MD5 (input/docs/quangle.txt) = a16f231da6b05e2ba7a339320e7dacd9
MD5 (quangle.copy.txt) = a16f231da6b05e2ba7a339320e7dacd9
The MD5 digests are the same, showing that the file survived its trip to HDFS and back intact.
Finally, let's look at an HDFS file listing. We create a directory first, to see how it appears in the listing:
% hadoop fs -mkdir books
% hadoop fs -ls .
Found 2 items
drwxr-xr-x   - tom supergroup          0 2009-04-02 22:41 /user/tom/books
-rw-r--r--   1 tom supergroup        118 2009-04-02 22:29 /user/tom/quangle.txt
The information returned is very similar to the output of the Unix command ls -l, with a few minor differences. The first column shows the file mode. The second column is the replication factor of the file (something a traditional Unix file system does not have). Because we set the default replication factor to 1 in the site-wide configuration, it is shown as 1 here, too. The entry in this column is empty for directories, since the concept of replication does not apply to them: directories are treated as metadata and stored by the namenode, not the datanodes. The third and fourth columns show the owner and group of the file. The fifth column is the size of the file in bytes, or zero for directories. The sixth and seventh columns are the date and time of the last modification. The final, eighth column is the absolute name of the file or directory.
File permissions in HDFS
HDFS has a permissions model for files and directories that is very similar to the POSIX model.
There are three types of permission: read (r), write (w), and execute (x). The read permission is required to read a file or to list the contents of a directory. The write permission is required to write to a file, or, on a directory, to create or delete files or subdirectories in it. The execute permission is ignored for files, since you cannot execute a file on HDFS (unlike POSIX), but it is required to access the children of a directory.
Each file and directory has an owner, a group, and a mode. The mode is made up of the permissions for the owning user, the permissions for members of the group, and the permissions for all other users.
A client's identity is determined by the username and groups of the process it is running in. Because clients are remote, anyone can assume an identity simply by creating an account of that name on a remote system. Permissions should therefore be used only as a mechanism for sharing file system resources among users in a cooperating community, and as a safeguard against accidental data loss; they cannot protect resources in a hostile environment. Despite these limitations, it is worthwhile having permissions enabled to stop users or automated tools and programs from accidentally modifying or deleting important parts of the file system (this is also the default configuration; see the dfs.permissions property).
When permission checking is enabled, the owner permissions are checked if the client's username matches the owner, and the group permissions are checked if the client is a member of the group; otherwise, the other permissions are checked.
There is also the concept of a superuser, which is the identity of the namenode process. No permission checks are performed for the superuser.
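To see the model in practice (a hedged sketch added here, reusing the example file from above; the exact output depends on your cluster's users and groups), the mode and group of an HDFS file can be inspected and changed with the familiar fs shell subcommands:
% hadoop fs -chmod 640 /user/tom/quangle.txt
% hadoop fs -chgrp supergroup /user/tom/quangle.txt
% hadoop fs -ls /user/tom/quangle.txt
Note that changing a file's owner with -chown is normally restricted to the superuser, while -chmod (and, subject to group membership, -chgrp) can be run by the file's owner.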
After reading this article, I believe you have gained some understanding of this example analysis of the HDFS command-line interface. If you want to learn more, you are welcome to follow the industry information channel. Thank you for reading!