What is the HDFS quota

2025-03-30 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article introduces HDFS quotas: what they are, how to set, view, and clear them, how they are implemented inside the NameNode, and the answers to a few common questions.

[quota]

In HDFS, quotas limit how much a directory can hold. There are two kinds: name quotas and space quotas.

A space quota is a limit on the total size of all files under a directory. Replicas are counted, so every copy of a file's data is charged against the quota.

A name quota is the maximum number of files and directories in the tree rooted at a directory. The count is recursive: files and directories in subdirectories, their subdirectories, and so on are all included.
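As a rough illustration of how the two quotas are charged, the sketch below models a small directory tree in plain Python. The `Node` class and the helper names are hypothetical and are not HDFS classes. The sketch assumes that each file is charged its length times its replication factor against the space quota, and that a directory counts toward its own name quota.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Hypothetical stand-in for a file or directory; not an HDFS class."""
    name: str
    is_dir: bool = False
    length: int = 0          # file length in bytes (0 for directories)
    replication: int = 1     # replication factor (ignored for directories)
    children: List["Node"] = field(default_factory=list)

def name_usage(node: Node) -> int:
    """Count files and directories in the tree rooted at node, node included."""
    return 1 + sum(name_usage(c) for c in node.children)

def space_usage(node: Node) -> int:
    """Sum the bytes charged to the space quota: length * replication per file."""
    if not node.is_dir:
        return node.length * node.replication
    return sum(space_usage(c) for c in node.children)

MB = 1024 * 1024
tree = Node("hncscwc", is_dir=True, children=[
    Node("a.log", length=128 * MB, replication=3),
    Node("sub", is_dir=True, children=[
        Node("b.log", length=64 * MB, replication=2),
    ]),
])

print(name_usage(tree))          # 4 names: hncscwc, a.log, sub, b.log
print(space_usage(tree) // MB)   # 128*3 + 64*2 = 512
```

In this model a 128 MB file with three replicas consumes 384 MB of space quota, which is why a space quota should be sized with the replication factor in mind.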

You can set quotas for specific directories with the following command:

    # Set the name quota
    # max_number: the maximum number of files/directories
    # dirname: the target directory
    hdfs dfsadmin -setQuota <max_number> <dirname>

    # Set the space quota
    # bytes: the maximum number of bytes that can be stored
    hdfs dfsadmin -setSpaceQuota <bytes> <dirname>

You can view the quota of a directory with the following command:

    hdfs dfs -count -q /tmp/hncscwc

    # Result columns:
    # name quota | remaining name quota | space quota | remaining space quota | directory count | file count | content size | path
    none  inf  536870912  536870912  1  0  0  /tmp/hncscwc
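For scripting, a result line like the one above can be split into named fields. The following is a minimal sketch, assuming the eight-column layout shown here; `parse_count_q` and `used` are hypothetical helper names, not part of any Hadoop client library.

```python
from typing import Dict, Optional

FIELDS = ["name_quota", "name_remaining", "space_quota", "space_remaining",
          "dir_count", "file_count", "content_size"]

def parse_count_q(line: str) -> Dict[str, object]:
    """Parse one result line of `hdfs dfs -count -q` into a dict.

    'none' and 'inf' (shown when no quota is set) are mapped to None.
    """
    # The path is the last column; limiting the split keeps spaces inside it.
    parts = line.split(None, len(FIELDS))
    if len(parts) != len(FIELDS) + 1:
        raise ValueError(f"unexpected column count: {line!r}")
    row: Dict[str, object] = {}
    for key, raw in zip(FIELDS, parts):
        row[key] = None if raw in ("none", "inf") else int(raw)
    row["path"] = parts[-1].strip()
    return row

def used(quota: Optional[int], remaining: Optional[int]) -> Optional[int]:
    """Usage inferred as quota minus remaining; None when no quota is set."""
    if quota is None or remaining is None:
        return None
    return quota - remaining

row = parse_count_q("none inf 536870912 536870912 1 0 0 /tmp/hncscwc")
print(row["path"], used(row["space_quota"], row["space_remaining"]))
```

Because the name quota is unset in this example, `used(row["name_quota"], row["name_remaining"])` returns `None` rather than a number.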

You can clear quotas with the following command:

    # Clear the name quota
    hdfs dfsadmin -clrQuota <dirname>

    # Clear the space quota
    hdfs dfsadmin -clrSpaceQuota <dirname>

[internal implementation]

Storage of quotas in memory

Quotas are stored in memory along with directory information.

Inside the NameNode (NN), each directory is represented by an instance of the INodeDirectory class, which inherits from the abstract class INodeWithAdditionalFields. The parent class has a member variable named features that stores all the features attached to the inode, including ACLs, quotas, snapshots, extended attributes, and so on. The quota feature records the inode's space quota, name quota, and current usage.

Persistent preservation of quotas

When a quota is set, it is persisted to the editlog as an operation that records the directory path, the space quota, and the name quota.

When a checkpoint runs, the operations in the editlog are merged into the fsimage and saved there.

Use of quotas

When the NN handles a request that creates a file or directory, writes a new file, or appends to an existing file, it checks the quotas of the affected directory and of every ancestor (the parent directory, the grandparent directory, and so on up the tree). If no quota would be exceeded, the operation is allowed and the current usage is updated in memory.
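The check-then-update flow can be sketched with a toy model. This is not the NameNode code: `Dir`, `add_file`, and `QuotaExceeded` are hypothetical names, and the model charges the whole file at once, whereas a real cluster accounts for space as data is written.

```python
from typing import Iterator, Optional

class QuotaExceeded(Exception):
    pass

def ancestors(d: Optional["Dir"]) -> Iterator["Dir"]:
    """Yield d, then its parent, grandparent, and so on up to the root."""
    while d is not None:
        yield d
        d = d.parent

class Dir:
    """Hypothetical directory node; not the real INodeDirectory."""
    def __init__(self, name: str, parent: Optional["Dir"] = None,
                 ns_quota: Optional[int] = None, ds_quota: Optional[int] = None):
        self.name = name
        self.parent = parent
        self.ns_quota = ns_quota   # name quota, None = not set
        self.ds_quota = ds_quota   # space quota in bytes, None = not set
        self.ns_used = 1           # a directory counts itself
        self.ds_used = 0
        for a in ancestors(parent):
            a.ns_used += 1         # the new directory is one more name above

def add_file(parent: Dir, length: int, replication: int) -> None:
    """Verify every quota on the path to the root, then update usage."""
    ds_delta = length * replication
    # Phase 1: check only; nothing is modified if any ancestor would overflow.
    for d in ancestors(parent):
        if d.ns_quota is not None and d.ns_used + 1 > d.ns_quota:
            raise QuotaExceeded(f"name quota exceeded at {d.name}")
        if d.ds_quota is not None and d.ds_used + ds_delta > d.ds_quota:
            raise QuotaExceeded(f"space quota exceeded at {d.name}")
    # Phase 2: all checks passed, so charge the usage along the same path.
    for d in ancestors(parent):
        d.ns_used += 1
        d.ds_used += ds_delta

MB = 1024 * 1024
root = Dir("/")
tmp = Dir("tmp", parent=root, ds_quota=512 * MB)
job = Dir("job", parent=tmp)

add_file(job, 128 * MB, 3)       # 384 MB charged to job, tmp and /
try:
    add_file(job, 64 * MB, 3)    # 192 MB more would exceed tmp's 512 MB
except QuotaExceeded as e:
    print(e)                     # space quota exceeded at tmp
print(tmp.ds_used // MB)         # 384, unchanged by the failed write
```

Separating the check from the update means a rejected request leaves no partial accounting behind in any ancestor.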

[can quotas be set for users]

HDFS has the concepts of users and user groups: every file and directory belongs to a specific user and group. You can also control access to files and directories by enabling ACLs. In this way, HDFS supports multiple users.

In a real multi-user scenario, quotas are usually set per user, that is, they define how much space each user may consume. Because HDFS quotas are attached to directories, you may need to specify which directories a user can write to and then set quotas on those directories to approximate user quotas.
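One way to keep such a directory-based scheme manageable is to generate the quota commands from a per-user plan. The sketch below only builds command strings; the user names, paths, and the `quota_commands` helper are hypothetical, and it assumes write permission on each directory is restricted to its user by other means.

```python
from typing import Dict, List

# Hypothetical plan: user name -> {directory: space quota in bytes}
GB = 1024 ** 3
PLAN: Dict[str, Dict[str, int]] = {
    "alice": {"/user/alice": 10 * GB, "/data/alice": 50 * GB},
    "bob": {"/user/bob": 10 * GB},
}

def quota_commands(plan: Dict[str, Dict[str, int]]) -> List[str]:
    """Build one `hdfs dfsadmin -setSpaceQuota` command per directory."""
    cmds = []
    for user in sorted(plan):
        for path, quota in sorted(plan[user].items()):
            cmds.append(f"hdfs dfsadmin -setSpaceQuota {quota} {path}")
    return cmds

def user_total(plan: Dict[str, Dict[str, int]], user: str) -> int:
    """Upper bound on a user's space: the sum of that user's directory quotas."""
    return sum(plan[user].values())

for cmd in quota_commands(PLAN):
    print(cmd)
print(user_total(PLAN, "alice") // GB)   # 60
```

The limit obtained this way applies to each directory, not to the user as such: anyone else who can write to the directory consumes the same quota.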

So, can HDFS support quotas directly based on users? In other words, what changes does HDFS need to make if it is to support quotas per user?

The first consideration is that per-user quota information would need to be persisted, so a new editlog operation would have to be added and the fsimage storage format changed to hold it. Then every write, file copy, snapshot, and similar operation would need to check whether the user's quota is exceeded. In a federation scenario, the situation is even more complicated.

So far, the official releases do not support setting quotas for users.

Similar issues have been discussed in the community, but there is no practical conclusion or plan for design and development.

[FAQ]

If a quota is set for a directory, what happens if the directory is renamed?

As the internal implementation above shows, the quota is part of the directory's attributes. When a directory is renamed with mv, its inode in HDFS stays the same, so the quota information follows the directory.

    hdfs dfsadmin -setSpaceQuota 536870912 /tmp/hncscwc
    hdfs dfs -count -q /tmp/hncscwc
    none  inf  536870912  536870912  1  0  0  /tmp/hncscwc
    hdfs dfs -mv /tmp/hncscwc /tmp/spurs
    hdfs dfs -count -q /tmp/spurs
    none  inf  536870912  536870912  1  0  0  /tmp/spurs
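The reason the quota survives a rename can be shown with a toy model in which the quota is stored on the directory object and a rename only changes the name under which the parent refers to it. `Dir` and `rename` here are hypothetical and much simpler than the real inode classes.

```python
from typing import Dict, Optional

class Dir:
    """Hypothetical directory object; the quota lives on the object itself."""
    def __init__(self, space_quota: Optional[int] = None):
        self.space_quota = space_quota
        self.children: Dict[str, "Dir"] = {}   # child name -> Dir

def rename(parent: Dir, old: str, new: str) -> None:
    """Re-key the same object under a new name; nothing is copied."""
    parent.children[new] = parent.children.pop(old)

tmp = Dir()
tmp.children["hncscwc"] = Dir(space_quota=536870912)
before = tmp.children["hncscwc"]

rename(tmp, "hncscwc", "spurs")
after = tmp.children["spurs"]

print(after is before, after.space_quota)   # True 536870912
```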

Can the quota of the subdirectory be greater than that of the parent directory?

Yes. The quota of a subdirectory can be greater than that of its parent, because HDFS does not compare a new quota against the quotas of the ancestor directories when the quota is set.

However, when a file is actually written, HDFS checks level by level whether the quota of the parent directory, the grandparent directory, and so on would be exceeded. If any of them would be, the write fails.

    hdfs dfs -count -q /tmp/hncscwc /tmp/hncscwc/hadoop
    none  inf  536870912   536870912   2  0  0  /tmp/hncscwc
    none  inf  1073741824  1073741824  1  0  0  /tmp/hncscwc/hadoop
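Put differently, the space that can really be written into a directory is bounded by the tightest quota on its path to the root. A minimal sketch of that arithmetic, using the two quotas from the example above (`effective_space_left` is a hypothetical helper):

```python
from typing import List, Optional, Tuple

def effective_space_left(chain: List[Tuple[Optional[int], int]]) -> Optional[int]:
    """chain lists (space_quota, space_used) from the directory up to the root.

    Returns the smallest remaining space over all directories that have a
    quota, or None when no directory on the path has one.
    """
    remaining = [quota - used for quota, used in chain if quota is not None]
    return min(remaining) if remaining else None

chain = [
    (1073741824, 0),  # /tmp/hncscwc/hadoop: 1 GiB quota
    (536870912, 0),   # /tmp/hncscwc: 512 MiB quota
    (None, 0),        # /tmp: no quota
    (None, 0),        # /: no quota
]
print(effective_space_left(chain))  # 536870912
```

Although /tmp/hncscwc/hadoop has a 1 GiB quota, at most 512 MiB can be written under it, because the parent's quota is reached first.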

How can I see the space already used in a directory?

The "hdfs dfs -count -q" command shows the quota of a directory and the space remaining, so the space actually used can be inferred as the quota minus the remainder. However, for a directory with no quota set, the quota is displayed as none and the remaining space as inf, so the usage cannot be inferred this way.

The source code shows that both the quota and the space actually used by a directory can be obtained through the client's getQuotaUsage API.

In fact, the "hdfs dfs -count -q" command calls this interface to get its information, but adds a check: if no quota is set, the remaining space is not calculated.
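The check described above can be sketched as follows. This is a simplified model rather than the Hadoop source: `format_quota_columns` is a hypothetical name, and the sketch assumes an unset quota is represented by a non-positive value such as -1.

```python
from typing import Tuple

def format_quota_columns(quota: int, consumed: int) -> Tuple[str, str]:
    """Return the (quota, remaining) columns as the count command would show them."""
    if quota > 0:
        return str(quota), str(quota - consumed)
    # No quota set: the remaining value is not computed at all.
    return "none", "inf"

print(format_quota_columns(536870912, 4096))  # ('536870912', '536866816')
print(format_quota_columns(-1, 4096))         # ('none', 'inf')
```

In the second call the consumed value (4096) is still known to the caller; it is only the display step that drops it, which is why reading the usage through the API works even when no quota is set.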

That concludes this look at HDFS quotas. Theory is best learned together with practice, so go and try the commands yourself.
