How to configure disk storage policies in HDFS


This article explains in detail how to configure disk storage policies in HDFS. The content is shared for your reference; I hope you will have a good understanding of the relevant knowledge after reading it.

1. HDFS disk storage policy

1. Specify the local directory storage policy

The data directory is tagged DISK, which corresponds to the HOT policy.

The data1 directory is tagged ARCHIVE, which corresponds to the COLD policy.

In hdfs-site.xml:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/opt/beh/data/namenode/dfs/data,[ARCHIVE]/opt/beh/data/namenode/dfs/data1</value>
</property>

Restart HDFS:

$ stop-dfs.sh
$ start-dfs.sh

2. Specify the storage policy of an HDFS directory

View the available HDFS storage policies:

$ hdfs storagepolicies -listPolicies
Block Storage Policies:
	BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
	BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
	BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
	BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
	BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
	BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

Create two HDFS directories:

$ hadoop fs -mkdir /Cold_data
$ hadoop fs -mkdir /Hot_data

Specify the storage policy for each HDFS directory:

$ hdfs storagepolicies -setStoragePolicy -path hdfs://breath:9000/Cold_data -policy COLD
Set storage policy COLD on hdfs://breath:9000/Cold_data
$ hdfs storagepolicies -setStoragePolicy -path hdfs://breath:9000/Hot_data -policy HOT
Set storage policy HOT on hdfs://breath:9000/Hot_data
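
One point worth knowing: setting a storage policy does not relocate blocks that already exist under the path; it only affects blocks written afterwards. If you ever need to migrate existing data to match a newly assigned policy, HDFS ships a mover tool for that purpose. A minimal sketch, using the directories created above (not part of the original test):

$ hdfs mover -p /Cold_data /Hot_data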

Check that the storage policies of the two directories are set correctly:

$ hdfs storagepolicies -getStoragePolicy -path /Cold_data
The storage policy of /Cold_data:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
$ hdfs storagepolicies -getStoragePolicy -path /Hot_data
The storage policy of /Hot_data:
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

3. Storage test

View the size of the storage directories before uploading any files:

$ cd /opt/beh/data/namenode/dfs
$ du -sh *
38M     data
16K     data1
30M     name
14M     namesecondary

Generate a file with a size of 1000M:

$ dd if=/dev/zero of=test.txt bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 3.11214 s, 337 MB/s

Upload the generated file to the /Cold_data directory:

$ hadoop fs -put test.txt /Cold_data

View the size of the storage directories at this point:

$ du -sh *
38M     data
1008M   data1
30M     name
14M     namesecondary

4. Test result description

The uploaded file was stored entirely in the data1 directory.

Because /Cold_data in HDFS was assigned the COLD policy, which maps to the data1 directory tagged ARCHIVE in hdfs-site.xml, the file landed exactly where intended and the test achieved its purpose.
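
If you want to confirm placement at the block level rather than inferring it from directory sizes, fsck can list where each replica lives; on recent Hadoop releases the reported locations include the storage type. A quick check (exact output depends on your cluster):

$ hdfs fsck /Cold_data/test.txt -files -blocks -locations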

2. HDFS reserved space configuration

1. Parameter modification

Modify the hdfs-site.xml configuration file and add the following parameters:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>32212254720</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[ARCHIVE]/opt/beh/data/namenode/dfs/data</value>
</property>

Description:

The dfs.datanode.du.reserved parameter reserves space (in bytes, per volume) for non-HDFS use; 32212254720 bytes is 30G.

dfs.datanode.data.dir is changed to keep only one local storage directory.
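
As a quick sanity check on the byte value (simple shell arithmetic, not part of the original steps):

$ echo $((30 * 1024 * 1024 * 1024))
32212254720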

Restart HDFS:

$ stop-dfs.sh
$ start-dfs.sh

2. Upload files

View the disk space:

$ df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   46G   14G   32G  31% /
devtmpfs                 7.8G     0  7.8G   0% /dev
tmpfs                    7.8G     0  7.8G   0% /dev/shm
tmpfs                    7.8G  8.5M  7.8G   1% /run
tmpfs                    7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/vda1                497M  125M  373M  25% /boot
tmpfs                    1.6G     0  1.6G   0% /run/user/0
tmpfs                    1.6G     0  1.6G   0% /run/user/1000

Upload 2G files to HDFS, one at a time:

$ hadoop fs -put test1.txt /Cold_data/test1.txt
$ hadoop fs -put test1.txt /Cold_data/test2.txt
...
$ hadoop fs -put test1.txt /Cold_data/test7.txt
$ hadoop fs -put test1.txt /Cold_data/test8.txt
16/11/12 16:30:54 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2239)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1451)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
16/11/12 16:30:54 INFO hdfs.DFSClient: Abandoning BP-456596110-192.168.134.129-1450512233024:blk_1073744076_3254
16/11/12 16:30:54 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[10.10.1.31:50010,DS-01c3c362-44f4-46eb-a8d8-57d2c2d5f196,DISK]
16/11/12 16:30:54 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /Cold_data/test8.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3289)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:668)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1544)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
put: File /Cold_data/test8.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Analysis.

At this point, the size of the data directory /opt/beh/data/namenode/dfs is as follows:

$ cd /opt/beh/data/namenode/dfs
$ du -sh *
15G     data
12K     data1
34M     name
19M     namesecondary

View the disk space at this time:

$ df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   46G   27G   19G  59% /
devtmpfs                 7.8G     0  7.8G   0% /dev
tmpfs                    7.8G     0  7.8G   0% /dev/shm
tmpfs                    7.8G  8.5M  7.8G   1% /run
tmpfs                    7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/vda1                497M  125M  373M  25% /boot
tmpfs                    1.6G     0  1.6G   0% /run/user/0
tmpfs                    1.6G     0  1.6G   0% /run/user/1000

3. Summary

The error shows that the reserved space configuration has taken effect. However, checking the disk reveals that the free space remaining on the local filesystem is not equal to the reserved space configured for HDFS.

This is because HDFS computes the available space of a data directory from the total capacity of the disk the directory resides on (46G here), not from the free space of the directory itself.

The remaining space HDFS actually sees is calculated as:

total capacity of the disk holding the directory (46G) - space already used by HDFS (15G) = 31G

With 30G reserved, HDFS is left with only 1G of usable space, which is why uploading another 2G file fails with the error above.
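
The same arithmetic can be checked in the shell, and cross-checked from the cluster side with hdfs dfsadmin -report, which prints Configured Capacity, DFS Used and DFS Remaining per DataNode (the command is standard; the figure below just restates the calculation above):

$ echo $((46 - 15 - 30))   # disk capacity - DFS used - reserved, in GB
1
$ hdfs dfsadmin -report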

Because this test writes to the / filesystem directly, non-HDFS data also occupies part of the disk. When each HDFS data directory maps one-to-one onto its own disk, writes to a disk stop once its remaining free space falls to the configured reserved value.
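
For that one-disk-per-directory layout, the configuration would look something like the following sketch (the /data/disk1 and /data/disk2 mount points are hypothetical; dfs.datanode.du.reserved applies per volume, so each disk keeps its own 30G reservation):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/data/disk1/dfs,[DISK]/data/disk2/dfs</value>
</property>
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>32212254720</value>
</property>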

That concludes this guide on how to configure disk storage policies in HDFS. I hope the above content has been helpful and that you have learned something from it. If you found the article useful, feel free to share it so more people can see it.
