What is HDFS in Hadoop?
This article gives an overview of HDFS, the Hadoop Distributed File System: its design goals, core concepts, command-line usage, and Java interface.
HDFS design
Very large files, up to petabyte scale
Streaming data access
The most efficient access pattern is write once, read many times: a dataset is written (or copied) once and then analyzed many times.
Commodity hardware (HDFS does not require expensive, highly reliable hardware)
On clusters of commodity hardware the chance of node failure is relatively high, and HDFS is designed to keep working through such failures. Some applications, however, are a poor fit for HDFS:
Low-latency data access
Applications that require low-latency access to data, in the tens-of-milliseconds range, will not work well with HDFS, which is optimized for high throughput.
A large number of small files
The namenode holds the file system metadata in memory, so the total number of files the file system can store is limited by the amount of memory on the namenode.
As a rule of thumb, each file, directory, and block takes about 150 bytes of namenode memory. For example, one million files, each occupying one block, amount to roughly two million objects and therefore need on the order of 300 MB of memory.
Multiple writers and arbitrary file modifications
A file in HDFS has a single writer, and writes are always made at the end of the file, in append-only fashion. There is no support for multiple writers or for modifications at arbitrary offsets within a file.
HDFS concepts
Blocks
The default block size is 128 MB.
Unlike a file system for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block's worth of underlying storage.
To list the blocks that make up each file in the file system:
hdfs fsck / -files -blocks
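The same information is reachable from the Java API. The following is a minimal sketch (the class name ShowBlocks is illustrative, not from the original text) that prints a file's block size and the locations of its blocks:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs:///user/hadoop/a.sh
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FileStatus status = fs.getFileStatus(new Path(uri));
        System.out.println("block size: " + status.getBlockSize());
        // One BlockLocation per block, listing the hosts that store its replicas
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println(loc);
        }
    }
}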
Namenode and datanode
The namenode manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk as two files: the namespace image and the edit log. The namenode also knows which datanodes hold the blocks of each file, but it does not store block locations persistently, because this information is reconstructed from datanode reports when the system starts.
Datanodes store and retrieve blocks when told to (by clients or the namenode), and they periodically report the lists of blocks they are storing back to the namenode.
Namenode fault tolerance
1. Back up the files that make up the persistent state of the file system metadata. The usual configuration is to write to the local disk as well as to a remote NFS mount.
2. Run a secondary namenode, typically on a separate physical machine. If the primary namenode fails, the secondary can be brought into service, but because its state lags the primary's, some data loss is almost certain.
Block caching
By default, a cached block is held in the memory of a single datanode.
Caching frequently read blocks improves read performance.
Cache pools are administrative groupings used to manage cache directives, permissions, and resource usage.
Federated HDFS
Federation allows a cluster to scale by adding namenodes.
Each namenode manages a portion of the file system namespace (for example, one namenode might manage /user and another /share).
Federated namenodes do not communicate with one another, and the failure of one does not affect the availability of the namespaces managed by the others.
High availability of HDFS
Architectural support for HA requires several changes:
The namenodes must use highly available shared storage to share the edit log.
Datanodes must send block reports to both namenodes, because the block mappings are held in a namenode's memory, not on disk.
Clients must be configured to handle namenode failover, using a mechanism that is transparent to users.
The secondary namenode's role is subsumed by the standby namenode, which takes periodic checkpoints of the active namenode's namespace.
For the shared storage there is a choice between an NFS filer and the quorum journal manager (QJM); the QJM is the recommended choice.
Failover and fencing
To ensure that a previously active namenode cannot do any damage, the system fences it. Fencing mechanisms include:
Revoking the namenode's access to the shared storage directory.
Disabling its network port via a remote management command.
As a last resort, forcibly cutting power to its host, and so on.
Note that simply backing up the namenode's metadata does not provide high availability: a newly started namenode must load the namespace image into memory, replay the edit log, and receive enough block reports from the datanodes before it can serve requests, which can take a long time on a large cluster.
Command line interface
The namenode runs on port 8020 by default.
Copy a file from the local file system to the HDFS /user/hadoop/ directory:
hadoop fs -copyFromLocal max_temperature.sh a.sh
Copy the file back to the local file system:
hadoop fs -copyToLocal a.sh my.sh
Verify that the two local copies are identical:
md5sum max_temperature.sh my.sh
Create a new directory, then list the home directory:
hadoop fs -mkdir books
hadoop fs -ls .
Permission checking is disabled by default; in production, dfs.permissions.enabled should be set to true.
The execute permission is meaningless for a file, because you cannot execute a file on HDFS.
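For illustration, permissions can also be set from the Java API. A minimal sketch with an illustrative class name and path, setting rw-r----- on a file (no execute bit, which HDFS ignores for files anyway):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class SetPerms { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs:///user/hadoop/a.sh
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // rw- for the owner, r-- for the group, --- for others
        FsPermission perm = new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.NONE);
        fs.setPermission(new Path(uri), perm);
    }
}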
Hadoop file systems
Hadoop has an abstract notion of file systems, of which HDFS is just one implementation.
Implementations include: Local, HDFS, WebHDFS, Secure WebHDFS, HAR, View, FTP, S3, Azure, and Swift.
List the files in the root directory of the local file system:
hadoop fs -ls file:///
Although MapReduce programs can access any file system, when processing large datasets it is best to choose a distributed file system that offers the data-locality optimization, such as HDFS.
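To make the abstraction concrete, here is a minimal sketch (illustrative class name) showing that the URI scheme alone selects the implementation behind the same FileSystem API:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListAnyFs { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. file:/// or hdfs:///user/hadoop
        Configuration conf = new Configuration();
        // The scheme (file://, hdfs://, webhdfs://, ...) picks the implementation
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        for (FileStatus status : fs.listStatus(new Path(uri))) {
            System.out.println(status.getPath());
        }
    }
}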
Interfaces
HTTP
The HTTP interface is slower than the native Java client, so avoid it for very large data transfers where possible.
There are two ways to access HDFS over HTTP:
Direct access: the web servers embedded in the namenode and datanodes act as WebHDFS endpoints (set dfs.webhdfs.enabled to true).
Proxy access: the HttpFS proxy exposes the same HTTP interface as WebHDFS and is started with the httpfs.sh script.
C
The libhdfs and libwebhdfs C libraries mirror the Java interface; the underlying code is Java.
NFS
It is possible to mount HDFS on a local client's file system using Hadoop's NFSv3 gateway.
You can then use Unix utilities to interact with the file system.
Because HDFS allows appends only, you can add data to the end of a file, but you cannot modify a file in place.
FUSE
Filesystem in Userspace (FUSE) allows file systems implemented in user space to be integrated as Unix file systems.
Hadoop's Fuse-DFS module allows HDFS to be mounted as a standard local file system.
The NFS gateway is the more robust solution, so it should be preferred over Fuse-DFS.
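As a hedged sketch of the HTTP route, a client can go through WebHDFS using the ordinary FileSystem API with a webhdfs:// URI; the class name, host, and port below are illustrative (WebHDFS listens on the namenode's HTTP port, 50070 in Hadoop 2.x):

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WebHdfsCat { // illustrative name
    public static void main(String[] args) throws Exception {
        // Illustrative URI: read a file over HTTP via WebHDFS
        String uri = "webhdfs://namenode:50070/user/hadoop/a.sh";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}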
Java interface
Reading data from a Hadoop URL
One way to read data is with java.net.URL, after registering Hadoop's FsUrlStreamHandlerFactory so the JVM understands hdfs URLs; IOUtils then streams the bytes to standard output:
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar URLCat hdfs:///user/hadoop/output/part-r-00000
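A minimal sketch of what the URLCat program run above might look like (the JVM-wide stream handler factory can be set at most once per JVM, hence the static initializer):

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        // May be called at most once per JVM
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}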
Reading data using the FileSystem API
FileSystem.get returns the file system for a URI's scheme, and open() returns an input stream for a file:

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar FileSystemCat hdfs:///user/hadoop/output/part-r-00000
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar FileSystemDoubleCat hdfs:///user/hadoop/output/part-r-00000
Creating a new file
create() returns an FSDataOutputStream to write to (see the write-data example below).
Directories
public boolean mkdirs(Path f) throws IOException — creates the directory and all necessary parent directories, returning true on success.
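A small sketch of mkdirs in use (illustrative class name and path); like mkdir -p, it creates any missing parents:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeDirs { // illustrative name
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs:///"), conf);
        // Creates /user/hadoop/books and any missing parents (illustrative path)
        boolean ok = fs.mkdirs(new Path("/user/hadoop/books"));
        System.out.println("created: " + ok);
    }
}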
Querying the file system
The FileStatus class encapsulates file system metadata for files and directories: file length, block size, replication, modification time, ownership, and permissions.
public boolean exists(Path f) throws IOException — checks whether a file or directory exists.
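A short sketch (illustrative class name) combining exists() with a FileStatus query:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StatFile { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);
        if (!fs.exists(path)) {
            System.err.println("no such path: " + path);
            return;
        }
        // Print a few of the metadata fields FileStatus carries
        FileStatus stat = fs.getFileStatus(path);
        System.out.println("length: " + stat.getLen());
        System.out.println("isDirectory: " + stat.isDirectory());
        System.out.println("replication: " + stat.getReplication());
        System.out.println("modified: " + stat.getModificationTime());
        System.out.println("permission: " + stat.getPermission());
    }
}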
Writing data
public static FSDataOutputStream create(FileSystem fs, Path file, FsPermission permission) throws IOException
public FSDataOutputStream append(Path f) throws IOException
The following program (class name is illustrative) copies a local file to HDFS, printing a dot each time Hadoop calls progress():

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.println(".");
            }
        });
        // Pass true so both streams are closed when the copy finishes
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
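For the append() signature above, a minimal hedged sketch (illustrative class name; append must be supported by the underlying file system, as it is in HDFS):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendDemo { // illustrative name
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // an existing HDFS file
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // Writes are only ever made at the end of the file
        FSDataOutputStream out = fs.append(new Path(uri));
        out.write("appended line\n".getBytes("UTF-8"));
        out.close();
    }
}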
FSDataInputStream's seek() is a relatively expensive operation and should be used sparingly. Structure applications around streaming access patterns rather than performing large numbers of seek() calls.
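A sketch of what the FileSystemDoubleCat program run earlier might look like: it reads the file once, then calls seek(0) to go back to the start and reads it again:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0); // go back to the start of the file
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}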