
Basic components and basic Operation of HDFS

2025-02-28 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03 Report --

I. HDFS component structure

1. The NameNode plays the role of master. Its responsibilities include managing the file system namespace and regulating client access to files (the files themselves are stored on DataNodes).

2. DataNodes play the role of slaves. Usually only one DataNode is deployed per machine; it stores the data blocks used by MapReduce programs.

3. The NameNode regularly receives Heartbeat and Blockreport messages from the DataNodes.

4. A Heartbeat confirms that the DataNode is functioning normally.

5. A Blockreport contains the set of blocks stored on the DataNode.
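The heartbeat/blockreport protocol above can be sketched in a few lines. This is an illustrative model, not HDFS code: the class and timeout value are hypothetical, and a real NameNode tracks much more state.

```python
import time

# Minimal sketch of how a NameNode-like master could track DataNode
# liveness from heartbeats and block membership from block reports.
HEARTBEAT_TIMEOUT = 30.0  # seconds; illustrative value, not HDFS's default

class NameNodeSketch:
    def __init__(self):
        self.last_heartbeat = {}   # datanode id -> timestamp of last heartbeat
        self.block_map = {}        # datanode id -> set of block ids it reported

    def heartbeat(self, dn, now=None):
        # A heartbeat only proves the DataNode is alive.
        self.last_heartbeat[dn] = now if now is not None else time.time()

    def block_report(self, dn, blocks):
        # A block report replaces the full set of blocks stored on that DataNode.
        self.block_map[dn] = set(blocks)

    def live_datanodes(self, now=None):
        now = now if now is not None else time.time()
        return [dn for dn, t in self.last_heartbeat.items()
                if now - t < HEARTBEAT_TIMEOUT]
```

A DataNode that stops sending heartbeats simply ages out of the live list; its blocks are then known to be under-replicated.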

II. Design principles of HDFS

1. Files are stored as blocks.

2. Each block is much larger than the block size of most file systems (64 MB by default).

3. Reliability and read throughput are improved through a replica mechanism.

4. Each block is replicated to multiple DataNodes (3 by default; typically RAID1 is used for NameNode disks and RAID5 for DataNode disks).

5. A single master (the NameNode) coordinates and stores the metadata.

6. Clients do not cache file data (no data caching).
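The first two principles, fixed-size blocks plus replication, can be illustrated with a small sketch. The round-robin placement here is a simplification of my own; real HDFS placement is rack-aware.

```python
# Illustrative sketch: splitting a file into fixed-size blocks (64 MB by
# default) and assigning each block to 3 DataNodes.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB
REPLICATION = 3

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of the given size occupies (ceiling division)."""
    return max(1, -(-file_size // block_size))

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    placement = {}
    n = len(datanodes)
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % n] for r in range(replication)]
    return placement
```

For example, a 200 MB file occupies four 64 MB blocks, and each block ends up on three distinct DataNodes.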

III. NameNode (NN)

The NameNode's main function is to provide the name query service. It runs as a Jetty server (an open-source servlet container with an embedded web server).

The metadata saved by the NameNode includes:

File ownership and permissions

Which blocks each file contains

Which DataNodes each block is stored on (reported by the DataNodes at startup)

1. The NameNode's metadata is loaded into memory after startup.

2. Metadata is persisted to disk in a file named "fsimage".

Block location information is not saved in fsimage.
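The split between persisted and rebuilt metadata can be sketched as follows. The structures and JSON stand-in are hypothetical; the real fsimage is a binary format.

```python
import json

# Sketch: the fsimage persists file -> block lists plus ownership and
# permissions, but NOT block locations; locations are rebuilt in memory
# from DataNode block reports at startup.
def save_fsimage(namespace):
    # namespace: {path: {"owner": ..., "perms": ..., "blocks": [...]}}
    return json.dumps(namespace)          # stand-in for the binary fsimage

def load_fsimage(data):
    namespace = json.loads(data)
    block_locations = {}                  # block id -> [datanodes]; empty on load
    return namespace, block_locations

def apply_block_report(block_locations, dn, blocks):
    # Each block report fills in where blocks actually live right now.
    for b in blocks:
        block_locations.setdefault(b, []).append(dn)
```

This is why the NameNode needs block reports after a restart: the namespace survives on disk, but the block-to-DataNode map must be reconstructed.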

IV. DataNode (DN)

Stores blocks

Block information is reported to the NN when the DN thread starts

hadoop fs -cmd <args>

cmd: the specific operation, largely identical to the corresponding UNIX command

args: parameters

HDFS resource URI format: scheme://bigdata/path

scheme: protocol name, file or hdfs

bigdata: the NameNode hostname

path: the file path

e.g. hdfs://localhost:9000/user/chunk/test.txt

Assuming fs.default.name=hdfs://localhost:9000 has been set in core-site.xml,

the path can be shortened to /user/chunk/test.txt
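The scheme/host/path split described above can be verified with a standard URI parser, since HDFS URIs follow the generic URI layout:

```python
from urllib.parse import urlparse

# An HDFS URI decomposes into scheme, authority (NameNode host:port), and path.
uri = urlparse("hdfs://localhost:9000/user/chunk/test.txt")
print(uri.scheme)   # "hdfs"
print(uri.netloc)   # "localhost:9000"
print(uri.path)     # "/user/chunk/test.txt"
```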

Examples of HDFS commands. Data is stored as files, each resource is identified by an absolute path, and paths begin with / when creating directories to store resources.

# create a directory

hadoop fs -mkdir /myFirstDir

# view the created directory

hadoop fs -ls /myFirstDir # returns nothing; no data has been stored yet

# put a file into the newly created directory

hadoop fs -put /etc/shadow /myFirstDir

# view the files in the directory

hadoop fs -ls /myFirstDir

# copy a file from HDFS to a local path

hadoop fs -get /<file path on hadoop> /<local file path>

hadoop fs -get /myFirstDir/shadow /home/ # download shadow to the /home directory

# create an empty file

hadoop fs -touchz /myFirstDir/newFile.txt

# rename a file on hadoop

hadoop fs -mv /myFirstDir/newFile.txt /myFirstDir/bigdata.txt

# merge everything in the specified hadoop directory into a single file and download it to the local file system

hadoop fs -getmerge /myFirstDir /home/a

# view the contents of a file

hadoop fs -cat /myFirstDir/shadow

# view the last 1 KB of a file

hadoop fs -tail /myFirstDir/shadow

# delete files / directories

hadoop fs -rm /myFirstDir/shadow

hadoop fs -rm -r /myFirstDir/Secondary

# list the files under the HDFS root

hadoop fs -ls /

# view cluster information: log in to the master node and open the NameNode web UI

http://192.168.1.114:50070

Management and updates

# view basic HDFS statistics

hadoop dfsadmin -report

# enter and leave safe mode

hadoop dfsadmin -safemode enter

hadoop dfsadmin -safemode leave

# adding a node

To add a new DataNode, first install Hadoop on the new node.

It must use the same configuration as the NameNode (which can be copied directly from the NameNode). Modify the $HADOOP_HOME/conf/master file and add the NameNode hostname.

Then, on the NameNode, modify the $HADOOP_HOME/conf/slaves file to add the new node's hostname, and set up passwordless SSH to the new node. Run the startup command:

bin/start-all.sh

# load balancing

HDFS data can become unevenly distributed across DataNodes, especially when a DataNode fails or new DataNodes are added.

The NameNode's DataNode selection strategy when placing new blocks can also lead to an uneven block distribution.

You can rebalance the distribution of blocks across DataNodes with:

start-balancer.sh
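The balancer's stopping criterion can be sketched as a utilization check: a cluster counts as balanced when every DataNode's usage ratio is within a threshold of the cluster-wide average (start-balancer.sh uses a 10% threshold by default). The function below is an illustrative model of that criterion, not the balancer's actual code:

```python
# Sketch of the balancer criterion: each DataNode's utilization
# (used / capacity) must lie within `threshold` of the cluster average.
def is_balanced(nodes, threshold=0.10):
    """nodes: {name: (used_bytes, capacity_bytes)}"""
    total_used = sum(u for u, c in nodes.values())
    total_cap = sum(c for u, c in nodes.values())
    avg = total_used / total_cap
    return all(abs(u / c - avg) <= threshold for u, c in nodes.values())
```

A cluster where one DataNode is 90% full and another 10% full fails the check, which is exactly the situation the balancer migrates blocks to correct.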
