This article shares an example-based analysis of the HDFS read and write process; I hope you find it useful after reading it.
The process of reading a file
1. The client opens the file with FileSystem's open() function.
2. DistributedFileSystem uses RPC to call the metadata node (namenode) to get the block information of the file. For each block, the metadata node returns the addresses of the data nodes holding it, and DistributedFileSystem returns an FSDataInputStream to the client for reading the data.
3. The client calls the stream's read() function to start reading data; DFSInputStream connects to the nearest data node holding the first block of the file.
4. Data is read from the data node to the client. When a block has been fully read, DFSInputStream closes the connection to that data node and connects to the nearest data node holding the next block of the file.
5. When the client has finished reading, it calls the close() function of FSDataInputStream.
If the client encounters an error while communicating with a data node during the read, it tries the next data node that holds the block.
Failed data nodes are recorded and not contacted again.
To summarize briefly: the client opens the file with FileSystem ---> FileSystem gets the block locations of the file from the namenode and returns an FSDataInputStream to the client ---> the client reads data with FSDataInputStream, closing the connection to each datanode after its block has been read ---> once all blocks of the file have been read, the client calls FSDataInputStream's close(). A minimal read sketch follows below.
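To make the steps above concrete, here is a minimal read sketch using the Java FileSystem API. It is an illustration rather than HDFS's internal code: the path /tmp/example-input.txt is hypothetical, and it assumes fs.defaultFS in the loaded configuration points at an HDFS cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at the target HDFS cluster
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/example-input.txt"); // hypothetical path
        // open() asks the namenode for the file's block locations (steps 1-2)
        try (FSDataInputStream in = fs.open(path)) {
            // reads pull each block from the nearest datanode holding it (steps 3-4)
            IOUtils.copyBytes(in, System.out, 4096, false);
        } // closing the stream ends the read (step 5)
        fs.close();
    }
}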
The process of writing a file
1. The client calls create() to create a file.
2. DistributedFileSystem uses RPC to call the metadata node to create a new file in the namespace of the file system.
The metadata node first verifies that the file does not already exist and that the client has permission to create it, then creates the file. DistributedFileSystem returns a DFSOutputStream to the client for writing data.
3. The client starts writing data; DFSOutputStream divides the data into blocks and writes them to the data queue.
4. The data queue is read by the Data Streamer, which notifies the metadata node to allocate data nodes for storing the blocks (each block is replicated 3 times by default). The allocated data nodes form a pipeline. In other words, DFSOutputStream writes blocks into the data queue, those blocks are then written to the data nodes in the pipeline, and the Data Streamer is responsible for moving the block data along.
The Data Streamer writes the data block to the first data node in the pipeline; the first data node forwards the data to the second, the second forwards it to the third, and so on.
5. DFSOutputStream keeps an ack queue for the data blocks it has sent out, waiting for the data nodes in the pipeline to confirm that the data has been written successfully.
If the data node fails during the write:
The pipeline is closed, and the data blocks in the ack queue are placed back at the front of the data queue.
The current data block, on the data nodes that have already written it, is given a new identity by the metadata node, so that when the failed node recovers it can detect that its copy is stale and delete it.
The failed data node is removed from the pipeline, and the rest of the block's data is written to the two remaining data nodes in the pipeline.
The metadata node notices that the block does not have enough replicas and arranges for an additional replica to be created later.
When the client finishes writing data, it calls the stream's close() function. This flushes all remaining data blocks to the data nodes in the pipeline, waits for the ack queue to report success, and finally notifies the metadata node that the write is complete.
To summarize briefly: the client calls FileSystem's create() ---> FileSystem calls the namenode to create the file in the namespace ---> a DFSOutputStream is returned to the client ---> the client uses DFSOutputStream to write blocks into the data queue ---> the Data Streamer fetches data from the data queue and writes it to the pipeline (the list of data nodes) ---> DFSOutputStream waits for the acks returned by the pipeline ---> finally the metadata node is notified that the write is complete. A minimal write sketch follows below.
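For the write path, a minimal sketch along the same lines, again assuming fs.defaultFS points at an HDFS cluster; the path and sample content are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at the target HDFS cluster
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/example-output.txt"); // hypothetical path
        // create() asks the namenode to add the file to the namespace (steps 1-2)
        try (FSDataOutputStream out = fs.create(path)) {
            // written bytes are queued and pushed down the datanode pipeline (steps 3-5)
            out.writeBytes("hello hdfs\n");
            out.hflush(); // force buffered data to the datanodes before close()
        } // close() flushes the rest, waits for acks, and notifies the namenode
        fs.close();
    }
}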
Safe mode
1: When the NameNode starts, it first loads the fsimage (image file) into memory and replays the operations recorded in the edit log (editlog).
2: Once the in-memory image of the file system metadata is established, it creates a new fsimage file (this step does not require the SecondaryNameNode) and an empty editlog.
3: In safe mode, each datanode sends its latest block list (block report) to the namenode.
4: At this point the namenode is running in safe mode; that is, its file system is read-only for clients (listing directories, reading file contents, and so on still work, while write, delete, and rename operations fail).
5: The NameNode starts listening for RPC and HTTP requests.
RPC (Remote Procedure Call) is a protocol for requesting a service from a program on another computer over the network without needing to understand the underlying network technology.
6: The locations of data blocks are not persistently maintained by the namenode; they are kept on the datanodes and reported as block lists.
7: During normal operation, the namenode keeps the mapping of all blocks to their locations in memory (which datanodes each block resides on).
8: Entering and leaving safe mode:
Check which state the namenode is in:
hadoop dfsadmin -safemode get
Enter safe mode (Hadoop starts in safe mode):
hadoop dfsadmin -safemode enter
Leave safe mode:
hadoop dfsadmin -safemode leave
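Besides the dfsadmin commands above, safe mode can also be queried from Java. A minimal sketch, assuming a Hadoop 2.x-style client where DistributedFileSystem.setSafeMode() with SAFEMODE_GET only reports (and does not change) the current state:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // SAFEMODE_GET queries the state without entering or leaving it
            boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
            System.out.println("NameNode in safe mode: " + inSafeMode);
        }
        fs.close();
    }
}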
After reading this article, I believe you have a better understanding of the HDFS read and write process; thank you for reading!