
Example Analysis of Hadoop File Writing


This article walks through, step by step, how Hadoop (HDFS) writes a file. The editor finds it very practical and shares it here as a reference; I hope you get something out of it.

The client creates a file by calling the create() method on a DistributedFileSystem object (step 1). DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem's namespace; at this point the file has no blocks associated with it (step 2). The namenode performs various checks to make sure the file does not already exist and that the client has permission to create it. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and an IOException is thrown back to the client. DistributedFileSystem returns an FSDataOutputStream to the client, which can then start writing data. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles the communication with the datanodes and the namenode.
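To make the client's side of steps 1 and 2 concrete, here is a minimal sketch against the public FileSystem API. It assumes a cluster whose fs.defaultFS points at HDFS; the path and file contents are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);     // a DistributedFileSystem when fs.defaultFS is hdfs://

        // Steps 1-2: create() triggers the RPC to the namenode, which records
        // the new file (no blocks are allocated yet) and hands back a stream.
        Path file = new Path("/user/demo/example.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("hello hdfs"); // step 3 begins: bytes are queued as packets
        } // close() flushes remaining packets and signals completion
    }
}
```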

As the client writes data (step 3), DFSOutputStream splits it into packets and writes them to an internal queue, called the data queue. The DataStreamer consumes the data queue; its job is to ask the namenode to allocate new blocks on a suitable list of datanodes to store the replicas. That set of datanodes forms a pipeline: assuming the replica count is 3, there are three nodes in the pipeline. The DataStreamer streams each packet to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode; likewise, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline (step 4). DFSOutputStream also maintains an internal queue of packets waiting to be acknowledged by the datanodes, called the ack queue. A packet is removed from the ack queue only after it has been acknowledged by every datanode in the pipeline (step 5).
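The data queue, ack queue, and pipeline all live inside DFSOutputStream, so from the client's point of view step 3 is simply writing bytes to a stream. A hedged sketch of such a write follows; the local file name, HDFS path, replica count, and buffer size are illustrative assumptions.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsPipelinedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // The client only sees an output stream; internally DFSOutputStream
        // splits the bytes into packets, queues them, and the DataStreamer
        // pushes them down the datanode pipeline.
        try (InputStream in = new BufferedInputStream(
                 new FileInputStream("local.txt")); // hypothetical local file
             FSDataOutputStream out = fs.create(
                 new Path("/user/demo/copy.txt"), (short) 3)) { // 3 replicas -> 3-node pipeline
            IOUtils.copyBytes(in, out, 4096); // step 3: bytes flow into the data queue
            out.hflush(); // block until the datanodes in the pipeline have acknowledged (step 5)
        }
    }
}
```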

If a datanode fails while data is being written, the following happens, transparently to the client. First, the pipeline is closed, and any packets in the ack queue are added back to the front of the data queue so that datanodes downstream of the failed node will not miss any packets. The current block on the healthy datanodes is given a new identity, which is passed to the namenode, so that the partial block on the failed datanode can be deleted if that node recovers later. The failed datanode is removed from the pipeline, and the remainder of the block's data is written to the two healthy datanodes left in the pipeline. When the namenode notices that the block is under-replicated, it arranges for another replica to be created on a different node. Subsequent blocks are then handled as normal.
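Pipeline recovery is invisible to the writer, but one way to observe the result of re-replication is to list a file's block locations afterwards. A small sketch, reusing the hypothetical path from the earlier examples:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/demo/copy.txt")); // hypothetical path

        // After the namenode re-replicates an under-replicated block, each
        // block should again report the full set of replica hosts.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println(block); // offset, length, and datanode hosts
        }
    }
}
```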

It is possible, though rare, for multiple datanodes to fail while a block is being written. As long as dfs.replication.min replicas (default 1) are written, the write succeeds, and the block is then replicated asynchronously across the cluster until it reaches its target replica count (dfs.replication, default 3). When the client has finished writing data, it calls the close() method on the stream (step 6). This flushes all remaining packets to the datanode pipeline and waits for acknowledgements before contacting the namenode to signal that the file is complete (step 7). The namenode already knows which blocks make up the file (the DataStreamer asked it to allocate them), so it only has to wait for the blocks to be minimally replicated before returning success.
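The replica-count settings named above can be adjusted like any other Hadoop configuration property. A brief sketch follows; note that dfs.replication.min is a namenode-side setting whose name varies across releases (newer versions call it dfs.namenode.replication.min), so it is shown only as a comment, and the path is again hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3); // target replica count for new files (default 3)
        // The minimum for a successful write is configured on the namenode:
        // dfs.replication.min (default 1) in the text; newer releases name it
        // dfs.namenode.replication.min.

        FileSystem fs = FileSystem.get(conf);
        // setReplication changes the target for an existing file; the extra
        // copies are created asynchronously, just like post-failure re-replication.
        fs.setReplication(new Path("/user/demo/copy.txt"), (short) 3); // hypothetical path
    }
}
```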

This is the end of this article on "Example Analysis of Hadoop File Writing". I hope the content above has been helpful and that you have learned something from it. If you think the article is good, please share it so more people can see it.

