Talking about the Writing process of HDFS

2025-04-07 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03 Report

1. The client uses the Client interface provided by HDFS to initiate an RPC request to the remote NameNode.

2. The NameNode checks whether the file to be created already exists and whether the creator has permission to operate on it. If the checks pass, it creates a record for the file; otherwise, an exception is thrown on the client.

3. When the client starts writing, it splits the file into multiple packets, manages them internally in a "data queue", and asks the NameNode to allocate blocks, obtaining a list of suitable DataNodes to store the replicas. The size of the list depends on the replication setting on the NameNode.

4. Packets are written to all replicas through a pipeline. The client library streams each packet to the first DataNode, which stores it and forwards it to the next DataNode in the pipeline, and so on until the last DataNode; the data thus flows through the nodes in pipelined fashion.

5. After the last DataNode stores a packet successfully, an ack packet is returned and passed back to the client through the pipeline. The client library maintains an "ack queue"; when the ack packet from the DataNodes is successfully received, the corresponding packet is removed from the ack queue.

6. If a DataNode fails during transmission, the current pipeline is closed, the failed DataNode is removed from it, and the remaining data of the block continues to be transmitted through the surviving DataNodes in the pipeline. The NameNode then allocates a new DataNode to restore the configured number of replicas.

7. After the client finishes writing the data, it calls close() on the stream to close the data flow.

8. The write succeeds as long as dfs.replication.min replicas (default 1) have been written; the block is then replicated asynchronously across the cluster until it reaches its target replica count (dfs.replication, default 3). Because the NameNode already knows which blocks make up the file, it only has to wait for the blocks to reach the minimum replica count before reporting the write as successful.
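Steps 3-6 above can be sketched as a toy simulation. This is an illustrative model only, not the real DFSOutputStream implementation: packets move from a data queue through a chain of simulated DataNodes, acknowledged packets leave the ack queue, and a failed node is dropped from the pipeline while the write continues.

```python
# Toy simulation of the HDFS write pipeline (steps 3-6).
# All names here (write_through_pipeline, the node dicts) are
# hypothetical and exist only for this sketch.
from collections import deque

def write_through_pipeline(packets, datanodes):
    """Push each packet through the datanode pipeline.

    `datanodes` is a list of dicts with a 'healthy' flag and a 'storage'
    list. A failed datanode is dropped from the pipeline (step 6) and
    the remaining packets continue through the survivors.
    Returns the surviving pipeline and the (drained) ack queue.
    """
    data_queue = deque(packets)   # step 3: packets awaiting transmission
    ack_queue = deque()           # step 5: sent but not yet acknowledged
    pipeline = list(datanodes)

    while data_queue:
        packet = data_queue.popleft()
        ack_queue.append(packet)
        # step 4: stream the packet node-to-node down the pipeline
        for node in list(pipeline):
            if not node["healthy"]:
                pipeline.remove(node)  # step 6: drop the failed datanode
                continue
            node["storage"].append(packet)
        # step 5: ack came back up the pipeline -> remove from ack queue
        ack_queue.popleft()

    return pipeline, ack_queue

nodes = [{"healthy": True, "storage": []},
         {"healthy": False, "storage": []},   # fails mid-write
         {"healthy": True, "storage": []}]
survivors, pending = write_through_pipeline(["p1", "p2", "p3"], nodes)
print(len(survivors), len(pending), nodes[0]["storage"])
# -> 2 0 ['p1', 'p2', 'p3']
```

Note how the failed node simply disappears from the pipeline while every healthy node still ends up with all packets; in real HDFS the NameNode would additionally assign a replacement DataNode to restore the replica count.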

[Figure: flow chart of a client writing data to HDFS]
