
The principle and structure of HDFS


HDFS architecture

HDFS is a master/slave service. The NameNode is the master and normally runs on a single node; the DataNodes are the slaves, typically one per node. Each DataNode continually sends heartbeats and block reports to the NameNode. To back up the NameNode's metadata, a SecondaryNameNode is also run.
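As a quick client-side view of this master/slave layout, the sketch below asks the NameNode for its list of live DataNodes through the Java FileSystem API. The NameNode address is a placeholder, and the cast to DistributedFileSystem assumes an hdfs:// file system and a Hadoop 2/3 client on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ListDataNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
        // All metadata requests go to the NameNode (master); data lives on DataNodes (slaves).
        try (FileSystem fs = FileSystem.get(conf)) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getHostName() + " capacity=" + dn.getCapacity());
            }
        }
    }
}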

Common HDFS operations

Create a directory

The client interacts directly with the NameNode, which creates a directory node in its INode tree and writes the operation to the edit log; no DataNode is involved.
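A minimal client-side sketch of this operation through the standard Hadoop FileSystem API (the NameNode address and path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            // mkdirs is a single RPC to the NameNode; no DataNode takes part
            boolean created = fs.mkdirs(new Path("/user/demo/newdir"));
            System.out.println("created: " + created);
        }
    }
}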

Delete a file

The client asks the NameNode to delete the file; the NameNode only marks the file as deleted and does not actively notify the DataNodes.

When a DataNode holding the file's blocks later sends a heartbeat, the NameNode includes the delete instruction in the heartbeat response.

Deletion is therefore usually not immediate; there is some delay before the blocks are physically removed.
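For completeness, the corresponding client call, again through the public FileSystem API (the path is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            // The NameNode marks the file deleted right away; the blocks on DataNodes
            // are removed later, when each DataNode receives the delete in a heartbeat reply.
            boolean deleted = fs.delete(new Path("/user/demo/old.txt"), false /* recursive */);
            System.out.println("deleted: " + deleted);
        }
    }
}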

Read a file

The client first contacts the NameNode and, via the getBlockLocations call, learns which nodes hold the file's blocks; the client then talks to those DataNodes directly to fetch the data.

The block locations may not all be returned at once, so getBlockLocations may need to be called several times.

If a DataNode fails while the client is reading, the client moves on to the next DataNode holding that block and records the faulty node. The data returned also carries checksums; if a checksum error is detected, the client reports it to the NameNode and reads from another replica.
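The same flow through the public Java API: FileSystem.getFileBlockLocations wraps the block-location lookup, and FSDataInputStream handles the DataNode reads and checksum verification. The path and address below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
        Path file = new Path("/user/demo/data.txt");
        try (FileSystem fs = FileSystem.get(conf)) {
            // Ask the NameNode which DataNodes hold each block of the file
            FileStatus status = fs.getFileStatus(file);
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset=" + block.getOffset()
                        + " len=" + block.getLength()
                        + " hosts=" + String.join(",", block.getHosts()));
            }
            // Read the data; the stream fetches blocks from DataNodes and verifies checksums
            try (FSDataInputStream in = fs.open(file)) {
                byte[] buf = new byte[4096];
                int n = in.read(buf);
                System.out.println("read " + n + " bytes");
            }
        }
    }
}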

Write a file

The client first contacts the NameNode to create a new file entry in the NameNode's namespace.

In the second step, before actually writing, the client asks the NameNode where to write: the addBlock call returns a LocatedBlock object containing the block's identifier and version number.

The LocatedBlock also defines the data pipeline of DataNodes that the client writes through. The data the client writes into the pipeline is split into packets, which are placed in an output queue.

In the third step, the client writes the data to the DataNodes: each packet goes to the first DataNode, which writes it and forwards it to the second DataNode, and so on. Ack confirmations flow back along the pipeline, and once the client receives the ack for a packet, that packet is removed from the queue.

After a block has been fully written, the DataNode reports to the NameNode and commits the block.
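A client-side sketch of the write path through the FileSystem API; the pipeline, packet queue, and block commits all happen inside the output stream (the path and address are placeholders):

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            // create() asks the NameNode to add the file to its namespace (step 1);
            // blocks are allocated via addBlock as data is written (step 2)
            try (FSDataOutputStream out = fs.create(new Path("/user/demo/out.txt"), true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
                // hflush pushes buffered packets down the DataNode pipeline (step 3)
                out.hflush();
            } // close() completes the file; the NameNode then checks the replica count
        }
    }
}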

In the case of a DN failure:

1. The data pipeline is closed first. Packets that were being written have not received their acks, so they stay in the queue and no data is lost.

2. The data blocks on the healthy DataNodes are given a new version number, which is reported to the NameNode. When the failed node later recovers, it finds that its block's version number no longer matches the one on the NameNode and deletes the block automatically.

3. The failed node is removed from the pipeline, the pipeline is re-established, and writing continues on the remaining healthy DataNodes.

4. After the file is closed, the NameNode notices that the block does not meet the replica requirement and selects a new DataNode to copy the block to (see the example below).
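The replica requirement is set per file. As a small illustration (not part of the failure-handling protocol itself), a client can inspect and change a file's target replication factor; the NameNode then schedules re-replication on DataNodes as needed, just as it does after a DataNode failure. The path and address are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
        Path file = new Path("/user/demo/out.txt");
        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus status = fs.getFileStatus(file);
            System.out.println("current replication: " + status.getReplication());
            // Raise the target replica count; the actual copying is carried out later
            // by DataNodes under NameNode instruction.
            fs.setReplication(file, (short) 3);
        }
    }
}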

SecondaryNameNode backup

HDFS metadata is held in INode objects in the NameNode's memory. Because the NameNode is a single master, failure recovery would be impossible if this data lived only in memory.

HDFS therefore has a checkpoint mechanism: at a point in time the in-memory INodes are persisted to an fsimage file, and every subsequent operation is recorded in the edit log.

The SecondaryNameNode (SecNN) is responsible for merging the fsimage and edit log on the NameNode. The checkpoint proceeds as follows (a conceptual sketch in code follows the list):

1. SecNN periodically checks the size of the edit log on the NN. If it is still small, nothing is done.

2. If the edit log has grown large, SecNN asks the NN to start a checkpoint.

3. The NN creates a new edit log, edit.new; from then on, all metadata operations are written to edit.new.

4. Meanwhile, SecNN pulls the fsimage and edit log from the NN over the HTTP interface, merges them in memory, and writes out the file fsimage.ckpt.

5. SecNN then notifies the NN that the image merge has completed.

6. The NN pulls fsimage.ckpt over the HTTP interface and overwrites the original fsimage; finally, edit.new replaces the old edit log.
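The six steps can be summarized in a small conceptual sketch. The interface and method names below (NameNodeHttp, rollEditLog, fetchFsImage, and so on) are hypothetical stand-ins, not Hadoop's real internal classes, and the 64 MB trigger is an assumed threshold; the sketch only mirrors the flow described above.

public class CheckpointSketch {
    static final long EDIT_LOG_THRESHOLD = 64L * 1024 * 1024; // assumed trigger size

    // Hypothetical stand-in for the NameNode's HTTP interface used during checkpointing.
    interface NameNodeHttp {
        long editLogSize();                 // step 1: SecNN polls the edit log size
        void rollEditLog();                 // steps 2-3: NN starts writing to edit.new
        byte[] fetchFsImage();              // step 4: SecNN downloads fsimage
        byte[] fetchEditLog();              // step 4: SecNN downloads the edit log
        void uploadCheckpoint(byte[] img);  // steps 5-6: NN installs fsimage.ckpt and swaps edit.new in
    }

    static void checkpointOnce(NameNodeHttp nn) {
        if (nn.editLogSize() < EDIT_LOG_THRESHOLD) {
            return;                                               // step 1: edit log still small, do nothing
        }
        nn.rollEditLog();                                         // steps 2-3
        byte[] merged = merge(nn.fetchFsImage(), nn.fetchEditLog()); // step 4: merge in memory
        nn.uploadCheckpoint(merged);                              // steps 5-6
    }

    // Placeholder merge: a real implementation replays edit-log records onto the namespace image.
    static byte[] merge(byte[] fsimage, byte[] editLog) {
        return fsimage;
    }
}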

HDFS HA

For NameNode high availability, see: https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-name-node/index.html
