The Mechanism and principle of HDFS Storage in big data Hadoop

2025-01-17 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03

This article introduces the storage mechanism and principles of HDFS, the Hadoop Distributed File System.

The HDFS storage mechanism rests on three entities: data blocks, DataNodes, and the NameNode.

Data block

Every disk has a block size, which is the basic unit in which it reads and writes data. A file system built on a single disk manages its data in file-system blocks, which are generally an integral multiple of the disk block size; disk blocks are typically 512 bytes. HDFS also has the concept of a block, but a much larger one: 64 MB by default in Hadoop 1.x (128 MB from Hadoop 2.x onward), which is also, by default, roughly the amount of data a single map task processes. Files in HDFS are split into block-sized chunks that are stored as independent units. Unlike in a single-disk file system, however, a file smaller than one block in HDFS does not occupy a full block's worth of underlying storage.
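To make the splitting rule concrete, here is a minimal sketch (not HDFS code) that computes how many bytes each block of a file holds. It assumes the 64 MB default described above; `split_into_blocks` is a hypothetical helper, and passing a different `block_size` models newer defaults such as 128 MB.

```python
from typing import List

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # the default described in the article (Hadoop 1.x)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the number of bytes stored in each block of a file."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes; it does not
        # take up a full block_size of underlying storage.
        blocks.append(remainder)
    return blocks


if __name__ == "__main__":
    sizes = split_into_blocks(300 * MB)
    print([s // MB for s in sizes])  # [64, 64, 64, 64, 44]
    print(sum(sizes) // MB)          # 300 (not 5 * 64 = 320)
    print(split_into_blocks(1 * MB, block_size=128 * MB) == [1 * MB])  # True
```

A 300 MB file therefore becomes four full blocks plus a 44 MB final block, and the total stored is still 300 MB.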

The first obvious benefit of storing data in blocks is that a file can be larger than any single disk in the network, because its blocks can be stored on any disk in the cluster. The second is a simpler system design: making the block the unit of storage simplifies storage management, since it is easy to calculate how many blocks a given disk can hold. Blocks also remove metadata concerns from the storage layer: a block is just a chunk of data, so file metadata such as permission information does not need to be stored with it and can be managed separately by another system.
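Both benefits can be illustrated with a small sketch: capacity per disk is a single integer division, and a file bigger than any one disk still fits once its blocks are spread across several disks. The round-robin `place_blocks` policy and the disk names are hypothetical and chosen only for illustration; real HDFS placement also accounts for replication and racks.

```python
from typing import Dict, List

MB = 1024 * 1024
GB = 1024 * MB
BLOCK_SIZE = 64 * MB  # the 64 MB default described in the article


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a file is split into (ceiling division)."""
    return -(-file_size // block_size)


def blocks_per_disk(disk_capacity: int, block_size: int = BLOCK_SIZE) -> int:
    """How many whole blocks fit on one disk: a single integer division."""
    return disk_capacity // block_size


def place_blocks(num_blocks: int, disk_capacities: Dict[str, int],
                 block_size: int = BLOCK_SIZE) -> Dict[str, List[int]]:
    """Assign block indices to disks round-robin, skipping disks that are full.

    Hypothetical placement policy for illustration only; it ignores replication
    and the rack-aware placement real HDFS performs.
    """
    free = {d: blocks_per_disk(c, block_size) for d, c in disk_capacities.items()}
    if num_blocks > sum(free.values()):
        raise ValueError("not enough total capacity for this file")
    placement: Dict[str, List[int]] = {d: [] for d in disk_capacities}
    disks = list(disk_capacities)
    i = 0
    for block in range(num_blocks):
        # Skip disks with no free block slots; total capacity was checked above,
        # so some disk always has room while blocks remain.
        while free[disks[i % len(disks)]] == 0:
            i += 1
        disk = disks[i % len(disks)]
        placement[disk].append(block)
        free[disk] -= 1
        i += 1
    return placement


if __name__ == "__main__":
    disks = {"disk1": 1 * GB, "disk2": 1 * GB, "disk3": 1 * GB}
    n = blocks_needed(2560 * MB)  # a 2.5 GB file: bigger than any single disk
    layout = place_blocks(n, disks)
    print(n, {d: len(b) for d, b in layout.items()})
    # 40 {'disk1': 14, 'disk2': 13, 'disk3': 13}
```

Each 1 GB disk holds only 16 blocks, yet the 40-block file fits across the three disks.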

DataNode

DataNodes are the worker nodes of the HDFS file system. They store and retrieve data blocks as needed, under the coordination of the NameNode, and periodically send the NameNode a list of the blocks they store.
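The DataNode's role can be sketched as a toy class that stores blocks by ID, serves reads, and produces the block list it reports to the NameNode. The class and method names are hypothetical; real DataNodes send these reports on a timer over RPC, which this sketch reduces to a plain method call.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    """Toy DataNode: stores block bytes by block ID and reports what it holds.

    Names are hypothetical; they only mirror the roles described in the text.
    """
    node_id: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def store_block(self, block_id: int, data: bytes) -> None:
        """Persist one block (here: keep it in a dict)."""
        self.blocks[block_id] = data

    def read_block(self, block_id: int) -> bytes:
        """Return the bytes of a stored block; raises KeyError if absent."""
        return self.blocks[block_id]

    def block_report(self) -> List[int]:
        """The list of block IDs this node would periodically send to the NameNode."""
        return sorted(self.blocks)


if __name__ == "__main__":
    dn = DataNode("dn1")
    dn.store_block(7, b"second chunk")
    dn.store_block(3, b"first chunk")
    print(dn.block_report())  # [3, 7]
    print(dn.read_block(3))   # b'first chunk'
```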

NameNode

The NameNode manages the namespace of the HDFS file system: it maintains the file system tree and the metadata for all files and directories in the tree. This information is persisted on the NameNode's local disk in two files, the namespace image and the edit log. The NameNode also knows which DataNodes hold each block of each file, but it does not store these block locations persistently, because they are reconstructed from the DataNodes' reports when the system starts.
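The split between persisted metadata and in-memory block locations can be sketched as follows. This is a hypothetical toy, not the real NameNode: `namespace` (file path to ordered block IDs) stands in for what HDFS keeps in the namespace image and edit log, while `block_locations` is deliberately never persisted and is filled only from block reports.

```python
from collections import defaultdict
from typing import Dict, List, Set


class NameNode:
    """Toy NameNode separating persisted metadata from in-memory block locations."""

    def __init__(self, namespace: Dict[str, List[int]]) -> None:
        # Persisted state (file -> block IDs), loaded from disk at startup.
        self.namespace = namespace
        # Not persisted: block ID -> DataNodes holding it, rebuilt from reports.
        self.block_locations: Dict[int, Set[str]] = defaultdict(set)

    def receive_block_report(self, datanode_id: str, block_ids: List[int]) -> None:
        """Record which blocks a DataNode currently holds."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate_file(self, path: str) -> List[Set[str]]:
        """For each block of the file, return the DataNodes known to hold it."""
        return [set(self.block_locations.get(b, set())) for b in self.namespace[path]]


if __name__ == "__main__":
    nn = NameNode({"/logs/a.txt": [1, 2]})
    print(nn.locate_file("/logs/a.txt"))  # [set(), set()]: nothing reported yet
    nn.receive_block_report("dn1", [1, 2])
    nn.receive_block_report("dn2", [2])
    print(nn.locate_file("/logs/a.txt"))
```

Right after startup the NameNode knows which blocks make up each file but not where they are; the locations appear only as DataNodes report in.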

Without the NameNode, the file system cannot be used. If the machine running the NameNode were destroyed, all files on the file system would be lost, because there would be no way to reconstruct the files from the blocks on the DataNodes. Fault tolerance for the NameNode is therefore essential, and there are two mechanisms for it.

The first mechanism is to back up the files that make up the persistent state of the file system metadata. The NameNode can be configured to write its persistent state to multiple file systems; the usual choice is to write to the local disk and also to a remotely mounted network file system. These writes are synchronous and atomic.

The second mechanism is to run a secondary NameNode, which periodically merges the namespace image with the edit log and keeps a copy of the merged namespace image that can be used if the NameNode fails. However, the secondary NameNode's state always lags behind the primary's, so if the primary fails completely, some data loss is almost inevitable. In that case, the usual course of action is to copy the metadata files stored on the remotely mounted network file system to the secondary NameNode and run it as the new primary NameNode.
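Why the secondary NameNode lags can be seen in a small simulation of checkpointing: the namespace image is combined with the edit log by replaying each logged operation. This is a hypothetical sketch; the `(op, path, block IDs)` record format and the class names are invented for illustration, and the real checkpoint protocol (log rolling, image transfer) is collapsed into one step.

```python
import copy
from typing import Dict, List, Tuple

Namespace = Dict[str, List[int]]   # file path -> block IDs
Edit = Tuple[str, str, List[int]]  # hypothetical record: (op, path, block IDs)


def apply_edits(image: Namespace, edits: List[Edit]) -> Namespace:
    """Replay edit-log records on a copy of a namespace image."""
    merged = copy.deepcopy(image)
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = list(blocks)
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return merged


class PrimaryNameNode:
    """Holds the last namespace image plus the edits logged since then."""

    def __init__(self) -> None:
        self.image: Namespace = {}
        self.edit_log: List[Edit] = []

    def log(self, edit: Edit) -> None:
        self.edit_log.append(edit)

    def current_state(self) -> Namespace:
        return apply_edits(self.image, self.edit_log)


class SecondaryNameNode:
    """Periodically merges the primary's image and edit log into a checkpoint."""

    def __init__(self) -> None:
        self.checkpoint: Namespace = {}

    def do_checkpoint(self, primary: PrimaryNameNode) -> None:
        # Simplification: done in one step; the primary starts a fresh edit log.
        self.checkpoint = apply_edits(primary.image, primary.edit_log)
        primary.image = copy.deepcopy(self.checkpoint)
        primary.edit_log = []


if __name__ == "__main__":
    primary, secondary = PrimaryNameNode(), SecondaryNameNode()
    primary.log(("create", "/a", [1]))
    secondary.do_checkpoint(primary)
    primary.log(("create", "/b", [2]))  # logged after the last checkpoint
    # If the primary is lost now, the secondary's copy is missing /b.
    print(primary.current_state())  # {'/a': [1], '/b': [2]}
    print(secondary.checkpoint)     # {'/a': [1]}
```

Anything logged after the last checkpoint exists only in the primary's image and edit log, which is why a copy of those files on a remotely mounted file system matters: replaying them with `apply_edits` recovers `/b`, while the secondary's checkpoint alone does not.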


© 2024 shulou.com SLNews company. All rights reserved.
