A brief introduction to HDFS
HDFS, whose full name is Hadoop Distributed File System, is a distributed file system that can run on ordinary commodity hardware.
It differs from other distributed file systems in several significant ways:
HDFS is highly fault tolerant and runs on a variety of low-cost hardware; it provides high throughput, which makes it suitable for storing large data sets; and it offers a streaming data access model.
HDFS originated from Apache Nutch and is now a core subproject of the Apache Hadoop project.
HDFS design assumptions and goals
Hardware errors are the norm
In a data center, hardware failure should be regarded as normal rather than exceptional.
In a big-data environment an HDFS cluster is composed of a large number of physical machines, each built from many hardware components, so the overall probability that some component fails is very high.
Therefore, one of the core design goals of the HDFS architecture is to detect hardware failures quickly and recover from them quickly.
Streaming access requirements
Applications running on HDFS clusters need streaming access to their data. HDFS is designed for batch processing rather than interactive use, so the architecture emphasizes high throughput over low latency.
POSIX-style access semantics such as random access severely reduce throughput, so HDFS deliberately ignores them.
Large data sets
A typical HDFS file is gigabytes or even terabytes in size, so the design focuses on supporting large files and on scaling to larger clusters by adding machines. A single cluster should be able to hold a very large number of files.
Simple consistency model
The access model HDFS provides is write-once, read-many. Leaving a file unchanged after it is written simplifies the data consistency model and gives applications higher throughput. File appends are also supported.
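As a rough sketch of the append support just mentioned, the snippet below uses the standard Hadoop FileSystem API to add data to the end of an existing file; the NameNode URI and file path are placeholders invented for the example, and append must be enabled on the cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    public class AppendExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address for illustration only.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            Path file = new Path("/data/events.log");   // hypothetical existing file
            // append() returns a stream positioned at the end of the file;
            // HDFS only allows adding data at the end, not in-place updates.
            try (FSDataOutputStream out = fs.append(file)) {
                out.write("one more record\n".getBytes(StandardCharsets.UTF_8));
            }
            fs.close();
        }
    }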
Moving computation is cheaper than moving data
HDFS follows the data-locality principle of computer systems: the closer the data is to the CPU that processes it, the better the performance. HDFS therefore exposes an interface that lets applications discover the physical storage location of data.
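The locality interface mentioned above is exposed in the public FileSystem API; the following minimal sketch (with a hypothetical cluster URI and file path) prints which hosts hold each block of a file, so a computation can be scheduled near the data.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;
    import java.util.Arrays;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            FileStatus status = fs.getFileStatus(new Path("/data/big.csv")); // hypothetical file
            // Ask the NameNode which DataNodes hold each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset " + b.getOffset() + " -> " + Arrays.toString(b.getHosts()));
            }
            fs.close();
        }
    }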
Compatibility of heterogeneous hardware and software platforms
HDFS is designed to be easily portable from one platform to another.
Applicable scenarios for HDFS
Combining the design assumptions above with the architectural analysis that follows, HDFS is particularly well suited to the following scenarios:
- Sequential access: for example, serving streaming media and other large-file storage workloads.
- Full scans of large files: for example, workloads that need to read massive data sets in full, such as OLAP.
- Limited overall budget: wanting the convenience of distributed computing without the budget for HPC systems or high-performance minicomputers.
Its performance is unsatisfactory in the following scenarios:
- Low-latency data access: low latency means locating data quickly, for example responses at the 10 ms level; keeping the system busy serving such requests runs counter to the design assumption of quickly returning large volumes of data.
- Large numbers of small files: many small files occupy many file blocks, wasting space and putting serious pressure on the metadata service (the NameNode).
- Multi-user concurrent writes: concurrent writes violate the data consistency model and can leave the data inconsistent.
- Real-time updates: HDFS supports appends, but real-time updates would reduce data throughput and increase the cost of maintaining data consistency.
HDFS architecture
This article analyzes the HDFS architecture from the following aspects and explores how it meets these design goals.
Overall architecture of HDFS
The following HDFS architecture diagram is from the official hadoop website.
As the diagram shows, HDFS adopts a master/slave, client-server (C/S) architecture, and HDFS nodes play one of two roles:
NameNode
The NameNode stores and serves file metadata, such as file attributes and access information.
The basic information about every file is kept on the NameNode, which uses a centralized storage scheme.
DataNode
The DataNodes store and serve the actual contents of files.
The file blocks themselves are stored on different DataNodes, which may be spread across different racks.
The HDFS client contacts the NameNode for file metadata and the DataNodes for file content; the data involved is transferred directly between the client and the NameNode or DataNodes.
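A minimal sketch of this client interaction using the Hadoop FileSystem API: listing a directory is a pure metadata operation answered by the NameNode, while reading a file (shown later) pulls content from the DataNodes. The cluster URI and directory below are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class ListDirectory {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The client talks to the NameNode named in the URI (placeholder address).
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            // listStatus() is a metadata operation served by the NameNode.
            for (FileStatus st : fs.listStatus(new Path("/data"))) {
                System.out.println(st.getPath() + "  " + st.getLen() + " bytes");
            }
            fs.close();
        }
    }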
HDFS data organization mechanism
The data organization of HDFS has two parts: the NameNode part and the DataNode part.
NameNode
In Hadoop 2 (YARN-era) HDFS, the NameNode uses an active/standby (master/slave) design. The active NameNode serves client requests for metadata and block-location information.
The standby NameNode keeps a near-real-time backup of the active node and periodically merges the recorded operations (edit log) with the stored file-system image, writing the result back to the active node.
When the active NameNode fails, the standby takes over all of its work.
DataNode
The DataNodes are responsible for storing the real data. Files on DataNodes are stored as blocks of fixed size, and within the cluster each data block is kept as multiple replicas stored on different DataNodes. The block size and the number of replicas are set by Hadoop configuration parameters; both can be changed after the cluster has started, with the new values taking effect after a restart and without affecting existing files.
After a DataNode starts, it scans the physical blocks in its local file system and reports the corresponding block information to the NameNode.
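As a sketch of how these parameters are typically set from the client side, assuming the standard dfs.blocksize and dfs.replication configuration keys and a hypothetical file path; cluster-wide defaults normally live in hdfs-site.xml instead.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class BlockAndReplicaSettings {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side defaults used when new files are created.
            conf.set("dfs.blocksize", "268435456");   // 256 MB blocks
            conf.set("dfs.replication", "3");         // three replicas per block
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            // Change the replication factor of an existing file (hypothetical path);
            // extra or missing replicas are adjusted in the background.
            fs.setReplication(new Path("/data/big.csv"), (short) 2);
            fs.close();
        }
    }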
HDFS data access mechanism
HDFS file access is streaming: once a file is opened through the API it is read or written sequentially; it is not possible to jump to a specified position in the file and then perform the operation.
Because HDFS has several roles and its main use case is write-once, read-many, the read and write paths differ considerably. Both reads and writes are initiated and driven by the client, with the server-side roles (NameNode and DataNode) responding passively. The two flows are described in turn below.
Read flow
When a client initiates a read request, it first connects to the NameNode; the client needs the HDFS configuration files so that it knows how to reach the relevant servers. Once the connection is established, the client asks to read a particular data block of a file, and the NameNode checks in memory whether the corresponding file and block exist. If they do not, it tells the client that the file or block does not exist; if they do, it tells the client which servers hold the data blocks. After receiving this information the client connects to those servers in turn and starts the network transfer, selecting one of the replicas to read from.
Process analysis
- Using the client library (Client) provided by HDFS, the client issues an RPC request to the remote NameNode.
- Depending on the situation, the NameNode returns part or all of the file's block list; for each block it returns the addresses of the DataNodes that hold a replica.
- The client library picks the DataNode closest to the client to read the block from; if the client itself is a DataNode, the data is read locally.
- After the current block has been read, the connection to that DataNode is closed and the best DataNode for the next block is selected.
- When the blocks in the current list have been read but the end of the file has not been reached, the client library requests the next batch of blocks from the NameNode.
- A checksum is verified after each block is read; if reading from a DataNode fails, the client notifies the NameNode and continues reading from the next DataNode that holds a replica of that block.
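Expressed through the public client API, the whole read path collapses into opening the file and reading it sequentially; the block-list lookups, DataNode selection and checksum verification described above happen inside the returned stream. The cluster URI and path below are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    public class ReadExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
            // open() asks the NameNode for the block list; the stream then reads
            // each block from a nearby DataNode and verifies checksums as it goes.
            try (FSDataInputStream in = fs.open(new Path("/data/big.csv"));
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }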
Writing process
When a client initiates a write request, the NameNode checks in memory whether the corresponding file and block already exist; if they do, it tells the client so. If not, it designates a server as the primary write server and notifies both the client and that server. The client then sends data to the primary write server, which writes it to its physical disk, asks the NameNode for the address of the next replica server once its own write completes, and relays the data to it. The data is passed along like a relay baton until the configured number of replicas has been written; the success or failure of the final replica's write is relayed back to the client in the same way, and the client finally reports to the NameNode that the block was written successfully. If any step fails, the whole write fails.
Process analysis
- Using the client library (Client) provided by HDFS, the client issues an RPC request to the remote NameNode.
- The NameNode checks whether the file to be created already exists and whether the creator has permission to operate on it; if the check succeeds it creates a record for the file, otherwise the client throws an exception.
- When the client starts writing the file, it splits the data into packets, manages them internally in a "data queue", and asks the NameNode to allocate new blocks, obtaining a list of suitable DataNodes to store the replicas; the size of the list depends on the replication factor configured on the NameNode.
- Packets are then written to all replicas in a pipeline: each packet is streamed to the first DataNode, which stores it and forwards it to the next DataNode in the pipeline, and so on until the last DataNode.
- When the last DataNode has stored a packet successfully it returns an ack packet, which is passed back through the pipeline to the client; the client library maintains an "ack queue" and removes the corresponding packet from it when the ack is received.
- If a DataNode fails during transmission, the current pipeline is closed, the failed DataNode is removed from it, and the remaining data continues to be written through the surviving DataNodes; the NameNode later allocates a new DataNode so that the configured number of replicas is maintained.
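From the client's point of view the packet pipeline is hidden behind a single output stream; below is a minimal write sketch with a placeholder cluster URI and path.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    public class WriteExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
            // create() registers the new file with the NameNode; data written to the
            // stream is split into packets and pushed through the DataNode pipeline.
            try (FSDataOutputStream out = fs.create(new Path("/data/output.txt"), true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            } // close() flushes the remaining packets and waits for the acks
            fs.close();
        }
    }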
HDFS data security mechanism
The HDFS security mechanism uses an access-control model similar to that of Linux. By default each file inherits the access permissions of its parent directory, and the default owner and group are taken from the client user who uploaded the file. The controls also resemble Linux: a user's read and write permissions on a file can be set through commands or the API, and a user who lacks the required permission receives a corresponding error when trying to read or write the file.
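A sketch of the corresponding permission calls in the FileSystem API, using a hypothetical path; FsPermission mirrors the familiar rwx bits, and changing the owner normally requires superuser privileges.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsAction;
    import org.apache.hadoop.fs.permission.FsPermission;
    import java.net.URI;

    public class PermissionExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
            Path file = new Path("/data/report.csv");   // hypothetical file
            // rw- for the owner, r-- for the group, no access for others (mode 640).
            fs.setPermission(file, new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.NONE));
            // Changing owner and group normally requires superuser privileges.
            fs.setOwner(file, "alice", "analysts");
            fs.close();
        }
    }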
HDFS high availability mechanism
As a high-availability cluster, HDFS has a carefully considered availability design, mainly reflected in the following mechanisms:
NameNode master-slave design
The master-slave design ensures the reliability of the metadata and removes the single point of failure of HDFS 1.0; see the description above for details.
Data replica mechanism
The replica mechanism ensures that when a file block stored on one server is destroyed for some reason, the cluster as a whole can still provide file access to the outside world; see the data access mechanism section above for details.
Data recovery mechanism
Data recovery here means that HDFS provides a grace period in which deletions can be undone: by default, deleted files are moved to a trash directory and cleaned up by HDFS after a period of time, a mechanism that is also common in cloud storage. If a data block fails, it can be restored through the replica mechanism.
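A rough sketch of the trash behaviour from the client side, assuming the standard fs.trash.interval setting (the retention period in minutes) and a hypothetical path; with trash enabled, a file moved to trash lands under the user's .Trash directory and can be restored until the interval expires.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.Trash;
    import java.net.URI;

    public class TrashExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.trash.interval", "1440");   // keep deleted files for 24 hours
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            // Move the file into the current user's .Trash directory instead of
            // deleting it outright; it can be restored until the interval expires.
            boolean moved = Trash.moveToAppropriateTrash(fs, new Path("/data/old.csv"), conf);
            System.out.println("moved to trash: " + moved);
            fs.close();
        }
    }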
Rack-aware mechanism
Large clusters are organized into racks: a fixed number of servers plus the corresponding network equipment make up one cabinet. In general, network I/O across racks is more expensive than within a rack, and crossing machine rooms costs even more. HDFS therefore tries to keep data on the better-performing servers to improve performance, while also placing replicas on different racks to ensure fault tolerance.
When an application reads data, HDFS always chooses a server that is close to the application.
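Rack awareness is normally driven by a topology script configured on the NameNode. The sketch below only illustrates the usual configuration key with a hypothetical script path; in practice the key is set in core-site.xml rather than in client code, and the script maps a DataNode address to a rack id such as /dc1/rack07.

    import org.apache.hadoop.conf.Configuration;

    public class RackTopologyConfig {
        public static void main(String[] args) {
            // Shown via the Configuration API purely for illustration; this key
            // normally lives in core-site.xml on the NameNode.
            Configuration conf = new Configuration();
            // External script that maps a DataNode host/IP to a rack id (hypothetical path).
            conf.set("net.topology.script.file.name", "/etc/hadoop/conf/topology.sh");
            System.out.println(conf.get("net.topology.script.file.name"));
        }
    }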
Snapshot mechanism
Automatic error detection and recovery mechanism
Machine failures are detected through heartbeats. If a DataNode or NameNode fails to return a heartbeat within a certain period, the active NameNode marks it as a downed server and stops forwarding new I/O requests to it. If the outage leaves some files with fewer replicas than configured, HDFS re-replicates those file blocks to preserve the reliability of the whole cluster.
Checksum mechanism
A checksum is generated for each data block; when the data is read back, the client recomputes the checksum and compares it with the one stored on the server, ensuring that the data has not been corrupted or tampered with during network transmission or in other ways.
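Checksums are verified transparently during reads, but the aggregate checksum of a file can also be requested explicitly, for example to compare two copies of a file; a minimal sketch with a hypothetical path follows.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class ChecksumExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
            // The per-block checksums are combined into one checksum for the whole file.
            FileChecksum sum = fs.getFileChecksum(new Path("/data/big.csv")); // hypothetical file
            if (sum != null) {
                System.out.println(sum.getAlgorithmName() + ", " + sum.getLength() + " bytes");
            }
            fs.close();
        }
    }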
HDFS cluster expansion mechanism
The cluster can be expanded or shrunk dynamically, which makes it convenient for users to grow or reduce capacity. When a new server joins, subsequent I/O has a greater chance of being sent to it. A full redistribution of the files already in the cluster can be triggered by command; the redistribution consumes only a modest amount of network I/O, so applications running on the cluster are not noticeably affected. Likewise, decommissioning a machine is done by command: to the cluster it looks much like a machine going down, so no further I/O requests are sent to it and its blocks are re-replicated to maintain the replica count.
References: HDFS Design Document; Introduction to the principles, architecture and features of HDFS.