
Hadoop2.6.0 Learning Notes (3) HDFS Architecture


Lu Chunli's work notes. Who says programmers can't have a literary flair?

See the HDFS Architecture document:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

or, after extracting the downloaded tar package:

hadoop-2.6.0/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on inexpensive commodity hardware. HDFS is highly fault-tolerant, provides high-throughput access to application data, and is suitable for applications with large data sets.

Description: in a distributed application the data is scattered across different machines; the distributed file system provides a unified way to access it, hiding the differences in where the underlying data is actually stored.

The vision and goals of HDFS

1. Hardware failure

In a centralized environment it is always assumed that machines run stably and continuously; a distributed environment assumes the opposite, treating hardware failure as the norm rather than the exception. An HDFS cluster may consist of hundreds of nodes, each storing part of the data, and with that many nodes any one of them may stop serving because of a fault. Fault detection and fast, automatic recovery are therefore core architectural goals of HDFS.

2. Streaming data access

Applications running on HDFS read their data sets in a streaming manner. HDFS is not designed for general-purpose use: it targets batch data processing rather than interactive operations, and high-throughput data access rather than low-latency computation.

3. Large data sets

HDFS aims to support large data sets. A typical file stored on HDFS ranges from gigabytes to terabytes in size, and a single HDFS instance should be able to support tens of millions of files.

4. Consistency model

HDFS applications need a write-once-read-many access model for files. A file need not be changed once it has been created, written, and closed. This assumption simplifies data-consistency problems and makes high-throughput data access possible.

5. Moving computation rather than moving data

Moving the computation close to where the data resides is usually better than moving the data to where the application runs, and HDFS provides interfaces for applications to do this.

6. Portability across heterogeneous hardware and software platforms

HDFS is designed to easily move from one platform to another.

HDFS Architecture

HDFS adopts a master/slave architecture: an HDFS cluster consists of one NameNode (NN) and a number of DataNodes (DN).

The NN is the master server of the HDFS cluster and its core. It manages the cluster's namespace and receives client requests (the actual data transfer is handled by the DNs). Typically one DN runs on each slave machine and manages the data stored on that machine.

When users upload data, it is stored on the DataNodes in the form of data blocks (Block); a file may be divided into one or more blocks.
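To make the file-to-block mapping concrete, here is a small Python sketch. This is not Hadoop code; BLOCK_SIZE mirrors the Hadoop 2.x default dfs.blocksize of 128 MB.

```python
# Toy sketch of how a file maps onto HDFS blocks (not Hadoop code).
BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default dfs.blocksize

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte sizes of the blocks a file of file_size occupies.

    Every block is full-size except possibly the last one; a file smaller
    than one block occupies only its actual size.
    """
    sizes = []
    remaining = file_size
    while remaining > 0:
        sizes.append(min(block_size, remaining))
        remaining -= block_size
    return sizes

# A 180 MB file becomes one 128 MB block plus one 52 MB tail block.
print(split_into_blocks(180 * 1024 * 1024))
```

This matches the upload example later in these notes, where a 180 MB .hsl file produced a 128 MB block and a 52 MB block.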

The NN performs namespace operations on the file system, such as opening, closing, and renaming files and directories, and maintains the mapping of blocks to DNs. DNs create, delete, and replicate blocks under the NN's command, and serve client read and write requests according to the block-to-DN mapping the NN maintains.

Description:

The above describes the single-NN mode. The NN stores the metadata; if it fails, the entire HDFS cluster becomes unavailable.

When deploying, a single host can run both the NN and a DN, or even multiple DNs, but this is rarely done in practice.

Data blocks have replicas, 3 by default, placed on the local node, another node in the same rack, and a node in a different rack (rack awareness).

Client read and write requests first go through the NN; the real work is then done against the DNs.

HDFS Namespace

HDFS supports a traditional hierarchical file organization: as in most other file systems, users can create directories and create, delete, move, and rename files within them. The design document states that HDFS does not support user quotas, access permissions, or links, though the architecture does not preclude implementing these features. The Namenode maintains the file system namespace; any change to the namespace or to file properties is recorded by the Namenode. An application can set the number of copies of a file that HDFS keeps; this number is called the file's replication factor and is also stored by the Namenode.

Description:

You can set directory quotas with hdfs dfsadmin -setQuota.

hdfs dfs -setrep can adjust the replication factor dynamically.

Any operation on HDFS is recorded in the edits log files, whose storage path is specified by the parameter dfs.namenode.edits.dir.

HDFS Data Replication

HDFS is designed to store very large files reliably across machines in a large cluster. It stores each file as a sequence of blocks; all blocks of a file are the same size except the last one, and a file's blocks are replicated for fault tolerance. The block size and replication factor are configurable per file; the replication factor can be set when the file is created and changed later.

Files in HDFS are write-once, and it is strictly required that there is only one writer at any time. The Namenode has full control over block replication: it periodically receives a heartbeat and a Blockreport from each Datanode in the cluster. Receipt of a heartbeat indicates that the Datanode is working properly, and the Blockreport lists all the blocks on that Datanode.

1. Copy storage

By default the replication factor is 3. HDFS's storage policy places the first replica on the DN node the client is connected to, the second on another node in the same rack, and the last on a node in a different rack.

Description:

HDFS rack awareness is not adaptive; it must be configured by administrators.
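The placement policy described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in these notes, not Hadoop's actual BlockPlacementPolicy; the node and rack names are made up.

```python
# Sketch of the default placement rule described above: first replica on the
# writer's own DN, second on another node in the same rack, third on a node
# in a different rack. Illustrative only, not Hadoop's implementation.
def place_replicas(writer, racks):
    """racks: dict mapping rack name -> list of DataNode names."""
    local_rack = next(r for r, nodes in racks.items() if writer in nodes)
    same_rack = next(n for n in racks[local_rack] if n != writer)
    remote_rack = next(r for r in racks if r != local_rack)
    return [writer, same_rack, racks[remote_rack][0]]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas("dn1", topology))
```

Losing one whole rack still leaves a readable replica, which is the point of spreading the third copy.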

2. Copy selection

To reduce overall bandwidth consumption and read latency, HDFS tries to have readers read from the nearest replica.

3. Safety mode

When the Namenode starts, it enters a special state called Safemode, in which it does not replicate data blocks.

Namenode receives heartbeats and Blockreport from all Datanode. Blockreport includes a list of all the blocks of a Datanode.

Each block has a specified minimum number of replicas (dfs.namenode.replication.min), and when the Namenode confirms the minimum number of replicas of a block, the block is considered secure.

Once a configurable percentage of blocks (dfs.namenode.safemode.threshold-pct) is confirmed safe, the Namenode exits Safemode. It then determines which blocks still have fewer than the specified number of replicas and replicates them to other Datanodes.
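The safe-mode exit condition is a simple ratio test; a minimal sketch, assuming the Hadoop default threshold of 0.999 for dfs.namenode.safemode.threshold-pct:

```python
# Sketch of the Safemode exit check: NN may leave Safemode once the fraction
# of blocks that have reached dfs.namenode.replication.min replicas meets
# dfs.namenode.safemode.threshold-pct (default 0.999). Illustrative only.
def can_exit_safemode(safe_blocks, total_blocks, threshold_pct=0.999):
    if total_blocks == 0:
        return True  # an empty namespace is trivially safe
    return safe_blocks / total_blocks >= threshold_pct
```

For example, with 1000 blocks, 999 safe blocks satisfy the default threshold while 998 do not.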

Persistence of file system metadata

The NN stores the HDFS namespace metadata and records every modification to file metadata in a transaction log called the Editlog. For example, creating a file in HDFS causes the NN to insert a record into the Editlog; modifying a file's replication factor likewise inserts a record. The NN stores the Editlog as a file on the local OS file system. The entire file system namespace, including the mapping of blocks to files and the file attributes, is stored in a file called the FsImage, also kept on the file system where the NN resides.

The NN keeps the entire file system namespace and file Blockmap in memory; an NN with 4 GB of RAM is sufficient to support a huge number of files and directories. When the NN starts, it reads the Editlog and FsImage from disk, applies all transactions from the Editlog to the FsImage, flushes the new version of the FsImage from memory to disk, and then truncates the old Editlog, since its transactions have already been applied. This process is called a checkpoint.
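The checkpoint can be modeled as replaying a log onto a map. The sketch below is a toy model (the opcode names echo real HDFS edit-log opcodes, but the data shapes here are invented for illustration):

```python
# Toy model of a checkpoint: replay Editlog transactions onto the FsImage
# map; the merged image would then be flushed and the old log truncated.
def checkpoint(fsimage, editlog):
    """fsimage: dict path -> replication; editlog: list of (op, path, arg)."""
    image = dict(fsimage)  # never mutate the on-disk image in place
    for op, path, arg in editlog:
        if op == "OP_ADD":
            image[path] = arg                 # create file, replication = arg
        elif op == "OP_DELETE":
            image.pop(path, None)             # remove file from namespace
        elif op == "OP_SET_REPLICATION":
            image[path] = arg                 # update replication factor
    return image

edits = [("OP_ADD", "/user/hadoop/lucl.txt", 3),
         ("OP_SET_REPLICATION", "/user/hadoop/lucl.txt", 2)]
print(checkpoint({}, edits))
```

After the replay, the old Editlog carries no information the new FsImage lacks, which is why it can be truncated safely.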

The DN stores HDFS data as files on its local file system, but it has no knowledge of the HDFS files themselves. Rather than placing all files in one directory, the DN uses a heuristic to determine the optimal number of files per directory and creates subdirectories as appropriate. When a DN starts, it scans its local file system, generates a list of the HDFS blocks corresponding to these local files, and sends the list to the NN: this is the BlockReport.

Communication protocol

All HDFS communication protocols are built on TCP/IP. Clients connect to the NN through a configurable port and talk to it via ClientProtocol, while DNs communicate with the NN via DatanodeProtocol. Both protocols are encapsulated in an RPC layer. By design, the NN never initiates an RPC itself; it only responds to requests from DNs and clients.

Robustness

The main goal of HDFS is reliable data storage in the face of failures. The three common failure types are NN failures, DN failures, and network partitions.

(1) hard disk data error, heartbeat detection and re-replication

Each DN sends heartbeat messages to the NN periodically, but a network partition may cause some DNs to lose their connection to the NN. The NN detects this through the missing heartbeats, marks those DNs as dead, and sends them no further IO requests. Data stored on a dead DN is no longer available to HDFS, so the replica counts of some blocks may fall below their configured values. The NN keeps track of the blocks that need replication and initiates it whenever necessary. Re-replication may be triggered by a DN failure, a corrupted replica, a disk error on a DN node, or an increase of a file's replication factor.
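How the NN might identify blocks needing re-replication once some DNs are marked dead can be sketched as follows (hypothetical helper, not NameNode code; block and node names are invented):

```python
# Sketch: count only replicas on live nodes and flag blocks that fell
# below the target replication factor. Illustrative only.
def under_replicated(block_locations, live_nodes, replication=3):
    """block_locations: dict block id -> set of DN names holding a replica."""
    return sorted(
        blk for blk, nodes in block_locations.items()
        if len(nodes & live_nodes) < replication
    )

locations = {"blk_1": {"dn1", "dn2", "dn3"},
             "blk_2": {"dn1", "dn4", "dn5"}}
# dn5 is dead, so blk_2 has only two live replicas and needs re-replication.
print(under_replicated(locations, live_nodes={"dn1", "dn2", "dn3", "dn4"}))
```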

(2) Cluster equilibrium

A cluster-balancing scheme is planned for HDFS: when a DN is running out of disk space, data would automatically be moved to other nodes, and when the number of requests for a file suddenly increases, extra replicas would be created on other nodes in the cluster to meet the demand. At the time of this document, this mechanism had not yet been implemented.

(3) data integrity

A data block may be corrupted by a storage device failure, a network fault, or a software bug during client access, so HDFS implements checksums over block contents and automatically verifies a block's integrity when reading it. If a problem is found, the client tries to read the data from another DN.
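The read-path verification can be sketched with a CRC, which is roughly what the blk_*.meta files shown later hold. A minimal sketch, assuming a CRC32 checksum; the helper names are invented:

```python
import zlib

# Sketch of checksum-based integrity checking: a CRC computed at write time
# is stored alongside the block (in HDFS, in the blk_*.meta file) and
# verified on every read; a mismatch makes the client try another replica.
def write_block(data):
    return data, zlib.crc32(data)           # block bytes + stored checksum

def read_block(replicas):
    """replicas: list of (data, checksum) pairs from different DNs."""
    for data, crc in replicas:
        if zlib.crc32(data) == crc:
            return data                     # first intact replica wins
    raise IOError("all replicas failed checksum verification")
```

A corrupted first replica is skipped transparently; the client only fails if every replica is bad.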

(4) metadata disk error

The FsImage and Editlog are HDFS's core data structures; if they are corrupted, the entire HDFS cluster becomes unusable. The NN can therefore be configured to keep multiple copies of the FsImage and Editlog (dfs.namenode.name.dir accepts multiple comma-separated paths), and every operation updates all copies synchronously.

Description:

The NN is a single point of failure (SPOF) in an HDFS cluster. If HA is not configured, an NN failure must be handled manually; otherwise, configure the NN in HA mode.

(5) snapshots

The design document states that HDFS does not support snapshots but that support is planned for a future release.

NameNode

The location where the NN node data is stored locally is specified by the parameter ${dfs.namenode.name.dir} in the hdfs-site.xml file.

<property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop2.6.0/data/dfs/name</value>
</property>

Check the ${dfs.namenode.name.dir} directory:

[hadoop@nnode name]$ pwd
/usr/local/hadoop2.6.0/data/dfs/name
[hadoop@nnode name]$ ls
current
[hadoop@nnode name]$

After deploying the Hadoop cluster for the first time, you need to format the disk on the NN node:

$HADOOP_HOME/bin/hdfs namenode -format

After formatting, the following directory structure is generated under the ${dfs.namenode.name.dir}/current directory:

There are four types of files in the current directory:

Transaction log files (edits_*)

File system image files (fsimage_*)

The ID of the last transaction applied to the current file system image (seen_txid)

The VERSION file

[hadoop@nnode current]$ ll -h
-rw-rw-r-- 1 hadoop hadoop   42 Nov 21 17:21 edits_0000000000000035252-0000000000000035253
-rw-rw-r-- 1 hadoop hadoop   42 Nov 21 17:23 edits_0000000000000035254-0000000000000035255
-rw-rw-r-- 1 hadoop hadoop   42 Nov 21 17:27 edits_0000000000000035256-0000000000000035257
-rw-rw-r-- 1 hadoop hadoop   42 Nov 21 17:27 edits_0000000000000035258-0000000000000035259
-rw-rw-r-- 1 hadoop hadoop   42 Nov 21 17:28 edits_0000000000000035260-0000000000000035261
-rw-rw-r-- 1 hadoop hadoop 231K Nov 21 22:29 fsimage_0000000000000035842
-rw-rw-r-- 1 hadoop hadoop   62 Nov 21 22:29 fsimage_0000000000000035842.md5
-rw-rw-r-- 1 hadoop hadoop 231K Nov 21 23:29 fsimage_0000000000000035930
-rw-rw-r-- 1 hadoop hadoop   62 Nov 21 23:29 fsimage_0000000000000035930.md5
-rw-rw-r-- 1 hadoop hadoop    6 Nov 21 17:27 seen_txid
-rw-rw-r-- 1 hadoop hadoop  204 Nov 21 23:29 VERSION

The VERSION file is a properties file:

[hadoop@nnode current]$ cat VERSION
#Sat Nov 21 23:29:03 CST 2015
namespaceID=339614018
clusterID=CID-61347f43-0510-4b07-ad99-956472c0e49f
cTime=0
storageType=NAME_NODE
blockpoolID=BP-892593412-192.168.1.117-1431598212853
layoutVersion=-60
[hadoop@nnode current]$

Field meanings:

namespaceID: the file system's unique ID, set at the first format. A DN does not know this value before it registers with the NN, so it can be used to distinguish newly created DNs.

clusterID: the cluster identity; the DNs must match this value.

cTime: the creation time of the NN storage; it is updated when the NN is upgraded.

storageType: DATA_NODE or NAME_NODE.

blockpoolID: the block pool identity, introduced later (192.168.1.117 is the originally configured IP address, which was later changed to 192.168.137.117).

layoutVersion: the version number of HDFS's persistent data structures, expressed as a negative integer.

seen_txid is the file that records the transaction ID. Its content corresponds to the tail numbers of the edits_* file names on the NN, i.e. the ID of the last transaction applied.

[hadoop@nnode current]$ ll
-rw-rw-r-- 1 hadoop hadoop     42 Nov 21 17:27 edits_0000000000000035258-0000000000000035259
-rw-rw-r-- 1 hadoop hadoop     42 Nov 21 17:28 edits_0000000000000035260-0000000000000035261
-rw-rw-r-- 1 hadoop hadoop 236143 Nov 21 22:29 fsimage_0000000000000035842
-rw-rw-r-- 1 hadoop hadoop     62 Nov 21 22:29 fsimage_0000000000000035842.md5
-rw-rw-r-- 1 hadoop hadoop 236027 Nov 21 23:29 fsimage_0000000000000035930
-rw-rw-r-- 1 hadoop hadoop     62 Nov 21 23:29 fsimage_0000000000000035930.md5
-rw-rw-r-- 1 hadoop hadoop      6 Nov 21 17:27 seen_txid
-rw-rw-r-- 1 hadoop hadoop    204 Nov 21 23:29 VERSION
[hadoop@nnode current]$ cat seen_txid
35260
[hadoop@nnode current]$
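Each edits segment file name encodes the range of transaction IDs it covers; a small hypothetical Python helper to parse it:

```python
# Hypothetical helper: parse the start/end transaction IDs out of an
# edits segment file name such as edits_<start>-<end>.
def txid_range(filename):
    span = filename[len("edits_"):]
    start, end = span.split("-")
    return int(start), int(end)

print(txid_range("edits_0000000000000035258-0000000000000035259"))
```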

The edits files are the transaction logs saved by the NN. Their storage location is specified by ${dfs.namenode.edits.dir}; by default it is the same as the NN data location ${dfs.namenode.name.dir}.

The fsimage files store the metadata of the NN node.

Description:

When the namenode -format command is executed, a namespaceID is generated and written to the VERSION file on both the namenode and the datanodes, and the two values must match. If you format again, the namenode generates a new namespaceID but the datanodes do not, which leaves the values mismatched.

Example 1: view an edits file

bin/hdfs oev -i edits_0000000000000000057-000000000000186 -o edits.xml

Example 2: view an fsimage file

bin/hdfs oiv -p XML -i fsimage_0000000000000000055 -o fsimage.xml

DataNode

The storage location of the DN node data is also configured through the hdfs-site.xml file:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop2.6.0/data/dfs/data</value>
</property>

DN node data storage structure:

/usr/local/hadoop2.6.0/data/dfs/data
|-- current
|   |-- BP-892593412-192.168.1.117-1431598212853
|   |   `-- current
|   |       |-- dfsUsed                  (contents: 168801504 1448121566578)
|   |       |-- finalized
|   |       |   `-- subdir0
|   |       |       |-- subdir0 subdir1 subdir2 ... subdir11   (directories)
|   |       |       |   -rw-rw-r-- 1 hadoop hadoop 1377 Nov 21 16:53 blk_1073744377
|   |       |       |   -rw-rw-r-- 1 hadoop hadoop   19 Nov 21 16:53 blk_1073744377_3560.meta
|   |       |       |   -rw-rw-r-- 1 hadoop hadoop 1438 Nov 21 16:53 blk_1073744378
|   |       |       |   -rw-rw-r-- 1 hadoop hadoop   19 Nov 21 16:53 blk_1073744378_3561.meta
|   |       |       |   -rw-rw-r-- 1 hadoop hadoop 1671 Nov 21 16:53 blk_1073744379
|   |       |       |   -rw-rw-r-- 1 hadoop hadoop   23 Nov 21 16:53 blk_1073744379_3562.meta
|   |       |-- rbw                      (empty directory)
|   |       |-- VERSION
|   |       |       #Sat Nov 28 12:03:36 CST 2015
|   |       |       namespaceID=339614018
|   |       |       cTime=0
|   |       |       blockpoolID=BP-892593412-192.168.1.117-1431598212853
|   |       |       layoutVersion=-56
|   |       |-- dncp_block_verification.log.curr
|   |       |-- dncp_block_verification.log.prev
|   |       `-- tmp                      (empty directory)
|   `-- VERSION
|           #Sat Nov 28 12:03:36 CST 2015
|           storageID=DS-e1bf6500-fc2f-4e73-bfe7-5712c9818965
|           clusterID=CID-61347f43-0510-4b07-ad99-956472c0e49f
|           cTime=0
|           datanodeUuid=33c2f4cb-cd26-4530-9d9b-40d0b748f8b8
|           storageType=DATA_NODE
|           layoutVersion=-56
`-- in_use.lock
        14187   (the PID of the DataNode process; check with ps -ef | grep DataNode)

Upload a file to hdfs through the hdfs command:

[hadoop@dnode1 subdir0]$ ll -h ~/httpdomainscan_192.168.1.101_0_20130913160407.hsl
-rwxrwxr-x 1 hadoop hadoop 180M Nov 28 12:06 /home/hadoop/httpdomainscan_192.168.1.101_0_20130913160407.hsl

Under the DN node directory current/finalized/subdir0/subdir14, you can see that two new block files have been created:

-rw-rw-r-- 1 hadoop hadoop 128M Nov 28 12:12 blk_1073745442
-rw-rw-r-- 1 hadoop hadoop 1.1M Nov 28 12:12 blk_1073745442_4628.meta
-rw-rw-r-- 1 hadoop hadoop  52M Nov 28 12:12 blk_1073745443
-rw-rw-r-- 1 hadoop hadoop 411K Nov 28 12:12 blk_1073745443_4629.meta

Description:

The block size in HDFS defaults to 128 MB (dfs.blocksize), but when a data file (or the tail of one) is smaller than a block, disk space is allocated according to the actual size.

My HDFS cluster has two DN nodes and the replication parameter is set to 2, so the two new block files appear under the subdir14 directory on both hosts, dnode1 and dnode2.

NN Node edits File

# At this point there is an extra edits_inprogress_0000000000000036000 file
-rw-rw-r-- 1 hadoop hadoop   42 Nov 21 17:28 edits_0000000000000035260-0000000000000035261
-rw-rw-r-- 1 hadoop hadoop 1.0M Nov 28 13:31 edits_inprogress_0000000000000036000
-rw-rw-r-- 1 hadoop hadoop 231K Nov 21 22:29 fsimage_0000000000000035842
-rw-rw-r-- 1 hadoop hadoop   62 Nov 21 22:29 fsimage_0000000000000035842.md5
-rw-rw-r-- 1 hadoop hadoop 231K Nov 21 23:29 fsimage_0000000000000035930
-rw-rw-r-- 1 hadoop hadoop   62 Nov 21 23:29 fsimage_0000000000000035930.md5
-rw-rw-r-- 1 hadoop hadoop    6 Nov 21 17:27 seen_txid
-rw-rw-r-- 1 hadoop hadoop  204 Nov 21 23:29 VERSION

Check again:

# Two more transaction logs, and seen_txid has also been updated (36002)
-rw-rw-r-- 1 hadoop hadoop     42 Nov 21 17:28 edits_0000000000000035260-0000000000000035261
-rw-rw-r-- 1 hadoop hadoop     42 Nov 28 13:55 edits_0000000000000036000-0000000000000036001
-rw-rw-r-- 1 hadoop hadoop     42 Nov 28 13:56 edits_0000000000000036002-0000000000000036003
-rw-rw-r-- 1 hadoop hadoop 236143 Nov 21 22:29 fsimage_0000000000000035842
-rw-rw-r-- 1 hadoop hadoop     62 Nov 21 22:29 fsimage_0000000000000035842.md5
-rw-rw-r-- 1 hadoop hadoop 236027 Nov 21 23:29 fsimage_0000000000000035930
-rw-rw-r-- 1 hadoop hadoop     62 Nov 21 23:29 fsimage_0000000000000035930.md5
-rw-rw-r-- 1 hadoop hadoop      6 Nov 28 13:55 seen_txid
-rw-rw-r-- 1 hadoop hadoop    204 Nov 21 23:29 VERSION

View the NN node metadata information:

hdfs oiv -p XML -i fsimage_0000000000000035842 -o fsimage.xml

Download to local and open:

Go to the DN node and find the block file with ID 1073741829:

[hadoop@dnode1 subdir0]$ cd current/finalized/subdir0
[hadoop@dnode1 subdir0]$ find . -name "*1073741829*"
./subdir0/blk_1073741829
./subdir0/blk_1073741829_1008.meta
[hadoop@dnode1 subdir0]$ cd subdir0
[hadoop@dnode1 subdir0]$ ll | grep 1073741829
-rw-rw-r-- 1 hadoop hadoop ... May 27  2015 blk_1073741829
-rw-rw-r-- 1 hadoop hadoop  11 May 27  2015 blk_1073741829_1008.meta
[hadoop@dnode1 subdir0]$ cat blk_1073741829
hello world
...
[hadoop@dnode1 subdir0]$ hdfs dfs -cat lucl.txt
hello world
...
[hadoop@dnode1 subdir0]$

Note: after the hsl file was uploaded, no corresponding record was visible in the edits files yet; in addition, the fsimage had not been modified at the same time either.

Look at the current directory after a period of time:

-rw-rw-r-- 1 hadoop hadoop      42 Nov 28 19:38 edits_0000000000000036294-0000000000000036295
-rw-rw-r-- 1 hadoop hadoop      42 Nov 28 19:40 edits_0000000000000036296-0000000000000036297
-rw-rw-r-- 1 hadoop hadoop 1048576 Nov 28 19:40 edits_inprogress_0000000000000036298
-rw-rw-r-- 1 hadoop hadoop  235504 Nov 28 18:08 fsimage_0000000000000036205
-rw-rw-r-- 1 hadoop hadoop      62 Nov 28 18:08 fsimage_0000000000000036205.md5
-rw-rw-r-- 1 hadoop hadoop  235504 Nov 28 19:08 fsimage_0000000000000036263
-rw-rw-r-- 1 hadoop hadoop      62 Nov 28 19:08 fsimage_0000000000000036263.md5
-rw-rw-r-- 1 hadoop hadoop       6 Nov 28 19:40 seen_txid
-rw-rw-r-- 1 hadoop hadoop     204 Nov 21 23:29 VERSION

View the edits file:

<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-60</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA><TXID>36029</TXID></DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_TIMES</OPCODE>
    <DATA>
      <TXID>36030</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/user/hadoop/lucl.txt</PATH>
      <MTIME>-1</MTIME>
      <ATIME>1448694798698</ATIME>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
    <DATA><TXID>36031</TXID></DATA>
  </RECORD>
</EDITS>

View the fsp_w_picpath file:

<inode>
  <id>20980</id>
  <type>FILE</type>
  <name>httpdomainscan_192.168.1.101_0_20130913160407.hsl</name>
  <replication>2</replication>
  <mtime>1448683960850</mtime>
  <atime>1448683943208</atime>
  <perferredBlockSize>134217728</perferredBlockSize>
  <permission>hadoop:hadoop:rw-r--r--</permission>
  <blocks>
    <block><id>1073745442</id><genstamp>4628</genstamp><numBytes>134217728</numBytes></block>
    <block><id>1073745443</id><genstamp>4629</genstamp><numBytes>53807068</numBytes></block>
  </blocks>
</inode>

Description: from the fsimage you can see the relationship between files and data blocks, but not the relationship between data blocks and nodes.

From the Javadoc of org.apache.hadoop.hdfs.server.namenode.NameNode:

 * NameNode serves as both directory namespace manager and
 * "inode table" for the Hadoop DFS.  There is a single NameNode
 * running in any DFS deployment.  (Well, except when there
 * is a second backup/failover NameNode, or when using federated NameNodes.)
 *
 * The NameNode controls two critical tables:
 *   1)  filename -> blocksequence (namespace)
 *   2)  block -> machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes up.

The startup process reads the fsimage file:

At startup, the NN loads the data in the fsimage and builds its in-memory metadata structures.

During startup it remains in safemode, in which HDFS data cannot be modified; by default this lasts about 30 seconds.

Safemode is switched off after a successful start.

The NN node then shows that it has started.
