
An introduction to the basic concepts of HDFS

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. What is the design idea behind HDFS?

HDFS is a distributed file system used to store large amounts of data on clusters of inexpensive machines.

1. Large files are split into blocks. Following a divide-and-conquer approach, many servers jointly store and manage the same file.

2. Each block is replicated redundantly and the replicas are distributed to different servers, which ensures high reliability and protects against data loss.
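The two ideas above can be sketched in a few lines of Python. This is only an illustration: the function names, the server names, and the round-robin placement are assumptions made for the example, and real HDFS uses its own placement policy rather than this one.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, servers: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct servers.

    Round-robin placement is a simplification chosen for this sketch,
    not the policy HDFS actually uses.
    """
    if replication > len(servers):
        raise ValueError("replication factor exceeds number of servers")
    return {
        block_id: [servers[(block_id + r) % len(servers)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical example: a 1000-byte "file", 256-byte blocks, four servers.
data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
placement = place_replicas(len(blocks), ["s1", "s2", "s3", "s4"], replication=3)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Because every block lives on three different servers in this sketch, losing any single server still leaves two copies of each block.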

2. What is the structure of HDFS?

NameNode: the leader of the cluster. It is in charge of the file system directory tree and handles client read and write requests.

SecondaryNameNode: periodically merges the NameNode's edit log into the persisted metadata image (checkpointing), which relieves pressure on the NameNode. It is not a hot standby for the NameNode.

DataNode: DataNodes store the data blocks of the cluster and handle the actual reading and writing of data.
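The division of labor between the roles can be illustrated with a toy in-memory model. All class and method names here are hypothetical and do not correspond to Hadoop's API; the sketch only shows that the NameNode holds metadata while the DataNodes hold the bytes.

```python
class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[str, bytes] = {}  # block id -> block bytes

    def write_block(self, block_id: str, payload: bytes) -> None:
        self.blocks[block_id] = payload

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.files: dict[str, list[str]] = {}      # path -> ordered block ids
        self.locations: dict[str, list[str]] = {}  # block id -> DataNode names

    def add_file(self, path: str, block_locations: dict[str, list[str]]) -> None:
        self.files[path] = list(block_locations)
        self.locations.update(block_locations)

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        return [(b, self.locations[b]) for b in self.files[path]]


def read_file(namenode: NameNode, datanodes: dict[str, DataNode], path: str) -> bytes:
    """Ask the NameNode where the blocks are, then fetch bytes from DataNodes."""
    chunks = []
    for block_id, holders in namenode.get_block_locations(path):
        chunks.append(datanodes[holders[0]].read_block(block_id))
    return b"".join(chunks)


# Hypothetical cluster with two DataNodes and one two-block file.
datanodes = {"dn1": DataNode("dn1"), "dn2": DataNode("dn2")}
for dn in datanodes.values():
    dn.write_block("blk_1", b"hello ")
    dn.write_block("blk_2", b"world")

namenode = NameNode()
namenode.add_file("/logs/a.txt", {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn1"]})

content = read_file(namenode, datanodes, "/logs/a.txt")
print(content)
```

In this model the file contents never pass through the NameNode, which is why a single metadata node can serve a large cluster.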

3. What are the characteristics of HDFS?

- Files in HDFS are physically stored in blocks. The block size can be set with the configuration parameter dfs.blocksize. The default is 128 MB in Hadoop 2.x and 64 MB in earlier versions.

- The HDFS file system presents a unified abstract directory tree to the client, and the client accesses files by path.

- The NameNode is the master node of the HDFS cluster. It maintains the directory tree of the entire HDFS file system and the block information for each path (file): the block IDs and the DataNode servers that hold them.

- DataNodes are the slave nodes of the HDFS cluster. Each block can be stored as multiple replicas on multiple DataNodes. The number of replicas is set with the parameter dfs.replication, and the default is 3.

- HDFS is designed for write-once, read-many scenarios and does not support modifying a file in place.
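As a worked example of the block size and replication parameters, the following sketch computes how a file is laid out under the defaults quoted above (128 MB blocks, 3 replicas). The function name and the 300 MB file size are assumptions made for illustration.

```python
MB = 1024 * 1024


def block_layout(file_size: int, block_size: int = 128 * MB, replication: int = 3) -> dict[str, int]:
    """Return block count, size of the last block, replica count, and raw bytes stored."""
    num_blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    last_block = file_size - (num_blocks - 1) * block_size if num_blocks else 0
    return {
        "blocks": num_blocks,
        "last_block_bytes": last_block,
        "replicas": num_blocks * replication,
        # A partial final block occupies only its actual size,
        # so raw usage is file size times replication.
        "raw_bytes": file_size * replication,
    }


layout = block_layout(300 * MB)
print(layout)
```

A 300 MB file therefore becomes three blocks (128 MB, 128 MB, and 44 MB), nine block replicas in total, and 900 MB of raw storage across the cluster.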

4. What are the advantages and disadvantages of HDFS?

Advantages:

- Can be built on cheap machines: reliability is improved through multiple replicas, with fault-tolerance and recovery mechanisms.

- High fault tolerance: multiple replicas of the data are saved automatically, and a lost replica is recovered automatically.

- Suitable for batch processing: computation is moved to the data rather than the data to the computation, and data locations are exposed to the computing framework.

- Suitable for big data processing: data at the GB, TB, or even PB scale.

- Streaming file access: write once, read many times, which ensures data consistency.

Disadvantages:

- Low-latency data access: HDFS is optimized for high throughput and is not suitable for low-latency access.

- Small-file access: HDFS is not suitable for storing large numbers of small files, which consume NameNode memory and make seek time exceed read time.

- No concurrent writes or random modification: a file can have only one writer at a time, and data can only be appended, not inserted at arbitrary positions.
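The small-file drawback can be made concrete with a rough estimate of NameNode metadata. The figure of about 150 bytes per namespace object (file or block) is a commonly quoted rule of thumb and is treated here as an assumption, not a measured value. The function name and file sizes are likewise illustrative.

```python
MB = 1024 * 1024
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not an exact figure


def namenode_objects(num_files: int, file_size: int, block_size: int = 128 * MB) -> int:
    """Count namespace objects (one per file plus one per block) for non-empty files."""
    blocks_per_file = (file_size + block_size - 1) // block_size
    return num_files * (1 + blocks_per_file)


# The same 1 GiB of data, stored as one large file or as 1024 files of 1 MiB.
one_big = namenode_objects(num_files=1, file_size=1024 * MB)
many_small = namenode_objects(num_files=1024, file_size=1 * MB)

print(one_big, one_big * BYTES_PER_OBJECT)
print(many_small, many_small * BYTES_PER_OBJECT)
```

Under these assumptions the many-small-files layout needs over 200 times as much NameNode metadata for the same amount of data, which is why large numbers of small files strain the NameNode.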

