What are the differences between NameNode and DataNode in HDFS distributed storage

Updated: 2025-04-26. From: SLTechnology News&Howtos

This article introduces the differences between NameNode and DataNode in HDFS distributed storage, along with the other modules and characteristics of HDFS that put those differences in context.

Distributed storage framework

Distributed storage technology relies on an underlying distributed storage framework. By storage type, these frameworks fall into block storage, object storage, and file storage. Among mainstream distributed storage technologies, HDFS is file storage, Swift is object storage, and Ceph supports block, object, and file storage, which is why it is called unified storage.

HDFS is one of the core components of Hadoop and the basis of data storage management in distributed computing. It is designed as a distributed file system that runs on commodity hardware.

The functional modules of HDFS

(1) Client

The Client is how users interact with HDFS. When a file is uploaded, the Client splits it into blocks and then uploads them. The Client obtains the file's location information from the NameNode and reads or writes data by interacting with DataNodes. The Client also provides commands for managing HDFS, such as formatting the NameNode, and for accessing HDFS, such as creating, deleting, modifying, and querying files.

(2) NameNode

The NameNode is the master node of HDFS. It maintains the file system tree and the metadata for all the files and directories in that tree. Within HDFS, the NameNode handles client read and write requests, manages the mapping information of data blocks (Block), and configures replica policies.
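The block mapping can be pictured as two lookup tables: one from file path to block IDs, and one from block ID to the DataNodes that hold a replica. The sketch below is a toy in-memory model, not the Hadoop API; the class `ToyNameNode`, its method names, and the block and node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical in-memory model of NameNode metadata (not the Hadoop API)."""

    # file path -> ordered list of block IDs
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    # block ID -> names of the DataNodes holding a replica
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_file(self, path: str, placements: list[tuple[str, list[str]]]) -> None:
        """Record a file as an ordered list of (block_id, datanodes) pairs."""
        self.file_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, datanodes in placements:
            self.block_locations[block_id] = set(datanodes)

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        """Answer a client lookup: which DataNodes hold each block of the file?"""
        return [
            (block_id, sorted(self.block_locations[block_id]))
            for block_id in self.file_blocks[path]
        ]


namenode = ToyNameNode()
namenode.add_file(
    "/logs/app.log",
    [("blk_1", ["dn1", "dn2", "dn3"]), ("blk_2", ["dn2", "dn3", "dn4"])],
)
locations = namenode.get_block_locations("/logs/app.log")
print(locations)
```

Note that the model holds only metadata. The file contents never pass through it, which matches the division of labor described here: the NameNode answers "where", and the DataNodes hold the bytes.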

(3) DataNode

The NameNode issues commands and the DataNode performs the actual operations. A DataNode stores the actual data blocks and carries out the reads and writes on them.

(4) Secondary NameNode

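The main function of the Secondary NameNode is to assist the NameNode and share part of its workload, for example by periodically merging the edit log into the file system image (fsimage). It can help restore the NameNode in an emergency, but it is not a hot standby: it cannot replace the NameNode and provide services.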
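In Hadoop, this assistance mainly takes the form of checkpointing: the edit log of namespace changes is folded into a snapshot of the namespace (the fsimage) so that the log does not grow without bound. The following is a minimal sketch of that idea only. The tuple-based edit records and the function names `apply_edit` and `checkpoint` are assumptions made for illustration and do not reflect Hadoop's on-disk formats.

```python
def apply_edit(namespace: dict[str, list[str]], edit: tuple) -> None:
    """Apply one logged namespace change in place (hypothetical record format)."""
    op = edit[0]
    if op == "create":
        _, path = edit
        namespace[path] = []
    elif op == "add_block":
        _, path, block_id = edit
        namespace[path].append(block_id)
    elif op == "rename":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    elif op == "delete":
        _, path = edit
        del namespace[path]
    else:
        raise ValueError(f"unknown edit: {op}")


def checkpoint(fsimage: dict[str, list[str]], edit_log: list[tuple]) -> dict[str, list[str]]:
    """Return a new image with every edit folded in; the inputs are not modified."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image


fsimage = {"/a.txt": ["blk_1"]}
edit_log = [
    ("create", "/b.txt"),
    ("add_block", "/b.txt", "blk_2"),
    ("rename", "/a.txt", "/c.txt"),
]
new_fsimage = checkpoint(fsimage, edit_log)
print(new_fsimage)
```

After a checkpoint, the merged image can replace the old one and the log can start again from empty, which is why this work helps a NameNode restart faster without making the helper a replacement for it.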

Advantages of HDFS

Fault tolerance: HDFS automatically saves multiple copies (replicas) of the data, and adding replicas improves fault tolerance. When one of the replicas is lost, it is restored automatically.

Low cost: HDFS can be built on inexpensive machines and relies on the multi-replica mechanism for reliability.
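Automatic restoration of a lost replica can be sketched as a planning step: for each block, count the replicas on nodes that are still alive, and pick new target nodes for any shortfall. The function `plan_rereplication` below is hypothetical. The default of 3 replicas is Hadoop's default replication factor rather than something stated in this article, and choosing targets alphabetically is a simplification (real HDFS placement also considers racks and load).

```python
def plan_rereplication(
    block_locations: dict[str, set[str]],
    live_nodes: set[str],
    replication: int = 3,
) -> dict[str, list[str]]:
    """Return {block_id: [target nodes]} for blocks below the replication target."""
    plan = {}
    for block_id, holders in block_locations.items():
        live_holders = holders & live_nodes
        missing = replication - len(live_holders)
        if missing <= 0 or not live_holders:
            # Either enough replicas survive, or no source is left to copy from.
            continue
        # Simplification: take the first eligible nodes in alphabetical order.
        candidates = sorted(live_nodes - live_holders)
        plan[block_id] = candidates[:missing]
    return plan


block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
}
# dn2 has failed, so both blocks are down to two live replicas.
live_nodes = {"dn1", "dn3", "dn4", "dn5"}
plan = plan_rereplication(block_locations, live_nodes)
print(plan)
```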

Characteristics of HDFS

Fault detection and recovery: because HDFS runs on a large amount of commodity hardware, component failures are frequent. HDFS therefore needs mechanisms for fast, automatic fault detection and recovery.

Large datasets: an HDFS cluster should scale to hundreds of nodes in order to serve applications with large datasets.

Computation near the data: running a task close to where its data is stored reduces network traffic and increases throughput, especially when large datasets are involved.
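The fault detection characteristic listed above is commonly built on heartbeats: each DataNode reports in periodically, and a node that stays silent for too long is treated as failed. The sketch below shows only that timeout check. The function name `find_dead_nodes`, the timestamps, and the 30-second timeout are illustrative assumptions, not HDFS defaults.

```python
def find_dead_nodes(
    last_heartbeat: dict[str, float],
    now: float,
    timeout_seconds: float = 30.0,
) -> list[str]:
    """Return the nodes whose most recent heartbeat is older than the timeout."""
    return sorted(
        node
        for node, seen_at in last_heartbeat.items()
        if now - seen_at > timeout_seconds
    )


# Times are seconds on an arbitrary clock (illustrative values).
last_heartbeat = {"dn1": 100.0, "dn2": 62.0, "dn3": 97.0}
dead = find_dead_nodes(last_heartbeat, now=103.0)
print(dead)
```

Once a node is declared dead, the blocks it held become candidates for recovery from their remaining replicas.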

The function of HDFS

Distributed storage and processing of data.

Hadoop provides a command interface to interact with HDFS.

The built-in servers for namenode and datanode help users easily check the status of the cluster.

Streaming access to file system data.

HDFS provides file permissions and authentication.

Architecture of HDFS

HDFS follows a master-slave architecture made up of the following elements:

(1) Namenode

The namenode is commodity hardware running the GNU/Linux operating system and the namenode software, which is designed to run on such hardware. The system hosting the namenode acts as the master server and performs the following tasks.

Manage file system namespaces

Regulate client access to files

Perform file system operations, such as renaming, closing, and opening files and directories.

(2) Datanode

The datanode is commodity hardware running the GNU/Linux operating system and the datanode software. Every node (commodity hardware/system) in a cluster runs a datanode, which manages the data storage of that system.

Perform read and write operations on the file system as requested by the client.

Perform operations such as block creation, deletion, and replication according to the namenode's instructions.
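The two datanode duties above can be sketched with a toy class: it serves block reads and writes, and it carries out delete and replicate instructions. Everything here is hypothetical, including the class `ToyDataNode`, the command strings, and the in-memory dictionary standing in for local disk. Real datanodes exchange these instructions with the namenode over Hadoop's own protocols.

```python
from typing import Optional


class ToyDataNode:
    """Hypothetical DataNode that keeps blocks in memory instead of on disk."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[str, bytes] = {}

    def write_block(self, block_id: str, data: bytes) -> None:
        """Create (or overwrite) a block, as requested by a client."""
        self.blocks[block_id] = data

    def read_block(self, block_id: str) -> bytes:
        """Serve a block read; raises KeyError if the block is not stored here."""
        return self.blocks[block_id]

    def handle_command(
        self, command: str, block_id: str, target: Optional["ToyDataNode"] = None
    ) -> None:
        """Carry out an instruction of the kind a namenode would issue."""
        if command == "delete":
            self.blocks.pop(block_id, None)
        elif command == "replicate":
            if target is None:
                raise ValueError("replicate needs a target datanode")
            target.write_block(block_id, self.blocks[block_id])
        else:
            raise ValueError(f"unknown command: {command}")


dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
dn1.write_block("blk_1", b"hello hdfs")
dn1.handle_command("replicate", "blk_1", target=dn2)  # copy blk_1 to dn2
dn1.handle_command("delete", "blk_1")                 # then drop it from dn1
print(dn2.read_block("blk_1"), sorted(dn1.blocks))
```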

(3) Block

Typically, user data is stored in files in HDFS. A file is divided into one or more segments, which are stored on individual data nodes. These file segments are called blocks. In other words, a block is the minimum amount of data that HDFS can read or write. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later, and it can be changed in the HDFS configuration (the dfs.blocksize property).
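The effect of the block size is easy to work out by hand: a file is cut at multiples of the block size, and the last block holds whatever remains. The sketch below does that arithmetic. The function name `block_layout` and the 200 MB example file are assumptions chosen for illustration.

```python
MB = 1024 * 1024


def block_layout(file_size: int, block_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) for each block; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


# A hypothetical 200 MB file under the two block sizes discussed above.
layout_64 = block_layout(200 * MB, 64 * MB)
layout_128 = block_layout(200 * MB, 128 * MB)
print([length // MB for _, length in layout_64])   # [64, 64, 64, 8]
print([length // MB for _, length in layout_128])  # [128, 72]
```

A larger block size means fewer blocks per file, and therefore fewer entries for the namenode to track.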
