How many nodes are there in the HDFS cluster

2026-03-17 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31 report

This article explains which types of nodes make up an HDFS cluster, what each one does, and how the cluster protects itself against the loss of the name node.

An HDFS cluster has two types of nodes that operate in a manager-worker pattern: one name node (the manager) and multiple data nodes (the workers). The name node manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in two forms: the namespace image and the edit log. The name node also knows the data nodes on which each block of each file resides, but it does not store block locations persistently, because this information is rebuilt from the data nodes when the system starts.
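
The split between persistent and in-memory state can be illustrated with a minimal sketch. Everything here is hypothetical and is not Hadoop's implementation: the class `ToyNameNode`, the file names `fsimage.json` and `edits.log`, and the JSON record format are assumptions chosen only to show that the namespace survives a restart while block locations do not.

```python
import json
import os
import tempfile


class ToyNameNode:
    """Illustrative stand-in for a name node; not Hadoop's implementation."""

    def __init__(self, storage_dir):
        # Hypothetical file names for the namespace image and the edit log.
        self.image_path = os.path.join(storage_dir, "fsimage.json")
        self.edits_path = os.path.join(storage_dir, "edits.log")
        self.namespace = {}        # path -> list of block IDs (persisted)
        self.block_locations = {}  # block ID -> data node names (memory only)
        self._load()

    def _load(self):
        # Start from the last saved namespace image, if there is one.
        if os.path.exists(self.image_path):
            with open(self.image_path) as f:
                self.namespace = json.load(f)
        # Replay the edits recorded since that image was written.
        if os.path.exists(self.edits_path):
            with open(self.edits_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, edit):
        if edit["op"] == "create":
            self.namespace[edit["path"]] = edit["blocks"]
        elif edit["op"] == "delete":
            self.namespace.pop(edit["path"], None)

    def log_edit(self, edit):
        # Record the change on disk first, then apply it in memory.
        with open(self.edits_path, "a") as f:
            f.write(json.dumps(edit) + "\n")
        self._apply(edit)


storage = tempfile.mkdtemp()
nn = ToyNameNode(storage)
nn.log_edit({"op": "create", "path": "/logs/a.txt", "blocks": ["blk_1", "blk_2"]})
nn.block_locations["blk_1"] = {"datanode-1"}  # never written to disk

# Simulate a restart: a new instance reads only what is on disk.
restarted = ToyNameNode(storage)
print(restarted.namespace)        # namespace recovered from the edit log
print(restarted.block_locations)  # empty until data nodes report their blocks
```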

The client accesses the file system on behalf of the user by communicating with the name node and the data nodes. The client presents a file system interface similar to POSIX (Portable Operating System Interface), so user code does not need to know about the name node and data nodes in order to work.
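
As a rough sketch of that division of labor, the client below asks a name node which blocks make up a file and where they live, then fetches the bytes from data nodes, while the caller sees only a single `read(path)` call. All class and method names (`ToyNameNode`, `ToyDataNode`, `ToyClient`, `get_block_locations`, `read_block`) are hypothetical and do not correspond to the real Hadoop client API.

```python
class ToyNameNode:
    """Knows which blocks make up a file and which data nodes hold them."""

    def __init__(self):
        self.files = {"/data/hello.txt": ["blk_1", "blk_2"]}
        self.locations = {"blk_1": ["dn1"], "blk_2": ["dn2"]}

    def get_block_locations(self, path):
        # Returns metadata only; no file contents pass through the name node.
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


class ToyDataNode:
    """Holds the actual bytes of the blocks assigned to it."""

    def __init__(self, blocks):
        self.blocks = blocks

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyClient:
    """Hides the name node and data nodes behind a file-like read call."""

    def __init__(self, name_node, data_nodes):
        self.name_node = name_node
        self.data_nodes = data_nodes

    def read(self, path):
        chunks = []
        for block_id, holders in self.name_node.get_block_locations(path):
            data_node = self.data_nodes[holders[0]]  # pick the first holder
            chunks.append(data_node.read_block(block_id))
        return b"".join(chunks)


client = ToyClient(
    ToyNameNode(),
    {"dn1": ToyDataNode({"blk_1": b"hello, "}), "dn2": ToyDataNode({"blk_2": b"world"})},
)
content = client.read("/data/hello.txt")
print(content)
```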

Data nodes are the workers of the file system. They store and retrieve blocks (when asked to by clients or by the name node), and they periodically report back to the name node with a list of the blocks they are storing.
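
The block-report idea can be sketched as follows: each data node lists the blocks it holds, and the name node's block-to-node map is derived entirely from those lists. The names `ToyDataNode`, `block_report`, and `rebuild_block_map` are assumptions made for this illustration, not Hadoop identifiers.

```python
from collections import defaultdict


class ToyDataNode:
    """Illustrative data node that stores blocks and can report them."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def block_report(self):
        # The report carries block IDs only, not the block contents.
        return sorted(self.blocks)


def rebuild_block_map(data_nodes):
    """Build block ID -> set of data node names from the nodes' reports."""
    block_map = defaultdict(set)
    for data_node in data_nodes:
        for block_id in data_node.block_report():
            block_map[block_id].add(data_node.name)
    return dict(block_map)


dn1 = ToyDataNode("dn1")
dn2 = ToyDataNode("dn2")
dn1.store("blk_1", b"aaa")
dn1.store("blk_2", b"bbb")
dn2.store("blk_2", b"bbb")  # a second copy of the same block

block_map = rebuild_block_map([dn1, dn2])
print(block_map)
```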

Without the name node, the file system cannot be used. In fact, if the machine running the name node were destroyed, all the files on the file system would be lost, because there would be no way to reconstruct them from the blocks on the data nodes. It is therefore important that the name node be resilient to failure, and Hadoop provides two mechanisms for this.

The first mechanism is to back up the files that make up the persistent state of the file system metadata. Hadoop can be configured so that the name node writes its persistent state to multiple file systems. These writes are synchronous and atomic. The usual configuration choice is to write to the local disk and, at the same time, to a remote NFS mount.
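
In Hadoop itself this is a matter of configuration rather than code you write, but the synchronous part of the idea can be sketched briefly. The function `append_edit_everywhere` and the file name `edits.log` are hypothetical, and the second temporary directory merely stands in for an NFS mount. The sketch flushes each copy to stable storage before returning; it does not attempt to make the write atomic across directories.

```python
import os
import tempfile


def append_edit_everywhere(directories, record):
    """Append one record to the log in every directory before returning."""
    for directory in directories:
        path = os.path.join(directory, "edits.log")  # hypothetical file name
        with open(path, "a") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to storage


local_dir = tempfile.mkdtemp()   # stands in for the local disk
remote_dir = tempfile.mkdtemp()  # stands in for a remote NFS mount

append_edit_everywhere([local_dir, remote_dir], "mkdir /user")

# Both copies now contain the same record.
copies = []
for directory in (local_dir, remote_dir):
    with open(os.path.join(directory, "edits.log")) as f:
        copies.append(f.read())
print(copies)
```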

The second mechanism is to run a secondary name node, which, despite its name, does not act as a name node. Its main role is to periodically merge the namespace image with the edit log, to prevent the edit log from growing too large. The secondary name node usually runs on a separate physical machine, because the merge requires a lot of CPU and memory. It keeps a copy of the merged namespace image, which can be used if the name node fails. However, the state of the secondary name node lags behind that of the primary, so if the primary fails completely, some data loss is almost certain. In that case, the usual course of action is to copy the name node's metadata files from the NFS mount to the secondary name node and run it as the new primary name node.
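
The merge step, and the reason the secondary's copy lags, can be sketched as below. This is a simplified model under stated assumptions: the namespace is a plain dictionary, edits are tuples such as `("create", path, blocks)`, and the function names `apply_edit` and `checkpoint` are invented for the illustration.

```python
def apply_edit(namespace, edit):
    """Apply one edit tuple to a namespace dictionary in place."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = list(edit[2])
    elif op == "delete":
        namespace.pop(path, None)


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (new image, empty log)."""
    merged = {path: list(blocks) for path, blocks in image.items()}
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


image = {"/a": ["blk_1"]}
edit_log = [("create", "/b", ["blk_2"]), ("delete", "/a")]

# The secondary merges the two and keeps the result; the edit log starts over.
secondary_copy, edit_log = checkpoint(image, edit_log)
print(secondary_copy)

# Edits made on the primary after the checkpoint are not in the secondary's copy.
edit_log.append(("create", "/c", ["blk_3"]))
print("/c" in secondary_copy)
```

The last line is the lag described above: anything recorded after the most recent checkpoint exists only in the primary's edit log, which is why the copy of the metadata on NFS matters for recovery.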
