What is Hadoop NameNode? 04/10 Update SLTechnology News&Howtos

2025-04-10 Update From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com) 05/31 Report

This article explains what the Hadoop NameNode is, along with the other daemons that make up a Hadoop cluster.

What does "run Hadoop" mean?

It means running a set of daemons on different servers distributed across the network. Each daemon has a specific role: some run on only a single server, while others run on many servers.

What are these daemons?

NameNode (name node)

DataNode (data node)

Secondary NameNode (secondary name node)

JobTracker (job tracker)

TaskTracker (task tracker)

What is the structure of distributed storage?

The distributed storage system is called the Hadoop Distributed File System, or HDFS for short.

Hadoop uses a master/slave architecture for both distributed computing and distributed storage.

What is NameNode and what does it do?

The NameNode is the most important of the Hadoop daemons.

The NameNode sits on the master side of HDFS and directs the DataNodes on the slave side to perform low-level I/O tasks.

The NameNode tracks how files are split into blocks, which nodes store those blocks, and whether the distributed file system as a whole is healthy.
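To make this bookkeeping concrete, here is a minimal Python sketch of the two mappings described above: file to blocks, and block to the DataNodes holding a replica. The class and method names are hypothetical, the real NameNode keeps much richer metadata, and counting under-replicated blocks stands in for just one of its health checks.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy model of the NameNode's in-memory bookkeeping (names are hypothetical)."""
    files: dict = field(default_factory=dict)      # path -> ordered list of block IDs
    locations: dict = field(default_factory=dict)  # block ID -> set of DataNode hosts
    next_block_id: int = 0

    def add_file(self, path, num_blocks):
        """Register a file and allocate IDs for the blocks it is split into."""
        ids = list(range(self.next_block_id, self.next_block_id + num_blocks))
        self.next_block_id += num_blocks
        self.files[path] = ids
        for block_id in ids:
            self.locations[block_id] = set()
        return ids

    def record_replica(self, block_id, datanode):
        """Note that `datanode` holds a copy of `block_id`."""
        self.locations[block_id].add(datanode)

    def under_replicated(self, target):
        """Blocks with fewer replicas than `target`: one simple health signal."""
        return sorted(b for b, hosts in self.locations.items() if len(hosts) < target)


meta = NameNodeMetadata()
ids = meta.add_file("/logs/day1.txt", num_blocks=2)  # ids == [0, 1]
for host in ("dn1", "dn2", "dn3"):
    meta.record_replica(0, host)
meta.record_replica(1, "dn1")
print(meta.under_replicated(target=3))  # [1]
```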

Running the NameNode consumes a lot of memory and I/O resources. To reduce the load on that machine, the server hosting the NameNode usually does not store user data or run the computation tasks of MapReduce programs. In other words, the NameNode server is typically neither a DataNode nor a TaskTracker.

However, the NameNode's importance also has a downside: it is a single point of failure for the Hadoop cluster (in the Hadoop 1.x architecture this article describes; Hadoop 2.x added NameNode high availability). If the node hosting any other daemon suffers a software or hardware failure, the cluster will most likely keep running smoothly, or that node can simply be restarted quickly. This does not apply to the NameNode.

What is DataNode and what does it do?

Each slave node hosts a DataNode daemon that does the heavy lifting of the distributed file system: reading and writing HDFS blocks as actual files on the local file system.

When a client reads or writes an HDFS file, the file is split into blocks, and the NameNode tells the client which DataNode holds each block. The client then communicates directly with the DataNode daemons to work with the local files that correspond to those blocks. DataNodes also communicate with one another to replicate blocks for redundancy.
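The read path can be sketched as follows, assuming simple in-memory stand-ins for the daemons (all class and function names here are hypothetical, not Hadoop APIs). The key point is that the NameNode only returns block locations; the file data itself flows from the DataNodes to the client, which falls back to another replica if a DataNode is unavailable.

```python
class DataNode:
    """Stand-in for a DataNode: block ID -> bytes (a local file on a real node)."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Stand-in for the NameNode: it only answers metadata queries."""
    def __init__(self):
        self.file_blocks = {}      # path -> [block IDs]
        self.block_locations = {}  # block ID -> [DataNode names holding a replica]

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def read_file(path, namenode, live_datanodes):
    """Client-side read: one metadata lookup, then data straight from DataNodes."""
    chunks = []
    for block_id, hosts in namenode.get_block_locations(path):
        for host in hosts:  # try each replica in turn
            node = live_datanodes.get(host)
            if node is not None and block_id in node.blocks:
                chunks.append(node.read_block(block_id))
                break
        else:
            raise OSError(f"no live replica for block {block_id}")
    return b"".join(chunks)


dn2, dn3 = DataNode("dn2"), DataNode("dn3")
dn2.blocks[0] = b"hello "
dn3.blocks[1] = b"world"
nn = NameNode()
nn.file_blocks["/greeting.txt"] = [0, 1]
nn.block_locations = {0: ["dn1", "dn2"], 1: ["dn3"]}
# dn1 is down (not in the live map), so block 0 is served by its replica on dn2.
print(read_file("/greeting.txt", nn, {"dn2": dn2, "dn3": dn3}))  # b'hello world'
```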

How does the NameNode interact with the DataNodes?

In HDFS, the NameNode tracks the file metadata, while the DataNodes store the blocks themselves.

What file metadata does the NameNode hold?

The metadata describes which files the system contains and how each file is split into blocks. The DataNodes store replicas of the blocks and continuously report to the NameNode to keep this metadata up to date.
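A minimal sketch of how DataNode reports keep the NameNode's metadata current, assuming each report lists every block the DataNode holds (a full block report). The class and method names are hypothetical, and real HDFS combines such reports with heartbeats, which this sketch leaves out.

```python
from collections import defaultdict


class BlockMap:
    """Hypothetical NameNode-side record of the blocks each DataNode reported."""
    def __init__(self):
        self.by_node = {}  # DataNode name -> set of block IDs from its latest report

    def process_block_report(self, datanode, block_ids):
        # A full report replaces the previous one, so blocks the DataNode
        # no longer holds disappear from the metadata.
        self.by_node[datanode] = set(block_ids)

    def locations(self):
        """Invert the reports into block ID -> set of DataNodes holding it."""
        result = defaultdict(set)
        for node, blocks in self.by_node.items():
            for block_id in blocks:
                result[block_id].add(node)
        return dict(result)


bm = BlockMap()
bm.process_block_report("dn1", [1, 2])  # initial reports at startup
bm.process_block_report("dn2", [2, 3])
bm.process_block_report("dn1", [2])     # later report: dn1 no longer has block 1
print(sorted(bm.locations()))           # [2, 3]
```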

How are files stored on the DataNodes?

Files are stored on HDFS as blocks, with a default block size of 64 MB (the Hadoop 1.x default; Hadoop 2.x and later default to 128 MB). The NameNode decides which DataNodes store each block, and by default each block has 3 replicas, so the data is not lost if a DataNode crashes. When it starts up, each DataNode reports to the NameNode which blocks it currently stores. After that, it keeps sending updates about its local blocks to the NameNode and receives instructions from it.
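The arithmetic behind block storage is easy to check with a short sketch. It assumes the 64 MB block size and replication factor of 3 given above; the function names are hypothetical, and the round-robin placement is a deliberate simplification of HDFS's actual rack-aware placement policy. A 200 MB file becomes three full blocks plus an 8 MB final block (HDFS does not pad the last block), and with three replicas it occupies 600 MB of raw disk across the cluster.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # Hadoop 1.x default block size
REPLICATION = 3       # default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file is split into; the last block holds only the remainder."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Round-robin each block's replicas onto distinct DataNodes.
    Simplification: real HDFS uses a rack-aware placement policy."""
    if replication > len(datanodes):
        raise ValueError("fewer DataNodes than the replication factor")
    n = len(datanodes)
    return [[datanodes[(b + r) % n] for r in range(replication)] for b in range(num_blocks)]


sizes = split_into_blocks(200 * MB)
print(len(sizes), sizes[-1] // MB)     # 4 8
print(sum(sizes) * REPLICATION // MB)  # 600  (MB of raw disk across the cluster)
print(place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])[0])  # ['dn1', 'dn2', 'dn3']
```

Because every block lands on three distinct DataNodes, losing any single DataNode still leaves at least two copies of each block.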

What does Secondary NameNode do?

The Secondary NameNode (SNN) is an assistant daemon for monitoring the state of the HDFS cluster. Like the NameNode, there is one SNN per cluster, and it usually runs on a dedicated server that hosts no DataNode or TaskTracker daemons. Unlike the NameNode, the SNN does not receive or record any real-time changes to HDFS. Instead, it communicates with the NameNode to take snapshots of the HDFS metadata at an interval set in the cluster configuration.
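In practice, the SNN's "snapshots" are checkpoints: the NameNode persists its metadata as an image file (fsimage) plus an edit log of recent changes, and the SNN periodically merges the two into a fresh image so the log does not grow without bound (in Hadoop 1.x the interval is set by fs.checkpoint.period). The Python sketch below models only the merge step, using a dict for the image and tuples for logged operations; the operation names and functions are hypothetical.

```python
import copy


def apply_edit(image, edit):
    """Apply one logged namespace operation to an fsimage-like dict (path -> block IDs)."""
    op, path, *args = edit
    if op == "create":
        image[path] = list(args[0])
    elif op == "delete":
        image.pop(path, None)
    elif op == "rename":
        image[args[0]] = image.pop(path)
    else:
        raise ValueError(f"unknown op {op!r}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image; return the new image and an empty log."""
    new_image = copy.deepcopy(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


image = {"/a.txt": [1, 2]}
log = [("create", "/b.txt", [3]), ("rename", "/a.txt", "/c.txt"), ("delete", "/b.txt")]
image, log = checkpoint(image, log)
print(image, log)  # {'/c.txt': [1, 2]} []
```

Working on a copy means the NameNode's current image is untouched until the merged result is ready, and a shorter edit log is what lets a restarted NameNode recover faster.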

As mentioned earlier, the NameNode is a single point of failure for a Hadoop cluster, and the SNN's snapshots help reduce downtime and the risk of data loss. Recovering from a NameNode failure still requires human intervention, however: an administrator must manually reconfigure the cluster to use the SNN as the primary NameNode.

What is JobTracker?

The JobTracker daemon is the link between your application and Hadoop.

What does the JobTracker do?

Once code is submitted to the cluster, the JobTracker determines the execution plan: it decides which files to process, assigns nodes to the different tasks, and monitors all running tasks. If a task fails, the JobTracker automatically relaunches it, possibly on a different node, up to a predefined number of retries.
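The retry behavior can be sketched as a loop that relaunches a failed task on a node it has not yet failed on, giving up after a fixed number of attempts. Here `execute` is a hypothetical callback standing in for actually launching the task, and the limit of 4 is an assumed value that mirrors the default of Hadoop 1.x's mapred.map.max.attempts setting.

```python
MAX_ATTEMPTS = 4  # assumed retry limit


def run_with_retries(task, nodes, execute, max_attempts=MAX_ATTEMPTS):
    """Try `task` on successive nodes until one succeeds or the attempts run out.
    `execute(task, node)` is a hypothetical launcher that returns True on success."""
    tried = []
    for attempt in range(max_attempts):
        # Prefer a node this task has not failed on yet; reuse nodes only if all were tried.
        candidates = [n for n in nodes if n not in tried] or nodes
        node = candidates[0]
        tried.append(node)
        if execute(task, node):
            return node, attempt + 1
    raise RuntimeError(f"{task} failed after {max_attempts} attempts on {tried}")


def flaky(task, node):
    return node != "tt1"  # simulate a broken node


print(run_with_retries("map_0007", ["tt1", "tt2", "tt3"], flaky))  # ('tt2', 2)
```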

How many JobTracker daemons are there in a Hadoop cluster?

There is only one JobTracker daemon per Hadoop cluster, and it usually runs on the cluster's master node.

What is TaskTracker?

Like the storage daemons, the computation daemons follow a master/slave architecture: the JobTracker, as the master, oversees the overall execution of a MapReduce job, while the TaskTracker on each slave node manages the execution of individual tasks.

Each TaskTracker is responsible for running the individual tasks assigned to it by the JobTracker. Although each slave node runs only one TaskTracker, each TaskTracker can spawn multiple JVMs (Java virtual machines) to handle many map or reduce tasks in parallel.

One of the TaskTracker's responsibilities is to communicate continuously with the JobTracker. If the JobTracker does not receive a "heartbeat" from a TaskTracker within a specified time, it assumes that the TaskTracker has crashed and resubmits its tasks to other nodes in the cluster.
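Heartbeat-based failure detection can be sketched as a table of last-seen timestamps. The class is hypothetical, the clock is injectable so the example is deterministic, and the 600-second timeout in the usage is only an illustrative value.

```python
import time


class HeartbeatMonitor:
    """Hypothetical JobTracker-side bookkeeping for TaskTracker heartbeats."""
    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}  # tracker -> time of its last heartbeat
        self.assigned = {}   # tracker -> task IDs currently running there

    def heartbeat(self, tracker):
        self.last_seen[tracker] = self.clock()
        self.assigned.setdefault(tracker, [])

    def expire_dead(self):
        """Forget trackers that missed the timeout; return their tasks for resubmission."""
        now = self.clock()
        orphaned = []
        for tracker, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_s:
                del self.last_seen[tracker]
                orphaned.extend(self.assigned.pop(tracker, []))
        return orphaned


fake_now = [0.0]
mon = HeartbeatMonitor(timeout_s=600, clock=lambda: fake_now[0])
mon.heartbeat("tt1")
mon.heartbeat("tt2")
mon.assigned["tt1"] += ["map_0001", "map_0002"]
fake_now[0] = 500.0
mon.heartbeat("tt2")  # only tt2 keeps reporting
fake_now[0] = 700.0
orphaned = mon.expire_dead()
print(orphaned)  # ['map_0001', 'map_0002']
```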

How does the JobTracker interact with the TaskTrackers?

When a client calls the JobTracker to start a data processing job, the JobTracker splits the work into map and reduce tasks and assigns them to the TaskTrackers in the cluster.
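A minimal sketch of the splitting and assignment step, assuming one map task per input split, a fixed number of reduce tasks, and a single pool of free slots per TaskTracker (real Hadoop 1.x keeps separate map and reduce slots and uses a pluggable scheduler). The function is hypothetical; it prefers a TaskTracker on a node that already stores a split's data, in line with the data-locality goal of the typical topology.

```python
def assign_tasks(splits, num_reduces, trackers):
    """Assign one map task per input split plus `num_reduces` reduce tasks.

    splits:   list of (split_id, hosts storing that split's data)
    trackers: dict of TaskTracker host -> number of free task slots
    Returns a dict of TaskTracker host -> list of task IDs.
    """
    free = dict(trackers)
    plan = {host: [] for host in trackers}

    def take_slot(preferred):
        # Data-local trackers first, then whichever tracker has the most free slots.
        for host in list(preferred) + sorted(free, key=free.get, reverse=True):
            if free.get(host, 0) > 0:
                free[host] -= 1
                return host
        raise RuntimeError("no free slots left")

    for split_id, hosts in splits:
        plan[take_slot(hosts)].append(f"map_{split_id}")
    for r in range(num_reduces):
        plan[take_slot([])].append(f"reduce_{r}")
    return plan


splits = [(0, ["node1"]), (1, ["node2"]), (2, ["node1"])]
plan = assign_tasks(splits, num_reduces=1, trackers={"node1": 2, "node2": 2})
print(plan)  # {'node1': ['map_0', 'map_2'], 'node2': ['map_1', 'reduce_0']}
```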

What is a typical Hadoop topology?

Run the NameNode and JobTracker daemons on the master node, and run the SNN on a separate node so that its metadata snapshots survive if the master node fails. In a small cluster, the SNN can also reside on a slave node; in a large cluster, the NameNode and JobTracker are each placed on their own machine. Each slave node hosts a DataNode and a TaskTracker, so that tasks run on the same node where their data is stored.
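The layout rules above can be written down as a small function that maps hosts to daemons. The function name and the "small" / "medium" / "large" labels are hypothetical; the paragraph does not define a size threshold, so the caller chooses which layout applies.

```python
def plan_topology(hosts, size="medium"):
    """Assign Hadoop 1.x daemons to hosts; `size` is 'small', 'medium' or 'large'."""
    layout = {h: [] for h in hosts}
    if size == "large":       # NameNode, JobTracker and SNN each get their own machine
        nn, jt, snn, *slaves = hosts
        layout[nn].append("NameNode")
        layout[jt].append("JobTracker")
        layout[snn].append("SecondaryNameNode")
    elif size == "medium":    # one master for NameNode + JobTracker, SNN on its own node
        master, snn, *slaves = hosts
        layout[master] += ["NameNode", "JobTracker"]
        layout[snn].append("SecondaryNameNode")
    elif size == "small":     # SNN shares the first slave node
        master, *slaves = hosts
        layout[master] += ["NameNode", "JobTracker"]
    else:
        raise ValueError(f"unknown size {size!r}")
    if not slaves:
        raise ValueError("at least one slave host is required")
    for s in slaves:
        layout[s] += ["DataNode", "TaskTracker"]  # compute runs where the data lives
    if size == "small":
        layout[slaves[0]].append("SecondaryNameNode")
    return layout


print(plan_topology(["m1", "s1", "s2"], size="small"))
# {'m1': ['NameNode', 'JobTracker'], 's1': ['DataNode', 'TaskTracker', 'SecondaryNameNode'], 's2': ['DataNode', 'TaskTracker']}
```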
