Introduction to hadoop related processes 02/12 Update SLTechnology News&Howtos


2026-02-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. NameNode:

The NameNode acts as the leader of the cluster and is responsible for scheduling. For example, suppose you need to store a 1280 MB file.

If the file is split into 128 MB blocks, the NameNode assigns the resulting 10 blocks (replicas are not considered here) to DataNodes in the cluster and records the mapping. When you later read the file, the NameNode tells the client which DataNodes hold each block. It mainly maintains two maps: one from each file to its blocks, and one from each block to the DataNodes that store it.
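The bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not HDFS code: the block-ID format and the round-robin placement are hypothetical simplifications (real HDFS placement is rack-aware and replicated), but the two maps mirror the file-to-block and block-to-node relationships the NameNode keeps.

```python
import math
from itertools import cycle

BLOCK_SIZE_MB = 128  # block size used in the example above


def split_into_blocks(file_name, file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return hypothetical block IDs, one per block_size_mb chunk (last may be partial)."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return [f"{file_name}#blk_{i}" for i in range(n_blocks)]


def place_blocks(file_name, file_size_mb, datanodes):
    """Build the two maps: file -> blocks and block -> DataNodes.

    Placement is plain round-robin with a single copy per block (an assumption).
    """
    file_to_blocks = {file_name: split_into_blocks(file_name, file_size_mb)}
    block_to_nodes = {}
    nodes = cycle(datanodes)
    for block_id in file_to_blocks[file_name]:
        block_to_nodes[block_id] = [next(nodes)]
    return file_to_blocks, block_to_nodes


def locate(file_name, file_to_blocks, block_to_nodes):
    """Answer a read request: for each block of the file, which DataNodes hold it."""
    return [(b, block_to_nodes[b]) for b in file_to_blocks[file_name]]


if __name__ == "__main__":
    f2b, b2n = place_blocks("/data/big.log", 1280, ["dn1", "dn2", "dn3"])
    print(len(f2b["/data/big.log"]))  # 10
    print(locate("/data/big.log", f2b, b2n)[:2])
```

A client would use the result of `locate` to contact the listed DataNodes directly; the NameNode itself only hands out locations.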

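2. Secondary NameNode:

It keeps a periodic snapshot (checkpoint) of the NameNode's metadata. Based on values set in the configuration, it decides how often to copy the metadata from the NameNode (the fsimage and the edit log), merges it into a new checkpoint, and sends that checkpoint back to the NameNode.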
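The checkpoint cycle can be sketched as two steps: decide whether a checkpoint is due, then replay the edit log onto the last fsimage. The defaults below correspond to `dfs.namenode.checkpoint.period` (3600 seconds) and `dfs.namenode.checkpoint.txns` (1,000,000 transactions) in Hadoop's `hdfs-default.xml`; the `(op, path, value)` edit format and the in-memory "fsimage" dict are hypothetical stand-ins for the real binary files.

```python
CHECKPOINT_PERIOD_S = 3600   # dfs.namenode.checkpoint.period default
CHECKPOINT_TXNS = 1_000_000  # dfs.namenode.checkpoint.txns default


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_s=CHECKPOINT_PERIOD_S, max_txns=CHECKPOINT_TXNS):
    """A checkpoint is due when either the time limit or the edit-count limit is reached."""
    return seconds_since_last >= period_s or uncheckpointed_txns >= max_txns


def merge_checkpoint(fsimage, edits):
    """Replay edit-log entries onto a copy of the fsimage (hypothetical op format)."""
    image = dict(fsimage)
    for op, path, value in edits:
        if op == "create":
            image[path] = value
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown op {op!r}")
    return image


if __name__ == "__main__":
    if should_checkpoint(seconds_since_last=4000, uncheckpointed_txns=12):
        new_image = merge_checkpoint(
            {"/a": ["blk_1"]},
            [("create", "/b", ["blk_2"]), ("delete", "/a", None)],
        )
        print(new_image)  # {'/b': ['blk_2']}
```

Merging on a separate machine keeps the edit log short without making the NameNode itself pause to rebuild its image.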

3. NodeManager (NM):

It is the per-node agent in YARN that manages a single compute node in the Hadoop cluster. Its responsibilities include maintaining communication with the ResourceManager, managing the life cycle of Containers, monitoring the resource usage (memory, CPU, etc.) of each Container, tracking node health, managing logs, and running the auxiliary services used by different applications.
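A toy model of these duties follows: launching containers, checking each container's memory against its limit, and building a status report for the ResourceManager. It is loosely modelled on the NodeManager's container memory checks; the class names, fields, and report format are hypothetical, and only memory is tracked.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    container_id: str
    mem_limit_mb: int
    mem_used_mb: int = 0
    state: str = "RUNNING"


@dataclass
class NodeManager:
    node_id: str
    total_mem_mb: int
    healthy: bool = True
    containers: dict = field(default_factory=dict)

    def launch(self, container):
        """Start a container that an ApplicationMaster asked this node to run."""
        self.containers[container.container_id] = container

    def monitor(self):
        """Kill running containers whose memory use exceeds their limit; return their IDs."""
        killed = []
        for c in self.containers.values():
            if c.state == "RUNNING" and c.mem_used_mb > c.mem_limit_mb:
                c.state = "KILLED"
                killed.append(c.container_id)
        return killed

    def heartbeat(self):
        """Build the (hypothetical) status report sent to the ResourceManager."""
        running = [c for c in self.containers.values() if c.state == "RUNNING"]
        allocated = sum(c.mem_limit_mb for c in running)
        return {
            "node_id": self.node_id,
            "healthy": self.healthy,
            "available_mem_mb": self.total_mem_mb - allocated,
            "container_states": {cid: c.state for cid, c in self.containers.items()},
        }


if __name__ == "__main__":
    nm = NodeManager("node1", 8192)
    nm.launch(Container("c1", mem_limit_mb=2048, mem_used_mb=1000))
    nm.launch(Container("c2", mem_limit_mb=1024, mem_used_mb=1500))
    print(nm.monitor())  # ['c2']
    print(nm.heartbeat())
```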

4. DataNode:

a. The first task of a DataNode is K-V (key-value) storage: it stores block data keyed by block ID.

b. It communicates with the NameNode through periodic IPC heartbeats. In addition, it exchanges information with clients and with other DataNodes.

c. Bulk data transfer with clients and other nodes requires direct communication, which is implemented over sockets.
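The three duties can be illustrated with a small in-memory sketch: a dict acting as the block-ID-to-bytes store, status reports for the NameNode, and a block streamed in chunks over a socket. Everything here is hypothetical (real DataNodes keep blocks on local disk, send heartbeats and block reports over Hadoop RPC, and use their own data-transfer protocol); `socket.socketpair()` simply stands in for a network connection.

```python
import socket
import threading

CHUNK = 4  # tiny chunk size so the example streams several pieces


class BlockStore:
    """K-V store: block ID -> block bytes (in memory instead of on disk)."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id, data):
        self._blocks[block_id] = data

    def get(self, block_id):
        return self._blocks[block_id]

    def heartbeat(self, node_id):
        """Periodic liveness/usage report for the NameNode (hypothetical fields)."""
        return {"node_id": node_id, "used_bytes": sum(len(d) for d in self._blocks.values())}

    def block_report(self, node_id):
        """List of blocks held on this node (hypothetical format)."""
        return {"node_id": node_id, "block_ids": sorted(self._blocks)}


def send_block(store, block_id, sock):
    """Stream a block over a socket in CHUNK-sized pieces, then close the write side."""
    data = store.get(block_id)
    for i in range(0, len(data), CHUNK):
        sock.sendall(data[i:i + CHUNK])
    sock.shutdown(socket.SHUT_WR)


def receive_block(sock):
    """Read until the sender closes its side, then return the reassembled block."""
    parts = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    store = BlockStore()
    store.put("blk_1", b"hello hdfs block")
    dn_side, client_side = socket.socketpair()
    sender = threading.Thread(target=send_block, args=(store, "blk_1", dn_side))
    sender.start()
    print(receive_block(client_side))  # b'hello hdfs block'
    sender.join()
    dn_side.close()
    client_side.close()
```

The split mirrors the text: small control messages go to the NameNode, while bulk block data flows directly between the DataNode and its peer.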

5. ResourceManager:

In YARN, the ResourceManager is responsible for the unified management and allocation of all resources in the cluster. It receives resource reports from each node (NodeManager) and allocates these resources to each application (strictly speaking, to its ApplicationMaster) according to a scheduling policy.

The RM works together with the NodeManagers (NMs) on each node and the ApplicationMaster (AM) of each application.

A. NodeManagers follow instructions from the ResourceManager and manage the resources available on a single node.

B. ApplicationMasters negotiate resources with the ResourceManager and work with NodeManagers to launch containers.
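To make the RM's role concrete, here is a deliberately simplified allocator: NodeManager reports supply free memory per node, ApplicationMaster requests are served in arrival order, and each container goes to the first node with enough room. This FIFO/first-fit policy and the data shapes are assumptions for illustration; real YARN schedulers (FIFO, Capacity, Fair) also consider CPU, queues, and locality.

```python
def allocate(node_reports, requests):
    """FIFO, first-fit sketch of resource allocation.

    node_reports: {node_id: free_mem_mb}, as reported by NodeManagers.
    requests: [(app_id, container_mem_mb, num_containers)], from ApplicationMasters.
    Returns {app_id: [node_id, ...]}, one entry per granted container.
    """
    free = dict(node_reports)
    grants = {}
    for app_id, mem_mb, count in requests:
        granted = grants.setdefault(app_id, [])
        for _ in range(count):
            node = next((n for n, m in free.items() if m >= mem_mb), None)
            if node is None:
                break  # no node has room; the rest of this request stays pending
            free[node] -= mem_mb
            granted.append(node)
    return grants


if __name__ == "__main__":
    print(allocate({"nm1": 4096, "nm2": 2048},
                   [("app_1", 2048, 2), ("app_2", 2048, 2)]))
    # {'app_1': ['nm1', 'nm1'], 'app_2': ['nm2']}
```

Each AM would then contact the listed NodeManagers to launch its granted containers.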

6. Introduction to the Hadoop 2 MR-JobHistory Service

1) Purpose of the MR-JobHistory service

It mainly lets users query the history of completed MapReduce jobs.

Detailed explanation:

A) When an MR job runs, its MR ApplicationMaster saves the job history information to the HDFS path specified in mapred-site.xml (first to a temporary directory, then moved to the final directory).

B) If NodeManager log aggregation is not enabled in yarn-site.xml, the history server cannot serve detailed MR job logs (both the YARN and the MapReduce configuration are required when submitting an MR job).

C) The history server provides two interfaces: the web UI supports both job history and detailed log queries, while the REST API only supports job history queries.
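The three points above can be sketched in Python, under clearly stated assumptions. Part A shows the write-to-temporary-then-move pattern with local files (file names are hypothetical; the real AM writes to HDFS). Part B checks whether `yarn.log-aggregation-enable` (which defaults to false) is set to true in a yarn-site.xml string. Part C builds the history server's `/ws/v1/history/mapreduce/jobs` REST URL and pulls `(id, state)` pairs out of its JSON response; the host name is hypothetical and 19888 is the default JobHistory web port.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


# A) Write history to a temporary location, then move it into the final directory.
def publish_history(tmp_dir, done_dir, job_id, history_text):
    """Readers of done_dir never see a half-written file (hypothetical file naming)."""
    tmp_path = os.path.join(tmp_dir, job_id + ".tmp")
    final_path = os.path.join(done_dir, job_id)
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(history_text)
    os.replace(tmp_path, final_path)  # atomic when both paths are on the same filesystem
    return final_path


# B) Check whether yarn-site.xml enables NodeManager log aggregation.
def log_aggregation_enabled(yarn_site_xml):
    """True only if yarn.log-aggregation-enable is "true"; absent means false."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "yarn.log-aggregation-enable":
            return prop.findtext("value", "").strip().lower() == "true"
    return False


# C) REST API: job list URL and a summary of the JSON body it returns.
def jobs_url(base):
    return base.rstrip("/") + "/ws/v1/history/mapreduce/jobs"


def summarize_jobs(response_text):
    """Extract (job id, state) pairs; tolerate an empty or null job list."""
    body = json.loads(response_text)
    jobs = (body.get("jobs") or {}).get("job") or []
    return [(job["id"], job["state"]) for job in jobs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        tmp, done = os.path.join(d, "tmp"), os.path.join(d, "done")
        os.makedirs(tmp)
        os.makedirs(done)
        print(publish_history(tmp, done, "job_demo", "history events"))
    print(log_aggregation_enabled(
        "<configuration><property><name>yarn.log-aggregation-enable</name>"
        "<value>true</value></property></configuration>"))  # True
    print(jobs_url("http://historyserver.example.com:19888"))
```

Against a live server, the response text for `summarize_jobs` could be fetched with `urllib.request.urlopen(jobs_url(...)).read().decode()`; detailed container logs are not available through this endpoint, matching point C.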

