
What is the basic structure and operation principle of Yarn in the Hadoop framework?


This article explains in detail the basic structure and operation principle of Yarn in the Hadoop framework. I hope you will have a good understanding of the relevant knowledge after reading it.

I. The Basic Structure of Yarn

Hadoop has three core components: the distributed file system HDFS, the distributed computing framework MapReduce, and the distributed cluster resource scheduling framework Yarn. Yarn did not exist in the early days of Hadoop; it was introduced as Hadoop was upgraded and evolved, and it follows a typical Master-Slave architecture.

Yarn includes two main processes: the resource manager (ResourceManager) and the node manager (NodeManager).

ResourceManager

Usually deployed on a separate server; handles client requests

Handles resource allocation and scheduling management across the cluster

NodeManager

Manages the resources on its own node

Executes the specific commands issued by the ResourceManager

Monitors node resources and reports to the ResourceManager

ApplicationMaster

Provides fault tolerance and splits the input data

Requests resources for the application and assigns tasks to it

Container

An abstraction for dynamic resource allocation in Yarn

A container encapsulates a certain amount of memory, CPU, and other computing resources

Containers are started and managed by the NodeManager process
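To see what a container looks like in code, here is a minimal sketch, assuming Hadoop's yarn-client library, of an ApplicationMaster asking the ResourceManager for a container with a fixed bundle of memory and CPU. The memory and vcore values are arbitrary examples, not values from the article.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
    public static void main(String[] args) throws Exception {
        // Connect this (hypothetical) ApplicationMaster to the ResourceManager
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        // A container is just a bundle of resources: here 1024 MB of memory and 1 vcore
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);

        // Ask the RM for one container anywhere in the cluster; the NodeManager
        // on the chosen host is what actually starts and manages the container
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    }
}
```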

II. Basic Operation Process

The client submits a MapReduce application to Yarn for scheduling

The RM returns the resource submission path and an ApplicationId to the client

The client submits the resources needed for the run to HDFS and then applies to run the MRAppMaster

By doing the above, the RM converts the client request into a Task

The RM communicates with the NM processes and allocates a container according to the cluster's resources

The MRAppMaster is distributed to the container allocated above

The MRAppMaster then requests further containers; what runs in those containers is a Map or Reduce task

Each task communicates with the MRAppMaster to report its status while running

After a task finishes, its process unregisters and the container resources are released

MapReduce ships with an ApplicationMaster (MRAppMaster) that follows Yarn's ApplicationMaster specification, which is why it can run on Yarn. If other computing frameworks also follow this specification, Yarn can schedule and manage the resources of all of them in a unified way.
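To make the flow above concrete, here is a minimal word-count driver; calling waitForCompletion is what triggers the whole sequence (uploading resources to HDFS, obtaining an ApplicationId from the RM, and launching the MRAppMaster in a container). Class and path names are illustrative, not from the article's source code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                ctx.write(new Text(token), ONE); // emit (word, 1)
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum)); // emit (word, count)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "yarn"); // schedule on Yarn, not locally
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Triggers the steps above: upload resources to HDFS, get an ApplicationId
        // from the RM, launch the MRAppMaster, then run the Map and Reduce tasks
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```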

III. Resource Scheduler

The basic function of the scheduler is to assign tasks to nodes for execution according to each node's resource usage and the job's requirements. The key properties of a task queue to understand are its enqueue/dequeue order, priority, capacity, and so on.

There are three main types of Hadoop job schedulers: the FIFO scheduler, the CapacityScheduler, and the FairScheduler. The default resource scheduler in Apache Hadoop is the CapacityScheduler.
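The active scheduler is selected by the yarn.resourcemanager.scheduler.class property (normally set in yarn-site.xml). As a small sketch, the snippet below reads that setting programmatically; the default value shown is the fully qualified class name of Apache Hadoop's CapacityScheduler.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerCheck {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();
        // yarn.resourcemanager.scheduler.class decides which scheduler the RM runs;
        // Apache Hadoop ships with the CapacityScheduler as the default
        String scheduler = conf.get(
            "yarn.resourcemanager.scheduler.class",
            "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
        System.out.println("Configured scheduler: " + scheduler);
    }
}
```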

FIFO scheduler

FIFO is a batch scheduler. Its scheduling strategy first orders jobs by priority and then, among jobs of equal priority, selects the job to execute according to arrival time.
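A minimal sketch of that selection rule (priority first, arrival time as tie-breaker), using a hypothetical JobInfo type rather than any real Hadoop class:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class FifoSketch {
    // Hypothetical job descriptor: a higher priority value means more urgent
    record JobInfo(String name, int priority, long arrivalTime) {}

    public static void main(String[] args) {
        // Order by priority (descending), ties broken by arrival time (ascending)
        PriorityQueue<JobInfo> queue = new PriorityQueue<>(
            Comparator.comparingInt(JobInfo::priority).reversed()
                      .thenComparingLong(JobInfo::arrivalTime));
        queue.add(new JobInfo("job1", 1, 100));
        queue.add(new JobInfo("job2", 2, 200));
        queue.add(new JobInfo("job3", 1, 50));
        // Prints job2 (highest priority), then job3 (earlier arrival), then job1
        while (!queue.isEmpty()) System.out.println(queue.poll().name());
    }
}
```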

Capacity scheduler

The CapacityScheduler supports multiple queues, and each queue can be configured with a certain amount of resources. Each queue internally uses a FIFO scheduling policy. The scheduler computes, for each queue, the ratio of running tasks to the queue's computing resources, selects the relatively idle queue with the smallest ratio, and then sorts its jobs by priority and submission time. To prevent the jobs of a single user from monopolizing the resources in a queue, the scheduler limits the amount of resources that jobs submitted by the same user can occupy.

For example, suppose the cluster's slots are divided among three queues A, B, and C according to the following allocation rules: queue A gets 20% of the resources, queue B gets 50%, and queue C gets 30%. All three queues execute their jobs in order, so job11, job21, and job31 (the first job in each queue) run first, in parallel.
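A sketch of the queue-selection step described above: pick the queue whose ratio of running tasks to configured capacity is smallest. The queue names and numbers are illustrative only.

```java
import java.util.Comparator;
import java.util.List;

public class CapacityQueuePick {
    // Hypothetical queue descriptor: capacity is the queue's share of cluster resources
    record QueueInfo(String name, double capacity, int runningTasks) {
        double load() { return runningTasks / capacity; } // ratio used for selection
    }

    public static void main(String[] args) {
        List<QueueInfo> queues = List.of(
            new QueueInfo("A", 0.20, 4),   // load 20
            new QueueInfo("B", 0.50, 5),   // load 10 (most idle)
            new QueueInfo("C", 0.30, 9));  // load 30
        // The relatively idle queue has the smallest running-tasks/capacity ratio
        QueueInfo target = queues.stream()
            .min(Comparator.comparingDouble(QueueInfo::load))
            .orElseThrow();
        System.out.println("Next job goes to queue " + target.name()); // queue B
    }
}
```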

Fair scheduler

The FairScheduler is similar in principle to the capacity scheduler: it supports multiple queues and multiple users, the amount of resources in each queue can be configured, and jobs in the same queue share all of the queue's resources fairly.

For example, consider three queues (A, B, C). Jobs in each queue are allocated resources according to priority: the higher the priority, the more resources a job receives, but every job is assigned some resources to ensure fairness. When resources are limited, there is a gap between the ideal computing resources each job should receive and the resources it actually receives; this gap is called the deficit (or vacancy). Within the same queue, the job with the larger deficit is given priority for resources, so jobs are executed in order of their deficits.
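A sketch of that deficit rule: within a queue, compute each job's deficit (ideal fair share minus actual allocation) and schedule the job with the largest deficit first. The JobShare type and all values are made up for illustration.

```java
import java.util.Comparator;
import java.util.List;

public class FairDeficitPick {
    // Hypothetical job descriptor with ideal and actual resource shares (fractions of the queue)
    record JobShare(String name, double idealShare, double actualShare) {
        double deficit() { return idealShare - actualShare; } // the "vacancy"
    }

    public static void main(String[] args) {
        List<JobShare> jobs = List.of(
            new JobShare("job1", 0.40, 0.30),   // deficit 0.10
            new JobShare("job2", 0.35, 0.10),   // deficit 0.25 (largest)
            new JobShare("job3", 0.25, 0.25));  // deficit 0.00
        // The job furthest below its ideal fair share is scheduled first
        JobShare next = jobs.stream()
            .max(Comparator.comparingDouble(JobShare::deficit))
            .orElseThrow();
        System.out.println("Schedule first: " + next.name()); // job2
    }
}
```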

Source code: GitHub https://github.com/cicadasmile/big-data-parent and GitEE https://gitee.com/cicadasmile/big-data-parent

That is all on the basic structure and operation principle of Yarn in the Hadoop framework. I hope the above content is of some help to you and that you can learn more from it. If you think the article is good, feel free to share it for more people to see.
