This article discusses the architecture and principles of the Yarn resource scheduling system. Readers who need this background are welcome to use it as a reference.
1. Yarn introduction
Apache Hadoop YARN is a subproject of the Apache Software Foundation's Hadoop project, introduced in Hadoop 2.0 to separate resource management from the computing components. YARN was born because data stored in HDFS needed more interaction modes than just MapReduce. The YARN architecture in Hadoop 2.0 supports additional processing frameworks, such as Spark, and no longer forces the use of the MapReduce framework.
From the Hadoop 2.0 architecture diagram, we can see that YARN takes over the resource management functions originally handled by MapReduce and packages them so that they can be used by new data processing engines. This also simplifies MapReduce itself, allowing it to focus on what it does best: processing data. With YARN, many applications can run on Hadoop under common resource management, and many organizations have already developed YARN-based applications.
2. Yarn architecture
The architecture of YARN is a classic master/slave structure, as shown in the figure below. In general, a YARN service consists of one ResourceManager (RM) and multiple NodeManagers (NM); the ResourceManager is the master node and the NodeManagers are the slave nodes.
In the YARN architecture, the global ResourceManager runs as the primary daemon and arbitrates the available cluster resources among competing applications. The ResourceManager tracks the number of live nodes and the resources available on the cluster, and coordinates when and which of the applications submitted by users should acquire these resources. The ResourceManager is the single process that holds this information, so it can make allocation (or, more precisely, scheduling) decisions in a shared, secure, multi-tenant manner (for example, according to application priority, queue capacity, ACLs, data locality, and so on).
When a user submits an application, a lightweight process instance called the ApplicationMaster is launched to coordinate the execution of all tasks in the application. This includes monitoring tasks, restarting failed tasks, speculatively running slow tasks, and aggregating the application's counter values. These responsibilities were previously concentrated in a single JobTracker for all jobs. The ApplicationMaster and the tasks belonging to its application run in resource containers controlled by NodeManagers.
The ApplicationMaster can run any type of task inside a container. For example, the MapReduce ApplicationMaster requests containers to start map or reduce tasks, while the Giraph ApplicationMaster requests containers to run Giraph tasks. You can also implement a custom ApplicationMaster that runs specific tasks and, in this way, create a shiny new distributed application framework that changes the big data world. I encourage you to read about Apache Twill, which is designed to simplify writing distributed applications on top of YARN.
A cluster of ResourceManager, NodeManagers, and containers can run any distributed application and does not care about the type of application or task. All application-framework-specific code is simply moved into its ApplicationMaster, so YARN can support any distributed framework as long as someone implements the appropriate ApplicationMaster for it. Thanks to this general approach, the dream of a Hadoop YARN cluster running many different workloads has come true: imagine a single Hadoop cluster in a data center running MapReduce, Giraph, Storm, Spark, Tez/Impala, MPI, and so on.
Core components:
ResourceManager (RM): the global resource manager, responsible for resource management and allocation for the whole cluster (see 2.1 below).
ApplicationMaster (AM): the guardian and manager of an Application, responsible for monitoring and managing the running of all the Application's Attempts on each node in the cluster, and for requesting resources from and returning resources to the Yarn ResourceManager.
NodeManager (NM): an independent process running on each slave node, responsible for reporting the node's status (disk, memory, CPU usage, and so on).
Container: the unit in which yarn allocates resources, covering memory, CPU, and so on; YARN allocates resources in units of Containers.
2.1 、 ResourceManager
The RM is the global resource manager; there is only one per cluster. It is responsible for resource management and allocation for the whole system, including handling client requests, starting and monitoring ApplicationMasters, monitoring NodeManagers, and resource allocation and scheduling. It consists mainly of two components: the scheduler (Scheduler) and the applications manager (Applications Manager, ASM).
(1) Scheduler
The scheduler allocates the resources in the system to running applications according to capacity, queue, and other constraints (for example, each queue is allocated certain resources and can run at most a certain number of jobs). Note that this scheduler is a "pure scheduler": it does not take on any work specific to an application, such as monitoring or tracking the application's execution status, or restarting tasks that failed because of application errors or hardware faults; those duties belong to the application's ApplicationMaster.
The scheduler allocates resources purely according to each application's resource requirements, and the unit of resource allocation is represented by an abstract concept, the Resource Container (Container for short). A Container is a dynamic resource allocation unit that encapsulates resources such as memory, CPU, disk, and network, thereby limiting the amount of resources each task can use.
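As a minimal illustration (not from the original article), this is roughly what a Container's resource description looks like in the YARN Java records API, assuming the hadoop-yarn-api library is on the classpath:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerSpecSketch {
    public static void main(String[] args) {
        // The capability a Container encapsulates: 1024 MB of memory and 1 virtual core.
        Resource capability = Resource.newInstance(1024, 1);
        // The priority the scheduler uses when ranking outstanding requests.
        Priority priority = Priority.newInstance(0);
        System.out.println("capability=" + capability + ", priority=" + priority);
    }
}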
(2) Application Manager
The applications manager is responsible for managing all the applications in the system: it receives job submission requests, allocates the first Container for each application to run its ApplicationMaster, negotiates resources with the scheduler to start that ApplicationMaster, monitors the ApplicationMaster's running status, and restarts it on failure.
2.2 、 ApplicationMaster
The ApplicationMaster manages each instance of an application running within YARN. The management of a job or application is the responsibility of the ApplicationMaster process, and Yarn allows us to develop an ApplicationMaster for our own application.
Functions:
(1) Data splitting
(2) Requesting resources for the application and further assigning them to its internal tasks (Task)
(3) Task monitoring and fault tolerance
The ApplicationMaster is responsible for negotiating resources from the ResourceManager and, working through the NodeManagers, monitoring task execution and resource usage. The dynamic nature of Yarn lies in the ApplicationMasters of multiple applications communicating with the ResourceManager dynamically, continually requesting, releasing, re-requesting, and re-releasing resources.
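As a rough sketch of that interaction (my own illustration, not code from the article), an ApplicationMaster written directly against the YARN Java client API registers with the ResourceManager, requests a container, and deregisters roughly like this; the class name and resource sizes are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AmSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();

        // Register this ApplicationMaster with the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for one container of 1024 MB / 1 vcore with no locality constraint.
        rmClient.addContainerRequest(new ContainerRequest(
                Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

        // Heartbeat loop: allocate() both carries outstanding requests to the RM
        // and brings back any containers the scheduler has granted.
        boolean granted = false;
        while (!granted) {
            AllocateResponse response = rmClient.allocate(0.1f);
            for (Container c : response.getAllocatedContainers()) {
                System.out.println("Granted " + c.getId() + " on " + c.getNodeId());
                granted = true;   // a real AM would hand the container to an NMClient here
            }
            Thread.sleep(1000);
        }

        // Return resources and deregister once the work is done.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rmClient.stop();
    }
}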
2.3 、 NodeManager
There are multiple NodeManagers in a cluster, one per node, each responsible for the resources on that node and their usage.
NodeManager is a slave service: it receives resource allocation requests from the ResourceManager and assigns specific Containers to applications. It also monitors Container usage and reports it to the ResourceManager. Together with the ResourceManager, the NodeManagers handle resource allocation across the whole Hadoop cluster.
Functions: tracking the resource usage on its node and the running status of each Container (CPU, memory, and other resources):
(1) Receiving and processing command requests from the ResourceManager and assigning Containers to the application's tasks
(2) Reporting to the RM periodically so that the whole cluster keeps running smoothly; the RM tracks the health of the entire cluster by collecting the reports from each NodeManager, while each NodeManager monitors its own health
(3) Processing requests from ApplicationMasters
(4) Managing the life cycle of each Container on its node
(5) Managing the logs on its node
(6) Running auxiliary services used by applications on top of Yarn, such as the shuffle service of MapReduce
When a node starts, it registers with the ResourceManager and reports how many resources it has available. At run time, the NodeManager and ResourceManager work together to keep this information up to date and the whole cluster in its best state. A NodeManager only manages its own Containers; it knows nothing about the applications running in them. The component responsible for application information is the ApplicationMaster.
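To make the NodeManager's role more concrete, here is a minimal hedged sketch (not from the article) of how an ApplicationMaster asks a NodeManager to launch a process inside a Container it has already been granted, using the YARN NMClient API; the echo command is only a placeholder:

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;

public class ContainerLaunchSketch {
    // Launch a trivial shell command inside a container already granted by the RM.
    static void launch(Configuration conf, Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                Collections.emptyMap(),                            // local resources (jars, files)
                Collections.emptyMap(),                            // environment variables
                Collections.singletonList("echo hello-from-yarn"), // command(s) to run
                null, null, null);                                 // service data, tokens, ACLs

        // The NodeManager on the container's node starts the process and
        // enforces the Container's memory/CPU limits for its lifetime.
        nmClient.startContainer(container, ctx);
    }
}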
2.4 、 Container
A Container is the resource abstraction in YARN. It encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network. When an AM requests resources from the RM, the resources the RM returns are represented as Containers. YARN assigns one Container to each task, and the task can only use the resources described by that Container.
The relationship between Containers and cluster nodes is that a node runs multiple Containers, but a Container never spans nodes. Any job or application must run in one or more Containers. In the Yarn framework, the ResourceManager only tells the ApplicationMaster which Containers are available; the ApplicationMaster still has to ask the NodeManager to launch each specific Container.
Note that a Container is a dynamic resource-division unit, generated dynamically according to the application's needs. So far, YARN supports only CPU and memory resources, and it uses the lightweight resource isolation mechanism Cgroups for resource isolation.
2.5, Resource Request and Container
The design goal of Yarn is to let our various applications use the whole cluster in a shared, secure, multi-tenant way. To keep cluster resource scheduling and data access efficient, Yarn must also be aware of the entire cluster topology.
To achieve these goals, the ResourceManager's Scheduler defines some flexible protocols for application resource requests, through which applications running in the cluster can be scheduled better. This is where Resource Request and Container come in. An application first sends the resource requests that meet its needs to its ApplicationMaster; the ApplicationMaster then forwards them to the ResourceManager's Scheduler in the form of resource-requests, and the Scheduler returns the allocated resources, described as Containers, against the original resource-requests. Each ResourceRequest can be treated as a serializable Java object that contains the following field information:
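The fields are, roughly (summarizing the standard Hadoop documentation rather than the original article): the resource name (a specific host, a rack, or * for "any node"), the priority of the request, the capability (memory and virtual cores), and the number of containers wanted. A minimal sketch of constructing one with the YARN records API, with illustrative values:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResourceRequestSketch {
    public static void main(String[] args) {
        ResourceRequest request = ResourceRequest.newInstance(
                Priority.newInstance(1),        // priority of this request
                ResourceRequest.ANY,            // resource name: a host, a rack, or "*"
                Resource.newInstance(2048, 2),  // capability: 2048 MB of memory, 2 vcores
                3);                             // number of containers wanted
        System.out.println(request);
    }
}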
2.6 、 JobHistoryServer
The Job History Service records the running history of jobs scheduled in yarn. No extra configuration is needed on the data-node machines in the cluster; it can be started on its own with the mr-jobhistory-daemon.sh start historyserver command. After it starts successfully, a JobHistoryServer process appears (you can check with the jps command), and job details can be viewed on port 19888.
Open the interface shown in the figure below and click History; the page will redirect.
The page that opens after clicking History is blank, as shown in the figure below, because we have not started the jobhistoryserver. Execute the mr-jobhistory-daemon.sh start historyserver command on the three machines in turn to start the jobhistoryserver, beginning with the node1 node.
After starting the jobhistoryserver on all three nodes, run the wordcount program again (remember to delete the output directory first).
Clicking the History link now jumps to a new page. At the bottom of that page, the TaskType column lists map and reduce, and the Total column shows the number of map and reduce tasks required by the running mapreduce program.
2.7 、 Timeline Server
The Timeline Server is used to write log-style service data and is generally used together with third-party frameworks (such as Spark). According to the official documentation, it is an effective complement to the jobhistoryserver: the jobhistoryserver can only record job information for MapReduce applications, whereas the Timeline Server also records more fine-grained information about running jobs, such as which queue a task ran in and which user was set when the task ran. The Timeline Server is more powerful, but it is not a replacement for the jobhistoryserver; the two services complement each other.
3. The operation principle of yarn application
How does YARN work? The basic idea of YARN is to split the two major functions of the JobTracker/TaskTracker (resource management and job scheduling/monitoring) among the following entities:
1. A global resource manager, the ResourceManager
2. One ApplicationMaster per application
3. One NodeManager per slave node
4. Containers running on the NodeManagers, in which the application's tasks execute
The ResourceManager and the NodeManagers form a new, general-purpose system for managing applications in a distributed manner. The ResourceManager has the ultimate authority for arbitrating resources among the applications in the system. The ApplicationMaster is a framework-specific entity whose responsibility is to negotiate resources with the ResourceManager while working with the NodeManager(s) to execute and monitor the component tasks. The ResourceManager has a scheduler that allocates resources to the various running applications according to constraints such as queue capacities and user limits; the scheduler schedules based on the applications' resource requests. The NodeManager launches the applications' containers, monitors their resource usage, and reports to the ResourceManager. Each ApplicationMaster is responsible for negotiating appropriate resource containers from the scheduler, tracking their status, and monitoring their progress. From the system's point of view, the ApplicationMaster itself runs as an ordinary container.
3.1 、 The process of yarn application submission
The execution of an Application in Yarn can be summarized in three steps:
(1) Application submission
(2) Starting the application's ApplicationMaster instance
(3) The ApplicationMaster instance managing the application's execution
The specific process is as follows (a client-side submission sketch in Java follows the list):
(1) The client program submits the application to the ResourceManager and requests an ApplicationMaster instance
(2) The ResourceManager finds a NodeManager that can run a Container and starts the ApplicationMaster instance in that Container
(3) The ApplicationMaster registers with the ResourceManager. After registration, the client can query the ResourceManager for the details of its ApplicationMaster and then interact with its ApplicationMaster directly (at this point the client actively communicates with the ApplicationMaster and sends it the resource requests that meet its needs)
(4) During normal operation, the ApplicationMaster sends resource-request messages to the ResourceManager according to the resource-request protocol
(5) When a Container is allocated, the ApplicationMaster starts it by sending a container-launch-specification message to the NodeManager; the container-launch-specification contains the material the Container needs to communicate with the ApplicationMaster
(6) The application's code runs as tasks inside the started Containers and sends progress, status, and other information to the ApplicationMaster via an application-specific protocol
(7) While the application runs, the client that submitted it actively communicates with the ApplicationMaster to obtain the application's running status, progress updates, and so on; this protocol is also application-specific
(8) Once the application has finished and all related work is complete, the ApplicationMaster deregisters from the ResourceManager and shuts down, and all the Containers it used are returned to the system
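A hedged client-side sketch of steps (1) and (2) using the YarnClient API (my own illustration; the application name, resource sizes, and launch command are placeholders, and a real submission would also ship an ApplicationMaster jar as a local resource):

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for a new application and describe the AM container.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");
        appContext.setResource(Resource.newInstance(1024, 1));   // size of the AM container
        appContext.setAMContainerSpec(ContainerLaunchContext.newInstance(
                Collections.emptyMap(), Collections.emptyMap(),
                Collections.singletonList("echo launching-AM"),  // placeholder AM command
                null, null, null));

        // The RM picks a NodeManager and starts the ApplicationMaster there (step 2).
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appId + ", state: "
                + yarnClient.getApplicationReport(appId).getYarnApplicationState());
    }
}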
3.2 、 mapreduce on yarn
(1) The client submits the job with hadoop jar
(2) In the main() method, job.waitForCompletion() creates the job object and runs its submission logic, communicating with the ResourceManager; the ResourceManager returns an ID number (the applicationid). The client then checks whether the output directory already exists and, if everything is fine, inspects the input in hdfs and determines the number of map tasks from the split information, sending this information back with the client-side Job
(3) The client uploads the resources the Job needs, such as the jar package, configuration files, and split information, to hdfs
(4) The Job object notifies the ResourceManager and submits the application
(5) The ResourceManager finds a suitable node and opens a container there (essentially a JVM)
(6) An applicationMaster is started in the container; it initializes the Job and creates a bookkeeping object that records the status of the map and reduce tasks
(7) The applicationMaster requests resources from the ResourceManager and receives the resource information (including node addresses, CPU information, memory share, and IO information)
(8) After receiving this information, the applicationMaster communicates with the NodeManagers and passes the resource information on
(9) The NodeManagers start YarnChild processes, which obtain the Job details (jar package, configuration information) from hdfs and run the tasks
(1) Job initialization: 1. When the resourceManager receives the submitApplication() call, the scheduler allocates a container, and the ResourceManager then tells the corresponding nodeManager to launch the applicationMaster process in it. 2. The applicationMaster decides how to run the tasks. If the job is small, the applicationMaster may choose to run the tasks in its own JVM. How is a job judged small? A job counts as small when it has fewer than 10 mappers, has only one reducer, and reads an input smaller than one HDFS block (these thresholds can be adjusted with the configuration items mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes). 3. Before running the tasks, the applicationMaster calls the setupJob() method, which creates the output directory (this explains why, whether or not your mapreduce job fails at the start, the output path is created).
(2) Task assignment: 1. Next, the applicationMaster requests containers from the ResourceManager for the map and reduce tasks (step 8), where map tasks have a higher priority than reduce tasks. Only after all map tasks finish can the sort (part of the shuffle process, covered later) complete and the reduce tasks begin. (One detail: requests for reduce containers are not made until at least 5% of the map tasks have completed.) 2. Running tasks consumes memory and CPU resources. By default, each map or reduce task is allocated 1024 MB of memory and one core (the minimums and maximums can be changed with mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.map.cpu.vcores, and mapreduce.reduce.cpu.vcores).
(3) Progress and status updates: 1. MapReduce jobs are batch jobs that run for a long time, from an hour to several hours or even days, so monitoring a Job's running status is very important. Every job and every task has a status that includes the job state (running, successfully completed, failed), the values of counters, and status and description information (the description is usually printed in the code). So how does this information get back to the client? 2. While a task executes, it keeps a running record of the percentage of the task that is complete. For map tasks this is the fraction of input processed; for reduce it is a little more complicated, but the system still estimates the percentage of reduce work completed. While a map or reduce task runs, the child process interacts with the applicationMaster every three seconds.
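To tie the steps above to code, here is the standard WordCount driver pattern (a generic sketch, not the exact program used in the article); waitForCompletion() performs the submission described in step (2), and the per-task memory properties are the ones mentioned above:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-task resources discussed above (the defaults are 1024 MB and one vcore).
        conf.setInt("mapreduce.map.memory.mb", 1024);
        conf.setInt("mapreduce.reduce.memory.mb", 1024);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion() submits the job to YARN, then polls the
        // ApplicationMaster for progress and status until the job finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}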
4. Yarn usage
4.1 、 Configuration files
mapred-site.xml:
    mapreduce.framework.name = yarn
yarn-site.xml:
    yarn.nodemanager.aux-services = mapreduce_shuffle
4.2 、 Yarn start and stop
Start ResourceManager and NodeManager (hereinafter referred to as RM and NM respectively)
# Run on the master node to start
$HADOOP_HOME/sbin/start-yarn.sh
# Run on the master node to stop
$HADOOP_HOME/sbin/stop-yarn.sh
If RM is not started, it can be started separately.
# If RM is not started, run this on the primary node to start it separately
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
# Conversely, stop it separately
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
If NM is not started, it can be started separately.
# If NM is not started, run this on the corresponding node to start it separately
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
# Conversely, stop it separately
$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
4.3 、 Common yarn commands
# 1. View running applications
yarn application -list
# 2. Kill a running application
yarn application -kill <application id>
# 3. View the list of nodes
yarn node -list
# 4. View the status of a node
yarn node -status node3:40652
# 5. View yarn's jar-dependent classpath
yarn classpath
5. Yarn scheduler
There are three schedulers to choose from in Yarn: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.
5.1 、 FIFO Scheduler
The FIFO Scheduler arranges applications into a queue in the order they are submitted: a first-in, first-out queue. When allocating resources, it first gives resources to the application at the head of the queue and only moves on to the next one after that application's needs are met, and so on. The FIFO Scheduler is the simplest and easiest scheduler to understand and needs no configuration, but it is not suitable for shared clusters: a large application can take all the cluster's resources and block every other application. In a shared cluster it is more appropriate to use the Capacity Scheduler or the Fair Scheduler, both of which allow large and small jobs to obtain a certain amount of system resources at the same time. The "Yarn scheduler comparison diagram" below shows the differences between these schedulers; as you can see, under the FIFO scheduler small jobs are blocked until the large job ahead of them finishes.
5.2 、 Capacity Scheduler
The Capacity scheduler keeps a dedicated queue for running small jobs, but reserving a dedicated queue for small jobs takes up a certain amount of cluster resources in advance, which makes large jobs finish later than they would under the FIFO scheduler.
How to configure the Capacity Scheduler:
The queue hierarchy is as follows:
Root
├── prod
└── dev
├── spark
└── hdp
Create a new capacity-scheduler.xml in $HADOOP_HOME/etc/hadoop/ with the following content:
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>hdp,spark</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.hdp.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.spark.capacity</name>
    <value>50</value>
  </property>
</configuration>
Which queue an application is placed in depends on the application itself. For MapReduce, for example, you can specify the target queue by setting the property mapreduce.job.queuename. Taking WordCount as an example: if the specified queue does not exist, an error occurs; if no queue is specified, the "default" queue is used, as shown below.
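For instance (a hedged sketch; the queue name "spark" assumes the leaf queue defined in the capacity-scheduler.xml above), a driver can route its job to a queue by setting the property before submission:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route the job to an assumed Capacity Scheduler leaf queue; if this
        // property is omitted, the job lands in the "default" queue.
        conf.set("mapreduce.job.queuename", "spark");
        Job job = Job.getInstance(conf, "wordcount-on-spark-queue");
        // ... set mapper/reducer/input/output as in the WordCount driver above ...
        System.out.println("queue = " + job.getConfiguration().get("mapreduce.job.queuename"));
    }
}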
5.3 、 Fair Scheduler
With the Fair scheduler there is no need to reserve system resources in advance; it dynamically balances resources across all running jobs. When the first (large) job is submitted and is the only one running, it receives all of the cluster's resources. When a second (small) job is submitted, the Fair scheduler gives it half of the resources so that the two jobs share the cluster fairly. Note that there is a delay between the second job's submission and its receiving resources, because it must wait for the first job to release some of the Containers it occupies. When the small job finishes, it releases its resources and the large job again gets all of the cluster's resources. The net effect is that the Fair scheduler achieves high resource utilization while ensuring that small jobs complete in a timely manner.
After reading the above, do you have a better understanding of the architecture and principles of the Yarn resource scheduling system? Thank you for reading.