How YARN Capacity Scheduler Is Implemented

2025-07-13 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how YARN Capacity Scheduler is implemented: its main characteristics, its configuration parameters, and how it handles application initialization, resource scheduling, and other scheduler events.

Characteristics

Capacity Scheduler divides resources among queues. Each queue can be given a minimum resource guarantee and an upper limit on resource use, both expressed as proportions. Each user can also be given an upper limit on resource use to prevent abuse. When a queue has resources left over, they can be temporarily shared with other queues. In short, Capacity Scheduler has the following main characteristics:

Capacity guarantee: administrators can set a minimum resource guarantee and an upper limit on resource usage for each queue, and all applications submitted to that queue share these resources.

Flexibility: if a queue has resources left over, they can be temporarily shared with queues that need them. Once a new application is submitted to the original queue, the resources released by the other queues are returned to it.

Multi-tenancy: multiple users can share the cluster and multiple applications can run at the same time. To prevent a single application, user, or queue from monopolizing cluster resources, administrators can add multiple constraints (such as the number of tasks a single application may run at the same time).

Security: each queue has a strict ACL that specifies which users may access it, and each user can specify which other users are allowed to view the running status of their applications or control them (for example, kill them). In addition, administrators can designate queue administrators and cluster administrators.

Dynamic configuration updates: administrators can modify configuration parameters at run time as needed, so the cluster can be managed online.

The function of Capacity Scheduler

Capacity Scheduler has its own configuration file, capacity-scheduler.xml, stored in the conf directory.

In this file, parameter Y of queue queueX is configured under the name yarn.scheduler.capacity.queueX.Y, where queueX stands for the full path of the queue starting from the root queue (for example, root.queueX).

Parameters related to resource allocation:

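capacity: the minimum resource capacity of the queue, as a percentage. Note that the capacities of all queues under the same parent must add up to 100.

maximum-capacity: the upper limit on the queue's resource usage, as a percentage.

minimum-user-limit-percent: the minimum resource guarantee for each user, as a percentage of the queue's resources.

user-limit-factor: the maximum amount of resources a single user may acquire, expressed as a multiple of the queue capacity. The default is 1, which means a single user cannot occupy more than the queue's configured capacity.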
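
As a rough illustration of how these parameters fit together, the Python sketch below flattens the settings of two hypothetical queues (root.prod and root.dev, invented for this example) into property names and checks that sibling capacities add up to 100, the sum the scheduler expects for queues under the same parent. The helper names are assumptions of this sketch, not part of Hadoop, and max_user_share is a simplified reading of how user-limit-factor and maximum-capacity interact.

```python
# Hypothetical queue settings; queue names and values are illustrative only.
QUEUES = {
    "root.prod": {"capacity": 70, "maximum-capacity": 90,
                  "minimum-user-limit-percent": 25, "user-limit-factor": 1},
    "root.dev": {"capacity": 30, "maximum-capacity": 50,
                 "minimum-user-limit-percent": 50, "user-limit-factor": 2},
}

PREFIX = "yarn.scheduler.capacity"


def to_properties(queues):
    """Flatten per-queue settings into capacity-scheduler.xml property names."""
    return {f"{PREFIX}.{path}.{name}": str(value)
            for path, settings in queues.items()
            for name, value in settings.items()}


def check_sibling_capacities(queues):
    """Report, per parent queue, whether its children's capacities sum to 100."""
    totals = {}
    for path, settings in queues.items():
        parent = path.rsplit(".", 1)[0]
        totals[parent] = totals.get(parent, 0) + settings["capacity"]
    return {parent: total == 100 for parent, total in totals.items()}


def max_user_share(settings):
    """Simplified upper bound (in percent) on what one user can hold:
    capacity times user-limit-factor, capped by maximum-capacity."""
    return min(settings["capacity"] * settings["user-limit-factor"],
               settings["maximum-capacity"])
```

With these values, a single user in root.dev could grow to twice the queue's capacity in principle, but the queue's maximum-capacity caps that growth first.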

Parameters that limit the number of applications:

maximum-applications: the upper limit on the number of applications that can be waiting or running at the same time, in the cluster or in a queue. This is a hard limit: once it is reached, subsequently submitted applications are rejected. The cluster-wide limit is set with the parameter yarn.scheduler.capacity.maximum-applications, which defaults to 10000, while a single queue can set its own value with yarn.scheduler.capacity.queueX.maximum-applications.

maximum-am-resource-percent: the upper limit on the proportion of cluster resources that can be used to run ApplicationMasters. This parameter is usually used to limit the number of concurrently active applications. The limit for all queues is set with the parameter yarn.scheduler.capacity.maximum-am-resource-percent, while a single queue can set its own value with yarn.scheduler.capacity.queueX.maximum-am-resource-percent.
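
The relationship between the cluster-wide and per-queue forms of these parameters can be sketched as a simple lookup. The Python below is a minimal sketch under one loud assumption: when a queue has no value of its own, it falls back directly to the cluster-wide value. The real scheduler may derive per-queue defaults differently. The queue names and the resolve and can_accept helpers are hypothetical.

```python
# Hypothetical configuration contents; the queue name root.dev is illustrative.
CONF = {
    "yarn.scheduler.capacity.maximum-applications": "10000",
    "yarn.scheduler.capacity.maximum-am-resource-percent": "0.1",
    "yarn.scheduler.capacity.root.dev.maximum-am-resource-percent": "0.5",
}


def resolve(conf, queue_path, name):
    """Return the queue-specific value if set, else the cluster-wide one.

    Assumption of this sketch: unset per-queue values fall back directly
    to the cluster-wide setting.
    """
    per_queue = f"yarn.scheduler.capacity.{queue_path}.{name}"
    cluster_wide = f"yarn.scheduler.capacity.{name}"
    return conf.get(per_queue, conf.get(cluster_wide))


def can_accept(conf, queue_path, apps_in_queue):
    """Hard limit: reject a submission once the limit has been reached."""
    limit = int(resolve(conf, queue_path, "maximum-applications"))
    return apps_in_queue < limit
```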

Queue access control

state: the queue state, either STOPPED or RUNNING. If a queue is STOPPED, users cannot submit applications to it or to its sub-queues. Likewise, if the root queue is STOPPED, users cannot submit applications to the cluster at all. Applications that are already running can still finish normally, so a queue can be drained gracefully.

acl_submit_applications: defines which users and user groups can submit applications to the queue. This property is inherited: a user who can submit applications to a queue can also submit them to all of its sub-queues.

acl_administer_queue: defines the administrators of the queue, who can control all of its applications (for example, kill any of them). This property is also inherited: a user who can administer a queue can also administer all of its sub-queues.
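
The inheritance rule for the two ACL properties, together with the state check, can be illustrated by walking from a queue up to the root. This is a minimal sketch, not YARN's implementation: the Queue class and the function names are hypothetical, and user groups are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Queue:
    name: str
    parent: Optional["Queue"] = None
    state: str = "RUNNING"
    submit_acl: set = field(default_factory=set)  # users allowed to submit
    admin_acl: set = field(default_factory=set)   # users allowed to administer


def ancestors(queue):
    """Yield the queue itself, then each parent up to the root."""
    while queue is not None:
        yield queue
        queue = queue.parent


def can_submit(user, queue):
    """Every queue on the path must be RUNNING, and the user must appear in
    the submit ACL of the queue or of any ancestor (ACLs are inherited)."""
    path = list(ancestors(queue))
    if any(q.state != "RUNNING" for q in path):
        return False
    return any(user in q.submit_acl for q in path)


def can_administer(user, queue):
    """Administration rights are inherited from ancestors in the same way."""
    return any(user in q.admin_acl for q in ancestors(queue))
```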

To change the queue configuration dynamically, the administrator edits the configuration file conf/capacity-scheduler.xml and then runs "yarn rmadmin -refreshQueues".

Currently, Capacity Scheduler does not allow administrators to dynamically reduce the number of queues. The updated parameter values must also be legal; otherwise, loading the configuration file fails.

Capacity Scheduler implementation: application initialization

After an application is submitted to ResourceManager, ResourceManager sends a SchedulerEventType.APP_ADDED event to Capacity Scheduler. On receiving this event, Capacity Scheduler creates a FiCaSchedulerApp object to track and maintain the application's run-time information, and submits the application to the corresponding leaf queue, which performs a series of validity checks. The submission succeeds only if the application passes all of them:

The user who owns the application has permission to submit applications to the leaf queue.

The queue and all of its ancestor queues are currently in the RUNNING state (checked recursively).

The number of applications currently submitted to the queue has not reached the limit set by the administrator.

The number of applications submitted by the user who owns the application has not reached the limit set by the administrator.
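
The four checks can be read as a short validation routine that either admits the application or reports why it was rejected. The sketch below is illustrative only: LeafQueue, its fields, and the rejection messages are hypothetical, and the recursive state check is collapsed into a single precomputed flag.

```python
from dataclasses import dataclass, field


@dataclass
class LeafQueue:
    running: bool           # True if this queue and all ancestors are RUNNING
    allowed_users: set      # users with submit permission (ACLs resolved)
    max_apps: int           # limit on applications in the queue
    max_apps_per_user: int  # limit on applications per user
    apps_by_user: dict = field(default_factory=dict)


def validate_submission(queue, user):
    """Return None if the application is admitted, else the rejection reason."""
    if user not in queue.allowed_users:
        return "user lacks submit permission"
    if not queue.running:
        return "queue or an ancestor is not RUNNING"
    if sum(queue.apps_by_user.values()) >= queue.max_apps:
        return "queue application limit reached"
    if queue.apps_by_user.get(user, 0) >= queue.max_apps_per_user:
        return "per-user application limit reached"
    # All checks passed: record the application against its user.
    queue.apps_by_user[user] = queue.apps_by_user.get(user, 0) + 1
    return None
```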

Resource scheduling

When ResourceManager receives the heartbeat message from NodeManager, it sends a SchedulerEventType.NODE_UPDATE event to Capacity Scheduler. When Capacity Scheduler receives the event, it does the following in turn:

Processing heartbeat information: the heartbeat sent by NodeManager carries two kinds of information that the resource scheduler must process, newly launched Containers and completed Containers:

For a newly launched Container, the resource scheduler sends an RMContainerEventType.LAUNCHED event to ResourceManager, which then removes the Container from the timeout monitoring queue. The Container was added to that queue when the scheduler allocated it to an ApplicationMaster: to keep an ApplicationMaster from wasting resources by holding a Container without using it, the scheduler reclaims any Container in the queue that is still unused after a certain period.

For a completed Container, the resource scheduler reclaims the resources it used so that they can be reallocated.

After processing the above two types of information, Capacity Scheduler allocates free resources on the node to the application
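
The heartbeat handling described here can be sketched as a small monitor for allocated-but-unlaunched Containers plus a function that folds freed resources back into the node. This is a simplified model, not YARN code: LaunchMonitor, on_node_update, and the time and memory units are assumptions made for illustration.

```python
class LaunchMonitor:
    """Tracks Containers that were allocated but not yet launched."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.allocated_at = {}  # container id -> allocation time (seconds)

    def on_allocated(self, container_id, now):
        self.allocated_at[container_id] = now

    def on_launched(self, container_id):
        # A launched Container no longer needs timeout monitoring.
        self.allocated_at.pop(container_id, None)

    def expire(self, now):
        """Remove and return Containers left unused for timeout_s or longer."""
        expired = sorted(c for c, t in self.allocated_at.items()
                         if now - t >= self.timeout_s)
        for container_id in expired:
            del self.allocated_at[container_id]
        return expired


def on_node_update(monitor, free_mb, launched_ids, finished_mb):
    """Process one heartbeat and return the node's free memory afterwards.

    launched_ids: ids of Containers the node reports as newly launched.
    finished_mb: memory (MB) freed by each Container reported as finished.
    """
    for container_id in launched_ids:
        monitor.on_launched(container_id)
    return free_mb + sum(finished_mb.values())
```

The free memory returned by on_node_update is what the scheduler would then try to hand out in the allocation phase.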

Allocation of resources

1. A Container request mainly contains five types of information:

Priority

The node where the requested resource should be located

Resource amount

Number of Containers

Whether to relax locality (that is, whether to fall back to rack-local resources when node-local resources are not available)

2. After receiving a resource request, the resource scheduler stores it in a data structure until free resources appear that can be allocated to it.

3. When a node has free resources, the scheduler selects, in turn, a queue, an application, and a Container request to use them.

Step 1: select the queue

Starting from the root queue, the scheduler traverses the sub-queues in ascending order of resource utilization. If a sub-queue is a leaf queue, it selects a Container request in that queue according to steps 2 and 3. Otherwise, it treats the sub-queue as the root and repeats the process, until a suitable queue is found.

Note: queue resource utilization is the amount of resources already used divided by the queue's minimum resource capacity (configured by the administrator). For a non-leaf queue, the amount of resources used is the sum of the resources used by its sub-queues.
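
The traversal in step 1 is a depth-first search that always tries the least-utilized sub-queue first. The sketch below models it with a hypothetical Queue class; the pending field (number of outstanding requests in a leaf) is an assumption added to give "suitable queue" a concrete meaning, and capacity and used are in the same arbitrary unit.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    capacity: float   # minimum capacity configured by the administrator
    used: float = 0.0  # resources in use (meaningful for leaf queues)
    pending: int = 0   # outstanding Container requests (leaf queues)
    children: list = field(default_factory=list)

    def total_used(self):
        """A non-leaf queue uses the sum of what its sub-queues use."""
        if not self.children:
            return self.used
        return sum(child.total_used() for child in self.children)

    def utilization(self):
        """Resources used divided by the configured minimum capacity."""
        return self.total_used() / self.capacity


def select_leaf(queue):
    """Return the first leaf with pending requests, visiting sub-queues
    in ascending order of utilization; None if no leaf qualifies."""
    if not queue.children:
        return queue if queue.pending > 0 else None
    for child in sorted(queue.children, key=Queue.utilization):
        leaf = select_leaf(child)
        if leaf is not None:
            return leaf
    return None
```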

Step 2: select the application

After a leaf queue is selected in step 1, Capacity Scheduler sorts the applications in that queue by submission time (in practice by Application ID: the earlier the submission, the smaller the Application ID) and allocates resources to the earliest-submitted application first.

Step 3: select Container (request)

A single application may request many kinds of Containers, differing in priority, node, resource amount, and number. Once an application is selected, Capacity Scheduler tries to satisfy its highest-priority requests first. Among requests of the same priority, it prefers locality, trying node-local Containers first, then rack-local ones, and finally those with no locality.
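
Steps 2 and 3 amount to two orderings: applications by ID, then requests by priority and locality. The following is a minimal sketch with hypothetical App and Request types. It assumes that a smaller priority number means a higher priority, and it skips applications that have nothing pending; both are assumptions of this example.

```python
from dataclasses import dataclass, field

# Preference order within one priority level: node-local, rack-local, no locality.
LOCALITY_ORDER = {"node": 0, "rack": 1, "any": 2}


@dataclass
class Request:
    priority: int   # assumption: smaller number means higher priority
    locality: str   # "node", "rack", or "any"
    memory_mb: int


@dataclass
class App:
    app_id: int     # smaller id means earlier submission
    requests: list = field(default_factory=list)


def pick_application(apps):
    """Step 2: choose the earliest-submitted application with pending requests."""
    candidates = [app for app in apps if app.requests]
    return min(candidates, key=lambda app: app.app_id) if candidates else None


def pick_request(app):
    """Step 3: highest priority first, then the best locality."""
    return min(app.requests,
               key=lambda r: (r.priority, LOCALITY_ORDER[r.locality]))
```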

4. Capacity Scheduler provides two resource comparators for comparing two amounts of resources (for example, checking whether the resources a user currently holds exceed the configured limit). The default, DefaultResourceCalculator, considers only memory. The other, DominantResourceCalculator, uses the DRF (Dominant Resource Fairness) comparison algorithm and considers both memory and CPU. The administrator selects the comparator with the parameter yarn.scheduler.capacity.resource-calculator.
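
The difference between the two comparators shows up when one resource amount is larger in memory and the other in CPU. The sketch below is a simplified model and not the Hadoop classes themselves: the function names are hypothetical, and the dominant-share comparison omits any tie-breaking on the secondary resource.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    memory_mb: int
    vcores: int


def compare_default(a, b):
    """Memory-only comparison: negative, zero, or positive like a comparator."""
    return a.memory_mb - b.memory_mb


def dominant_share(cluster, r):
    """The larger of the resource's memory share and CPU share of the cluster."""
    return max(r.memory_mb / cluster.memory_mb, r.vcores / cluster.vcores)


def compare_dominant(cluster, a, b):
    """DRF-style comparison: the amount with the larger dominant share is bigger."""
    share_a = dominant_share(cluster, a)
    share_b = dominant_share(cluster, b)
    return (share_a > share_b) - (share_a < share_b)
```

For a cluster of 100000 MB and 100 vcores, an amount of 10000 MB and 50 vcores is smaller than 40000 MB and 10 vcores by memory alone, but larger by dominant share (0.5 against 0.4).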

5. Other event handling

APP_REMOVED: Capacity Scheduler receives this event in several situations, including when an application ends normally or is killed. On receiving it, Capacity Scheduler first sends an RMContainerEventType.KILL event to all of the application's unfinished Containers to release the resources in use, and then removes the application's data structures from memory.

NODE_ADDED: Capacity Scheduler receives this event when a node is dynamically added to the cluster (for example, when the administrator expands the cluster or a disconnected node comes back). On receiving it, Capacity Scheduler only needs to record the NodeManager information in the corresponding data structure and increase the total resources of the system.

NODE_REMOVED: Capacity Scheduler receives this event when a node is dynamically removed from the cluster (for example, when the administrator removes the node, or ResourceManager removes it because it has not reported a heartbeat within a certain period). On receiving it, Capacity Scheduler removes the NodeManager information, reduces the total resources of the system, and sends an RMContainerEventType.KILL event to all Containers running on the node to clear the related information.

CONTAINER_EXPIRED: after Capacity Scheduler assigns a Container to an ApplicationMaster, the ApplicationMaster must use it within a certain period. Otherwise ResourceManager forcibly reclaims it, which triggers a CONTAINER_EXPIRED event.
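
The bookkeeping behind the node and application events can be sketched with two dictionaries. This is an illustrative model only: SchedulerState and its methods are hypothetical names, and "killing" a Container is reduced to removing it from the table and returning its id.

```python
class SchedulerState:
    """Minimal bookkeeping for NODE_ADDED, NODE_REMOVED, and APP_REMOVED."""

    def __init__(self):
        self.node_mb = {}     # node id -> memory capacity in MB
        self.containers = {}  # container id -> (node id, application id)

    @property
    def total_mb(self):
        """Total resources of the system, derived from the known nodes."""
        return sum(self.node_mb.values())

    def node_added(self, node, memory_mb):
        self.node_mb[node] = memory_mb

    def node_removed(self, node):
        """Forget the node and kill every Container that was running on it."""
        self.node_mb.pop(node, None)
        return self._kill(lambda n, app: n == node)

    def app_removed(self, app_id):
        """Kill the application's remaining Containers."""
        return self._kill(lambda n, app: app == app_id)

    def _kill(self, matches):
        killed = sorted(cid for cid, (n, app) in self.containers.items()
                        if matches(n, app))
        for cid in killed:
            del self.containers[cid]
        return killed
```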
