This article introduces the Yarn Scheduler: the three scheduler types Yarn provides (FIFO, Capacity, and Fair), how they differ, and how to configure each of them.
Contents
1. Introduction to the Yarn schedulers
FIFO (first-in, first-out scheduler)
Capacity (capacity scheduler)
Fair (fair scheduler)
The difference between Fair and Capacity
2. Yarn Scheduler configuration
Fair
Capacity configuration (the default)
FIFO
Ideally, an application's requests for Yarn resources would be satisfied immediately, but in practice resources are limited: in a busy cluster, an application often has to wait before its resource request can be served. In Yarn, the component responsible for allocating resources to applications is the Scheduler. Scheduling is a hard problem in its own right, and there is no single policy that fits all application scenarios, so Yarn provides several schedulers and configurable policies to choose from. The YARN architecture is as follows:
ResourceManager (RM): responsible for unified management and scheduling of the resources on every NM. It assigns idle Containers in which AMs run and monitors their running status, and it allocates free Containers to the resource requests issued by AMs. It consists mainly of two components: the scheduler (Scheduler) and the Applications Manager.
Scheduler: the scheduler allocates the system's resources to running applications subject to capacity, queue, and other constraints (for example, each queue is allocated certain resources, runs at most a certain number of jobs, and so on). It allocates purely according to each application's resource requirements, with the Container as the unit of allocation, thereby limiting how many resources each task uses. The Scheduler does not monitor or track application status, nor does it restart tasks that fail for whatever reason (that is the ApplicationMaster's job). In short, the scheduler hands resources, packaged as Containers, to applications according to their resource demands and the resources of the cluster machines. The scheduler is pluggable, e.g. CapacityScheduler or FairScheduler. (In practice it only needs simple configuration.)
Applications Manager: responsible for managing all applications in the entire system, including accepting application submissions, negotiating with the scheduler for the resources to start the AM, and monitoring the AM and restarting it on failure.
ApplicationMaster (AM): every application submitted by a user contains an AM, which negotiates resources with the ResourceManager, works with the NodeManagers to execute and monitor Tasks, tracks the progress and status of its assigned Containers, and restarts failed tasks. MapReduce is a natively supported framework for running MapReduce jobs on YARN, and many distributed systems, such as Spark and Storm, provide their own application frameworks for running tasks on YARN. If necessary, we can also write our own YARN application that conforms to the specification.
NodeManager (NM): the resource and task manager on each node. It periodically reports the node's resource usage and the running status of each Container to the RM, and it receives and handles requests from AMs such as starting and stopping Containers.
Container: the resource abstraction in YARN. It encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network. When an AM requests resources from the RM, the resources the RM returns are represented as Containers. YARN assigns each task a Container, and the task may only use the resources described in that Container.
1. Introduction to the Yarn schedulers
1.1. FIFO (first-in, first-out scheduler)
FIFO Scheduler arranges applications into a queue in submission order: a first-in, first-out queue. When allocating resources, the application at the head of the queue is served first; once its needs are met, the next one is served, and so on. FIFO Scheduler is the simplest scheduler and the easiest to understand, and it needs no configuration, but it is not suitable for shared clusters: a large application may take all the cluster's resources and block every other application. In a shared cluster it is more appropriate to use the Capacity Scheduler or the Fair Scheduler, both of which let large and small tasks obtain a share of system resources at the same time. The "Yarn scheduler comparison diagram" shows the difference between these schedulers: under the FIFO scheduler, small tasks are blocked by large ones.
1.2. Capacity (capacity scheduler)
This is the resource scheduler configured by default in yarn-site.xml. The Capacity scheduler keeps a dedicated queue for running small tasks, but reserving a queue for small tasks pre-allocates part of the cluster's resources, so large tasks finish later than they would under the FIFO scheduler. With this resource scheduler you can configure yarn resource queues, which will be used later.
1.3. Fair (fair scheduler)
The design goal of the Fair scheduler is to give all applications a fair share of resources (the definition of fairness can be set by parameters). The Yarn scheduler comparison diagram above shows fair scheduling of two applications in one queue; fair scheduling can also work across multiple queues. For example, suppose users A and B each have their own queue. When A starts a job and B has none, A gets all the cluster's resources; when B then starts a job, A's job keeps running, and after a while the two jobs each hold half of the cluster's resources. If B now starts a second job while its first is still running, the second job shares queue B's resources with B's first job, so B's two jobs each use a quarter of the cluster while A's job still uses half; the result is that the resources end up shared equally between the two users. With the Fair scheduler we do not need to reserve any system resources in advance: the scheduler dynamically rebalances resources across all running jobs. When the first (large) job is submitted and is the only one running, it obtains all the cluster's resources; when a second (small) task is submitted, the Fair scheduler gives half of the resources to the small task, letting the two tasks share the cluster fairly.
A) The Fair scheduler shares the resources of the entire cluster.
B) No resources are pre-occupied; every job shares the cluster.
C) When a job is submitted, it can take all available resources. When another job is submitted, the first job releases part of its resources and they are allocated to the second job, and the same happens for every further job. In other words, every job that arrives has a chance to obtain resources.
1.4. The difference between Fair Scheduler and Capacity Scheduler
Fair sharing of resources: within each queue, Fair Scheduler can allocate resources to applications according to a FIFO, Fair, or DRF policy. The Fair policy divides resources evenly, and by default every queue allocates resources this way.
Resource preemption support: when a queue has spare resources, the scheduler shares them with other queues; when a new application is later submitted to that queue, the scheduler reclaims resources for it. To minimize wasted computation, the scheduler waits before forcing recovery: if resources have not been returned after a set period, it preempts them, killing some tasks in the queues that overuse resources in order to free them.
Load balancing: Fair Scheduler provides a load-balancing mechanism based on task counts that spreads the system's tasks across the nodes as evenly as possible. Users can also plug in a load-balancing mechanism of their own design.
Flexible scheduling policies: Fair Scheduler allows administrators to set the scheduling policy separately for each queue (currently FIFO, Fair, or DRF; see the sketch after this list).
Better response time for small applications: thanks to the max-min fairness algorithm, small jobs can obtain resources quickly and run.
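As a concrete illustration of the per-queue policy flexibility above, a fair-scheduler.xml fragment might assign different policies to different queues. This is only a sketch: the queue names batch and adhoc are hypothetical, and <schedulingPolicy> is the per-queue policy element discussed in section 2.1.2.

<queue name="batch">
  <!-- DRF considers each application's dominant resource (memory or vcores) -->
  <schedulingPolicy>drf</schedulingPolicy>
</queue>
<queue name="adhoc">
  <!-- fifo orders applications within this queue by submission time -->
  <schedulingPolicy>fifo</schedulingPolicy>
</queue>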
2. Yarn Scheduler configuration
The yarn resource scheduler is configured in yarn-site.xml.
2.1. Fair Scheduler
The configuration options for Fair Scheduler include two parts:
One part lives in yarn-site.xml and mainly configures parameters at the scheduler level.
The other part lives in a custom configuration file (fair-scheduler.xml by default) and mainly configures each queue's resource amounts, weight, and other information.
2.1.1 yarn-site.xml
The FairScheduler-related parameters in yarn-site.xml are:
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
The scheduler plug-in class name; to use the Yarn FairScheduler, set it to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.

yarn.scheduler.fair.allocation.file = /etc/hadoop/conf/fair-scheduler.xml
Local path of the XML file that defines the resource pools and their attribute quotas.

yarn.scheduler.fair.preemption = true
Enables resource preemption. Default: true.

yarn.scheduler.fair.user-as-default-queue = true
If set to true, when a task specifies no resource pool, the user name is used as the pool name; this enables automatic per-user pool assignment. Default: true.

yarn.scheduler.fair.allow-undeclared-pools = false
Whether undefined resource pools may be created. If set to true, an undefined pool named in a task is created automatically; if false, the undefined pool name is ignored and the task is assigned to the default pool. Default: true.
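Put together, a minimal yarn-site.xml sketch of the parameters above (values exactly as listed; adjust the allocation-file path for your environment):

<property>
  <!-- switch the ResourceManager to the FairScheduler plug-in -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <!-- local path of the queue/quota definition file -->
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
<property>
  <!-- allow the scheduler to reclaim resources from over-served queues -->
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<property>
  <!-- fall back to a per-user queue when a task names no pool -->
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>true</value>
</property>
<property>
  <!-- reject pool names that are not declared in fair-scheduler.xml -->
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>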
2.1.2 fair-scheduler.xml
Suppose that in a production Yarn environment there are four types of users that need the cluster: production, spark, default, and streaming. So that submitted tasks do not interfere with each other, we plan to configure four resource pools on Yarn, named production, spark, default, and streaming. Each pool is given resources and a priority according to the actual business situation; default is used for development and testing.
The fair-scheduler.xml configuration on ResourceManager is as follows:
<?xml version="1.0"?>
<allocations>
  <queue name="production">
    <minResources>8192 mb, 8 vcores</minResources>
    <maxResources>419840 mb, 125 vcores</maxResources>
    <maxRunningApps>60</maxRunningApps>
    <schedulingMode>fair</schedulingMode>
    <weight>7.5</weight>
    <aclSubmitApps>*</aclSubmitApps>
  </queue>
  <queue name="spark">
    <minResources>8192 mb, 8 vcores</minResources>
    <maxResources>376480 mb, 110 vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <schedulingMode>fair</schedulingMode>
    <weight>1</weight>
    <aclSubmitApps>*</aclSubmitApps>
  </queue>
  <queue name="default">
    <minResources>8192 mb, 8 vcores</minResources>
    <maxResources>202400 mb, 20 vcores</maxResources>
    <maxRunningApps>20</maxRunningApps>
    <schedulingMode>fifo</schedulingMode>
    <weight>0.5</weight>
    <aclSubmitApps>*</aclSubmitApps>
  </queue>
  <queue name="streaming">
    <minResources>8192 mb, 8 vcores</minResources>
    <maxResources>69120 mb, 16 vcores</maxResources>
    <maxRunningApps>20</maxRunningApps>
    <schedulingMode>fair</schedulingMode>
    <weight>1</weight>
    <aclSubmitApps>*</aclSubmitApps>
  </queue>
  <!-- the original listing also contained the values 100, 10, 50; their enclosing elements were lost -->
</allocations>
Parameter description:
minResources: the minimum resource guarantee, in the format "X mb, Y vcores". When a queue's minimum guarantee is not met, it obtains resources ahead of its sibling queues. The meaning of the guarantee differs per scheduling policy (described in more detail later): for the fair policy only memory is considered, so a queue is deemed satisfied once its memory usage reaches its minimum; for the drf policy the dominant resource is considered, so a queue is deemed satisfied once its dominant-resource usage reaches its minimum.
maxResources: the maximum amount of resources a queue may use. The Fair scheduler ensures that no queue ever uses more than its maximum available resources.
maxRunningApps: the maximum number of applications running at the same time. Limiting this prevents the intermediate output of too many Map Tasks running at once from filling up the disk.
weight: the resource pool's weight, used mainly when sharing spare resources: the larger the weight, the more of those resources the pool obtains. For example, if one pool has 20 GB of memory it cannot use, that memory is shared among the other pools, each receiving an amount determined by its weight.
aclSubmitApps: the list of Linux users or user groups allowed to submit applications to the queue. The default is "*", meaning any user may submit. Note that this property is inherited: a child queue's list inherits its parent's. Users and groups are each comma-separated, with a space between the user list and the group list, e.g. "user1,user2 group1,group2".
aclAdministerApps: the users and groups allowed to administer the queue. A queue's administrator can manage the queue's resources and applications, for example killing any application in it.
minSharePreemptionTimeout: the minimum-share preemption timeout. If the pool's resource usage has stayed below its minimum share for this long, it begins preempting resources.
schedulingMode/schedulingPolicy: the scheduling policy the queue uses; one of fifo, fair, or drf.
Administrators can also give individual users a maxRunningJobs attribute to cap how many applications they may run at once. In addition, administrators can set defaults for the properties above with the following parameters:
userMaxJobsDefault: the default value of users' maxRunningJobs attribute.
defaultMinSharePreemptionTimeout: the default value of queues' minSharePreemptionTimeout property.
defaultPoolSchedulingMode: the default value of queues' schedulingMode property.
fairSharePreemptionTimeout: the fair-share preemption timeout. If a pool's resource usage has stayed below half of its fair share for this long, it begins preempting resources.
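A short allocation-file sketch of these defaults. Element names vary across Hadoop versions; this sketch uses the Hadoop 2.x allocation-file names that correspond to the parameters above (userMaxAppsDefault for userMaxJobsDefault, defaultQueueSchedulingPolicy for defaultPoolSchedulingMode), with illustrative values and a hypothetical user alice:

<!-- top-level defaults in fair-scheduler.xml -->
<userMaxAppsDefault>50</userMaxAppsDefault>
<defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
<!-- per-user override of the default application cap -->
<user name="alice">
  <maxRunningApps>10</maxRunningApps>
</user>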
In this way, when the users in each user group submit tasks, the tasks go to the corresponding resource pool without affecting other businesses. The queue hierarchy is expressed by nesting <queue> elements: all queues are children of the root queue, even those not nested under a root element. Queues in the Fair scheduler carry a weight attribute (this weight is the definition of fairness), which the scheduler uses as the basis of fair scheduling. With the weights 7.5, 1, 0.5, and 1 in this example, scheduling is considered fair when production, spark, default, and streaming receive cluster resources in that ratio; note that the weights are not percentages. Queues created automatically per user when no profile entry exists still have a weight, with value 1. Each queue may also use its own internal scheduling policy: the default policy for queues can be set through a top-level element in the allocation file, and fair scheduling is used if it is not configured. Although it is the Fair scheduler, FIFO scheduling is still supported at the queue level, and each queue's policy can be overridden by an element inside its <queue> block. In the example above, the default queue is set to fifo, so tasks submitted to it execute in FIFO order, while scheduling among spark, production, streaming, and default remains fair. Every queue can be configured with minimum and maximum resource usage and a maximum number of runnable applications.
The Fair scheduler uses a rule-based system to determine which queue an application is placed in. In the example above, the <queuePlacementPolicy> element defines a list of rules that are tried one by one until one matches. For example, the first rule, specified, places the application in the queue it named at submission; if the application named no queue, or the queue does not exist, the rule does not match and the next rule is tried. The primaryGroup rule tries to place the application in a queue named after the user's Unix group; if no such queue exists, the next rule is tried rather than creating the queue. When no earlier rule matches, the default rule fires and the application is placed in the default queue.
Of course, we do not have to configure any queuePlacementPolicy rules; when none are given, the scheduler behaves as if the following rules were configured:
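A sketch of that implicit default policy, using the rule names from the Fair Scheduler allocation-file format:

<queuePlacementPolicy>
  <!-- use the queue named at submission, if it exists -->
  <rule name="specified" />
  <!-- otherwise fall back to a queue named after the submitting user -->
  <rule name="user" />
</queuePlacementPolicy>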
These rules mean that unless a queue is explicitly specified (and exists), a queue named after the user is used, created if necessary. There is also a simple placement strategy that puts all applications into the same (default) queue, so the cluster is shared equally among applications rather than among users. That configuration is defined as follows:
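A sketch of that single-queue policy:

<queuePlacementPolicy>
  <!-- ignore requested queues and send everything to the default queue -->
  <rule name="default" />
</queuePlacementPolicy>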
To achieve the same effect we can also skip the configuration file and simply set yarn.scheduler.fair.user-as-default-queue=false, so applications are placed in the default queue instead of per-user queues. In addition, setting yarn.scheduler.fair.allow-undeclared-pools=false prevents users from creating queues on the fly.
When a job is submitted to an empty queue in a busy cluster, it does not execute immediately but blocks until running jobs release system resources. To make the time to execution more predictable (a wait timeout can be set), the Fair scheduler supports preemption. Preemption allows the scheduler to kill Containers belonging to queues that are using more than their share of resources, so those resources can be handed to the queues entitled to them. Note that preemption lowers the cluster's execution efficiency, because the killed Containers must be re-executed. Preemption is enabled globally by setting yarn.scheduler.fair.preemption=true. Two further parameters control the preemption timeouts (neither is configured by default, and at least one must be set for Containers to be preempted):
minSharePreemptionTimeout
fairSharePreemptionTimeout
If a queue has not received its minimum resource guarantee within the time given by minSharePreemptionTimeout, the scheduler preempts Containers. A default for all queues can be set with the top-level <defaultMinSharePreemptionTimeout> element in the allocation file, and a per-queue value with a <minSharePreemptionTimeout> element inside the <queue> element.
Similarly, if a queue has not received half of its fair share (the ratio is configurable) within the time given by fairSharePreemptionTimeout, the scheduler preempts Containers. This timeout can likewise be set for all queues with the top-level <defaultFairSharePreemptionTimeout> element and per queue with <fairSharePreemptionTimeout>. The ratio just mentioned is set with <defaultFairSharePreemptionThreshold> (all queues) and <fairSharePreemptionThreshold> (one queue), and defaults to 0.5.
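A minimal allocation-file sketch combining these settings (timeouts in seconds, values purely illustrative):

<!-- preempt if a queue sits below its minimum share for 120s -->
<defaultMinSharePreemptionTimeout>120</defaultMinSharePreemptionTimeout>
<!-- preempt if a queue sits below its fair-share threshold for 600s -->
<defaultFairSharePreemptionTimeout>600</defaultFairSharePreemptionTimeout>
<defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
<queue name="production">
  <!-- the latency-sensitive pool preempts sooner than the default -->
  <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
</queue>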
Note that the mapping from the submitting user to user groups must be maintained on the ResourceManager: when assigning a resource pool, the ResourceManager reads the user-to-group mapping locally. Otherwise the task is assigned to the default resource pool, and a warning such as "UserGroupInformation: No groups available for user" appears in the log. The user-to-group mapping on the client machine, by contrast, does not matter.
Each time you add a new user or adjust resource-pool quotas on the ResourceManager, refresh for the changes to take effect:
yarn rmadmin -refreshQueues
yarn rmadmin -refreshUserToGroupsMappings
Dynamic updates only support modifying resource-pool quotas; if you add or remove resource pools, you need to restart the Yarn cluster.
The configuration and usage of each resource pool under the Fair Scheduler can also be seen on the ResourceManager web UI: http://ResourceManagerHost:8088/cluster/scheduler
2.2 Capacity configuration
Hadoop 2.7 uses the Capacity Scheduler by default.
yarn-site.xml:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
The Capacity scheduler lets multiple organizations share an entire cluster, each obtaining part of its computing capacity. By giving every organization a dedicated queue and allocating part of the cluster's resources to each queue, one cluster can serve multiple organizations through multiple queues. A queue can in turn be subdivided so that the members of one organization share its resources. Within a queue, resources are scheduled with a first-in, first-out (FIFO) strategy.
Normally a single job will not be able to use more resources than its queue has. However, if multiple jobs run in a queue and the cluster has enough spare resources, those resources are allocated to the jobs. And what if the queue itself does not have enough resources? The Capacity scheduler may still allocate additional resources to the queue; this is the concept of the "elastic queue" (queue elasticity).
In normal operation the Capacity scheduler does not forcibly release Containers; when a queue runs short of resources, it can only obtain Containers after other queues release them. Of course, we can set a maximum resource usage for a queue so that it does not occupy too many idle resources and leave other queues unable to use them; this is the trade-off that "elastic queues" require.
Suppose we have the following queue hierarchy:
root
├── prod
└── dev
    ├── eng
    └── science
Here is a simple Capacity scheduler configuration file, named capacity-scheduler.xml. It defines two sub-queues under the root queue, prod and dev, with 40% and 60% of the capacity respectively. Note that a queue is configured by setting properties of the form yarn.scheduler.capacity.<queue-path>.<sub-property>, where <queue-path> is the queue's path in the inheritance tree, such as root.prod, and <sub-property> typically refers to capacity or maximum-capacity.
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>
We can see that the dev queue is split into two sub-queues of equal capacity, eng and science. dev's maximum-capacity is set to 75%, so even when the prod queue is completely idle, dev cannot occupy all the cluster's resources; prod always has 25% available for emergencies. Note that eng and science have no maximum-capacity set, so jobs in either queue may consume all of the dev queue's resources (up to 75% of the cluster). Likewise, because prod sets no maximum-capacity, it may occupy the whole cluster. Beyond queues and their capacities, the Capacity scheduler can also configure the maximum resources a single user or application may be allocated, how many applications may run simultaneously, queue ACLs, and so on.
Which queue to use depends on the application. In MapReduce, for example, we specify the queue with the mapreduce.job.queuename property; if the queue does not exist, we receive an error when submitting the task. If we define no queues, all applications are placed in a default queue.
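For instance, a sketch of pinning a MapReduce job to the prod queue from the example above via job configuration; the same property can be passed at submit time with -Dmapreduce.job.queuename=prod:

<property>
  <!-- route this job to the prod queue of the Capacity scheduler -->
  <name>mapreduce.job.queuename</name>
  <value>prod</value>
</property>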
Note: with the Capacity scheduler, the queue name must be the last part of the queue path; full queue paths are not recognized. In the configuration above, prod and eng are valid queue names, while root.dev.eng and dev.eng are not.
2.3 FIFO Scheduler
yarn-site.xml:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
</property>
This concludes the study of how to understand the Yarn Scheduler. Pairing the theory with hands-on practice is the best way to learn, so go and try it out.