YARN Resource Scheduling Strategies: the Capacity Scheduler

2025-01-31 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Background

The simplest scheduler in YARN is the FIFO scheduler: there is a single default queue shared by all users, and resources are allocated on a first-come, first-served basis with no notion of priority. With FIFO, one or two jobs can take up all the resources and starve every other job. Such an allocation is obviously unreasonable (in today's socialism, we should achieve common prosperity). YARN also provides two other resource schedulers, the Capacity Scheduler and the Fair Scheduler. This article focuses on the Capacity Scheduler.

What is the Capacity Scheduler?

The Capacity Scheduler divides resources by queue. Put simply, each queue has its own resources, and both the queue structure and each queue's resources are configurable, as in the following example:

The default queue accounts for 30% of the resources, and analyst and dev account for 40% and 30% respectively. analyst and dev each have two child queues, which subdivide the resources of their parent queue.

Queues organize resources hierarchically, and several levels of resource limits let multiple users share a Hadoop cluster: limits on queue resources, limits on user resources, and limits on the number of applications per user. Applications within a queue are scheduled FIFO. Each queue can be given a guaranteed minimum share and an upper limit on resource use, and each user can also be given an upper limit to prevent resource abuse. When a queue has spare resources, they can be lent temporarily to other queues.
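The way child queues subdivide their parent's share can be sketched in a few lines of Python. This is only an illustration: the top-level percentages come from the example above, while the child queue names and their 50/50 and 60/40 splits are hypothetical, since the original figure does not survive.

```python
# Queue tree from the example. Each percentage is relative to the parent queue.
# The child queue names and their splits are hypothetical placeholders.
QUEUES = {
    "default": {"capacity": 30, "children": {}},
    "analyst": {"capacity": 40, "children": {
        "analyst_a": {"capacity": 50, "children": {}},
        "analyst_b": {"capacity": 50, "children": {}},
    }},
    "dev": {"capacity": 30, "children": {
        "dev_a": {"capacity": 60, "children": {}},
        "dev_b": {"capacity": 40, "children": {}},
    }},
}


def absolute_capacities(children, parent_path="root", parent_share=100.0):
    """Return {queue path: share of the whole cluster, in percent}."""
    result = {}
    for name, queue in children.items():
        path = f"{parent_path}.{name}"
        # A child's cluster-wide share is its percentage of the parent's share.
        share = parent_share * queue["capacity"] / 100.0
        result[path] = share
        result.update(absolute_capacities(queue["children"], path, share))
    return result


if __name__ == "__main__":
    for path, share in absolute_capacities(QUEUES).items():
        print(f"{path}: {share:.1f}% of the cluster")
```

With these assumed splits, a child that holds 50% of analyst ends up with 20% of the whole cluster, because analyst itself holds 40%.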

Characteristics

The Capacity scheduler has the following features:

● Hierarchical queues: child queues can use all of the resources allotted to their parent queue. Managing queues as a hierarchy makes it easier to allocate and limit resources sensibly.

● Capacity guarantees: a resource share is set on each queue, so that no single queue can occupy the resources of the entire cluster.

● Security: each queue has strict access control. Users can submit jobs only to their own queues and cannot view or modify jobs in other queues.

● Elasticity: free resources can be allocated to any queue. When several queues contend for them, they are balanced proportionally.

● Multi-tenancy: queue capacity limits let multiple users share one cluster while guaranteeing that each queue receives its own capacity, which improves utilization.

● Operability: YARN supports changing capacity allocations, permissions, and other settings at run time. It also provides an administrator interface that displays the current queue status. An administrator can add a queue at run time but cannot delete one. The administrator can also stop a queue at run time so that it accepts no new jobs while its current jobs finish. If a queue is set to STOPPED, jobs cannot be submitted to it or to its subqueues.

● Resource-based scheduling: the scheduler coordinates applications with differing resource requirements, such as memory, CPU, and disk.

Enabling the scheduler

To select the scheduler used by the ResourceManager, edit conf/yarn-site.xml and set the following property:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

Configuring queues

The core of the scheduler is the allocation and use of queues. Queues can be configured by modifying conf/capacity-scheduler.xml.

The Capacity Scheduler has one predefined queue, root, and all other queues are its subqueues. Queues are configured hierarchically, using . as the separator in the queue path, as in yarn.scheduler.capacity.<queue-path>.queues
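To make the dotted naming scheme concrete, here is a minimal Python sketch that flattens a nested queue tree into the corresponding .queues property names. The helper queue_properties and the queue names are hypothetical and only illustrate how the path is built; they are not part of Hadoop.

```python
def queue_properties(tree, path="root"):
    """Map each parent queue path to its comma-separated list of children."""
    props = {}
    if tree:  # leaf queues have no children and get no .queues property
        props[f"yarn.scheduler.capacity.{path}.queues"] = ",".join(tree)
        for name, subtree in tree.items():
            props.update(queue_properties(subtree, f"{path}.{name}"))
    return props


# Hypothetical hierarchy; the queue names are placeholders.
tree = {
    "a": {"a1": {}, "a2": {}},
    "b": {"b1": {}, "b2": {}, "b3": {}},
    "c": {},
}

if __name__ == "__main__":
    for name, value in queue_properties(tree).items():
        print(f"{name} = {value}")
```

Each level of nesting adds one more dot-separated segment to the property name, which is all the hierarchical configuration amounts to.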

The following is a sample configuration in which root has three subqueues:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b,c</value>
  <description>The queues at this level (root is the root queue).</description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
  <description>The queues at this level (root is the root queue).</description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.b.queues</name>
  <value>b1,b2,b3</value>
  <description>The queues at this level (root is the root queue).</description>
</property>

Queue Properties

yarn.scheduler.capacity.<queue-path>.capacity

The queue's capacity, as a percentage of its parent's resources. When the system is busy, each queue should receive its configured share; when the system is idle, a queue's unused resources can be used by other queues. The capacities of all queues at the same level must add up to 100%.
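The 100% rule is easy to check before deploying a configuration. The following is a minimal sketch of such a pre-check; check_sibling_capacities is a hypothetical helper, not a Hadoop API, and the queue names are taken from the earlier example.

```python
import math


def check_sibling_capacities(capacities):
    """Raise ValueError unless the sibling capacities sum to 100 percent.

    `capacities` maps queue name to configured capacity for queues that
    share the same parent.
    """
    total = sum(capacities.values())
    if not math.isclose(total, 100.0, abs_tol=1e-6):
        raise ValueError(
            f"capacities of {sorted(capacities)} sum to {total}, expected 100"
        )
    return total


if __name__ == "__main__":
    check_sibling_capacities({"default": 30, "analyst": 40, "dev": 30})
    print("sibling capacities are consistent")
```

The same check has to be applied separately at every level of the hierarchy, once per parent queue.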

yarn.scheduler.capacity.<queue-path>.maximum-capacity

The upper limit on the queue's resource use. Because a queue can borrow idle resources from other queues, this parameter caps how much it can use in total. The default is -1, which disables the limit.

yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent

The minimum share of the queue's resources guaranteed to each user when there is demand. For example, suppose it is set to 25. If two users submit jobs, no single user can use more than 50% of the queue's resources. If three users submit jobs, no user can use more than 33%. If four users submit jobs, no user can use more than 25%. If a fifth user submits a job, that job has to wait until resources are released. The default is 100, which means no per-user limit is imposed.
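The arithmetic in this example can be written down as a small function. This is a simplified model of the rule described above, assuming the queue's resources are split evenly among active users until the configured minimum is reached; the real scheduler takes more factors into account, and per_user_limit_percent is a hypothetical name.

```python
def per_user_limit_percent(active_users, minimum_user_limit_percent=100):
    """Approximate cap on one user's share of the queue, in percent.

    The cap is an even split among the active users, but it never drops
    below the configured minimum-user-limit-percent.
    """
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    return max(100.0 / active_users, float(minimum_user_limit_percent))


if __name__ == "__main__":
    for users in range(1, 6):
        limit = per_user_limit_percent(users, minimum_user_limit_percent=25)
        print(f"{users} active user(s): each limited to {limit:.1f}% of the queue")
```

With a minimum of 25 the cap stops shrinking at four users. Five users at 25% each would need 125% of the queue, which is why the fifth user's job has to wait.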

yarn.scheduler.capacity.<queue-path>.user-limit-factor

A multiple of the queue's capacity that caps how much a single user can use. For example, if it is set to 0.5, each user can use at most 50% of the queue's capacity.
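As a rough illustration, the per-user ceiling can be modeled as the queue capacity multiplied by the factor. The sketch below assumes that the result is additionally bounded by the queue's maximum capacity and that a disabled maximum is treated as 100%; max_user_share is a hypothetical helper, not a Hadoop API.

```python
def max_user_share(queue_capacity_percent, user_limit_factor=1.0,
                   maximum_capacity_percent=100.0):
    """Approximate ceiling on one user's resources, in percent of the parent.

    Assumption: the ceiling is capacity * factor, but it can never exceed
    the queue's own maximum capacity.
    """
    return min(queue_capacity_percent * user_limit_factor,
               maximum_capacity_percent)


if __name__ == "__main__":
    # A queue with 40% capacity and a factor of 0.5.
    print(max_user_share(40, user_limit_factor=0.5))
    # A factor above 1 lets one user borrow idle resources, up to the maximum.
    print(max_user_share(40, user_limit_factor=2, maximum_capacity_percent=60))
```

A factor above 1 only matters when the queue is allowed to grow beyond its configured capacity, so the two settings are usually chosen together.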

Restrictions on running and submitting applications

yarn.scheduler.capacity.maximum-applications / yarn.scheduler.capacity.<queue-path>.maximum-applications

Sets the maximum number of applications that can be running or pending in the system at the same time. The default is 10000.

yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent

Sets the maximum share of resources that can be used to run ApplicationMasters, which in effect limits the number of concurrently active applications. The default is 10%.

Queue Management

yarn.scheduler.capacity.<queue-path>.state

The state of the queue, either RUNNING or STOPPED. If a queue is STOPPED, new applications cannot be submitted to it or to any of its subqueues. Likewise, if root is set to STOPPED, no jobs can be submitted to the cluster at all. Applications that are already running are allowed to finish, so a queue can be drained and shut down gracefully.
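The way a STOPPED state propagates down the hierarchy can be sketched as a walk over the queue path. This is an illustrative model only; accepts_submissions and the states mapping are hypothetical, and queues without an explicit state are assumed to be RUNNING.

```python
def accepts_submissions(queue_path, states):
    """Return True if neither the queue nor any ancestor is STOPPED.

    `states` maps dotted queue paths such as "root.a" to "RUNNING" or
    "STOPPED". Queues missing from the mapping are treated as RUNNING.
    """
    parts = queue_path.split(".")
    for depth in range(1, len(parts) + 1):
        ancestor = ".".join(parts[:depth])
        if states.get(ancestor, "RUNNING") == "STOPPED":
            return False
    return True


if __name__ == "__main__":
    states = {"root.a": "STOPPED"}
    print(accepts_submissions("root.a.a1", states))  # blocked by root.a
    print(accepts_submissions("root.b", states))     # unaffected sibling
```

Stopping root.a blocks root.a.a1 as well, while sibling queues under root keep accepting jobs.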

yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications

The access control list (ACL) that controls who can submit jobs to the queue. A user who can submit to a queue can also submit jobs to its subqueues.

yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue

The ACL that controls who can administer the queue. An administrator can control all applications in the queue. This ACL is inherited by subqueues in the same way.

Note: an ACL has the form user1,user2 group1,group2, that is, a comma-separated list of users, a space, and a comma-separated list of groups. The special value * means anyone, and a single space means no one. The default is * for the root queue.
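The ACL string format can be illustrated with a small parser. This is a simplified sketch of the format described in the note (users, a space, then groups, with * meaning anyone and a single space meaning no one); it is not Hadoop's own implementation, and parse_acl and is_allowed are hypothetical names.

```python
def parse_acl(acl):
    """Split an ACL string into an 'anyone' flag, a user set and a group set."""
    if acl.strip() == "*":
        return {"anyone": True, "users": set(), "groups": set()}
    # Everything before the first space is the user list, the rest is groups.
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return {"anyone": False, "users": users, "groups": groups}


def is_allowed(acl, user, user_groups=()):
    """Return True if the user, or one of the user's groups, is in the ACL."""
    parsed = parse_acl(acl)
    return (
        parsed["anyone"]
        or user in parsed["users"]
        or bool(parsed["groups"] & set(user_groups))
    )


if __name__ == "__main__":
    acl = "user1,user2 group1,group2"
    print(is_allowed(acl, "user1"))
    print(is_allowed(acl, "bob", ["group2"]))
    print(is_allowed(" ", "user1"))
```

An ACL that starts with a space, such as " group1", has an empty user list and grants access only through group membership.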

Other properties

yarn.scheduler.capacity.resource-calculator

The resource calculation method. The default, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, considers only memory. DominantResourceCalculator considers both memory and CPU.

yarn.scheduler.capacity.node-locality-delay

The number of missed scheduling opportunities after which the scheduler stops waiting for a node-local container and tries a rack-local one. It is generally related to the number of nodes in the cluster. The default is 40, roughly the number of nodes in one rack.

Once these queue properties are set, you can view them in the web UI at the following address:

xxx:8088/scheduler

Modify queue configuration

To change the configuration of a queue or of the scheduler, edit the capacity-scheduler.xml file:

vi $HADOOP_CONF_DIR/capacity-scheduler.xml

After making the changes, run the following command:

$HADOOP_YARN_HOME/bin/yarn rmadmin -refreshQueues

Note:

Queues cannot be deleted; they can only be added.

The updated queue configuration must contain valid values.

Queue capacities at the same level must add up to 100%.

To have a job scheduled in the queue1 queue, set the mapreduce.job.queuename parameter to queue1 when you start the job. If the parameter is not set, the job goes to the default queue.
