
What are the characteristics of the Hadoop YARN scheduler

Updated 2025-01-19. Source: SLTechnology News&Howtos


This article explains the characteristics of the Hadoop YARN scheduler, covering the Capacity Scheduler, the Fair Scheduler, and how the two compare.

Overview

Cluster resources are limited. In a multi-user, multi-task environment, a coordinator is needed to schedule tasks in an orderly way under resource or business constraints. The YARN resource scheduler is that coordinator.

YARN has several scheduler implementations; the built-in ones include Capacity Scheduler and Fair Scheduler. The scheduler is a pluggable component that implements the ResourceScheduler interface. Users can select a scheduler through configuration, or write a new one that follows the interface specification. By default, YARN uses the Capacity Scheduler.
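As a small illustration of selecting a scheduler through configuration, the sketch below builds the yarn-site.xml property that names the scheduler class. The property name yarn.resourcemanager.scheduler.class and the two class names are the ones used by Apache Hadoop; the helper function scheduler_property is a hypothetical name introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Fully qualified class names of the two built-in schedulers discussed here.
CAPACITY = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"
FAIR = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"


def scheduler_property(scheduler_class: str) -> str:
    """Return a yarn-site.xml <property> element that selects the scheduler."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.resourcemanager.scheduler.class"
    ET.SubElement(prop, "value").text = scheduler_class
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Print the fragment that would switch the ResourceManager to Fair Scheduler.
    print(scheduler_property(FAIR))
```

The printed element would go inside the configuration element of yarn-site.xml; the ResourceManager typically needs a restart to pick up a different scheduler class.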

Introduction to Capacity Scheduler

Capacity Scheduler was contributed by Yahoo. It was proposed in HADOOP-3421 to overcome the inefficiency of the existing HOD (Hadoop On Demand). It is designed for environments where multiple users share a cluster, and it aims to maximize the cluster's throughput and utilization in that setting.

The Capacity Scheduler allows multiple organizations to share the entire cluster, with each organization receiving part of the cluster's computing capacity. Each organization is assigned a dedicated queue, and each queue is allocated a share of the cluster resources, so a single cluster can serve several organizations through several queues. A queue can also be subdivided hierarchically, so that multiple members within an organization share that queue's resources. Within a queue, resources are scheduled using a first-in-first-out (FIFO) strategy.
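The way a subdivided queue's share follows from its parent's share can be sketched as below. This is a minimal illustration, not the scheduler's own code: the queue names, the percentages, and the function absolute_capacities are all hypothetical, and the sketch assumes each level's children are given as percentages of their parent that sum to 100.

```python
def absolute_capacities(tree, parent_share=100.0, path="root"):
    """Flatten a queue tree into {queue_path: percent of the whole cluster}.

    `tree` maps a child queue name to (percent_of_parent, subtree_dict).
    """
    result = {}
    total = sum(pct for pct, _ in tree.values())
    if tree and abs(total - 100.0) > 1e-9:
        raise ValueError(f"children of {path} sum to {total}, expected 100")
    for name, (pct, children) in tree.items():
        child_path = f"{path}.{name}"
        share = parent_share * pct / 100.0
        result[child_path] = share
        # A child's own children divide the child's share, not the cluster's.
        result.update(absolute_capacities(children, share, child_path))
    return result


# Hypothetical layout: two organizations, one of them split between two teams.
QUEUES = {
    "org_a": (60.0, {"dev": (50.0, {}), "prod": (50.0, {})}),
    "org_b": (40.0, {}),
}

if __name__ == "__main__":
    for queue, share in absolute_capacities(QUEUES).items():
        print(f"{queue}: {share:.1f}% of cluster")
```

With this layout, root.org_a.dev ends up with 30% of the cluster: half of the 60% given to org_a.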

A single job may not use all of a queue's resources. If several jobs are running in the queue and the queue has enough resources, those resources are allocated to the jobs. If the queue does not have enough resources, the Capacity Scheduler may still allocate additional resources to it from capacity that other queues are not using. This is the concept of queue elasticity.

In normal operation, the Capacity Scheduler does not force Containers to be released. When a queue is short of resources, it can obtain Containers only after other queues have released them. A maximum resource usage can be set for a queue so that it does not take up so much idle capacity that other queues are unable to use it. This is the trade-off that queue elasticity involves.
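The interaction between a queue's guaranteed capacity, its maximum, and the idle capacity of other queues can be sketched with a small model. This is an illustrative simplification under stated assumptions, not how the scheduler computes allocations internally: the function name elastic_allocation and its parameters are hypothetical, and only capacity that is currently idle can be borrowed, since Containers are not forcibly released.

```python
def elastic_allocation(demand, guaranteed, maximum, cluster_free):
    """Resources a queue can hold under a simple elasticity model.

    All arguments use the same unit (for example GB of memory):
    demand       - what the queue's applications are asking for
    guaranteed   - the queue's configured capacity
    maximum      - the queue's configured upper limit
    cluster_free - capacity that other queues are not using right now
    """
    # Up to the guarantee, the queue is served from its own capacity.
    base = min(demand, guaranteed)
    # Beyond the guarantee it may borrow idle capacity, but never past its maximum.
    borrowable = max(0, min(maximum, demand) - base)
    return base + min(borrowable, cluster_free)


if __name__ == "__main__":
    # The maximum caps the queue even though more idle capacity exists.
    print(elastic_allocation(demand=80, guaranteed=40, maximum=60, cluster_free=50))  # 60
    # Little idle capacity: the queue must wait for other queues to release resources.
    print(elastic_allocation(demand=80, guaranteed=40, maximum=100, cluster_free=10))  # 50
```

Lowering the maximum leaves more idle capacity available to other queues, at the cost of a busy queue being unable to use it.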

Characteristics of Capacity Scheduler

Capacity guarantee: each queue is allocated part of the cluster capacity and controls that share of the resources. Applications submitted to a specific queue can use that queue's resources. Administrators can configure a minimum guaranteed capacity and an upper limit on resource usage for each queue.

Security: each queue has a strict ACL (access control list) that controls which users can submit applications to it. Users cannot view or modify applications submitted by other users, while queue administrators and cluster administrators can manage them.

Flexibility: the free resources of a queue can be allocated to other queues. If a queue has not reached its upper limit on resource usage and needs more resources, the free resources of other queues are allocated to that busy queue.

Multi-user: supports multiple users sharing the cluster, and a comprehensive set of limits prevents a single application, user, or queue from monopolizing all resources in the queue or cluster.

Operability: supports run-time configuration changes and stopping queues. Queue properties (for example, capacity allocation and ACLs) can be changed by the administrator in a secure manner at run time, which reduces the impact on users. An interface is also provided for administrators and users to view the current usage of queue resources. The administrator can add a new queue while the cluster is running, and can stop a running queue: applications already in the queue run to completion, while new applications cannot be submitted to it. Note that deleting queues at run time is not supported in older Hadoop releases, where removing a queue requires restarting the cluster.

Hierarchical queues: hierarchical queues ensure that resources are shared among the organization's subqueues, providing more controllability and predictability.

Resource-based scheduling: supports resource-intensive applications by allowing an application to request more resources than the default, so that the scheduler can serve applications with different resource requirements. By default, only memory is taken into account; CPU can also be supported through configuration.
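Two of the characteristics above, the submission ACL and the ability to stop a queue, can be sketched as a toy model. It is only an illustration: the Queue class, its fields, and the user and application names are hypothetical, and the state names RUNNING and STOPPED mirror the values that a Capacity Scheduler queue's state setting accepts.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    submit_acl: set                      # users allowed to submit to this queue
    state: str = "RUNNING"               # "RUNNING" or "STOPPED"
    running: list = field(default_factory=list)

    def submit(self, user: str, app: str) -> bool:
        """Accept the application only if the queue is open and the user is allowed."""
        if self.state != "RUNNING" or user not in self.submit_acl:
            return False
        self.running.append(app)
        return True

    def stop(self) -> None:
        """Stop accepting new applications; already running ones are left alone."""
        self.state = "STOPPED"


if __name__ == "__main__":
    q = Queue("etl", submit_acl={"alice"})
    print(q.submit("alice", "job1"))   # True: alice is in the ACL
    print(q.submit("bob", "job2"))     # False: bob is not in the ACL
    q.stop()
    print(q.submit("alice", "job3"))   # False: the queue is stopped
    print(q.running)                   # ['job1'] keeps running to completion
```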

Fair Scheduler

Fair Scheduler, contributed by Facebook, is a pluggable scheduler on Hadoop that allows YARN applications to share resources fairly on a large cluster.

Fair scheduling is a method of allocating resources to applications that emphasizes fair use of resources among multiple users. By default, Fair Scheduler schedules applications fairly based on memory, and it can be configured to schedule based on both memory and CPU. When only one application is running in the cluster, that application takes up the cluster resources. When other applications are submitted, resources that are released are allocated to the new applications, so each application eventually gets roughly the same amount of resources.

With Fair Scheduler, there is no need to reserve a fixed amount of system resources in advance; the scheduler adjusts each application's allocation dynamically. For example, when a first, large job is submitted and nothing else is running, it gets all the cluster resources. When a second, small job is submitted, Fair Scheduler allocates half of the resources to it, so the two jobs share the cluster fairly.

It is important to note that, in the Fair Scheduler case shown in the following figure, there is a delay between the submission of the second job and the moment it obtains resources, because it has to wait for the first job to release the Containers it occupies. When the small job finishes, it releases its resources and the large job again gets all the system resources.

Fair Scheduler organizes applications into queues, and resources are shared fairly among the queues. By default, all users share a single queue. If an application specifies a queue when requesting resources, the request is submitted to that queue. Queues can also be assigned by user name through configuration. Within each queue, applications share resources either fairly, based on memory, or in FIFO order.

For example, suppose there are two users, A and B, each with a queue. When A starts a job and B has no tasks, A gets all the cluster resources. When B starts a job, A's job continues to run, but after a while each of the two jobs gets half of the cluster resources. If B then starts a second job while the other jobs are still running, it shares queue B's resources with B's first job: each of B's two jobs uses 1/4 of the cluster resources, while A's job still uses half. As a result, resources end up shared equally between the two users. The process is shown in the following figure:

Fair Scheduler allows a queue to be given a minimum share of resources, which ensures that certain users, groups, or applications always have access to adequate resources. When a queue has a running application, it gets at least its configured minimum; when the queue has no tasks, its resources are divided among the other running tasks.
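The two-user example and the rule that an idle queue's resources go to the others can be reproduced with a short calculation. This is a minimal sketch under simplifying assumptions: queues have equal weight, no minimum shares are configured, and the result is the steady state after Containers have been released. The function fair_shares and the job names are hypothetical.

```python
from fractions import Fraction


def fair_shares(queues):
    """Split the cluster equally among active queues, then among each queue's apps.

    `queues` maps a queue name to the list of applications running in it.
    Queues with no running application receive nothing; their share goes to the others.
    Returns {app: fraction of the cluster}.
    """
    active = {q: apps for q, apps in queues.items() if apps}
    shares = {}
    for apps in active.values():
        queue_share = Fraction(1, len(active))
        for app in apps:
            shares[app] = queue_share / len(apps)
    return shares


if __name__ == "__main__":
    print(fair_shares({"A": ["a1"], "B": []}))            # a1 holds the whole cluster
    print(fair_shares({"A": ["a1"], "B": ["b1"]}))        # 1/2 each
    print(fair_shares({"A": ["a1"], "B": ["b1", "b2"]}))  # a1: 1/2, b1 and b2: 1/4 each
```

Exact fractions are used so that the 1/2 and 1/4 shares from the example appear without rounding.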

Fair Scheduler allows all tasks to run by default, but the number of tasks running per user and per queue can be limited through the configuration file. When a limit is in place, a newly submitted task is not rejected; it waits in the scheduler queue until an earlier task finishes, and then runs.

Fair Scheduler vs Capacity Scheduler

Similarities

Both support multiple users and multiple queues, so both suit environments where many users share a cluster.

Both support hierarchical queues.

Both support dynamic configuration changes, which helps keep the cluster running stably.

Both support resource sharing: when a queue has spare resources, they can be shared with other queues that lack resources.

Within a single queue, both support priority and FIFO scheduling.

Differences

The biggest difference between Capacity Scheduler and Fair Scheduler is the difference in scheduling policies.

Capacity Scheduler first selects the queue with the lowest resource utilization, and then schedules within that queue using FIFO or DRF.

Fair Scheduler selects a queue using a fair sorting algorithm, and then schedules within that queue using Fair (the default), FIFO, or DRF.
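The two-level choice described above (pick a queue, then pick an application inside it) can be sketched in a few lines. The sketch makes several assumptions that are not from the article: utilization is taken as used resources divided by the queue's configured capacity, FIFO is modelled by submission time, and DRF (Dominant Resource Fairness) is reduced to its core comparison value, the largest share of any single resource that an application holds. All function and field names are hypothetical.

```python
def pick_capacity_queue(queues):
    """Capacity-style choice: the queue with the lowest used/capacity ratio."""
    return min(queues, key=lambda q: q["used"] / q["capacity"])


def pick_fifo_app(apps):
    """FIFO inside a queue: the application submitted earliest."""
    return min(apps, key=lambda a: a["submitted_at"])


def dominant_share(usage, cluster):
    """DRF compares applications by their largest per-resource share of the cluster."""
    return max(usage[r] / cluster[r] for r in cluster)


if __name__ == "__main__":
    queues = [
        {"name": "a", "used": 30, "capacity": 40},
        {"name": "b", "used": 10, "capacity": 60},
    ]
    print(pick_capacity_queue(queues)["name"])   # b: it is the least utilized

    apps = [{"id": "app2", "submitted_at": 20}, {"id": "app1", "submitted_at": 10}]
    print(pick_fifo_app(apps)["id"])             # app1: submitted first

    cluster = {"memory": 64, "vcores": 8}
    print(dominant_share({"memory": 8, "vcores": 2}, cluster))  # 0.25, vcores dominate
```

Under DRF, the application with the smallest dominant share would be served next, which extends fair sharing from memory alone to memory and CPU together.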
