
How to compare Fair Scheduler and Capacity Scheduler

Updated: 2025-03-28. From: SLTechnology News&Howtos



This article describes the benefits and performance improvements of choosing Capacity Scheduler, and compares the features of Fair Scheduler and Capacity Scheduler.

Why one scheduler?

Cloudera Data Platform (CDP) only supports Capacity Scheduler in YARN clusters.

Prior to the release of CDP, Cloudera customers used one of two schedulers (Fair Scheduler and Capacity Scheduler) depending on the product used (CDH or HDP, respectively).

Converging on a single scheduler in CDP was a difficult choice, but it is ultimately rooted in our intent to reduce complexity for our customers while helping us focus our future investments. Both schedulers have evolved so much over the years that Fair Scheduler has borrowed almost all of the features of Capacity Scheduler, and vice versa. Because of this, we decided to run all YARN cluster workloads on Capacity Scheduler.

Clusters currently using Fair Scheduler must migrate to Capacity Scheduler when migrating to CDP. Cloudera provides tools, documentation, and related help for such migrations.

Benefits of using Capacity Scheduler

Here are some of the benefits of using Capacity Scheduler:

Integration with Ranger

Node partitioning/labeling

Improved scheduling in cloud-native environments, such as better bin packing and autoscaling support

Scheduling throughput improvements, including the global scheduling framework and lookup of multiple nodes at one time

Affinity/anti-affinity: with affinity, run application X only on nodes that run application Y; with anti-affinity, do not run application X and application Y on the same node.
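
The affinity and anti-affinity rules in the last item can be illustrated with a minimal Python sketch. It is not YARN code: the `Node` record, the `candidate_nodes` function, and the three-node cluster are hypothetical names chosen only to show how each rule narrows the set of nodes where application X may run.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node record: a name plus the applications running on it."""
    name: str
    running_apps: set = field(default_factory=set)


def candidate_nodes(nodes, affinity_to=None, anti_affinity_to=None):
    """Return the names of nodes that satisfy the given placement rules.

    affinity_to: keep only nodes that already run this application.
    anti_affinity_to: drop nodes that already run this application.
    """
    selected = []
    for node in nodes:
        if affinity_to is not None and affinity_to not in node.running_apps:
            continue
        if anti_affinity_to is not None and anti_affinity_to in node.running_apps:
            continue
        selected.append(node.name)
    return selected


# Illustrative cluster (assumption): Y runs on node1 and node3.
cluster = [
    Node("node1", {"Y"}),
    Node("node2", set()),
    Node("node3", {"Y", "Z"}),
]

# Affinity: X may run only where Y already runs.
print(candidate_nodes(cluster, affinity_to="Y"))       # ['node1', 'node3']
# Anti-affinity: X must not share a node with Y.
print(candidate_nodes(cluster, anti_affinity_to="Y"))  # ['node2']
```

In a real cluster these rules are expressed through the scheduler's placement constraints rather than by filtering nodes by hand.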

Scheduler performance improvements

This section describes the global scheduling feature and its test results.

Improvements from global scheduling (YARN-5139)

Before the global scheduling changes, the YARN scheduler ran under a single monolithic lock and underperformed as a result. Global scheduling greatly improves the YARN scheduler's internal locking structure and threading model. The scheduler can now decouple making placement decisions from changing its internal data structures. This also makes it possible to look up multiple nodes at a time, which the autoscaling and bin-packing policies in the cloud rely on. For more information, see Design and Implementation Notes.
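
The idea of separating placement decisions from data structure updates can be sketched in a few lines of Python. This is a toy model under stated assumptions, not YARN's implementation: the `Scheduler` class and its `propose`, `commit`, and `allocate` methods are hypothetical. A proposal is computed from a snapshot without holding the lock, considers all nodes at once, and picks the tightest fit as a simple bin-packing choice; only the short commit step takes the lock and re-checks that the proposal is still valid.

```python
import threading


class Scheduler:
    """Toy scheduler: proposals are computed lock-free, commits are locked."""

    def __init__(self, free_memory_mb):
        self._free = dict(free_memory_mb)  # node name -> free memory in MB
        self._lock = threading.Lock()

    def propose(self, request_mb):
        """Pick a node from a snapshot, without holding the lock.

        Looks at all nodes at once and returns the tightest fit,
        or None if no node has enough free memory.
        """
        snapshot = dict(self._free)
        fitting = [(free, name) for name, free in snapshot.items()
                   if free >= request_mb]
        return min(fitting)[1] if fitting else None

    def commit(self, node, request_mb):
        """Apply a proposal under the lock, re-checking that it still fits."""
        with self._lock:
            if self._free.get(node, 0) < request_mb:
                return False  # stale proposal, the caller should retry
            self._free[node] -= request_mb
            return True

    def allocate(self, request_mb, max_attempts=3):
        """Propose and commit, retrying when a proposal has gone stale."""
        for _ in range(max_attempts):
            node = self.propose(request_mb)
            if node is None:
                return None
            if self.commit(node, request_mb):
                return node
        return None


scheduler = Scheduler({"node1": 8192, "node2": 4096})
print(scheduler.allocate(3000))  # node2, the tightest fit
print(scheduler.allocate(3000))  # node1, because node2 no longer fits
print(scheduler.allocate(9000))  # None, nothing is large enough
```

Because the lock is held only while a proposal is validated and applied, several threads can evaluate placements concurrently, which is the kind of gain the paragraph above attributes to the new locking structure.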

The test results for the global scheduling feature are based on a simulated environment with 20,000 nodes and 47,000 running applications. For more information about these tests, see the Performance Report.

YARN Community Performance Testing

Microsoft published the report "Hydra: A Federated Resource Manager for Data-Center Scale Analytics" (Curino et al.), which focuses on scalability (YARN deployed on more than 250,000 nodes, including five large federated clusters with 50,000 nodes each) and on the higher performance achieved with Capacity Scheduler (the scheduler in each cluster can allocate more than 40,000 containers per second). This is the largest YARN deployment in the world.

We have also seen performance numbers from other companies in the community that are consistent with our simulator results (thousands of container allocations per second on clusters with thousands of nodes).

Disclaimer: The performance numbers discussed above depend on the size of the cluster, the workload running on it, the queue structure, cluster health (such as node managers, disks, and networks), container churn, and so on. Reaching the desired performance usually requires fine-tuning the scheduler and other cluster parameters. These numbers are not guaranteed to be achievable simply by using CDP.

Feature comparison

Over time, the features of the two schedulers have become similar. The tables below list the current features and the differences between the two schedulers.

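Supported features

| Category | Feature | Capacity Scheduler | Fair Scheduler | Comments |
| --- | --- | --- | --- | --- |
| Queues | Hierarchical queues | Yes | Yes | |
| Queues | Elastic queue capacity for better resource sharing | Yes | Yes | |
| Queues | Percentage-based resource allocation in queues | Yes | Yes | Percentage and absolute resource settings cannot be used simultaneously. |
| Queues | Automatic queue creation | Yes | Yes | |
| Queues | User mapping (user/group to queue mapping) | Yes | Yes | |
| Queues | CLI / REST API support to manage queues | Yes | Yes | |
| Queues | Move applications between queues | Yes | Yes | |
| Queues | Create/delete/modify dynamic queues | Yes | Yes | |
| Queues | Reservation support in queues | Yes | Yes | |
| Authorization | Authorization control (ACLs in queues for submit and admin operations) | Yes | Yes | |
| Authorization | Third-party ACL control (Ranger) | Yes | Yes | |
| Application placement | Node label support | Yes | No | |
| Application placement | Hive placement integration | Yes | Yes | |
| Application placement | Node attribute support | Yes | No | |
| Application placement | Placement constraint support | Yes | No | Supported constraints are limited in the current implementation. |
| Application placement | Node locality | Yes | Yes | |
| Application placement | Locality delay control | Yes | Yes | |
| Application placement | User limit quota management | Yes | Yes | |
| Application placement | AM resource quota management | Yes | Yes | |
| Application placement | Queue priority | Yes | No | Managed indirectly through queue weights. |
| Application placement | Maximum and minimum allocation limits per container unit | Yes | Yes | |
| Scheduling | Asynchronous scheduling support | Yes | Yes | Implementations vary between schedulers and should not be considered equivalent. |
| Scheduling | Multiple resource types support (CPU, memory, GPU, etc.) | Yes | Yes | |
| Scheduling | Queue ordering policies (fair, FIFO, etc.) | Yes | Yes | |
| Scheduling | Multiple container assignments per heartbeat | Yes | Yes | |
| Preemption | Inter-queue preemption support | Yes | Yes | |
| Preemption | Intra-queue preemption support | Yes | Yes | |
| Preemption | Reservation-based preemption | Yes | Yes | |
| Preemption | Preemption based on queue priority | Yes | No | Queue weights are considered when making preemption decisions. |
| Application support | First-class application concept | Yes | Yes | |
| Application support | Application priority | Yes | Yes | |
| Application support | Application timeout | Yes | Yes | |
| Application support | Move applications across queues | Yes | Yes | |
| Application support | High availability stateful application recovery | Yes | Yes | |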
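
To make the hierarchical queue and percentage-based allocation rows more concrete, here is a minimal Python sketch that resolves percentage capacities in a queue tree into absolute guarantees. The queue names, the percentages, and the cluster size are illustrative assumptions, and `guaranteed_memory` is a hypothetical helper, not part of any YARN API. In Capacity Scheduler, such percentages are typically set per queue through the `yarn.scheduler.capacity.<queue-path>.capacity` property.

```python
def guaranteed_memory(queues, cluster_memory_mb):
    """Resolve percentage capacities in a queue tree to absolute memory.

    `queues` maps a dotted queue path to its capacity, expressed as a
    percentage of its parent queue. Each queue's guarantee is its
    parent's guarantee multiplied by that percentage.
    """
    resolved = {"root": float(cluster_memory_mb)}
    # Visit shallow paths first so a parent is resolved before its children.
    for path in sorted(queues, key=lambda p: p.count(".")):
        parent = path.rsplit(".", 1)[0]
        resolved[path] = resolved[parent] * queues[path] / 100.0
    return resolved


# Hypothetical queue layout: sibling capacities add up to 100 percent.
queues = {
    "root.prod": 70,
    "root.dev": 30,
    "root.dev.etl": 50,
    "root.dev.adhoc": 50,
}

for path, memory_mb in guaranteed_memory(queues, 100_000).items():
    print(f"{path}: {memory_mb:.0f} MB")
```

With elastic queue capacity, a queue can temporarily use more than this guarantee when its siblings are idle; the sketch covers only the guaranteed share.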

Features in roadmap

| Category | Feature | Capacity Scheduler | Fair Scheduler | Comments |
| --- | --- | --- | --- | --- |
| Queues | Absolute resource allocation in queues | Yes | Yes | Percentage and absolute resource settings cannot be used simultaneously. |
| Application placement | Maximum number of applications | No | Yes | Managed indirectly through AM resource quotas. |
| Scheduling | Fairness based on application size | No | Yes | |
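
The comment on the "Maximum number of applications" row, that the limit is managed indirectly through AM resource quotas, can be illustrated with simple arithmetic. The sketch below is a simplification under stated assumptions: `max_concurrent_apps` is a hypothetical helper, every ApplicationMaster is assumed to use a container of the same size, and the real scheduler takes more factors into account. The 10 percent share in the example mirrors the kind of value set through Capacity Scheduler's `maximum-am-resource-percent` setting.

```python
import math


def max_concurrent_apps(queue_memory_mb, am_resource_fraction, am_container_mb):
    """Estimate how many ApplicationMasters fit in a queue's AM quota.

    queue_memory_mb: memory available to the queue.
    am_resource_fraction: share of the queue that AMs may use (0.1 = 10%).
    am_container_mb: memory of one AM container (assumed uniform).
    """
    am_budget_mb = queue_memory_mb * am_resource_fraction
    return math.floor(am_budget_mb / am_container_mb)


# A 100,000 MB queue with a 10% AM share and 2,048 MB AM containers.
print(max_concurrent_apps(100_000, 0.1, 2_048))  # 4
```

Raising the AM share or shrinking the AM container raises the number of applications that can run at once, which is how the quota acts as an indirect application limit.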

Migrating from Fair Scheduler to Capacity Scheduler

Starting with CDP Private Cloud Base 7.1, Cloudera provides the fs2cs conversion utility, a command-line tool that is invoked through the yarn command. The utility helps you migrate from Fair Scheduler to Capacity Scheduler.
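
As a rough sketch of how the utility might be invoked, the Python snippet below assembles an fs2cs command line without running it. The file paths are placeholders, `build_fs2cs_command` is a hypothetical helper, and the option names (`--yarnsiteconfig`, `--fsconfig`, `--output-directory`) are given as recalled from the tool's documentation, so check them against the built-in help of your version before use.

```python
import shlex


def build_fs2cs_command(yarn_site, fair_scheduler_xml, output_dir):
    """Assemble the argument list for a Fair-to-Capacity conversion run."""
    return [
        "yarn", "fs2cs",
        "--yarnsiteconfig", yarn_site,         # existing yarn-site.xml
        "--fsconfig", fair_scheduler_xml,      # existing fair-scheduler.xml
        "--output-directory", output_dir,      # where converted files are written
    ]


# Placeholder paths (assumptions), replace with the files from your cluster.
command = build_fs2cs_command(
    "/path/to/yarn-site.xml",
    "/path/to/fair-scheduler.xml",
    "/tmp/fs2cs-output",
)
print(shlex.join(command))
```

The printed command can be reviewed and then run on a host where the yarn client is installed; the converted configuration should be checked manually before it is applied to a cluster.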

