Updated 2025-03-28. From: SLTechnology News&Howtos
This article compares the YARN Fair Scheduler and Capacity Scheduler. It covers the benefits and performance improvements of choosing Capacity Scheduler, followed by a feature-by-feature comparison of the two schedulers.
Why One Scheduler?
Cloudera Data Platform (CDP) only supports Capacity Scheduler in YARN clusters.
Prior to the release of CDP, Cloudera customers used one of two schedulers (Fair Scheduler and Capacity Scheduler) depending on the product used (CDH or HDP, respectively).
Converging on a single scheduler in CDP was a difficult choice, but it is rooted in our intent to reduce complexity for our customers while helping us focus future investments. Both schedulers have evolved so much over the years that Fair Scheduler has borrowed almost all of its features from Capacity Scheduler, and vice versa. Because of this, we decided to run all YARN cluster workloads on Capacity Scheduler.
Clusters currently using Fair Scheduler must migrate to Capacity Scheduler when migrating to CDP. Cloudera provides tools, documentation, and related help for such migrations.
Benefits of Using Capacity Scheduler
Here are some of the benefits of using Capacity Scheduler:
Integration with Ranger
Node partitioning/labeling
Improved scheduling in cloud-native environments, such as better bin-packing and auto-scaling support
Scheduling throughput improvements: a global scheduling framework, and lookup of multiple nodes at one time
Affinity/anti-affinity: with affinity, run application X only on nodes where application Y runs, and vice versa; with anti-affinity, do not run application X and application Y on the same node.
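The affinity and anti-affinity rules in the last item can be illustrated with a small model. This is only a sketch of the idea, not the YARN placement constraint API: the function `candidate_nodes`, the node names, and the application names are all hypothetical.

```python
from typing import Dict, List, Set


def candidate_nodes(running: Dict[str, Set[str]], target_app: str, mode: str) -> List[str]:
    """Return the nodes eligible for a new container under a simple constraint.

    running maps a node name to the set of applications already on that node.
    mode "affinity": keep only nodes that already run target_app.
    mode "anti-affinity": keep only nodes that do not run target_app.
    """
    if mode == "affinity":
        return sorted(n for n, apps in running.items() if target_app in apps)
    if mode == "anti-affinity":
        return sorted(n for n, apps in running.items() if target_app not in apps)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical three-node cluster.
cluster = {
    "node1": {"app_y"},
    "node2": {"app_z"},
    "node3": {"app_y", "app_z"},
}

# Application X with affinity to app_y may only go where app_y runs.
affinity_nodes = candidate_nodes(cluster, "app_y", "affinity")
# Application X with anti-affinity to app_y must avoid those nodes.
anti_affinity_nodes = candidate_nodes(cluster, "app_y", "anti-affinity")
```

Here `affinity_nodes` contains node1 and node3, while `anti_affinity_nodes` contains only node2.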
Scheduler Performance Improvements
This section describes the global scheduling feature and its test results.
Improvements from global scheduling (YARN-5139)
Before the global scheduling changes, the YARN scheduler ran under a single monolithic lock and underperformed as a result. Global scheduling greatly improves the scheduler's internal locking structure and threading model. The scheduler can now decouple making placement decisions from changing its internal data structures. This also makes it possible to look up multiple nodes at a time, which is used by the auto-scaling and bin-packing policies in the cloud. For more information, see the Design and Implementation Notes.
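The decoupling described above can be pictured as a two-step propose/commit flow: placement decisions are computed without the lock, and only the short step that changes the shared state is serialized. The sketch below is a toy model under that assumption; `ToyScheduler`, `Proposal`, and the node names are hypothetical and do not correspond to YARN classes.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Proposal:
    app: str
    node: str
    memory_mb: int


class ToyScheduler:
    def __init__(self, free_mb: Dict[str, int]) -> None:
        self._free_mb = dict(free_mb)
        self._commit_lock = threading.Lock()

    def propose(self, app: str, memory_mb: int) -> Optional[Proposal]:
        """Placement decision: made on a snapshot, without the commit lock.

        Looks at all nodes at once and picks the tightest fit (bin-packing).
        """
        snapshot = dict(self._free_mb)
        fits = [(free, node) for node, free in snapshot.items() if free >= memory_mb]
        if not fits:
            return None
        _, node = min(fits)
        return Proposal(app, node, memory_mb)

    def try_commit(self, proposal: Proposal) -> bool:
        """State change: re-validated and applied under a short lock."""
        with self._commit_lock:
            if self._free_mb[proposal.node] < proposal.memory_mb:
                return False  # stale proposal, the caller should propose again
            self._free_mb[proposal.node] -= proposal.memory_mb
            return True


sched = ToyScheduler({"node1": 4096, "node2": 2048})
first = sched.propose("app_a", 2048)
second = sched.propose("app_b", 2048)  # computed before the first commit
results = [sched.try_commit(first), sched.try_commit(second)]
```

Both proposals pick node2 because it is the tightest fit, so the first commit succeeds and the second is rejected as stale. In this model a rejected proposal simply goes back to `propose`.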
The global scheduling feature was tested in a simulated environment with 20,000 nodes and 47,000 running applications. For more information about these tests, see the Performance Report.
YARN Community Performance Testing
Microsoft published the report "Hydra: A Federated Resource Manager for Data-Center Scale Analytics" (Carlo et al.), which highlights both scalability (YARN deployed to more than 250,000 nodes, including five large federated clusters with 50,000 nodes each) and higher performance with Capacity Scheduler (the scheduler in each cluster can allocate more than 40,000 containers per second). This is the largest YARN deployment in the world.
We have also seen performance data from other companies in the community that is consistent with our simulator results: thousands of container allocations per second on clusters with thousands of nodes.
Disclaimer: The performance numbers discussed above depend on the size of the cluster, the workloads running on it, the queue structure, the health of components such as node managers, disks, and networks, container churn, and so on. Reaching the desired performance usually requires fine-tuning the scheduler and other cluster parameters. These numbers are not guaranteed by using CDP alone.
Feature Comparison
Over time, the features of the two schedulers have become similar. The tables below list the current features and the differences between the two schedulers.
Supported Features

Category | Feature | Capacity Scheduler | Fair Scheduler | Comments
--- | --- | --- | --- | ---
Queues | Hierarchical queues | Yes | Yes |
Queues | Elastic queue capacity for better resource sharing | Yes | Yes |
Queues | Percentage-based resource allocation in queues | Yes | Yes | Percentage and absolute resource settings cannot be used simultaneously.
Queues | Automatic queue creation | Yes | Yes |
Queues | User mapping (user/group to queue mapping) | Yes | Yes |
Queues | CLI / REST API support to manage queues | Yes | Yes |
Queues | Move applications between queues | Yes | Yes |
Queues | Dynamically create/delete/modify queues | Yes | Yes |
Queues | Reservation support in queues | Yes | Yes |
Authorization | Authorization control (queue ACLs for submit/administer) | Yes | Yes |
Authorization | Third-party ACL control (Ranger) | Yes | Yes |
Application placement | Node label support | Yes | No |
Application placement | Hive placement integration | Yes | Yes |
Application placement | Node attribute support | Yes | No |
Application placement | Placement constraint support | Yes | No | Supported constraints are limited in the current implementation.
Application placement | Node locality | Yes | Yes |
Application placement | Locality delay control | Yes | Yes |
Application placement | User limit quota management | Yes | Yes |
Application placement | AM resource quota management | Yes | Yes |
Application placement | Queue priority | Yes | No | Managed indirectly through queue weights.
Application placement | Maximum and minimum allocation limits per container | Yes | Yes |
Scheduling | Asynchronous scheduling support | Yes | Yes | Implementations differ between the schedulers and should not be considered equivalent.
Scheduling | Multiple resource type support (CPU, memory, GPU, etc.) | Yes | Yes |
Scheduling | Queue ordering policies (Fair, FIFO, etc.) | Yes | Yes |
Scheduling | Multiple container assignments per heartbeat | Yes | Yes |
Preemption | Inter-queue preemption support | Yes | Yes |
Preemption | Intra-queue preemption support | Yes | Yes |
Preemption | Reservation-based preemption | Yes | Yes |
Preemption | Preemption based on queue priority | Yes | No | Queue weights are considered when making preemption decisions.
Application support | First-class application concept | Yes | Yes |
Application support | Application priority | Yes | Yes |
Application support | Application timeout | Yes | Yes |
Application support | Move applications across queues | Yes | Yes |
Application support | High-availability stateful application recovery | Yes | Yes |
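Hierarchical queues and percentage-based allocation combine in a simple way: a queue's percentage is relative to its parent, so its share of the whole cluster is the product of the percentages along its path. The arithmetic can be sketched as follows; the helper `absolute_capacities` and the queue names are hypothetical, and the sketch assumes that percentages are expressed relative to the parent queue.

```python
from typing import Dict


def absolute_capacities(queue_percent: Dict[str, float]) -> Dict[str, float]:
    """Convert per-queue percentages (relative to the parent) into cluster shares.

    Keys are dotted queue paths below "root", for example "root.prod.etl".
    """
    result = {"root": 100.0}
    # Process parents before children by sorting on path depth.
    for path in sorted(queue_percent, key=lambda p: p.count(".")):
        parent = path.rsplit(".", 1)[0]
        result[path] = result[parent] * queue_percent[path] / 100.0
    return result


# Hypothetical two-level queue hierarchy.
queues = {
    "root.prod": 60.0,
    "root.dev": 40.0,
    "root.prod.etl": 50.0,
    "root.prod.reports": 50.0,
}
capacities = absolute_capacities(queues)
```

With these values, `root.prod.etl` is configured as 50% of `root.prod`, which itself holds 60% of the cluster, so it ends up with 30% of the cluster.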
Roadmap Features

Category | Feature | Capacity Scheduler | Fair Scheduler | Comments
--- | --- | --- | --- | ---
Queues | Absolute resource allocation in queues | Yes | Yes | Percentage and absolute resource settings cannot be used simultaneously.
Application placement | Maximum number of applications | No | Yes | Managed indirectly through AM resource quotas.
Scheduling | Fairness based on application size | No | Yes |
Migrating from Fair Scheduler to Capacity Scheduler
Starting with CDP Private Cloud Base 7.1, Cloudera provides the fs2cs conversion utility, a command-line tool that is invoked through the yarn CLI. The utility helps you migrate from Fair Scheduler to Capacity Scheduler.
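As a rough illustration of how a conversion run might be scripted, the sketch below only assembles the command line and does not execute it. The option names `--yarnsiteconfig`, `--fsconfig`, and `--output-directory` are written from memory of the Apache Hadoop documentation and should be treated as assumptions to verify with `yarn fs2cs --help` on your version; the helper `build_fs2cs_command` and all paths are hypothetical.

```python
from typing import List, Optional


def build_fs2cs_command(
    yarn_site: str,
    output_dir: str,
    fair_scheduler: Optional[str] = None,
) -> List[str]:
    """Assemble an fs2cs invocation as an argument list (nothing is run)."""
    # Assumed option names; confirm them against `yarn fs2cs --help`.
    cmd = ["yarn", "fs2cs", "--yarnsiteconfig", yarn_site]
    if fair_scheduler is not None:
        cmd += ["--fsconfig", fair_scheduler]
    cmd += ["--output-directory", output_dir]
    return cmd


# Hypothetical input and output locations.
command = build_fs2cs_command(
    yarn_site="/tmp/fs2cs-input/yarn-site.xml",
    fair_scheduler="/tmp/fs2cs-input/fair-scheduler.xml",
    output_dir="/tmp/fs2cs-output",
)
print(" ".join(command))
```

Building the arguments as a list keeps paths with spaces intact if the command is later passed to `subprocess.run`. Review the generated Capacity Scheduler configuration manually before applying it to a cluster.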