How Mesos and YARN work together

2025-07-13 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

Starting with Hadoop 2.0, cluster resource management was extracted from the MapReduce v1 JobTracker and implemented in YARN. Although YARN supports many different computing frameworks, it still does not solve the problem of scaling cluster resources elastically. This article introduces Myriad, a project that combines the strengths of YARN and Mesos: it makes YARN more flexible to operate and use, and it makes expanding the whole data center easier.

This is a story about two siloed clusters. The first is the Apache Hadoop cluster, whose resources are completely isolated and reserved for Hadoop and its processes. The second is my name for all the resources that are not part of the Hadoop cluster. I split them this way because Hadoop manages its own resources through Apache YARN (Yet Another Resource Negotiator). That suits Hadoop, but those resources are often underutilized when no big data jobs are in the queue. When a big data job does run, the same resources are quickly pushed to their limit and more are needed, which is hard to arrange when the Hadoop cluster is isolated from everything else.

Figure 1: Standalone clusters. Source: Mesosphere and MapR.

Hadoop was meant to tear down data silos, but while some walls came down, other kinds of walls went up in their place. Apache Mesos, another technology, is also meant to tear down such walls. However, Mesos is often positioned to manage the "second cluster": all the workloads that are not Hadoop jobs.

This is where the story really begins, with Mesos and YARN as two silos. They are often pitted against each other as if they were incompatible. My story, however, is that they work together.

A brief introduction to Mesos and YARN

The main difference between Mesos and YARN lies in their design priorities and in how they schedule work. Mesos was designed at UC Berkeley in 2007 and hardened in production at companies such as Twitter and Airbnb. It was built to be a scalable, global resource manager for the entire data center. YARN was created out of the need to scale Hadoop. Before YARN, resource management was embedded in the Hadoop MapReduce v1 architecture, and it had to be taken out (and moved into YARN) so that MapReduce could scale. The MapReduce JobTracker could not effectively schedule MapReduce tasks on more than a thousand machines. YARN was created for the next generation of Hadoop, mainly to address this scaling problem.

Scheduling of Mesos

Mesos determines which resources are available and makes resource offers to an application scheduler (an application scheduler together with its executor is called a "framework"). The framework can accept or reject each offer. This model is considered non-monolithic because it is a "two-level" scheduler in which the scheduling algorithms are pluggable. Mesos allows any number of scheduling algorithms to be implemented, each accepting or rejecting offers according to its own policy, and it can accommodate thousands of schedulers running on the same cluster in a multi-tenant manner.

The two-level scheduling model of Mesos lets each framework decide for itself which algorithm to use to schedule the work it needs to run. Mesos acts as the arbiter: it allocates resources across multiple schedulers, resolves conflicts, and ensures that resources are distributed fairly according to business policy. When an offer arrives, the framework can launch tasks that consume the offered resources, or it can decline the offer and wait for the next one. This is very similar to how multiple apps run simultaneously on a laptop or smartphone: they spawn new threads or request more memory as they need it, and the operating system arbitrates among all the requests. The model rests on years of operating system and distributed systems research, and its main advantage is that it scales well, as Google and Twitter have shown in practice.
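The accept-or-decline cycle described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the Mesos API: the names Offer, Framework, consider and run_offer_cycle are hypothetical, an "offer" here is simply the free resources of one node proposed to one framework, and frameworks are offered resources in a fixed order, whereas a real allocator applies a fairness policy.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources of one node, proposed to a framework (toy type, not a Mesos message)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Hypothetical framework whose own policy decides which offers to take."""
    name: str
    task_cpus: float
    task_mem_mb: int
    pending_tasks: int
    launched: list = field(default_factory=list)

    def consider(self, offer: Offer) -> bool:
        """Accept the offer if a pending task fits in it, otherwise decline."""
        fits = offer.cpus >= self.task_cpus and offer.mem_mb >= self.task_mem_mb
        if self.pending_tasks > 0 and fits:
            self.pending_tasks -= 1
            self.launched.append(offer.node)
            return True
        return False


def run_offer_cycle(free: dict, frameworks: list) -> None:
    """One round of two-level scheduling over `free` (node -> (cpus, mem_mb)).

    Level 1: the master offers each node's free resources to frameworks in turn.
    Level 2: each framework accepts or declines according to its own policy.
    """
    for node, (cpus, mem_mb) in free.items():
        for fw in frameworks:
            if fw.consider(Offer(node, cpus, mem_mb)):
                # Accepted: deduct what the task consumes; the rest can be
                # offered again in a later cycle.
                free[node] = (cpus - fw.task_cpus, mem_mb - fw.task_mem_mb)
                break
```

For example, with free = {"node-a": (4.0, 8192), "node-b": (1.0, 1024)}, a "batch" framework needing 4 CPUs and 4096 MB per task and a "web" framework needing 1 CPU and 512 MB per task (one pending task each), calling run_offer_cycle(free, [batch, web]) places the batch task on node-a and the web task on node-b. The master never needs to know why either framework accepted or declined.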

Scheduling of YARN

Now let's look at the YARN side. When a job request arrives at the YARN ResourceManager, YARN evaluates all available resources and places the job. YARN itself decides where the job runs, so it follows a monolithic model. It is worth repeating that YARN emerged as a necessary step in the evolution of the MapReduce architecture. Driven by the need to scale Hadoop jobs, YARN took the resource management model out of the MapReduce JobTracker and implemented it in a separate ResourceManager component.
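For contrast with the offer-based model, a centrally decided placement can be sketched in a few lines. This is a minimal toy under stated assumptions, not the YARN API or any real YARN scheduling policy: place_job is a hypothetical function, and the best-fit rule (choose the node that leaves the least memory unused) is picked only for illustration.

```python
from typing import Optional


def place_job(job_cpus: float, job_mem_mb: int, free: dict) -> Optional[str]:
    """Central placement: the scheduler itself picks the node for the job.

    `free` maps node name -> (free cpus, free memory in MB) and is updated
    in place. Returns the chosen node, or None if no node can host the job.
    """
    candidates = [
        (mem - job_mem_mb, node)
        for node, (cpus, mem) in free.items()
        if cpus >= job_cpus and mem >= job_mem_mb
    ]
    if not candidates:
        return None  # nothing fits right now
    _, node = min(candidates)  # best fit: least memory left over
    cpus, mem = free[node]
    free[node] = (cpus - job_cpus, mem - job_mem_mb)
    return node
```

The point of the sketch is where the decision lives: the job only states its requirements, and a single component that sees all resources chooses the location. Supporting a new kind of workload means changing this one function, which is the trade-off the article goes on to discuss.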

YARN is optimized for scheduling Hadoop jobs, which historically have been long-running batch jobs. This means that YARN was designed neither for long-running services nor for short-lived interactive queries that need a fast response (such as small, fast Spark jobs). It can schedule other kinds of workloads, but it is not an ideal model for them. The resource requirements, execution model, and architectural requirements of MapReduce differ from those of long-running services such as web servers and SOA applications, and from those of real-time workloads such as Spark and Storm. YARN was also designed for stateless batch jobs that can simply be restarted when they fail. It does not handle stateful services such as distributed file systems or databases. YARN's monolithic scheduler could in theory be extended to handle different types of workloads (by merging new algorithms into the scheduling code), but that is not a lightweight way to support a growing number of increasingly complex scheduling algorithms.

YARN vs Mesos?

When comparing YARN and Mesos, it is important to understand their general scheduling capabilities and why one might be chosen over the other. Some people think that YARN and Mesos are more or less the same, but they are not. The difference comes from the different requirements their designers set out to meet. Neither model is wrong, but each approach leads to different long-term results. I think this is the key to choosing how to use them. Ben Hindman and the Berkeley AMPLab designed Mesos at the same time as Google's Omega design team was at work, and Mesos benefited from the experience of Omega's design in building a better non-monolithic (two-level) scheduler.

When you evaluate how to manage the data center as a whole, you have Mesos on one side and YARN on the other. Mesos can manage every resource in the data center. YARN can safely manage Hadoop jobs, but it cannot manage the entire data center. Data center operators tend to handle these two use cases by partitioning their clusters into Hadoop and non-Hadoop clusters.

Using Mesos and YARN in the same data center, and benefiting from both resource managers, currently requires two static partitions. This means that the resources dedicated to Hadoop and managed by YARN are not available to Mesos, which manages the rest. This may be an oversimplification, but it is effectively what happens. Fundamentally, it is the situation we want to avoid.
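The cost of static partitioning can be made concrete with a little arithmetic. The sketch below is a toy calculation, not a measurement: the two functions are hypothetical, resources are counted in whole nodes, and all figures in the example are made up for illustration.

```python
def static_partition(total: int, hadoop_share: int,
                     hadoop_demand: int, other_demand: int) -> dict:
    """Nodes used and demand left unmet when the cluster is split in two.

    `hadoop_share` nodes are reserved for Hadoop; the remaining nodes serve
    everything else. Neither side can borrow from the other.
    """
    hadoop_used = min(hadoop_demand, hadoop_share)
    other_used = min(other_demand, total - hadoop_share)
    unmet = (hadoop_demand - hadoop_used) + (other_demand - other_used)
    return {"used": hadoop_used + other_used, "unmet": unmet}


def shared_pool(total: int, hadoop_demand: int, other_demand: int) -> dict:
    """Nodes used and demand left unmet when all nodes form a single pool."""
    demand = hadoop_demand + other_demand
    used = min(demand, total)
    return {"used": used, "unmet": demand - used}
```

With 100 nodes split 50/50, a burst of Hadoop work needing 70 nodes alongside 20 nodes of other work gives static_partition(100, 50, 70, 20) == {"used": 70, "unmet": 20}: 30 nodes sit idle while 20 nodes' worth of Hadoop work waits. The same demand on a single pool gives shared_pool(100, 70, 20) == {"used": 90, "unmet": 0}.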

Introducing Project Myriad

This leads to the question: can YARN and Mesos be made to work together for the benefit of enterprises and data centers? The answer is yes. Several well-known companies (eBay, MapR, and Mesosphere) have worked together on a project called Myriad.

This open source project is both a Mesos framework and a YARN scheduler, which enables Mesos to manage YARN resource requests. When a job arrives at YARN, YARN schedules it through the Myriad scheduler, which matches the request to the resource offers coming from Mesos. Mesos, in turn, passes the request on to the Mesos worker nodes. The Mesos node then hands the request to a Myriad executor, which runs the YARN NodeManager. In other words, Myriad launches YARN NodeManagers on Mesos resources, and once started, each NodeManager tells the YARN ResourceManager which resources are available to it. From that point on, YARN can use these resources as it sees fit. Myriad provides a seamless bridge between the pool of resources available in Mesos and the YARN tasks that want to use them.
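The flow described above (YARN needs capacity, offers are accepted on its behalf, node managers start on the offered nodes and report their capacity to the resource manager) can be sketched as a toy simulation. This is not Myriad's code or API: ToyResourceManager and myriad_like_bridge are hypothetical names, only memory is modeled, and an accepted offer is assumed to be handed entirely to the node manager.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Stand-in for the YARN resource manager: tracks capacity reported by node managers."""
    capacity_mb: dict = field(default_factory=dict)

    def register_node_manager(self, node: str, mem_mb: int) -> None:
        """A node manager started on `node` reports the memory it can use."""
        self.capacity_mb[node] = mem_mb

    def total_capacity_mb(self) -> int:
        return sum(self.capacity_mb.values())


def myriad_like_bridge(needed_mb: int, offers: dict, rm: ToyResourceManager) -> list:
    """Accept offers until the resource manager has at least `needed_mb`.

    `offers` maps node name -> offered memory in MB. Each accepted offer
    stands for launching a node manager on that node, which then registers
    its capacity with `rm`. Returns the nodes whose offers were accepted.
    """
    accepted = []
    for node, mem_mb in offers.items():
        if rm.total_capacity_mb() >= needed_mb:
            break  # YARN has enough capacity; the remaining offers are declined
        rm.register_node_manager(node, mem_mb)
        accepted.append(node)
    return accepted
```

With three offers of 4096 MB each and a need of 6000 MB, the bridge accepts the first two offers and declines the third, leaving the resource manager with 8192 MB to schedule on. The declined node stays in the shared pool for other frameworks.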

Figure 2: How Myriad works. Source: Mesosphere and MapR.

The advantage of this approach is that it not only lets you run YARN elastically on a shared cluster, it also makes YARN more dynamic and elastic than it was originally designed to be. It also lets the data center operations team expand the resources given to YARN without having to reconfigure the YARN cluster. Scaling the entire data center becomes very easy. This model also provides a simple way to run and manage multiple YARN implementations, even different versions of YARN, on the same cluster.

Figure 3: Resource sharing. Source: Mesosphere and MapR.

Myriad combines the advantages of YARN and Mesos. With Myriad, Mesos and YARN can collaborate, and the business can respond to events as they happen. Data analysis can be performed on the same hardware that runs the production services. You no longer face the resource constraints (and low utilization) caused by static partitions. Resources can be scaled elastically to match the needs of the business.
