What does Hadoop Yarn mean?

2025-03-06 Update From: SLTechnology News&Howtos

This article explains in detail what Hadoop Yarn is. It is a practical topic, so it is shared here as a reference, and I hope you get something out of reading it.

1. What is Hadoop Yarn?

In the early days of Hadoop 1.0, MapReduce's JobTracker was responsible for too much work, including resource scheduling and managing a large number of TaskTrackers. This was clearly unreasonable, so in the upgrade from 1.0 to 2.0 Hadoop split resource scheduling out of the JobTracker into an independent resource management framework. That framework is Yarn, and this change is what made Hadoop such a stable cornerstone of the big data ecosystem.

Before going into detail about Yarn, a few words about its name. Yarn stands for Yet Another Resource Negotiator, which means "yet another resource scheduler". The naming is deliberately playful, much like naming a guest house simply "A Guest House". As an aside, Java has a build tool called Ant with a similarly styled name: it is an acronym for "Another Neat Tool".

Since it is called a resource scheduler, its job is naturally resource management and scheduling. Let's now dig into the inside of Yarn.

2. Yarn architecture

This section is built around the architecture diagram. Before walking through its contents, we first need to understand the concept of a Container in Yarn. We then introduce the components in the diagram, and finally look at the process of submitting a program.

2.1 Container

A Container is the layer of abstraction that Yarn puts over resources. In everyday development we often encapsulate low-level details and expose only a call interface to the layer above; Yarn applies the same idea to resource management.

As shown above, Yarn encapsulates computing resources such as CPU cores and memory into Containers. There are two points to note:

Containers are started, managed, and monitored by the NodeManager.

Containers are scheduled by the ResourceManager.

The two components, NodeManager and ResourceManager, are discussed below.
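To make the abstraction concrete, here is a minimal sketch in Python. It is not Yarn's API: the `Container` and `Node` classes, their fields, and the `launch` method are hypothetical names chosen for illustration. It only shows the idea that a container is a bundle of CPU cores and memory carved out of one machine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """A bundle of resources on one node (hypothetical model, not Yarn's API)."""
    container_id: int
    node: str
    vcores: int
    memory_mb: int


@dataclass
class Node:
    """Tracks the free resources of one machine and the containers on it."""
    name: str
    free_vcores: int
    free_memory_mb: int
    containers: list = field(default_factory=list)

    def launch(self, container_id: int, vcores: int, memory_mb: int) -> Container:
        # Refuse the request if the node cannot fit it.
        if vcores > self.free_vcores or memory_mb > self.free_memory_mb:
            raise ValueError(f"{self.name} cannot fit {vcores} vcores / {memory_mb} MB")
        # Carve the requested resources out of what is still free.
        self.free_vcores -= vcores
        self.free_memory_mb -= memory_mb
        container = Container(container_id, self.name, vcores, memory_mb)
        self.containers.append(container)
        return container


node = Node("node-1", free_vcores=8, free_memory_mb=16384)
c = node.launch(container_id=1, vcores=2, memory_mb=4096)
print(c)
print(node.free_vcores, node.free_memory_mb)
```

The caller only sees "a container with 2 cores and 4096 MB", not how the node accounts for it, which is the encapsulation described above.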

2.2 Three main components

Looking back at the diagram at the top, the two main components that are immediately visible are the ResourceManager and the NodeManager. There is also a third, the ApplicationMaster, which the diagram does not show as clearly. Let's look at these three components in turn.

ResourceManager

Let's start with the ResourceManager (RM) in the middle of the diagram. As the name suggests, this component is responsible for resource management. The whole system has one and only one RM, which is responsible for resource scheduling. It contains two main components: the scheduler (Scheduler) and the applications manager (ApplicationsManager).

Scheduler: in essence, the scheduler is a policy, or an algorithm. When a Client submits a task, the scheduler allocates resources based on what the task needs and on the current resource status of the cluster. Note that it is only responsible for allocating resources to applications; it does not monitor or track their status.

ApplicationsManager: again, the name gives a rough idea of what it does. The applications manager is responsible for managing the applications submitted by Clients. As mentioned above, the Scheduler does not monitor the programs users submit; that job belongs to the ApplicationsManager. It accepts submitted jobs, negotiates the first container in which the application's ApplicationMaster runs, and restarts the ApplicationMaster container if it fails.
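The idea that the scheduler is "just a policy" can be sketched as a single function. The sketch below is an assumption-laden toy: `schedule`, the dictionary layout, and the first-fit rule are all made up for illustration, and Yarn's real schedulers (FIFO, Capacity, Fair) are far more elaborate. It does show the two inputs named above, the requested resources and the current state of the cluster, and that nothing about the application itself is tracked.

```python
def schedule(request, free_by_node):
    """Return the first node that can fit the request, or None.

    request: dict with "vcores" and "memory_mb"
    free_by_node: dict mapping node name -> dict with the same two keys
    (hypothetical data layout, for illustration only)
    """
    for name, free in free_by_node.items():
        if free["vcores"] >= request["vcores"] and free["memory_mb"] >= request["memory_mb"]:
            # Reserve the resources; no record of the application is kept here.
            free["vcores"] -= request["vcores"]
            free["memory_mb"] -= request["memory_mb"]
            return name
    return None


cluster = {
    "node-1": {"vcores": 2, "memory_mb": 2048},
    "node-2": {"vcores": 8, "memory_mb": 16384},
}
first = schedule({"vcores": 4, "memory_mb": 8192}, cluster)
second = schedule({"vcores": 8, "memory_mb": 8192}, cluster)
print(first, second)
```

The first request lands on `node-2`; the second finds no node with 8 free cores and gets `None`, which in a real system would mean the request waits.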

ApplicationMaster

Every time a Client submits an Application, a new ApplicationMaster is created. This ApplicationMaster requests container resources from the ResourceManager. Once the resources are granted, the program to be run is sent to the containers and started there, and the distributed computation proceeds.

This may be a little hard to grasp at first: why send the program to the containers to run? In the traditional way of thinking, the program stays put and the data flows in and out. With a large amount of data this is not feasible, because moving massive data costs too much and takes too long. As the old saying goes, if the mountain will not come to me, I will go to the mountain. Distributed computing on big data follows the same idea: since the data is hard to move, the application is shipped to each node and the computation runs there.
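A small sketch of "moving the computation to the data", under stated assumptions: the block-to-node mapping, the node names, and the `place_tasks` helper are hypothetical, and the least-loaded rule is a stand-in for whatever placement policy a real system uses. Each task is placed on a node that already stores the block it will read, so no block has to cross the network.

```python
# Which nodes hold a replica of each data block (made-up example data).
block_locations = {
    "block-0": ["node-1", "node-2"],
    "block-1": ["node-2", "node-3"],
    "block-2": ["node-3", "node-1"],
}


def place_tasks(block_locations):
    """Assign one task per block to a node that already stores that block."""
    load = {}
    placement = {}
    for block, nodes in block_locations.items():
        # Among the nodes holding the block, choose the least-loaded one.
        target = min(nodes, key=lambda n: load.get(n, 0))
        placement[block] = target
        load[target] = load.get(target, 0) + 1
    return placement


placement = place_tasks(block_locations)
print(placement)
```

Every task ends up on a node listed for its block, which is the property that matters here; the load balancing is incidental.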

NodeManager

The NodeManager is the ResourceManager's agent on each machine. It is responsible for managing containers, monitoring their resource usage (CPU, memory, disk, network, and so on), and reporting that usage to the ResourceManager/Scheduler.
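As a rough illustration of the reporting role, the sketch below aggregates per-container usage into one report. The function name `build_node_report` and the report layout are assumptions made for this example, not Yarn's actual heartbeat format, and only CPU and memory are modelled; disk and network are left out for brevity.

```python
def build_node_report(node_name, capacity, container_usage):
    """Summarise per-container usage into one report for the ResourceManager."""
    used = {
        "vcores": sum(u["vcores"] for u in container_usage.values()),
        "memory_mb": sum(u["memory_mb"] for u in container_usage.values()),
    }
    # Whatever is not used by a container is still available on this node.
    available = {key: capacity[key] - used[key] for key in capacity}
    return {
        "node": node_name,
        "containers": sorted(container_usage),
        "used": used,
        "available": available,
    }


report = build_node_report(
    "node-1",
    capacity={"vcores": 8, "memory_mb": 16384},
    container_usage={
        "container-2": {"vcores": 1, "memory_mb": 2048},
        "container-1": {"vcores": 2, "memory_mb": 4096},
    },
)
print(report)
```

A report of this kind is what lets the scheduler know the current resource status of the cluster when the next request arrives.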

3. The process of submitting an Application to Yarn

The diagram gives a simplified view of how a program is submitted. Let's go through each step in detail.

1. The Client submits the Application to Yarn. Here we assume it is a MapReduce job.

2. The ResourceManager communicates with a NodeManager and assigns the first container to the Application. The ApplicationMaster for the application is then started in this container.

3. After the ApplicationMaster starts, it splits the job (that is, the Application) into tasks, which can run in one or more containers. It then requests containers from the ResourceManager to run them, and sends heartbeats to the ResourceManager at regular intervals.

4. After obtaining the containers, the ApplicationMaster communicates with the NodeManagers that host them and distributes the tasks to those containers to run. In this case the split MapReduce job is being distributed, so a container may be running either a Map task or a Reduce task.

5. Tasks running in the containers send heartbeats to the ApplicationMaster to report their status. When the program finishes, the ApplicationMaster unregisters from the ResourceManager and releases the container resources.

The above is the general running process of a job.
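The five steps can be traced with a toy simulation. Everything here is a simplification under stated assumptions: the `ResourceManager` class, `run_application`, and the log messages are hypothetical, containers are reduced to a counter, NodeManagers and heartbeats are represented only by log lines, and summing a list stands in for a real Map task.

```python
class ResourceManager:
    """Toy RM that only counts free containers (not Yarn's real interface)."""

    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.running_apps = set()

    def allocate(self, count):
        if count > self.free_containers:
            raise RuntimeError("not enough free containers")
        self.free_containers -= count

    def release(self, count):
        self.free_containers += count


def run_application(rm, app_name, splits, log):
    # Step 1: the client submits the application.
    log.append(f"client submits {app_name}")
    # Step 2: the RM allocates the first container for the ApplicationMaster.
    rm.allocate(1)
    rm.running_apps.add(app_name)
    log.append("RM allocates first container and starts ApplicationMaster")
    # Step 3: the AM splits the job into tasks and requests containers.
    tasks = [f"map-{i}" for i in range(len(splits))]
    log.append(f"AM splits job into {len(tasks)} tasks and requests containers")
    rm.allocate(len(tasks))
    results = []
    for task, split in zip(tasks, splits):
        # Step 4: the AM asks the hosting NodeManager to run the task.
        log.append(f"AM launches {task} via NodeManager")
        results.append(sum(split))  # stand-in for real work
        # Step 5: the task reports back to the AM.
        log.append(f"{task} reports completion to AM")
    # Step 5 (end): the AM unregisters and all containers are released.
    rm.release(len(tasks) + 1)
    rm.running_apps.discard(app_name)
    log.append("AM unregisters from RM and containers are released")
    return results


rm = ResourceManager(free_containers=4)
log = []
results = run_application(rm, "wordcount", [[1, 2], [3, 4], [5]], log)
print("\n".join(log))
print(results)
```

After the run the ResourceManager has all four containers free again, mirroring the release at the end of step 5.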

4. Why is there Yarn?

Having covered all of the above, let's finish by talking about why Yarn exists.

The direct reason is the architectural defect in Hadoop 1.0: the JobTracker took on too much responsibility in MapReduce. It received jobs, it scheduled resources, and it monitored the TaskTrackers. The benefit of this design is that it is relatively simple to implement, but it is also prone to problems, such as the familiar single point of failure.

To solve these problems, the only option was to split up the JobTracker and move some of its functions elsewhere. At the time the industry already had resource management frameworks such as Mesos, and Yarn was developed along the same lines. One piece of trivia: Spark was reportedly created in its early days to promote Mesos, which is said to be the origin of its name, but it was Spark that later became popular.

Digression aside, Yarn can be said to have contributed a great deal to where Hadoop is today. Because of Yarn, more computing frameworks than just MapReduce can run on top of HDFS. MapReduce has long since been overtaken by frameworks such as Spark, yet HDFS is still standing. The reason is that, with Yarn handling resource management, other computing frameworks can focus on improving computing performance. HDFS may not be the best big data storage system, but it is the most widely used one, and Yarn has played an important role in that.

That concludes this article on what Hadoop Yarn means. I hope it has been of some help and that you have learned something from it.
