
What is the basic architecture of Yarn in Hadoop

Updated 2025-04-05. Source: SLTechnology News&Howtos (shulou), Internet Technology



This article describes the basic architecture of YARN in Hadoop: its design idea, its main components, and the steps it follows to run an application.

1.1 YARN basic architecture

YARN is the resource management system in Hadoop 2.0. Its basic design idea is to split the JobTracker of MRv1 into two independent services: a global resource manager, ResourceManager, and a per-application ApplicationMaster.

ResourceManager is responsible for the resource management and allocation of the whole system, while ApplicationMaster is responsible for the management of individual applications.

1.2 Basic composition and structure of YARN

On the whole, YARN is still a Master/Slave structure. Within the resource management framework, ResourceManager is the Master and NodeManager is the Slave; ResourceManager is responsible for the unified management and scheduling of the resources on every NodeManager. When users submit an application, they must provide an ApplicationMaster to track and manage it. The ApplicationMaster requests resources from ResourceManager and asks NodeManagers to start tasks that occupy a certain amount of resources. Because different ApplicationMasters are distributed across different nodes, they do not affect each other. This section introduces the basic structure of YARN.

Figure 2-9 depicts the basic structure of YARN. YARN is mainly composed of ResourceManager, NodeManager, ApplicationMaster (the figure shows MR AppMstr and MPI AppMstr, the ApplicationMasters of the MapReduce and MPI computing frameworks, respectively), and Container.

1. ResourceManager (RM)

RM is the global resource manager, responsible for resource management and allocation across the whole system. It consists mainly of two components: the scheduler (Scheduler) and the applications manager (Applications Manager, ASM).

(1) Scheduler

The scheduler allocates the system's resources to running applications subject to constraints such as capacity and queues (for example, each queue is allocated a certain amount of resources and may run at most a certain number of jobs).

Note that the scheduler is a "pure scheduler": it does no application-specific work. It does not monitor or track the execution status of applications, and it does not restart tasks that fail because of application errors or hardware faults; those duties belong to each application's ApplicationMaster. The scheduler allocates resources based only on the resource requirements of each application, and the unit of allocation is an abstract concept called the "Resource Container" (Container for short). A Container is a dynamic resource allocation unit that encapsulates resources such as memory, CPU, disk, and network, thereby limiting the amount of resources each task can use. The scheduler is also a pluggable component, so users can design a new scheduler to suit their own needs. YARN ships with several ready-to-use schedulers, such as the Fair Scheduler and the Capacity Scheduler.
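The "pure scheduler" idea can be illustrated with a toy model. The sketch below is not YARN's scheduler or its API: the names `Resource`, `Container`, and `ToyScheduler`, the per-queue memory limit, and the first-fit node choice are all assumptions made for illustration. It only shows a scheduler that hands out Containers based on queue constraints and free node capacity, and does nothing else.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource amount (memory in MB, virtual cores)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


@dataclass(frozen=True)
class Container:
    """A grant of resources on one node, handed to one application."""
    container_id: int
    node: str
    resource: Resource


@dataclass
class ToyScheduler:
    """Hypothetical scheduler: it allocates, but never monitors or restarts tasks."""
    free: Dict[str, Resource]         # node name -> unallocated resources
    queue_limit_mb: Dict[str, int]    # queue name -> max memory the queue may hold
    queue_used_mb: Dict[str, int] = field(default_factory=dict)
    _next_id: int = 1

    def allocate(self, queue: str, request: Resource) -> Optional[Container]:
        used = self.queue_used_mb.get(queue, 0)
        if used + request.memory_mb > self.queue_limit_mb[queue]:
            return None  # the queue constraint would be violated
        for node, available in self.free.items():
            if request.fits_in(available):  # first node with enough room
                self.free[node] = available.minus(request)
                self.queue_used_mb[queue] = used + request.memory_mb
                container = Container(self._next_id, node, request)
                self._next_id += 1
                return container
        return None  # no node has enough free resources


# Usage with made-up node sizes and a single queue capped at 5120 MB.
scheduler = ToyScheduler(
    free={"node1": Resource(4096, 4), "node2": Resource(2048, 2)},
    queue_limit_mb={"default": 5120},
)
first = scheduler.allocate("default", Resource(3072, 2))
second = scheduler.allocate("default", Resource(2048, 2))
third = scheduler.allocate("default", Resource(1024, 1))  # exceeds the queue limit
```

In this sketch the third request is refused by the queue limit even though node1 still has free memory, which is the kind of constraint the scheduler enforces.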

(2) Applications Manager

The Applications Manager is responsible for managing all applications in the system. This includes accepting application submissions, negotiating resources with the scheduler to start each ApplicationMaster, monitoring the health of each ApplicationMaster, and restarting it if it fails.
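The restart behavior described above can be sketched as a small retry loop. This is a toy model, not ResourceManager code: `ToyApplicationsManager`, its state names, and the `run_am` callback are hypothetical, and the attempt limit of 2 is simply a value chosen for the example.

```python
from typing import Callable, Dict


class ToyApplicationsManager:
    """Hypothetical model: accepts applications and relaunches a failed AM."""

    def __init__(self, max_am_attempts: int = 2) -> None:
        self.max_am_attempts = max_am_attempts
        self.state: Dict[str, str] = {}
        self.attempts: Dict[str, int] = {}

    def submit(self, app_id: str, run_am: Callable[[int], bool]) -> str:
        """Run the AM, retrying until it succeeds or attempts are exhausted.

        run_am(attempt) stands in for "start the AM in a Container and wait";
        it returns True if the AM finished normally, False if it crashed.
        """
        self.state[app_id] = "ACCEPTED"
        for attempt in range(1, self.max_am_attempts + 1):
            self.attempts[app_id] = attempt
            self.state[app_id] = "RUNNING"
            if run_am(attempt):
                self.state[app_id] = "FINISHED"
                return self.state[app_id]
        self.state[app_id] = "FAILED"
        return self.state[app_id]


# Usage: the first AM attempt crashes, the second one succeeds.
asm = ToyApplicationsManager(max_am_attempts=2)
recovered = asm.submit("app_1", lambda attempt: attempt == 2)
# An AM that always crashes ends up FAILED once the attempts run out.
gave_up = asm.submit("app_2", lambda attempt: False)
```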

2. ApplicationMaster (AM)

Each application submitted by the user contains one AM, whose main functions are:

Negotiate with the RM scheduler to obtain resources (represented as Containers)

Further assign the obtained resources to its internal tasks

Communicate with NMs to start / stop tasks

Monitor the running status of all tasks, and when a task fails, request resources again to restart it.
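The four functions listed above can be sketched as one loop. The code below is a simplified, sequential illustration and makes several assumptions: `run_application`, `request_container`, and `launch_task` are hypothetical names, the two callbacks stand in for the RM and NM interactions, and a real AM handles many tasks concurrently rather than one at a time.

```python
from typing import Callable, Dict, List


def run_application(
    tasks: List[str],
    request_container: Callable[[], str],
    launch_task: Callable[[str, str], bool],
    max_task_attempts: int = 3,
) -> Dict[str, int]:
    """Toy AM loop: returns the number of attempts each task needed.

    request_container() stands in for negotiating one Container with the RM
    scheduler and returns a container id.
    launch_task(task, container_id) stands in for asking an NM to run the task
    and waiting for its final status: True on success, False on failure.
    """
    attempts: Dict[str, int] = {}
    for task in tasks:
        for attempt in range(1, max_task_attempts + 1):
            container_id = request_container()     # negotiate a resource
            attempts[task] = attempt               # assign it to one task
            if launch_task(task, container_id):    # start the task and watch it
                break                              # success: move to the next task
            # failure: loop again, which requests a fresh Container
        else:
            raise RuntimeError(f"task {task} failed {max_task_attempts} times")
    return attempts


# Usage with stand-in callbacks; "map_1" fails on its first attempt only.
issued: List[str] = []
failed_once = set()


def request_container() -> str:
    issued.append(f"container_{len(issued) + 1}")
    return issued[-1]


def launch_task(task: str, container_id: str) -> bool:
    if task == "map_1" and task not in failed_once:
        failed_once.add(task)
        return False
    return True


result = run_application(["map_0", "map_1", "reduce_0"], request_container, launch_task)
```

Here three tasks consume four Containers, because the failed attempt of map_1 triggers a new resource request before the task is restarted.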

Currently, YARN comes with two AM implementations. One is distributedshell, an example program that demonstrates how to write an AM; it can request a certain number of Containers and run a Shell command or Shell script in them in parallel. The other is MRAppMaster, the AM that runs MapReduce applications, which we will introduce in Chapter 8. In addition, AMs for some other computing frameworks, such as Open MPI and Spark, are under development.

3. NodeManager (NM)

NM is the resource and task manager on each node. On the one hand, it regularly reports to RM the resource usage on this node and the running status of each Container; on the other hand, it receives and processes various requests such as Container start / stop from AM.
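The two roles of the NM can be sketched with a small class. This is a toy model under stated assumptions: `ToyNodeManager`, its method names, and the shape of the heartbeat report are hypothetical, and only memory is tracked.

```python
from typing import Dict


class ToyNodeManager:
    """Hypothetical NM: serves start/stop requests and reports node status."""

    def __init__(self, node: str, memory_mb: int) -> None:
        self.node = node
        self.memory_mb = memory_mb
        self.containers: Dict[str, int] = {}  # container id -> memory in MB

    def start_container(self, container_id: str, memory_mb: int) -> None:
        """Handle a start request from an AM."""
        if memory_mb > self.memory_mb - sum(self.containers.values()):
            raise ValueError("not enough free memory on this node")
        self.containers[container_id] = memory_mb

    def stop_container(self, container_id: str) -> None:
        """Handle a stop request from an AM; unknown ids are ignored."""
        self.containers.pop(container_id, None)

    def heartbeat(self) -> dict:
        """Build the periodic report that would be sent to the RM."""
        used = sum(self.containers.values())
        return {
            "node": self.node,
            "used_mb": used,
            "free_mb": self.memory_mb - used,
            "running": sorted(self.containers),
        }


# Usage: start two Containers, stop one, then report to the RM.
nm = ToyNodeManager("node1", 4096)
nm.start_container("c1", 1024)
nm.start_container("c2", 2048)
nm.stop_container("c1")
report = nm.heartbeat()
```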

4. Container

Container is a resource abstraction in YARN, which encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, network, etc. When AM applies for resources from RM, the resources returned by RM for AM are represented by Container. YARN assigns a Container to each task, and the task can only use the resources described in that Container.

Note that a Container, unlike a slot in MRv1, is a dynamic resource division unit generated on demand according to the needs of the application. As of the completion of this book, YARN supports only CPU and memory resources, and it uses Cgroups, a lightweight resource isolation mechanism, to isolate them.
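The difference between fixed slots and dynamic Containers can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the node size, and the task sizes are made up, and only memory is considered.

```python
from typing import List


def tasks_placed_with_slots(slot_count: int, slot_mb: int, task_mb: List[int]) -> int:
    """MRv1-style: each task occupies one whole fixed-size slot."""
    placed = 0
    for need in task_mb:
        if placed < slot_count and need <= slot_mb:
            placed += 1
    return placed


def tasks_placed_with_containers(node_mb: int, task_mb: List[int]) -> int:
    """YARN-style: each task gets a Container sized to its own request."""
    free = node_mb
    placed = 0
    for need in task_mb:
        if need <= free:
            free -= need
            placed += 1
    return placed


# One node with 8192 MB, seen either as 4 slots of 2048 MB or as one pool.
requests = [1024] * 5 + [3072]
with_slots = tasks_placed_with_slots(4, 2048, requests)
with_containers = tasks_placed_with_containers(8192, requests)
```

With these numbers the slot model places four tasks (each small task wastes half a slot and the large task fits in no slot), while the Container model places all six within the same 8192 MB.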

1.3 YARN workflow

When a user submits an application to YARN, YARN runs the application in two phases:

The first phase starts the ApplicationMaster.

In the second phase, the ApplicationMaster creates the application, requests resources for it, and monitors its entire run until it completes.

As shown in Figure 2-11, the workflow of YARN is divided into the following steps:

Step 1: The user submits the application to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, the user program, and so on.

Step 2: ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster in this Container.

Step 3: The ApplicationMaster first registers with ResourceManager, so that users can view the running status of the application directly through ResourceManager. It then requests resources for each task and monitors the tasks until they finish, that is, it repeats steps 4 to 7.

Step 4: The ApplicationMaster applies for and receives resources from ResourceManager by polling over an RPC protocol.

Step 5: Once the ApplicationMaster has obtained resources, it communicates with the corresponding NodeManager and asks it to start the task.

Step 6: After the NodeManager has set up the running environment for the task (including environment variables, JAR packages, binary programs, and so on), it writes the task startup command into a script and starts the task by running that script.

Step 7: Each task reports its status and progress to the ApplicationMaster over an RPC protocol, so that the ApplicationMaster always knows the running status of each task and can restart a task when it fails.

While the application is running, users can query the ApplicationMaster for its current status at any time through RPC.

Step 8: When the application has finished, the ApplicationMaster unregisters from ResourceManager and shuts itself down.
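The ordering of the eight steps, including the repetition of steps 4 to 7 for each task, can be traced with a short simulation. It is a toy event log under stated assumptions: `simulate_workflow`, the container ids, and the message wording are hypothetical, tasks run one after another, and no failures occur.

```python
from typing import List


def simulate_workflow(app_name: str, tasks: List[str]) -> List[str]:
    """Return an ordered event log for one application run (toy model).

    Each entry starts with the number of the workflow step it represents.
    """
    log: List[str] = []
    log.append(f"1 client submits {app_name} with its AM launch command")
    log.append("2 RM allocates container_0 and asks an NM to start the AM in it")
    log.append("3 AM registers with RM")
    for index, task in enumerate(tasks, start=1):
        container = f"container_{index}"
        log.append(f"4 AM obtains {container} from RM")
        log.append(f"5 AM asks the NM to start {task} in {container}")
        log.append(f"6 NM prepares the environment and runs the launch script for {task}")
        log.append(f"7 {task} reports completion to AM")
    log.append("8 AM unregisters from RM and exits")
    return log


# Usage: trace a two-task application and print its event log.
events = simulate_workflow("wordcount", ["map_0", "map_1"])
for event in events:
    print(event)
```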

1.4 Understanding YARN from multiple perspectives

YARN can be thought of as a cloud operating system. It launches an ApplicationMaster (equivalent to the main thread) for each application; the ApplicationMaster then handles data splitting, task allocation, task startup, and monitoring, while each Task it starts (equivalent to a child thread) is responsible only for its own computation. When all tasks have finished, the ApplicationMaster considers the application complete and exits.
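The thread analogy can be written out directly. In the sketch below, the function `application_master` plays the main-thread role and each worker thread plays a Task; the names, the summing workload, and the round-robin split are assumptions chosen for illustration, and nothing here uses YARN.

```python
import threading
from typing import List


def application_master(data: List[int], task_count: int) -> int:
    """Play the AM role: split the data, start tasks, wait, combine results."""
    chunks = [data[i::task_count] for i in range(task_count)]  # data splitting
    results = [0] * task_count

    def task(index: int) -> None:
        # Each task only handles its own chunk, like a child thread.
        results[index] = sum(chunks[index])

    workers = [threading.Thread(target=task, args=(i,)) for i in range(task_count)]
    for worker in workers:
        worker.start()   # task startup
    for worker in workers:
        worker.join()    # monitoring: wait until every task has finished
    return sum(results)  # all tasks are done, so the application is complete


# Usage: sum the numbers 1..100 with four tasks.
total = application_master(list(range(1, 101)), 4)
```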
