What is the process that YARN task submission starts? 04/10 Update SLTechnology News&Howtos

What is the process that YARN task submission starts?

2025-04-10 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article introduces the relevant knowledge of "what is the process of YARN task submission and startup". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

[noun concept]

First of all, let's explain some of the concepts in yarn, which will be involved in subsequent processes.

ResourceManager (RM)

Responsible for the resource management and allocation of the whole cluster, processing requests from clients and AM, allocating resources to containr (scheduling tasks to different NM), obtaining the running status of all container through the heartbeat of NM, and retrying the necessary failures.

NodeManager (NM)

The actual owner of the cluster resources (CPU and memory, including GPU, etc.) receives requests from RM and AM, starts a specific container, and reports the operation of the container to the RM through a heartbeat.

Application

Corresponding to job in version 1.x, it can be a MapReduce application, a spark application, or a flink application, etc.

ApplicationMaster (AM)

Each Application has an ApplicationMaster, which is responsible for managing a specific application, including requesting resources for specific tasks from RM, starting specific tasks from NM, monitoring the running status of all tasks, and carrying out necessary fault-tolerant processing. The jobManager of spark's driver,flink belongs to AM.

Container

Container is an abstract concept in YARN. It is an encapsulation and abstraction of resources, environment variables, startup parameters and so on.

An Application can be divided into two types of Container, one is the AM mentioned earlier, and the other is the container of specific tasks. The common tasks container are map tasks in MR, reduce tasks, executor in spark, and taskManager in flink.

[overall process]

First of all, take a look at the overall process of the client submitting the task to the final run through a diagram.

When the client submits an application to RM, it essentially requests RM to launch AM.

RM selects the appropriate NM and sends a request to NM to start AM

After receiving the request to start AM, NM downloads the resources that AM depends on locally according to the parameters it carries.

After the resource-dependent localization is completed, NM starts the AM process

After AM starts, register with RM and apply to RM for the resources needed to start the task containr.

RM reports on the resources of the NM and replies to the AM about the allocation of resources (container), that is, assigning a specific NM to the requested task container.

According to the NM assigned by the task container, AM sends a request to the corresponding NM to start the task container

After receiving the request to start the task container, NM also completes the localization of dependent resources according to the request parameters, and then starts the task container process.

There are several points to note in the overall process:

The container assigned to the container in RM is passively triggered after waiting for NM to report the heartbeat.

The running status of the task container is that the NM reports to the RM through the heartbeat, and the RM informs the corresponding AM through the heartbeat response of the AM.

[process in RM]

Application, container, and AM are mentioned in the previous concepts. In RM, Application,Container,AppAttempt classes are used to correspond to these three concepts. The entire task submission and running process is completed around the creation of the three class instances and the changes to their respective state machines.

Of course, there is another piece of content that has not been involved, that is, the scheduler module, which is not in-depth here. We will explain it separately later.

Let's take a look at the process of task submission running in RM:

The client applies for the ID of Application from RM

The unique ID that generates application internally from RM

Inform the client of applicaiton ID through rpc response

The client carries the ID, as well as the container context, and submits the task to the RM through RPC.

RM's rpc service forwards the request to the internal AppManager module.

AppManager creates an App instance object (RMAppImpl).

The start event is then sent to the instance object.

After receiving the event, RMAppImpl requests the state storage service to save the App state, which changes from NEW to NEW_SAVING.

After the state storage service completes the storage of APP information, it informs RMAppImpl in the form of events.

RMAppImpl sends an event to add APP to the scheduler, changing the status from NEW_SAVING to SUBMITTED.

After receiving the message, the scheduler takes the appropriate processing action, and then informs the RMAppImpl application that it is accepted.

RMAppImpl creates an Attempt instance object (RMAppAttemptImpl)

Then, a start event is sent to it, and then the state changes from SUBMITTED to ACCEPTED.

After the Attempt is created, register with the ApplicationMasterService to make it have a corresponding record in memory, so that it is convenient for the real AM process to register.

The add Attempt event is then sent to the scheduler.

The scheduler also carries out a series of processing, including permission judgment, queue application counting, etc., records the relevant information in memory, and finally informs Attempt that it has been successfully added.

Attempt invokes the scheduler's interface to request the resources needed to start AM, while changing the status from NEW to SUBMITTED.

When a NM node sends a heartbeat request to RM, the RM will eventually notify the scheduler in the form of events, and the scheduler will select the appropriate application to allocate resources for it.

A new Container object (RMContainerImpl) is created during the resource allocation process.

The start event is then sent to the Container object.

When container receives the start event, it informs attempt that the resource has been allocated. Its own state changes from NEW to ALLOCATED.

When the attempt receives the event, it obtains the allocated resources from the scheduler through the interface, and then changes the status from SUBMITTED to SCHEDULED.

An acquire event is sent to container during the scheduler's interface processing. The state of Container changes from ALLOCATED to ACQUIRED.

The attempt then sends a request to the state storage module to store the information of the attempt. Its own state changes from SCHEDULED to ALLOCATED_SAVING.

When the state storage is complete, tell attempt in the form of an event.

Attmpt sends a request to start AM to the AMLaunch module. Its own state changes from ALLOCATED_SAVING to ALLOCATED.

AMLaunch sends a request for startContainer to the specified NM through the RPC protocol.

AMLaunch informs Attempt,container that it has started and the state of Attempt changes from ALLOCATED to LAUNCHED.

After NM receives the request, start the AM process

After receiving the registration request, ApplicationMasterService informs the corresponding Attempt. The status of Attempt is cut from LAUNCHED to RUNNING.

When Attempt receives the message that the AM process registers successfully, it then tells RMAppImpl. The state of App transitions from ACCEPTED to RUNNING.

Note: NM informs the running status of the container on the RM node through the heartbeat, and the message is processed within RM to notify the corresponding container,container status from ACQUIRED to RUNNING.

[process in NM]

Unlike RM, NM does not perceive whether container is a specific task or AM, so there are only application and container inside. The task running process revolves around the creation of these two class instances, the change of the state machine and the surrounding supporting modules.

In NM, the process that the task runs is shown in the following figure:

The containerManagerImpl inside NM handles the request to start container. First, create an instance object of AppImpl (concrete implementation of App, hereinafter referred to as App), then send an initialization event to the APP, and then create a new ContainerImpl (concrete implementation of Container, hereinafter referred to as Container) object.

App sends a request to the log aggregation module, informing App to start, requiring the corresponding initialization action, and changing the state from NEW to INITING.

After the log aggregation module completes the initialization of app, it informs App through events.

After receiving the event, the APP then sends a request to the resource localization service module to complete the download of the resource on which the App depends.

After the resource localization service module finishes downloading the corresponding resource, it informs the App through the event.

When the App receives the event, it sends an initialization event to the Container, and the state changes from INITING to RUNNING.

Container also sends a request to the resource localization service module to complete the download of the resource on which the Container depends, and the status changes from NEW to LOCALIZING.

The resource localization service module notifies Container in the form of an event every time a resource is successfully downloaded.

When Container senses that all dependent resources are localized, it informs the resource localization service module to clean up through the event (the cleanup action here is not to clean up the resource file, but to end the corresponding resource download process).

Container continues to send requests to the Container startup service module to start a specific Container process, and then the status changes from LOCALIZING to LOCALIZED.

The Container startup service module sets environment variables and startup parameters to generate startup scripts according to the Container context, creates a process for Container, and then informs Container through events.

When Container receives the event that the process starts, the state changes from LOCALIZED to RUNNING.

When the process of Container finishes running, its corresponding creation thread gets its end code and notifies Container. (assuming that it runs successfully and ends normally.)

When Container receives the event, it sends a request to the resource localization service module to clean up the resource file, and then changes the status from RUNNING to EXITED_WITH_SUCCESS.

After the resource localization service module cleans up the resource files of Container, it informs Container.

Container notifies the log aggregation module to finish running and prepare it for log aggregation.

The App,Container is then notified that the run is over, and the final state is switched to DONE.

After the App-aware Container is finished, only the relevant records are made in memory, and the NM reports the health status of all container to the RM through the heartbeat. RM then tells AM through the heartbeat, and when AM learns that all tasks are over, log out to RM and exit itself. When RM learns that the AM is over, it carries out the corresponding processing action, and finally informs the application of the NM of the corresponding task containerd, and the application is finished. NM finally informs App internally.

After receiving the message, App notifies the resource localization service module to clean up the resources. Then change its own state from RUNNING to APPLICATION_RESOURCE_CLEANUP.

The resource-based local service module notifies App of the event after the resource cleaning is completed.

App informs the log aggregation module to do log aggregation, and the final state changes to FINISHED.

This is the end of the content of "what is the process for YARN task submission to start". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.