An Analysis of How NodeManager Works

Updated 2025-03-17. Source: SLTechnology News&Howtos (shulou)

This article analyzes how the YARN NodeManager (NM) works. It covers the NM's interfaces with other modules, its main functional classes, and the event flow of a container from start to cleanup.

1. Interfaces with other modules

1) As a client, NodeStatusUpdater interacts with the RM (ResourceManager) through the ResourceTracker protocol.

This API has two methods:

Register the NodeManager with the RM. The parameters are httpPort, nodeId, and totalResource, where totalResource is the node's total allocatable resources, including CPU and memory.

Send heartbeats to the RM. Once started, the NM periodically reports container status to the RM (launched containers, completed containers, and node health information), and each response carries commands for the NM to execute, such as killing a container.
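The register-then-heartbeat exchange described above can be sketched as a small in-memory model. This is a minimal sketch, not Hadoop's API: `ResourceTrackerSketch`, its method signatures, and the string container states are hypothetical stand-ins for the real ResourceTracker request and response types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the RM side of the ResourceTracker protocol.
final class ResourceTrackerSketch {
    // Total allocatable resources per registered node: {memoryMb, vcores}.
    private final Map<String, int[]> totalResource = new HashMap<>();
    // Latest health flag reported by each node.
    private final Map<String, Boolean> healthy = new HashMap<>();
    // Kill commands queued per node until its next heartbeat.
    private final Map<String, Set<String>> pendingKills = new HashMap<>();

    // Method 1: register a node together with its total allocatable resources.
    // httpPort mirrors the parameter list in the text but is unused in this sketch.
    void register(String nodeId, int httpPort, int memoryMb, int vcores) {
        totalResource.put(nodeId, new int[] {memoryMb, vcores});
        healthy.put(nodeId, true);
        pendingKills.put(nodeId, new LinkedHashSet<>());
    }

    // RM-side decision to kill a container; delivered with the next heartbeat response.
    void requestKill(String nodeId, String containerId) {
        requireRegistered(nodeId);
        pendingKills.get(nodeId).add(containerId);
    }

    // Method 2: heartbeat. The node reports container states and its health;
    // the return value is the list of containers the node should kill.
    List<String> heartbeat(String nodeId, Map<String, String> containerStates, boolean nodeHealthy) {
        requireRegistered(nodeId);
        healthy.put(nodeId, nodeHealthy);
        List<String> toKill = new ArrayList<>();
        for (String id : pendingKills.get(nodeId)) {
            // Only containers the node still reports as running need a kill command.
            if ("RUNNING".equals(containerStates.get(id))) {
                toKill.add(id);
            }
        }
        pendingKills.get(nodeId).clear();
        return toKill;
    }

    boolean isHealthy(String nodeId) {
        requireRegistered(nodeId);
        return healthy.get(nodeId);
    }

    private void requireRegistered(String nodeId) {
        if (!totalResource.containsKey(nodeId)) {
            throw new IllegalStateException("node not registered: " + nodeId);
        }
    }

    public static void main(String[] args) {
        ResourceTrackerSketch rm = new ResourceTrackerSketch();
        rm.register("node-1", 8042, 8192, 8);
        rm.requestKill("node-1", "container_2");
        Map<String, String> states = new HashMap<>();
        states.put("container_1", "RUNNING");
        states.put("container_2", "RUNNING");
        System.out.println("kill on heartbeat: " + rm.heartbeat("node-1", states, true));
    }
}
```

Queuing kill commands until the next heartbeat reflects the pull-style exchange in the text: the RM never calls the NM directly here, it only answers heartbeats.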

2) As a server, the NM provides the ContainerManager service to the AM (ApplicationMaster) through the ContainerManagementProtocol protocol.

This API provides three methods:

Start containers. The parameter is StartContainersRequest, which wraps a list of StartContainerRequest objects. Each object carries what is needed to start a container: the resources to localize (localResources), environment variables (environment), commands (commands), a token, and so on.

Stop containers. The parameter is StopContainersRequest, which specifies the containerId of each container to kill.

Get container status. The parameter is GetContainerStatusesRequest, which specifies the target containerId values.
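The three operations above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ContainerManagementSketch` and `StartRequest` are hypothetical names, container states are plain strings, and the token is left out. The real protocol uses Hadoop's own request and response record types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the start / stop / get-status operations.
final class ContainerManagementSketch {
    // Hypothetical launch description with the fields the text lists (token omitted).
    static final class StartRequest {
        final String containerId;
        final Map<String, String> localResources; // resource name -> source location
        final Map<String, String> environment;
        final List<String> commands;

        StartRequest(String containerId, Map<String, String> localResources,
                     Map<String, String> environment, List<String> commands) {
            this.containerId = containerId;
            this.localResources = localResources;
            this.environment = environment;
            this.commands = commands;
        }
    }

    // containerId -> state; insertion order is kept for readable output.
    private final Map<String, String> states = new LinkedHashMap<>();

    // Start a batch of containers; returns the ids that were accepted.
    List<String> startContainers(List<StartRequest> requests) {
        List<String> started = new ArrayList<>();
        for (StartRequest request : requests) {
            // A container id that is already known is not started twice.
            if (states.putIfAbsent(request.containerId, "RUNNING") == null) {
                started.add(request.containerId);
            }
        }
        return started;
    }

    // Stop (kill) the given containers; returns the ids that were actually running.
    List<String> stopContainers(List<String> containerIds) {
        List<String> stopped = new ArrayList<>();
        for (String id : containerIds) {
            if ("RUNNING".equals(states.get(id))) {
                states.put(id, "KILLED");
                stopped.add(id);
            }
        }
        return stopped;
    }

    // Look up the state of each known container id; unknown ids are skipped.
    Map<String, String> getContainerStatuses(List<String> containerIds) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String id : containerIds) {
            String state = states.get(id);
            if (state != null) {
                result.put(id, state);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ContainerManagementSketch nm = new ContainerManagementSketch();
        StartRequest request = new StartRequest(
                "container_1",
                Collections.singletonMap("job.jar", "hdfs:///apps/job.jar"),
                Collections.singletonMap("JAVA_HOME", "/usr/lib/jvm/default"),
                Arrays.asList("java -cp job.jar Main"));
        System.out.println("started: " + nm.startContainers(Arrays.asList(request)));
        System.out.println("stopped: " + nm.stopContainers(Arrays.asList("container_1")));
        System.out.println("status:  " + nm.getContainerStatuses(Arrays.asList("container_1")));
    }
}
```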

2. Main functional classes

1) NodeStatusUpdater: periodically reports container status to the RM, including launched and finished containers. The response returns the list of containers to clean up, the list of applications to clean up, and so on.

2) ApplicationImpl: the host object of the state machine for an application on the NM. It manages all of that application's containers on the NM and maintains a state machine that records the application's states, the events that move it between them, and the actions taken on each event.

3) ContainerImpl: the host object of the state machine for a container on the NM. It records the container's states, the events that move it between them, and the actions taken on each event.

4) ContainerManager: provides an RPC service for starting containers, stopping containers, and getting container status. This class is the entry point for container management.

5) LogHandler: the log service for running containers. Multiple log directories can be configured with the parameter yarn.nodemanager.log-dirs, and every directory has the same structure: $log-dir/$appid/$containerid/stderr | stdout | syslog. There are two implementations, NonAggregationLogHandler and LogAggregationService. Because the NM generates a large volume of logs that must be cleaned up, NonAggregationLogHandler deletes logs periodically, while LogAggregationService uploads them to HDFS through log aggregation and rolling. Periodic deletion is the default.

6) ResourceLocalizationService: before a container starts, the resources it needs must be localized. Based on the container's launch parameters, resources are downloaded from HDFS and spread evenly across the data directories on each disk. The local data directories hold the data needed to run the container (executables, JARs, configuration, and so on) as well as intermediate data generated temporarily during MR jobs. ResourceLocalizationService starts an RPC service (LocalizationProtocol) for resource downloads. For each container, a ContainerLocalizer acts as the client: it downloads resources through LocalizationProtocol and reports progress through periodic heartbeats.

7) ContainersLauncher: maintains a thread pool that runs ContainerLaunch tasks. Each task generates the shell script launch_container.sh and then starts the container process through ContainerExecutor.launchContainer.

8) ContainerExecutor: starts and cleans up container processes. There are two implementations, DefaultContainerExecutor (the default) and LinuxContainerExecutor. LinuxContainerExecutor starts and stops containers more securely by running them as the owner of the application, and it can use Cgroups to isolate CPU resources.

9) ContainerMonitor: periodically checks each container's resource usage and kills the container once it exceeds its limit. ContainerMonitor monitors memory; CPU is still handled through Cgroups.

10) AuxService: extension services that start and stop together with the NM.

11) NodeHealthCheckService: periodically runs a custom script and writes files to disk to check the health of the NM, and reports the result to the RM through NodeStatusUpdater. If the node is unhealthy, the RM stops assigning tasks to it until it recovers.
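As an illustration of the memory check described for ContainerMonitor in item 9), here is a minimal sketch of one monitoring pass. `MemoryMonitorSketch` and its methods are hypothetical, and the usage figures are passed in by the caller. The real monitor samples process usage itself and runs on a timer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one pass of a per-container memory check.
final class MemoryMonitorSketch {
    // containerId -> memory limit in bytes.
    private final Map<String, Long> limitBytes = new LinkedHashMap<>();

    // Start monitoring a container with the given memory limit.
    void track(String containerId, long limit) {
        limitBytes.put(containerId, limit);
    }

    // Stop monitoring a container, for example after it has finished.
    void untrack(String containerId) {
        limitBytes.remove(containerId);
    }

    // Compare sampled usage against each tracked limit and return the
    // containers that exceed it. The caller is expected to kill them.
    List<String> findOverLimit(Map<String, Long> usageBytes) {
        List<String> overLimit = new ArrayList<>();
        for (Map.Entry<String, Long> entry : limitBytes.entrySet()) {
            Long used = usageBytes.get(entry.getKey());
            if (used != null && used > entry.getValue()) {
                overLimit.add(entry.getKey());
            }
        }
        return overLimit;
    }

    public static void main(String[] args) {
        MemoryMonitorSketch monitor = new MemoryMonitorSketch();
        monitor.track("container_1", 1024L * 1024 * 1024);
        monitor.track("container_2", 512L * 1024 * 1024);

        Map<String, Long> sample = new HashMap<>();
        sample.put("container_1", 900L * 1024 * 1024);
        sample.put("container_2", 600L * 1024 * 1024);
        System.out.println("to kill: " + monitor.findOverLimit(sample));
    }
}
```

Calling `untrack` when a container finishes corresponds to the point in the event flow where monitoring of a finished container is removed.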

3. Container event flow

1) The AM or RM starts containers through the NM's RPC service, ContainerManagementProtocol.startContainers.

2) On the NM server side, the startContainers logic in ContainerManagerImpl starts the requested containers one by one. Starting a container consists of creating and initializing the Application, then creating and initializing the container.

3) The event ApplicationEventType.INIT_APPLICATION is sent to the event handling framework, AsyncDispatcher. Events of type ApplicationEventType are handled by ApplicationEventDispatcher, whose handle method calls the handle method of ApplicationImpl, the host object of the state machine. That method invokes AppInitTransition, the logic for the INIT_APPLICATION event (state change ApplicationState.NEW -> ApplicationState.INITING), and AppInitTransition sends LogHandlerEventType.APPLICATION_STARTED.

4) LogHandlerEventType.APPLICATION_STARTED is handled by LogAggregationService, which creates the application's log directories and then sends ApplicationEventType.APPLICATION_LOG_HANDLING_INITED to the event handling framework.

5) ApplicationEventType.APPLICATION_LOG_HANDLING_INITED is handled by AppLogInitDoneTransition, which sends a LocalizationEventType.INIT_APPLICATION_RESOURCES event. ResourceLocalizationService handles that event and sends ApplicationEventType.APPLICATION_INITED to the event handling framework.

6) The ApplicationEventType.APPLICATION_INITED event is handled by AppInitDoneTransition, the state machine transition logic in Application, and the state changes from ApplicationState.INITING to ApplicationState.RUNNING.

7) AppInitDoneTransition then sends a ContainerEventType.INIT_CONTAINER event to each container in the Application in turn. Until this event is processed, the container stays in the NEW state.

8) The ContainerEventType.INIT_CONTAINER event is handled by the RequestResourcesTransition logic in ContainerImpl, the state machine host object, and the container enters resource localization. If any resource still needs to be localized (downloaded or created), INIT_CONTAINER_RESOURCES is sent to ResourceLocalizationService and the container enters the LOCALIZING state. If every resource is already localized, the LAUNCH_CONTAINER event is sent and the container enters the LOCALIZED state directly.

9) ResourceLocalizationService handles INIT_CONTAINER_RESOURCES as follows. Each resource requested by the container is mapped to a LocalResourcesTracker implementation, and a ResourceEventType.REQUEST event is sent to the tracker. Each resource has a state machine host object, LocalizedResource. The tracker forwards the event to it, which triggers the state change ResourceState.INIT -> ResourceState.DOWNLOADING and invokes FetchResourceTransition. FetchResourceTransition sends the event LocalizerEventType.REQUEST_RESOURCE_LOCALIZATION. The LocalizerTracker logic then starts a LocalizerRunner thread, which prepares the container environment (application directory, log directory, and so on) and calls ContainerExecutor.startLocalizer to start ContainerLocalizer. While downloading resources, ContainerLocalizer reports progress to ResourceLocalizationService.

10) After the resources are localized, ContainerEventType.RESOURCE_LOCALIZED is sent to ContainerImpl. ContainerImpl then sends the ContainersLauncherEventType.LAUNCH_CONTAINER event, which ContainersLauncher handles by starting a ContainerLaunch thread. ContainerLaunch writes the complete shell commands for running the container to launch_container.sh in the container's private directory and starts the container through ContainerExecutor.launchContainer.

11) The container is run by ContainerExecutor.launchContainer, which executes the shell script. The script runs the MR task YarnChild (as in MR1) and blocks until the container finishes, after which the container resource cleanup process begins.

12) The container may succeed or fail. After receiving an exit event such as CONTAINER_EXITED_WITH_SUCCESS, ContainerImpl sends CLEANUP_CONTAINER to ContainersLauncher and CLEANUP_CONTAINER_RESOURCES to ResourceLocalizationService.

13) ContainersLauncher cleans up the container's temporary files, such as the process PID file: it checks whether the file still exists and forcibly reclaims it if so. After cleanup, ContainersLauncher sends the CONTAINER_RESOURCES_CLEANEDUP event to ContainerImpl, which then sends FINISHED events to ApplicationImpl, ContainerMonitorImpl, and LogHandler so that the container is removed from the container list and its resource usage is no longer monitored.

14) ResourceLocalizationService cleans up the container's data directories (shell scripts and so on) and sends the CONTAINER_RESOURCES_CLEANEDUP event to ContainerImpl.
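The first half of the flow above (steps 3 to 8) can be sketched as a queue of events driving two small state machines. This is a simplified model under stated assumptions: `EventFlowSketch` and its nested types are hypothetical, the log-handler and localization steps are collapsed into a single event, and the queue is drained on the caller's thread, whereas the flow described here is asynchronous.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the application and container state machines.
final class EventFlowSketch {
    enum AppState { NEW, INITING, RUNNING }
    enum ContainerState { NEW, LOCALIZING, LOCALIZED }
    enum EventType { INIT_APPLICATION, APPLICATION_INITED, INIT_CONTAINER }

    static final class Event {
        final EventType type;
        final String containerId; // null for application-level events

        Event(EventType type, String containerId) {
            this.type = type;
            this.containerId = containerId;
        }
    }

    private final Queue<Event> queue = new ArrayDeque<>();
    private final Map<String, ContainerState> containers = new LinkedHashMap<>();
    private final Map<String, Boolean> needsLocalization = new LinkedHashMap<>();
    private final List<String> trace = new ArrayList<>();
    private AppState appState = AppState.NEW;

    void addContainer(String id, boolean mustDownloadResources) {
        containers.put(id, ContainerState.NEW);
        needsLocalization.put(id, mustDownloadResources);
    }

    void send(Event event) {
        queue.add(event);
    }

    // Process queued events until none remain, including events sent by handlers.
    void drain() {
        Event event;
        while ((event = queue.poll()) != null) {
            handle(event);
        }
    }

    private void handle(Event event) {
        switch (event.type) {
            case INIT_APPLICATION:
                if (appState == AppState.NEW) {
                    appState = AppState.INITING;
                    trace.add("app NEW->INITING");
                    // Log directory setup and application resource setup are collapsed here.
                    send(new Event(EventType.APPLICATION_INITED, null));
                }
                break;
            case APPLICATION_INITED:
                if (appState == AppState.INITING) {
                    appState = AppState.RUNNING;
                    trace.add("app INITING->RUNNING");
                    // Containers wait in NEW until the application is running.
                    for (String id : containers.keySet()) {
                        send(new Event(EventType.INIT_CONTAINER, id));
                    }
                }
                break;
            case INIT_CONTAINER:
                if (containers.get(event.containerId) == ContainerState.NEW) {
                    ContainerState next = needsLocalization.get(event.containerId)
                            ? ContainerState.LOCALIZING
                            : ContainerState.LOCALIZED;
                    containers.put(event.containerId, next);
                    trace.add(event.containerId + " NEW->" + next);
                }
                break;
            default:
                break;
        }
    }

    AppState appState() { return appState; }
    ContainerState containerState(String id) { return containers.get(id); }
    List<String> trace() { return trace; }

    public static void main(String[] args) {
        EventFlowSketch flow = new EventFlowSketch();
        flow.addContainer("container_1", true);
        flow.addContainer("container_2", false);
        flow.send(new Event(EventType.INIT_APPLICATION, null));
        flow.drain();
        for (String line : flow.trace()) {
            System.out.println(line);
        }
    }
}
```

Each handler only changes state and sends further events, which is the same division of labor the steps describe between transitions and the event handling framework.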

Note that some of a container's intermediate data is cleaned up only after the entire application has completed, because only the RM knows whether the application has finished. The NM's NodeStatusUpdater reports to the RM through ResourceTracker protocol heartbeats, and the RM returns the list of containers to clean up. The NM then thoroughly cleans up all the resources those containers occupied.
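The deferred cleanup in this note can be sketched as follows. This is a minimal sketch, assuming the heartbeat response names finished applications (the NodeStatusUpdater description mentions both a container list and an application list). `DeferredCleanupSketch` and its methods are hypothetical, and paths are only tracked in memory rather than deleted from disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of intermediate data that outlives its container.
final class DeferredCleanupSketch {
    // appId -> intermediate data paths kept after the app's containers exit.
    private final Map<String, List<String>> retained = new LinkedHashMap<>();

    // Record intermediate data that must survive until the application finishes.
    void retain(String appId, String path) {
        retained.computeIfAbsent(appId, key -> new ArrayList<>()).add(path);
    }

    // Apply the cleanup list carried by a heartbeat response and
    // return the paths that are released as a result.
    List<String> cleanupFinishedApps(List<String> finishedAppIds) {
        List<String> released = new ArrayList<>();
        for (String appId : finishedAppIds) {
            List<String> paths = retained.remove(appId);
            if (paths != null) {
                released.addAll(paths);
            }
        }
        return released;
    }

    int retainedAppCount() {
        return retained.size();
    }

    public static void main(String[] args) {
        DeferredCleanupSketch nm = new DeferredCleanupSketch();
        nm.retain("app_1", "/data1/app_1/intermediate-0");
        nm.retain("app_2", "/data2/app_2/intermediate-0");
        // Only app_1 is reported as finished, so app_2's data stays.
        System.out.println("released: " + nm.cleanupFinishedApps(Arrays.asList("app_1")));
        System.out.println("apps still retained: " + nm.retainedAppCount());
    }
}
```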
