This article looks at the two submission modes of Spark on YARN, yarn-cluster and yarn-client, and walks in detail through how a yarn-client job is launched and run.
As in yarn-cluster mode, the whole program is submitted through the spark-submit script. In yarn-client mode, however, the job program is not started through the Client class; instead, the job's main function is called directly via reflection.
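For orientation, here is a minimal sketch of the two submission commands; the class and jar names are placeholders, and the master strings are the old-style ones this article describes (newer Spark releases spell them --master yarn --deploy-mode cluster|client):

    spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar
    spark-submit --master yarn-client --class com.example.MyApp my-app.jar

Here is the yarn-client flow, step by step: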
1. The job's main function is invoked directly by the launch function of the SparkSubmit class (via reflection). In cluster mode, the main function of Client is invoked instead.
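A minimal, self-contained sketch of invoking a class's main function via reflection, in the spirit of what SparkSubmit does; the class name and argument are made up for illustration, and this is not Spark's literal source:

    object ReflectLaunch {
      def main(args: Array[String]): Unit = {
        // Load the application class by name and find its static main(String[]).
        val mainClass  = Class.forName("com.example.MyApp")
        val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
        // The receiver is null because main is static; the Array[String]
        // is passed as the single parameter of main.
        mainMethod.invoke(null, Array("input.txt"))
      }
    }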
2. The application's main function must create and initialize a SparkContext.
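A minimal application skeleton of this shape, using the classic SparkContext API (the object name and the job itself are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        // Constructing the SparkContext triggers all of the initialization
        // described in the next step.
        val conf = new SparkConf().setAppName("MyApp")
        val sc   = new SparkContext(conf)
        sc.parallelize(1 to 100).map(_ * 2).count()   // a trivial job
        sc.stop()
      }
    }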
3. During SparkContext initialization, the following happens in turn: the relevant configuration is set up; MapOutputTracker, BlockManagerMaster, and BlockManager are registered; and, most importantly, the taskScheduler and dagScheduler are created. When creating the taskScheduler, the Scheduler and SchedulerBackend are chosen according to the master we passed in. Because we chose yarn-client mode, the program picks YarnClientClusterScheduler and YarnClientSchedulerBackend and initializes a YarnClientSchedulerBackend instance; both instances are obtained via reflection. YarnClientSchedulerBackend is a subclass of CoarseGrainedSchedulerBackend, and YarnClientClusterScheduler is a subclass of TaskSchedulerImpl that only overrides TaskSchedulerImpl's getRackForHost method.
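The selection boils down to a dispatch on the master string. A simplified sketch, not the literal SparkContext source (the class names are those of the 1.x series this article describes):

    // Returns (scheduler class name, backend class name) for the given master.
    // The real code loads both with Class.forName, which is why the YARN
    // classes can live in an optional module.
    def yarnSchedulerClasses(master: String): (String, String) = master match {
      case "yarn-client" =>
        ("org.apache.spark.scheduler.cluster.YarnClientClusterScheduler",
         "org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend")
      case other =>
        sys.error(s"other masters (yarn-cluster, local, ...) omitted here: $other")
    }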
4. After the taskScheduler is initialized, the dagScheduler is created, and the taskScheduler is started via taskScheduler.start(). Starting the taskScheduler also calls the start method of the SchedulerBackend. During SchedulerBackend startup, some parameters are initialized and encapsulated in ClientArguments; the encapsulated ClientArguments is passed into the Client class, and client.runApp() obtains the Application ID.
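Condensed into a call chain (paraphrased from the description above; ClientArguments and Client are internal Spark classes, so treat this as a sketch, not a stable API):

    taskScheduler.start()
      -> YarnClientSchedulerBackend.start()
           -> new ClientArguments(...)        // packs driver address, memory, etc.
           -> new Client(clientArgs).runApp() // submits to YARN, returns the Application ID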
5. What client.runApp does is similar to the Client operation in the previous section, except that what gets started is ExecutorLauncher (in yarn-cluster mode, ApplicationMaster is started instead).
6. ExecutorLauncher initializes and starts amClient, and then registers the ApplicationMaster through it. After registration, it waits for the driver to start; once the driver is up, a MonitorActor object is created to communicate with CoarseGrainedSchedulerBackend (only the AddWebUIFilter event travels this way; task status is not reported to CoarseGrainedSchedulerBackend through it). It then sets the addAmIpFilter, and when the job completes, ExecutorLauncher sets the status of the Application to FinalApplicationStatus.SUCCEEDED through amClient.
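For flavor, here is roughly what such an amClient lifecycle looks like against the Hadoop YARN API. This is a generic AMRMClient sketch assuming Hadoop 2.x, not ExecutorLauncher's actual code; the host, port, and tracking URL are placeholders:

    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus

    val amClient = AMRMClient.createAMRMClient()
    amClient.init(new YarnConfiguration())
    amClient.start()
    // host, RPC port, and tracking URL of the ApplicationMaster
    amClient.registerApplicationMaster("driver-host", 0, "")
    // ... wait for the driver, allocate executors (steps 7 and 8) ...
    amClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "")
    amClient.stop()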
7. Executors are allocated; the allocation logic is similar to that in yarn-cluster mode, so it is not repeated here.
8. Tasks run inside CoarseGrainedExecutorBackend, which reports their status back to the driver's CoarseGrainedSchedulerBackend through Akka until the job finishes.
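The status traffic itself is plain actor messaging. A simplified stand-in for the message involved (the real 1.x message, StatusUpdate in CoarseGrainedClusterMessages, also carries the serialized task result):

    // What the executor backend conceptually tells the driver for each task:
    case class StatusUpdate(executorId: String, taskId: Long, state: String)
    // e.g. driverActor ! StatusUpdate("exec-1", 42L, "FINISHED")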
9. While the job is running, YarnClientSchedulerBackend polls the job's running state through client once per second and prints the corresponding progress information. When the Application's state becomes one of FINISHED, FAILED, or KILLED, it stops waiting and the program exits.
10. Finally, a thread reconfirms the Application's state. When it is one of FINISHED, FAILED, or KILLED, the run is over and the SparkContext is stopped; the whole process is complete. A sketch of this polling loop follows.
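The polling described in steps 9 and 10 amounts to a loop like the one below. The function is hypothetical (the real logic lives in YarnClientSchedulerBackend and Client); only the YarnApplicationState enum is the real Hadoop type:

    import org.apache.hadoop.yarn.api.records.YarnApplicationState

    // Poll the application's state once per second until it is terminal.
    def waitForCompletion(getState: () => YarnApplicationState): YarnApplicationState = {
      var state = getState()
      while (state != YarnApplicationState.FINISHED &&
             state != YarnApplicationState.FAILED &&
             state != YarnApplicationState.KILLED) {
        Thread.sleep(1000)
        state = getState()
      }
      state   // the caller then stops the SparkContext
    }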
That is all for the two submission modes of Spark on YARN; hopefully the walkthrough above is helpful.