

Spark's detailed process of submitting Yarn

2025-02-24 Update From: SLTechnology News&Howtos


This article walks through the detailed process by which Spark submits an application to Yarn. The content is straightforward and easy to follow; work through the excerpts step by step to understand how a Spark job reaches the Yarn cluster.

spark-submit.sh invokes spark-class.sh, which in turn launches SparkSubmit.scala.

Submission is handled differently depending on the deploy mode:

Client: the driver is executed directly in the process where spark-class.sh is running.

Cluster: the driver is submitted to the cluster and executed there.
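
As a side note, the same submission can be driven programmatically with org.apache.spark.launcher.SparkLauncher, which spawns spark-submit as a child process. A minimal sketch, assuming a placeholder jar path and main class (com.example.MyDriver):

import org.apache.spark.launcher.SparkLauncher

object SubmitToYarn {
  def main(args: Array[String]): Unit = {
    // launch() spawns spark-submit as a child process and returns it
    val proc = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")  // placeholder jar path
      .setMainClass("com.example.MyDriver")   // placeholder main class
      .setMaster("yarn")
      .setDeployMode("cluster")  // "client" would run the driver where spark-submit runs
      .launch()
    proc.waitFor()
  }
}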

The core logic is in the prepareSubmitEnvironment method of SparkSubmit.scala; the excerpts below show how it prepares the Yarn environment. First, client mode:

// In client mode, launch the application main class directly
// In addition, add the main application jar and any added jars (if any) to the classpath
if (deployMode == CLIENT) {
  childMainClass = args.mainClass
  if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
    childClasspath += localPrimaryResource
  }
  if (localJars != null) {
    childClasspath ++= localJars.split(",")
  }
}

In client mode, childMainClass is simply the user's main class, that is, the main method of the driver.
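
To make that concrete, here is a minimal user application; the object name com.example.MyDriver is a made-up placeholder. In client mode, SparkSubmit invokes this main method by reflection in the same JVM where spark-class.sh was started:

package com.example

import org.apache.spark.sql.SparkSession

object MyDriver {
  def main(args: Array[String]): Unit = {
    // This main method is the driver; in client mode it runs on the submitting machine.
    val spark = SparkSession.builder().appName("my-driver").getOrCreate()
    println(spark.range(10).count())  // a trivial job to exercise the cluster
    spark.stop()
  }
}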

Let's take a look at the Yarn cluster mode:

// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}

At this point childMainClass becomes:

YARN_CLUSTER_SUBMIT_CLASS = "org.apache.spark.deploy.yarn.YarnClusterApplication"

private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
    // so remove them from sparkConf here for yarn mode.
    conf.remove(JARS)
    conf.remove(FILES)

    new Client(new ClientArguments(args), conf, null).run()
  }
}

Following the source, we can see that YarnClusterApplication ultimately delegates to deploy/yarn/Client.scala.

Client.run calls the submitApplication method to submit the application to the Yarn cluster.

def submitApplication(): ApplicationId = {
  // ...
  // Set up the appropriate contexts to launch our AM
  val containerContext = createContainerLaunchContext(newAppResponse)
  val appContext = createApplicationSubmissionContext(newApp, containerContext)
  // ...
}
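
Once submitApplication returns an ApplicationId, the application is visible to the ResourceManager like any other Yarn app. As a sanity check, here is a sketch using Hadoop's public YarnClient API (assuming yarn-site.xml is on the classpath); this is an illustration, not part of Spark's submission path:

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object ListYarnApps {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()
    // A freshly submitted Spark application shows up in this report list.
    yarnClient.getApplications().forEach { report =>
      println(s"${report.getApplicationId} ${report.getName} ${report.getYarnApplicationState}")
    }
    yarnClient.stop()
  }
}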

The key piece is the createContainerLaunchContext method:

/**
 * Set up a ContainerLaunchContext to launch our ApplicationMaster container.
 * This sets up the launch environment, java options, and the command for
 * launching the AM.
 */
private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
    : ContainerLaunchContext = {
  // ...
  val userClass =
    if (isClusterMode) {
      Seq("--class", YarnSparkHadoopUtil.escapeForShell(args.userClass))
    } else {
      Nil
    }
  val amClass =
    if (isClusterMode) {
      Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
    } else {
      Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
    }
  val amArgs =
    Seq(amClass) ++ userClass ++ userJar ++ primaryPyFile ++ primaryRFile ++ userArgs ++
      Seq("--properties-file",
        buildPath(Environment.PWD.$$(), LOCALIZED_CONF_DIR, SPARK_CONF_FILE)) ++
      Seq("--dist-cache-conf",
        buildPath(Environment.PWD.$$(), LOCALIZED_CONF_DIR, DIST_CACHE_CONF_FILE))

  // Command for the ApplicationMaster
  val commands = prefixEnv ++
    Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
    javaOpts ++ amArgs ++
    Seq("1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout",
      "2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")
  // ...
}

This assembles the command that the container will execute (commands). What does the code above amount to?

(1) Cluster mode: the AM class is ApplicationMaster, which starts the userClass.

(2) Client mode: the AM class is ExecutorLauncher; the driver already runs locally, so the AM only launches executors.

Our focus here is cluster mode, so it is now clear that in cluster mode the userClass is wrapped inside ApplicationMaster and started in the Yarn cluster. The userClass is the driver.
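
To visualize the result, the assembled AM command in cluster mode has roughly the following shape. This is illustrative only: the class and jar names are made-up placeholders, and the real command also carries javaOpts and localized paths:

object AmCommandShape {
  def main(args: Array[String]): Unit = {
    // Roughly what createContainerLaunchContext assembles in cluster mode.
    val command = Seq(
      "{{JAVA_HOME}}/bin/java", "-server",
      "org.apache.spark.deploy.yarn.ApplicationMaster",
      "--class", "com.example.MyDriver",  // the userClass, i.e. the driver
      "--jar", "my-app.jar",              // placeholder application jar
      "--properties-file", "{{PWD}}/__spark_conf__/__spark_conf__.properties",
      "1>", "<LOG_DIR>/stdout",
      "2>", "<LOG_DIR>/stderr"
    ).mkString(" ")
    println(command)
  }
}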

Thank you for reading. That covers "the detailed process of Spark submitting Yarn"; after working through this article you should have a deeper understanding of it, and the specifics are best verified in practice.
