This article explains the usage of the SparkSubmit class under the deploy directory. The explanation is straightforward and easy to follow, so let's work through it step by step.
The main job of the scripts discussed earlier (spark-submit, spark-class, and the launcher project) is to prepare the runtime environment: dependency packages, JVM parameters, and so on. The actual submission is mainly the responsibility of the SparkSubmit class under deploy in the Spark codebase.
As mentioned earlier, the main entry point of the SparkSubmit class in the deploy directory is the runMain method. Let's look at the other methods first.
1. prepareSubmitEnvironment
This method prepares the environment and parameters for submission.
First it determines the cluster manager: yarn, mesos, k8s, or standalone. Then the deploy mode: client or cluster.
Based on this information, it later sets up the appropriate Backend and wrapper classes.
The submission logic is hard to cover in one pass because it spans many deployment environments, each highly specialized, so it pays to take your time with it.
There are only two cluster deploy modes to work through: yarn cluster and standalone cluster. Once you understand yarn and standalone, the rest is easy to follow. A rough sketch of the cluster-manager dispatch follows below.
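To make that first step concrete, here is a minimal, self-contained sketch of how the cluster manager can be inferred from the --master URL. This is only an illustration of the dispatch described above, not Spark's actual source; the constant values and the error handling are assumptions.

object ClusterManagerSketch {
  // Illustrative identifiers; the real constants inside SparkSubmit may differ
  val YARN = 1
  val STANDALONE = 2
  val MESOS = 4
  val KUBERNETES = 16

  // Classify the --master URL into a cluster manager, mirroring the
  // yarn / mesos / k8s / standalone distinction described above
  def clusterManager(master: String): Int = master match {
    case "yarn"                        => YARN
    case m if m.startsWith("spark://") => STANDALONE
    case m if m.startsWith("mesos://") => MESOS
    case m if m.startsWith("k8s://")   => KUBERNETES
    case other =>
      throw new IllegalArgumentException(s"Unknown master URL: $other")
  }

  def main(args: Array[String]): Unit = {
    println(clusterManager("yarn"))              // 1 (YARN)
    println(clusterManager("spark://host:7077")) // 2 (STANDALONE)
  }
}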
This method returns a 4-tuple; from the scaladoc (the tuple's shape is sketched after the list):
@return a 4-tuple:
        (1) the arguments for the child process,
        (2) a list of classpath entries for the child,
        (3) a map of system properties, and
        (4) the main class for the child
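As a quick reference, that shape can be written down as a type alias. This is a sketch based on the scaladoc above and on the destructuring in the runMain snippet below (where the third element comes back as a SparkConf); the exact collection types are assumptions:

import org.apache.spark.SparkConf

object SubmitEnvSketch {
  // (1) child args, (2) child classpath, (3) effective config, (4) child main class
  type SubmitEnv = (Seq[String], Seq[String], SparkConf, String)
}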
Core code
if (deployMode == CLIENT) {
  childMainClass = args.mainClass
  if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
    childClasspath += localPrimaryResource
  }
  if (localJars != null) { childClasspath ++= localJars.split(",") }
}
// Add the main application jar and any added jars to classpath in case YARN client
// requires these jars.
// This assumes both primaryResource and user jars are local jars, or already downloaded
// to local by configuring "spark.yarn.dist.forceDownloadSchemes", otherwise it will not be
// added to the classpath of YARN client.
if (isYarnCluster) {
  if (isUserJar(args.primaryResource)) {
    childClasspath += args.primaryResource
  }
  if (args.jars != null) { childClasspath ++= args.jars.split(",") }
}

if (deployMode == CLIENT) {
  if (args.childArgs != null) { childArgs ++= args.childArgs }
}

if (args.isStandaloneCluster) {
  if (args.useRest) {
    childMainClass = REST_CLUSTER_SUBMIT_CLASS
    childArgs += (args.primaryResource, args.mainClass)
  } else {
    // In legacy standalone cluster mode, use Client as a wrapper around the user class
    childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
    if (args.supervise) { childArgs += "--supervise" }
    Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
    Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
    childArgs += "launch"
    childArgs += (args.master, args.primaryResource, args.mainClass)
  }
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}

// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}
The code above is the core of the method and very important. It determines which classes are used to wrap our Spark program under the different cluster modes and deploy modes, so that submission adapts to each cluster environment.
Let's take a little more time to analyze this code.
First, look at the values childMainClass can take:
In standalone cluster mode with REST submission: REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()
In yarn cluster mode: YARN_CLUSTER_SUBMIT_CLASS = "org.apache.spark.deploy.yarn.YarnClusterApplication"
In legacy standalone cluster mode: STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
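To summarize the mapping just listed (the class names come from the constants above; the Map itself is only an illustrative summary, not Spark code):

object WrapperClassSummary {
  // Which class is launched as the child main class, per mode
  val childMainClassByMode: Map[String, String] = Map(
    "standalone cluster (REST)"    -> "RestSubmissionClientApp",
    "standalone cluster (legacy)"  -> "ClientApp",
    "yarn cluster"                 -> "org.apache.spark.deploy.yarn.YarnClusterApplication",
    "client (any cluster manager)" -> "args.mainClass, wrapped in JavaMainApplication"
  )

  def main(args: Array[String]): Unit =
    childMainClassByMode.foreach { case (mode, cls) => println(s"$mode -> $cls") }
}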
2. runMain
After obtaining the 4-tuple in the previous step comes the runMain flow.
The core code goes first:
private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
  val loader = getSubmitClassLoader(sparkConf)
  for (jar <- childClasspath) {
    addJarToClasspath(jar, loader)
  }
  // (logging and some error handling from the original source are omitted here)
  val mainClass = Utils.classForName(childMainClass)
  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
  } else {
    new JavaMainApplication(mainClass)
  }
  try {
    app.start(childArgs.toArray, sparkConf)
  } catch {
    case t: Throwable =>
      throw findCause(t)
  }
}
Once the flow of prepareSubmitEnvironment is clear, runMain is simple: it instantiates childMainClass (a subclass of SparkApplication) and then calls its start method; see the sketch of that contract below.
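The contract those wrapper classes implement is small: a single start method taking the child arguments and the configuration. A minimal sketch of the trait, assuming it matches the call app.start(childArgs.toArray, sparkConf) seen above:

import org.apache.spark.SparkConf

// Sketch of the entry-point contract implied by app.start(...) in runMain
trait SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}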
If we are in client mode rather than cluster mode, childMainClass is simply args.mainClass. Note that in this case the class is wrapped in a JavaMainApplication:
new JavaMainApplication(mainClass)
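A minimal sketch of what that wrapping can look like, assuming JavaMainApplication does little more than look up the user class's static main(String[]) and invoke it reflectively (the real class also propagates configuration and validates that main is static; the class name here is hypothetical):

import org.apache.spark.SparkConf

// Hypothetical simplified wrapper: adapts a plain main-method class to the
// SparkApplication-style start(args, conf) entry point sketched above
class JavaMainApplicationSketch(klass: Class[_]) {
  def start(args: Array[String], conf: SparkConf): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    // a static main has no receiver, so the instance argument is null
    mainMethod.invoke(null, args)
  }
}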
What remains is to study the implementation logic of RestSubmissionClientApp and org.apache.spark.deploy.yarn.YarnClusterApplication.
Thank you for reading. That covers the usage of the SparkSubmit class under the deploy directory. Hopefully this article has given you a deeper understanding of the class; the specifics are best verified in practice.