This article walks through the implementation process of Spark-submit. The editor finds it very practical and shares it here; I hope you get something out of it after reading.
1. The spark-submit script

We submit Spark tasks with a command of the form "spark-submit --class ...". spark-submit is a shell script under the Spark directory; its job is to locate spark-home and invoke the spark-class command.
if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi

# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0

exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
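For reference, a typical submission command built on top of this script looks roughly like the following; the class name, master URL, and jar path are placeholders rather than values from this article:

# hypothetical spark-submit invocation (application class, master URL and jar path are placeholders)
./bin/spark-submit \
  --class org.example.MyApp \
  --master spark://localhost:7077 \
  --deploy-mode client \
  --executor-memory 1g \
  /path/to/my-app.jar arg1 arg2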
spark-submit then executes spark-class, passing the SparkSubmit class as an argument to submit the task to Spark. The spark-class shell script mainly performs the following steps:
(1) Load the Spark environment parameters from conf and locate the Java runtime:
if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi

. "${SPARK_HOME}"/bin/load-spark-env.sh

# Find the java binary
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ "$(command -v java)" ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi
(2) Locate the jar packages (the Spark jars directory):
# Find Spark jars.
if [ -d "${SPARK_HOME}/jars" ]; then
  SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi
(3) Call Main in org.apache.spark.launcher to inject the parameters and build the final command:
build_command() {
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  printf "%d\0" $?
}
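spark-class then reads the NUL-delimited output of build_command into an array. A simplified sketch of that step is shown below (the real script differs slightly between Spark versions); this is where the CMD array, LAST index, and LAUNCHER_EXIT_CODE used in the next step come from:

# simplified sketch of how spark-class collects the launcher output (details vary by Spark version)
set +o posix                          # allow process substitution
CMD=()
while IFS= read -d '' -r ARG; do      # build_command emits NUL-separated arguments
  CMD+=("$ARG")
done < <(build_command "$@")

COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}      # the launcher's exit status is appended as the last element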
(4) The shell script monitors the task's execution state, whether it completes or exits, and decides whether to end based on the launcher's return value:
if ! [[ $LAUNCHER_EXIT_CODE =~ ^[0-9]+$ ]]; then
  echo "${CMD[@]}" | head -n-1 1>&2
  exit 1
fi

if [ $LAUNCHER_EXIT_CODE != 0 ]; then
  exit $LAUNCHER_EXIT_CODE
fi

CMD=("${CMD[@]:0:$LAST}")
exec "${CMD[@]}"

2. Task detection and submitting the task to Spark
spark-class checks the execution mode (class or submit) in order to build the cmd; the launcher checks the submit parameters (SparkSubmitOptionParser), builds the command line and prints it back to spark-class, which finally calls exec to run that command line and submit the task.
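If you want to inspect this assembled command yourself, setting the environment variable SPARK_PRINT_LAUNCH_COMMAND=1 before submitting typically makes the launcher print the full java command it is about to exec; the invocation below is only an illustrative sketch and the master URL is a placeholder:

# hypothetical example: ask the launcher to echo the final java command before running it
SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell --master spark://localhost:7077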
The contents of the assembled cmd are as follows:

/usr/local/java/jdk1.8.0_91/bin/java \
  -cp /data/spark-1.6.0-bin-hadoop2.6/conf/:/data/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/data/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/data/hadoop-2.6.5/etc/hadoop/ \
  -Xms1g -Xmx1g \
  -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1234 \
  org.apache.spark.deploy.SparkSubmit \
  --class org.apache.spark.repl.Main \
  --name "Spark shell" \
  --master spark://localhost:7077 \
  --verbose \
  /tool/jarDir/maven_scala-1.0-SNAPSHOT.jar

3. The SparkSubmit class
(1) After the Spark task is submitted, the main method in SparkSubmit is executed:
def main(args: Array[String]): Unit = {
  val submit = new SparkSubmit()
  submit.doSubmit(args)
}
(2) doSubmit() initializes logging, parses the Spark task parameters, and dispatches on the action type:
def doSubmit(args: Array[String]): Unit = {
  // Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
  // be reset before the application starts.
  val uninitLog = initializeLogIfNecessary(true, silent = true)

  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}
SUBMIT: submit the application using the supplied parameters
KILL (Standalone and Mesos cluster mode only): terminate a task through the REST protocol
REQUEST_STATUS (Standalone and Mesos cluster mode only): request the status of a submitted task through the REST protocol
PRINT_VERSION: print version information to the log (example flags for these actions are shown below)
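These actions map onto spark-submit command-line flags. The calls below are illustrative sketches; the submission ID and master URL are placeholders:

# hypothetical examples of triggering each action from the command line
./bin/spark-submit --kill driver-20250101000000-0000 --master spark://localhost:6066     # KILL
./bin/spark-submit --status driver-20250101000000-0000 --master spark://localhost:6066   # REQUEST_STATUS
./bin/spark-submit --version                                                              # PRINT_VERSION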
(3) The submit function is called; internally it defines doRunMain() and invokes it:
def doRunMain(): Unit = {
  if (args.proxyUser != null) {
    val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
      UserGroupInformation.getCurrentUser())
    try {
      proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
        override def run(): Unit = {
          runMain(args, uninitLog)
        }
      })
    } catch {
      case e: Exception =>
        // Hadoop's AuthorizationException suppresses the exception's stack trace, which
        // makes the message printed to the output by the JVM not very helpful. Instead,
        // detect exceptions with empty stack traces here, and treat them differently.
        if (e.getStackTrace().length == 0) {
          error(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
        } else {
          throw e
        }
    }
  } else {
    runMain(args, uninitLog)
  }
}
doRunMain() prepares the parameters needed to invoke the child main class on the cluster (including running as a proxy user when one is configured), and then calls runMain() to execute the task by invoking that main class.
Spark job submission involves many different parameters and deploy modes; different branches are selected according to those parameters, and the required parameters are ultimately passed into runMain(), which performs the final execution.
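A convenient way to see which main class, arguments, and properties a given combination of parameters resolves to is to submit with --verbose (the flag also appears in the assembled command above): SparkSubmit then logs the parsed arguments before running the job. The command below is only an illustrative sketch; the class name, master URL, and jar path are placeholders:

# hypothetical verbose submission to inspect the resolved main class and arguments
./bin/spark-submit \
  --verbose \
  --class org.example.MyApp \
  --master spark://localhost:7077 \
  /path/to/my-app.jar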
The above is how the Spark-submit implementation process works. The editor believes some of these points may come up in daily work, and hopes you can learn more from this article.