Reading the Spark startup code
Spark uses a series of shell scripts as its entry point: the bin directory holds the scripts for task submission, while the sbin directory holds the scripts for starting and stopping the master and workers.
All of these scripts eventually launch the Java (Scala) code by calling bin/spark-class.
-- spark-class: assembling the java parameters (analysis start) --
The code processing flow of spark-class:
spark-class calls org.apache.spark.launcher.Main, passing along its own arguments, to obtain the concrete java command:
("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@") — after substituting the variables, this amounts to executing:
/usr/java/jdk/bin/java -cp /home/xxx/spark/lib/spark-assembly-1.5.2-hadoop2.5.0-cdh6.3.2.jar org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class com.xxx.xxxx.stat.core.Main --master spark://xxxx1:7077,xxxx2:7077 --executor-memory 2G --driver-memory 5G --total-executor-cores 10 /home/xxx/xxxxxxx/bigdata-xxxxxxx.jar com.xxx.xxxx.stat.xxx.XXXXJob 20180527 20180528
This line of code returns:
/usr/java/jdk/bin/java -cp /home/xxx/spark/libext/*:/home/xxx/spark/conf/:/home/xxx/spark/lib/spark-assembly-1.5.2-hadoop2.5.0-cdh6.3.2.jar:/home/xxx/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/xxx/spark/lib/datanucleus-core-3.2.10.jar:/home/xxx/spark/lib/datanucleus-rdbms-3.2.9.jar:/home/xxx/yarn/etc/hadoop -DLOG_LEVEL=INFO -DROLE_NAME=console -Xms5G -Xmx5G -Xss32M -XX:PermSize=128M -XX:MaxPermSize=512M org.apache.spark.deploy.SparkSubmit --master spark://xxxx1:7077,xxxx2:7077 --conf spark.driver.memory=5G --class com.xxx.xxxx.stat.core.Main --executor-memory 2G --total-executor-cores 10 /home/xxx/xxxxxxx/bigdata-xxxxxxx.jar com.xxx.xxxx.stat.xxx.XXXXJob 20180527 20180528
You can see that the main purpose of the org.apache.spark.launcher.Main class is to fill in the parameters of the final java command that gets executed, including the classpath, the JVM heap and stack settings, and so on. The implementation is analyzed below:
1. org.apache.spark.launcher.Main lives in the separate launcher module and is not part of core.
2. When org.apache.spark.launcher.Main is asked to launch org.apache.spark.deploy.SparkSubmit, it creates builder = new SparkSubmitCommandBuilder(args), which normalizes the java parameters of the spark-submit command.
3. For other callers (mainly the master and worker start scripts), it creates builder = new SparkClassCommandBuilder(className, args), which normalizes the java parameters of those commands.
In both cases the java command is produced in two steps: the builder is constructed, and then its buildCommand method is called, as sketched below.
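The launcher entry point itself is Java code; the following Scala-flavored sketch only mirrors the two-step flow described above. The runner path, the fallback classpath, and the helper bodies are made up for illustration and are not Spark's actual launcher code.

// Sketch only: mirrors the flow of org.apache.spark.launcher.Main (which is Java).
// Assumes at least the class name to launch is passed as the first argument.
object LauncherMainSketch {
  val SparkSubmitClass = "org.apache.spark.deploy.SparkSubmit"

  def main(args: Array[String]): Unit = {
    val className = args.head           // class to launch (SparkSubmit, Master, Worker, ...)
    val classArgs = args.tail.toList    // remaining arguments are passed through

    // Step 1: pick the builder that normalizes the java parameters.
    val cmd: Seq[String] =
      if (className == SparkSubmitClass) buildSparkSubmitCommand(classArgs)   // SparkSubmitCommandBuilder path
      else buildSparkClassCommand(className, classArgs)                       // SparkClassCommandBuilder path

    // Step 2: emit the assembled command so the calling shell script can exec it.
    println(cmd.mkString(" "))
  }

  // Illustrative stand-ins for builder.buildCommand(env) of the two builders.
  def buildSparkSubmitCommand(args: List[String]): Seq[String] =
    Seq("/usr/bin/java", "-cp", sys.env.getOrElse("LAUNCH_CLASSPATH", "."), SparkSubmitClass) ++ args

  def buildSparkClassCommand(className: String, args: List[String]): Seq[String] =
    Seq("/usr/bin/java", "-cp", sys.env.getOrElse("LAUNCH_CLASSPATH", "."), className) ++ args
}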
SparkSubmitCommandBuilder parses the options with its inner class OptionParser, which extends SparkSubmitOptionParser.
The classpath is assembled mainly along the call chain buildCommand -> buildSparkSubmitCommand -> buildJavaCommand (the latter in the parent class), which collects entries from the various locations that may contribute to the classpath; a sketch follows.
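As a rough picture of where those entries come from, here is a hedged sketch. The helper object and its logic are illustrative, not Spark's actual builder code; the directory names (libext, conf, lib, the Hadoop conf dir) are taken from the generated command shown earlier.

import java.io.File

// Illustrative only: gathers classpath entries from the locations visible in the
// generated command above. Not the real buildJavaCommand implementation.
object ClassPathSketch {
  def buildClassPath(sparkHome: String, hadoopConfDir: Option[String]): String = {
    val libJars = Option(new File(s"$sparkHome/lib").listFiles())
      .getOrElse(Array.empty[File])
      .toSeq
      .filter(_.getName.endsWith(".jar"))   // assembly + datanucleus jars
      .map(_.getPath)

    val entries = Seq(s"$sparkHome/libext/*", s"$sparkHome/conf/") ++ libJars ++ hadoopConfDir.toList
    entries.mkString(File.pathSeparator)    // joined with ':' on Linux
  }
}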
The -Xms5G -Xmx5G flags are obtained in buildSparkSubmitCommand by parsing SPARK_DRIVER_MEMORY (spark.driver.memory).
The -Xss32M -XX:PermSize=128M -XX:MaxPermSize=512M flags come from addPermGenSizeOpt(cmd) and from parsing the configuration item spark.driver.extraJavaOptions (DRIVER_EXTRA_JAVA_OPTIONS = "spark.driver.extraJavaOptions") in the configuration file.
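A minimal sketch of that step, assuming a simplified precedence between spark.driver.memory and SPARK_DRIVER_MEMORY; the helper name and signature are made up, and the real logic sits in the Java launcher code.

// Illustrative only: turn the effective driver memory and
// spark.driver.extraJavaOptions into JVM flags, as described above.
object DriverJvmOptsSketch {
  def driverJvmOpts(conf: Map[String, String], env: Map[String, String]): Seq[String] = {
    // Simplified: spark.driver.memory falls back to SPARK_DRIVER_MEMORY;
    // the real precedence rules in the launcher are more involved.
    val driverMemory = conf.get("spark.driver.memory")
      .orElse(env.get("SPARK_DRIVER_MEMORY"))
      .getOrElse("1g")
    val heapFlags = Seq(s"-Xms$driverMemory", s"-Xmx$driverMemory")

    // spark.driver.extraJavaOptions is split on whitespace and appended, which is
    // where flags like -Xss32M -XX:PermSize=128M -XX:MaxPermSize=512M come from.
    val extraOpts = conf.get("spark.driver.extraJavaOptions")
      .map(_.split("\\s+").toSeq)
      .getOrElse(Seq.empty)

    heapFlags ++ extraOpts
  }
}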
-- spark-class: assembling the java parameters (analysis end) --
Now let's analyze the most frequently used entry points: bin/spark-submit (task submission, which starts the driver process), sbin/start-master.sh (starts the master process in the background), and sbin/start-slave.sh (starts a worker process in the background). The corresponding startup code all lives under spark-1.5.2\core\src\main\scala\org\apache\spark\deploy:
spark-submit
Class name called through spark-class: org.apache.spark.deploy.SparkSubmit
Let's analyze how the driver is started by walking through the class SparkSubmit.scala.
/usr/java/jdk/bin/java -cp /home/xxx/spark/libext/*:/home/xxx/spark/conf/:/home/xxx/spark/lib/spark-assembly-1.5.2-hadoop2.5.0-cdh6.3.2.jar:/home/xxx/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/xxx/spark/lib/datanucleus-core-3.2.10.jar:/home/xxx/spark/lib/datanucleus-rdbms-3.2.9.jar:/home/xxx/yarn/etc/hadoop -DLOG_LEVEL=INFO -DROLE_NAME=console -Xms5G -Xmx5G -Xss32M -XX:PermSize=128M -XX:MaxPermSize=512M org.apache.spark.deploy.SparkSubmit --master spark://xxxx1:7077,xxxx2:7077 --conf spark.driver.memory=5G --class com.xxx.xxxx.stat.core.Main --executor-memory 2G --total-executor-cores 10 /home/xxx/xxxxxxx/bigdata-xxxxxxx.jar com.xxx.xxxx.stat.xxx.XXXXJob 20180527 20180528
The main parameters passed are:
--conf spark.driver.memory=5G
--class com.xxx.xxxx.stat.core.ExcuteMain
--executor-memory 2G
--total-executor-cores 10
/home/xxx/xxxxxxx/bigdata-xxxxxxx.jar
com.xxx.xxxx.stat.xxx.XXXXJob
20180527
20180528
The main function of SparkSubmit parses the arguments further via val appArgs = new SparkSubmitArguments(args) and then calls submit(appArgs) to perform the submission.
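Abbreviated, the entry point in the 1.5.x source looks roughly like this (verbose printing omitted; the kill and status branches are shown only for context):

// SparkSubmit.main, roughly as in the 1.5.x source (abbreviated):
def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)                      // further argument parsing
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)                // the path analyzed here
    case SparkSubmitAction.KILL => kill(appArgs)                    // --kill
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs) // --status
  }
}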
The SparkSubmitArguments class first calls parse from org.apache.spark.launcher.SparkSubmitOptionParser (the same parser used by org.apache.spark.launcher.Main above) to parse the arguments, then calls loadEnvironmentArguments to resolve, or assign default values to, the parameters that may be configured through the environment. Finally, the action parameter defaults to SUBMIT:
action = Option(action).getOrElse(SUBMIT)
Let's take a look at the submit method:
val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)
a. This defines what will run as the driver: childArgs, childClasspath, sysProps and childMainClass. submit hands over quite a lot of information, such as the number of cores to use (which determines the number of executors), the memory per executor, and the driver code to execute (the driver code itself is also treated as part of what is submitted).
b. Determine the cluster manager (clusterManager) from the prefix of the master URL: yarn, spark, mesos, or local.
c. Determine the deploy mode (deployMode): in CLIENT mode the driver runs on the current machine; in CLUSTER mode a worker hosts the driver.
d. The code that follows handles the special combinations of clusterManager, deployMode, and Python (or R). We focus on standalone mode here.
e. Fill each parameter into the options variable.
f. In the if (deployMode == CLIENT) branch, the four return values are filled in; childMainClass = args.mainClass is set directly and is later invoked directly by runMain in SparkSubmit (see the sketch after this list).
g. In isStandaloneCluster mode (standalone plus cluster deploy mode), the code distinguishes between the legacy and REST paths and starts a client process that takes care of submitting the driver:
In REST mode, org.apache.spark.deploy.rest.RestSubmissionClient is filled into childMainClass.
In legacy mode, org.apache.spark.deploy.Client is filled into childMainClass.
SparkSubmit then executes that class and passes args.mainClass to it as an argument.
h. The spark.driver.host parameter is ignored in cluster mode.
i. The four values are returned.
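Condensing the client and standalone-cluster branches above, the choice of childMainClass can be sketched as follows. The helper is illustrative, not Spark's actual prepareSubmitEnvironment code; the yarn and mesos branches and most bookkeeping are omitted.

// Simplified sketch of how childMainClass is chosen for the standalone cases.
object ChildMainClassSketch {
  def chooseChildMainClass(deployMode: String,
                           isStandaloneCluster: Boolean,
                           useRest: Boolean,
                           userMainClass: String): String = {
    if (deployMode == "client") {
      userMainClass                                        // runMain invokes it directly
    } else if (isStandaloneCluster && useRest) {
      "org.apache.spark.deploy.rest.RestSubmissionClient"  // REST submission gateway
    } else if (isStandaloneCluster) {
      "org.apache.spark.deploy.Client"                     // legacy submission client
    } else {
      userMainClass
    }
  }
}
// In the cluster branches the user's main class goes into childArgs instead, so the
// client process can tell the cluster which class the driver should actually run.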
Explanation of four parameters:
This returns a 4-tuple:
(1) the arguments for the child process
(2) a list of classpath entries for the child
(3) a map of system properties, and
(4) the main class for the child
submit then calls doRunMain, which in turn calls runMain.
runMain invokes the submitted childMainClass via reflection:
mainClass = Utils.classForName(childMainClass)
val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
mainMethod.invoke(null, childArgs.toArray)
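The same reflection pattern as a small self-contained example: the Greeter object is a made-up stand-in for the submitted application class, and Class.forName stands in for Spark's Utils.classForName (which additionally picks the right class loader).

// A hypothetical "user" main class, standing in for the submitted application.
// Both objects are assumed to be compiled together in the default package.
object Greeter {
  def main(args: Array[String]): Unit =
    println("Greeter invoked with: " + args.mkString(", "))
}

// Demonstration of the pattern runMain uses: load a class by name, look up its
// static main(String[]) method, and invoke it with the child arguments.
object ReflectiveMainDemo {
  def main(args: Array[String]): Unit = {
    val childMainClass = "Greeter"                // Spark gets this from prepareSubmitEnvironment
    val childArgs = Seq("hello", "spark")

    val mainClass = Class.forName(childMainClass)
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, childArgs.toArray)    // null receiver: main is static
  }
}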
start-master.sh (spark-daemon.sh is called in between)
Class name called through spark-class: org.apache.spark.deploy.master.Master
The parameters with which the call is made:
start-slave.sh (spark-daemon.sh is called in between)
Class name called through spark-class: org.apache.spark.deploy.worker.Worker
The parameters with which the call is made: