How to submit tasks to a Spark Standalone cluster from Java Web and monitor them


This article explains how to submit tasks to a Spark Standalone cluster from a Java Web application and how to monitor them afterwards. The content is straightforward and easy to follow; work through it step by step.

1. Environment

Software | Version             | Remarks
IDEA     | 14.1.5              |
JDK      | 1.8                 |
Spark    | 1.6.0               | project Maven dependency
Spark    | cdh6.7.3-spark1.6.0 | actual cluster (5.7.3-1.cdh6.7.3.p0.5)
Hadoop   | 2.6.4               | project Maven dependency
Hadoop   | 2.6.0-cdh6.7.3      | actual cluster
Maven    | 3.3                 |

2. Project download path

The address of the project on GitHub is: javaweb_spark_standalone_monitor

3. Spark task submission process

Having done related work before, I know that tasks can be submitted to the Spark Standalone cluster as follows:

String[] arg0 = new String[]{
        "--master", "spark://server2.tipdm.com:6066",
        "--deploy-mode", "cluster",
        "--name", appName,
        "--class", className,
        "--executor-memory", "2G",
        "--total-executor-cores", "10",
        "--executor-cores", "2",
        path,
        "/user/root/a.txt",
        "/tmp/" + System.currentTimeMillis()
};
SparkSubmit.main(arg0);

1. Note that the deploy mode used here is cluster, not client, which means the driver program also runs inside the cluster rather than on the submitting client (in my case, a local Win10 machine).

2. If you submit in client mode, check whether the local machine has enough resources. Because cluster mode is used here, the cluster must have enough resources to run both the driver and the executors (that is, at least two containers running at the same time).

3. The path, i.e. the application jar you built, must be placed at the same location on every slave node of the cluster. For example, my cluster has node1, node2 and node3, so wc.jar has to be copied to all three nodes, say to /opt/wc.jar, and path should then be set to file:/opt/wc.jar. If you pass /opt/wc.jar directly, it gets resolved to file:/C:/opt/wc.jar during argument parsing (because I run Tomcat on Win10), and you get an error saying the jar file cannot be found (see the short sketch below).
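As a small illustration of note 3 (the jar location and node names are just the example values above, nothing fixed by Spark), the jar argument should carry an explicit file: scheme:

// Example values from note 3: wc.jar has been copied to /opt/wc.jar on node1, node2
// and node3. The explicit file: scheme prevents the argument from being resolved
// against the submitting Windows machine's local filesystem.
String path = "file:/opt/wc.jar";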

Stepping into the SparkSubmit.main source code, you can see the following:

def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    // scalastyle:off println
    printStream.println(appArgs)
    // scalastyle:on println
  }
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}

In this code the task is submitted through submit. If you follow that call down, the submission eventually goes through mainMethod.invoke, i.e. a reflective call; debugging shows that the class invoked by reflection is RestSubmissionClient, whose main function actually submits the task.

So you can imitate RestSubmissionClient and submit the task yourself. The procedure is as follows:

public static String submit(String appResource, String mainClass, String... args) {
    SparkConf sparkConf = new SparkConf();
    // the following settings are written with reference to the debug information from a real task submission
    sparkConf.setMaster(MASTER);
    sparkConf.setAppName(APPNAME + " " + System.currentTimeMillis());
    sparkConf.set("spark.executor.cores", "2");
    sparkConf.set("spark.submit.deployMode", "cluster");
    sparkConf.set("spark.jars", appResource);
    sparkConf.set("spark.executor.memory", "2G");
    sparkConf.set("spark.cores.max", "2");
    sparkConf.set("spark.driver.supervise", "false");
    Map<String, String> env = filterSystemEnvironment(System.getenv());
    CreateSubmissionResponse response = null;
    try {
        response = (CreateSubmissionResponse)
                RestSubmissionClient.run(appResource, mainClass, args, sparkConf, toScalaMap(env));
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
    return response.submissionId();
}
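The helpers filterSystemEnvironment and toScalaMap used above are not shown in the original post and are not public Spark API. Below is a minimal sketch of what they could look like, assuming the same environment filtering that RestSubmissionClient itself applies in Spark 1.6 and a plain Java-to-Scala map conversion:

import java.util.HashMap;
import java.util.Map;
import scala.Tuple2;

// Assumption: keep only the environment variables Spark's REST client normally forwards
// (SPARK_* except SPARK_ENV_LOADED, plus MESOS_*).
public static Map<String, String> filterSystemEnvironment(Map<String, String> env) {
    Map<String, String> filtered = new HashMap<String, String>();
    for (Map.Entry<String, String> e : env.entrySet()) {
        String key = e.getKey();
        boolean sparkVar = key.startsWith("SPARK_") && !key.equals("SPARK_ENV_LOADED");
        if (sparkVar || key.startsWith("MESOS_")) {
            filtered.put(key, e.getValue());
        }
    }
    return filtered;
}

// Converts a java.util.Map into the immutable Scala Map that RestSubmissionClient.run expects.
@SuppressWarnings("unchecked")
public static scala.collection.immutable.Map<String, String> toScalaMap(Map<String, String> m) {
    scala.collection.immutable.Map<String, String> result =
            (scala.collection.immutable.Map<String, String>) scala.collection.immutable.Map$.MODULE$.empty();
    for (Map.Entry<String, String> e : m.entrySet()) {
        result = (scala.collection.immutable.Map<String, String>)
                result.$plus(new Tuple2<String, String>(e.getKey(), e.getValue()));
    }
    return result;
}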

If any of the sparkConf.set(...) calls in submit(...) above is left out, the program runs into problems, and the first error is:

java.lang.IllegalArgumentException: Invalid environment variable name: "=::"

This error comes from an exception during parameter matching because the deploy mode is set incorrectly (cluster mode is not set); the parameters involved are simply the same values that SparkSubmit passes when it submits a task.

4. Problems and problem solving

The questions are:

1. Why submit tasks to YARN at all when running Spark? In practice, programs submitted to YARN ran much slower than on Spark Standalone, so can tasks be submitted directly to the Spark Standalone cluster?

2. After submitting a task to the Spark Standalone cluster, how do you get the task's id for later monitoring?

3. After getting the task id, how do you monitor it?

The answers to these three questions are as follows:

1. The first question is largely a matter of preference; Spark on YARN has the advantage of unifying resource management across the Hadoop ecosystem.

2. The code above already submits the task and obtains the task ID. Note, however, that in

response = (CreateSubmissionResponse)
        RestSubmissionClient.run(appResource, mainClass, args, sparkConf, toScalaMap(env));

the returned response has to be cast to CreateSubmissionResponse in order to read the submissionId. CreateSubmissionResponse can only be accessed from certain packages, which is why my SparkEngine class is defined in the org.apache.spark.deploy.rest package.
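A minimal sketch of that placement (the class name SparkEngine comes from the post; the rest is assumption):

// Placing the class in Spark's own rest package so that CreateSubmissionResponse
// can be referenced from this Java code.
package org.apache.spark.deploy.rest;

public class SparkEngine {
    // the submit(...) method shown earlier, together with its
    // filterSystemEnvironment and toScalaMap helpers, lives here
}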

3. Monitoring is even simpler. You can refer to SparkSubmit's own requestStatus:

private def requestStatus(args: SparkSubmitArguments): Unit = {
  new RestSubmissionClient(args.master)
    .requestSubmissionStatus(args.submissionToRequestStatusFor)
}
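Following the same pattern, a minimal Java monitoring sketch could look like the following. MASTER is assumed to be the same REST master URL used for submission (e.g. spark://server2.tipdm.com:6066), and the cast to SubmissionStatusResponse as well as the explicit quiet flag are based on the Spark 1.6 REST client:

// A monitoring sketch, assumed to live in the same SparkEngine class as submit(...):
// queries the state of a previously submitted driver by its submission id.
public static String getDriverState(String submissionId) {
    RestSubmissionClient client = new RestSubmissionClient(MASTER);
    SubmissionStatusResponse status = (SubmissionStatusResponse)
            client.requestSubmissionStatus(submissionId, true);
    return status.driverState();   // e.g. SUBMITTED, RUNNING, FINISHED, FAILED
}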

Thank you for reading. That covers how a Java Web application submits tasks to a Spark Standalone cluster and monitors them. After working through this article you should have a deeper understanding of the problem; the details still need to be verified in practice.
