Today I will talk about how to configure Spark properties, a topic that many people may not understand well. To help you follow along, I have summarized the following content; I hope you get something out of this article. Spark configuration falls into three categories:
1. Spark properties: these control most application settings and can be set through a SparkConf object or through Java system properties.
2. Environment variables: these are set per machine, for example the IP address, in the $SPARK_HOME/conf/spark-env.sh script on each node.
3. Logging: all log-related properties can be set in the log4j.properties file.
These three kinds of settings are described in detail below.
1. Spark properties
Spark properties control most application settings and can be set separately for each application. These properties are set on a SparkConf object, which is passed to the SparkContext. The SparkConf object lets you set common properties (such as the master URL and the application name) as well as arbitrary key-value pairs through the set() method. For example:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
Dynamically load Spark properties
In some scenarios you may want to avoid hard-coding properties in the SparkConf object; for example, you may want to run the same application on different masters or with different amounts of memory. This requires setting the properties when you launch the program. Spark allows you to create an empty SparkConf object, as follows:
val sc = new SparkContext(new SparkConf())
Then you can configure some properties at run time from the command line:
./bin/spark-submit --name "My app" --master local[4] \
  --conf spark.shuffle.spill=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  myApp.jar
The spark-shell and spark-submit tools both support loading configuration properties dynamically in two ways. The first is through command-line flags such as --master; the spark-submit tool can also accept any Spark property through the --conf flag. Running ./bin/spark-submit --help lists all available options.
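The same flags work with spark-shell; as a sketch (the memory value below is just an example):
./bin/spark-shell --master local[4] --conf spark.executor.memory=2g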
The ./bin/spark-submit tool also reads configuration options from the conf/spark-defaults.conf file. In this file each line is a key-value pair, where the key and value can be separated by whitespace or by an equals sign. For example:
spark.master            spark://iteblog.com:7077
spark.executor.memory   512m
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer
Any values specified as flags or in the properties file are passed on to the application and merged with the properties set on the SparkConf object. Properties set directly on the SparkConf have the highest priority, followed by flags passed to spark-submit or spark-shell, and finally options in the spark-defaults.conf file.
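As a small sketch of this precedence (the application name and memory values below are made up for illustration): if spark-defaults.conf sets spark.executor.memory to 512m and spark-submit passes --conf spark.executor.memory=2g, the value set in code still wins:
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup: spark-defaults.conf contains "spark.executor.memory 512m"
// and the job was launched with "--conf spark.executor.memory=2g".
val conf = new SparkConf()
  .setAppName("PrecedenceDemo")        // illustrative name
  .set("spark.executor.memory", "1g")  // set in code: highest priority
val sc = new SparkContext(conf)

// Prints "1g", because values set directly on SparkConf override
// spark-submit flags, which in turn override spark-defaults.conf.
println(sc.getConf.get("spark.executor.memory"))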
Where can I view the configured Spark properties?
All Spark configuration options set for an application are displayed under the Environment tab of the application's web UI (http://<driver>:4040). This is very useful when you want to make sure your configuration is correct. Note that only properties explicitly set through spark-defaults.conf or SparkConf are shown on that page; for all other properties you can assume the default values are in effect.
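Besides the web UI, a quick way to check from inside the application is to dump the SparkConf itself; this is just a sketch assuming sc is an already-created SparkContext:
// Lists only the properties that were explicitly set (via SparkConf,
// spark-submit flags, or spark-defaults.conf), like the Environment tab.
sc.getConf.getAll.sortBy(_._1).foreach { case (key, value) =>
  println(s"$key=$value")
}

// The same information as one formatted string:
println(sc.getConf.toDebugString)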
2. Environment variables
A large number of Spark settings can be configured through environment variables. These are set in the conf/spark-env.sh script (on Windows, conf/spark-env.cmd). In Standalone and Mesos modes this file can also set machine-specific information such as the hostname.
Note that conf/spark-env.sh does not exist in a freshly installed Spark. You can create it by copying the conf/spark-env.sh.template file, and you need to make sure the copied file is executable.
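For example, a minimal sketch of that step from the Spark installation directory:
cp conf/spark-env.sh.template conf/spark-env.sh
chmod +x conf/spark-env.sh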
The following variables can be configured in the conf/spark-env.sh file:

JAVA_HOME          The location where Java is installed.
PYSPARK_PYTHON     The Python binary executable to use for PySpark.
SPARK_LOCAL_IP     The IP address of the machine to bind to.
SPARK_PUBLIC_DNS   The hostname your Spark program will advertise to other machines.
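For illustration, a conf/spark-env.sh using these variables might look like the following; the paths, address, and hostname are placeholders, not recommendations:
# conf/spark-env.sh (example values only)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export PYSPARK_PYTHON=/usr/bin/python3
export SPARK_LOCAL_IP=192.168.1.10
export SPARK_PUBLIC_DNS=spark-node1.example.com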
For a standalone-mode cluster, many more properties can be configured in addition to those above; I will not list them here, please read the official documentation.
3. Log configuration
Spark uses log4j for logging. You can configure log4j.properties to set the level and location of different logs. This file does not exist by default; you can create it by copying the log4j.properties.template file.
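As a rough sketch, a customized conf/log4j.properties might set the overall level and adjust one package; the levels shown are only examples:
# Send logs to the console at WARN level by default
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Keep Spark's own messages at INFO
log4j.logger.org.apache.spark=INFO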
After reading the above, do you have a better understanding of how Spark properties are configured? If you want to learn more, please keep following for related content. Thank you for your support.