

How to install Spark and configure its environment




This article explains how to install Spark and how to configure its environment. It is intended as a practical reference; I hope you learn a lot from reading it.

1. Download Apache Spark

Enter https://spark.apache.org/downloads.html in the browser to open the Spark download page, as shown in the following figure:

Note that in step 2, "Choose a package type", after selecting the Spark version in step 1, the package type must match your Hadoop version: Spark reads files from HDFS, and Spark programs run on Hadoop YARN, so the package type has to be chosen according to the Hadoop version currently installed. Our Hadoop version is hadoop2.7.5, so we chose "Pre-built for Apache Hadoop 2.7 and later".

Click the link after step 3, "Download Spark": spark-2.1.2-bin-hadoop2.7.tgz. This opens the page shown in the following figure. In China we generally download from the Tsinghua mirror, which is relatively fast.

2. Install Spark

Using WinSCP, upload spark-2.1.2-bin-hadoop2.7.tgz to the Downloads directory of the master virtual machine, then extract it to the user's home directory and rename the extracted directory (a shorter name is easier to work with). Decompression takes some time; wait patiently.

After decompressing, use the ls command to view the current user's home directory; the new spark-2.1.2-bin-hadoop2.7 directory appears, as shown in the following figure.

Rename spark-2.1.2-bin-hadoop2.7 to spark with the mv command.
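For reference, the whole unpack-and-rename sequence looks roughly like this (a sketch, assuming the archive sits in ~/Downloads and the home directory is the install target):

cd ~
tar -zxf ~/Downloads/spark-2.1.2-bin-hadoop2.7.tgz   # extract into the home directory
ls                                                   # the spark-2.1.2-bin-hadoop2.7 directory appears
mv spark-2.1.2-bin-hadoop2.7 spark                   # a shorter name is easier to type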

3. Configure Spark environment variables

Edit environment variables with the command vim .bashrc

Add the following at the end of the file, then save and exit
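The exact lines were shown in the original figure; a minimal sketch, assuming Spark was renamed to ~/spark as above, would be:

export SPARK_HOME=~/spark                            # assumption: install path from the steps above
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin   # make the spark commands available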

Reload the environment variable configuration file to make the new configuration take effect. This affects only the current terminal; if the new variables still do not take effect after you exit the terminal, restarting the virtual machine makes them take effect permanently.
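For example:

source ~/.bashrc   # re-read .bashrc in the current terminal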

Use spark-shell to check whether Spark is installed correctly. spark-shell is an interactive Scala REPL with Spark support, as shown in the following figure; Spark-related information such as the version is printed during startup.

Exit spark-shell with the command :quit.
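A quick verification session might look like this:

spark-shell    # prints the Spark version and other startup information
scala> :quit   # exit the REPL at the scala> prompt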

4. Install Spark on other nodes

After the master node is set up, you only need to copy the spark directory and the .bashrc file to the other nodes. The specific commands are shown below.
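A sketch of the copy, assuming passwordless SSH to the hosts slave1 and slave2 used throughout this article:

scp -r ~/spark slave1:~/   # copy the Spark directory to slave1
scp -r ~/spark slave2:~/   # copy the Spark directory to slave2
scp ~/.bashrc slave1:~/    # copy the environment variable configuration
scp ~/.bashrc slave2:~/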

Finally, restart slave1 and slave2 so the configuration file takes effect. At this point the Spark installation is complete; the next step is to edit the Spark configuration files, according to the chosen deployment mode, so the cluster can work.

5. Configure Spark-related files

Step 1: spark-env.sh file

spark-env.sh holds the per-machine Spark settings that are determined by environment variables; these variables are read from the conf/spark-env.sh script under the Spark installation directory.

You can set various Spark-related configuration variables in spark-env.sh.

First start the three virtual machines master, slave1, and slave2. Do the configuration on the master host, then send spark/conf to the other nodes once the configuration is complete.

Let's jump to the spark/conf directory to see which files need configuring. As the following figure shows, the file list can be viewed with the ls command. Today we mainly use spark-env.sh.template and slaves.template, and we can also use log4j.properties.template to adjust the log output.

Note that after Spark is installed, conf/spark-env.sh does not exist by default; you can create it by copying conf/spark-env.sh.template.
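For example, assuming the ~/spark install path used above:

cd ~/spark/conf
cp spark-env.sh.template spark-env.sh   # create spark-env.sh from its template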

Edit spark-env.sh with the vim editor; in the terminal you only need to type the first few letters of the file name and press the Tab key to auto-complete it.

Add the following at the end of the file, save and exit
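The exact values were shown in the original figure; a minimal sketch, using variable names from the stock spark-env.sh.template and an assumed JDK path, might be:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumption: adjust to your JDK location
export SPARK_MASTER_HOST=master                      # hostname of the master node in this article
export SPARK_WORKER_MEMORY=1g                        # memory available to each worker
export SPARK_WORKER_CORES=1                          # cores available to each worker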

Step 2: log4j.properties

Spark prints a lot of log information during startup. If you only want to see warnings and errors rather than general information, you can set this in log4j.properties. Here, too, Spark provides a template file; log4j.properties needs to be created by copying the template.

The setting is to change INFO to WARN on the log4j.rootCategory line of the file.

After the changes are completed, the contents of the file are shown in the following figure. Remember to save and exit.
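In command form, the step is roughly:

cp log4j.properties.template log4j.properties
# then, in log4j.properties, change:
#   log4j.rootCategory=INFO, console
# to:
#   log4j.rootCategory=WARN, console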

Step 3: slaves file

The main function of the slaves file is to tell the Spark cluster which nodes are workers. The slaves file also needs to be created from its template file, as shown in the following figure.
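For example:

cp slaves.template slaves   # create the slaves file from its template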

Edit the slaves file with the vim editor.

Enter the following in the file to indicate that the worker nodes are slave1 and slave2, then save and exit.
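With the two worker hostnames used in this article, the slaves file lists one hostname per line:

slave1
slave2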

Finally, copy the spark/conf directory to the spark directories on the slave1 and slave2 nodes, as shown in the following figure.
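A sketch of the copy, again assuming passwordless SSH:

scp -r ~/spark/conf slave1:~/spark/   # overwrite conf on slave1
scp -r ~/spark/conf slave2:~/spark/   # overwrite conf on slave2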

You can now start the cluster. Start the Hadoop cluster first (Spark can run without Hadoop, but in practice most Spark deployments still use Hadoop's resource manager, YARN), then start the Spark cluster, as shown below.
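The startup sequence might look like this (a sketch, assuming Hadoop's sbin directory is on PATH; the full path to Spark's start-all.sh avoids a name clash with Hadoop's script of the same name):

start-dfs.sh                # start HDFS
start-yarn.sh               # start YARN
~/spark/sbin/start-all.sh   # start the Spark master and workers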

Check the started processes with jps: on the master node the Spark process is Master, and on the slave nodes the Spark-related process is Worker.

To shut down, stop the Spark cluster first, then stop the Hadoop cluster, as sketched below.
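The shutdown mirrors the startup in reverse:

~/spark/sbin/stop-all.sh   # stop the Spark master and workers
stop-yarn.sh               # stop YARN
stop-dfs.sh                # stop HDFS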

Thank you for reading this article carefully. I hope "How to install Spark and configure its environment" proves helpful to you. More related knowledge is waiting for you to learn!



