Step 1: choosing a version:
Spark-0.x
Spark-1.x (mainstream: Spark-1.3 and Spark-1.6)
Spark-2.x (latest Spark-2.4)
Download address: http://spark.apache.org/downloads.html (official website)
Other mirror sites: https://mirrors.tuna.tsinghua.edu.cn/apache/spark/
https://www.apache.org/dyn/closer.lua/spark/spark-2.3.0/
https://www.apache.org/dyn/closer.lua/spark/
Note that the package chosen here is spark-2.3.2-bin-hadoop2.7.tgz (the same package used in all of the commands below).
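As a concrete example, the package can also be pulled straight onto a server with wget; this is only a sketch, assuming the Apache archive still hosts this release (any of the mirrors above works just as well):
# download spark-2.3.2 pre-built for hadoop 2.7 from the Apache archive
wget https://archive.apache.org/dist/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz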
Step 2: setting up the spark cluster environment:
Spark is written in Scala, so Scala must be installed and its environment variables configured.
Both Scala and Spark also require a JDK, so the JDK environment and environment variables must be configured as well; Java 8 or later is preferred (a sketch of these variables is given after the links below).
Here we use spark-2.3.
Note: because their installation is straightforward, the steps for installing Java and Scala are skipped here; see the two reprinted guides below.
Reprint: https://www.cnblogs.com/liugh/p/6623530.html (install java under Linux)
Reprint: https://www.cnblogs.com/freeweb/p/5623795.html (install scala under Linux)
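For reference, a minimal sketch of what the java and scala variables might look like in /etc/profile; the JAVA_HOME path matches the one used later in this article, while the SCALA_HOME path and version are placeholders to adjust to your own installation:
export JAVA_HOME=/application/jdk1.8.0_73
export SCALA_HOME=/application/scala-2.11.8    # placeholder path and version
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
# quick check that both are on the PATH
java -version
scala -version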
Step 3: spark cluster planning:
Server        Master    Worker
hostname01    √
hostname02              √
hostname03              √
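Since every later command refers to the nodes by hostname, each node must be able to resolve these names. A hypothetical /etc/hosts mapping (the IP addresses below are placeholders; use your own) could look like:
192.168.1.101   hostname01
192.168.1.102   hostname02
192.168.1.103   hostname03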
Step 4: installing the cluster:
① Upload the spark installation package to any node of the cluster (the upload tool is a matter of personal preference; the author uses Xshell).
② Extract it into a common directory (note that the directory must have read and write permissions): tar zxvf spark-2.3.2-bin-hadoop2.7.tgz -C /application/
③ Enter the conf directory of the spark installation: cd $SPARK_HOME/conf
[user01@hostname01 conf]$ mv spark-env.sh.template spark-env.sh
[user01@hostname01 conf]$ vim spark-env.sh    (add the following configuration)
export JAVA_HOME=/application/jdk1.8.0_73
export SPARK_MASTER_HOST=hostname01
export SPARK_MASTER_PORT=7077
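Optionally, the resources each Worker offers can also be capped in the same spark-env.sh. The two settings below are standard spark standalone options, but the values are only illustrative and should be sized to your machines:
export SPARK_WORKER_CORES=2     # CPU cores each Worker offers to executors
export SPARK_WORKER_MEMORY=2g   # memory each Worker offers to executors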
④ Modify $SPARK_HOME/conf/slaves (add the hostname or IP of each slave node of the cluster; here hostname02 and hostname03 are the slave nodes):
hostname02
hostname03
Note: do not put any extra spaces or blank lines in this file!
⑤ Copy the spark installation to the other nodes of the cluster:
scp -r /application/spark-2.3.2-bin-hadoop2.7 hostname02:/application
scp -r /application/spark-2.3.2-bin-hadoop2.7 hostname03:/application
Note: since the cluster here only has a few nodes, the password can simply be typed in by hand while distributing the installation package, but configuring passwordless SSH key login is recommended (a minimal sketch follows after the reprint link below).
Reprint: https://blog.csdn.net/furzoom/article/details/79139570
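A minimal sketch of setting up passwordless SSH from hostname01 to the two workers, assuming the same user01 account exists on every node:
[user01@hostname01 ~]$ ssh-keygen -t rsa               (accept the defaults, empty passphrase)
[user01@hostname01 ~]$ ssh-copy-id user01@hostname02
[user01@hostname01 ~]$ ssh-copy-id user01@hostname03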
⑥ Configure the spark environment variables (note that every node in the cluster needs this; where exactly to configure them depends on your requirements).
Here they are configured in /etc/profile (sudo permissions were granted in advance, so /etc/profile can still be modified as an ordinary user):
export SPARK_HOME=/application/spark-2.3.2-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin    # note that both bin and sbin are added to the PATH
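After editing /etc/profile on each node, reload it and confirm that the spark commands are visible, for example:
[user01@hostname01 ~]$ source /etc/profile
[user01@hostname01 ~]$ which spark-shell
[user01@hostname01 ~]$ spark-submit --version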
⑦ Finally, start the cluster:
[user01@hostname01 ~]$ /application/spark-2.3.2-bin-hadoop2.7/sbin/start-all.sh
Remember: if a hadoop cluster is installed as well, hadoop's sbin directory also contains a start-all.sh, so the full path to spark's script must be used here.
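If you prefer to avoid any clash with hadoop's scripts altogether, the daemons can also be started one at a time with the per-daemon scripts that ship with spark 2.x (paths assume the install directory used above):
[user01@hostname01 ~]$ /application/spark-2.3.2-bin-hadoop2.7/sbin/start-master.sh
[user01@hostname02 ~]$ /application/spark-2.3.2-bin-hadoop2.7/sbin/start-slave.sh spark://hostname01:7077
[user01@hostname03 ~]$ /application/spark-2.3.2-bin-hadoop2.7/sbin/start-slave.sh spark://hostname01:7077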
Step 5: verify that the cluster started successfully
The first method:
Use the jps command to check the processes: a Master process should be running on the master node of the cluster and a Worker process on each slave node:
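Illustrative output only (the process IDs will differ on your machines):
[user01@hostname01 ~]$ jps
2481 Master
3021 Jps
[user01@hostname02 ~]$ jps
2210 Worker
2544 Jps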
The second method: check the web UI (the standalone Master's web UI listens on port 8080 by default, e.g. http://hostname01:8080):
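A third optional check is to submit one of the bundled examples against the new master. This is only a sketch, assuming the examples jar shipped with spark-2.3.2 built for Scala 2.11; adjust the jar name if your package differs:
[user01@hostname01 ~]$ spark-submit --master spark://hostname01:7077 \
    --class org.apache.spark.examples.SparkPi \
    /application/spark-2.3.2-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.2.jar 100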
Any of the above results indicates that the cluster has been built successfully. What is shared here is an ordinary distributed cluster; a HA (high-availability) cluster requires more complicated steps and additionally needs the zookeeper component.