This article walks through deploying a Spark Standalone cluster in three configurations: no HA, Spark Standalone HA based on the local file system, and ZooKeeper-based HA.
Environment: CentOS 6.6, JDK 1.7.0_80, Spark 1.5.0; the firewall is disabled, /etc/hosts is configured on every node, and passwordless SSH is set up from the Master node to all nodes (a sketch of the SSH setup follows).
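As a minimal sketch of the passwordless SSH prerequisite (the user name yyl and host names are taken from the article; the key type and default key path are assumptions), the usual steps are:

# run as user yyl on node1.zhch; assumes the default key location ~/.ssh/id_rsa
[yyl@node1 ~] $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[yyl@node1 ~] $ ssh-copy-id yyl@node2.zhch
[yyl@node1 ~] $ ssh-copy-id yyl@node3.zhch
# verify that no password prompt appears
[yyl@node1 ~] $ ssh node2.zhch hostname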
I. No HA Mode
1. Hostname-to-role mapping (an example /etc/hosts mapping follows the list):
node1.zhch    Master
node2.zhch    Slave
node3.zhch    Slave
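A minimal /etc/hosts sketch for this layout (the IP addresses are placeholders for illustration, not values from the original article):

# /etc/hosts on every node; replace the example IPs with your own
192.168.1.101   node1.zhch
192.168.1.102   node2.zhch
192.168.1.103   node3.zhch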
2. Extract the Spark deployment package (you can download a prebuilt package from the official website, or download the source from the official website and build the package yourself).
[yyl@node1 program] $ tar -zxf spark-1.5.0-bin-2.5.2.tgz
3. Modify the configuration files
[yyl@node1 program] $ cd spark-1.5.0-bin-2.5.2/conf/
[yyl@node1 conf] $ cp slaves.template slaves
[yyl@node1 conf] $ vim slaves
node2.zhch
node3.zhch
[yyl@node1 conf] $ cp spark-env.sh.template spark-env.sh
[yyl@node1 conf] $ vim spark-env.sh
export JAVA_HOME=/usr/lib/java/jdk1.7.0_80
export SPARK_MASTER_IP=node1.zhch
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
Description:
SPARK_MASTER_IP: address of the Master node
SPARK_MASTER_PORT: Master port number (7077 by default)
SPARK_WORKER_CORES: number of cores each Worker may use, usually set to the number of CPU cores on the host
SPARK_WORKER_INSTANCES: number of Worker instances to run on each host
SPARK_WORKER_MEMORY: total amount of memory Spark applications may use on each Worker (each application's own memory is set by the spark.executor.memory property); a sizing example follows.
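For illustration only, here is a hedged example of how these three Worker settings might be sized on a host with 4 cores and 8 GB of RAM (the numbers are assumptions, not from the original article); total Worker memory per host is SPARK_WORKER_INSTANCES x SPARK_WORKER_MEMORY:

# conf/spark-env.sh on a hypothetical 4-core / 8 GB worker host
export SPARK_WORKER_CORES=2        # cores per Worker instance
export SPARK_WORKER_INSTANCES=2    # two Worker JVMs per host
export SPARK_WORKER_MEMORY=3g      # 2 instances x 3g = 6g total, leaving headroom for the OS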
4. Distribute Spark
[yyl@node1 program] $ scp -rp spark-1.5.0-bin-2.5.2 node2.zhch:~/program/
[yyl@node1 program] $ scp -rp spark-1.5.0-bin-2.5.2 node3.zhch:~/program/
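With more Slave nodes, a small loop saves typing; a minimal sketch, assuming the same ~/program/ path exists on every node:

# copy the Spark directory to every Slave listed in conf/slaves
[yyl@node1 program] $ for host in node2.zhch node3.zhch; do
>   scp -rp spark-1.5.0-bin-2.5.2 ${host}:~/program/
> done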
5. Start and stop commands
./sbin/start-master.sh - start the Master
./sbin/start-slaves.sh - start all Slaves
./sbin/start-slave.sh spark://IP:PORT - start a Slave on the local node
./sbin/start-all.sh - start the Master and all Slaves
./sbin/stop-master.sh - stop the Master
./sbin/stop-slaves.sh - stop all Slaves
./sbin/stop-slave.sh - stop the Slave on the local node
./sbin/stop-all.sh - stop the Master and all Slaves
./bin/spark-shell --master spark://IP:PORT - run the Spark shell against the cluster
./bin/spark-submit --class packageName.MainClass --master spark://IP:PORT path/jarName.jar - submit an application
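As a concrete usage sketch, submitting the bundled SparkPi example to this cluster might look like the following (the exact name and location of the examples jar depend on how the package was built, so treat the path as an assumption):

# run the SparkPi example with 100 tasks against the Standalone Master
[yyl@node1 spark-1.5.0-bin-2.5.2] $ ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://node1.zhch:7077 \
    lib/spark-examples-1.5.0-hadoop2.5.2.jar 100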
II. Spark Standalone HA
Compared with the no-HA mode, you only need to modify the conf/spark-env.sh file by adding the following line:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/home/yyl/program/spark-1.5.0-bin-2.5.2/recovery"
Description:
spark.deploy.recoveryMode: FILESYSTEM enables single-node recovery based on the local file system; the default is NONE.
spark.deploy.recoveryDirectory: the directory where Spark saves recovery state.
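A minimal sketch of putting this into effect (it assumes the recovery directory does not exist yet and that only the Master needs a restart):

# create the recovery directory on the Master node, then restart the Master
[yyl@node1 spark-1.5.0-bin-2.5.2] $ mkdir -p /home/yyl/program/spark-1.5.0-bin-2.5.2/recovery
[yyl@node1 spark-1.5.0-bin-2.5.2] $ ./sbin/stop-master.sh
[yyl@node1 spark-1.5.0-bin-2.5.2] $ ./sbin/start-master.sh
# after a Master restart, registered Workers and applications are recovered from this directory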
III. ZooKeeper-Based HA
1. Hostname-to-role mapping:
node1.zhch    Master, ZooKeeper
node2.zhch    Master, ZooKeeper
node3.zhch    Slave, ZooKeeper
node4.zhch    Slave
node5.zhch    Slave
2. Install a ZooKeeper cluster on node1.zhch, node2.zhch, and node3.zhch (a minimal configuration sketch follows).
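A minimal zoo.cfg sketch for the three-node ensemble above (the dataDir path and the ports are common defaults, not values from the original article); each node also needs a myid file matching its server entry:

# conf/zoo.cfg on node1.zhch, node2.zhch and node3.zhch
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/yyl/program/zookeeper/data
clientPort=2181
server.1=node1.zhch:2888:3888
server.2=node2.zhch:2888:3888
server.3=node3.zhch:2888:3888

# on each node, create dataDir and write the matching id into dataDir/myid, e.g. on node1.zhch:
[yyl@node1 ~] $ mkdir -p /home/yyl/program/zookeeper/data
[yyl@node1 ~] $ echo 1 > /home/yyl/program/zookeeper/data/myid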
3. Configuration
Compared with the no-HA mode, all Slave node addresses are still listed in the conf/slaves file. The difference is that SPARK_MASTER_IP is no longer set in conf/spark-env.sh; instead, the following SPARK_DAEMON_JAVA_OPTS configuration is added:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1.zhch:2181,node2.zhch:2181,node3.zhch:2181 -Dspark.deploy.zookeeper.dir=/spark"
Description:
spark.deploy.recoveryMode: ZOOKEEPER enables ZooKeeper-based HA.
spark.deploy.zookeeper.url: the ZooKeeper connection string.
spark.deploy.zookeeper.dir: the ZooKeeper directory (znode) where Spark saves recovery state; the default is /spark. A quick way to inspect it is sketched below.
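For a quick sanity check you can inspect the znode with the ZooKeeper CLI; a sketch assuming the zkCli.sh that ships with ZooKeeper (the child znode names in the comment are what Spark's Standalone HA typically creates, listed as an expectation rather than a guarantee):

# connect to any ensemble member and list the Spark recovery znode
[yyl@node1 ~] $ zkCli.sh -server node1.zhch:2181
[zk: node1.zhch:2181(CONNECTED) 0] ls /spark
# expect children such as leader_election and master_status once the Masters are running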
4. After HA is configured, since there are multiple Masters, list all of them wherever a Spark master URL is used, for example:
./bin/spark-shell --master spark://host1:port1,host2:port2,host3:port3
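With the two Masters in this article (node1.zhch and node2.zhch, default port 7077 assumed), a concrete spark-submit invocation would look like this sketch:

# the driver tries each listed Master until it finds the active one
[yyl@node1 spark-1.5.0-bin-2.5.2] $ ./bin/spark-submit \
    --class packageName.MainClass \
    --master spark://node1.zhch:7077,node2.zhch:7077 \
    path/jarName.jar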
5. Verify HA
Start the ZooKeeper cluster first, then run sbin/start-all.sh on node1 to start the Spark cluster. Checking with the jps command shows a Master process only on node1, not on node2, so you also need to run sbin/start-master.sh on node2 to start the standby Master; with both Masters running, Master HA is in place.
After the Master process on node1 is killed, the standby Master on node2 takes over as the active Master.
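A command-level sketch of this verification (the Master's PID will differ on your system; port 8080 is the default Standalone Master web UI port):

# on node1: find the Master's PID with jps, then kill it
[yyl@node1 ~] $ jps | grep Master
[yyl@node1 ~] $ kill -9 <Master-PID>
# on node2: confirm the standby Master is still running
[yyl@node2 ~] $ jps | grep Master
# then open http://node2.zhch:8080 - after failover its status should change from STANDBY to ALIVE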
This concludes the walkthrough of deploying Spark clusters in no-HA, file-system HA, and ZooKeeper HA modes.