Spark Series (7) -- Building a Spark High-Availability Cluster Based on ZooKeeper


I. Cluster planning

A 3-node Spark cluster is built here, with the Worker service deployed on all three hosts. To ensure high availability, the primary Master service is deployed on hadoop001, while standby Master services are deployed on hadoop002 and hadoop003. The Master services are coordinated and managed by the ZooKeeper cluster: if the primary Master becomes unavailable, a standby Master takes over as the new primary Master.
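
Based on the description above, the planned service layout is roughly as follows:

hadoop001: Master (primary) + Worker
hadoop002: Master (standby) + Worker
hadoop003: Master (standby) + Worker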

II. Prerequisites

Before building the Spark cluster, make sure the JDK, the ZooKeeper cluster, and the Hadoop cluster have already been set up. For more information, see:

JDK installation
ZooKeeper single-node and cluster environment setup on Linux
Hadoop cluster environment setup

III. Spark cluster setup

3.1 Download and decompress

Download the required version of Spark from the official website: http://spark.apache.org/downloads.html
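
If you prefer the command line, archived releases can also be fetched directly; the URL below is an assumption for the 2.2.3 / Hadoop 2.6 build used in this article:

# Download the Spark release used in this article (URL assumed; adjust to your version)
wget https://archive.apache.org/dist/spark/spark-2.2.3/spark-2.2.3-bin-hadoop2.6.tgz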

Decompress it after download:

# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz

3.2 Configure environment variables

# vim /etc/profile

Add environment variables:

export SPARK_HOME=/usr/app/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH

Make the configured environment variables take effect immediately:

# source /etc/profile

3.3 Cluster configuration

Enter the ${SPARK_HOME}/conf directory and copy the configuration templates before modifying them:

1. spark-env.sh

cp spark-env.sh.template spark-env.sh

# Configure the JDK installation location
JAVA_HOME=/usr/java/jdk1.8.0_201
# Configure the Hadoop configuration file location
HADOOP_CONF_DIR=/usr/app/hadoop-2.6.0-cdh6.15.2/etc/hadoop
# Configure the ZooKeeper addresses
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop001:2181,hadoop002:2181,hadoop003:2181 -Dspark.deploy.zookeeper.dir=/spark"

2. slaves

cp slaves.template slaves

Configure the hostnames of all Worker nodes:

hadoop001
hadoop002
hadoop003

3.4 Installation package distribution

Distribute the Spark installation package to the other servers; it is recommended to configure the Spark environment variables on those two servers as well.

scp -r /usr/app/spark-2.4.0-bin-hadoop2.6/ hadoop002:/usr/app/
scp -r /usr/app/spark-2.4.0-bin-hadoop2.6/ hadoop003:/usr/app/

IV. Start the cluster

4.1 Start the ZooKeeper cluster

Start the ZooKeeper service on the three servers:

zkServer.sh start

4.2 Start the Hadoop cluster

# Start the dfs service
start-dfs.sh
# Start the yarn service
start-yarn.sh

4.3 Start the Spark cluster

Go to the ${SPARK_HOME}/sbin directory on hadoop001 and execute the following command to start the cluster. After the command completes, the Master service is started on hadoop001 and the Worker service is started on every node listed in the slaves configuration file.

start-all.sh
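
As a quick sanity check (a minimal sketch; the exact jps output depends on your environment), the Spark daemons should now be visible:

# On hadoop001, both a Master and a Worker process should appear
jps
# On hadoop002 and hadoop003, only a Worker process should appear at this point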

Start the standby Master service by executing the following command on hadoop002 and hadoop003, respectively:

# Execute under ${SPARK_HOME}/sbin
start-master.sh

4.4 View the services

Open the Spark Web UI on port 8080. At this point, you can see that the Master node on hadoop001 is in the ALIVE state, and there are three available Worker nodes.
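
The same information can also be checked from the command line; a rough sketch, assuming the standalone Master Web UI exposes its JSON endpoint on the default port 8080:

# The response should report the Master's status (ALIVE) and the registered workers
curl http://hadoop001:8080/json/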

The Master nodes on hadoop002 and hadoop003 are in the STANDBY state, with no Worker nodes attached to them.
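
Because the recovery state lives in ZooKeeper, the znode configured earlier through spark.deploy.zookeeper.dir can also be inspected; a minimal sketch using the ZooKeeper CLI (the exact child znodes may vary by Spark version):

# Connect to any ZooKeeper node and list the Spark recovery directory
zkCli.sh -server hadoop001:2181
ls /spark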

V. Verify high availability

At this point, you can use the kill command to kill the Master process on hadoop001, and one of the standby Masters becomes the new primary Master. In this case it is hadoop002: after a short RECOVERING phase, the Master on hadoop002 becomes the new primary Master and acquires all of the available Workers.
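
A minimal sketch of this failover test (the PID is of course environment-specific):

# On hadoop001, find the standalone Master's PID and kill it
jps | grep Master
kill -9 <Master PID>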


At this point, if you start the Master service on hadoop001 again using start-master.sh, it will rejoin the cluster as a standby Master.

VI. Submit a job

The submission command is exactly the same as submitting to YARN from a single-node environment. Here, Spark's built-in sample program that computes Pi is used as an example; the submission command is as follows:

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--executor-memory 1G \
--num-executors 10 \
/usr/app/spark-2.4.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.4.0.jar \
100
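
To submit against the standalone HA cluster itself instead of YARN, Spark accepts a comma-separated list of master addresses, so the driver can locate whichever Master is currently active. A rough sketch, assuming the default standalone master port 7077:

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop001:7077,hadoop002:7077,hadoop003:7077 \
--executor-memory 1G \
/usr/app/spark-2.4.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.4.0.jar \
100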

For more articles in the big data series, see the GitHub open-source project: Big Data Getting Started Guide.
