
Spark 2.2.0 High availability build

2025-02-27 Update From: SLTechnology News&Howtos


I. Overview

1. The experimental environment builds on the Hadoop HA cluster set up previously.

2. The ZooKeeper environment required for Spark HA was also configured earlier and is not repeated here.

3. The required software packages are scala-2.12.3.tgz and spark-2.2.0-bin-hadoop2.7.tgz.

4. Host planning

bd1    Worker
bd2    Worker
bd3    Worker
bd4    Master, Worker
bd5    Master, Worker
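The plan above assumes the names bd1-bd5 resolve on every node. A minimal /etc/hosts sketch; the IP addresses below are placeholders chosen for illustration, not values from this guide:

```shell
# Hypothetical /etc/hosts entries; substitute the real addresses of your
# nodes, then append the adjusted lines to /etc/hosts on all five machines.
hosts_entries=$(cat <<'EOF'
192.168.10.11 bd1
192.168.10.12 bd2
192.168.10.13 bd3
192.168.10.14 bd4
192.168.10.15 bd5
EOF
)
echo "$hosts_entries"
```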

II. Configure Scala

1. Decompress and copy

[root@bd1 ~]# tar -zxf scala-2.12.3.tgz
[root@bd1 ~]# cp -r scala-2.12.3 /usr/local/scala

2. Configure environment variables

[root@bd1 ~]# vim /etc/profile
export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH
[root@bd1 ~]# source /etc/profile

3. Verification

[root@bd1 ~]# scala -version
Scala code runner version 2.12.3 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.

III. Configure Spark

1. Decompress and copy

[root@bd1 ~]# tar -zxf spark-2.2.0-bin-hadoop2.7.tgz
[root@bd1 ~]# cp -r spark-2.2.0-bin-hadoop2.7 /usr/local/spark

2. Configure environment variables

[root@bd1 ~]# vim /etc/profile
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
[root@bd1 ~]# source /etc/profile

3. Copy the template and modify the spark-env.sh file

[root@bd1 conf]# cp spark-env.sh.template spark-env.sh
[root@bd1 conf]# vim spark-env.sh
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/local/scala
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bd4:2181,bd5:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1

4. Copy the template and modify the spark-defaults.conf file

[root@bd1 conf]# cp spark-defaults.conf.template spark-defaults.conf
[root@bd1 conf]# vim spark-defaults.conf
spark.master                     spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master:/user/spark/history
spark.serializer                 org.apache.spark.serializer.KryoSerializer

5. Create a new log file directory in the HDFS file system

hdfs dfs -mkdir -p /user/spark/history
hdfs dfs -chmod 777 /user/spark/history

6. Modify the slaves file

[root@bd1 conf]# vim slaves
bd1
bd2
bd3
bd4
bd5

IV. Synchronize to Other Hosts

1. Use scp to synchronize Scala to bd2-bd5

scp -r /usr/local/scala root@bd2:/usr/local/
scp -r /usr/local/scala root@bd3:/usr/local/
scp -r /usr/local/scala root@bd4:/usr/local/
scp -r /usr/local/scala root@bd5:/usr/local/

2. Use scp to synchronize Spark to bd2-bd5

scp -r /usr/local/spark root@bd2:/usr/local/
scp -r /usr/local/spark root@bd3:/usr/local/
scp -r /usr/local/spark root@bd4:/usr/local/
scp -r /usr/local/spark root@bd5:/usr/local/
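The eight scp commands above can also be expressed as one loop. This is only a sketch, assuming the same passwordless root SSH used above; the echo prints each command instead of running it, so drop the echo to perform the actual copies:

```shell
# Print the scp commands that would sync Scala and Spark to bd2-bd5.
for host in bd2 bd3 bd4 bd5; do
  for dir in scala spark; do
    echo scp -r "/usr/local/$dir" "root@$host:/usr/local/"
  done
done
```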

V. Start the Cluster and Test HA

1. The startup sequence is: ZooKeeper --> Hadoop --> Spark
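That order can be kept as a short checklist. A sketch only: the zkServer.sh and Hadoop script paths below are my assumptions about a typical layout, not paths given in this guide:

```shell
# Cluster start order; the comment on each entry notes where it runs.
start_steps=(
  "zkServer.sh start                      # ZooKeeper, on bd4 and bd5"
  "/usr/local/hadoop/sbin/start-all.sh    # Hadoop HA"
  "/usr/local/spark/sbin/start-all.sh     # Spark Master and Workers, on bd4"
  "/usr/local/spark/sbin/start-master.sh  # standby Spark Master, on bd5"
)
printf '%s\n' "${start_steps[@]}"
```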

2. Start Spark

Bd4:

[root@bd4 sbin]# cd /usr/local/spark/sbin/
[root@bd4 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-bd4.out
bd4: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bd4.out
bd2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bd2.out
bd3: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bd3.out
bd5: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bd5.out
bd1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bd1.out
[root@bd4 sbin]# jps
3153 DataNode
7235 Jps
3046 JournalNode
7017 Master
3290 NodeManager
7116 Worker
2958 QuorumPeerMain

Bd5:

[root@bd5 sbin]# ./start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-bd5.out
[root@bd5 sbin]# jps
3584 NodeManager
5602 RunJar
3251 QuorumPeerMain
8564 Master
3447 DataNode
8649 Jps
8474 Worker
3340 JournalNode

3. Stop the Master process on bd4

[root@bd4 sbin]# kill -9 7017
[root@bd4 sbin]# jps
3153 DataNode
7282 Jps
3046 JournalNode
3290 NodeManager
7116 Worker
2958 QuorumPeerMain
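Failover can also be confirmed from the command line. A sketch using the standalone Master web UI's JSON view on its default port 8080; the helper function name is mine, not from this guide:

```shell
# Build each Master's JSON-status URL; curl it to see "status" : "ALIVE"
# on the active Master and STANDBY (or no answer) on the killed one.
master_json_url() { echo "http://$1:8080/json/"; }

for m in bd4 bd5; do
  echo "querying $(master_json_url "$m")"
  # curl -s "$(master_json_url "$m")" | grep '"status"'
done
```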

VI. Summary

At first I wanted to put the Masters on bd1 and bd2, but when I started Spark, both nodes came up as Standby. After moving the Master role to bd4 and bd5 in the configuration files, everything ran smoothly. In other words, the Masters of a Spark HA setup must run on nodes of the ZooKeeper cluster to work properly; in this environment, those are also the nodes running a JournalNode process.
