1. Prerequisites
A Spark Standalone cluster uses a master-slaves architecture, and like most master-slaves clusters it suffers from a Master single point of failure. Spark provides two ways to solve this problem:
Single-node recovery based on the local file system (Single-Node Recovery with Local File System)

Standby Masters based on ZooKeeper (Standby Masters with ZooKeeper)
ZooKeeper provides a leader-election mechanism that guarantees that, although the cluster contains multiple Masters, only one is Active while the rest are Standby. When the Active Master fails, a Standby Master is elected to replace it. Because the cluster state, including Worker, Driver, and Application information, has been persisted to ZooKeeper, a failover only affects the submission of new jobs and has no impact on jobs that are already running. Overall architecture of the cluster after adding ZooKeeper (diagram shown in the original article).
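For reference, the Spark documentation also allows these recovery settings to be passed to the Master and Worker daemons through SPARK_DAEMON_JAVA_OPTS in spark-env.sh; a minimal sketch, equivalent to the spark-defaults.conf entries used in section 4.2 below (spark.deploy.zookeeper.dir is optional and defaults to /spark):

# spark-env.sh
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=res-spark-0001:2181,res-spark-0002:2181,res-spark-0003:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"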
A ZooKeeper cluster is already deployed and running normally.
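A quick way to verify this before proceeding (a sketch, not from the original article; the hostnames match the ZooKeeper URL used later, and nc must be installed):

# each node should answer with Mode: leader or Mode: follower
echo stat | nc res-spark-0001 2181 | grep Mode
echo stat | nc res-spark-0002 2181 | grep Mode
echo stat | nc res-spark-0003 2181 | grep Mode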
2. Deployment Steps

1. Download the Spark package

wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

2. Extract and rename

tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz -C /opt
cd /opt
mv spark-2.4.0-bin-hadoop2.7 spark-2.4.0

3. Configure environment variables
# /etc/profile
export JAVA_HOME=/usr/lib/jdk1.8.0_172
export CLASSPATH=${JAVA_HOME}/jre/lib:${JAVA_HOME}/lib
export HADOOP_HOME=/opt/hadoop-2.7.6
export SPARK_HOME=/opt/spark-2.4.0
export PATH=${JAVA_HOME}/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH
Modify the machine name:

hostnamectl set-hostname res-spark-0001
Execute the following command to make the environment variables take effect:

source /etc/profile
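A quick sanity check that the variables took effect (not part of the original article):

java -version        # should report 1.8.0_172
echo $SPARK_HOME     # should print /opt/spark-2.4.0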
4. Modify the configuration files

cd /opt/spark-2.4.0/conf
cp log4j.properties.template log4j.properties
cp slaves.template slaves
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
4.1 slaves
res-spark-0003
res-spark-0004
res-spark-0005
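Note that start-all.sh starts the Workers over SSH, so the node running it needs passwordless SSH access to every host listed in slaves; a minimal sketch, assuming the root account is used on all nodes:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id root@res-spark-0003
ssh-copy-id root@res-spark-0004
ssh-copy-id root@res-spark-0005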
4.2 spark-defaults.conf
spark.deploy.recoveryMode        ZOOKEEPER
spark.deploy.zookeeper.url       res-spark-0001:2181,res-spark-0002:2181,res-spark-0003:2181
spark.master                     spark://res-spark-0001:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://cluster1/spark/eventLog
spark.shuffle.service.enabled    true
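Note that Spark refuses to start an application if the event log directory does not exist; assuming the HDFS nameservice cluster1 from the configuration above, create it first:

hdfs dfs -mkdir -p hdfs://cluster1/spark/eventLog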
4.3 spark-env.sh
export JAVA_HOME=/usr/lib/jdk1.8.0_172
export HADOOP_HOME=/opt/hadoop-2.7.6
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark-2.4.0
export SPARK_WORKER_CORES=6
export SPARK_WORKER_MEMORY=24g
4.4 log4j.properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
5. Distribute the Spark program and configuration files to the other nodes
scp -r /opt/spark-2.4.0 res-spark-0002:/opt
scp -r /opt/spark-2.4.0 res-spark-0003:/opt
scp -r /opt/spark-2.4.0 res-spark-0004:/opt
scp -r /opt/spark-2.4.0 res-spark-0005:/opt
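The environment variables from step 3 must also be present on the other nodes; one way to propagate them (a sketch, assuming the same /etc/profile is safe to reuse on every machine):

for h in res-spark-0002 res-spark-0003 res-spark-0004 res-spark-0005; do
  scp /etc/profile $h:/etc/profile   # then run source /etc/profile on each node
done

Remember to also set each node's own hostname with hostnamectl, as in step 3.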
6. Modify the configuration file on the res-spark-0002 node

6.1 spark-defaults.conf

spark.master                     spark://res-spark-0002:7077

7. Start the cluster

On the res-spark-0001 node:

cd /opt/spark-2.4.0/sbin
./start-all.sh
On the res-spark-0002 node, start the standby Master:

cd /opt/spark-2.4.0/sbin
./start-master.sh
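At this point a Master process should be running on res-spark-0001 and res-spark-0002, and a Worker process on each slave node; this can be checked with jps (a quick sanity check, not part of the original article):

jps
# expected on res-spark-0001 / res-spark-0002: Master
# expected on res-spark-0003 .. res-spark-0005: Worker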
3. Testing

On the res-spark-0001 node, execute ./stop-master.sh to kill the active Master.

The following result is obtained (shown as a screenshot in the original article): the Master on res-spark-0002 switches from STANDBY to ALIVE and takes over the cluster.
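The switch can also be observed from the Master web UI; a quick check (a sketch, assuming the default UI port 8080):

# prints ALIVE on the active master and STANDBY on the standby
curl -s http://res-spark-0002:8080 | grep -o 'ALIVE\|STANDBY' | head -1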
Submit the application:

spark-submit --master spark://res-spark-0001:7077 \
  --driver-cores 4 \
  --driver-memory 6g \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --class com.cloud.RuleEngine \
  rule-engine-1.0-SNAPSHOT-jar-with-dependencies.jar
An error message appears:

18/12/30 08:47:41 ERROR TaskSchedulerImpl: Lost executor 3 on 172.16.0.24: Unable to create executor due to Unable to register with external shuffle server due to: Failed to connect to /172.16.0.24:7337
The official documentation explains the fix:
In standalone mode, simply start your workers with spark.shuffle.service.enabled set to true.
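In other words, every Worker must be started with the external shuffle service enabled. Since spark.shuffle.service.enabled true is already set in spark-defaults.conf (section 4.2) and that file has been distributed to all nodes, restarting the cluster should be enough; a sketch of the fix:

# on res-spark-0001
cd /opt/spark-2.4.0/sbin
./stop-all.sh
./start-all.sh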