This article explains how to build a high-availability Spark cluster with Spark and ZooKeeper. The content is quite detailed; interested readers can use it as a reference, and I hope it is helpful to you.
Comparison of three distributed deployment modes of Spark
Currently, Apache Spark supports three distributed deployment modes: standalone, Spark on Mesos, and Spark on YARN. For more details, see the official Spark documentation.
Spark standalone mode distributed deployment
Environment introduction:

hostname  application
tvm11     zookeeper
tvm12     zookeeper
tvm13     zookeeper, spark (master), spark (slave), Scala
tvm14     spark (backup master), spark (slave), Scala
tvm15     spark (slave), Scala

Description:
Scala dependency:
Note that support for Java 7, Python 2.6, and old Hadoop versions before 2.6.5 was removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of 2.3.0. Support for Scala 2.11 is deprecated as of Spark 2.4.1 and will be removed in Spark 3.0.
ZooKeeper: a single Master node is a single point of failure, so at least two Master nodes must be started, with ZooKeeper providing the failover. The configuration is relatively simple.
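Before pointing Spark at ZooKeeper, it is worth confirming that the ensemble on tvm11, tvm12, and tvm13 is actually healthy. A minimal sanity check, assuming a standard ZooKeeper installation with zkServer.sh on the PATH, the default client port 2181, and the four-letter-word commands enabled:

$ zkServer.sh status          # should report Mode: leader on one node and Mode: follower on the others
$ echo ruok | nc tvm11 2181   # a healthy server replies "imok"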
Install scala
As the note above shows, Spark is strictly tied to the Scala version: spark-2.4.5 depends on scala-2.12.x, so install scala-2.12.x first; scala-2.12.10 is chosen here. Use the binary installation:
Download the installation package
After decompression it is ready to use.
$ wget https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz
$ tar zxvf scala-2.12.10.tgz -C /path/to/scala_install_dir
If the whole system should use this version of Scala, you can add it to the user's environment variables (.bashrc or .bash_profile).
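For example, a minimal sketch of the environment-variable addition to ~/.bashrc, assuming Scala was unpacked to /path/to/scala_install_dir as above (adjust to your actual directory):

export SCALA_HOME=/path/to/scala_install_dir/scala-2.12.10
export PATH=$SCALA_HOME/bin:$PATH

After sourcing the file, scala -version should report 2.12.10.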
Install spark
Set up passwordless SSH between the work user accounts on the three Spark machines.
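A minimal sketch of the passwordless SSH setup, assuming the work user account is named work on tvm13, tvm14, and tvm15 (substitute your actual user, and repeat from each node that needs to reach the others):

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate a key pair with no passphrase
$ ssh-copy-id work@tvm13
$ ssh-copy-id work@tvm14
$ ssh-copy-id work@tvm15
$ ssh work@tvm14 hostname                    # should print tvm14 without prompting for a password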
Now download the installation package on the master machine: tvm13.
Download address
Pay attention to the Hadoop version in the package name (match it to the existing environment; if there is no matching pre-built package, choose the source version and compile it yourself).
Unzip it to the installation directory.
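For example, a sketch of the download and unpack step, assuming the pre-built spark-2.4.5-bin-hadoop2.7 package from the Apache archive and the install directory used later in this article:

$ wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
$ tar zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /data/template/s/spark/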
Configure spark
There are two main Spark service configuration files: spark-env.sh and slaves.
spark-env.sh: environment variables for running Spark
slaves: the list of worker servers
Configure spark-env.sh:
cp spark-env.sh.template spark-env.sh
export JAVA_HOME=/data/template/j/java/jdk1.8.0_201
export SCALA_HOME=/data/template/s/scala/scala-2.12.10
export SPARK_WORKER_MEMORY=2048m
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=2
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=tvm11:2181,tvm12:2181,tvm13:2181 -Dspark.deploy.zookeeper.dir=/data/template/s/spark"

# About the meaning of the SPARK_DAEMON_JAVA_OPTS parameters:
# -Dspark.deploy.recoveryMode=ZOOKEEPER              # use the ZooKeeper service for master failure recovery
# -Dspark.deploy.zookeeper.url=host1:2181,host2:2181 # ZooKeeper ensemble addresses (hostname:port)
# -Dspark.deploy.zookeeper.dir=/spark                # directory in ZooKeeper where Spark writes its recovery state
# Other parameter meanings: https://blog.csdn.net/u010199356/article/details/89056304
Configure slaves:
cp slaves.template slaves
# A Spark Worker will be started on each of the machines listed below.
tvm13
tvm14
tvm15
Configure spark-defaults.conf, which mainly provides default settings used when Spark runs jobs (these can also be overridden on the command line):
# http://spark.apache.org/docs/latest/configuration.html#configuring-logging
# spark-defaults.conf
spark.app.name                                YunTuSpark
spark.driver.cores                            2
spark.driver.memory                           2g
spark.master                                  spark://tvm13:7077,tvm14:7077
spark.eventLog.enabled                        true
spark.eventLog.dir                            hdfs://cluster01/tmp/event/logs
spark.serializer                              org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset            100
spark.executor.logs.rolling.time.interval     daily
spark.executor.logs.rolling.maxRetainedFiles  30
spark.ui.enabled                              true
spark.ui.killEnabled                          true
spark.ui.liveUpdate.period                    100ms
spark.ui.liveUpdate.minFlushPeriod            3s
spark.ui.port                                 4040
spark.history.ui.port                         18080
spark.ui.retainedJobs                         100
spark.ui.retainedStages                       100
spark.ui.retainedTasks                        1000
spark.ui.showConsoleProgress                  true
spark.worker.ui.retainedExecutors             100
spark.worker.ui.retainedDrivers               100
spark.sql.ui.retainedExecutions               100
spark.streaming.ui.retainedBatches            100
spark.ui.retainedDeadExecutors                10
spark.executor.extraJavaOptions               -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

HDFS resource preparation
Because spark.eventLog.dir points to HDFS storage, the corresponding directory needs to be created in HDFS in advance:
hdfs dfs -mkdir -p hdfs://cluster01/tmp/event/logs

Configure system environment variables
Edit ~/.bashrc:
export SPARK_HOME=/data/template/s/spark/spark-2.4.5-bin-hadoop2.7
export PATH=$SPARK_HOME/bin/:$PATH

Distribution
After the above configuration is complete, distribute /path/to/spark-2.4.5-bin-hadoop2.7 to each slave node and configure the environment variables on each node.
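A minimal sketch of the distribution step, assuming rsync and scp are available, the install path from the environment-variable section above, and that the slaves' ~/.bashrc can simply be copied from the master (otherwise edit each node's file by hand):

$ rsync -az /data/template/s/spark/spark-2.4.5-bin-hadoop2.7 tvm14:/data/template/s/spark/
$ rsync -az /data/template/s/spark/spark-2.4.5-bin-hadoop2.7 tvm15:/data/template/s/spark/
$ scp ~/.bashrc tvm14:~/
$ scp ~/.bashrc tvm15:~/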
Start
Start all services on the master node first:
./sbin/start-all.sh
Then start the master service separately on the backup node:
./sbin/start-master.sh
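To confirm the daemons are running, the JDK's jps tool can be used on each node (a quick sanity check, not part of the original steps):

$ jps   # expect a Master process on tvm13 and tvm14, and Worker processes on tvm13, tvm14, and tvm15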
View status
After startup, check the status in the web UI:
Master (port 8081): Status: ALIVE
Backup (port 8080): Status: STANDBY
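As an optional end-to-end check of the HA master URL, you can submit the bundled SparkPi example; this is only a sketch, and the exact examples jar name depends on the Scala version your distribution was built with:

$ $SPARK_HOME/bin/spark-submit \
    --master spark://tvm13:7077,tvm14:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100

If the driver output ends with a line like "Pi is roughly 3.14...", the cluster is accepting and running applications.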
Done!
That is all on how to build a high-availability Spark cluster with Spark and ZooKeeper. I hope the content above has been helpful.