Spark installation and use

Prerequisites: JDK 1.8, passwordless SSH between the nodes, and working ZooKeeper and Hadoop clusters.
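A quick sanity check on each node before starting helps (a minimal sketch; the hostnames follow the server list below):

java -version

ssh slave1 hostname

ssh slave2 hostname

jps #the ZooKeeper (QuorumPeerMain) and Hadoop daemons should already be running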

Server list used

master 192.168.3.58
slave1 192.168.3.54
slave2 192.168.3.31

Across these three nodes run the ZooKeeper QuorumPeerMain, the HDFS NameNode, DataNode, JournalNode and DFSZKFailoverController, the YARN NodeManager, and the Spark Master and Worker daemons.

What is Scala? (Baidu Encyclopedia)

Scala is a multi-paradigm, Java-like programming language designed as a scalable language, integrating the features of object-oriented and functional programming.

Download and install scala

Official download address: www.scala-lang.org/download/

download

cd /data

wget https://downloads.lightbend.com/scala/2.12.4/scala-2.12.4.tgz

tar axf scala-2.12.4.tgz

Add environment variables

vim /etc/profile

#scala
export SCALA_HOME=/data/scala-2.12.4
export PATH=$PATH:${SCALA_HOME}/bin

source /etc/profile

Check the installation

scala -version

If version information is displayed, the installation was successful.
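For 2.12.4 the output should look roughly like this (the exact copyright line may differ):

Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.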

What is spark?

Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing.

Continue with Baidu Encyclopedia:

Spark is an open-source cluster computing environment similar to Hadoop, but with some useful differences that make it superior for certain workloads. In particular, Spark keeps distributed datasets in memory, which optimizes iterative workloads in addition to enabling interactive queries.

Spark is implemented in the Scala language and uses Scala as its application framework. Unlike Hadoop, Spark and Scala are tightly integrated, with Scala making it easy to manipulate distributed data sets like native collection objects.

Download and install spark

Official website: spark.apache.org/

Following the advice of earlier guides, I used a release one or two versions behind the latest.

cd /data

wget https://archive.apache.org/dist/spark/spark-2.1.2/spark-2.1.2-bin-hadoop2.7.tgz

tar axf spark-2.1.2-bin-hadoop2.7.tgz

Add environment variables

vim /etc/profile

#spark
export SPARK_HOME=/data/spark-2.1.2-bin-hadoop2.7
export PATH=$PATH:${SPARK_HOME}/bin

source /etc/profile
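As a quick check that the new variables are picked up, the bundled launcher can report its version (assumes a new or re-sourced shell):

spark-submit --version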

Modify the configuration files

cd ${SPARK_HOME}/conf

cp fairscheduler.xml.template fairscheduler.xml

cp log4j.properties.template log4j.properties

cp slaves.template slaves

cp spark-env.sh.template spark-env.sh

cp spark-defaults.conf.template spark-defaults.conf

vim slaves

#Delete localhost, add worker node information

master
slave1
slave2

vim spark-env.sh

#Add the following information: JAVA_HOME, SCALA_HOME, SPARK_MASTER_IP, SPARK_WORKER_MEMORY, HADOOP_CONF_DIR

export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/data/scala-2.12.4
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/data/hadoop/etc/hadoop/

vim spark-defaults.conf

spark.master spark://master:7077
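spark-defaults.conf holds simple "key value" pairs that become defaults for every spark-submit. spark.master is the only entry needed here; others such as the two below are purely illustrative and optional:

spark.executor.memory 1g
spark.serializer org.apache.spark.serializer.KryoSerializer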

Copy files to other nodes

scp -r /data/scala-2.12.4 slave1:/data

scp -r /data/scala-2.12.4 slave2:/data

scp -r /data/spark-2.1.2-bin-hadoop2.7 slave1:/data

scp -r /data/spark-2.1.2-bin-hadoop2.7 slave2:/data

scp -r /etc/profile slave1:/etc

scp -r /etc/profile slave2:/etc
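The copied /etc/profile is only read by new login shells; the variables can be checked on the workers right away (a quick sanity check, not in the original steps):

ssh slave1 'source /etc/profile && scala -version'

ssh slave2 'source /etc/profile && scala -version'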

Start cluster

cd ${SPARK_HOME}/sbin

./start-all.sh
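If startup succeeded, jps on each node should now also show the Spark daemons (a rough expectation, given that master is listed in slaves as well):

jps #master: Master and Worker; slave1/slave2: Worker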

Single-node startup master

cd ${SPARK_HOME}/sbin

./start-master.sh

Single-node startup slave

./start-slave.sh spark://master:7077 #start-slave.sh requires the master URL

Multi-master, HA implementation

Make the following changes on all nodes.

Modify spark-defaults.conf

vim spark-defaults.conf

spark.master spark://master:7077,slave1:7077,slave2:7077

Modify spark-env.sh to specify zookeeper cluster

vim spark-env.sh

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/data/spark-2.1.2-bin-hadoop2.7"

Note that spark.deploy.zookeeper.dir is a znode path inside ZooKeeper (the default is /spark), not a local directory; it only needs to be identical on every Master.

Start cluster

master node

cd ${SPARK_HOME}/sbin

./start-all.sh

slave1 node

cd ${SPARK_HOME}/sbin

./start-master.sh

slave2 node

cd ${SPARK_HOME}/sbin

./start-master.sh

View status

Open IP:8080 in a browser (each Master serves its web UI on port 8080); the active Master shows status ALIVE and the standbys show STANDBY.
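To confirm the cluster actually accepts jobs, the bundled SparkPi example can be submitted; it picks up spark.master from conf/spark-defaults.conf (paths assume the spark-2.1.2-bin-hadoop2.7 layout):

cd ${SPARK_HOME}

./bin/run-example SparkPi 10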

Failover test

kill the Master process on the master
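One way to do it (a rough sketch; the PID will differ on your machine):

jps | grep Master #note the PID of the Master process

kill -9 <PID> #<PID> is the number printed by jps

Then refresh the 8080 web UI on slave1 or slave2; within a short time one standby Master should switch from STANDBY to ALIVE.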

Reference: www.cnblogs.com/liugh/p/6624923.html
