How to deploy Spark applications


This article introduces how to deploy Spark applications. Many readers have questions about Spark deployment, so the sections below walk through the deployment modes, building Spark from source, standalone and HA cluster setup, and the spark-shell and spark-submit tools in simple, practical steps.

Deployment of Spark applications

Local

Spark Standalone

Hadoop YARN

Apache Mesos

Amazon EC2
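
As a quick illustration of these modes, the choice is made through the master URL passed to spark-shell or spark-submit. A minimal sketch (host names are assumptions; Spark 1.x syntax):

# local mode, using 2 worker threads
./bin/spark-shell --master local[2]
# standalone cluster (hypothetical master host)
./bin/spark-shell --master spark://master-host:7077
# Hadoop YARN (Spark 1.x uses yarn-client/yarn-cluster; newer versions use --master yarn)
./bin/spark-shell --master yarn-client
# Apache Mesos (hypothetical Mesos master)
./bin/spark-shell --master mesos://mesos-host:5050
# Amazon EC2 clusters are launched with the bundled spark-ec2 script rather than a master URL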

Spark standalone cluster deployment

Standalone and Standalone HA

Spark source code compilation

SBT compilation

SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

Maven compilation

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

Spark deployment package generation command: make-distribution.sh

--hadoop VERSION: the Hadoop version number; without this parameter, the default is Hadoop 1.0.4.

--with-yarn: whether to support Hadoop YARN; without this parameter, YARN is not supported.

--with-hive: whether to support Hive in Spark SQL; without this parameter, Hive is not supported.

--skip-tachyon: whether to skip support for the in-memory file system Tachyon.

--tgz: generate a .tgz deployment package; without this parameter, no tgz file is generated, only the /dist directory.

--name NAME: combined with --tgz, generates a deployment package named spark-$VERSION-bin-$NAME.tgz; without this parameter, NAME is the Hadoop version number.

Deployment package generation

Generate a deployment package that supports YARN and Hadoop 2.2.0:

./make-distribution.sh --hadoop 2.2.0 --with-yarn --tgz

Generate a deployment package that supports YARN and Hive:

./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-hive --tgz

[root@localhost lib]# ls /root/soft/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar

/root/soft/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar

[root@localhost conf]# vi slaves    [list the slave nodes; for pseudo-distributed mode this is localhost]

localhost

[root@localhost conf]# cp spark-env.sh.template spark-env.sh

[root@localhost conf]# vi spark-env.sh    [copy this file to all nodes]

File: conf/spark-env.sh

export SPARK_MASTER_IP=localhost

export SPARK_MASTER_PORT=7077

export SPARK_WORKER_CORES=1

export SPARK_WORKER_INSTANCES=1

export SPARK_WORKER_MEMORY=1g

[root@localhost conf]# ../sbin/start-all.sh

starting org.apache.spark.deploy.master.Master, logging to /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out

localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out

localhost: failed to launch org.apache.spark.deploy.worker.Worker:

localhost: JAVA_HOME is not set

localhost: full log in /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
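
The Worker failed to launch because JAVA_HOME is not visible to the daemon. A minimal fix, assuming the JDK lives under /usr/java/jdk1.7.0 (the path is an assumption; adjust to your installation), is to export it in conf/spark-env.sh on every node and rerun start-all.sh:

# in conf/spark-env.sh -- JDK path is an assumption, use your own
export JAVA_HOME=/usr/java/jdk1.7.0

[root@localhost conf]# ../sbin/start-all.sh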

Visit http://192.168.141.10:8080/ to view the master web UI.

[root@localhost conf]# ../bin/spark-shell --master spark://localhost:7077

Visit http://192.168.141.10:8080/ again; an application ID now appears for the running spark-shell.

Spark Standalone HA deployment

HA based on file system

spark.deploy.recoveryMode: set to FILESYSTEM

spark.deploy.recoveryDirectory: the directory where Spark saves recovery state

Set SPARK_DAEMON_JAVA_OPTS in spark-env.sh:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=$dir"
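
For example, a minimal sketch with a concrete recovery directory in place of $dir (the path /tmp/spark-recovery is an assumption):

# create the recovery directory on the master node (path is an assumption)
mkdir -p /tmp/spark-recovery
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/tmp/spark-recovery"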

HA based on ZooKeeper

spark.deploy.recoveryMode: set to ZOOKEEPER

spark.deploy.zookeeper.url: the ZooKeeper URL

spark.deploy.zookeeper.dir: the ZooKeeper directory where recovery state is saved; defaults to /spark

Set SPARK_DAEMON_JAVA_OPTS in spark-env.sh:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181 -Dspark.deploy.zookeeper.dir=$DIR"

Start the cluster with start-all.sh.

Then run start-master.sh on another node to bring up the standby master.
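
With both masters up, clients can list them together in the master URL so that spark-shell and spark-submit fail over automatically; the host names below follow the ZooKeeper example above:

./bin/spark-shell --master spark://hadoop1:7077,hadoop2:7077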

[root@localhost ~]# jps

4609 Jps

4416 SparkSubmit

4079 Master

4291 SparkSubmit

Passwordless SSH

[root@localhost]# ssh-keygen -t rsa -P ''

[root@localhost]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[root@localhost]# chmod 600 ~/.ssh/authorized_keys
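
To verify the setup, an ssh to localhost should now succeed without prompting for a password:

[root@localhost]# ssh localhost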

[root@localhost conf]# ../bin/spark-shell --master spark://localhost:7077 --executor-memory 2g

Brief introduction to the Spark tools

Spark interactive tool: spark-shell

Spark application deployment tool: spark-submit

Options:

--master MASTER_URL: spark://host:port, mesos://host:port, yarn, or local

--deploy-mode DEPLOY_MODE: where the driver runs; client runs it on the local machine, cluster runs it on the cluster

--class CLASS_NAME: the main class of the application to run

--name NAME: the application name

--jars JARS: comma-separated list of local jars to include on the driver and executor classpaths

--py-files PY_FILES: comma-separated list of files to place in the working directory of each executor

--properties-file FILE: file from which to load application properties; defaults to conf/spark-defaults.conf

--driver-memory MEM: driver memory size; defaults to 512m

--driver-java-options: Java options for the driver

--driver-library-path: library path for the driver

--driver-class-path: classpath for the driver

--executor-memory MEM: executor memory size; defaults to 1G
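
Putting several of these options together, here is a minimal sketch of submitting an application jar. The class name and jar path are assumptions for illustration; the input path reuses the HDFS file from the spark-shell example below:

./bin/spark-submit \
  --master spark://localhost:7077 \
  --deploy-mode client \
  --class com.example.WordCount \
  --name wordcount \
  --executor-memory 1g \
  /path/to/wordcount.jar hdfs://localhost.localdomain:9000/20140824/test-data.csv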

[root@localhost sbin]# sh start-dfs.sh

scala> val rdd = sc.textFile("hdfs://localhost.localdomain:9000/20140824/test-data.csv")

scala> val rdd2 = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
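
To inspect or persist the word-count result (the output path is an assumption):

scala> rdd2.take(10).foreach(println)
scala> rdd2.saveAsTextFile("hdfs://localhost.localdomain:9000/20140824/wordcount-out")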

This concludes the study of "how to deploy Spark applications". I hope it has answered your questions. Theory works best when paired with practice, so go and try it!
