This article introduces how to deploy Spark applications. Many people have questions about Spark deployment in day-to-day work, so this guide collects the relevant material into a set of simple, practical steps. Follow along to study the topic in detail.
Deployment of Spark applications
Local
Spark standalone
Hadoop YARN
Apache Mesos
Amazon EC2
Spark standalone cluster deployment
Standalone HA
Spark source code compilation
SBT compilation
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
Maven compilation
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Spark deployment package generation command: make-distribution.sh
--hadoop VERSION: the Hadoop version number; without this parameter, Hadoop 1.0.4 is assumed
--with-yarn: whether to support Hadoop YARN; not supported if this parameter is omitted
--with-hive: whether to support Hive in Spark SQL; not supported if this parameter is omitted
--skip-tachyon: whether to skip support for the in-memory file system Tachyon
--tgz: generate a tgz deployment package; without this parameter, no tgz file is generated and only the /dist directory is produced
--name NAME: combined with --tgz, generates a deployment package named spark-$VERSION-bin-$NAME.tgz; if this parameter is not added, NAME is the Hadoop version number
Deployment package generation
Generate a deployment package that supports YARN and Hadoop 2.2.0:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --tgz
Generate a deployment package that supports YARN and Hive:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-hive --tgz
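Once the tgz package has been generated, it can be copied to the target machine and unpacked; a rough sketch follows, where the file name is hypothetical and depends on the Spark version and the --name value:
tar -zxf spark-1.4.0-bin-2.2.0.tgz -C /root/soft/   # hypothetical package name; adjust to the file actually produced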
[root@localhost lib]# ls /root/soft/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
/root/soft/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
[root@localhost conf]# vi slaves    # list of slave nodes; for a pseudo-distributed setup this is just
localhost
[root@localhost conf]# cp spark-env.sh.template spark-env.sh
[root@localhost conf]# vi spark-env.sh    # copy the edited file to all nodes
File conf/spark-env.sh
export SPARK_MASTER_IP=localhost
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
[root@localhost conf]# ../sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost: failed to launch org.apache.spark.deploy.worker.Worker:
localhost: JAVA_HOME is not set
localhost: full log in /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
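The worker fails to launch because JAVA_HOME is not set. A minimal fix, assuming the JDK is installed under a hypothetical path such as /usr/java/jdk1.7.0 (adjust to your installation), is to export JAVA_HOME in conf/spark-env.sh and restart the cluster:
# add to conf/spark-env.sh (the JDK path below is a placeholder)
export JAVA_HOME=/usr/java/jdk1.7.0
# then restart
../sbin/stop-all.sh
../sbin/start-all.sh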
Visit http://192.168.141.10:8080/
[root@localhost conf]# ../bin/spark-shell --master spark://localhost:7077
Visit http://192.168.141.10:8080/ again; an application ID is now listed for the running shell.
Spark standalone HA deployment
HA based on the file system
spark.deploy.recoveryMode: set to FILESYSTEM
spark.deploy.recoveryDirectory: the directory where Spark saves the recovery state
Set SPARK_DAEMON_JAVA_OPTS in spark-env.sh:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=$dir"
HA based on ZooKeeper
spark.deploy.recoveryMode: set to ZOOKEEPER
spark.deploy.zookeeper.url: the ZooKeeper URL
spark.deploy.zookeeper.dir: the directory where ZooKeeper saves the recovery state; defaults to /spark
Set SPARK_DAEMON_JAVA_OPTS in spark-env.sh:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181 -Dspark.deploy.zookeeper.dir=$DIR"
Run start-all.sh on one master node, then run start-master.sh on the standby master node.
[root@localhost ~]# jps
4609 Jps
4416 SparkSubmit
4079 Master
4291 SparkSubmit
SSH passwordless login
[root@localhost]# ssh-keygen -t rsa -P ''
[root@localhost]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost]# chmod 600 ~/.ssh/authorized_keys
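To verify the passwordless setup, an ssh to the local host should now succeed without prompting for a password:
[root@localhost]# ssh localhost    # should log in without a password prompt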
[root@localhost conf]# ../bin/spark-shell --master spark://localhost:7077 --executor-memory 2g
Brief introduction to the Spark tools
Spark interactive tool: spark-shell
Spark application deployment tool: spark-submit
Options
--master MASTER_URL: spark://host:port, mesos://host:port, yarn, or local
--deploy-mode DEPLOY_MODE: where the driver runs; client runs it on the local machine, cluster runs it on the cluster
--class CLASS_NAME: the main class of the application to run
--name NAME: application name
--jars JARS: comma-separated list of local jar files to add to the driver and executor classpaths
--py-files PY_FILES: comma-separated list of files to place in the working directory of each executor
--properties-file FILE: file to load application properties from; defaults to conf/spark-defaults.conf
--driver-memory MEM: driver memory size; defaults to 512m
--driver-java-options: Java options for the driver
--driver-library-path: library path for the driver
--driver-class-path: classpath for the driver
--executor-memory MEM: executor memory size; defaults to 1G
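Putting a few of these options together, a minimal spark-submit invocation might look like the sketch below; the application jar, main class, and memory sizes are hypothetical and should be adapted to your own application:
# hypothetical application jar and main class
../bin/spark-submit --master spark://localhost:7077 --deploy-mode client --class com.example.MyApp --name my-spark-app --driver-memory 512m --executor-memory 1g /path/to/my-app.jar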
[root@localhost sbin]# sh start-dfs.sh    # start HDFS so the input file below is reachable
scala> val rdd = sc.textFile("hdfs://localhost.localdomain:9000/20140824/test-data.csv")
scala> val rdd2 = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
This concludes the study of how to deploy Spark applications. Hopefully it has resolved your doubts; combining theory with practice is the best way to learn, so go and try it out.