In this article, the editor shares how to install, configure, and make basic use of Spark. It is shared for your reference; I hope you find it helpful.
7. Spark
This topic describes the installation, configuration, and basic use of Spark.
Spark Basic Information
Official website: http://spark.apache.org/
Official tutorial: http://spark.apache.org/docs/latest/programming-guide.html

7.1. Environment preparation

# switch to the workspace
cd /opt/workspaces
# create the Spark data directory
mkdir data/spark
# create the Spark log directory
mkdir logs/spark
Official tutorial
http://spark.apache.org/docs/latest/spark-standalone.html

7.2. Install

wget http://mirrors.hust.edu.cn/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
tar -zxf spark-1.6.1-bin-hadoop2.6.tgz
rm -rf spark-1.6.1-bin-hadoop2.6.tgz
mv spark-1.6.1-bin-hadoop2.6 ./frameworks/spark

7.3. Configuration (pseudo-distributed)
vi ./frameworks/spark/conf/spark-env.sh

export SPARK_MASTER_IP=bd
export SPARK_MASTER_PORT=7077
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
# specify the Spark data directory
export SPARK_LOCAL_DIRS=/opt/workspaces/data/spark/
# specify the Spark log directory
export SPARK_LOG_DIR=/opt/workspaces/logs/spark/
# specify the JDK directory
export JAVA_HOME=/opt/env/java
# specify the Scala directory
export SCALA_HOME=/opt/env/scala

7.4. Start and stop

./frameworks/spark/sbin/start-all.sh

7.5. Test

# run the bundled pi-calculation example
./frameworks/spark/bin/run-example org.apache.spark.examples.SparkPi

# if the run fails, increase the memory values below
./frameworks/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://bd:6066 \
  --deploy-mode cluster \
  --driver-memory 512m \
  --executor-memory 256m \
  ./frameworks/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar \
  1000

7.6. Word Count
http://spark.apache.org/docs/latest/quick-start.html

Word Count

./frameworks/spark/bin/spark-shell

// basic
val textFile = sc.textFile("./frameworks/spark/README.md")
val words = textFile.flatMap(line => line.split(" "))
val exchangeVal = words.map(word => (word, 1))
val count = exchangeVal.reduceByKey((a, b) => a + b)
count.collect

// optimized
sc.textFile("./frameworks/spark/README.md").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect

// with sort
sc.textFile("./frameworks/spark/README.md").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).map(_.swap).sortByKey(false).map(_.swap).collect

// final version: keep only real words and save the result to HDFS
val wordR = """\w+""".r
sc.textFile("./frameworks/spark/README.md").flatMap(_.split(" ")).filter(wordR.pattern.matcher(_).matches).map((_, 1)).reduceByKey(_ + _).map(_.swap).sortByKey(false).map(_.swap).saveAsTextFile("hdfs://bd:9000/wordcount")
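As a quick check, the output written by the final version can be read back in the same spark-shell session. This is a small sketch; it assumes the HDFS path used above and that sc is still available:

// read the saved word counts back from HDFS and print the ten most frequent words
// (the output is already sorted descending because of sortByKey(false) above)
sc.textFile("hdfs://bd:9000/wordcount").take(10).foreach(println)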
You can visit http://bd:8080 (the master web UI) to view the job.

7.7. Parameter description
Where to configure:
Spark properties are set per application, either through a SparkConf object in code or via Java system properties.
Environment variables control per-node settings such as the IP address and port; they are configured in conf/spark-env.sh.
Logging is configured through log4j.properties.
Spark properties
Specify the configuration in the code
val conf = new SparkConf()
  // run with 2 local threads; in local mode we can use n threads (n >= 1),
  // but scenarios such as Spark Streaming may need more than one
  .setMaster("local[2]")
  .setAppName("CountingSheep")
val sc = new SparkContext(conf)
Specify the configuration in the script
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
Table 1. Common configuration
Attribute name | Default value | Description
spark.app.name | (none) | The name of the Spark application
spark.driver.cores | 1 | Number of cores used by the driver process in cluster mode
spark.driver.memory | 1g | Total amount of memory available to the driver process (e.g. 1g, 2g); in client mode, setting this in the application has no effect, so use --driver-memory on the command line or set it in the default properties file instead
spark.executor.memory | 1g | Total amount of memory used by a single executor (e.g. 2g, 8g)
spark.master | (none) | Cluster manager URL
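To illustrate Table 1, here is a minimal sketch that sets a few of these properties through SparkConf; the application name, memory size, and master URL are example values only:

val conf = new SparkConf()
  .setAppName("MyApp")                     // spark.app.name
  .set("spark.executor.memory", "2g")      // memory per executor
  .set("spark.master", "spark://bd:7077")  // cluster manager URL
val sc = new SparkContext(conf)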
Environment variables
Environment variables are set in the ${SPARK_HOME}/conf/spark-env.sh script.
Table 2. Common configuration
Scope | Property name | Default value | Description
- | JAVA_HOME | - | Java installation directory
- | SCALA_HOME | - | Scala installation directory
- | SPARK_LOCAL_IP | - | Locally bound IP
- | SPARK_LOG_DIR | ${SPARK_HOME}/logs | Log directory
Standalone | SPARK_MASTER_IP | (current IP) | Master IP
Standalone | SPARK_MASTER_PORT | 7077 (6066) | Master port
Standalone | MASTER | - | Default master URL
Standalone | SPARK_WORKER_CORES | All | Upper limit of CPU cores used per node
Standalone | SPARK_WORKER_MEMORY | All memory on the node minus 1 GB | Upper limit of memory used per node
Standalone | SPARK_WORKER_INSTANCES | 1 | Number of worker instances started per node
Standalone | SPARK_WORKER_PORT | Random | Port bound by the worker
If your slave nodes are powerful, you can set SPARK_WORKER_INSTANCES to a value greater than 1; in that case you should also set SPARK_WORKER_CORES to limit the number of CPU cores used by each worker instance, otherwise every worker instance will try to use all of the cores.
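For example, a hypothetical spark-env.sh fragment for a powerful slave node (the numbers are illustrative only):

# two worker instances per node, each capped at 4 cores and 4 GB of memory
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=4g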
Logging
Logs are configured in ${SPARK_HOME}/conf/log4j.properties.
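As a minimal illustration, following the format of the log4j.properties.template shipped with Spark (the WARN level is only an example, used here to quiet the console):

# print only warnings and errors to the console
log4j.rootCategory=WARN, console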
Hadoop cluster configuration
When using HDFS, you need to copy hdfs-site.xml and core-site.xml from Hadoop to the classpath of Spark
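For example, the files can be copied into ./frameworks/spark/conf, which is on Spark's classpath; the ./frameworks/hadoop path below is an assumption based on the directory layout used elsewhere in this series:

# assumes Hadoop 2.x is installed under ./frameworks/hadoop
cp ./frameworks/hadoop/etc/hadoop/hdfs-site.xml ./frameworks/spark/conf
cp ./frameworks/hadoop/etc/hadoop/core-site.xml ./frameworks/spark/conf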
http://spark.apache.org/docs/latest/configuration.html

7.8. Resource scheduling
Standalone mode currently supports only a simple first-in, first-out (FIFO) scheduler. The scheduler can still serve multiple users, because you can cap the maximum resources used by each application. By default, a Spark application requests all CPU cores in the cluster.
Restrict resources in your code
val conf = new SparkConf()
  .setMaster(...)
  .setAppName(...)
  .set("spark.cores.max", "10")
val sc = new SparkContext(conf)
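The cap can also be passed per job on the spark-submit command line with --total-executor-cores; a sketch that reuses the SparkPi example from section 7.5:

./frameworks/spark/bin/spark-submit \
  --master spark://bd:7077 \
  --total-executor-cores 10 \
  --class org.apache.spark.examples.SparkPi \
  ./frameworks/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar \
  100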
Restrict resources in the configuration file spark-env.sh
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores="

7.9. Performance tuning
http://spark.apache.org/docs/latest/tuning.html

7.10. Hardware configuration
Each node:
* 4-8 disks
* more than 8 GB of memory
* Gigabit network card
* 8-16 core CPU
At least 3 nodes
http://spark.apache.org/docs/latest/hardware-provisioning.html

7.11. Integrate Hive
Add configuration items in spark-env.sh
# Hive directory
export HIVE_HOME=$HIVE_HOME
SPARK_CLASSPATH
Some tutorials say you should add:
export SPARK_CLASSPATH=$HIVE_HOME/lib/mysql-connector-java-x.jar:$SPARK_CLASSPATH
However, this setting is not required in the current version, and adding it causes Zeppelin to fail with:
org.apache.spark.SparkException: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.
Copy several configuration files for Hive
cp ./frameworks/hive/conf/hive-site.xml ./frameworks/spark/conf
cp ./frameworks/hive/conf/hive-log4j.properties ./frameworks/spark/conf
Start the thrift server to provide JDBC access to external clients
./frameworks/spark/sbin/start-thriftserver.sh
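Spark also ships a matching stop script to shut the thrift server down again:

./frameworks/spark/sbin/stop-thriftserver.sh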
Test connection
./frameworks/spark/bin/beeline
!connect jdbc:hive2://bd:10000
show tables;
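Besides beeline, once hive-site.xml is in place the Hive tables can also be queried directly from spark-shell; a minimal sketch for Spark 1.6.x:

// create a HiveContext on top of the existing SparkContext and list the Hive tables
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("show tables").show()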
That covers how to install, configure, and use Spark. Thank you for reading; I hope the article has been helpful.