This article introduces how to set up and test a Spark environment. Many people run into problems at these steps, so I will walk through them one by one; I hope you read it carefully and get something out of it.
0. Environment
Official recommendation:
Spark runs on Java 6+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.4.0 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
For Scala 2.11.x you need to download a separate Spark build with Scala 2.11 support.
Local environment:
Ubuntu 14.04 + JDK 1.8 + Python 2.7 + Scala 2.10.5 + Hadoop 2.6.0 + Spark 1.4.0
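Before going further, it helps to confirm that the local toolchain roughly matches this list. Below is a quick sanity-check sketch of my own (not part of the original article), written in Python 2 to match the environment above; the Scala and Hadoop checks only work once those tools are on the PATH (see section 1 and the earlier Hadoop setup).

# check_versions.py - illustrative sketch; confirms the toolchain versions at a glance
import sys
import subprocess

print "Python:", sys.version.split()[0]    # expect something like 2.7.x
subprocess.call(["java", "-version"])      # the JDK prints its version to stderr
subprocess.call(["scala", "-version"])     # works once Scala is on the PATH (section 1)
subprocess.call(["hadoop", "version"])     # works once HADOOP_HOME/bin is on the PATH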
1. Install and configure scala
Download scala from http://www.scala-lang.org/download/2.10.5.html#Other_resources
Upload the scala installation package and extract it
Configure the environment variables by editing /etc/profile (vim /etc/profile) and adding the following:
export JAVA_HOME=/usr/local/java/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/nob/opt/hadoop-2.6.0
export SCALA_HOME=/home/nob/opt/scala-2.10.5
export SPARK_HOME=/home/nob/opt/spark-1.4.0-bin-hadoop2.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
After running source /etc/profile, type scala -version to see the version information.
2. Spark configuration
Download Spark and extract it to /home/nob/opt/spark-1.4.0-bin-hadoop2.6
Configure the runtime environment by editing conf/spark-env.sh:
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ vim conf/spark-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_45
export SCALA_HOME=/home/nob/opt/scala-2.10.5
export HADOOP_HOME=/home/nob/opt/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/nob/opt/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=nobubuntu
export SPARK_WORKER_MEMORY=512M
SPARK_MASTER_IP is the IP address or hostname of the master node.
3. Start Spark

nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /data/server/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-nob-org.apache.spark.deploy.master.Master-1-nobubuntu.out
nobubuntu: org.apache.spark.deploy.worker.Worker running as process 10297. Stop it first.
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ jps
8706 DataNode
9062 ResourceManager
10775 Jps
9192 NodeManager
10569 Master
10297 Worker
8572 NameNode
8911 SecondaryNameNode
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$
jps shows the Master and Worker processes; visit http://nobubuntu:8080/ to see the details of the running cluster.
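Jumping ahead slightly to the PySpark API used in the next sections, a minimal script like the one below can confirm that the standalone cluster actually accepts jobs, not just that the processes are up. This is a sketch of my own, not from the original article; the file name check_cluster.py is made up, and it assumes the master listens on the default port 7077, i.e. spark://nobubuntu:7077.

# check_cluster.py - illustrative sketch; master URL and file name are assumptions
from pyspark import SparkContext

# Point the context at the standalone master started by sbin/start-all.sh
sc = SparkContext("spark://nobubuntu:7077", "ClusterCheck")

# Distribute a small range across the workers and aggregate it back
rdd = sc.parallelize(range(100))
print "partitions:", rdd.getNumPartitions()
print "sum of 0..99 =", rdd.sum()    # expected: 4950

sc.stop()

After it finishes, the application should also show up in the web UI at http://nobubuntu:8080/.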
4. Test with the Python shell that comes with Spark
To use the PySpark shell, run the following from the directory where Spark was extracted:

bin/pyspark

At the prompt, enter the following commands in turn:
>>> lines = sc.textFile("README.md")
>>> lines.count()
>>> lines.first()
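The same RDD can be filtered further; for example, similar to the filter example in the official quick start (the variable name spark_lines is my own), count only the lines that mention Spark:

>>> spark_lines = lines.filter(lambda line: "Spark" in line)   # keep only lines containing "Spark"
>>> spark_lines.count()
>>> spark_lines.first()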
5. Adjust the log level

After running the above, I found that the shell prints far too many log messages, so the log level needs to be lowered. To do that, create a new file log4j.properties under the conf directory as a copy of log4j.properties.template, and in it find the line
log4j.rootCategory=INFO, console
and change it to
log4j.rootCategory=WARN, console
Then reopen the shell; there is far less debug output.
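As a runtime alternative to editing log4j.properties, Spark 1.4 also added a setLogLevel method on SparkContext; a one-line sketch (check that your build actually exposes it before relying on it):

>>> # Lower the log level for this session only; accepts levels such as "WARN" or "ERROR"
>>> sc.setLogLevel("WARN")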
6. Use the Scala shell to test a small line-count program
Open the Scala version of the shell and run

bin/spark-shell
scala> val lines = sc.textFile("README.md")
scala> lines.count()
scala> lines.first()
Finally, a standalone application in Python (you could also write it in Scala or Java, which is just as simple); the example below comes from the official documentation.
"" SimpleApp.py "from pyspark import SparkContextlogFile =" YOUR_SPARK_HOME/README.md "# Should be some file on your systemsc = SparkContext (" local "," SimpleApp ") logData = sc.textFile (logFile). Cache () numAs = logData.filter (lambda's:'a'in s). Count () numBs = logData.filter (lambda's:'b' in s). Count () print" Lines with a:% I, lines with b:% I "% (numAs, numBs)
Use bin/spark-submit to execute the above script
# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit --master local[4] SimpleApp.py
...
Lines with a: 46, Lines with b: 23
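As a small extension of the same pattern, here is a word-count sketch of my own (not from the original article or the official docs); the file name WordCount.py is made up, it reuses the same README.md input, and it relies only on basic RDD operations in the same spirit as the examples above:

# WordCount.py - illustrative sketch; file name and input path are assumptions
from pyspark import SparkContext

logFile = "YOUR_SPARK_HOME/README.md"    # reuse the same input file as SimpleApp.py
sc = SparkContext("local", "WordCount")

counts = (sc.textFile(logFile)
            .flatMap(lambda line: line.split())    # split each line into words
            .map(lambda word: (word, 1))           # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b))      # sum the counts per word

# Print the ten most frequent words; takeOrdered sorts by the given key function
for word, count in counts.takeOrdered(10, key=lambda pair: -pair[1]):
    print "%s: %i" % (word, count)

sc.stop()

It is submitted the same way: YOUR_SPARK_HOME/bin/spark-submit --master local[4] WordCount.py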
This concludes "Spark environment building and testing methods". Thank you for reading.