
Building and Testing a Spark Environment

2025-01-19 Update From: SLTechnology News&Howtos

This article introduces the basics of building and testing a Spark environment. Many people run into problems with this in practice, so the following walkthrough shows how to handle those situations step by step. I hope you read it carefully and get something out of it!

0. Environment

Official recommendation:

Spark runs on Java 6+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.4.0 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).

Scala 2.11.x requires downloading a Spark package built separately for 2.11.

Local environment:

Ubuntu 14.04 + JDK 1.8 + Python 2.7 + Scala 2.10.5 + Hadoop 2.6.0 + Spark 1.4.0

1. Install and configure Scala

Download Scala 2.10.5 from http://www.scala-lang.org/download/2.10.5.html#Other_resources

Upload the Scala package to the server and extract it.

To configure the environment variables, edit /etc/profile (vim /etc/profile) and add the following:

export JAVA_HOME=/usr/local/java/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/nob/opt/hadoop-2.6.0
export SCALA_HOME=/home/nob/opt/scala-2.10.5
export SPARK_HOME=/home/nob/opt/spark-1.4.0-bin-hadoop2.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

After running source /etc/profile, type scala -version to see the version information.
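For a quick sanity check that these variables are visible to new processes, a minimal Python sketch such as the one below can be used (this is only an illustration, not part of the original setup; run it with the Python 2.7 from the environment above, in a shell where /etc/profile has been sourced):

import os

# Print the variables configured in /etc/profile above.
# Assumption: the current shell has already sourced /etc/profile.
for name in ("JAVA_HOME", "SCALA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    print name, "=", os.environ.get(name, "<not set>")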

2. Spark configuration

Download Spark and extract it to /home/nob/opt/spark-1.4.0-bin-hadoop2.6

Configure the runtime environment by editing conf/spark-env.sh:

nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ vim conf/spark-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_45
export SCALA_HOME=/home/nob/opt/scala-2.10.5
export HADOOP_HOME=/home/nob/opt/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/nob/opt/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=nobubuntu
export SPARK_WORKER_MEMORY=512M

SPARK_MASTER_IP is the IP address or hostname of the master node.
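As a quick check that the master hostname resolves (nobubuntu is the hostname used in this setup; substitute your own), a small Python snippet will do:

import socket

# Assumption: nobubuntu is the master hostname set in spark-env.sh above.
print socket.gethostbyname("nobubuntu")  # should print the master node's IP address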

3. Start Spark

nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /data/server/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-nob-org.apache.spark.deploy.master.Master-1-nobubuntu.out
nobubuntu: org.apache.spark.deploy.worker.Worker running as process 10297. Stop it first.
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ jps
8706 DataNode
9062 ResourceManager
10775 Jps
9192 NodeManager
10569 Master
10297 Worker
8572 NameNode
8911 SecondaryNameNode
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$

jps shows the Master and Worker processes; visit http://nobubuntu:8080/ to see details of the running cluster.
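To confirm from a script that the master web UI is reachable, here is a minimal sketch using Python 2.7's urllib2 (an illustration only; nobubuntu:8080 is the address from this setup):

import urllib2

# Assumption: the master web UI started above is listening at http://nobubuntu:8080/.
response = urllib2.urlopen("http://nobubuntu:8080/", timeout=5)
print response.getcode()  # 200 means the Spark master web UI is up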

4. Test with the Python shell that ships with Spark

To use the PySpark shell, run the following from the directory where Spark was extracted:

bin/pyspark

At the prompt, enter the following commands in turn:

>>> lines = sc.textFile("README.md")
>>> lines.count()
>>> lines.first()
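While you are in the same shell, the standard RDD filter API gives a slightly richer check; for example (an extra illustration, not part of the original steps):

>>> spark_lines = lines.filter(lambda line: "Spark" in line)  # keep only lines mentioning Spark
>>> spark_lines.count()
>>> spark_lines.first()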

5. Change the log level

After running the commands above, the shell prints far too much log output, so the log level needs to be lowered. To do that, create a new file, log4j.properties, under the conf directory as a copy of log4j.properties.template, and in it change

the following line

log4j.rootCategory=INFO, console

to

log4j.rootCategory=WARN, console

Then reopen the shell; there is much less debug output.
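Alternatively, the log level can be changed from inside the shell for the current session only, assuming your Spark version supports SparkContext.setLogLevel (it was introduced around Spark 1.4):

>>> sc.setLogLevel("WARN")  # affects only this SparkContext; no config file change needed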

6. Use the Scala shell to test the line-count program

Open the Scala version of the shell and run:

bin/spark-shell

scala> val lines = sc.textFile("README.md")
scala> lines.count()
scala> lines.first()

Finally, a standalone application that demonstrates Python. You can also use Scala or Java, which is just as simple; the example comes from the official documentation.

"" SimpleApp.py "from pyspark import SparkContextlogFile =" YOUR_SPARK_HOME/README.md "# Should be some file on your systemsc = SparkContext (" local "," SimpleApp ") logData = sc.textFile (logFile). Cache () numAs = logData.filter (lambda's:'a'in s). Count () numBs = logData.filter (lambda's:'b' in s). Count () print" Lines with a:% I, lines with b:% I "% (numAs, numBs)

Use bin/spark-submit to execute the above script

# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit --master local[4] SimpleApp.py
...
Lines with a: 46, Lines with b: 23

This concludes "Building and Testing a Spark Environment". Thank you for reading. If you want to learn more about the topic, keep following the site for more practical articles.
