
Getting started with Spark and the job submission process


Spark is a distributed computing engine from the Apache open source community. Because it computes in memory, it is generally faster than Hadoop MapReduce.

Download address: spark.apache.org

Installation

Clone a separate virtual machine and name it c

Set its IP address to 192.168.56.200

Change its hostname to c

Modify /etc/hosts to add entries resolving the local hostname (see the sketch below)

Restart the network service: systemctl restart network
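A minimal sketch of these steps (assuming the master node used in later sections is 192.168.56.100; adjust the hostnames to your environment):

hostnamectl set-hostname c
echo "192.168.56.200 c" >> /etc/hosts
echo "192.168.56.100 master" >> /etc/hosts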

Upload the Spark installation archive to the /root directory

Extract Spark to /usr/local and rename the directory to spark
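For example, assuming the Spark 2.1.0 build for Hadoop 2.7 (the exact archive name depends on the package you downloaded):

tar -zxvf /root/spark-2.1.0-bin-hadoop2.7.tgz -C /usr/local
mv /usr/local/spark-2.1.0-bin-hadoop2.7 /usr/local/spark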

Local mode: submitting a job with spark-submit

cd /usr/local/spark

./bin/spark-submit --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.1.0.jar 10000
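The master URL can also be passed explicitly with the standard --master flag, e.g. local[4] to run locally with four worker threads (the thread count here is illustrative):

./bin/spark-submit --master local[4] --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.1.0.jar 10000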

Interactive submission using spark-shell

Create a text file hello.txt under /root
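For example (the contents are illustrative; any text with space-separated words will do):

echo "hello spark" > /root/hello.txt
echo "hello world" >> /root/hello.txt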

./bin/spark-shell

Open another terminal and check the processes with jps; you will see a SparkSubmit process

sc (the SparkContext that spark-shell creates automatically)

sc.textFile("/root/hello.txt")

val lineRDD = sc.textFile("/root/hello.txt")

lineRDD.foreach(println)

Observe the job in the Spark web UI (port 4040 by default)

val wordRDD = lineRDD.flatMap(line => line.split(" "))

wordRDD.collect

val wordCountRDD = wordRDD.map(word => (word, 1))

wordCountRDD.collect

val resultRDD = wordCountRDD.reduceByKey((x, y) => x + y)

resultRDD.collect

val orderedRDD = resultRDD.sortByKey(false)

orderedRDD.collect

orderedRDD.saveAsTextFile("/root/result")

Observe the result files under /root/result

More concisely: sc.textFile("/root/hello.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortByKey().collect
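The one-liner above sorts by key, i.e. alphabetically by word. To order by count instead, the standard RDD.sortBy API can be used, for example:

sc.textFile("/root/hello.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false).collect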

Using local mode to access HDFS data

start-dfs.sh

In spark-shell, execute: sc.textFile("hdfs://192.168.56.100:9000/hello.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortByKey().collect (you can replace the IP with master after adding the mapping to /etc/hosts)

sc.textFile("hdfs://192.168.56.100:9000/hello.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortByKey().saveAsTextFile("hdfs://192.168.56.100:9000/output1") (the output directory must not already exist)

Spark standalone mode

Extract Spark on the master and on all slaves

On the master, modify the conf/slaves file and add the slave hostnames (see the example below)
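For example, conf/slaves lists one worker hostname per line (the names here are illustrative):

slave1
slave2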

Modify conf/spark-env.sh and add: export SPARK_MASTER_HOST=master

Copy spark-env.sh to each slave
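A minimal sketch with scp (the slave hostnames are illustrative):

scp /usr/local/spark/conf/spark-env.sh slave1:/usr/local/spark/conf/
scp /usr/local/spark/conf/spark-env.sh slave2:/usr/local/spark/conf/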

cd /usr/local/spark

./sbin/start-all.sh

On c, execute: ./bin/spark-shell --master spark://192.168.56.100:7077 (you can also set the master in a configuration file, as below)
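For example, the standard spark.master property can be set in conf/spark-defaults.conf so that spark-shell picks it up automatically:

spark.master spark://192.168.56.100:7077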

Observe http://master:8080

Spark on YARN mode
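A minimal sketch of submitting the same example to YARN, assuming Hadoop is running and HADOOP_CONF_DIR points at your Hadoop configuration directory (the path here is an assumption; --master yarn and --deploy-mode are standard spark-submit options):

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
./bin/spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.1.0.jar 1000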
