
Big Data Learning Route: the Master's jps

2025-01-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

After a job is submitted, running jps shows a SparkSubmit process; this is the class used to submit tasks. Whichever node runs the submit command is the node where the SparkSubmit process starts (the Driver side).

Task submission flow:

1. The Driver side submits the task to the Master (the SparkSubmit process starts).

2. The Master generates the task information and puts it into a queue.

3. The Master tells the Workers to start Executors (the Master filters out the surviving Workers and assigns tasks to the Workers with more free resources).

4. The Executors on the Workers register with the Driver side (only Executors actually take part in the computation); the Workers fetch the task information from the Driver side.

5. The Driver divides the job into stages, splits each stage into small tasks, and broadcasts them to the corresponding Workers for execution.

6. The Workers send the results of completed tasks back to the Driver.
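Steps 5 and 6 (the Driver handing small tasks to workers and collecting the results) can be loosely imitated on a single machine with a thread pool. The sketch below is only an analogy, not Spark's implementation; the class name DriverSketch and the sample tasks are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DriverSketch {
    // The "Driver" splits a job into small tasks and hands them to "workers"
    static int run() throws Exception {
        List<int[]> tasks = Arrays.asList(
                new int[]{1, 2, 3}, new int[]{4, 5, 6}, new int[]{7, 8, 9});
        // A thread pool standing in for the Executors started on the Workers
        ExecutorService workers = Executors.newFixedThreadPool(2);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int[] task : tasks) {
            // Step 5 analogue: small tasks are sent out for execution
            futures.add(workers.submit(() -> Arrays.stream(task).sum()));
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get(); // step 6 analogue: results come back to the "Driver"
        }
        workers.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints 45
    }
}
```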

Range is equivalent to a collection subclass.

scala> 1.to(10)
res0: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> 1 to 10
res1: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
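For comparison with the Scala REPL output above, the same inclusive 1-to-10 range can be built in Java (the other language used later in this article) with IntStream.rangeClosed; the class name RangeDemo is made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RangeDemo {
    // Inclusive range 1..10, the Java analogue of Scala's 1 to 10
    static List<Integer> oneToTen() {
        return IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(oneToTen()); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```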

When tasks are submitted to the cluster through the spark-shell, the entry objects are already created:

Spark context available as sc
SQL context available as sqlContext

They can be called directly.

Spark WordCount

Build the template code:

SparkConf: builds the configuration information; it takes precedence over the cluster configuration files.

setAppName: specifies the name of the application. If not set, a UUID-like name is generated automatically.

setMaster: specifies the run mode. local simulates the cluster with 1 thread; local[2] simulates the cluster with 2 threads; local[*] uses as many threads as there are available cores to run the task.
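local[*] takes its thread count from the machine's available cores; that number can be inspected from plain Java as shown below (the class name CoreCount is made up for illustration):

```java
public class CoreCount {
    // Number of logical processors available to the JVM, comparable
    // to the thread count a local[*] master would use
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(cores());
    }
}
```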

/**
 * Use Spark to count words.
 */
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    /**
     * Build the template code.
     */
    val conf: SparkConf = new SparkConf()
      .setAppName("SparkWordCount")
      // .setMaster("local[2]")
    // Create the entry class (context object) used to submit tasks to the cluster
    val sc: SparkContext = new SparkContext(conf)
    // Get the data from HDFS
    val lines: RDD[String] = sc.textFile(args(0))
    // Split the data into words
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Turn each word into a tuple
    val tuples: RDD[(String, Int)] = words.map((_, 1))
    // Perform the aggregation
    // tuples.reduceByKey((x, y) => x + y)
    val sumed: RDD[(String, Int)] = tuples.reduceByKey(_ + _)
    // Sort the words in descending order of occurrence
    val sorted: RDD[(String, Int)] = sumed.sortBy(_._2, false)
    // Print to the console
    // println(sorted.collect.toBuffer)
    // sorted.foreach(x => println(x))
    // sorted.foreach(println)
    // Save the result to HDFS
    sorted.saveAsTextFile(args(1))
    // Release resources
    sc.stop()
  }
}

Upload the packaged jar to Linux.

1. Start the zookeeper, hdfs and Spark clusters first.

Start hdfs:

/usr/local/hadoop-2.6.1/sbin/start-dfs.sh

Start spark:

/usr/local/spark-1.6.1-bin-hadoop2.6/sbin/start-all.sh

2. Use the spark-submit command to submit the Spark application (note the order of the parameters):

/usr/local/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--class com.qf.spark.WordCount \
--master spark://node01:7077 \
--executor-memory 2G \
--total-executor-cores 4 \
/root/spark-mvn-1.0-SNAPSHOT.jar \
hdfs://node01:9000/words.txt \
hdfs://node01:9000/out

3. View the execution results of the program:

hdfs dfs -cat hdfs://node01:9000/out/part-00000

JavaSparkWC

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

public class JavaSparkWC {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("JavaSparkWC").setMaster("local[1]");
        // Entry class for submitting tasks
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Obtain the data
        JavaRDD<String> lines = jsc.textFile("hdfs://hadoop01:9000/wordcount/input/a.txt");
        // Split the data
        JavaRDD<String> words =
                lines.flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public Iterable<String> call(String s) throws Exception {
                        List<String> splited = Arrays.asList(s.split(" ")); // generate a list
                        return splited;
                    }
                });
        // One-to-one mapping: input a word, output (word, 1)
        JavaPairRDD<String, Integer> tuples =
                words.mapToPair(new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String s) throws Exception {
                        return new Tuple2<String, Integer>(s, 1);
                    }
                });
        // Aggregate the two values that share the same key
        JavaPairRDD<String, Integer> sumed =
                tuples.reduceByKey(new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer v1, Integer v2) throws Exception {
                        return v1 + v2;
                    }
                });
        // Up to this point the key is of String type, so there is no way to sort by count.
        // The Java API does not provide a sortBy operator, so swap the positions of the
        // two values, sort, and then swap back afterwards.
        final JavaPairRDD<Integer, String> swaped =
                sumed.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
                    @Override
                    public Tuple2<Integer, String> call(Tuple2<String, Integer> tup) throws Exception {
                        // return new Tuple2<Integer, String>(tup._2, tup._1);
                        return tup.swap(); // swap() exchanges the two elements
                    }
                });
        // Sort in descending order
        JavaPairRDD<Integer, String> sorted = swaped.sortByKey(false);
        // Swap back
        JavaPairRDD<String, Integer> res = sorted.mapToPair(
                new PairFunction<Tuple2<Integer, String>, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(Tuple2<Integer, String> tup) throws Exception {
                        return tup.swap();
                    }
                });

        System.out.println(res.collect());
        jsc.stop(); // release resources
    }
}
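The flatMap -> mapToPair -> reduceByKey -> swap/sortByKey pipeline above can be mimicked on plain in-memory data with Java streams, which makes the counting-and-sorting logic easy to check without a cluster. This is only an illustrative sketch; the class name LocalWordCount and the sample input are made up:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LocalWordCount {
    // Split into words, count per word, then sort by count in descending order,
    // keeping that order in a LinkedHashMap
    static Map<String, Long> count(String text) {
        return Arrays.stream(text.split(" "))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        System.out.println(count("spark hadoop spark hive spark hadoop"));
        // prints {spark=3, hadoop=2, hive=1}
    }
}
```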

© 2024 shulou.com SLNews company. All rights reserved.