Big Data Learning Route: Master, jps, and SparkSubmit

SparkSubmit is the class used to submit tasks. Whichever node submits the task is the node on which the SparkSubmit process (the Driver side) is started.
Submit task flow:
1. The Driver side submits the task to the Master (the SparkSubmit process is started).
2. The Master generates the task information and puts it into a queue.
3. The Master tells the Workers to start Executors (the Master filters out the surviving Workers and assigns the task to the Workers with the most free resources).
4. The Executors on the Workers register with the Driver side (only the Executors actually take part in the computation); the Workers fetch the task information from the Driver side.
5. The Driver divides the job into stages, splits each stage into small tasks, and broadcasts them to the corresponding Workers for execution.
6. The Workers send the results of the completed tasks back to the Driver.
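Each of these roles runs as a separate JVM process on a standalone cluster, so jps is a quick way to confirm where they live. The following is only a rough sketch (process IDs omitted; the exact list depends on your cluster and on whether a job is currently running):

jps on the node that submitted the job -> SparkSubmit (the Driver side)
jps on the Master node -> Master
jps on a Worker node -> Worker, plus CoarseGrainedExecutorBackend (the Executor process) while a job is running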
Range is equivalent to a subclass of the collection types.

scala> 1.to(10)
res0: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> 1 to 10
res1: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
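As an aside not in the original session, Range also supports exclusive bounds and step sizes, and can be turned into an RDD with sc.parallelize, which is handy for generating test data (the res numbers and exact type annotations will vary with the Scala version):

scala> 1 until 10
res2: scala.collection.immutable.Range = Range(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> 1 to 10 by 2
res3: scala.collection.immutable.Range = Range(1, 3, 5, 7, 9)

scala> sc.parallelize(1 to 10).sum()
res4: Double = 55.0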
Submitting tasks to the cluster from spark-shell:
When spark-shell starts it prints "Spark context available as sc" and "SQL context available as sqlContext", so sc and sqlContext can be called directly.
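For example, a word count can be run interactively. This is a minimal sketch that assumes the input file used later in this article, hdfs://node01:9000/words.txt, already exists:

scala> sc.textFile("hdfs://node01:9000/words.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()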
Spark WordCount
Build the template code:
SparkConf: builds the configuration information; its settings take precedence over the cluster configuration files.
setAppName: specifies the name of the application. If it is not set, a name similar to a UUID is generated automatically.
setMaster: specifies the run mode: "local" simulates cluster execution with 1 thread; "local[2]" simulates cluster execution with 2 threads; "local[*]" uses as many threads as are currently free to run the task.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Use Spark to count words.
  */
object SparkWordCount {
  def main(args: Array[String]): Unit = {
    // Build the template code
    val conf: SparkConf = new SparkConf()
      .setAppName("SparkWordCount")
    //  .setMaster("local[2]")
    // Create the entry class (context object) that submits tasks to the cluster
    val sc: SparkContext = new SparkContext(conf)
    // Read the data from HDFS
    val lines: RDD[String] = sc.textFile(args(0))
    // Split the data into words
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Turn each word into a tuple of (word, 1)
    val tuples: RDD[(String, Int)] = words.map((_, 1))
    // Perform the aggregation
    // tuples.reduceByKey((x, y) => x + y)
    val sumed: RDD[(String, Int)] = tuples.reduceByKey(_ + _)
    // Sort the words in descending order by the number of times they appear
    val sorted: RDD[(String, Int)] = sumed.sortBy(_._2, false)
    // Print to the console
    // println(sorted.collect.toBuffer)
    // sorted.foreach(x => println(x))
    // sorted.foreach(println)
    // Save the result to HDFS
    sorted.saveAsTextFile(args(1))
    // Release resources
    sc.stop()
  }
}
After packaging the project, upload the jar to Linux.
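A minimal sketch of this step, assuming a Maven project and that the jar is copied to /root on node01 (the path used by the spark-submit command below):

mvn clean package
scp target/spark-mvn-1.0-SNAPSHOT.jar root@node01:/root/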
1. Start the ZooKeeper, HDFS and Spark clusters first.
Start HDFS:
/usr/local/hadoop-2.6.1/sbin/start-dfs.sh
Start Spark:
/usr/local/spark-1.6.1-bin-hadoop2.6/sbin/start-all.sh
2. Use the spark-submit command to submit the Spark application (note the order of the parameters):
/usr/local/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--class com.qf.spark.WordCount \
--master spark://node01:7077 \
--executor-memory 2G \
--total-executor-cores 4 \
/root/spark-mvn-1.0-SNAPSHOT.jar \
hdfs://node01:9000/words.txt \
hdfs://node01:9000/out
3. View the results of the program:
hdfs dfs -cat hdfs://node01:9000/out/part-00000
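Each line of the output file is the string form of a (word, count) tuple, as written by saveAsTextFile. With made-up data, the first lines might look like:

(spark,8)
(hadoop,5)

These values are purely illustrative; the real words and counts depend on the contents of words.txt.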
JavaSparkWC (the same word count written with the Java API)
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;

public class JavaSparkWC {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("JavaSparkWC").setMaster("local[1]");
        // Entry class for submitting tasks
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // Read the data
        JavaRDD<String> lines = jsc.textFile("hdfs://hadoop01:9000/wordcount/input/a.txt");
        // Split the data into words
        JavaRDD<String> words =
                lines.flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public Iterable<String> call(String s) throws Exception {
                        List<String> splited = Arrays.asList(s.split(" ")); // build the word list
                        return splited;
                    }
                });
        // Map each word one-to-one to a pair (input: a word; output: (word, 1))
        JavaPairRDD<String, Integer> tuples =
                words.mapToPair(new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String s) throws Exception {
                        return new Tuple2<>(s, 1);
                    }
                });
        // Aggregate: the two values that share the same key are added together
        JavaPairRDD<String, Integer> sumed =
                tuples.reduceByKey(new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer v1, Integer v2) throws Exception {
                        return v1 + v2;
                    }
                });
        // So far the key is the word (a String), which cannot be used to sort by count.
        // The Java API does not provide a sortBy operator, so swap the two fields,
        // sort by the (now numeric) key, and swap back after the sorting is complete.
        final JavaPairRDD<Integer, String> swaped =
                sumed.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
                    @Override
                    public Tuple2<Integer, String> call(Tuple2<String, Integer> tup) throws Exception {
                        // return new Tuple2<>(tup._2, tup._1);
                        return tup.swap(); // swap() exchanges the two fields
                    }
                });
        // Sort in descending order
        JavaPairRDD<Integer, String> sorted = swaped.sortByKey(false);
        // Swap back
        JavaPairRDD<String, Integer> res = sorted.mapToPair(
                new PairFunction<Tuple2<Integer, String>, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(Tuple2<Integer, String> tup) throws Exception {
                        return tup.swap();
                    }
                });
        System.out.println(res.collect());
        jsc.stop(); // release resources
    }
}