
Spark 2.x from Shallow to Deep, Part 6: RDD Support for Java 8 Lambda Expressions


Before studying any particular Spark feature, make sure you first understand Spark correctly; for that, refer to the earlier post "Correctly Understanding Spark".

We already know from http://7639240.blog.51cto.com/7629240/1966131 that a Scala function is really just a Java interface, and the same is true for Java 8 lambdas: a lambda expression is an implementation of a Java interface. Next, let's look at the simplest Spark example, wordcount, implemented in Java 8 first without lambdas and then with them:

1. Java Spark wordcount program without lambdas:

import org.apache.spark.HashPartitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.io.File;
import java.util.Arrays;
import java.util.Iterator;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appName").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // reading from HDFS would look like this instead:
        // JavaPairRDD<LongWritable, Text> inputRDD = sc.hadoopFile("hdfs://master:9999/user/word.txt",
        //         TextInputFormat.class, LongWritable.class, Text.class);
        JavaRDD<String> inputRDD = sc.textFile("file:///Users/tangweiqun/test.txt");

        // split each line into words
        JavaRDD<String> wordsRDD = inputRDD.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) throws Exception {
                return Arrays.asList(s.split(" ")).iterator();
            }
        });

        // pair each word with the count 1
        JavaPairRDD<String, Integer> keyValueWordsRDD = wordsRDD.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<>(s, 1);
            }
        });

        // sum the counts per word, using a HashPartitioner with 2 partitions
        JavaPairRDD<String, Integer> wordCountRDD = keyValueWordsRDD.reduceByKey(new HashPartitioner(2),
                new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer a, Integer b) throws Exception {
                        return a + b;
                    }
                });

        // delete the output directory if it already exists
        File outputFile = new File("/Users/tangweiqun/wordcount");
        if (outputFile.exists()) {
            File[] files = outputFile.listFiles();
            for (File file : files) {
                file.delete();
            }
            outputFile.delete();
        }

        wordCountRDD.saveAsTextFile("file:///Users/tangweiqun/wordcount");
        System.out.println(wordCountRDD.collect());
    }
}

2. The same wordcount implemented with Java 8 lambdas:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.io.File;
import java.util.Arrays;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appName").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // reading from HDFS would look like this instead:
        // JavaPairRDD<LongWritable, Text> inputRDD = sc.hadoopFile("hdfs://master:9999/user/word.txt",
        //         TextInputFormat.class, LongWritable.class, Text.class);
        JavaRDD<String> inputRDD = sc.textFile("file:///Users/tangweiqun/test.txt");

        // split each line into words
        JavaRDD<String> wordsRDD = inputRDD.flatMap(input -> Arrays.asList(input.split(" ")).iterator());

        // pair each word with the count 1
        JavaPairRDD<String, Integer> keyValueWordsRDD = wordsRDD.mapToPair(word -> new Tuple2<>(word, 1));

        // sum the counts per word
        JavaPairRDD<String, Integer> wordCountRDD = keyValueWordsRDD.reduceByKey((a, b) -> a + b);

        // delete the output directory if it already exists
        File outputFile = new File("/Users/tangweiqun/wordcount");
        if (outputFile.exists()) {
            File[] files = outputFile.listFiles();
            for (File file : files) {
                file.delete();
            }
            outputFile.delete();
        }

        wordCountRDD.saveAsTextFile("file:///Users/tangweiqun/wordcount");
        System.out.println(wordCountRDD.collect());
    }
}

As you can see, the lambda version is noticeably more concise, and it also shows that a lambda expression is simply an implementation of a Java interface.
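To make that point concrete, here is a minimal sketch (an illustration added here, not part of the original wordcount programs) that assigns both an anonymous class and a lambda to the same Spark functional interface, org.apache.spark.api.java.function.Function2; either one can be passed to an operator such as reduceByKey:

import org.apache.spark.api.java.function.Function2;

public class LambdaIsAnInterface {
    public static void main(String[] args) throws Exception {
        // the pre-Java-8 way: an anonymous class implementing the interface
        Function2<Integer, Integer, Integer> addAnonymous = new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) throws Exception {
                return a + b;
            }
        };

        // the Java 8 way: a lambda expression, which the compiler turns into
        // an implementation of the same single-abstract-method interface
        Function2<Integer, Integer, Integer> addLambda = (a, b) -> a + b;

        // both behave identically
        System.out.println(addAnonymous.call(1, 2)); // 3
        System.out.println(addLambda.call(1, 2));    // 3
    }
}

Because Function2 has a single abstract method, call, the compiler can treat the lambda as an implementation of that interface automatically.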

The combineByKey example we discussed in http://7639240.blog.51cto.com/7629240/1966958 looks like this without lambdas:

// sc is the JavaSparkContext created as in the wordcount examples above;
// Function and Function2 come from org.apache.spark.api.java.function
JavaPairRDD<String, Integer> javaPairRDD = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("coffee", 1), new Tuple2<>("coffee", 2),
        new Tuple2<>("panda", 3), new Tuple2<>("coffee", 9)), 2);

// applied to a value the first time its key is encountered in a partition
Function<Integer, Tuple2<Integer, Integer>> createCombiner = new Function<Integer, Tuple2<Integer, Integer>>() {
    @Override
    public Tuple2<Integer, Integer> call(Integer value) throws Exception {
        return new Tuple2<>(value, 1);
    }
};

// applied when a key that createCombiner has already been applied to shows up again in the same partition
Function2<Tuple2<Integer, Integer>, Integer, Tuple2<Integer, Integer>> mergeValue =
        new Function2<Tuple2<Integer, Integer>, Integer, Tuple2<Integer, Integer>>() {
            @Override
            public Tuple2<Integer, Integer> call(Tuple2<Integer, Integer> acc, Integer value) throws Exception {
                return new Tuple2<>(acc._1() + value, acc._2() + 1);
            }
        };

// applied when the accumulators for the same key from different partitions need to be merged
Function2<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> mergeCombiners =
        new Function2<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>, Tuple2<Integer, Integer>>() {
            @Override
            public Tuple2<Integer, Integer> call(Tuple2<Integer, Integer> acc1, Tuple2<Integer, Integer> acc2) throws Exception {
                return new Tuple2<>(acc1._1() + acc2._1(), acc1._2() + acc2._2());
            }
        };

JavaPairRDD<String, Tuple2<Integer, Integer>> combineByKeyRDD =
        javaPairRDD.combineByKey(createCombiner, mergeValue, mergeCombiners);

// result: [(coffee,(12,3)), (panda,(3,1))]
System.out.println("combineByKeyRDD = " + combineByKeyRDD.collect());

With lambdas, the same combineByKey can be written as follows:

// sc is the JavaSparkContext created as in the wordcount examples above
JavaPairRDD<String, Integer> javaPairRDD = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("coffee", 1), new Tuple2<>("coffee", 2),
        new Tuple2<>("panda", 3), new Tuple2<>("coffee", 9)), 2);

// applied to a value the first time its key is encountered in a partition
Function<Integer, Tuple2<Integer, Integer>> createCombiner = value -> new Tuple2<>(value, 1);

// applied when a key that createCombiner has already been applied to shows up again in the same partition
Function2<Tuple2<Integer, Integer>, Integer, Tuple2<Integer, Integer>> mergeValue =
        (acc, value) -> new Tuple2<>(acc._1() + value, acc._2() + 1);

// applied when the accumulators for the same key from different partitions need to be merged
Function2<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> mergeCombiners =
        (acc1, acc2) -> new Tuple2<>(acc1._1() + acc2._1(), acc1._2() + acc2._2());

JavaPairRDD<String, Tuple2<Integer, Integer>> combineByKeyRDD =
        javaPairRDD.combineByKey(createCombiner, mergeValue, mergeCombiners);

// result: [(coffee,(12,3)), (panda,(3,1))]
System.out.println("combineByKeyRDD = " + combineByKeyRDD.collect());
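Since each value in combineByKeyRDD is a (sum, count) pair, a natural next step is to divide the two to obtain a per-key average. The following is only a short sketch building on the snippet above, not part of the original post:

// each value is a Tuple2 of (sum, count), so the average is sum / count
JavaPairRDD<String, Double> avgRDD =
        combineByKeyRDD.mapValues(acc -> (double) acc._1() / acc._2());

// with the sample data this should print: [(coffee,4.0), (panda,3.0)]
System.out.println("avgRDD = " + avgRDD.collect());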

For an in-depth, systematic understanding of the Spark RDD API, refer to the series "Detailed Explanation of the Spark Core RDD API".
