What is the process of spark01--scala's wordcount?


2026-04-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Today we will walk through the wordcount process of spark01--scala. Many readers may not be familiar with it, so the program is presented below in three versions, each shorter than the one before, followed by an explanation of the simplifications. I hope you find it useful.

Version 1: the original version

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

def main(args: Array[String]): Unit = {
  val conf = new SparkConf()
  conf.setAppName("wordcount")
  conf.setMaster("local")
  // SparkContext is the only channel to the Spark cluster
  val sc = new SparkContext(conf)

  /*
   * Load the contents of the words file under the current project.
   * Content:
   *   hello java hello spark hello hdfs hello mr Hello java hello spark
   */
  val lines = sc.textFile("./words")

  // line is one line of the file; each line is split into words,
  // and the words are flattened into a single RDD
  val lists: RDD[String] = lines.flatMap(line => { line.split(" ") })

  // each word is converted into a (word, 1) tuple
  val values: RDD[(String, Int)] = lists.map(word => { new Tuple2(word, 1) })

  /*
   * reduceByKey first groups the tuples that share the same word (key),
   * for example: hello 1, java 1, java 1, spark 1, spark 1, hdfs 1, mr 1.
   * (v1: Int, v2: Int) => { v1 + v2 } is then applied to the values of each
   * key: v1 and v2 are two values of the same key, and the returned v1 + v2
   * is the accumulated count for that key.
   */
  val result: RDD[(String, Int)] = values.reduceByKey((v1: Int, v2: Int) => { v1 + v2 })

  // traverse and print the result
  result.foreach(println)

  // close the context
  sc.stop()
}
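To see what each stage of version 1 produces, the same steps can be traced on an ordinary Scala List, without starting Spark. This is only a sketch under stated assumptions: a List stands in for the RDD, the three input lines are illustrative, the object name WordCountTrace is made up, and because plain collections have no reduceByKey it is imitated here with groupBy followed by a reduce.

```scala
object WordCountTrace {
  // Illustrative input (assumption); in the article the lines come from sc.textFile("./words").
  val lines: List[String] = List("hello java", "hello spark", "hello hdfs")

  // Step 1: flatMap splits every line and flattens the pieces into one list of words.
  val words: List[String] = lines.flatMap(line => line.split(" "))

  // Step 2: map pairs every word with the count 1.
  val pairs: List[(String, Int)] = words.map(word => (word, 1))

  // Step 3: stand-in for reduceByKey (not the Spark API):
  // group the pairs by word, then add up the 1s inside each group.
  val counts: Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, group) =>
      (word, group.map(_._2).reduce((v1, v2) => v1 + v2))
    }

  def main(args: Array[String]): Unit = {
    println(words)  // List(hello, java, hello, spark, hello, hdfs)
    println(pairs)
    counts.toList.sortBy(_._1).foreach(println)
  }
}
```

With this input, hello is counted 3 times and java, spark and hdfs once each. In real Spark the grouping and summing happen across partitions, which this local trace does not model.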

Version 2:

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf()
  conf.setAppName("wordcount")
  conf.setMaster("local")
  val sc = new SparkContext(conf)

  // the same steps as a single chain of calls, without intermediate variables
  val result = sc.textFile("./words")
    .flatMap(line => line.split(" "))
    .map(word => new Tuple2(word, 1))
    .reduceByKey((v1: Int, v2: Int) => { v1 + v2 })

  result.foreach(println)
  sc.stop()
}

Version 3: the simplest version

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf()
  conf.setAppName("wordcount")
  conf.setMaster("local")
  val sc = new SparkContext(conf)

  // every anonymous function is written with the "_" placeholder
  val result = sc.textFile("./words")
    .flatMap(_.split(" "))
    .map((_, 1))
    .reduceByKey(_ + _)

  result.foreach(println)
  sc.stop()
}

Explanation of the simplifications:

The parameter line in xxx.flatMap(line => line.split(" ")) is used only once after =>, so it can be replaced by the "_" placeholder: xxx.flatMap(_.split(" ")).
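The placeholder rule can be checked on a plain Scala List. A minimal sketch, assuming illustrative input and a made-up object name (PlaceholderFlatMap); it also shows a case where the placeholder does not apply because the parameter is used twice.

```scala
object PlaceholderFlatMap {
  val lines: List[String] = List("hello java", "hello spark") // illustrative input

  // Anonymous function with an explicitly named parameter.
  val named: List[String] = lines.flatMap(line => line.split(" "))

  // The same function written with the "_" placeholder.
  val short: List[String] = lines.flatMap(_.split(" "))

  // Here the parameter is used twice, so it has to keep its name:
  // lines.map(_ + _) would not compile, because two underscores
  // stand for two different parameters.
  val doubled: List[String] = lines.map(line => line + line)
}
```

Both named and short contain hello, java, hello, spark in that order.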

The word parameter in xxx.map(word => new Tuple2(word, 1)) is also used only once after =>, so it can be replaced by "_". The tuple can also be written without new and Tuple2, as a plain parenthesized pair: xxx.map((_, 1)).
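The different ways of writing the same pair can be compared directly. A small sketch with illustrative values; the object name TupleForms is made up, and the arrow form word -> 1 is an extra spelling that the article itself does not use.

```scala
object TupleForms {
  val word = "hello" // illustrative value

  val withNew: (String, Int) = new Tuple2(word, 1) // form used in versions 1 and 2
  val withoutNew: (String, Int) = Tuple2(word, 1)  // "new" omitted
  val literal: (String, Int) = (word, 1)           // "Tuple2" omitted as well
  val arrow: (String, Int) = word -> 1             // arrow syntax, same pair

  // The placeholder form from version 3, applied to a plain List.
  val pairs: List[(String, Int)] = List("hello", "java").map((_, 1))
}
```

All four values are equal, and pairs is List((hello,1), (java,1)).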

The parameters v1 and v2 in xxx.reduceByKey((v1: Int, v2: Int) => { v1 + v2 }) are each used only once after =>, so each can be replaced by "_": xxx.reduceByKey(_ + _). The first "_" stands for v1 and the second for v2.
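How two placeholders map onto two parameters can be seen with reduce on a plain List. This is a sketch with illustrative numbers and a made-up object name (PlaceholderReduce). The subtraction case is there only to show the parameter order; it would not be a suitable function for Spark's reduceByKey, which expects an associative and commutative function.

```scala
object PlaceholderReduce {
  val ones: List[Int] = List(1, 1, 1) // illustrative values collected for one key

  // Fully written-out function, as in versions 1 and 2.
  val explicit: Int = ones.reduce((v1: Int, v2: Int) => { v1 + v2 })

  // Placeholder form, as in version 3.
  val short: Int = ones.reduce(_ + _)

  // The first "_" is the first parameter and the second "_" is the second one,
  // so with reduceLeft this computes (10 - 3) - 2.
  val diffExplicit: Int = List(10, 3, 2).reduceLeft((v1: Int, v2: Int) => v1 - v2)
  val diffShort: Int = List(10, 3, 2).reduceLeft(_ - _)
}
```

Here explicit and short are both 3, and both subtraction results are 5.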

