2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article shows how to implement WordCount in spark-shell and then sort the results by word and by count, with the code and its output for each step.
Input:
hello tom
hello jerry
hello kitty
hello world
hello tom
Read the text files in the HDFS directory hdfs://node1:9000/wc/input and assign the result to textRdd
val textRdd = sc.textFile("hdfs://node1:9000/wc/input")
textRdd.collect
res1: Array[String] = Array(hello tom, hello jerry, hello kitty, hello world, hello tom)
Implement a basic WordCount. Unlike MapReduce, the results are not sorted by Key (word)
val wcRdd = textRdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
wcRdd.collect
res2: Array[(String, Int)] = Array((tom,2), (hello,5), (jerry,1), (kitty,1), (world,1))
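To check the expected counts without an HDFS cluster, the same pipeline can be sketched with plain Scala collections. This is only an illustration: the names lines and counts are made up here, and groupBy followed by a sum stands in for reduceByKey, which does not exist on ordinary collections.

```scala
// Local stand-in for the HDFS input: the five input lines from this article
val lines = Seq("hello tom", "hello jerry", "hello kitty", "hello world", "hello tom")

// Same shape as the RDD pipeline: split into words, pair each word with 1,
// then sum the 1s per word (groupBy + sum plays the role of reduceByKey)
val counts: Map[String, Int] =
  lines
    .flatMap(_.split(" "))
    .map((_, 1))
    .groupBy(_._1)
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

// Map iteration order is unspecified, so sort by word before printing
counts.toSeq.sortBy(_._1).foreach(println)
```

The snippet can be pasted into spark-shell or a plain Scala REPL, since it does not use sc.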
Implement WordCount sorted by Key (word) (dictionary order)
Idea: sort wcRdd by Key (word)
// sort wcRdd by Key (word); true: sort in ascending order
val sortByWordRdd = wcRdd.sortByKey(true)
sortByWordRdd.collect
// Array[(String, Int)] = Array((hello,5), (jerry,1), (kitty,1), (tom,2), (world,1))
In Spark 1.3, you can also use an RDD transformation, sortBy(), to do the same thing:
// _._1: the first element of the tuple, which is the word; true: sort in ascending order
val sortByWordRdd = wcRdd.sortBy(_._1, true)
sortByWordRdd.collect
res3: Array[(String, Int)] = Array((hello,5), (jerry,1), (kitty,1), (tom,2), (world,1))
Implement WordCount sorted (descending) by Value (count)
Idea 1: starting from wcRdd, swap K (word) and V (count) so that count becomes the Key, sort by Key, and then swap back.
// swap K (word) and V (count), sort by the new Key (count) in descending order, then swap back
val sortByCountRdd = wcRdd.map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
sortByCountRdd.collect
res4: Array[(String, Int)] = Array((hello,5), (tom,2), (jerry,1), (kitty,1), (world,1))
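The swap in Idea 1 can also be written with the swap method that Scala tuples provide. The following is a minimal sketch, assuming a running spark-shell where sc is predefined; wcSample and sortedBySwap are illustrative names, and the small in-memory RDD stands in for wcRdd so the snippet runs without HDFS.

```scala
// Assumes spark-shell, where `sc` is the predefined SparkContext.
// wcSample is a hypothetical stand-in for wcRdd, built from the counts shown above.
val wcSample = sc.parallelize(
  Seq(("tom", 2), ("hello", 5), ("jerry", 1), ("kitty", 1), ("world", 1)))

val sortedBySwap = wcSample
  .map(_.swap)        // (word, count) -> (count, word)
  .sortByKey(false)   // sort by count, descending
  .map(_.swap)        // back to (word, count)

sortedBySwap.collect.foreach(println)
```

Words with the same count (jerry, kitty, world) may come out in any relative order, because only the count takes part in the comparison.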
Idea 2: use the sortBy () operation directly
// _._2: the second element of the tuple, which is the count; false: sort in descending order
val sortByCountRdd = wcRdd.sortBy(_._2, false)
sortByCountRdd.collect
res4: Array[(String, Int)] = Array((hello,5), (tom,2), (jerry,1), (kitty,1), (world,1))
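Sorting by count alone leaves the relative order of words with equal counts unspecified. One way to make the result deterministic is to sort on a composite key. The sketch below is not part of the original walkthrough; it assumes a running spark-shell where sc is predefined, and wcTies and sortedWithTieBreak are illustrative names.

```scala
// Assumes spark-shell, where `sc` is the predefined SparkContext.
// wcTies is a hypothetical stand-in for wcRdd, built from the counts shown above.
val wcTies = sc.parallelize(
  Seq(("tom", 2), ("hello", 5), ("jerry", 1), ("kitty", 1), ("world", 1)))

// Composite key: negated count gives descending count under the default
// ascending sort; the word breaks ties in dictionary order.
val sortedWithTieBreak = wcTies.sortBy { case (word, count) => (-count, word) }

sortedWithTieBreak.collect.foreach(println)
```

Negating the count keeps the whole sort ascending, so a single sortBy call covers both criteria without a second pass.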