2025-03-04 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/03 Report--
Before studying any Spark API, it helps to have a correct overall understanding of Spark; see the earlier article "Correctly Understanding Spark."

This article explains the join-related APIs on pair RDDs.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

SparkConf conf = new SparkConf().setAppName("appName").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);

JavaPairRDD<Integer, Integer> javaPairRDD = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>(1, 2), new Tuple2<>(3, 4), new Tuple2<>(3, 6), new Tuple2<>(5, 6)));
JavaPairRDD<Integer, Integer> otherJavaPairRDD = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>(3, 9), new Tuple2<>(4, 5)));

// result: [(4,([],[5])), (1,([2],[])), (3,([4,6],[9])), (5,([6],[]))]
System.out.println(javaPairRDD.cogroup(otherJavaPairRDD).collect());

// groupWith is an alias for cogroup, so the result is the same
System.out.println(javaPairRDD.groupWith(otherJavaPairRDD).collect());

// result: [(3,(4,9)), (3,(6,9))]
// built on cogroup: keeps only the keys whose cogroup result has values in both RDDs
System.out.println(javaPairRDD.join(otherJavaPairRDD).collect());

// result: [(1,(2,Optional.empty)), (3,(4,Optional[9])), (3,(6,Optional[9])), (5,(6,Optional.empty))]
// built on cogroup: the output keys are those of the left RDD
System.out.println(javaPairRDD.leftOuterJoin(otherJavaPairRDD).collect());

// result: [(4,(Optional.empty,5)), (3,(Optional[4],9)), (3,(Optional[6],9))]
// built on cogroup: the output keys are those of the right RDD
System.out.println(javaPairRDD.rightOuterJoin(otherJavaPairRDD).collect());

// result: [(4,(Optional.empty,Optional[5])), (1,(Optional[2],Optional.empty)),
//          (3,(Optional[4],Optional[9])), (3,(Optional[6],Optional[9])), (5,(Optional[6],Optional.empty))]
// built on cogroup: the output contains every key that appears in either RDD
System.out.println(javaPairRDD.fullOuterJoin(otherJavaPairRDD).collect());
As the results above show, the most basic operation is cogroup; the various joins are built on top of it. The following is a schematic diagram of cogroup:
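To make this relationship concrete, here is a plain-Java sketch (no Spark required) of how cogroup-style grouping works and how an inner join can be derived from it. The class and method names (CogroupSketch, cogroup, join) are hypothetical helpers for illustration, not Spark APIs; Spark's real implementation is distributed and partition-aware.

```java
import java.util.*;

public class CogroupSketch {

    // cogroup: for every key in either input, pair up the two value lists
    // (an empty list stands in for a key missing on one side)
    static Map<Integer, List<List<Integer>>> cogroup(
            Map<Integer, List<Integer>> left, Map<Integer, List<Integer>> right) {
        Set<Integer> keys = new TreeSet<>();
        keys.addAll(left.keySet());
        keys.addAll(right.keySet());
        Map<Integer, List<List<Integer>>> out = new LinkedHashMap<>();
        for (Integer k : keys) {
            out.put(k, Arrays.asList(
                    left.getOrDefault(k, Collections.emptyList()),
                    right.getOrDefault(k, Collections.emptyList())));
        }
        return out;
    }

    // inner join derived from cogroup: the cross product of the two value
    // lists per key; keys with an empty list on either side emit nothing
    static List<int[]> join(Map<Integer, List<List<Integer>>> cogrouped) {
        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, List<List<Integer>>> e : cogrouped.entrySet()) {
            for (int l : e.getValue().get(0)) {
                for (int r : e.getValue().get(1)) {
                    out.add(new int[]{e.getKey(), l, r});
                }
            }
        }
        return out;
    }
}
```

With the article's data (left = {1:[2], 3:[4,6], 5:[6]}, right = {3:[9], 4:[5]}), only key 3 has values on both sides, so the derived join yields (3,4,9) and (3,6,9), matching the Spark output above. leftOuterJoin, rightOuterJoin, and fullOuterJoin differ only in which side's empty lists are allowed to survive (with a placeholder such as Optional.empty filling the gap).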
If you want a more thorough understanding of how cogroup works internally, refer to: a detailed explanation of the Spark core RDD API.