Spark2.x from shallow to deep to the end series 6 RDD java api detailed explanation 4 10/23 Update SLTechnology News&Howtos


2025-10-23 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Before studying any part of Spark, you should first have a correct understanding of what Spark is. You can refer to the article: Correctly understand Spark.

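This article explains the join-related APIs of JavaPairRDD: cogroup, groupWith, join, leftOuterJoin, rightOuterJoin and fullOuterJoin. The results shown in the comments come from a run in local mode; the order of the keys depends on partitioning and should not be relied on.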

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class JoinApiDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appName").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<Integer, Integer> javaPairRDD = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>(1, 2), new Tuple2<>(3, 4), new Tuple2<>(3, 6), new Tuple2<>(5, 6)));
        JavaPairRDD<Integer, Integer> otherJavaPairRDD = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>(3, 9), new Tuple2<>(4, 5)));

        // For every key that appears in either RDD, collect the values from each side
        // result: [(4,([],[5])), (1,([2],[])), (3,([4,6],[9])), (5,([6],[]))]
        System.out.println(javaPairRDD.cogroup(otherJavaPairRDD).collect());

        // groupWith is an alias of cogroup and returns the same result
        // result: [(4,([],[5])), (1,([2],[])), (3,([4,6],[9])), (5,([6],[]))]
        System.out.println(javaPairRDD.groupWith(otherJavaPairRDD).collect());

        // Based on cogroup: keeps only the keys that have values in both RDDs
        // result: [(3,(4,9)), (3,(6,9))]
        System.out.println(javaPairRDD.join(otherJavaPairRDD).collect());

        // Based on cogroup: the keys are taken from the left RDD (javaPairRDD)
        // result: [(1,(2,Optional.empty)), (3,(4,Optional[9])), (3,(6,Optional[9])), (5,(6,Optional.empty))]
        System.out.println(javaPairRDD.leftOuterJoin(otherJavaPairRDD).collect());

        // Based on cogroup: the keys are taken from the right RDD (otherJavaPairRDD)
        // result: [(4,(Optional.empty,5)), (3,(Optional[4],9)), (3,(Optional[6],9))]
        System.out.println(javaPairRDD.rightOuterJoin(otherJavaPairRDD).collect());

        // Based on cogroup: every key from both RDDs appears
        // result: [(4,(Optional.empty,Optional[5])), (1,(Optional[2],Optional.empty)), (3,(Optional[4],Optional[9])), (3,(Optional[6],Optional[9])), (5,(Optional[6],Optional.empty))]
        System.out.println(javaPairRDD.fullOuterJoin(otherJavaPairRDD).collect());

        sc.stop();
    }
}
```

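As the comments above show, the most basic operation is cogroup: join, leftOuterJoin, rightOuterJoin and fullOuterJoin all start from the cogroup result and then keep, drop or pad each key depending on which side has values for it. The following is a schematic diagram of cogroup: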

For a more thorough understanding of how cogroup works, you can refer to the article: Detailed explanation of Spark core RDD API principles.
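To make the "everything is built on cogroup" idea concrete, here is a minimal plain-Java sketch (no Spark required) that models cogroup on in-memory lists and derives join, leftOuterJoin, rightOuterJoin and fullOuterJoin from its result. It is an illustration under stated assumptions, not Spark's implementation: the class and helper names (CogroupJoinModel, Pair, emptyGroup, wrap) are hypothetical, java.util.Optional stands in for Spark's org.apache.spark.api.java.Optional, and keys are sorted with a TreeMap only so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Plain-Java, in-memory model of cogroup and the join family derived from it.
// Illustration only, NOT Spark's implementation: java.util.Optional stands in for
// org.apache.spark.api.java.Optional, and a TreeMap sorts keys so output is deterministic.
public class CogroupJoinModel {

    // Hypothetical stand-in for scala.Tuple2; prints as "(first,second)" like a Scala tuple.
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
        @Override public String toString() { return "(" + first + "," + second + ")"; }
    }

    // Same sample data as the Spark example in the article.
    static final List<Pair<Integer, Integer>> LEFT = Arrays.asList(
            new Pair<>(1, 2), new Pair<>(3, 4), new Pair<>(3, 6), new Pair<>(5, 6));
    static final List<Pair<Integer, Integer>> RIGHT = Arrays.asList(new Pair<>(3, 9), new Pair<>(4, 5));

    static <V, W> Pair<List<V>, List<W>> emptyGroup() {
        return new Pair<List<V>, List<W>>(new ArrayList<V>(), new ArrayList<W>());
    }

    // cogroup: for every key seen on either side, collect all left values and all right values.
    static <K extends Comparable<K>, V, W> TreeMap<K, Pair<List<V>, List<W>>> cogroup(
            List<Pair<K, V>> left, List<Pair<K, W>> right) {
        TreeMap<K, Pair<List<V>, List<W>>> groups = new TreeMap<>();
        for (Pair<K, V> p : left) groups.computeIfAbsent(p.first, k -> emptyGroup()).first.add(p.second);
        for (Pair<K, W> p : right) groups.computeIfAbsent(p.first, k -> emptyGroup()).second.add(p.second);
        return groups;
    }

    // Wraps values in Optional.of; an empty side becomes a single Optional.empty() if padding is asked for.
    static <T> List<Optional<T>> wrap(List<T> values, boolean padIfEmpty) {
        List<Optional<T>> out = new ArrayList<>();
        for (T v : values) out.add(Optional.of(v));
        if (out.isEmpty() && padIfEmpty) out.add(Optional.empty());
        return out;
    }

    // join: only keys with values on both sides survive; each key yields the cross product.
    static <K extends Comparable<K>, V, W> List<Pair<K, Pair<V, W>>> join(
            List<Pair<K, V>> left, List<Pair<K, W>> right) {
        List<Pair<K, Pair<V, W>>> out = new ArrayList<>();
        for (Map.Entry<K, Pair<List<V>, List<W>>> e : cogroup(left, right).entrySet())
            for (V v : e.getValue().first)
                for (W w : e.getValue().second)
                    out.add(new Pair<>(e.getKey(), new Pair<>(v, w)));
        return out;
    }

    // leftOuterJoin: keys come from the left side; a missing right value becomes Optional.empty.
    static <K extends Comparable<K>, V, W> List<Pair<K, Pair<V, Optional<W>>>> leftOuterJoin(
            List<Pair<K, V>> left, List<Pair<K, W>> right) {
        List<Pair<K, Pair<V, Optional<W>>>> out = new ArrayList<>();
        for (Map.Entry<K, Pair<List<V>, List<W>>> e : cogroup(left, right).entrySet())
            for (V v : e.getValue().first)
                for (Optional<W> w : wrap(e.getValue().second, true))
                    out.add(new Pair<>(e.getKey(), new Pair<>(v, w)));
        return out;
    }

    // rightOuterJoin: keys come from the right side; a missing left value becomes Optional.empty.
    static <K extends Comparable<K>, V, W> List<Pair<K, Pair<Optional<V>, W>>> rightOuterJoin(
            List<Pair<K, V>> left, List<Pair<K, W>> right) {
        List<Pair<K, Pair<Optional<V>, W>>> out = new ArrayList<>();
        for (Map.Entry<K, Pair<List<V>, List<W>>> e : cogroup(left, right).entrySet())
            for (Optional<V> v : wrap(e.getValue().first, true))
                for (W w : e.getValue().second)
                    out.add(new Pair<>(e.getKey(), new Pair<>(v, w)));
        return out;
    }

    // fullOuterJoin: every key from either side; whichever side is missing becomes Optional.empty.
    static <K extends Comparable<K>, V, W> List<Pair<K, Pair<Optional<V>, Optional<W>>>> fullOuterJoin(
            List<Pair<K, V>> left, List<Pair<K, W>> right) {
        List<Pair<K, Pair<Optional<V>, Optional<W>>>> out = new ArrayList<>();
        for (Map.Entry<K, Pair<List<V>, List<W>>> e : cogroup(left, right).entrySet())
            for (Optional<V> v : wrap(e.getValue().first, true))
                for (Optional<W> w : wrap(e.getValue().second, true))
                    out.add(new Pair<>(e.getKey(), new Pair<>(v, w)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cogroup(LEFT, RIGHT));
        System.out.println(join(LEFT, RIGHT));
        System.out.println(leftOuterJoin(LEFT, RIGHT));
        System.out.println(rightOuterJoin(LEFT, RIGHT));
        System.out.println(fullOuterJoin(LEFT, RIGHT));
    }
}
```

The design point is that every join variant reuses the same per-key grouping and differs only in whether an empty side drops the key (plain cross product) or is padded with a single Optional.empty. With the article's data, join prints [(3,(4,9)), (3,(6,9))] and leftOuterJoin prints [(1,(2,Optional.empty)), (3,(4,Optional[9])), (3,(6,Optional[9])), (5,(6,Optional.empty))]: the same pairs as the Spark output above, here sorted by key.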

