
Lesson 17: RDD cases (join, cogroup, etc.)

2025-01-19 Update From: SLTechnology News&Howtos


This lesson demonstrates two of the most important RDD operators, join and cogroup, through hands-on code.

Join operator code practice:

// Demonstrate the join operator
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
val arr1 = Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(3, "Tachyon"))
val arr2 = Array(Tuple2(1, 100), Tuple2(2, 70), Tuple2(3, 90))
val rdd1 = sc.parallelize(arr1)
val rdd2 = sc.parallelize(arr2)
val rdd3 = rdd1.join(rdd2)   // inner join on the key
rdd3.collect().foreach(println)

Running result:

(1, (Spark,100))

(3, (Tachyon,90))

(2, (Hadoop,70))
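The output above shows the inner-join semantics of pair RDDs: a key appears in the result only if it occurs in both RDDs, and its two values are paired into a tuple. The same logic can be sketched on plain Java collections, with no Spark involved (the helper `innerJoin` is hypothetical, just for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class JoinSketch {
    // Inner-join two key-value maps: keep only keys present in both sides,
    // pairing the left value with the right value, mirroring rdd1.join(rdd2).
    static Map<Integer, Map.Entry<String, Integer>> innerJoin(
            Map<Integer, String> left, Map<Integer, Integer> right) {
        Map<Integer, Map.Entry<String, Integer>> out = new TreeMap<>();
        for (Map.Entry<Integer, String> e : left.entrySet()) {
            Integer score = right.get(e.getKey());
            if (score != null) {
                out.put(e.getKey(), Map.entry(e.getValue(), score));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = Map.of(1, "Spark", 2, "Hadoop", 3, "Tachyon");
        Map<Integer, Integer> scores = Map.of(1, 100, 2, 70, 3, 90);
        // Prints each key present in both maps with its paired values, in key order
        innerJoin(names, scores).forEach((k, v) ->
                System.out.println("(" + k + ",(" + v.getKey() + "," + v.getValue() + "))"));
    }
}
```

Note that in Spark the keys in the result come back in arbitrary partition order (as the unsorted output above shows), whereas this local sketch sorts them via a TreeMap.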

Cogroup operator code practice:

First, the Java version:

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local").setAppName("Cogroup");
JavaSparkContext sc = new JavaSparkContext(conf);
List<Tuple2<Integer, String>> nameList = Arrays.asList(
        new Tuple2<Integer, String>(1, "Spark"),
        new Tuple2<Integer, String>(2, "Tachyon"),
        new Tuple2<Integer, String>(3, "Hadoop"));
List<Tuple2<Integer, Integer>> scoreList = Arrays.asList(
        new Tuple2<Integer, Integer>(1, 100),
        new Tuple2<Integer, Integer>(2, 95),
        new Tuple2<Integer, Integer>(3, 80),
        new Tuple2<Integer, Integer>(1, 80),
        new Tuple2<Integer, Integer>(2, 110),
        new Tuple2<Integer, Integer>(2, 90));
JavaPairRDD<Integer, String> names = sc.parallelizePairs(nameList);
JavaPairRDD<Integer, Integer> scores = sc.parallelizePairs(scoreList);
// cogroup: for each key, collect ALL values from each side into an Iterable
JavaPairRDD<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> nameAndScores =
        names.cogroup(scores);
nameAndScores.foreach(new VoidFunction<Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>>() {
    public void call(Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> t) throws Exception {
        System.out.println("ID:" + t._1);
        System.out.println("Name:" + t._2._1);
        System.out.println("Score:" + t._2._2);
    }
});
sc.close();

Running result:

ID:1

Name: [Spark]

Score: [100, 80]

ID:3

Name: [Hadoop]

Score: [80]

ID:2

Name: [Tachyon]

Score: [95, 110, 90]

Now the Scala version:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
val arr1 = Array(Tuple2(1, "Spark"), Tuple2(2, "Hadoop"), Tuple2(3, "Tachyon"))
val arr2 = Array(Tuple2(1, 100), Tuple2(2, 70), Tuple2(3, 90),
  Tuple2(1, 95), Tuple2(2, 65), Tuple2(1, 110))
val rdd1 = sc.parallelize(arr1)
val rdd2 = sc.parallelize(arr2)
val rdd3 = rdd1.cogroup(rdd2)
rdd3.collect().foreach(println)
sc.stop()

Running result:

(1,(CompactBuffer(Spark),CompactBuffer(100, 95, 110)))

(3,(CompactBuffer(Tachyon),CompactBuffer(90)))

(2,(CompactBuffer(Hadoop),CompactBuffer(70, 65)))
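Unlike join, cogroup keeps every key that appears on either side and collects all matching values from each RDD into a per-side buffer (the CompactBuffers above). That grouping can be sketched locally in plain Java, without Spark (the helper `cogroup` is hypothetical, just for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CogroupSketch {
    // For every key seen on either side, gather all left values and all
    // right values, mirroring rdd1.cogroup(rdd2) -> (K, (Seq[V], Seq[W])).
    // Index 0 of the value holds left-side values, index 1 right-side values.
    static Map<Integer, List<List<Object>>> cogroup(
            List<Map.Entry<Integer, String>> left,
            List<Map.Entry<Integer, Integer>> right) {
        Map<Integer, List<List<Object>>> out = new TreeMap<>();
        for (Map.Entry<Integer, String> e : left) {
            out.computeIfAbsent(e.getKey(),
                    k -> List.of(new ArrayList<>(), new ArrayList<>()))
               .get(0).add(e.getValue());
        }
        for (Map.Entry<Integer, Integer> e : right) {
            out.computeIfAbsent(e.getKey(),
                    k -> List.of(new ArrayList<>(), new ArrayList<>()))
               .get(1).add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> names = List.of(
                Map.entry(1, "Spark"), Map.entry(2, "Hadoop"), Map.entry(3, "Tachyon"));
        List<Map.Entry<Integer, Integer>> scores = List.of(
                Map.entry(1, 100), Map.entry(2, 70), Map.entry(3, 90),
                Map.entry(1, 95), Map.entry(2, 65), Map.entry(1, 110));
        // One line per key: all names for the key, then all scores for the key
        cogroup(names, scores).forEach((k, v) ->
                System.out.println("(" + k + ",(" + v.get(0) + "," + v.get(1) + "))"));
    }
}
```

A key present on only one side still appears in the result, paired with an empty buffer on the other side; this is what distinguishes cogroup from join.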

Note:

Source: DT Big Data DreamWorks (customized Spark distribution)

For more exclusive content, follow the WeChat official account: DT_Spark

If you are interested in big data and Spark, you can listen to the free Spark open course offered by teacher Wang Jialin at 20:00 every evening, YY room number: 68917580
