What types of operators are commonly used in Spark RDD?

The editor would like to share with you the commonly used operators of Spark RDD. Most people probably don't know much about them, so this article is offered for your reference; I hope you learn a lot from it. Let's get started!

Spark RDD common operators: Value type

One of the reasons Spark is more flexible and powerful than Hadoop is that Spark has many useful operators (that is, methods) built in. By combining these operators, programmers can build the functionality they want; to put it bluntly, Spark programming is largely a matter of using Spark operators. Below we walk through the commonly used Value-type operators in detail.
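To give a feel for what "combining operators" means, here is a minimal, self-contained sketch (not from the original article) that chains filter, map and collect on a small local dataset:

import org.apache.spark.{SparkConf, SparkContext}

object ChainDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("Operator"))
    // combine operators: keep the even numbers, double them, then collect the result
    val result = sc.makeRDD(List(1, 2, 3, 4, 5, 6), 2)
      .filter(_ % 2 == 0)
      .map(_ * 2)
      .collect()
    println(result.mkString(","))  // 4,8,12
    sc.stop()
  }
}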

map function description:

map() takes a function and applies it to each element of the RDD one by one; the function may change the element's type or its value, and the values it returns make up the new RDD.

Function signature:

def map[U: ClassTag](f: T => U): RDD[U]

Case demonstration:

val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
val sc = new SparkContext(sparkConf)
// operator - map
val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
val mapRdd1 = rdd.map(_ * 2)
mapRdd1.collect().foreach(println)
sc.stop()

Running result:

2
4
6
8
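The demo above only changes the value of each element. As the description notes, map can also change the element type; a minimal sketch of that (not from the original article, assuming a SparkContext sc like the one in the demo above):

// map can change the element type: RDD[Int] -> RDD[String]
val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
val strRdd = rdd.map(n => s"num-$n")
strRdd.collect().foreach(println)  // num-1, num-2, num-3, num-4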

mapPartitions function description:

The data to be processed is sent to the compute node and handled one partition at a time: mapPartitions receives an iterator over each partition of the RDD and returns a new iterator, and the processing inside a partition can be arbitrary.

Function signature:

def mapPartitions[U: ClassTag](f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false): RDD[U]

Case demonstration: compute the maximum of each partition

def main(args: Array[String]): Unit = {
  val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
  val sc = new SparkContext(sparkConf)
  // operator - mapPartitions: compute the maximum of each partition
  val rdd = sc.makeRDD(List(1, 34, 36, 4), 4)
  val mapParRdd = rdd.mapPartitions(iter => {
    List(iter.max).iterator
  })
  mapParRdd.foreach(println)
  sc.stop()
}

Running result:

62243534345
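Because mapPartitions hands you a whole partition at once, it is a natural place for per-partition setup that would be too expensive to repeat for every element. A minimal sketch of this idea (not from the original article, assuming a SparkContext sc like the one in the demo above):

import java.text.DecimalFormat

val rdd = sc.makeRDD(List(1.5, 2.25, 3.75, 4.0), 2)
val formatted = rdd.mapPartitions { iter =>
  // the formatter is created once per partition instead of once per element
  val fmt = new DecimalFormat("0.00")
  iter.map(d => fmt.format(d))
}
formatted.collect().foreach(println)  // 1.50, 2.25, 3.75, 4.00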

mapPartitionsWithIndex function description:

Like mapPartitions, the data is sent to the compute node and processed one partition at a time, and the processing can be arbitrary, including filtering the data; in addition, the index of the current partition is available inside the function.

Function signature:

def mapPartitionsWithIndex[U: ClassTag](f: (Int, Iterator[T]) => Iterator[U], preservesPartitioning: Boolean = false): RDD[U]

Case demonstration: pair every word with the index of its partition

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List("Hello Spark", "Hello Scala", "Word Count"), 2)
  val mapRDD = rdd.flatMap(_.split(" "))
  val mpwiRdd = mapRDD.mapPartitionsWithIndex((index, datas) => {
    datas.map(num => (index, num))
  })
  mpwiRdd.collect().foreach(println)
}

Running result:

(0,Hello)
(0,Spark)
(1,Hello)
(1,Scala)
(1,Word)
(1,Count)

Case demonstration: flatten the data and print only the words in the first partition

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List("Hello Spark", "Hello Scala", "Word Count"), 2)
  val mapRDD = rdd.flatMap(_.split(" "))
  val mpwiRdd = mapRDD.mapPartitionsWithIndex((index, datas) => {
    if (index == 0) {
      datas.map(num => (index, num))
    } else {
      Nil.iterator
    }
  })
  mpwiRdd.collect().foreach(println)
}

Running result:

(0,Hello)
(0,Spark)

flatMap function description:

flatMap first maps each element to a collection and then flattens those collections into the resulting RDD, which is why the operator is also called a flattening map.

Function signature:

def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]

Case demonstration:

Flatten each line into individual words

def main(args: Array[String]): Unit = {
  val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
  val sc = new SparkContext(sparkConf)
  // operator - flatMap
  val rdd = sc.makeRDD(List("Hello Scala", "Hello Spark"), 2)
  val FltRdd = rdd.flatMap(_.split(" "))
  FltRdd.foreach(println)
  sc.stop()
}

Running result:

Hello
Scala
Hello
Spark
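The demo splits strings, but flatMap flattens any nested collection in the same way. A minimal sketch (not from the original article, assuming a SparkContext sc like the one in the demo above):

// each inner List is emitted element by element: RDD[List[Int]] -> RDD[Int]
val nested = sc.makeRDD(List(List(1, 2), List(3, 4), List(5)), 2)
val flat = nested.flatMap(list => list)
flat.collect().foreach(println)  // 1 2 3 4 5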

glom function description:

glom gathers the data of each partition into an array, producing an RDD with one array per partition.

Function signature:

def glom(): RDD[Array[T]]

Case demonstration: gather the elements of each partition into one array

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 2)
  val glomRdd = rdd.glom()
  glomRdd.collect().foreach(data => println(data.mkString(",")))
  sc.stop()
}

Running result:

1,2,3,4
5,6,7,8,9
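A typical use of the per-partition arrays is to compute one value per partition and then combine those values on the driver. A minimal sketch (not from the original article, assuming a SparkContext sc like the one in the demo above):

val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 2)
// take the maximum of each partition's array, then add the two maxima on the driver
val sumOfMaxes = rdd.glom().map(_.max).collect().sum
println(sumOfMaxes)  // 4 + 9 = 13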

groupBy function description:

groupBy groups the data according to the specified rule. By default the number of partitions stays the same, but the data is shuffled and regrouped so that records with the same key land in the same partition; an operation like this is called a shuffle.

Function signature:

def groupBy[K](f: T => K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

Case demonstration: group the numbers by whether they are even

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 6, 7, 8, 10), 2)
  val groupByRDD = rdd.groupBy(_ % 2 == 0)
  groupByRDD.collect().foreach(println)
  sc.stop()
}

Running result:

(false,CompactBuffer(1, 3, 5, 7))
(true,CompactBuffer(2, 4, 6, 8, 10))

Case demonstration: group the words by their first letter

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List("Hello", "Tom", "Timi", "Scala", "Spark"))
  val groupByRDD = rdd.groupBy(_.charAt(0))
  groupByRDD.collect().foreach(println)
  sc.stop()
}

Running result:

(T,CompactBuffer(Tom, Timi))
(H,CompactBuffer(Hello))
(S,CompactBuffer(Scala, Spark))
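Because groupBy returns an Iterable of values per key, a follow-up transformation usually reduces each group. A minimal sketch (not from the original article, assuming a SparkContext sc like the one in the demo above) that counts how many words start with each letter:

val words = sc.makeRDD(List("Hello", "Tom", "Timi", "Scala", "Spark"))
// shrink each group to its size: RDD[(Char, Iterable[String])] -> RDD[(Char, Int)]
val countsByLetter = words.groupBy(_.charAt(0)).map { case (letter, ws) => (letter, ws.size) }
countsByLetter.collect().foreach(println)  // (T,2), (H,1), (S,2)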

filter function description:

filter does exactly what its name says: the data is filtered according to the specified rule, records that satisfy it are kept and the rest are discarded. After filtering, the number of partitions stays the same, but the amount of data left in each partition may be uneven, which can cause data skew in a production environment.

Function signature:

def filter(f: T => Boolean): RDD[T]

Case demonstration:

1. Keep only the even numbers

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List(46, 276, 235, 246, 3276, 235, 234, 6245))
  val filterRDD = rdd.filter(_ % 2 == 0)
  filterRDD.collect().foreach(println)
  sc.stop()
}

Running result:

46
276
246
3276
234
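Since the description warns that filtering can leave the partitions unbalanced, a common follow-up is to reduce the number of partitions after a heavy filter. This is only a sketch of that idea (not from the original article, assuming a SparkContext sc like the one in the demo above):

val bigRdd = sc.makeRDD(1 to 10000, 8)
// after discarding most records, the 8 partitions are nearly empty and uneven
val rare = bigRdd.filter(_ % 1000 == 0)
// coalesce lowers the partition count without a full shuffle
val compacted = rare.coalesce(2)
println(compacted.getNumPartitions)   // 2
compacted.collect().foreach(println)  // 1000, 2000, ..., 10000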

2. Keep only the words that contain H

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("rdd")
  val sc = new SparkContext(conf)
  val rdd = sc.makeRDD(List("Hello", "Horber", "Hbeer", "ersfgH", "Scala", "Hadoop", "Zookeeper"))
  val filterRDD = rdd.filter(_.contains("H"))
  filterRDD.collect().foreach(println)
  sc.stop()
}

Running result:

Hello
Horber
Hbeer
ersfgH
Hadoop

The above is all the content of this article on the commonly used operators of Spark RDD. Thank you for reading! I believe you now have some understanding of them; I hope the article has been helpful, and if you want to learn more, welcome to follow the industry information channel!
