Introduction to action operators
Action operators are a class of operators (functions), such as foreach, collect, and count, that trigger execution. Transformation operators are executed lazily, while action operators trigger execution. An application (that is, a program we write) runs as many jobs as the number of action operators it executes.
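For example (a minimal sketch; the variable names are illustrative), the map transformation below does nothing by itself, and a job is only submitted when the count action is called:
// map is a transformation: it is only recorded, no job runs yet
val doubled = sc.makeRDD(1 to 10).map(_ * 2)
// count is an action: calling it submits a job that evaluates the lineage
val n = doubled.count() // n = 10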
1.reduce
Aggregates all elements in the dataset with the function func, which must be commutative and associative so that it can be computed correctly in parallel.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> rdd1.reduce(_+_)
res3: Int = 55

2.collect
Returns all elements of the dataset to the driver program as an array. This is usually useful only after a filter or other operation has produced a sufficiently small subset of the data.
scala> var rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:24

scala> rdd1.collect
res2: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

3.count
Returns the number of elements in the dataset
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at makeRDD at <console>:24

scala> rdd1.count
res4: Long = 10

4.first
Returns the first element of the dataset (similar to take(1))
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at makeRDD at <console>:24

scala> rdd1.first
res5: Int = 1

5.take
Returns an array consisting of the first n elements of the dataset. Note that this operation is not performed in parallel; it runs on the machine where the driver program resides.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at makeRDD at <console>:24

scala> rdd1.take(3)
res6: Array[Int] = Array(1, 2, 3)

6.takeSample(withReplacement, num, seed)
withReplacement: whether to sample with replacement (the same element may be returned more than once)
num: the number of elements to sample
seed: the seed for the random number generator
Returns an array of num elements randomly sampled from the dataset, with or without replacement, using the specified random number generator seed.
Principle
The takeSample() function works on the same principle as the sample() function, except that it does not sample by a relative fraction but by a set number of samples, and the result returned is no longer an RDD: it is equivalent to calling collect() on the sampled data, so the returned result is a single array.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at makeRDD at <console>:24

scala> rdd1.takeSample(true, 4, 10)
res19: Array[Int] = Array(10, 10, 2, 3)

7.takeOrdered
takeOrdered is similar to top, except that it returns elements in the opposite order.
top defaults to descending order, takeOrdered defaults to ascending order.
In fact, the top method calls takeOrdered and reverses the ordering:
def top(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
  takeOrdered(num)(ord.reverse)
}

scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[23] at makeRDD at <console>:24

scala> rdd1.top(5)
res22: Array[Int] = Array(10, 9, 8, 7, 6)

scala> rdd1.takeOrdered(5)
res23: Array[Int] = Array(1, 2, 3, 4, 5)

8.saveAsTextFile
saveAsTextFile is used to store the RDD as a text file to the file system
val conf = new SparkConf()
  .setAppName("saveFile")
  .setMaster("local[*]")
val sc = new SparkContext(conf)
val rdd1: RDD[Int] = sc.parallelize(1 to 10)
rdd1.repartition(1).saveAsTextFile("/tmp/fff")

9.saveAsSequenceFile
saveAsSequenceFile is used to save the RDD to HDFS in the SequenceFile format. Its usage is similar to saveAsTextFile.
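A minimal sketch, assuming a pair RDD (saveAsSequenceFile is only available on RDDs of key-value pairs; the path /tmp/seq is illustrative):
// Save a (String, Int) pair RDD as a Hadoop SequenceFile
val pairs = sc.parallelize(Array(("A", 1), ("B", 2), ("C", 3)))
pairs.saveAsSequenceFile("/tmp/seq")
// Read it back, specifying the key and value types
val loaded = sc.sequenceFile[String, Int]("/tmp/seq")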
10.saveAsObjectFile
saveAsObjectFile is used to serialize the elements of the RDD and store them in a file. Its usage is similar to saveAsTextFile.
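A minimal sketch (the path /tmp/obj is illustrative); the stored objects can be read back with sc.objectFile:
// Serialize the elements and write them to a file
val rdd1 = sc.parallelize(1 to 10)
rdd1.saveAsObjectFile("/tmp/obj")
// Deserialize the file back into an RDD
val restored = sc.objectFile[Int]("/tmp/obj")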
11.countByKey
Only valid for RDDs of type (K, V). Returns a map of (K, Long) pairs indicating the number of elements for each key.
scala> val rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",3)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> rdd1.countByKey
res1: scala.collection.Map[String,Long] = Map(B -> 2, A -> 2, C -> 1)

12.foreach
Runs the function func on each element of the dataset. This is usually done to update an accumulator variable or to interact with an external storage system.
scala> val rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",3)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[9] at makeRDD at <console>:24

scala> rdd1.collect.foreach(println(_))
(A,0)
(A,2)
(B,1)
(B,2)
(C,3)
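Note that the example above first collects the data to the driver and then calls Scala's local foreach; calling foreach directly on the RDD runs func on the executors instead. A minimal sketch of the accumulator use case mentioned above, assuming Spark 2.x's longAccumulator API (the accumulator name "sum" is illustrative):
// An accumulator is a shared variable that executors can only add to
val acc = sc.longAccumulator("sum")
val rdd2 = sc.makeRDD(1 to 10)
rdd2.foreach(x => acc.add(x)) // runs on the executors
println(acc.value) // 55, read back on the driver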