Introduction to action operators
Action operators are a class of operators (functions), such as foreach, collect, and count, that trigger execution. Transformation operators are executed lazily, while action operators trigger execution. An application (that is, a program we write) runs as many jobs as the number of action operators it executes.
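For example (a minimal sketch; the variable names are illustrative), the map transformation below does nothing by itself, and a job is only submitted when the count action is called:
// map is a transformation: it is only recorded, no job runs yet
val doubled = sc.makeRDD(1 to 10).map(_ * 2)
// count is an action: calling it submits a job that evaluates the lineage
val n = doubled.count() // n = 10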
1.reduce
Aggregates all elements in the dataset with the function func, which must be commutative and associative so that it can be computed correctly in parallel.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> rdd1.reduce(_+_)
res3: Int = 55

2.collect
Returns all elements of the dataset to the driver program as an array. This is usually useful only after a filter or other operation has produced a sufficiently small subset of the data.
scala> var rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:24

scala> rdd1.collect
res2: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

3.count
Returns the number of elements in the dataset
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at makeRDD at <console>:24

scala> rdd1.count
res4: Long = 10

4.first
Returns the first element of the dataset (similar to take(1))
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at makeRDD at <console>:24

scala> rdd1.first
res5: Int = 1

5.take
Returns an array consisting of the first n elements of the dataset. Note that this operation is not performed in parallel; it runs on the machine where the driver program resides.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at makeRDD at <console>:24

scala> rdd1.take(3)
res6: Array[Int] = Array(1, 2, 3)

6.takeSample(withReplacement, num, seed)
withReplacement: whether to sample with replacement (the same element may be returned more than once)
num: the number of elements to sample
seed: the seed for the random number generator
Returns an array of num elements randomly sampled from the dataset, with or without replacement, using the specified random number generator seed.
Principle
The takeSample() function works on the same principle as the sample() function, except that it does not sample by a relative fraction but by a set number of samples, and the result returned is no longer an RDD: it is equivalent to calling collect() on the sampled data, so the returned result is a single array.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at makeRDD at <console>:24

scala> rdd1.takeSample(true, 4, 10)
res19: Array[Int] = Array(10, 10, 2, 3)

7.takeOrdered
takeOrdered is similar to top, except that it returns elements in the opposite order.
top defaults to descending order, takeOrdered defaults to ascending order.
In fact, the top method calls takeOrdered and reverses the ordering:
def top(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
  takeOrdered(num)(ord.reverse)
}

scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[23] at makeRDD at <console>:24

scala> rdd1.top(5)
res22: Array[Int] = Array(10, 9, 8, 7, 6)

scala> rdd1.takeOrdered(5)
res23: Array[Int] = Array(1, 2, 3, 4, 5)

8.saveAsTextFile
saveAsTextFile is used to store the RDD as a text file to the file system
val conf = new SparkConf()
  .setAppName("saveFile")
  .setMaster("local[*]")
val sc = new SparkContext(conf)
val rdd1: RDD[Int] = sc.parallelize(1 to 10)
rdd1.repartition(1).saveAsTextFile("/tmp/fff")

9.saveAsSequenceFile
saveAsSequenceFile is used to save the RDD to HDFS in the SequenceFile format. Its usage is similar to saveAsTextFile.
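A minimal sketch, assuming a pair RDD (saveAsSequenceFile is only available on RDDs of key-value pairs; the path /tmp/seq is illustrative):
// Save a (String, Int) pair RDD as a Hadoop SequenceFile
val pairs = sc.parallelize(Array(("A", 1), ("B", 2), ("C", 3)))
pairs.saveAsSequenceFile("/tmp/seq")
// Read it back, specifying the key and value types
val loaded = sc.sequenceFile[String, Int]("/tmp/seq")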
10.saveAsObjectFile
saveAsObjectFile is used to serialize the elements of the RDD and store them in a file. Its usage is similar to saveAsTextFile.
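A minimal sketch (the path /tmp/obj is illustrative); the stored objects can be read back with sc.objectFile:
// Serialize the elements and write them to a file
val rdd1 = sc.parallelize(1 to 10)
rdd1.saveAsObjectFile("/tmp/obj")
// Deserialize the file back into an RDD
val restored = sc.objectFile[Int]("/tmp/obj")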
11.countByKey
Only valid for RDDs of type (K, V). Returns a map of (K, Long) pairs indicating the number of elements for each key.
scala> val rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",3)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> rdd1.countByKey
res1: scala.collection.Map[String,Long] = Map(B -> 2, A -> 2, C -> 1)

12.foreach
Runs the function func on each element of the dataset. This is usually done to update an accumulator variable or to interact with an external storage system.
scala> val rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",3)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[9] at makeRDD at <console>:24

scala> rdd1.collect.foreach(println(_))
(A,0)
(A,2)
(B,1)
(B,2)
(C,3)
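Note that the example above first collects the data to the driver and then calls Scala's local foreach; calling foreach directly on the RDD runs func on the executors instead. A minimal sketch of the accumulator use case mentioned above, assuming Spark 2.x's longAccumulator API (the accumulator name "sum" is illustrative):
// An accumulator is a shared variable that executors can only add to
val acc = sc.longAccumulator("sum")
val rdd2 = sc.makeRDD(1 to 10)
rdd2.foreach(x => acc.add(x)) // runs on the executors
println(acc.value) // 55, read back on the driver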