A brief introduction to Action operators
Action operators are the class of operators (functions) that trigger execution, such as foreach, collect, and count. Transformation operators are lazily evaluated, while Action operators trigger execution: an application (that is, a program we write) runs one job for every Action operator it executes.
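For illustration, here is a minimal sketch (the app name and variable names are made up for this example, not taken from the article): the map below is a transformation and runs nothing by itself, while each of the two actions launches its own job.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("actionDemo").setMaster("local[*]"))
val rdd = sc.makeRDD(1 to 10).map(_ * 2) // transformation: only recorded, nothing executes yet
val n = rdd.count() // Action 1: triggers the first job
val all = rdd.collect() // Action 2: triggers a second job
println(s"count = $n, elements = ${all.mkString(",")}")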
1.reduce
Aggregates all elements in the dataset using the function func, which must be associative (and commutative) so that it can be computed correctly in parallel.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> rdd1.reduce(_ + _)
res3: Int = 55

2.collect
Returns all elements of the dataset to the driver program as an array. This is usually useful only after a filter or other operation has reduced the data to a subset small enough to fit on the driver.
scala> var rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:24

scala> rdd1.collect
res2: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

3.count
Returns the number of elements in the dataset.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at makeRDD at <console>:24

scala> rdd1.count
res4: Long = 10

4.first
Returns the first element of the dataset (similar to take(1)).
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at makeRDD at <console>:24

scala> rdd1.first
res5: Int = 1

5.take
Returns an array consisting of the first n elements of the dataset. Note that this operation is not executed in parallel; the results are collected on the machine where the driver program runs.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at makeRDD at <console>:24

scala> rdd1.take(3)
res6: Array[Int] = Array(1, 2, 3)

6.takeSample(withReplacement, num, seed)
withReplacement: whether sampling is done with replacement, i.e. whether the same element may appear more than once in the result
num: the number of elements to sample
seed: the seed for the random number generator

Returns an array of num elements randomly sampled from the dataset. withReplacement controls whether elements can be drawn repeatedly, and seed specifies the random number generator seed.
Principle
takeSample() works on the same principle as sample(), except that it does not sample by a relative fraction but draws a fixed number of elements. The result is also no longer an RDD: it is equivalent to calling collect() on the sampled data, so what comes back is a local array on the driver.
scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at makeRDD at <console>:24

scala> rdd1.takeSample(true, 4, 10)
res19: Array[Int] = Array(10, 10, 2, 3)

7.takeOrdered
takeOrdered is similar to top, except that it returns elements in the opposite order: top defaults to descending order, takeOrdered to ascending order. In fact, top is implemented by calling takeOrdered with the reversed ordering:
def top(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
  takeOrdered(num)(ord.reverse)
}

scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[23] at makeRDD at <console>:24

scala> rdd1.top(5)
res22: Array[Int] = Array(10, 9, 8, 7, 6)

scala> rdd1.takeOrdered(5)
res23: Array[Int] = Array(1, 2, 3, 4, 5)

8.saveAsTextFile
saveAsTextFile is used to store an RDD in the file system as a text file.
val conf = new SparkConf().setAppName("saveFile").setMaster("local[*]")
val sc = new SparkContext(conf)
val rdd1: RDD[Int] = sc.parallelize(1 to 10)
rdd1.repartition(1).saveAsTextFile("/tmp/fff")

9.saveAsSequenceFile
saveAsSequenceFile is used to save an RDD to HDFS in the SequenceFile format. It is used in the same way as saveAsTextFile, but it is only available on RDDs of key-value pairs.
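A minimal sketch following the pattern of the saveAsTextFile example above (the output path /tmp/seq is made up for illustration):

val conf = new SparkConf().setAppName("saveSeqFile").setMaster("local[*]")
val sc = new SparkContext(conf)
val pairs = sc.parallelize(Array(("A", 1), ("B", 2), ("C", 3)))
pairs.saveAsSequenceFile("/tmp/seq") // keys and values are written as Hadoop Writables
val back = sc.sequenceFile[String, Int]("/tmp/seq") // read the file back as an RDD[(String, Int)]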
10.saveAsObjectFile
saveAsObjectFile is used to serialize the elements of an RDD as objects and store them in a file. It is used in the same way as saveAsTextFile, and the file can be read back with sc.objectFile.
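A minimal sketch in the same style (the path /tmp/obj is made up; sc is assumed to be an existing SparkContext as in the saveAsTextFile example):

val rdd1 = sc.parallelize(1 to 10)
rdd1.saveAsObjectFile("/tmp/obj") // elements are serialized and written out
val back = sc.objectFile[Int]("/tmp/obj") // deserialize the objects back into an RDD[Int]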
11.countByKey
Only valid for RDDs of type (K, V). Returns a Map of (K, Long) pairs giving the number of elements for each key.
scala> val rdd1 = sc.makeRDD(Array(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 3)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> rdd1.countByKey
res1: scala.collection.Map[String,Long] = Map(B -> 2, A -> 2, C -> 1)

12.foreach
Runs a function func on each element of the dataset. This is usually done to update an accumulator variable or to interact with an external storage system; a sketch using an accumulator follows the example below.
scala> val rdd1 = sc.makeRDD(Array(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 3)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[9] at makeRDD at <console>:24

scala> rdd1.collect.foreach(println(_))
(A,0)
(A,2)
(B,1)
(B,2)
(C,3)
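Note that the example above first collects to the driver and prints there. To run func on the executors themselves, call foreach directly on the RDD. A minimal sketch assuming the Spark 2.x accumulator API (the accumulator name is made up), summing the values of rdd1 from the example above:

val sum = sc.longAccumulator("valueSum") // accumulator registered on the driver
rdd1.foreach { case (_, v) => sum.add(v) } // func runs on the executors, updating the accumulator
println(sum.value) // 0 + 2 + 1 + 2 + 3 = 8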