1. The difference between transformations and actions in Spark
Spark provides two kinds of basic operations: transformations and actions. Transformations produce new RDDs and are evaluated lazily, while actions do not produce RDDs; instead they aggregate, collect, or save an RDD's contents and trigger the actual computation.
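For example, a map is only recorded when it is declared; nothing runs until an action such as collect is called. A minimal sketch of this lazy behavior (assuming a SparkContext named sc, as in all the examples below):
val nums = sc.parallelize(List(1, 2, 3)) // transformation: only recorded, not executed
val doubled = nums.map(_ * 2) // still nothing has executed
doubled.collect() // the action triggers the actual computation and returns Array(2, 4, 6)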
2. What are the transformations?
There are 13 transformations: map, filter, flatMap (not to be confused with map), sample, groupByKey, reduceByKey, union, join, cogroup, crossProduct, mapValues, sort, and partitionBy. sortByKey is also covered below.
1、map
val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6))
val mapRdd = rdd.map(_ * 2) // this is typical functional programming
mapRdd.collect() // the map above is a transformation; collect here triggers execution. It is an action and returns Array(2, 4, 6, 8, 10, 12)
map(x => (x, 1)) maps each element x to the tuple (x, 1); it is commonly used as the first step when counting by key.
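A small sketch of that pattern on its own (the word-count example below uses it for real):
val words = sc.parallelize(List("a", "b", "a"))
words.map(x => (x, 1)).collect() // Array(("a", 1), ("b", 1), ("a", 1)), ready for reduceByKey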
2、filter
filter selects the elements that satisfy a predicate.
val filterRdd = mapRdd.filter(_ > 5)
filterRdd.collect() // returns an Array of all elements greater than 5: Array(6, 8, 10, 12)
3、flatMap plus reduceByKey
val wordcount = rdd.flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _) // assuming rdd holds lines of text: split each line on spaces, let flatMap flatten the resulting lists into one list, then turn each word into a tuple (word, 1)
// reduceByKey then adds up the values of elements that share the same key; the function passed to reduceByKey operates on the values
wordcount.saveAsTextFile("/xxx/ss/aa") // save the results to the file system
wordcount.collect // returns the result as an array
4、groupByKey
After splitting the file on spaces, group by word with groupByKey.
val wordcount = rdd.flatMap(_.split(' ')).map((_, 1)).groupByKey
Use collect to see the results:
wordcount.collect
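Unlike reduceByKey, groupByKey only gathers the values into a group per key; any counting still has to be done afterwards. One way to get word counts from the grouped RDD, sketched with mapValues (which appears in the transformation list above):
val counts = wordcount.mapValues(_.size) // replace each group of 1s by its size
counts.collect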
5、union
union merges two RDDs into one.
val rdd1 = sc.parallelize(List(('a', 1), ('a', 2)))
val rdd2 = sc.parallelize(List(('b', 1), ('b', 2)))
val result_union = rdd1 union rdd2 // the two lists are merged into one: List(('a', 1), ('a', 2), ('b', 1), ('b', 2))
6、join
join is an inner join by key; for each key, the matching value sets form a Cartesian product.
val rdd1 = sc.parallelize(List(('a', 1), ('a', 2), ('b', 3)))
val rdd2 = sc.parallelize(List(('a', 4), ('b', 5)))
val result_join = rdd1 join rdd2 // per-key Cartesian product of the values: Array(('a', (1, 4)), ('a', (2, 4)), ('b', (3, 5)))
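cogroup, listed among the transformations above but not shown yet, groups the values of both RDDs by key instead of pairing them one by one. A sketch with the same rdd1 and rdd2 (the exact collection type in the output may vary by Spark version):
val result_cogroup = rdd1 cogroup rdd2
result_cogroup.collect // roughly Array(('a', (Seq(1, 2), Seq(4))), ('b', (Seq(3), Seq(5))))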
7、sortByKey
Sorting, and very handy.
val wordcount = rdd.flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
// this actually sorts by value: swap (word, count) into (count, word), sort with sortByKey(false) in descending order, then swap back
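Later Spark versions also provide sortBy, which sorts by value directly and avoids the double swap; a sketch of the same pipeline under that assumption:
val sorted = rdd.flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false)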
3. What are the actions?
There are five actions: count, collect, reduce, lookup, and save.
1、count
count returns the number of elements in an rdd.
val rdd = sc.textFile("/xxx/sss/ee")
rdd.count // counts the number of lines
rdd.cache // keeps rdd in memory
rdd.count // counts the lines again; thanks to the cache above, this run is much faster
2、collect
The collect function retrieves all the data items in an rdd.
val rdd1 = sc.parallelize(List(('a', 1), ('b', 2)))
val rdd2 = sc.parallelize(List(('c', 1), ('d', 2)))
val result = rdd1 union rdd2
result.collect // use collect to view the merged result
3、reduce
map and reduce are the two core ideas of Hadoop: map performs mapping, and reduce performs aggregation.
val rdd = sc.parallelize(List(1, 2, 3, 4))
rdd.reduce(_ + _) // reduce is an action; the result here is 10
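Note that the function given to reduce should be commutative and associative, because partitions are combined in no fixed order; addition above is safe, and so is max, sketched here:
rdd.reduce((a, b) => math.max(a, b)) // returns 4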
4、lookup
lookup finds all the values for a given key.
val rdd = sc.parallelize(List(("a", 1), ("a", 2), ("b", 1), ("b", 2)))
rdd.lookup("a") // returns a Seq: the values of all elements whose key is "a", i.e. Seq(1, 2)
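lookup only works on key-value RDDs. The same result can be obtained with filter plus collect, at the cost of scanning the whole RDD; a sketch:
rdd.filter(_._1 == "a").map(_._2).collect() // Array(1, 2)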
5、save
Query the records whose URL ranked first in the search results and was the user's second click, then save the result.
val rdd1 = sc.textFile("hdfs://192.168.0.10:9000/input/SogouQ2.txt").map(_.split("\t")).filter(_.length == 6) // each line should have 6 tab-separated fields, but the log is not perfectly clean (some lines have 6 fields, some do not), so filter on the length
rdd1.count()
val rdd2 = rdd1.filter(_(3).toInt == 1).filter(_(4).toInt == 2)
rdd2.count()
rdd2.saveAsTextFile("hdfs://192.168.0.10:9000/output/sogou1111/")