2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to use Spark through a set of short samples: parallelized collections, key-value operations, file reading, log processing, caching, and joins. The samples use sc, the SparkContext that spark-shell provides, and are meant to be run in order.
// parallelize demo
val num = sc.parallelize(1 to 10)
val doublenum = num.map(_ * 2)
val threenum = doublenum.filter(_ % 3 == 0)
threenum.collect
threenum.toDebugString
// same pipeline, with the collection split into 6 partitions
val num1 = sc.parallelize(1 to 10, 6)
val doublenum1 = num1.map(_ * 2)
val threenum1 = doublenum1.filter(_ % 3 == 0)
threenum1.collect
threenum1.toDebugString
threenum.cache()
val fournum = threenum.map(x => x * x)
fournum.collect
fournum.toDebugString
threenum.unpersist()
num.reduce(_ + _)
num.take(5)
num.first
num.count
num.take(5).foreach(println)
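As a rough check on what the parallelize demo should return, the same steps can be run on an ordinary Scala collection with no cluster. This is only a sketch: ParallelizeSketch and its member names are hypothetical, and local collections evaluate eagerly, whereas RDD transformations stay lazy until an action such as collect runs.

```scala
// Local-collection sketch of the parallelize demo (no Spark required).
// ParallelizeSketch is a hypothetical name used only for illustration.
object ParallelizeSketch {
  val num: Seq[Int] = 1 to 10
  val doublenum: Seq[Int] = num.map(_ * 2)               // 2, 4, ..., 20
  val threenum: Seq[Int] = doublenum.filter(_ % 3 == 0)  // keep multiples of 3
  val fournum: Seq[Int] = threenum.map(x => x * x)       // square each survivor

  val total: Int = num.reduce(_ + _)     // same fold as num.reduce(_ + _) on the RDD
  val firstFive: Seq[Int] = num.take(5)  // same prefix as num.take(5)
}
```

With these inputs, threenum holds 6, 12 and 18, which is what threenum.collect should return in the shell.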
// K-V demo
val kv1 = sc.parallelize(List(("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5)))
kv1.sortByKey().collect // Note that the parentheses of sortByKey cannot be omitted
kv1.groupByKey().collect
kv1.reduceByKey(_ + _).collect
val kv2 = sc.parallelize(List(("A", 4), ("A", 4), ("C", 3), ("A", 4), ("B", 5)))
kv2.distinct.collect
kv1.union(kv2).collect
val kv3 = sc.parallelize(List(("A", 10), ("B", 20), ("D", 30)))
kv1.join(kv3).collect
kv1.cogroup(kv3).collect
val kv4 = sc.parallelize(List(List(1), List(3)))
kv4.flatMap(x => x.map(_ + 1)).collect
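The key-value operations above are easier to compare when their semantics are written out on local collections. The sketch below is one way to do that, assuming the kv1 and kv3 data from the demo. KeyValueSketch and its helper functions are hypothetical stand-ins that imitate reduceByKey, join and cogroup; they are not Spark's implementation.

```scala
// Local stand-ins for reduceByKey, join and cogroup (hypothetical helpers, not Spark code).
object KeyValueSketch {
  type Pairs = Seq[(String, Int)]

  val kv1: Pairs = Seq(("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5))
  val kv3: Pairs = Seq(("A", 10), ("B", 20), ("D", 30))

  // Like reduceByKey(_ + _): sum the values that share a key.
  def reduceByKey(pairs: Pairs): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

  // Like join: an inner join, one output element per matching (left, right) combination.
  def join(left: Pairs, right: Pairs): Seq[(String, (Int, Int))] =
    for ((k, v) <- left; (k2, w) <- right if k == k2) yield (k, (v, w))

  // Like cogroup: every key from either side, paired with the values found on each side.
  def cogroup(left: Pairs, right: Pairs): Map[String, (Seq[Int], Seq[Int])] = {
    val keys = (left ++ right).map(_._1).distinct
    keys.map { k =>
      val leftVals = left.filter(_._1 == k).map(_._2)
      val rightVals = right.filter(_._1 == k).map(_._2)
      (k, (leftVals, rightVals))
    }.toMap
  }
}
```

The difference to watch for in the shell output is that join drops keys C and D, which appear on only one side, while cogroup keeps them with an empty group for the missing side.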
// File reading demo
val rdd1 = sc.textFile("hdfs://hadoop1:8000/dataguru/week2/directory/")
rdd1.toDebugString
val words = rdd1.flatMap(_.split(" "))
val wordscount = words.map(x => (x, 1)).reduceByKey(_ + _)
wordscount.collect
wordscount.toDebugString
val rdd2 = sc.textFile("hdfs://hadoop1:8000/dataguru/week2/directory/*.txt")
rdd2.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _).collect
// gzip compressed file
val rdd3 = sc.textFile("hdfs://hadoop1:8000/dataguru/week2/test.txt.gz")
rdd3.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _).collect
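The word count used in all three file-reading samples follows one pattern: split each line into words, pair each word with 1, then sum the ones per word. A minimal local sketch of that pattern follows. WordCountSketch is a hypothetical name, and any input lines are made up for illustration rather than taken from the HDFS files.

```scala
// Local sketch of the flatMap / map / reduceByKey word count (no Spark required).
// WordCountSketch is a hypothetical name used only for illustration.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // one element per word
      .map(w => (w, 1))        // pair each word with a count of 1
      .groupBy(_._1)           // gather the pairs for each word
      .map { case (word, ones) => (word, ones.map(_._2).sum) } // sum the ones
}
```

For example, WordCountSketch.wordCount(Seq("word count with spark", "spark word count")) counts each of spark, word and count twice and with once.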
// Log processing demo
// Sogou query log, full version (2 GB, gz format): http://download.labs.sogou.com/dl/q.html
// Record format: access time\tuser ID\t[query word]\trank of the URL in the returned results\tsequence number of the user's click\tURL clicked by the user
// SogouQ1.txt, SogouQ2.txt and SogouQ3.txt were cut from the SogouQ log files with head -n or tail -n.
// How many records rank No. 1 in the search results but No. 2 in click order?
val rdd1 = sc.textFile("hdfs://hadoop1:8000/dataguru/data/SogouQ1.txt")
val rdd2 = rdd1.map(_.split("\t")).filter(_.length == 6)
rdd2.count()
val rdd3 = rdd2.filter(_(3).toInt == 1).filter(_(4).toInt == 2)
rdd3.count()
rdd3.toDebugString
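The filter logic of the log query can be tried on a few hand-written records before running it against the full file. In the sketch below the sample lines are invented for illustration and only imitate the six-field, tab-separated layout described in the comments. LogFilterSketch and its function names are hypothetical.

```scala
// Local sketch of the "rank 1, click order 2" filter (no Spark required).
// The sample records are made up; they are not real SogouQ data.
object LogFilterSketch {
  // Split on tabs and keep only records with exactly six fields.
  def parse(lines: Seq[String]): Seq[Array[String]] =
    lines.map(_.split("\t")).filter(_.length == 6)

  // Field 3 is the rank in the results, field 4 is the click sequence number.
  def rankOneClickTwo(records: Seq[Array[String]]): Seq[Array[String]] =
    records.filter(_(3).toInt == 1).filter(_(4).toInt == 2)

  val sample: Seq[String] = Seq(
    "00:00:01\tu1\t[spark]\t1\t2\thttp://example.com/a",
    "00:00:02\tu2\t[hadoop]\t1\t1\thttp://example.com/b",
    "00:00:03\tu1\t[scala]\t2\t2\thttp://example.com/c",
    "malformed line without tabs"
  )
}
```

The length check runs first so that malformed lines are dropped before toInt is applied to fields 3 and 4.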
// ranking of sessions by number of queries
val rdd4 = rdd2.map(x => (x(1), 1)).reduceByKey(_ + _).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
rdd4.toDebugString
rdd4.saveAsTextFile("hdfs://hadoop1:8000/dataguru/week2/output1")
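The ranking line relies on a swap, sort, swap pattern: sortByKey sorts by the first tuple element, so the count is moved into the key position, sorted in descending order, and then moved back. A local sketch of the same steps, with the hypothetical name SessionRankingSketch and invented user IDs in the usage note:

```scala
// Local sketch of the count / swap / sort-descending / swap-back ranking (no Spark required).
// SessionRankingSketch is a hypothetical name used only for illustration.
object SessionRankingSketch {
  def rank(userIds: Seq[String]): Seq[(String, Int)] =
    userIds
      .map(id => (id, 1))
      .groupBy(_._1)
      .map { case (id, ones) => (id, ones.size) } // queries per user ID
      .toSeq
      .map(x => (x._2, x._1))                     // swap to (count, id): count becomes the key
      .sortBy(_._1)(Ordering[Int].reverse)        // descending, like sortByKey(false)
      .map(x => (x._2, x._1))                     // swap back to (id, count)
}
```

For instance, rank(Seq("u1", "u2", "u1", "u3", "u1", "u2")) puts u1 first with 3 queries. RDDs also provide a sortBy method that takes a key function, which may remove the need for the two swaps; the original line is kept as written.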
// cache() demo
// check the block command: bin/hdfs fsck /dataguru/data/SogouQ3.txt -files -blocks -locations
val rdd5 = sc.textFile("hdfs://hadoop1:8000/dataguru/data/SogouQ3.txt")
rdd5.cache()
rdd5.count()
rdd5.count() // compare time
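Because cache() only marks the RDD for caching, the first count still reads from HDFS and fills the cache, and the second count is the one that should be faster. To compare the two runs with numbers instead of by eye, a small timing helper can wrap each call. The helper below is a sketch in plain Scala; TimingSketch and timed are hypothetical names and are not part of Spark.

```scala
// Hypothetical timing helper for comparing two runs of the same action.
object TimingSketch {
  // Evaluates body once and returns its result together with the elapsed milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

In the shell this could be used as val (n1, t1) = TimingSketch.timed(rdd5.count()) followed by val (n2, t2) = TimingSketch.timed(rdd5.count()), then comparing t1 with t2.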
// join demo
val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
case class Register(d: java.util.Date, uuid: String, cust_id: String, lat: Float, lng: Float)
case class Click(d: java.util.Date, uuid: String, landing_page: Int)
val reg = sc.textFile("hdfs://hadoop1:8000/dataguru/week2/join/reg.tsv").map(_.split("\t")).map(r => (r(1), Register(format.parse(r(0)), r(1), r(2), r(3).toFloat, r(4).toFloat)))
val clk = sc.textFile("hdfs://hadoop1:8000/dataguru/week2/join/clk.tsv").map(_.split("\t")).map(c => (c(1), Click(format.parse(c(0)), c(1), c(2).trim.toInt)))
reg.join(clk).take(2)
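The join demo keys both data sets by uuid and then pairs each Register with every Click that shares the key. The local sketch below parses a few tab-separated lines into the same case classes and joins them without a cluster. The input lines in the usage note are invented, and the column order is assumed from the parsing code above, since the article does not show reg.tsv or clk.tsv. JoinSketch and its function names are hypothetical.

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Local sketch of the uuid-keyed join (no Spark required). JoinSketch is a hypothetical name.
object JoinSketch {
  case class Register(d: Date, uuid: String, cust_id: String, lat: Float, lng: Float)
  case class Click(d: Date, uuid: String, landing_page: Int)

  // A new formatter per call: SimpleDateFormat is not thread-safe.
  private def parseDate(s: String): Date = new SimpleDateFormat("yyyy-MM-dd").parse(s)

  // Assumed columns: date, uuid, cust_id, lat, lng
  def parseRegister(line: String): (String, Register) = {
    val r = line.split("\t")
    (r(1), Register(parseDate(r(0)), r(1), r(2), r(3).toFloat, r(4).toFloat))
  }

  // Assumed columns: date, uuid, landing_page
  def parseClick(line: String): (String, Click) = {
    val c = line.split("\t")
    (c(1), Click(parseDate(c(0)), c(1), c(2).trim.toInt))
  }

  // Inner join on the key, like reg.join(clk).
  def join(regs: Seq[(String, Register)],
           clks: Seq[(String, Click)]): Seq[(String, (Register, Click))] =
    for ((k, r) <- regs; (k2, c) <- clks if k == k2) yield (k, (r, c))
}
```

A registration line such as "2014-03-02\tuuid-1\tcust-1\t39.9\t116.4" joined with a click line such as "2014-03-04\tuuid-1\t6" yields one pair keyed by uuid-1; clicks whose uuid has no registration are dropped.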
Thank you for reading. That covers how to use the Spark samples; how each operation behaves on your own data and cluster still needs to be verified in practice.