How to implement TopK with MapReduce

2025-02-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com) 06/01

This article shows how to implement a TopK-style query with MapReduce, using an HTTP traffic log as the working example. The sections below cover the requirement, the input data, the three classes involved, and the job output.

Requirements: from the HTTP log file, output the records that account for the top 80% of total traffic, sorted by traffic value in descending order
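The "top 80%" rule can be read in more than one way, so here is a minimal sketch of the reading that the reducer's cleanup() method later in this article appears to follow: walk the totals from largest to smallest and stop once the running sum has already exceeded 80% of the grand total. The class name TopShareSketch, the method topShare, and the four sample totals are hypothetical and exist only for illustration; no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopShareSketch {
    // Returns the keys, largest total first, emitted until the running sum
    // has exceeded `share` of the grand total. The check runs before each
    // emit, so the entry that crosses the threshold is still included.
    public static List<String> topShare(Map<String, Long> totals, double share) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        double grandTotal = 0;
        for (Map.Entry<String, Long> e : sorted) {
            grandTotal += e.getValue();
        }

        List<String> picked = new ArrayList<>();
        double running = 0;
        for (Map.Entry<String, Long> e : sorted) {
            if (running > grandTotal * share) {
                break;
            }
            picked.add(e.getKey());
            running += e.getValue();
        }
        return picked;
    }

    public static void main(String[] args) {
        // Hypothetical totals: grand total 100, threshold 80.
        Map<String, Long> totals = new LinkedHashMap<>();
        totals.put("A", 50L);
        totals.put("B", 30L);
        totals.put("C", 15L);
        totals.put("D", 5L);
        System.out.println(topShare(totals, 0.8)); // prints [A, B, C]
    }
}
```

Because the comparison is strict, a running sum of exactly 80% does not stop the loop, which is why C is still emitted here.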

Output format: one line per phone number, giving the phone number followed by its total traffic

HTTP log file:

1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 40 264 013
1363157991076 13926435656 20-10-7A-28-CC-0A:CMCC 120.196.100.99 24 132 1512 200
1363154400022 13926251106 5C-0E-8B-8B-B1-50:CMCC 120.197.40.4 240 0 200
1363157993044 18211575961-71 AC-CD-E6-18:CMCC-EASY 120.196.100.99 iface.qiyi.com Video website 15 12 1527 2106 200
1363157995074 84138413 5C-0E-8B-8C-E8-20:7DaysInn 120.197.40.4 122.72.52.12 20 16 4116 1432 200
1363157993055 13560439658 C4-17-FE-BA-DE-D9:CMCC 120.196.100.99 18 1516954 200
1363157995033 15920133257 5C-0E-8B-C7-BA-20:CMCC 120.197.40.4 sug.so.360.cn Information Security 20 20 3156 2936 200
1363157983019 13719199419 68-A1-B7-03-07-B1:CMCC-EASY 120.196.100.82 40 240 0 200
1363157984041 13660577991 5C-0E-8B-92-5C-20:CMCC-EASY 120.197.40.4 s19.cnzz.com site Statistics 24 9 6960 690 200
1363157973098 15013685858 5C-0E-8B-C7-F7-90:CMCC 120.197.40.4 rank.ie.sogou.com search engine 28 27 3659 3538 200
1363157986029 15989002119 E8-99-C4-4e-93-E0:CMCC-EASY 120.196.100.99 www.umeng.com site Statistics 3 3 1938 180 200
1363157992093 13560439658 C4-17-FE-BA-DE-D9:CMCC 120.196.100.99 15 9 918 4938 200
1363157986041 13480253104 5C-0E-8B-C7-FC-80:CMCC-EASY 120.197.40.4 3 180 200
1363157984040 13602846565 5C-0E-8B-8B-B6-00:CMCC 120.197.40.4 2052.flash3-http.qq.com Integrated Portal 15 12 1938 2910 200
1363157995093 13922314466 00-FD-07-A2-EC-BA:CMCC 120.196.100.82 img.qfc.cn 12 3008 3720 200
1363157982040 13502468823 5C-0A-5B-6A-0B-D4:CMCC-EASY 120.196.100.99 y0.ifengimg.com Integrated Portal 57 102 7335 110349 200
1363157986072 18320173382-25-DB-4F-10-1A:CMCC-EASY 120.196.100.99 input.shouji.sogou.com search engine 21 18 9531 2412 200
1363157990043 13925057413-1Fly64 E6-9A:CMCC 120.196.100.55 t3.baidu.com search engine 69 63 11058 48243 200
1363157988072 13760778710 00-FD-07-A4-7B-08:CMCC 120.196.100.82 2 120 200
1363157985066 13726238888 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 2481 2481 200
1363157993055 13560436666 C4-17-FE-BA-DE-D9:CMCC 120.196.100.99 18 15 1116 954 200

Define a FlowBean class that implements the WritableComparable interface, with the write(), readFields(), and compareTo() methods

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class FlowBean implements WritableComparable<FlowBean> {
    private String phoneNB;  // phone number
    private long up_flow;    // uplink traffic
    private long down_flow;  // downlink traffic
    private long sum_flow;   // total traffic

    // No-argument constructor, needed so the bean can be created before readFields() fills it
    public FlowBean() {}

    public FlowBean(String phoneNB, long up_flow, long down_flow) {
        this.phoneNB = phoneNB;
        this.up_flow = up_flow;
        this.down_flow = down_flow;
        this.sum_flow = up_flow + down_flow;
    }

    public String getPhoneNB() { return phoneNB; }
    public void setPhoneNB(String phoneNB) { this.phoneNB = phoneNB; }
    public long getUp_flow() { return up_flow; }
    public void setUp_flow(long up_flow) { this.up_flow = up_flow; }
    public long getDown_flow() { return down_flow; }
    public void setDown_flow(long down_flow) { this.down_flow = down_flow; }
    public long getSum_flow() { return sum_flow; }
    public void setSum_flow(long sum_flow) { this.sum_flow = sum_flow; }

    /* up_flow + "\t" + down_flow + "\t" + sum_flow */
    @Override
    public String toString() {
        return up_flow + "\t" + down_flow + "\t" + sum_flow;
    }

    /* Serialization: fields are written in the same order readFields() reads them */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phoneNB);
        out.writeLong(up_flow);
        out.writeLong(down_flow);
        out.writeLong(sum_flow);
    }

    /* Deserialization: fields are read in the same order write() wrote them */
    @Override
    public void readFields(DataInput in) throws IOException {
        phoneNB = in.readUTF();
        up_flow = in.readLong();
        down_flow = in.readLong();
        sum_flow = in.readLong();
    }

    /* Sort by total traffic in descending order. Equal totals return 1 rather than 0,
       because two FlowBean objects with the same total are still different records
       and must not be treated as the same key. */
    @Override
    public int compareTo(FlowBean o) {
        if (sum_flow == o.sum_flow) {
            return 1;
        }
        return -Long.compare(sum_flow, o.sum_flow);
    }
}
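To make the "same order" rule for write() and readFields() concrete, here is a small round-trip sketch using only java.io streams. The class FieldOrderSketch and its nested Record type are hypothetical stand-ins for FlowBean so the example runs without Hadoop on the classpath; the sample values come from the first log record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FieldOrderSketch {
    // Hypothetical stand-in for FlowBean, without the Hadoop dependency.
    static class Record {
        String phoneNB = "";
        long upFlow;
        long downFlow;

        void write(DataOutput out) throws IOException {
            out.writeUTF(phoneNB);
            out.writeLong(upFlow);
            out.writeLong(downFlow);
        }

        void readFields(DataInput in) throws IOException {
            // Reads must mirror the order used in write().
            phoneNB = in.readUTF();
            upFlow = in.readLong();
            downFlow = in.readLong();
        }
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    static Record roundTrip(Record original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            Record copy = new Record();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Record r = new Record();
        r.phoneNB = "13726230503";
        r.upFlow = 2481;
        r.downFlow = 24681;
        Record copy = roundTrip(r);
        // prints 13726230503 2481 24681
        System.out.println(copy.phoneNB + " " + copy.upFlow + " " + copy.downFlow);
    }
}
```

The byte stream carries no field names, so swapping two reads in readFields() would either assign values to the wrong fields or fail while decoding.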

Define the Mapper class TopKFlowMapper and override the map() method

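import java.io.IOException;

import org.apache.commons.lang.StringUtils; // assumed: Apache Commons Lang
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TopKFlowMapper extends Mapper<LongWritable, Text, Text, FlowBean> {
    // mapper output format: <phoneNB, FlowBean>
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Fields are tab-separated; positions are counted from the start of the line
        String[] data = StringUtils.split(line, "\t");
        String phoneNB = data[1];
        long up_flow = Long.parseLong(data[7]);
        long down_flow = Long.parseLong(data[8]);
        context.write(new Text(phoneNB), new FlowBean(phoneNB, up_flow, down_flow));
    }
}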
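As a worked example of the field extraction in map(), the sketch below parses the first sample record with the same positions (1, 7, and 8). It assumes the ten fields of that record are separated by single tabs, since the tabs did not survive in the listing above, and it uses the JDK's String.split in place of StringUtils.split so it runs without extra libraries. The class and method names are hypothetical.

```java
public class ParseLineSketch {
    // Phone number sits at position 1, as in the mapper.
    static String parsePhone(String line) {
        return line.split("\t")[1];
    }

    // Returns {up_flow, down_flow} taken from positions 7 and 8, as in the mapper.
    static long[] parseFlows(String line) {
        String[] data = line.split("\t");
        return new long[] { Long.parseLong(data[7]), Long.parseLong(data[8]) };
    }

    public static void main(String[] args) {
        // First sample record, with tabs assumed between its ten fields.
        String line = String.join("\t",
                "1363157985066", "13726230503", "00-FD-07-A4-72-B8:CMCC",
                "120.196.100.82", "i02.c.aliimg.com", "24", "27", "2481", "24681", "200");
        long[] flows = parseFlows(line);
        // prints 13726230503 followed by a tab and 27162
        System.out.println(parsePhone(line) + "\t" + (flows[0] + flows[1]));
    }
}
```

The two splitters differ on empty fields: String.split keeps interior empty strings, while StringUtils.split treats adjacent separators as one. In the sample, records that include a site-category column have one more field ahead of the traffic columns, so it is worth checking which columns positions 7 and 8 select for those records.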

Define the Reducer class TopKFlowReducer: implement reduce() and override the cleanup() method

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.VLongWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class TopKFlowReducer extends Reducer<Text, FlowBean, Text, VLongWritable> {
    // TreeMap keeps its FlowBean keys sorted in descending order of total traffic: <FlowBean, phoneNB>
    private Map<FlowBean, String> treeMap = new TreeMap<>();
    // global traffic counter, initial value 0
    private double globalFlow = 0;

    // reducer input format: <phoneNB, Iterable<FlowBean>>
    @Override
    protected void reduce(Text key, Iterable<FlowBean> values, Context context)
            throws IOException, InterruptedException {
        long up_sum = 0;
        long down_sum = 0;
        for (FlowBean bean : values) {
            up_sum += bean.getUp_flow();
            down_sum += bean.getDown_flow();
        }
        // each time the total traffic of a phoneNB is known, add it to the global counter globalFlow
        globalFlow += (up_sum + down_sum);
        // let TreeMap sort the FlowBean objects in descending order of total traffic
        treeMap.put(new FlowBean("", up_sum, down_sum), key.toString());
    }

    // cleanup() is called once, after the last reduce() call
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        double itemCount = 0; // traffic emitted so far
        for (Map.Entry<FlowBean, String> item : treeMap.entrySet()) {
            if (itemCount > globalFlow * 0.8) {
                return;
            }
            // output only the records that make up the top 80% of globalFlow
            context.write(new Text(item.getValue()), new VLongWritable(item.getKey().getSum_flow()));
            itemCount += item.getKey().getSum_flow();
        }
    }
}
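The reducer relies on FlowBean.compareTo() never returning 0, so that two phone numbers with the same total both stay in the TreeMap. The sketch below illustrates that effect with plain Long keys standing in for FlowBean; the class name TieOrderSketch and the comparator name are hypothetical, and the totals are taken from the job output shown in this article.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TieOrderSketch {
    // Mirrors FlowBean.compareTo: descending order, and ties report 1 instead of 0.
    static final Comparator<Long> DESC_NEVER_EQUAL =
            (a, b) -> a.equals(b) ? 1 : -Long.compare(a, b);

    // Inserts three entries, two of which share the same total.
    static TreeMap<Long, String> build(Comparator<Long> order) {
        TreeMap<Long, String> map = new TreeMap<>(order);
        map.put(27162L, "13726230503");
        map.put(11121L, "13925057413");
        map.put(27162L, "13726238888"); // same total as the first entry
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> keepsTies = build(DESC_NEVER_EQUAL);
        TreeMap<Long, String> dropsTies = build(Comparator.reverseOrder());

        System.out.println(keepsTies.size());      // prints 3
        System.out.println(dropsTies.size());      // prints 2
        System.out.println(keepsTies.get(27162L)); // prints null
    }
}
```

The trade-off is visible in the last line: a comparator that never reports equality breaks key lookups, so a map built this way is only safe to iterate, which is all cleanup() does. Sorting a list of entries with a comparator that breaks ties on the phone number would be one alternative that keeps the comparison contract intact.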

Test TopK

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.VLongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TopKFlowRunner {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());

        job.setJarByClass(TopKFlowRunner.class);      // set the main class of the job
        job.setMapperClass(TopKFlowMapper.class);     // set the Mapper class
        job.setReducerClass(TopKFlowReducer.class);   // set the Reducer class

        job.setMapOutputKeyClass(Text.class);         // key type output by the map phase
        job.setMapOutputValueClass(FlowBean.class);   // value type output by the map phase
        job.setOutputKeyClass(Text.class);            // key type output by the reduce phase
        job.setOutputValueClass(VLongWritable.class); // value type output by the reduce phase

        // The 80% cutoff in cleanup() is global only with a single reduce task, which is the default

        // set the job input path (taken from the main method argument args[0])
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // set the job output path (taken from the main method argument args[1])
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true); // submit the job and wait for it to finish
    }
}

The result file of job output:

13726230503 27162

13726238888 27162

13925057413 11121

18320173382 9549

13502468823 7437

13660577991 6969

13922314466 6728

13560439658 6292
