6. MapReduce sorting example: get the highest-priced item in each order

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. Requirement

Get the most expensive item in each order

Knowledge points used:

Custom sorting, including general sorting, secondary sorting, and grouping

Custom partitioner

2. Data input and output formats

Data input format:

One record for each item sold, with tab-separated fields:

order id    goods id    commodity price
0000001     Pdt_01      222.8
0000002     Pdt_06      722.4
0000001     Pdt_05      25.8
0000003     Pdt_01      222.8
0000003     Pdt_01      33.8
0000002     Pdt_03      522.8
0000002     Pdt_04      122.4

Data output format:

Each order gets its own output file, and each file contains the record of the most expensive item in that order.

3. Analysis

Map phase:

Because we need the most expensive item in each order, the records must go through a secondary sort: first by order number, then by commodity price. To make that happen, the map step packs the order number, commodity id, and commodity price into a bean object and emits it as the output key, so the framework sorts the records with the bean's compareTo during the shuffle.
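The ordering described above can be tried out without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: the class SecondarySortSketch and its nested Item class are hypothetical stand-ins for OrderBean, and the sample values are the input records from section 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch (no Hadoop): mimics the order in which the shuffle
// would arrange the map output keys.
class SecondarySortSketch {

    // Hypothetical stand-in for the article's OrderBean.
    static final class Item {
        final int orderId;
        final String productId;
        final double price;

        Item(int orderId, String productId, double price) {
            this.orderId = orderId;
            this.productId = productId;
            this.price = price;
        }

        @Override
        public String toString() {
            return orderId + "\t" + productId + "\t" + price;
        }
    }

    // Order id ascending first, then price descending within one order.
    static final Comparator<Item> ORDER_THEN_PRICE_DESC =
            Comparator.comparingInt((Item i) -> i.orderId)
                    .thenComparing(Comparator.comparingDouble((Item i) -> i.price).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    static List<Item> sorted(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(ORDER_THEN_PRICE_DESC);
        return copy;
    }

    // The seven sample records from the input format section.
    static List<Item> sample() {
        return Arrays.asList(
                new Item(1, "Pdt_01", 222.8),
                new Item(2, "Pdt_06", 722.4),
                new Item(1, "Pdt_05", 25.8),
                new Item(3, "Pdt_01", 222.8),
                new Item(3, "Pdt_01", 33.8),
                new Item(2, "Pdt_03", 522.8),
                new Item(2, "Pdt_04", 122.4));
    }

    public static void main(String[] args) {
        for (Item item : sorted(sample())) {
            System.out.println(item);
        }
    }
}
```

After sorting, the first record printed for each order id is that order's most expensive item, which is the property the reduce phase relies on.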

Custom partitioning:

Our requirement is to find the most expensive item within each order, so all items of the same order must land in the same partition (this matters once the number of partitions is greater than 1). If they were spread across partitions, the result could not be computed, because reduce tasks do not share data with one another. The implementation therefore uses a custom partitioner that partitions on the order id, so items with the same order id fall into the same partition. Within each partition the keys are sorted by order id and then by price, so the items of one order end up sorted by price.
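To see where each order would land, the partition arithmetic can be run on its own. This is a small sketch assuming three reduce tasks, as configured in the driver; PartitionSketch and partitionFor are hypothetical names, and the expression is the same one used in the OrderPartitioner listing in section 4.

```java
// Plain-Java sketch (no Hadoop) of partitioning by order id.
class PartitionSketch {

    // Masking with Integer.MAX_VALUE clears the sign bit, so the result of
    // the modulo is never negative, even for a negative order id.
    static int partitionFor(int orderId, int numPartitions) {
        return (orderId & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3; // assumption: matches job.setNumReduceTasks(3)
        for (int orderId = 1; orderId <= 3; orderId++) {
            System.out.println("order " + orderId + " -> partition "
                    + partitionFor(orderId, numPartitions));
        }
    }
}
```

With three partitions, orders 1, 2, and 3 map to partitions 1, 2, and 0 respectively, so each sample order ends up in its own reduce task and therefore in its own output file.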

Reduce phase:

The map output arrives with the items of each order already sorted by price from highest to lowest, so the first item of each order is its highest-priced item, and the reducer only needs to take the first KV. A custom grouping comparator gathers the KVs that share an order id but describe different goods into one group, even though their keys are actually different. The key passed to reduce for a group is the key of the first KV in that group, which after the preceding sort is the highest-priced item of the order, so it can be written out directly.
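The effect of grouping by order id and keeping only the first record can be illustrated in plain Java. This is a sketch under the assumption that the input is already sorted by order id and then by descending price; GroupFirstSketch and its methods are hypothetical names, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hadoop): one "reduce call" per run of equal order ids,
// emitting only the first record of each run.
class GroupFirstSketch {

    // Expects tab-separated lines already sorted by order id, then price descending.
    static List<String> firstOfEachGroup(List<String> sortedLines) {
        List<String> result = new ArrayList<>();
        String previousOrderId = null;
        for (String line : sortedLines) {
            String orderId = line.split("\t")[0];
            // A change of order id marks the start of a new group.
            if (!orderId.equals(previousOrderId)) {
                result.add(line);
                previousOrderId = orderId;
            }
        }
        return result;
    }

    // The sample records from section 2, in sorted order.
    static List<String> sortedSample() {
        return Arrays.asList(
                "0000001\tPdt_01\t222.8",
                "0000001\tPdt_05\t25.8",
                "0000002\tPdt_06\t722.4",
                "0000002\tPdt_03\t522.8",
                "0000002\tPdt_04\t122.4",
                "0000003\tPdt_01\t222.8",
                "0000003\tPdt_01\t33.8");
    }

    public static void main(String[] args) {
        for (String line : firstOfEachGroup(sortedSample())) {
            System.out.println(line);
        }
    }
}
```

The sketch only works because of the preceding sort: if the records of one order were not adjacent and ordered by price, the first record of a group would not be the most expensive one.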

4. Code implementation

OrderBean

package GroupOrder;

import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

@Setter
@Getter
@NoArgsConstructor
@AllArgsConstructor
public class OrderBean implements WritableComparable<OrderBean> {
    private int ID;
    private String productID;
    private double price;

    /**
     * Secondary sort: sort by ID ascending first.
     * If the IDs are the same, sort by commodity price descending.
     */
    @Override
    public int compareTo(OrderBean o) {
        if (this.ID > o.getID()) {
            return 1;
        } else if (this.ID < o.getID()) {
            return -1;
        } else {
            // Higher price first; returns 0 when the prices are equal
            return Double.compare(o.getPrice(), this.price);
        }
    }

    // Serialization: the field order here must match readFields
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(this.ID);
        dataOutput.writeDouble(this.price);
        dataOutput.writeUTF(this.productID);
    }

    // Deserialization: read the fields in the same order they were written
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.ID = dataInput.readInt();
        this.price = dataInput.readDouble();
        this.productID = dataInput.readUTF();
    }

    @Override
    public String toString() {
        return this.ID + "\t" + this.productID + "\t" + this.price;
        // return this.ID + "\t" + this.price;
    }
}

Map

package GroupOrder;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class OrderMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {
    // Reused for every record to avoid allocating a new key per call
    OrderBean k = new OrderBean();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();

        // Fields: order id, goods id, commodity price (tab-separated)
        String[] fields = line.split("\t");

        k.setID(Integer.parseInt(fields[0]));
        k.setProductID(fields[1]);
        k.setPrice(Double.parseDouble(fields[2]));

        // The bean carries all the data, so the value is empty
        context.write(k, NullWritable.get());
    }
}

Partitioner

package GroupOrder;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class OrderPartitioner extends Partitioner<OrderBean, NullWritable> {
    // Partition according to order id
    @Override
    public int getPartition(OrderBean orderBean, NullWritable nullWritable, int numPartitions) {
        // Clear the sign bit so the partition index is never negative
        return (orderBean.getID() & Integer.MAX_VALUE) % numPartitions;
    }
}

Reduce

package GroupOrder;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // The key of the group is its first record, i.e. the highest-priced item of the order
        context.write(key, NullWritable.get());
    }
}

GroupingComparator

package GroupOrder;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

/**
 * Customizes the basis on which records are grouped before reduce.
 */
public class OrderGroupCompartor extends WritableComparator {
    protected OrderGroupCompartor() {
        // true: let the parent class create OrderBean instances for comparison
        super(OrderBean.class, true);
    }

    /**
     * Groups by the ID in the OrderBean object.
     * Records with the same ID are treated as one group, and reduce is called only once per group.
     *
     * @param a the first key
     * @param b the second key
     * @return a negative value, zero, or a positive value depending on how the IDs compare
     */
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean aOrderBean = (OrderBean) a;
        OrderBean bOrderBean = (OrderBean) b;

        if (aOrderBean.getID() > bOrderBean.getID()) {
            return 1;
        } else if (aOrderBean.getID() < bOrderBean.getID()) {
            return -1;
        } else {
            return 0;
        }
    }
}

Driver

package GroupOrder;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class OrderDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Hard-coded local input file and output directory for testing
        args = new String[]{"G:\\test\\A\\GroupingComparator.txt", "G:\\test\\A\\comparator6\\"};

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(OrderDriver.class);
        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);

        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        // Set the partitioner implementation class; three reduce tasks give one partition per sample order
        job.setPartitionerClass(OrderPartitioner.class);
        job.setNumReduceTasks(3);

        // Set the grouping comparator implementation class
        job.setGroupingComparatorClass(OrderGroupCompartor.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
