Jar package:

org.apache.hadoop : hadoop-core : 1.2.1 (groupId : artifactId : version)

After 2.x, hadoop-core was split into a number of separate artifacts; there is no longer a single core package.
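For reference, a minimal sketch of how this looks in a Maven POM. The 1.x coordinate is the one given above; the 2.x artifacts shown (hadoop-common and hadoop-mapreduce-client-core) are the usual minimum for a MapReduce job, and the 2.x version number is only illustrative, not from the original article:

<!-- 1.x: a single core artifact -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>

<!-- 2.x: the core is split; a MapReduce job typically needs at least these
     (version 2.7.7 is an example only) -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.7</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.7</version>
</dependency>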
Code:
package org.conan.myhadoop.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
// org.apache.hadoop.mapred is the old API; org.apache.hadoop.mapreduce is the new API
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * ModuleMapReduce Class
 * A reusable MapReduce template.
 */
public class ModuleMapReduce extends Configured implements Tool {

    /**
     * ModuleMapper Class
     * A Javadoc comment does more than document the code: hover over the
     * annotated method in the IDE and it shows the comment's content.
     */
    public static class ModuleMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        public void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // TODO
        }

        @Override
        public void cleanup(Context context) throws IOException, InterruptedException {
            super.cleanup(context);
        }
    }

    /**
     * ModuleReducer Class
     */
    public static class ModuleReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

        @Override
        public void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }

        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // TODO
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            super.cleanup(context);
        }
    }

    // Driver
    // @Override // when implementing an interface, JDK 1.5 and 1.7 report an
    // error on this annotation; only 1.6 accepts it
    public int run(String[] args) throws Exception {
        Job job = parseInputAndOutput(this, this.getConf(), args);
        if (job == null) {
            return 1; // invalid arguments
        }

        // 2. set job
        // step 1: set input format
        job.setInputFormatClass(TextInputFormat.class);

        // step 3: set mapper class
        job.setMapperClass(ModuleMapper.class);

        // step 4: set map output key/value class
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);

        // step 5: set shuffle (sort, combiner, group)
        // set sort
        job.setSortComparatorClass(LongWritable.Comparator.class);
        // set combiner (optional, default is unset); must be a subclass of Reducer
        job.setCombinerClass(ModuleReducer.class);
        // set grouping
        job.setGroupingComparatorClass(LongWritable.Comparator.class);

        // step 6: set reducer class
        job.setReducerClass(ModuleReducer.class);

        // step 7: set job output key/value class
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // step 8: set output format (a concrete class; FileOutputFormat itself
        // is abstract and cannot be instantiated at runtime)
        job.setOutputFormatClass(TextOutputFormat.class);

        // step 10: submit job
        boolean isCompletion = job.waitForCompletion(true);
        return isCompletion ? 0 : 1;
    }

    public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws IOException {
        // validate the input parameters
        if (args.length != 2) {
            // %s is replaced by the class name
            System.err.printf("Usage: %s [generic options] <input> <output>\n",
                    tool.getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return null;
        }
        // 1. create job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName());
        job.setJarByClass(ModuleMapReduce.class);
        // step 2: set input path
        Path inputPath = new Path(args[0]);
        FileInputFormat.addInputPath(job, inputPath);
        // step 9: set output path (args[1]; the original mistakenly reused args[0])
        Path outputPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outputPath);
        return job;
    }

    public static void main(String[] args) {
        try {
            // the return value is isCompletion ? 0 : 1
            int status = ToolRunner.run(new ModuleMapReduce(), args);
            // System.exit stops the virtual machine and exits the application;
            // 0 means a normal exit with no error
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
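As a usage sketch: because the template runs through ToolRunner, generic options (-D, -files, etc.) work on the command line. The jar name and HDFS paths below are placeholders, not from the original article:

hadoop jar module-mr.jar org.conan.myhadoop.mr.ModuleMapReduce /user/hadoop/in /user/hadoop/out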
Inverted index code
The input file is as follows:
13588888888 112
13678987879 13509098987
18987655436 110
2543789 112
15699807656 110
011-678987 112
Description: each line is one telephone call record; the number on the left (call it a) dials the number on the right (call it b), separated by a space.
Requirement:
Output the above file in the following format:
110 18987655436 | 15699807656
112 13588888888 | 2543789 | 011-678987
13509098987 13678987879
Description: the called number b is on the left; on the right are all the numbers a that called b, separated by "|".
package org.conan.myhadoop.mr;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReverseIndex extends Configured implements Tool {

    enum Counter {
        LINESKIP, // lines with errors
    }

    public static class Map extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString(); // read one line of source data
            try {
                // parse the line
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0];
                String bnum = lineSplit[1];
                // output: key = called number b, value = calling number a
                context.write(new Text(bnum), new Text(anum));
            } catch (java.lang.ArrayIndexOutOfBoundsException e) {
                context.getCounter(Counter.LINESKIP).increment(1); // error counter + 1
                return;
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String valueString;
            String out = "";
            for (Text value : values) {
                valueString = value.toString();
                out += valueString + "|";
                System.out.println("Reduce: key=" + key + " value=" + value);
            }
            context.write(key, new Text(out));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();
        Job job = new Job(conf, "ReverseIndex");  // task name
        job.setJarByClass(ReverseIndex.class);    // class to run

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path

        job.setMapperClass(Map.class);                   // the Map class above is the map task code
        job.setReducerClass(ReverseIndex.Reduce.class);  // the Reduce class above is the reduce task code
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);    // format of the output KEY
        job.setOutputValueClass(Text.class);  // format of the output VALUE

        job.waitForCompletion(true);

        // print task statistics after completion
        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task successful: " + (job.isSuccessful() ? "Yes" : "No"));
        System.out.println("Input lines: " + job.getCounters()
                .findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue());
        System.out.println("Output lines: " + job.getCounters()
                .findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue());
        System.out.println("Skipped lines: " + job.getCounters()
                .findCounter(Counter.LINESKIP).getValue());
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // check the number of arguments; if wrong, print the program usage
        if (args.length != 2) {
            System.err.println("");
            System.err.println("Usage: ReverseIndex < input path > < output path >");
            System.err.println("Example: hadoop jar ~/ReverseIndex.jar "
                    + "hdfs://localhost:9000/in/telephone.txt hdfs://localhost:9000/out");
            System.exit(-1);
        }

        // record the start time
        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        // run the task
        int res = ToolRunner.run(new Configuration(), new ReverseIndex(), args);

        // print the elapsed time
        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Task time: " + String.valueOf(time) + " minutes");
        System.exit(res);
    }
}
Deduplication code
// Mapper task
static class DDMap extends Mapper<LongWritable, Text, Text, Text> {
    private static Text line = new Text();

    protected void map(LongWritable k1, Text v1, Context context) {
        line = v1;
        Text text = new Text("");
        try {
            // write each whole line as the key; duplicate lines collapse
            // to one group during the shuffle
            context.write(line, text);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

// Reducer task
static class DDReduce extends Reducer<Text, Text, Text, Text> {
    protected void reduce(Text k2, Iterable<Text> v2s, Context context) {
        Text text = new Text("");
        try {
            // each distinct key is written exactly once, which removes duplicates
            context.write(k2, text);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
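The snippet above defines only the mapper and reducer. A minimal driver sketch to wire them into a job might look like the following; the class name DeDup and the surrounding boilerplate are assumptions, not part of the original snippet, and DDMap/DDReduce are assumed to be declared as the static inner classes of this driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: DDMap and DDReduce from the snippet above are
// assumed to be nested inside this class.
public class DeDup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "DeDup");
        job.setJarByClass(DeDup.class);
        job.setMapperClass(DDMap.class);
        job.setReducerClass(DDReduce.class);
        // map output and final output are both Text/Text
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}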
Reference articles

A classic MapReduce template code, inverted index (ReverseIndex)
http://blog.itpub.net/26400547/viewspace-1214945/

MapReduce application scenarios explained in detail: data deduplication and inverted indexing
http://www.tuicool.com/articles/emi6Fb