
6. MapReduce custom partition implementation

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

The default partitioner in MapReduce is HashPartitioner.

Principle: compute the hash of the key emitted by map, then take it modulo the number of reduce tasks; the result determines which reduce task receives that output key-value pair.
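The hash-and-mod rule above can be sketched in plain Java. This is only a sketch: the real org.apache.hadoop.mapreduce.lib.partition.HashPartitioner operates on Hadoop Writable keys, while the class name HashPartitionSketch and the use of String keys here are illustrative assumptions.

```java
// A minimal sketch of the default HashPartitioner logic, assuming plain
// String keys instead of Hadoop's Writable types.
public class HashPartitionSketch {

    // Mask off the sign bit so the hash is non-negative, then take it
    // modulo the number of reduce tasks.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Every occurrence of the same key maps to the same partition,
        // so all of a key's pairs end up at the same reduce task.
        System.out.println(getPartition("Dear", 4) == getPartition("Dear", 4));
    }
}
```

Because the partition index depends only on the key's hash, identical keys always go to the same reducer, which is what makes per-key aggregation in the reduce phase correct.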

Custom partitioning requires inheriting Partitioner and overriding the getPartition() method.


Attached: the test input text

Dear Dear Bear Bear River Car Dear Dear Bear River
Dear Dear Bear Bear River Car Dear Dear Bear River

You need to specify the custom partitioner class in the main function.

Custom partitioner class:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

import java.util.HashMap;

public class CustomPartitioner extends Partitioner<Text, IntWritable> {
    // Text is the map output key type; IntWritable is the map output value type
    public static HashMap<String, Integer> dict = new HashMap<>();

    static {
        dict.put("Dear", 0);
        dict.put("Bear", 1);
        dict.put("River", 2);
        dict.put("Car", 3);
    }

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numReduceTasks) {
        int partitionIndex = dict.get(text.toString());
        return partitionIndex;
    }
}

Note: the map output is a key-value pair. In int partitionIndex = dict.get(text.toString());, partitionIndex is the partition number that the dictionary maps the map output key K to.
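The routing performed by this dictionary lookup can be checked without Hadoop. The sketch below is illustrative only: DictPartitionDemo is a hypothetical helper class, but the dictionary entries and the sample line come from this article.

```java
import java.util.HashMap;
import java.util.Map;

// A plain-Java simulation of the dictionary-based partitioner: count how many
// map output pairs each of the four reduce tasks would receive.
public class DictPartitionDemo {
    public static final Map<String, Integer> DICT = new HashMap<>();
    static {
        DICT.put("Dear", 0);
        DICT.put("Bear", 1);
        DICT.put("River", 2);
        DICT.put("Car", 3);
    }

    public static int getPartition(String word) {
        return DICT.get(word);
    }

    public static void main(String[] args) {
        String sample = "Dear Dear Bear Bear River Car Dear Dear Bear River";
        int[] perPartition = new int[4];
        for (String word : sample.split(" ")) {
            perPartition[getPartition(word)]++;
        }
        // Partition 0 receives all "Dear" pairs, partition 1 all "Bear",
        // partition 2 all "River", partition 3 all "Car".
        for (int i = 0; i < 4; i++) {
            System.out.println("partition " + i + ": " + perPartition[i]);
        }
    }
}
```

With four reduce tasks configured, each reducer therefore processes exactly one word, so each output file contains the count for a single word.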

Mapper class:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] words = value.toString().split("\t");
        for (String word : words) {
            // each occurrence of a word counts once; emit (word, 1) as an intermediate result
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

Reducer class:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // sum the occurrence counts for this word
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Main function:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountMain {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args == null || args.length != 2) {
            System.out.println("please input Path!");
            System.exit(0);
        }

        Configuration configuration = new Configuration();
        configuration.set("mapreduce.job.jar", "/home/bruce/project/kkbhdp01/target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar");
        Job job = Job.getInstance(configuration, WordCountMain.class.getSimpleName());

        // package the jar
        job.setJarByClass(WordCountMain.class);

        // set the input/output formats via the job (the text formats are the defaults)
        // job.setInputFormatClass(TextInputFormat.class);
        // job.setOutputFormatClass(TextOutputFormat.class);

        // set the input/output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // set the classes that handle the map/reduce phases
        job.setMapperClass(WordCountMap.class);
        // map-side combine
        // job.setCombinerClass(WordCountReduce.class);
        job.setReducerClass(WordCountReduce.class);

        // if the map and reduce output kv types are the same, setting the reduce
        // output kv types is enough; otherwise also set the map output kv types
        // job.setMapOutputKeyClass(...);

        // set the types of the final output key/value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // register the custom partitioner and match the reduce task count
        // to the number of partitions it produces
        job.setPartitionerClass(CustomPartitioner.class);
        job.setNumReduceTasks(4);

        // submit the job
        job.waitForCompletion(true);
    }
}

Parameter settings for the main function: pass the input path and the output path as the two program arguments (args[0] and args[1]).
