2025-02-22 Update From: SLTechnology News&Howtos
Brief introduction:
MapReduce is a programming model for parallel processing of large datasets (typically larger than 1 TB). It achieves reliability by distributing the operations on a dataset across the nodes of a cluster, which lets programmers run their programs on a distributed system without writing distributed parallel code themselves.
The partitioner that MapReduce ships with by default is HashPartitioner.
Principle:
HashPartitioner first computes the hash value of the key emitted by the map phase, then takes it modulo the number of reduce tasks. The result decides which reduce task receives that output key-value pair.
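Stripped of the Hadoop types, the default scheme can be sketched as follows. The class name and the printed output are illustrative; the formula mirrors the hash-then-modulo logic described above, with a bitmask to keep the hash non-negative:

```java
public class HashPartitionDemo {
    // Masking with Integer.MAX_VALUE clears the sign bit, so the
    // modulo always yields a valid partition index in [0, numReduceTasks).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String word : new String[]{"Dear", "Bear", "River", "Car"}) {
            System.out.println(word + " -> reduce task " + partition(word, 4));
        }
    }
}
```

Because the hash of a given key is stable, every occurrence of the same word lands on the same reduce task, which is what makes aggregation in the reducer correct.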
A custom partitioner must extend Partitioner and override the getPartition() method.
Attached: the sample input text
Dear Dear Bear Bear River Car Dear Dear Bear River Dear Dear Bear Bear River Car Dear Dear Bear River
You need to register the custom partitioner class in the main function (via job.setPartitionerClass).
Custom partitioner class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

import java.util.HashMap;

// Text is the map output key type, IntWritable the map output value type
public class CustomPartitioner extends Partitioner<Text, IntWritable> {
    public static HashMap<String, Integer> dict = new HashMap<>();

    static {
        dict.put("Dear", 0);
        dict.put("Bear", 1);
        dict.put("River", 2);
        dict.put("Car", 3);
    }

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        int partitionIndex = dict.get(text.toString());
        return partitionIndex;
    }
}
Note: the map output is a Text/IntWritable key-value pair. In int partitionIndex = dict.get(text.toString());, partitionIndex is the partition number stored in dict for the map output key K.
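One caveat with this lookup: dict.get returns null for any word that is not in the table, and unboxing that null inside getPartition throws a NullPointerException. A minimal defensive sketch, using getOrDefault to route unknown words to a fallback partition (the class name and the fallback value 3 are illustrative, not part of the original article):

```java
import java.util.HashMap;

public class DictPartitionDemo {
    static final HashMap<String, Integer> dict = new HashMap<>();

    static {
        dict.put("Dear", 0);
        dict.put("Bear", 1);
        dict.put("River", 2);
        dict.put("Car", 3);
    }

    // getOrDefault returns the fallback partition for words missing from
    // the table, instead of null (which would NPE on unboxing to int).
    static int partition(String word) {
        return dict.getOrDefault(word, 3);
    }

    public static void main(String[] args) {
        System.out.println(partition("Dear"));  // known key -> 0
        System.out.println(partition("Tiger")); // unknown key -> fallback 3
    }
}
```

In a real job, an unexpected token in the input would otherwise fail the whole map task, so a fallback partition is a common safety choice.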
Mapper class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // splits on tabs as in the original; for the space-separated
        // sample text, split("\\s+") would be needed instead
        String[] words = value.toString().split("\t");
        for (String word : words) {
            // each occurrence of a word is emitted once as an intermediate result
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
Reducer class:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the counts of all occurrences of this word
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Main function:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountMain {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // check for null before dereferencing args
        if (args == null || args.length != 2) {
            System.out.println("please input Path!");
            System.exit(0);
        }

        Configuration configuration = new Configuration();
        configuration.set("mapreduce.job.jar",
                "/home/bruce/project/kkbhdp01/target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar");

        Job job = Job.getInstance(configuration, WordCountMain.class.getSimpleName());
        // package the jar
        job.setJarByClass(WordCountMain.class);

        // set the input/output format through the job (these are the defaults)
        // job.setInputFormatClass(TextInputFormat.class);
        // job.setOutputFormatClass(TextOutputFormat.class);

        // set the input/output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // set the classes that handle the map/reduce phases
        job.setMapperClass(WordCountMap.class);
        // map-side combine
        // job.setCombinerClass(WordCountReduce.class);
        job.setReducerClass(WordCountReduce.class);

        // if the map and reduce output kv pairs have the same types, setting the
        // reduce output types is enough; otherwise also set the map output types
        // job.setMapOutputKeyClass(Text.class);
        // job.setMapOutputValueClass(IntWritable.class);

        // set the types of the final output key/value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setPartitionerClass(CustomPartitioner.class);
        job.setNumReduceTasks(4);

        // submit the job
        job.waitForCompletion(true);
    }
}
Parameter settings for the main function: the two command-line arguments are the input path (args[0]) and the output path (args[1]).
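Assuming the jar path set via mapreduce.job.jar above, a submission might look like the following sketch (the main class name and the HDFS paths are illustrative assumptions, not from a verified deployment):

```shell
# Submit the job: args[0] is the input path, args[1] the output path.
# The output directory must not already exist, or the job will fail.
hadoop jar /home/bruce/project/kkbhdp01/target/com.kaikeba.hadoop-1.0-SNAPSHOT.jar \
  WordCountMain /wordcount/input /wordcount/output
```

With job.setNumReduceTasks(4) and the custom partitioner, the output directory will contain four part files, one per partition.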