
What are the benefits of using Combiner in Hadoop

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31 Report --

This article introduces the benefits of using a Combiner in Hadoop. It should be a useful reference for interested readers; I hope you learn a lot from it.

Benefits of using a Combiner:

It reduces the amount of data output by each Mapper task, which cuts network transfer time and shortens the overall Job running time.

A Combiner acts only on the output of a single Map task. Each Map task may produce a large amount of output, and the Combiner's job is to merge that output once on the map side so that less data is transferred to the Reducer.
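To illustrate the idea outside of Hadoop itself, here is a minimal, self-contained Java sketch of map-side aggregation: merging repeated (word, 1) pairs into (word, count) pairs before they are shipped anywhere. The class name and input data are hypothetical, chosen only to show how the record count shrinks.

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: local aggregation of map output,
// the same idea a Combiner applies to a single Map task's output.
public class LocalAggregationDemo {
    public static void main(String[] args) {
        // Imagine a Map task emitted one (word, 1) pair per word.
        String[] mapOutput = {"hello", "world", "hello", "hello", "world"};

        // Without a Combiner, all 5 pairs would cross the network.
        // A Combiner merges them locally into one pair per key:
        Map<String, Long> combined = new HashMap<>();
        for (String word : mapOutput) {
            combined.merge(word, 1L, Long::sum);
        }

        // Only 2 pairs remain to be shuffled to the Reducer:
        // {hello=3, world=2}
        System.out.println(combined);
    }
}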

The Combiner is the most basic form of local key aggregation; it behaves like a local Reduce function. Without a Combiner, all merging is left to the Reduce phase, which is relatively inefficient. With a Combiner, Map tasks that finish first aggregate their output locally, which speeds up the job.

Note: the output of the Combiner becomes the input of Reduce, and the Combiner must never change the final calculation result. In my view, a Combiner should therefore only be used when the Reduce function's input key/value types match its output key/value types and partial aggregation does not affect the final result, such as summation or taking a maximum.
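A quick way to see this constraint, sketched outside Hadoop (the class name and data below are hypothetical, not from the article): summation gives the same final answer whether or not partial sums are taken first, while averaging does not.

// Hypothetical demo: why sums are Combiner-safe but averages are not.
import java.util.Arrays;
import java.util.List;

public class CombinerSafetyDemo {

    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    public static void main(String[] args) {
        // Raw output of two hypothetical Map tasks: four (word, 1) counts.
        List<Long> mapTask1 = Arrays.asList(1L, 1L);
        List<Long> mapTask2 = Arrays.asList(1L, 1L);

        // No Combiner: Reduce sees all four values.
        long direct = sum(Arrays.asList(1L, 1L, 1L, 1L));

        // With a Combiner: each Map task pre-sums, Reduce sums the partials.
        long combined = sum(Arrays.asList(sum(mapTask1), sum(mapTask2)));

        System.out.println(direct == combined); // true: summation is safe

        // Averaging is NOT safe: avg(avg(1,2), avg(3)) = avg(1.5, 3) = 2.25,
        // but avg(1, 2, 3) = 2. A Combiner would change the final result.
    }
}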

Why use a Combiner:

The network bandwidth available on the cluster limits MapReduce jobs, so the most important optimization is to minimize the data transferred between Map tasks and Reduce tasks.

Hadoop allows the user to specify a combine function (the Combiner) to run on the Map task's output; the combine function's output then becomes the input to the Reduce function.

Because the combine function is only an optimization, Hadoop makes no guarantee about how many times it will be called for any given record in the Map output; it may be called zero, one, or many times. In other words, the Reduce output must be identical no matter how many times the combine function is called.
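The driver example below imports MyMapper and MyReducer from a mapreduce package without showing them. Here is a minimal sketch of what those classes presumably look like for word count; their bodies are my assumption, not code from the article. Note that MyReducer's input and output key/value types match, which is what lets the driver reuse it as the Combiner.

// File: mapreduce/MyMapper.java (assumed implementation, not from the article)
package mapreduce;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Assumed word-count mapper: emits (word, 1) for every word in a line.
public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// File: mapreduce/MyReducer.java (assumed implementation, not from the article)
package mapreduce;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Assumed word-count reducer: sums the counts for each word. Because its
// input and output types match (Text/LongWritable) and summation is
// associative and commutative, it can double as the Combiner.
public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        context.write(key, new LongWritable(sum));
    }
}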

Example:

package combiner;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import mapreduce.MyMapper;
import mapreduce.MyReducer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

/**
 * Word count driver.
 * @author Xr
 */
public class WordCountApp {

    private static final String INPUT_PATH = "hdfs://hadoop:9000/hello";
    private static final String OUTPUT_PATH = "hdfs://hadoop:9000/hello1";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Check whether the output directory already exists (delete it if so)
        existsFile(conf);
        Job job = new Job(conf, WordCountApp.class.getName());

        // 1.1 Where to read data from
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Parse each line of the input text into a key/value pair
        job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Set the custom map function
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);

        // 1.4 TODO: sorting and grouping

        // 1.5 Combining (local reduction)
        job.setCombinerClass(MyReducer.class);

        // 2.1 Shuffle: handled by the framework, no programmer intervention needed.

        // 2.2 Set the custom reduce function
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Where to write results on HDFS
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        // Output format class
        job.setOutputFormatClass(TextOutputFormat.class);

        // Submit to the JobTracker and wait for completion
        job.waitForCompletion(true);
    }

    private static void existsFile(Configuration conf) throws IOException, URISyntaxException {
        FileSystem fs = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fs.exists(new Path(OUTPUT_PATH))) {
            fs.delete(new Path(OUTPUT_PATH), true);
        }
    }
}

Thank you for reading this article carefully. I hope "What are the benefits of using Combiner in Hadoop" is helpful to everyone. Please also support and follow the industry information channel, where more related knowledge awaits you!
