Example Analysis of hadoop-reduce


This article shares an example analysis of hadoop-reduce, walking through the source of the Reducer base class and the Partitioner and OutputFormat classes that feed and consume it. It is offered for reference; I hope you learn a lot from reading it.

The output of Map is distributed to the Reducers by the Partitioner. After a Reducer completes its reduce operation, the results are written out through OutputFormat.

/* Licensed to the Apache Software Foundation (ASF) under one ... */
package org.apache.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.RawComparator;

/**
 * Reduces a set of intermediate values which share a key to a smaller set of
 * values.
 */
public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context
      extends ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RawKeyValueIterator input,
                   Counter inputKeyCounter, Counter inputValueCounter,
                   RecordWriter<KEYOUT, VALUEOUT> output,
                   OutputCommitter committer, StatusReporter reporter,
                   RawComparator<KEYIN> comparator,
                   Class<KEYIN> keyClass, Class<VALUEIN> valueClass)
        throws IOException, InterruptedException {
      super(conf, taskid, input, inputKeyCounter, inputValueCounter,
            output, committer, reporter, comparator, keyClass, valueClass);
    }
  }

  /** Called once at the start of the task. */
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
      throws IOException, InterruptedException {
    for (VALUEIN value : values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /** Called once at the end of the task. */
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Advanced application writers can use the
   * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
   * control how the reduce task works.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
  }
}
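As the comments above indicate, most applications only override reduce(). A minimal sketch of a user-defined Reducer (MaxValueReducer is a hypothetical example for illustration, not a class shipped with Hadoop) that keeps the largest value seen for each key:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical example: emits <key, max(values)> for each key.
// setup() and cleanup() keep their empty default implementations.
public class MaxValueReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int max = Integer.MIN_VALUE;
    for (IntWritable val : values) {
      max = Math.max(max, val.get());
    }
    result.set(max);
    context.write(key, result);
  }
}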

The output of a Mapper may first be sent to an optional Combiner for local merging. The Combiner does not have its own base class in the system; instead, Reducer serves as the base class for Combiners. Externally a Combiner behaves just like a Reducer, but it runs in a different place and context: on the map side, before the shuffle.
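Registering a Combiner is a single call on the Job; a hedged snippet (assuming a word-count style job whose map output and reduce output are both Text/IntWritable, with job configured elsewhere):

// Safe only because integer addition is associative and commutative and
// IntSumReducer (shown further below) has identical input and output types.
job.setCombinerClass(IntSumReducer.class);

Because the framework may run the Combiner zero, one, or several times per key, the operation must give the same final result no matter how the partial merges are grouped.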

The final output of the Mappers must be sent to the Reducers for merging, and during this process all key/value pairs sharing the same key are sent to the same Reducer. Which key goes to which Reducer is decided by the Partitioner. It has a single method: its inputs are a map output pair and the number of Reducers, and its output is the number (an integer) of the Reducer the pair is assigned to. The system default is HashPartitioner, which takes the hash value of the key modulo the number of Reducers to pick the target Reducer.

/* Licensed to the Apache Software Foundation (ASF) under one ... */
package org.apache.hadoop.mapreduce;

/**
 * Partitions the key space.
 */
public abstract class Partitioner<KEY, VALUE> {

  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

/* Licensed to the Apache Software Foundation (ASF) under one ... */
package org.apache.hadoop.mapreduce.lib.partition;

import org.apache.hadoop.mapreduce.Partitioner;

/** Partition keys by their {@link Object#hashCode()}. */
public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
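A custom Partitioner only has to implement getPartition. A hedged sketch (FirstLetterPartitioner is a hypothetical example, not part of Hadoop) that routes keys by their first character, so all words starting with the same letter reach the same Reducer:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;                       // route empty keys to the first Reducer
    }
    // Mask the sign bit, as HashPartitioner does, so the result is non-negative.
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}

It would be registered with job.setPartitionerClass(FirstLetterPartitioner.class).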

Reducer is the base class for all user-defined Reducer classes. Like Mapper, it has setup, reduce, cleanup and run methods, where setup and cleanup have the same meaning as in Mapper. reduce is where the Mapper results are actually merged; its inputs are a key, an iterator over all the values corresponding to that key, and the Reducer's Context. The system defines two very simple Reducers, IntSumReducer and LongSumReducer, which sum int and long values respectively.

/* Licensed to the Apache Software Foundation (ASF) under one ... */
package org.apache.hadoop.mapreduce.lib.reduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer<Key> extends Reducer<Key, IntWritable, Key, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Key key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
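The companion LongSumReducer is not reproduced in the original text; reconstructed from the IntSumReducer above, it should look essentially like this sketch:

package org.apache.hadoop.mapreduce.lib.reduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class LongSumReducer<KEY> extends Reducer<KEY, LongWritable, KEY, LongWritable> {
  private LongWritable result = new LongWritable();

  public void reduce(KEY key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}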

The results of reduce are written out to files through the Reducer.Context write method, and, symmetrically with the input side, Hadoop introduces OutputFormat for this. OutputFormat relies on two auxiliary interfaces, RecordWriter and OutputCommitter, to handle the output. RecordWriter provides a write method for emitting output records and a close method for closing the corresponding output. OutputCommitter provides a series of methods that users can implement to customize the special operations required at certain stages of the OutputFormat lifecycle. These methods were discussed under TaskInputOutputContext (clearly, TaskInputOutputContext is the bridge between OutputFormat and Reducer).
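To make the RecordWriter contract concrete, here is a hedged sketch of a custom file-based output (KeyEqualsValueOutputFormat is a hypothetical name). Extending FileOutputFormat means the FileOutputCommitter temporary-file handling described below applies unchanged:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: writes each record as a "key=value" line.
public class KeyEqualsValueOutputFormat extends FileOutputFormat<Text, IntWritable> {

  @Override
  public RecordWriter<Text, IntWritable> getRecordWriter(TaskAttemptContext context)
      throws IOException, InterruptedException {
    // getDefaultWorkFile resolves a per-task file under the work directory
    // managed by FileOutputCommitter.
    Path file = getDefaultWorkFile(context, ".txt");
    final FSDataOutputStream out =
        file.getFileSystem(context.getConfiguration()).create(file, false);
    return new RecordWriter<Text, IntWritable>() {
      @Override
      public void write(Text key, IntWritable value)
          throws IOException, InterruptedException {
        out.writeBytes(key.toString() + "=" + value.get() + "\n");
      }

      @Override
      public void close(TaskAttemptContext ctx)
          throws IOException, InterruptedException {
        out.close();
      }
    };
  }
}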

OutputFormat and RecordWriter correspond to InputFormat and RecordReader, respectively. The system provides the null output NullOutputFormat (it produces no output; its NullOutputFormat.RecordWriter is just a trivial stub), LazyOutputFormat (not shown in the class diagram and not analyzed here), FilterOutputFormat (not analyzed here), and the file-based FileOutputFormat together with its SequenceFileOutputFormat and TextOutputFormat outputs.

The file-based FileOutputFormat works with a number of configuration items, including mapred.output.compress (whether to compress the output), mapred.output.compression.codec (the compression codec), mapred.output.dir (the output path), and mapred.work.output.dir (the working output path). FileOutputFormat also relies on FileOutputCommitter to provide some Job- and Task-related temporary file management. For example, FileOutputCommitter's setupJob creates a temporary directory named _temporary under the output path, and cleanupJob deletes that directory.
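In the new API these items are normally set through FileOutputFormat's static helpers rather than by property name; a brief sketch (job is an org.apache.hadoop.mapreduce.Job assumed to be configured elsewhere, and the output path is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Equivalent to setting mapred.output.compress, mapred.output.compression.codec
// and mapred.output.dir in the job configuration.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
FileOutputFormat.setOutputPath(job, new Path("/user/demo/out"));  // placeholder path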

The SequenceFileOutputFormat and TextOutputFormat outputs correspond to the SequenceFileInputFormat and TextInputFormat inputs, respectively.
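Putting the pieces together, a minimal word-count style driver sketch (WordCountDriver and TokenizerMapper are hypothetical names; the Job constructor shown is the 0.20-era API, later replaced by Job.getInstance) wires a Mapper, the IntSumReducer above, the default HashPartitioner, and TextOutputFormat into one job:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {

  /** Hypothetical mapper: emits <word, 1> for every token in a line. */
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");           // Job.getInstance(conf) in newer releases
    job.setJarByClass(WordCountDriver.class);

    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);       // optional map-side merge, as above
    job.setPartitionerClass(HashPartitioner.class);  // the default, shown for completeness
    job.setReducerClass(IntSumReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}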

That is the whole of this example analysis of hadoop-reduce. Thank you for reading, and I hope it helps you understand the reduce side of Hadoop.
