How to sort and group in MapReduce

2026-03-24 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article explains how sorting and grouping are achieved in MapReduce. Many people are unfamiliar with the details, so the key points are summarized below, phase by phase.

Initial stage of Map

In the Map phase, the InputFormat set with job.setInputFormatClass() splits the input data set into smaller blocks called splits, and the InputFormat also supplies a RecordReader implementation. This lesson uses TextInputFormat, whose RecordReader emits the byte offset of each line within the file as the Key and the text of that line as the Value. This is why the input type of the custom Mapper is <LongWritable, Text>. Each <LongWritable, Text> key-value pair is then passed to the map() method of the custom Mapper.
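To make the shape of these records concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. The class LineRecordSketch, the nested LineRecord type and the readRecords() helper are hypothetical names invented for this illustration. The sketch assumes UTF-8 text with "\n" line endings and pairs each line with the byte offset at which it starts, which is roughly what a line-oriented record reader hands to map().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Hadoop dependency) of <offset, line> input records.
final class LineRecordSketch {

    // Hypothetical stand-in for one <LongWritable, Text> pair.
    static final class LineRecord {
        final long offset;
        final String line;

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits the text on '\n' and pairs each line with the UTF-8 byte
    // offset at which that line starts.
    static List<LineRecord> readRecords(String text) {
        List<LineRecord> records = new ArrayList<>();
        if (text.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : text.split("\n")) {
            records.add(new LineRecord(offset, line));
            // +1 accounts for the newline character that split() removed.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Prints <0, "3 7">, <4, "3 1">, <8, "5 2"> on separate lines.
        for (LineRecord r : readRecords("3 7\n3 1\n5 2\n")) {
            System.out.println("<" + r.offset + ", \"" + r.line + "\">");
        }
    }
}
```

In a real job the Mapper would typically ignore the offset and parse the line text into its own output key and value.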

The final phase of Map

At the end of the Map phase, the Partitioner class set with job.setPartitionerClass() is applied first to partition the output of this Mapper, and each partition maps to one Reducer. Within each partition, the records are then sorted using the Key comparator class set with job.setSortComparatorClass(). Because the output is ordered first by partition and then by Key, this is in itself a two-level (secondary) sort. If no Key comparator class is set through job.setSortComparatorClass(), the compareTo() method implemented by the Key is used. We can therefore either rely on the compareTo() method implemented by IntPair or define a dedicated Key comparator class.
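The partition-then-sort behaviour can be simulated in plain Java without Hadoop. In the sketch below, MapOutputSketch, its IntPair class and the partitionAndSort() helper are hypothetical names for illustration only; this IntPair is a simplified stand-in, not the class from the lesson. The sketch assumes the partition is chosen from the first field only, and that passing a null comparator plays the role of not calling job.setSortComparatorClass().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of "partition first, then sort inside each partition".
final class MapOutputSketch {

    // Hypothetical composite key with a natural order: first, then second.
    static final class IntPair implements Comparable<IntPair> {
        final int first;
        final int second;

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(IntPair other) {
            int c = Integer.compare(first, other.first);
            return c != 0 ? c : Integer.compare(second, other.second);
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Partition on `first` only, so keys sharing it land in the same partition.
    static int partition(IntPair key, int numReducers) {
        return (key.first & Integer.MAX_VALUE) % numReducers;
    }

    // Buckets keys by partition, then sorts each bucket. A null comparator
    // falls back to the key's own compareTo().
    static List<List<IntPair>> partitionAndSort(
            List<IntPair> keys, int numReducers, Comparator<IntPair> sortComparator) {
        List<List<IntPair>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (IntPair key : keys) {
            partitions.get(partition(key, numReducers)).add(key);
        }
        for (List<IntPair> p : partitions) {
            p.sort(sortComparator); // null means natural ordering
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<IntPair> keys = Arrays.asList(
                new IntPair(3, 7), new IntPair(5, 2), new IntPair(3, 1),
                new IntPair(4, 9), new IntPair(5, 0));
        // Prints [[(4,9)], [(3,1), (3,7), (5,0), (5,2)]]
        System.out.println(partitionAndSort(keys, 2, null));
    }
}
```

Supplying a comparator instead of null changes only the order inside each partition, never which partition a key lands in.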

Reduce stage

In the Reduce phase, once the Reducer has received all the map output assigned to it, it again uses the Key comparator class set with job.setSortComparatorClass() to sort all of that data. It then builds a Value iterator for each Key. This is where grouping comes in: the grouping comparator class is set with job.setGroupingComparatorClass(). Whenever this comparator reports two Keys as equal, they belong to the same group, their Values are placed in the same Value iterator, and the Key that accompanies this iterator is the first Key of the group. Finally, the reduce() method of the Reducer is invoked once per group, and its input is that group's Key and its Value iterator. Also note that the input and output types must match those declared in the custom Reducer.
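The grouping step can also be sketched in plain Java. GroupingSketch, Group, SORT, GROUP and sortAndGroup() below are hypothetical names, and the sketch assumes that the value carried by each record is simply the second field of its key. It sorts with a full comparator (first, then second), then walks the sorted keys and starts a new group whenever the grouping comparator (first only) reports a difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of sorting by the full key and grouping by part of it.
final class GroupingSketch {

    // Hypothetical composite key.
    static final class IntPair {
        final int first;
        final int second;

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Input of one simulated reduce() call: first key of the group + values.
    static final class Group {
        final IntPair key;
        final List<Integer> values = new ArrayList<>();

        Group(IntPair key) {
            this.key = key;
        }
    }

    // Sort comparator: first ascending, then second ascending.
    static final Comparator<IntPair> SORT =
            Comparator.comparingInt((IntPair k) -> k.first)
                      .thenComparingInt(k -> k.second);

    // Grouping comparator: keys are "equal" when `first` matches.
    static final Comparator<IntPair> GROUP =
            Comparator.comparingInt((IntPair k) -> k.first);

    static List<Group> sortAndGroup(List<IntPair> mapOutputKeys) {
        List<IntPair> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT);
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (IntPair key : sorted) {
            // Start a new group when the grouping comparator sees a change.
            if (current == null || GROUP.compare(current.key, key) != 0) {
                current = new Group(key);
                groups.add(current);
            }
            current.values.add(key.second); // assumption: value == key.second
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Group> groups = sortAndGroup(Arrays.asList(
                new IntPair(5, 2), new IntPair(3, 7), new IntPair(3, 1),
                new IntPair(4, 9), new IntPair(5, 0)));
        // Prints: (3,1) -> [1, 7]   (4,9) -> [9]   (5,0) -> [0, 2]
        for (Group g : groups) {
            System.out.println(g.key + " -> " + g.values);
        }
    }
}
```

Grouping relies on the data already being sorted so that the keys of one group are adjacent, which is why the grouping comparator is usually a coarser version of the sort comparator.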

The relationship between job.setPartitionerClass() and job.setGroupingComparatorClass()

A reduce() call starts by reading one record and its key. While it iterates over the values, after each value the framework checks whether the next record belongs to the same group as the current key. If it does, the iteration continues with that record's value; once a record from a different group appears, the current group is considered fully processed and this reduce() call ends. A single reduce() call therefore handles all the records in a group, not just one. Why is this useful? Without grouping, the records of one logical group would be handled in several separate reduce() calls, and state would have to be passed between those calls, which adds complexity. When the whole group is handled in one call, that state can simply live in local variables of the method. For example, if the sort comparator orders the values within each group from largest to smallest, finding the maximum only requires reading the first value.
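The maximum-value example can be illustrated with a small plain-Java simulation. MaxPerGroupSketch and maxPerGroup() are hypothetical names, and each record is modelled as an int[] {first, second}. The sketch assumes a sort order of first ascending and second descending, with grouping on first, so that the first record of every group already carries that group's maximum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch: with a descending secondary sort, the maximum of each
// group is the first value seen by the simulated reduce() call.
final class MaxPerGroupSketch {

    // Each record is {first, second}; the result maps first -> max(second).
    static Map<Integer, Integer> maxPerGroup(List<int[]> records) {
        List<int[]> sorted = new ArrayList<>(records);
        // Sort comparator: first ascending, then second descending.
        sorted.sort((a, b) -> a[0] != b[0]
                ? Integer.compare(a[0], b[0])
                : Integer.compare(b[1], a[1]));

        Map<Integer, Integer> result = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            int groupKey = sorted.get(i)[0];
            // Simulated reduce(): the first value of the group is the maximum.
            result.put(groupKey, sorted.get(i)[1]);
            // Consume the rest of the group, stopping at the first record
            // whose `first` differs, as the grouping comparator would.
            while (i < sorted.size() && sorted.get(i)[0] == groupKey) {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> records = Arrays.asList(
                new int[] {3, 7}, new int[] {5, 2}, new int[] {3, 1},
                new int[] {4, 9}, new int[] {5, 0});
        // Prints {3=7, 4=9, 5=2}
        System.out.println(maxPerGroup(records));
    }
}
```

No running maximum has to be kept across calls here, because the ordering and the grouping together guarantee where the answer sits.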

