
II. Basic programming specification of MapReduce

Updated 2025-01-21, from SLTechnology News&Howtos (shulou)


Shulou(Shulou.com)06/03 Report--


1. The basic composition of MapReduce programming

Writing a MapReduce program requires at least three parts: a mapper, a reducer, and a driver. The partitioner and the combiner are optional.

The input and output of the mapper, and the input and output of the reducer, are all key-value pairs. When writing the mapper and the reducer we therefore have to specify eight data types across these four key-value pairs, and each of them must be a Hadoop serializable type. Note also that the output of map is the input of reduce, so those two pairs must use the same data types.
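Hadoop's serializable types implement the `org.apache.hadoop.io.Writable` interface, whose two methods are `write(DataOutput)` and `readFields(DataInput)`. The sketch below mirrors that contract in plain Java so that it runs without Hadoop on the classpath. `SimpleWritable`, `IntBox`, and `roundTrip` are hypothetical names used only for illustration; they are not Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {

    // Hypothetical stand-in that mirrors the two methods of Hadoop's Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical analogue of IntWritable: a mutable box around one int.
    static class IntBox implements SimpleWritable {
        private int value;

        IntBox() { }                         // empty instance, filled later by readFields
        IntBox(int value) { this.value = value; }

        int get() { return value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);             // serialize the field
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();            // restore the field in the same order
        }
    }

    // Serialize a value to bytes, then read it back into a fresh instance.
    static int roundTrip(int v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new IntBox(v).write(new DataOutputStream(bytes));

            IntBox copy = new IntBox();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));   // prints 42
    }
}
```

In a real job the built-in types such as LongWritable, Text, and IntWritable already provide this behavior, so a custom type is only needed when a key or value carries several fields.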

1. Map stage

Basic writing process

1) Define a custom map class that inherits from the Mapper class.

2) When inheriting from Mapper, specify the key and value types of both the input and the output.

3) Override the map() method inherited from the parent class.

4) Each map task calls the overridden map() method once for every key-value pair fed into the mapper.
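Step 4 can be pictured with a plain-Java sketch that needs no Hadoop classes. It assumes line-oriented text input, where the key is the starting byte offset of a line and the value is the line itself. `MapCallSketch`, `map`, and `runMapTask` are hypothetical names, and splitting lines into words is only an illustrative choice of map logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class MapCallSketch {

    // Counts how often map() was invoked during the last runMapTask() call.
    static int mapCalls = 0;

    // Analogue of map(): one input pair in, zero or more (word, 1) pairs out.
    static void map(long offset, String line, List<Map.Entry<String, Integer>> out) {
        mapCalls++;
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
    }

    // Analogue of one map task: calls map() once per input record.
    static List<Map.Entry<String, Integer>> runMapTask(List<String> lines) {
        mapCalls = 0;
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            map(offset, line, out);
            // The next key is the byte offset after this line and its newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out =
                runMapTask(Arrays.asList("king queen", "king"));
        System.out.println(mapCalls + " map() calls, output: " + out);
    }
}
```

Two input lines lead to two map() calls, while the number of output pairs depends on how many words each line contains.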

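A basic example is as follows:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    /* The four Mapper type parameters are LongWritable, Text, Text, IntWritable,
       roughly equivalent to the plain Java types long, String, String, int. */
    public class TestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // map processing logic goes here
        }
    }

2. Reduce stage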

Basic writing process

1) Define a custom reduce class that inherits from the Reducer class.

2) When inheriting from Reducer, specify the key and value types of both the input and the output.

3) Override the reduce() method inherited from the parent class.

4) Each reduce task calls the overridden reduce() method once for every key fed into the reducer, passing all the values grouped under that key.
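Between map and reduce, the framework groups the map output by key, so reduce() receives each key together with all of its values. A minimal plain-Java sketch of that grouping follows. `ReduceCallSketch`, `groupByKey`, `reduce`, and `runReduceTask` are hypothetical names, summing is only an example of reduce logic, and a sorted `TreeMap` stands in for the framework's sort-and-group step.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceCallSketch {

    // Stand-in for the shuffle: collect every value under its key, sorted by key.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Analogue of reduce(): one key plus all of its values in, one total out.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Analogue of one reduce task: calls reduce() once per distinct key.
    static Map<String, Integer> runReduceTask(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groupByKey(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Small helper to build a key-value pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        System.out.println(runReduceTask(
                Arrays.asList(pair("king", 1), pair("queen", 1), pair("king", 1))));
        // prints {king=2, queen=1}
    }
}
```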

A basic example is as follows:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    /* The four Reducer type parameters are Text, IntWritable, Text, IntWritable,
       roughly equivalent to the plain Java types String, int, String, int. */
    public class TestReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // reduce processing logic goes here
        }
    }

3. Driver stage

The driver holds all the configuration needed to set up the job object. Once the configuration is complete, it submits the job to YARN for execution.

The driver mainly schedules the execution of the map and reduce tasks. Typical settings include the mapper and reducer classes, the output key and value types, and the input and output paths.

4. Partitioner stage

In this stage the map output is partitioned. The number of partitions corresponds directly to the number of reduce tasks (generally one partition per reduce task). The writing process is as follows:

1) Define a custom partition class that inherits from Partitioner.

2) When inheriting from Partitioner, specify the key and value types of the input it processes, that is, the map output types.

3) Override the getPartition() method inherited from the parent class.

4) The overridden getPartition() method is called once for every key-value pair output by each map task.

5) According to the partition rule, return a partition number from 0 to n; the number identifies the partition that the pair is sent to.
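To make the steps above concrete, here is a plain-Java sketch of two partition rules with the same shape as getPartition(key, value, numPartitions). The hash rule mirrors what Hadoop's default HashPartitioner does (clear the sign bit of hashCode(), then take the remainder), although it is applied here to a Java String rather than a Text key. The first-letter rule and the names `PartitionSketch`, `hashPartition`, and `firstLetterPartition` are hypothetical, chosen only for illustration.

```java
class PartitionSketch {

    // Same arithmetic as the default hash rule: always lands in 0 .. numPartitions-1.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical custom rule: a-m goes to 0, n-z goes to 1, everything else to 2.
    static int firstLetterPartition(String key, int numPartitions) {
        int partition;
        if (key.isEmpty()) {
            partition = 2;
        } else {
            char c = Character.toLowerCase(key.charAt(0));
            if (c >= 'a' && c <= 'm') {
                partition = 0;
            } else if (c >= 'n' && c <= 'z') {
                partition = 1;
            } else {
                partition = 2;
            }
        }
        // Fold into the valid range in case fewer than three partitions exist.
        return partition % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"king", "queen", "42"}) {
            System.out.println(key + " -> " + firstLetterPartition(key, 3));
        }
    }
}
```

Whatever the rule, the same key must always map to the same partition, otherwise the values for one key would be split across several reduce tasks.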

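An example template is as follows:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WordCountPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
            // Template only: replace the conditions with the real partition rule.
            // condition 1: return 0;
            // condition 2: return 1;
            // ...
            // otherwise:   return n;
        }
    }

5. Combiner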

The combiner is not a separate phase; it is part of the map phase. In a word count, for example, every key-value pair in the raw map output has the value 1, and pairs with the same key remain independent pairs. If there are many duplicate pairs, passing the map output to reduce consumes a lot of bandwidth. The optimization is to merge and summarize the output of each map task locally, before it is sent, to cut down on repetition. That is, pairs such as <king,1> that share the same key are merged, which reduces the amount of data transferred.
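A small plain-Java sketch of that local merge, with no Hadoop dependency, is shown here. `CombinerSketch`, `combine`, and `pair` are hypothetical names, and summing counts is assumed as the merge rule, as in a word count.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {

    // Merge the output of ONE map task by key before it leaves the node.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new LinkedHashMap<>();   // keeps first-seen key order
        for (Map.Entry<String, Integer> p : mapOutput) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            combined.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        return combined;
    }

    // Small helper to build a key-value pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("king", 1), pair("queen", 1), pair("king", 1), pair("king", 1));
        List<Map.Entry<String, Integer>> combined = combine(mapOutput);
        // Four pairs would cross the network without a combiner, two with it.
        System.out.println(mapOutput.size() + " pairs before, " + combined.size() + " after");
    }
}
```

Hadoop may run a combiner zero, one, or several times on a map task's output, so this optimization is only safe when the merge operation is associative and commutative, as a sum is.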

So the combiner performs the same operation as reduce, except that one works locally and the other globally. The simplest approach is to pass the reducer class to the job as the combiner class, for example:

    job.setCombinerClass(WordCountReducer.class);

We can take a look at the source code of this method:

    public void setCombinerClass(Class
