What are the common algorithms of MapReduce in Hadoop


2026-04-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article walks through several common MapReduce algorithms in Hadoop: sorting, deduplication, filtering, and TopN. They are practical patterns, shared here for reference.

1. Sort:

1) data:

hadoop fs -mkdir /import

Create one or more text files containing one integer per line, and upload them:

hadoop fs -put test.txt /import/

2) Code:

package com.cuiweiyou.sort;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hadoop's default sort is applied to the map output key (k2):
// if k2 is Text, the result is in dictionary order;
// if k2 is LongWritable, the result is in numeric order.
public class SortTest {

    /** inner class: Mapper */
    public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, NullWritable> {
        /** override the map method */
        @Override
        public void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
            // v1 is converted to a numeric k2; k1 is discarded and v2 is null
            context.write(new LongWritable(Long.parseLong(v1.toString().trim())), NullWritable.get());
            // v1 may contain duplicates, so k2 may contain duplicates at this point
        }
    }

    /** inner class: Reducer */
    public static class MyReducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {
        /**
         * override the reduce method.
         * Before this method runs, the shuffle step sorts by k2 and
         * merges the v2 values of each k2 into v2[...].
         */
        @Override
        protected void reduce(LongWritable k2, Iterable<NullWritable> v2, Context context) throws IOException, InterruptedException {
            // k2 => k3, v2[...] is discarded, null => v3
            context.write(k2, NullWritable.get());
            // reduce is called once per distinct k2, so each k3 is written only once
        }
    }

    public static void main(String[] args) throws Exception {
        // declare the configuration
        // (on Hadoop 2+ this key is deprecated in favor of fs.defaultFS)
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");

        // create the job
        // (on Hadoop 2+ prefer Job.getInstance(conf, "SortTest"))
        Job job = new Job(conf, "SortTest");
        job.setJarByClass(SortTest.class);

        // set the mapper and reducer
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // set the output types; they must match the types passed to context.write
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        // set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/import/"));
        FileOutputFormat.setOutputPath(job, new Path("/out"));

        // run the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

3) Test:

As the output shows, the numbers are not only sorted but also deduplicated.
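Sorting and deduplication happen together because the shuffle step groups equal keys in sorted order and reduce runs once per group. The following is a minimal plain-Java sketch of that behaviour, not Hadoop code: the class name `ShuffleSortSketch` and the sample numbers are made up for illustration, and a `TreeMap` stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> shuffle -> reduce for the sort job (no Hadoop needed).
// ShuffleSortSketch is a hypothetical name used only for this illustration.
class ShuffleSortSketch {

    static List<Long> sortAndDeduplicate(List<String> lines) {
        // "map": parse each line into a numeric key k2 (v2 carries no data).
        // "shuffle": group by key in ascending order; the TreeMap stands in for this step.
        TreeMap<Long, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            long k2 = Long.parseLong(line.trim());
            grouped.merge(k2, 1, Integer::sum); // the count stands in for the v2[...] list
        }

        // "reduce": one call per distinct key, one write per call, so duplicates collapse.
        List<Long> output = new ArrayList<>();
        for (Map.Entry<Long, Integer> group : grouped.entrySet()) {
            output.add(group.getKey());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("5", "3", "9", "3", "1", "5");
        System.out.println(sortAndDeduplicate(lines)); // prints [1, 3, 5, 9]
    }
}
```

To keep duplicates in a real job, the reducer would have to write the key once per element of v2[...] instead of once per call.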

2. Deduplication:

Requirement: find the distinct mobile phone numbers in the data. The idea is the same as in the sorting algorithm above, with the extra step of splitting the phone number out of each record.

1) data:

Create two text files and enter some test content manually. The fields in each line are separated by tabs: date, phone number, address, method, amount of data.

2) Code:

(1) map and reduce:

/** Mapper */
public static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    /** override the map method */
    @Override
    protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
        // split the line on tabs
        String[] tels = v1.toString().split("\t");
        // k2 = the phone number in the second column, v2 = null
        context.write(new Text(tels[1]), NullWritable.get());
    }
}

/* After map and before reduce there is a shuffle step that merges the v2 values of each k2 into v2[...]. */

/** Reducer */
public static class MyReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    /** override the reduce method */
    @Override
    protected void reduce(Text k2, Iterable<NullWritable> v2, Context context) throws IOException, InterruptedException {
        // reduce is called once per distinct k2, so writing k2 once here removes the duplicates
        context.write(k2, NullWritable.get());
    }
}

(2) configure output:

// set the output types; they must match the types passed to context.write
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
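The deduplication logic can be tried out without a cluster. The sketch below is plain Java rather than Hadoop code: it splits each record on tabs, takes the second column, and lets a `TreeSet` play the role of the shuffle. The class name `DistinctPhoneSketch` and the sample records are invented for illustration, and the length check on short lines is an added safeguard that the mapper above does not have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java sketch of the deduplication job (hypothetical class name, made-up data).
class DistinctPhoneSketch {

    static List<String> distinctPhones(List<String> records) {
        // The TreeSet stands in for the shuffle: equal keys collapse and keys come out sorted.
        TreeSet<String> phones = new TreeSet<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            // Assumption: lines with fewer than two columns are skipped instead of failing.
            if (fields.length > 1) {
                phones.add(fields[1]); // second column: phone number
            }
        }
        return new ArrayList<>(phones);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "2013-01-01\t13800000002\tbeijing\twifi\t120",
                "2013-01-01\t13800000001\tshanghai\t3g\t300",
                "2013-01-02\t13800000002\tbeijing\t3g\t45");
        System.out.println(distinctPhones(records)); // prints [13800000001, 13800000002]
    }
}
```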

3) Test:

3. Filter:

Requirement: list the online records that occurred in the Beijing area. The approach is the same as above, with a condition added before k2 and v2 are written.

1) data:

Ditto.

2) Code:

(1) map and reduce:

/** inner class: Mapper */
public static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    /** override the map method */
    @Override
    protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
        // split the line on tabs
        final String[] adds = v1.toString().split("\t");
        // the address is in column 3
        // k2 = the whole record, v2 = null; written only when the address matches
        if (adds[2].equals("beijing")) {
            context.write(new Text(v1.toString()), NullWritable.get());
        }
    }
}

/** inner class: Reducer */
public static class MyReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    /** override the reduce method */
    @Override
    protected void reduce(Text k2, Iterable<NullWritable> v2, Context context) throws IOException, InterruptedException {
        context.write(k2, NullWritable.get());
    }
}
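As a rough illustration of the filter, here is a plain-Java sketch (no Hadoop) that keeps only the records whose third column equals a given address. Because the whole record is used as the key, identical lines collapse into one, as they would after the shuffle. `FilterSketch`, the method name, and the sample records are hypothetical, and skipping short lines is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java sketch of the filter job (hypothetical class name, made-up data).
class FilterSketch {

    static List<String> recordsFrom(List<String> records, String address) {
        // The whole record is the key, so the TreeSet also removes identical lines,
        // mirroring what the shuffle does before reduce.
        TreeSet<String> kept = new TreeSet<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            // Assumption: lines with fewer than three columns are skipped.
            if (fields.length > 2 && address.equals(fields[2])) {
                kept.add(record);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "2013-01-02\t13800000002\tbeijing\t3g\t45",
                "2013-01-01\t13800000001\tshanghai\t3g\t300",
                "2013-01-01\t13800000002\tbeijing\twifi\t120",
                "2013-01-02\t13800000002\tbeijing\t3g\t45");
        for (String record : recordsFrom(records, "beijing")) {
            System.out.println(record);
        }
    }
}
```

The comparison is case-sensitive, as in the mapper above, so "Beijing" would not match "beijing".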

3) Test:

4. TopN:

This is a classic algorithm and a frequent interview question. There are many ways to implement it; here is a simple example.

Requirements: find the maximum traffic value; find the five largest traffic values.

1) data:

Ditto.

2) Code 1-maximum:

(1) map and reduce:

// map
// (the generic type parameters are missing from the source; the output types here are an assumption)
public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, NullWritable> {
    // temporary variable for the largest value seen so far, initialised to
    // the smallest value a long can hold: Long.MIN_VALUE = -9223372036854775808
    long temp = Long.MIN_VALUE;

    // find the maximum value
    @Override
    protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
        // split the line on tabs
        final String[] flows = v1.toString().split("\t");
        // convert the text value to a number
        final long val = Long.parseLong(flows[4]);
        // if the value is larger than the temporary variable, save it
        if (temp < val) {
            temp = val;
        }
    }
    ...
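For the second requirement (the five largest values), one common approach is a bounded min-heap: keep at most N values and evict the smallest whenever the heap grows past N. The plain-Java sketch below illustrates that technique under stated assumptions; it is not the article's own listing. `TopNSketch`, the sample records, and the choice to skip lines with fewer than five columns are all invented for illustration. In a MapReduce job, each mapper could keep such a heap and emit its contents in `cleanup`, with a single reducer merging the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch of TopN with a bounded min-heap (hypothetical class name, made-up data).
class TopNSketch {

    static List<Long> topN(List<String> records, int n) {
        // PriorityQueue is a min-heap by default: the head is the smallest retained value.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            // Assumption: lines with fewer than five columns are skipped.
            if (fields.length < 5) {
                continue;
            }
            heap.offer(Long.parseLong(fields[4].trim())); // fifth column: traffic
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so only the n largest remain
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // largest first
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "2013-01-01\t13800000001\tbeijing\twifi\t120",
                "2013-01-01\t13800000002\tshanghai\t3g\t300",
                "2013-01-02\t13800000003\tbeijing\t3g\t45",
                "2013-01-02\t13800000004\tbeijing\twifi\t980",
                "2013-01-03\t13800000005\tshanghai\twifi\t300",
                "2013-01-03\t13800000006\tbeijing\t3g\t7",
                "2013-01-04\t13800000007\tbeijing\twifi\t610");
        System.out.println(topN(records, 1)); // prints [980]
        System.out.println(topN(records, 5)); // prints [980, 610, 300, 300, 120]
    }
}
```

Unlike the sort job earlier, this sketch keeps duplicate values (300 appears twice); using the value as a reduce key instead would collapse them.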
