This article explains how to use MapReduce. The approach described here is simple, fast, and practical; if you are interested, read on and follow along.
1. What is MR?
MR is a distributed computing model mainly used to solve computation over massive data sets. It involves two kinds of functions: Mapping and Reducing. Mapping performs the same operation on each element of a collection, while Reducing traverses the elements of a collection and returns a combined result. When writing code, we only need to override the map and reduce methods, which is very simple. The formal parameters of both functions are key-value (k, v) pairs. Note that performance starts to degrade once the data volume grows beyond roughly 10 PB.
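As a rough analogy only (plain Java, not Hadoop), this sketch shows the two roles: mapping applies the same operation to every element, and reducing traverses the mapped elements to produce one combined result.

import java.util.List;

public class MapReduceIdea {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4);

        // "Mapping": the same operation applied to every element of the collection.
        // "Reducing": traversing the mapped elements to produce one combined result.
        int sumOfSquares = data.stream()
                .map(x -> x * x)
                .reduce(0, Integer::sum);

        System.out.println(sumOfSquares); // 30
    }
}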
2. MR execution process
When an MR program starts, the input file is converted into key-value pairs that are passed to the map function. If there are N key-value pairs, the map function is executed N times, but this does not mean that N key-value pairs imply N Mapper processes; that assumption is wrong. After being processed by the map function, the data becomes new key-value pairs. The process of turning the map output into the input of the reduce function is called shuffle. Shuffle is not a function like map and reduce, and it does not run on a separate node; it is just a process. After the reduce function processes the data, it becomes the final output. The number of key-value pairs produced by map does not change until they reach the reduce function.
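To make the shuffle idea concrete, here is a minimal simulation in plain Java (not Hadoop code, and the sample pairs are made up): the map output is grouped by key before reduce sums each group.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShuffleSketch {
    public static void main(String[] args) {
        // map output: one (key, 1) pair per word, e.g. from the lines "a b" and "a c"
        List<Map.Entry<String, Long>> mapOutput = List.of(
                Map.entry("a", 1L), Map.entry("b", 1L),
                Map.entry("a", 1L), Map.entry("c", 1L));

        // shuffle: group all values belonging to the same key into one collection
        Map<String, List<Long>> grouped = new HashMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }

        // reduce: fold each key's collection into a single output pair
        grouped.forEach((key, values) ->
                System.out.println(key + "\t" + values.stream().mapToLong(Long::longValue).sum()));
    }
}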
Map stage
(1). The input file is parsed into key-value pairs, and the map function is called once for each pair.
(2). The map function you wrote processes each input key-value pair into new key-value pairs and outputs them.
(3). The output key-value pairs are partitioned; different partitions correspond to different Reducer processes (a partitioner sketch follows this list).
(4). Within each partition, the key-value pairs are sorted and grouped by key, and the values belonging to the same key are put into the same collection.
(5). Optionally, a local reduction (combine) step is applied before the data leaves the map side.
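To make steps (3) and (5) concrete, here is a hedged sketch of a custom Partitioner plus the driver settings that enable partitioning and combining. The class name WordPartitioner is hypothetical, and the behaviour shown is essentially what Hadoop's default HashPartitioner already does.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: decides which Reducer process each map output key goes to.
public class WordPartitioner extends Partitioner<Text, LongWritable> {
    @Override
    public int getPartition(Text key, LongWritable value, int numReduceTasks) {
        // The same key always lands in the same partition, spread evenly across reducers.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

// In the driver (see the WordCount example below), the corresponding settings would be:
// job.setPartitionerClass(WordPartitioner.class);   // step (3): partition the map output
// job.setCombinerClass(MyReduce.class);              // step (5): optional map-side reduction
// job.setNumReduceTasks(2);                          // two partitions -> two reducers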
Reduce stage
(1). The key-value pairs output by the map functions are transferred to different reduce nodes according to their partitions.
(2). The key-value pairs from the multiple map functions are merged and sorted; the reduce function logic then processes them into new key-value pairs and outputs them.
(3). The output is saved to a file, one per reduce task (a small listing sketch follows this list).
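A hedged side note on step (3): each reduce task writes its own file (part-r-00000, part-r-00001, ...) under the job's output directory. The snippet below, assuming the same cluster address and output path used in the examples further down, simply lists those files; the paths are illustrative.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListJobOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster address taken from the examples below; adjust for your environment.
        FileSystem fs = FileSystem.get(new URI("hdfs://115.28.138.100:9000"), conf, "hadoop");
        // Each reduce task saves its own output file, e.g. part-r-00000, part-r-00001, ...
        for (FileStatus status : fs.listStatus(new Path("/out4"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
    }
}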
3. Simple example
WordCount
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        Text k2 = new Text();
        LongWritable v2 = new LongWritable();

        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
            // Split each input line into words and emit (word, 1) for every word.
            String[] words = v1.toString().split("\t");
            for (String word : words) {
                k2.set(word);
                v2.set(1L);
                context.write(k2, v2);
            }
        }
    }

    public static class MyReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
        LongWritable v3 = new LongWritable();

        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException {
            // Sum all the 1s grouped under the same word.
            long sum = 0;
            for (LongWritable longWritable : v2s) {
                sum = sum + longWritable.get();
            }
            v3.set(sum);
            context.write(k2, v3);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, WordCount.class.getSimpleName());
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("hdfs://115.28.138.100:9000/a.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://115.28.138.100:9000/out4"));
        job.waitForCompletion(true);
    }
}
4. MR serialization
Serialization is the transformation of structured objects into byte streams. MR does not use Java's built-in serialization; Hadoop implements its own serialization mechanism, which by comparison has many advantages. In an MR program, the input and output key-value pairs are all serialized objects. What should we do when we need a custom serializable type? Simply implement the Writable interface; a type used as a key must implement the WritableComparable interface instead, because keys need to be sorted and grouped.
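As a minimal sketch of such a custom type, here is a hypothetical TrafficWritable value class (implement WritableComparable instead if the type will be used as a key and must be sorted and grouped):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom value type holding upstream/downstream traffic counters.
public class TrafficWritable implements Writable {
    private long upFlow;
    private long downFlow;

    public TrafficWritable() {}  // no-arg constructor required for deserialization

    public TrafficWritable(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
    }

    @Override
    public void write(DataOutput out) throws IOException {    // object -> byte stream
        out.writeLong(upFlow);
        out.writeLong(downFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // byte stream -> object, same field order
        upFlow = in.readLong();
        downFlow = in.readLong();
    }

    @Override
    public String toString() {
        return upFlow + "\t" + downFlow;
    }
}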
Next, a small example demonstrates serialization in practice: processing telecom traffic records.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LiuLiang {

    public static class MyMapper extends Mapper<LongWritable, Text, Text, MyArrayWritable> {
        Text k2 = new Text();
        MyArrayWritable v2 = new MyArrayWritable();
        LongWritable v21 = new LongWritable();
        LongWritable v22 = new LongWritable();
        LongWritable v23 = new LongWritable();
        LongWritable v24 = new LongWritable();
        LongWritable[] values = new LongWritable[4];

        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
            // Key is the phone number (field 1); values are the four traffic counters (fields 6-9).
            String[] words = v1.toString().split("\t");
            k2.set(words[1]);
            v21.set(Long.parseLong(words[6]));
            v22.set(Long.parseLong(words[7]));
            v23.set(Long.parseLong(words[8]));
            v24.set(Long.parseLong(words[9]));
            values[0] = v21;
            values[1] = v22;
            values[2] = v23;
            values[3] = v24;
            v2.set(values);
            context.write(k2, v2);
        }
    }

    public static class MyReduce extends Reducer<Text, MyArrayWritable, Text, Text> {
        Text v3 = new Text();

        @Override
        protected void reduce(Text k2, Iterable<MyArrayWritable> v2s, Context context) throws IOException, InterruptedException {
            // Sum each of the four counters across all records with the same key.
            long sum1 = 0;
            long sum2 = 0;
            long sum3 = 0;
            long sum4 = 0;
            for (MyArrayWritable myArrayWritable : v2s) {
                Writable[] values = myArrayWritable.get();
                sum1 = sum1 + ((LongWritable) values[0]).get();
                sum2 = sum2 + ((LongWritable) values[1]).get();
                sum3 = sum3 + ((LongWritable) values[2]).get();
                sum4 = sum4 + ((LongWritable) values[3]).get();
            }
            v3.set("\t" + sum1 + "\t" + sum2 + "\t" + sum3 + "\t" + sum4);
            context.write(k2, v3);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, LiuLiang.class.getSimpleName());
        job.setJarByClass(LiuLiang.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MyArrayWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("hdfs://115.28.138.100:9000/HTTP_20130313143750.dat"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://115.28.138.100:9000/ceshi3"));
        job.waitForCompletion(true);
    }
}

// Custom serialized array type: a writable array of LongWritable values.
class MyArrayWritable extends ArrayWritable {
    public MyArrayWritable() {
        super(LongWritable.class);
    }

    public MyArrayWritable(String[] strings) {
        super(strings);
    }
}
5. SequenceFile
When studying HDFS, we mentioned solutions to the small-files problem, one of which is SequenceFile. It serializes key-value pairs into a single unordered file, so many small files can be merged into it, and it supports compression. The disadvantage is that you have to traverse the whole file to see the small files stored inside.
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.Collection;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class SequenceFileTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://115.28.138.100:9000"), conf, "hadoop");
        // Write(conf, fileSystem);
        Read(conf, fileSystem);
    }

    private static void Read(Configuration conf, FileSystem fileSystem) throws IOException {
        // Traverse the SequenceFile and print each (file name, file content) pair.
        Reader reader = new SequenceFile.Reader(fileSystem, new Path("/sqtest"), conf);
        Text key = new Text();
        Text val = new Text();
        while (reader.next(key, val)) {
            System.out.println(key.toString() + "-" + val.toString());
        }
        IOUtils.closeStream(reader);
    }

    private static void Write(Configuration conf, FileSystem fileSystem) throws IOException {
        // Merge all local .txt files into one SequenceFile: key = file name, value = file content.
        Writer writer = SequenceFile.createWriter(fileSystem, conf, new Path("/sqtest"), Text.class, Text.class);
        Collection<File> files = FileUtils.listFiles(new File("F:\\ceshi1"), new String[]{"txt"}, false);
        for (File file : files) {
            Text text = new Text();
            text.set(FileUtils.readFileToString(file));
            writer.append(new Text(file.getName()), text);
        }
        IOUtils.closeStream(writer);
    }
}

At this point, I believe you have a deeper understanding of how to use MapReduce. Why not try it out in practice!