This article explains in detail how to use MapReduce in Hadoop to implement WordCount and movie rating prediction. It is shared as a reference, and after reading it you should have a good understanding of the relevant concepts.
In MapReduce, "map" refers to mapping and "reduce" refers to reduction.
MapReduce is a key-value programming model: the map phase transforms one set of key-value pairs into another set of key-value pairs.
The framework then passes the map output to the reduce phase, where all key-value pairs emitted by the maps are reduced and values that share the same key are grouped together. MapReduce also sorts the keys internally before they reach the reducers.
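For example, given the input line "hello world hello", the map phase emits the pairs (hello, 1), (world, 1) and (hello, 1); after the shuffle, the reduce phase receives (hello, [1, 1]) and (world, [1]) and outputs (hello, 2) and (world, 1). This is exactly the flow implemented by the WordCount example below.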
I. WordCount
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static final String HDFS = "hdfs://localhost:8888";
    public static final Pattern DELIMITER = Pattern.compile("\\b([a-zA-Z]+)\\b");

    // Custom Map class: performs the "mapping" part of MapReduce.
    // Text is equivalent to the String type, IntWritable to the int type,
    // and LongWritable is a data type that implements WritableComparable.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override // override the parent class's map() function
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // read one line of data
            String line = value.toString();
            // change all characters of the line to lowercase
            line = line.toLowerCase();
            // split the line into words using the regular expression defined above
            Matcher matcher = DELIMITER.matcher(line);
            while (matcher.find()) {
                // convert the extracted word to Text
                word.set(matcher.group());
                // emit the key-value pair: the key is the word, the value is 1
                context.write(word, one);
            }
        }
    }

    // Custom Combine class: runs a local reduce on each map's output first,
    // to cut down the amount of data transferred to the reducers.
    public static class Combine extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            // iterate over all values of the same key; they arrive in one Iterable
            for (IntWritable line : values) {
                sum += line.get();
            }
            IntWritable value = new IntWritable(sum);
            // output the key-value pair in the specified output format
            context.write(key, value);
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable line : values) {
                sum += line.get();
            }
            IntWritable value = new IntWritable(sum);
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = WordCount.config();
        String input = "data/1.txt";
        String output = HDFS + "/user/hdfs/wordcount";

        // custom HDFS file manipulation tool class
        HdfsDAO hdfs = new HdfsDAO(WordCount.HDFS, conf);
        // remove any existing output directory, otherwise the job will fail
        hdfs.rmr(output);

        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        // set the output key type
        job.setOutputKeyClass(Text.class);
        // set the output value type
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCount.Map.class);
        job.setCombinerClass(WordCount.Combine.class);
        job.setReducerClass(WordCount.Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        // format the output with the custom FileOutputFormat class shown below
        job.setOutputFormatClass(ParseTextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static JobConf config() {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        // conf.set("io.sort.mb", "");
        return conf;
    }
}
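The driver above relies on an HdfsDAO helper class that is not reproduced in this article. Below is a minimal sketch of what such a helper might look like, assuming it only needs to wrap FileSystem.delete for the rmr() call used here; the real class in the original project may offer more operations.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the HdfsDAO helper referenced by the WordCount driver.
// Only the rmr() method actually used above is reconstructed here.
public class HdfsDAO {

    private final String hdfsPath;
    private final Configuration conf;

    public HdfsDAO(String hdfsPath, Configuration conf) {
        this.hdfsPath = hdfsPath;
        this.conf = conf;
    }

    // Recursively delete a path, mirroring "hadoop fs -rmr".
    public void rmr(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.delete(path, true);
        fs.close();
    }
}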
Custom file output format
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

public class ParseTextOutputFormat<K, V> extends FileOutputFormat<K, V> {

    protected static class LineRecordWriter<K, V> extends RecordWriter<K, V> {

        private static final String utf8 = "UTF-8";
        private static final byte[] newline;
        static {
            try {
                newline = "\n".getBytes(utf8);
            } catch (UnsupportedEncodingException uee) {
                throw new IllegalArgumentException("can't find " + utf8 + " encoding");
            }
        }

        protected DataOutputStream out;
        private final byte[] keyValueSeparator;

        public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
            this.out = out;
            try {
                this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
            } catch (UnsupportedEncodingException uee) {
                throw new IllegalArgumentException("can't find " + utf8 + " encoding");
            }
        }

        public LineRecordWriter(DataOutputStream out) {
            this(out, "\t");
        }

        /**
         * Write the object to the byte stream, handling Text as a special case.
         * @param o the object to print
         * @throws IOException if the write throws, we pass it on
         */
        private void writeObject(Object o) throws IOException {
            if (o instanceof Text) {
                Text to = (Text) o;
                out.write(to.getBytes(), 0, to.getLength());
            } else {
                out.write(o.toString().getBytes(utf8));
            }
        }

        public synchronized void write(K key, V value) throws IOException {
            boolean nullKey = key == null || key instanceof NullWritable;
            boolean nullValue = value == null || value instanceof NullWritable;
            if (nullKey && nullValue) {
                return;
            }
            if (!nullKey) {
                writeObject(key);
            }
            if (!(nullKey || nullValue)) {
                out.write(keyValueSeparator);
            }
            if (!nullValue) {
                writeObject(value);
            }
            out.write(newline);
        }

        public synchronized void close(TaskAttemptContext context) throws IOException {
            out.close();
        }
    }

    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {
        Configuration conf = job.getConfiguration();
        boolean isCompressed = getCompressOutput(job);
        String keyValueSeparator = conf.get("mapred.textoutputformat.separator", ":");
        CompressionCodec codec = null;
        String extension = "";
        // compression handling mirrors the standard TextOutputFormat
        if (isCompressed) {
            Class<? extends CompressionCodec> codecClass =
                    getOutputCompressorClass(job, GzipCodec.class);
            codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
            extension = codec.getDefaultExtension();
        }
        Path file = getDefaultWorkFile(job, extension);
        FileSystem fs = file.getFileSystem(conf);
        if (!isCompressed) {
            FSDataOutputStream fileOut = fs.create(file, false);
            return new LineRecordWriter<K, V>(fileOut, keyValueSeparator);
        } else {
            FSDataOutputStream fileOut = fs.create(file, false);
            return new LineRecordWriter<K, V>(
                    new DataOutputStream(codec.createOutputStream(fileOut)),
                    keyValueSeparator);
        }
    }
}
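With ParseTextOutputFormat set on the job via job.setOutputFormatClass(ParseTextOutputFormat.class), each reduce output record is written as key:value on its own line (for example, hadoop:3), because getRecordWriter() reads the mapred.textoutputformat.separator property with ":" as the default, whereas the standard TextOutputFormat defaults to a tab separator. If a different separator is wanted, it can be overridden in config(), e.g. conf.set("mapred.textoutputformat.separator", "\t").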