How to Implement WordCount in Hadoop
In this article, the editor shares how to implement WordCount in Hadoop. Many people are not familiar with the details, so it is shared here for your reference; I hope you learn something from reading it.
WordCount is the classic introductory example of a Hadoop application.
With Hadoop 2.6.0, the JAR dependencies that need to be added to the build path are located in hadoop-2.6.0/share/hadoop/common/lib.
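If you prefer the command line to the Eclipse export described later, here is a minimal sketch of compiling and packaging, assuming the source below is saved as WordCount.java and the hadoop command from hadoop-2.6.0/bin is on your PATH:

javac -classpath "$(hadoop classpath)" WordCount.java
jar cf wordcount.jar WordCount*.class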
Source code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: splits each input line into words and emits a (word, 1) pair per word.
    public static class WordCountMap extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Note: in Hadoop 2.x, Job.getInstance(conf) is preferred over new Job(conf).
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
The Mapper's input is text: the key is of type Object (in practice, the byte offset of the line) and the value is of type Text (the line itself).
The Mapper's output key is Text and its value is IntWritable, Hadoop's counterpart of Java's Integer type. The words split from each line are emitted as key-value pairs.
For each line of input text, the map method is called once to split the line into words.
while (tokenizer.hasMoreTokens()) {
    word.set(tokenizer.nextToken());
    context.write(word, one);
}
This loop turns each line of text into key-value pairs of the form (word, 1).
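For example, given the line used in the test below, the map phase emits one (word, 1) pair per token:

one titus two titus three titus
-> (one, 1) (titus, 1) (two, 1) (titus, 1) (three, 1) (titus, 1)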
For each distinct key, the reduce method is called once to sum the number of occurrences of that key.
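Continuing the example: after the shuffle phase groups the map output by key, reduce receives each key with its list of counts and emits the sum:

(titus, [1, 1, 1]) -> (titus, 3)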
Run the test
Export WordCount as a runnable JAR (wordcount.jar) from Eclipse and place it in the hadoop-2.6.0/bin directory.
Create a new input folder under hadoop-2.6.0/bin, and create the files file1.txt and file2.txt inside it (a quick way to do this from the shell is shown below).
The content of file1.txt is: one titus two titus three titus
The content of file2.txt is: one huangyi two huangyi
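A minimal sketch of creating these files, assuming you are in the hadoop-2.6.0/bin directory (the .txt extensions match the directory listing below):

mkdir input
echo "one titus two titus three titus" > input/file1.txt
echo "one huangyi two huangyi" > input/file2.txt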
.
├── container-executor
├── hadoop
├── hadoop.cmd
├── hdfs
├── hdfs.cmd
├── input
│   ├── file1.txt
│   └── file2.txt
├── mapred
├── mapred.cmd
├── rcc
├── test-container-executor
├── wordcount.jar
├── yarn
└── yarn.cmd
Run: ./hadoop jar wordcount.jar input output
The output directory and results are generated.
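Assuming the default local-mode settings used here, the reducer writes its result to a part file inside the output directory, which can be inspected with:

cat output/part-r-00000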
The output is:

huangyi 2
one 2
three 1
titus 3
two 2

That is the whole content of the article "How to Implement WordCount in Hadoop". Thank you for reading! I hope it has given you a good understanding of the topic and that sharing it has been helpful.