Hadoop Development: WordCount


Reference: http://hadoop.apache.org/docs/r2.7.6/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Create a new Maven project in Eclipse.

The pom.xml content:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>hadoop_mapreduce</groupId>
  <artifactId>WordCount</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>WordCount</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.8</version>
      <scope>system</scope>
      <systemPath>C:\Program Files\Java\jdk1.8.0_151\lib\tools.jar</systemPath>
    </dependency>
  </dependencies>
</project>
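As a quick sanity check (not part of the original article), a tiny program can confirm that the hadoop-client dependency resolved on the classpath. VersionInfo is a standard Hadoop utility class; the CheckHadoopVersion class name here is just a hypothetical example:

import org.apache.hadoop.util.VersionInfo;

// Hedged sketch: verify the hadoop-client dependency resolved by printing
// the Hadoop version found on the classpath (expected here: 2.8.0).
public class CheckHadoopVersion {
    public static void main(String[] args) {
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
    }
}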

Note: only the hadoop-client dependency is needed. If HBase-related packages are also introduced, dependency conflicts are likely and the job will throw exceptions at runtime.

WordCount class code

package hadoop_mapreduce.WordCount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        /*
        IntWritable intwritable = new IntWritable(1);
        Text text = new Text("abc");
        System.out.println(text.toString());
        System.out.println(text.getLength());
        System.out.println(intwritable.get());
        System.out.println(intwritable);
        StringTokenizer itr = new StringTokenizer("www baidu com");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken());
        }
        */
        // hdfs://192.168.50.107:8020/input hdfs://192.168.50.107:8020/output
        // String path = WordCount.class.getResource("/").toString();
        // System.out.println("path = " + path);
        System.out.println("Connection end");
        // System.setProperty("hadoop.home.dir", "file://192.168.50.107/home/hadoop-user/hadoop-2.8.0");
        // String StringInput = "hdfs://192.168.50.107:8020/input/a.txt";
        // String StringOutput = "hdfs://192.168.50.107:8020/output/b.txt";

        Configuration conf = new Configuration();
        // conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // conf.addResource("classpath:core-site.xml");
        // conf.addResource("classpath:hdfs-site.xml");
        // conf.addResource("classpath:mapred-site.xml");
        // conf.set("HADOOP_HOME", "/home/hadoop-user/hadoop-2.8.0");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class); // missing in the original listing; without it the identity reducer runs
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // FileInputFormat.addInputPath(job, new Path(StringInput));
        // FileOutputFormat.setOutputPath(job, new Path(StringOutput));
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The Hadoop connection configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml) sit on the project classpath; the original article showed their location in a screenshot.

Running the job from Eclipse reports an error: HADOOP_HOME and hadoop.home.dir are unset.
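The article's fix is to package the job and run it on Linux (next step). If you do want to launch from Eclipse, a common workaround is to set hadoop.home.dir yourself before the job starts, which the commented-out System.setProperty line in main() hints at. This is a minimal sketch, not from the original article; the launcher class name and installation path are hypothetical, and on Windows the directory must also contain bin\winutils.exe:

// Hedged sketch: satisfy Hadoop's home-directory check when launching
// from the IDE. Point the (hypothetical) path at your own installation
// before Job.getInstance() is called.
public class WordCountIdeLauncher {
    public static void main(String[] args) throws Exception {
        System.setProperty("hadoop.home.dir", "C:\\hadoop-2.8.0");
        WordCount.main(args); // delegate to the real driver
    }
}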

Compile and package, then copy the jar to the Linux system:

mvn clean
mvn compile
mvn package

I put the packaged WordCount-0.0.1-SNAPSHOT.jar into the /home/hadoop-user/work directory.

Run the job on Linux:

hadoop jar WordCount-0.0.1-SNAPSHOT.jar hadoop_mapreduce.WordCount.WordCount hdfs://192.168.50.107:8020/input hdfs://192.168.50.107:8020/output

Note: if the fully qualified class name (hadoop_mapreduce.WordCount.WordCount) is omitted from the command, the run fails with an error that the WordCount class cannot be found. Put the files to be analyzed into the input directory on HDFS. Do not create the output directory yourself: the job creates it, fails if it already exists, and writes the final analysis results there (for example to output/part-r-00000).
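For completeness, here is a minimal sketch (not from the original article) of preparing the input with Hadoop's FileSystem API instead of the hdfs command line. The NameNode URI and the a.txt file name come from the article; the PrepareInput class name and the local path are assumptions:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged sketch: create /input, upload the file to analyze, and clear any
// stale /output so the job does not fail with FileAlreadyExistsException.
public class PrepareInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.50.107:8020"), conf);
        fs.mkdirs(new Path("/input"));
        fs.copyFromLocalFile(new Path("/home/hadoop-user/work/a.txt"), new Path("/input/a.txt"));
        fs.delete(new Path("/output"), true); // recursive delete; returns false if absent
        fs.close();
    }
}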
