
Sample code for mapreduce in hadoop


This article walks through sample MapReduce code in Hadoop: a mapper, a reducer, and a runner class for the classic word-count job, followed by the console output from submitting the job to a cluster. The code is detailed and should be a useful reference; interested readers are encouraged to read to the end!

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get the contents of one line of the file
        String line = value.toString();
        // split the line's content into an array of words
        String[] words = StringUtils.split(line, " ");
        // traverse the words and emit each one with a count of 1
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
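To make the mapper's behavior concrete, it can be exercised on a single line in isolation. Below is a minimal sketch assuming the MRUnit test library is on the classpath (MRUnit is not used in the original article; the harness class name is illustrative): for the input line "hello world hello", the mapper should emit one (word, 1) pair per token, in order.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;

public class WordCountMapperCheck {
    public static void main(String[] args) throws Exception {
        // hypothetical harness: feed one input line, assert the emitted pairs
        new MapDriver<LongWritable, Text, Text, LongWritable>()
                .withMapper(new WordCountMapper())
                .withInput(new LongWritable(0), new Text("hello world hello"))
                // one (word, 1) pair per token, in input order
                .withOutput(new Text("hello"), new LongWritable(1))
                .withOutput(new Text("world"), new LongWritable(1))
                .withOutput(new Text("hello"), new LongWritable(1))
                .runTest();
    }
}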

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // key: hello, values: {1,1,1,1,1...}
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // define an accumulating counter
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        // output the key-value pair
        context.write(key, new LongWritable(count));
    }
}
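Because word counting is associative and commutative, this same reducer class could also serve as a combiner, pre-aggregating (word, 1) pairs on the map side to shrink shuffle traffic. In the job run below, Combine input records=0 shows that no combiner was configured; enabling one would be a single extra line alongside the other wcjob.set* calls in the runner. This is an optional optimization, not part of the original code:

// optional: reuse the reducer as a map-side combiner
wcjob.setCombinerClass(WordCountReducer.class);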

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Describes a job (which mapper class to use, which reducer class to use,
 * where the input files live, where the output should go, etc.)
 * and then submits the job to the hadoop cluster.
 *
 * @author duanhaitao@itcast.cn
 */
// cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
public class WordCountRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // set the jar used by the job (before Job.getInstance, which copies the conf)
        conf.set("mapreduce.job.jar", "wcount.jar");
        Job wcjob = Job.getInstance(conf);
        // set the jar where the job's classes are located
        wcjob.setJarByClass(WordCountRunner.class);
        // which mapper class the job uses
        wcjob.setMapperClass(WordCountMapper.class);
        // which reducer class the job uses
        wcjob.setReducerClass(WordCountReducer.class);
        // the key/value types output by the mapper
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);
        // the key/value types output by the reducer
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);
        // the path where the raw input data to be processed lives
        FileInputFormat.setInputPaths(wcjob, "hdfs://192.168.88.155:9000/wc/srcdata");
        // the path where the processed results are written
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://192.168.88.155:9000/wc/output"));
        boolean res = wcjob.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
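Note that the job log below warns "Hadoop command-line option parsing not performed" and suggests implementing the Tool interface and launching through ToolRunner, which parses generic options (-D, -files, and so on) before the job code runs. A minimal sketch of that variant, kept hypothetical here since the original article does not use it:

package cn.itheima.bigdata.hadoop.mr.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any options parsed by ToolRunner
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WordCountTool.class);
        // ... same mapper/reducer/type/path setup as WordCountRunner ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options from args before calling run()
        System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
    }
}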

Package the classes into mr.jar, copy it to the hadoop server, and submit the job:

[root@hadoop02 ~]# hadoop jar /root/Desktop/mr.jar cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
Java HotSpot(TM) Client VM warning: You have loaded library /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
15/12/05 06:07:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/05 06:07:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop02/192.168.88.155:8032
15/12/05 06:07:08 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/05 06:07:09 INFO input.FileInputFormat: Total input paths to process : 1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: number of splits:1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449322432664_0001
15/12/05 06:07:10 INFO impl.YarnClientImpl: Submitted application application_1449322432664_0001
15/12/05 06:07:10 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1449322432664_0001/
15/12/05 06:07:10 INFO mapreduce.Job: Running job: job_1449322432664_0001
15/12/05 06:07:22 INFO mapreduce.Job: Job job_1449322432664_0001 running in uber mode : false
15/12/05 06:07:22 INFO mapreduce.Job:  map 0% reduce 0%
15/12/05 06:07:32 INFO mapreduce.Job:  map 100% reduce 0%
15/12/05 06:07:39 INFO mapreduce.Job:  map 100% reduce 100%
15/12/05 06:07:40 INFO mapreduce.Job: Job job_1449322432664_0001 completed successfully
15/12/05 06:07:41 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=635
        FILE: Number of bytes written=212441
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=338
        HDFS: Number of bytes written=223
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7463
        Total time spent by all reduces in occupied slots (ms)=4688
        Total time spent by all map tasks (ms)=7463
        Total time spent by all reduce tasks (ms)=4688
        Total vcore-seconds taken by all map tasks=7463
        Total vcore-seconds taken by all reduce tasks=4688
        Total megabyte-seconds taken by all map tasks=7642112
        Total megabyte-seconds taken by all reduce tasks=4800512
    Map-Reduce Framework
        Map input records=10
        Map output records=41
        Map output bytes=547
        Map output materialized bytes=635
        Input split bytes=114
        Combine input records=0
        Combine output records=0
        Reduce input groups=30
        Reduce shuffle bytes=635
        Reduce input records=41
        Reduce output records=30
        Spilled Records=82
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=211
        CPU time spent (ms)=1350
        Physical memory (bytes) snapshot=221917184
        Virtual memory (bytes) snapshot=722092032
        Total committed heap usage (bytes)=137039872
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=224
    File Output Format Counters
        Bytes Written=223
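The 30 reduce output records land in a part file under the output directory (part-r-00000, given the single reducer in this run). As a rough sketch, they could be read back with the HDFS FileSystem API; the class name and the single-part-file assumption are illustrative, not from the original article:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintWordCounts {
    public static void main(String[] args) throws Exception {
        // connect to the same namenode the job wrote to
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.88.155:9000"), conf);
        // single-reducer jobs write one part file
        Path result = new Path("/wc/output/part-r-00000");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line: word <TAB> count
            }
        }
    }
}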

The above is all the content of this article, "Sample code for mapreduce in hadoop". Thank you for reading! I hope it helps; for more related knowledge, welcome to follow the industry information channel!
