
How to Implement an Inverted Index in MapReduce

Updated 2025-02-25. From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report

This article shows how to build an inverted index with MapReduce, walking through the map, combine, and reduce stages and the code for each.

Requirement: build an inverted index of the words in three text files, a, b, and c, recording for each word which files contain it and how many times.

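Output format: one line per word, followed by a fileName:count entry for every file that contains the word.

Input files: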

a:
hello world
hello hadoop
hello world

b:
spark hadoop
hello hadoop
world hadoop

c:
spark world
hello world
hello spark
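Before writing any MapReduce code, it can help to compute the expected index directly. The following is a minimal plain-Java sketch, not part of the original job: InvertedIndexReference and build are hypothetical names, and it assumes words are separated by whitespace. It maps each word to a per-file occurrence count, which is what the job's final output should contain.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of the Hadoop job): computes the expected
// inverted index in memory as word -> (file name -> occurrence count).
final class InvertedIndexReference {

    // files maps a file name such as "a" to the lines of that file
    static Map<String, Map<String, Integer>> build(Map<String, List<String>> files) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String line : file.getValue()) {
                // assumption: words are separated by whitespace
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // a blank line yields one empty token
                    }
                    index.computeIfAbsent(word, w -> new TreeMap<>())
                         .merge(file.getKey(), 1, Integer::sum);
                }
            }
        }
        return index;
    }
}
```

For the three files above it yields hadoop → {a=1, b=3}, hello → {a=3, b=1, c=2}, spark → {b=1, c=2} and world → {a=2, b=1, c=2}.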

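Map stage

The mapper emits one ("word:fileName", "1") pair for every word occurrence. For hello in file a, which contains it three times:

```java
context.write("hello:a", "1");
context.write("hello:a", "1");
context.write("hello:a", "1");
```

Map phase output for hello: hello:a 1 (three times), hello:b 1 (once), hello:c 1 (twice).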
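The map step can be simulated without a cluster. This hypothetical MapStageSketch class is an illustration, not Hadoop API: it emits the same ("word:fileName", "1") pairs as IIMapper for one file's lines, assuming whitespace-separated words.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the map stage of a single file.
final class MapStageSketch {

    // Returns one ("word:fileName", "1") pair per word occurrence, in input order
    static List<Map.Entry<String, String>> map(String fileName, List<String> lines) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String line : lines) {
            // assumption: words are separated by whitespace
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by a blank line
                }
                out.add(new SimpleEntry<>(word + ":" + fileName, "1"));
            }
        }
        return out;
    }
}
```

Called with file name a and the three lines of file a, it returns six pairs, three of which have the key hello:a.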

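Combine stage

The combiner adds up the 1s for each word:fileName key and re-keys the total by the word alone. For hello:

```java
context.write("hello", "a:3");
context.write("hello", "b:1");
context.write("hello", "c:2");
```

Combine phase output for hello: hello a:3, hello b:1, hello c:2.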
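A similar in-memory sketch of the combine step, assuming one map task's output fits in a list: CombineStageSketch is a hypothetical name, and the TreeMap stands in for Hadoop's sort-and-group by key. It sums the 1s per word:fileName key and re-keys each total by word, as IICombiner does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the combine stage of one map task.
final class CombineStageSketch {

    static List<Map.Entry<String, String>> combine(List<Map.Entry<String, String>> mapOutput) {
        // Group by the "word:fileName" key (sorted, like Hadoop's sort step) and sum the counts
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, String> e : mapOutput) {
            sums.merge(e.getKey(), Integer.parseInt(e.getValue()), Integer::sum);
        }
        // Re-key each total by the word; the value carries "fileName:count"
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            String[] data = e.getKey().split(":");
            out.add(new SimpleEntry<>(data[0], data[1] + ":" + e.getValue()));
        }
        return out;
    }
}
```

Three hello:a pairs and one world:a pair combine into (hello, a:3) and (world, a:1).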

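Reduce stage

The reducer receives all fileName:count values for a word and joins them, tab-separated, into one value:

```java
context.write("hello", "a:3\tb:1\tc:2\t");
```

Reduce phase output for hello: a single line, hello a:3 b:1 c:2. The order of the file entries is not guaranteed, as the job result at the end shows.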
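The shuffle and reduce steps can be sketched the same way. In this hypothetical ReduceStageSketch, grouping by word stands in for Hadoop's shuffle, and the values are joined with tabs via String.join, which, unlike IIReducer, leaves no trailing tab. Values keep their input order here, whereas Hadoop makes no such promise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the shuffle + reduce stages.
final class ReduceStageSketch {

    static Map<String, String> reduce(List<Map.Entry<String, String>> combineOutput) {
        // "Shuffle": collect every "fileName:count" value under its word
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : combineOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // "Reduce": join each word's values with tabs
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.put(e.getKey(), String.join("\t", e.getValue()));
        }
        return out;
    }
}
```

Given (hello, a:3), (hello, b:1) and (hello, c:2), it maps hello to a:3, b:1 and c:2 separated by tabs.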

Define the Mapper class, which extends org.apache.hadoop.mapreduce.Mapper and overrides the map() method.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class IIMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split the line into words on runs of whitespace
        String[] words = line.trim().split("\\s+");
        // Get the input split (file slice) this record came from
        FileSplit inputSplit = (FileSplit) context.getInputSplit();
        // Take the file name (last path component) from the split's path
        String fileName = inputSplit.getPath().getName();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // a blank line yields a single empty token
            }
            // map output: key "word:fileName", value "1"
            context.write(new Text(word + ":" + fileName), new Text("1"));
        }
    }
}
```

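Define the Combiner class, which extends org.apache.hadoop.mapreduce.Reducer and overrides the reduce() method.

The combine stage is an intermediate step between the map and reduce stages: it runs on the map side and aggregates map output locally before it is shuffled to the reducers. Note that Hadoop treats a combiner as an optional optimization that may run zero, one, or several times; because this combiner rewrites the key from word:fileName to word, the design produces the expected result only when the combiner runs exactly once on each map output.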

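```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IICombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // The incoming key has the form "word:fileName"
        String[] data = key.toString().split(":");
        String word = data[0];
        String fileName = data[1];
        // Sum the 1s emitted by the mapper for this word in this file
        int count = 0;
        for (Text value : values) {
            count += Integer.parseInt(value.toString());
        }
        // combine output: key "word", value "fileName:count"
        context.write(new Text(word), new Text(fileName + ":" + count));
    }
}
```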

Define the Reducer class, which also extends org.apache.hadoop.mapreduce.Reducer and overrides the reduce() method.

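```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IIReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Append every "fileName:count" entry for this word, each followed by a tab
        StringBuilder sb = new StringBuilder();
        for (Text value : values) {
            sb.append(value.toString()).append("\t");
        }
        // reduce output: key "word", value "a:3\tb:1\t..."
        context.write(key, new Text(sb.toString()));
    }
}
```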

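To test the inverted index, write a driver class, InverseIndexRunner, whose main method configures and submits the job. The input and output paths are taken from the command-line arguments.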

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InverseIndexRunner {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(InverseIndexRunner.class); // set the job's main class
        job.setMapperClass(IIMapper.class);           // set the Mapper class
        job.setCombinerClass(IICombiner.class);       // set the Combiner class
        job.setReducerClass(IIReducer.class);         // set the Reducer class
        job.setMapOutputKeyClass(Text.class);         // type of the map output key
        job.setMapOutputValueClass(Text.class);       // type of the map output value
        job.setOutputKeyClass(Text.class);            // type of the reduce output key
        job.setOutputValueClass(Text.class);          // type of the reduce output value
        // Input path, taken from the first command-line argument
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Output path, taken from the second command-line argument
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job, wait for it to finish, and exit with its status
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The job's result file:

hadoop a:1 b:3
hello b:1 c:2 a:3
spark b:1 c:2
world c:2 b:1 a:2
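To look a word up in the finished index, each output line has to be split back into its file entries. This sketch assumes the default TextOutputFormat, which separates key and value with a tab, and the tab-joined values written by IIReducer, including its trailing tab; IndexLineParser and postings are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reader for one line of the job output.
final class IndexLineParser {

    // Parses "word<TAB>file:count<TAB>file:count<TAB>..." into file -> count
    static Map<String, Integer> postings(String line) {
        // trim() drops the trailing tab that IIReducer appends
        String[] fields = line.trim().split("\t");
        Map<String, Integer> result = new TreeMap<>();
        // fields[0] is the word itself; the rest are "fileName:count" entries
        for (int i = 1; i < fields.length; i++) {
            int sep = fields[i].lastIndexOf(':');
            if (sep < 0) {
                continue; // not a "fileName:count" entry
            }
            result.put(fields[i].substring(0, sep),
                       Integer.parseInt(fields[i].substring(sep + 1)));
        }
        return result;
    }
}
```

For the hello line above, it returns {a=3, b=1, c=2} regardless of the order in which the entries were written.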
