2025-02-25 Update From: SLTechnology News&Howtos
Today we will look at how to implement an inverted index in MapReduce. Many people are not familiar with the technique, so the following walkthrough builds it up step by step; I hope you get something out of it.
Requirement: build an inverted index of the words in three text files, a, b, and c.

Input files:

a:
hello world
hello hadoop
hello world

b:
spark hadoop
hello hadoop
world hadoop

c:
spark world
hello world
hello spark
Map stage
For every word occurrence, the mapper emits a key of the form "word:fileName" with the value "1". Map phase output for the word "hello":
context.write("hello:a", "1");
context.write("hello:a", "1");
context.write("hello:a", "1");
context.write("hello:b", "1");
context.write("hello:c", "1");
context.write("hello:c", "1");
Combine stage
The combiner splits the "word:fileName" key, sums the counts per file, and re-emits the bare word as the key. Combine phase output for "hello":
context.write("hello", "a:3");
context.write("hello", "b:1");
context.write("hello", "c:2");
Reduce stage
The reducer concatenates all "fileName:count" values for each word. Reduce phase output for "hello":
context.write("hello", "a:3 b:1 c:2");
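The three stages above can be simulated locally with plain Java collections, without a Hadoop cluster. The class and method names below are made up for illustration; only the stage logic and the sample "hello" data follow the walkthrough above:

```java
import java.util.*;

public class InvertedIndexSketch {
    // simulate the combine and reduce stages for ("word:file", "1") map output pairs
    public static Map<String, String> run(List<String[]> mapOutput) {
        // combine stage: sum the counts per "word:file" key
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (String[] kv : mapOutput) {
            combined.merge(kv[0], Integer.parseInt(kv[1]), Integer::sum);
        }
        // reduce stage: collect "file:count" postings under each word
        Map<String, String> reduced = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            String[] parts = e.getKey().split(":");
            String posting = parts[1] + ":" + e.getValue();
            reduced.merge(parts[0], posting, (a, b) -> a + " " + b);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
            new String[]{"hello:a", "1"}, new String[]{"hello:a", "1"},
            new String[]{"hello:a", "1"}, new String[]{"hello:b", "1"},
            new String[]{"hello:c", "1"}, new String[]{"hello:c", "1"});
        System.out.println(run(mapOutput).get("hello")); // prints a:3 b:1 c:2
    }
}
```

This mirrors the data flow only; real MapReduce additionally sorts and partitions keys between the stages.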
Define the Mapper class, which extends org.apache.hadoop.mapreduce.Mapper and overrides the map() method:

public class IIMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");
        // get the file split (InputSplit) from the context
        FileSplit inputSplit = (FileSplit) context.getInputSplit();
        // get the absolute path of the file from the input split
        String path = inputSplit.getPath().toString();
        int index = path.lastIndexOf("/");
        // extract the file name from the path
        String fileName = path.substring(index + 1);
        for (String word : words) {
            context.write(new Text(word + ":" + fileName), new Text("1")); // map output
        }
    }
}
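The file-name extraction in the mapper can be checked in isolation. The HDFS path below is a hypothetical example, and FileNameDemo is an illustrative class name, not part of the job:

```java
public class FileNameDemo {
    // same substring logic the mapper uses to turn a path into a file name
    public static String fileName(String path) {
        int index = path.lastIndexOf("/");
        return path.substring(index + 1);
    }

    public static void main(String[] args) {
        // hypothetical HDFS path, just to illustrate the logic
        System.out.println(fileName("hdfs://namenode:9000/index/input/a.txt")); // prints a.txt
    }
}
```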
Define the Combiner class, which extends org.apache.hadoop.mapreduce.Reducer and overrides the reduce() method. The combine stage is the intermediate step between the map and reduce stages:

public class IICombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String[] data = key.toString().split(":");
        String word = data[0];
        String fileName = data[1];
        int count = 0;
        for (Text value : values) {
            count += Integer.parseInt(value.toString());
        }
        context.write(new Text(word), new Text(fileName + ":" + count)); // combine output
    }
}
Define the Reducer class, which extends org.apache.hadoop.mapreduce.Reducer and overrides the reduce() method:

public class IIReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (Text value : values) {
            sb.append(value.toString()).append("\t");
        }
        context.write(key, new Text(sb.toString())); // reduce output
    }
}
Test the inverted index with a driver class:

public static void main(String[] args)
        throws IOException, ClassNotFoundException, InterruptedException {
    Job job = Job.getInstance(new Configuration());
    job.setJarByClass(InverseIndexRunner.class); // set the job's main class
    job.setMapperClass(IIMapper.class);          // set the Mapper class
    job.setCombinerClass(IICombiner.class);      // set the Combiner class
    job.setReducerClass(IIReducer.class);        // set the Reducer class
    job.setMapOutputKeyClass(Text.class);        // map-phase output key type
    job.setMapOutputValueClass(Text.class);      // map-phase output value type
    job.setOutputKeyClass(Text.class);           // reduce-phase output key type
    job.setOutputValueClass(Text.class);         // reduce-phase output value type
    // input and output paths are taken from the command-line arguments
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);                 // submit the job and wait
}
The result file of the job output:

hadoop a:1 b:3
hello b:1 c:2 a:3
spark b:1 c:2
world c:2 b:1 a:2
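One way to consume the result file is to parse each line back into a word-to-postings map and look up which files contain a given word. This is a sketch assuming tab-separated fields, as the reducer above produces; IndexLookup is an illustrative name:

```java
import java.util.*;

public class IndexLookup {
    // parse reducer output lines like "hello<TAB>b:1<TAB>c:2<TAB>a:3"
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> index = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            // first field is the word, the rest are "file:count" postings
            index.put(parts[0], String.join(" ", Arrays.copyOfRange(parts, 1, parts.length)));
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "hadoop\ta:1\tb:3",
            "hello\tb:1\tc:2\ta:3",
            "spark\tb:1\tc:2",
            "world\tc:2\tb:1\ta:2");
        System.out.println(parse(lines).get("spark")); // prints b:1 c:2
    }
}
```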
That covers how to implement an inverted index in MapReduce. Hopefully the walkthrough has deepened your understanding; thank you for reading.