This article explains how to implement data de-duplication in Hadoop. The idea is simple and the code is easy to follow, so please work through "how Hadoop achieves data de-duplication" step by step with the editor.
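Before reading the code, it helps to see what the job does on concrete data. The sample records below are illustrative, not taken from the article. Suppose an input file contains duplicate lines:

    2012-3-1 a
    2012-3-2 b
    2012-3-1 a
    2012-3-3 c

After the job runs, the output directory holds each distinct line exactly once:

    2012-3-1 a
    2012-3-2 b
    2012-3-3 c

The trick is that MapReduce already groups identical keys during the shuffle phase, so mapping every input line to itself as a key lets the framework do the de-duplication for free.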
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class QuChong {

    /**
     * The idea of data de-duplication and merging: the mapper emits every
     * input line as a key with an empty value, the shuffle groups identical
     * lines together, and the reducer writes each distinct line exactly once.
     * @author hadoop
     */
    public static class Engine extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // every line becomes a key; duplicates collapse during the shuffle
            context.write(new Text(line), new Text(""));
        }
    }

    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // one output record per distinct key, no matter how many duplicates arrived
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        // set up the job configuration, reading input/output directories from the command line
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(QuChong.class);
        // set the Map, Combine and Reduce processing classes; the reducer is
        // idempotent, so it can safely double as the combiner
        job.setMapperClass(Engine.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        // set the output key/value classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
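To try the job, package the class into a jar and submit it with the standard hadoop launcher; the jar name and HDFS paths below are placeholders, not taken from the article:

    hadoop jar quchong.jar QuChong /user/hadoop/dedup/input /user/hadoop/dedup/output

Note that the output directory must not already exist, or FileOutputFormat will refuse to start the job. Also, reusing the reducer as the combiner is safe here only because the reduce function is idempotent: collapsing duplicates once locally on each mapper and once again globally still yields exactly one record per distinct line, and the combiner cuts down the data shuffled across the network.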
Thank you for reading. The above is the content of "how to achieve data de-duplication in Hadoop". After studying this article, you should have a deeper understanding of the technique, though the specifics still need to be verified in practice. The editor will continue to push more articles on related knowledge points; you are welcome to follow along!