
How to write multiple output files with MultipleOutputs in cdh3u3 (Hadoop 0.20.2)

Shulou (Shulou.com) 05/31, updated 2025-04-06. From: SLTechnology News&Howtos

This article shows how to write multiple output files with MultipleOutputs in cdh3u3 (Hadoop 0.20.2). The example job reads one text file and splits its records into three outputs: malformed records, records from other schools, and the records we actually want.

1. Create a new file, multest.txt, with the following contents:

    11111,usernamedepartment,password,22,Hebei Normal University,Software College,2008
    11112,usernamepence,password,22,Hebei Normal University,Computer College,2008
    11113,usernamepr,passwordpr,22,xx University,Software College,2008
    11114,usernamepr,passwordpr,22,rexxx University,Computer College,2008
    11115,Personname,Password,Power,23,2008

2. Create a new directory on HDFS: hadoop dfs -mkdir multest

3. Upload the new text file to the multest directory: hadoop dfs -put /home/wjk/hadoop/multest.txt multest

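4. Create a new Map/Reduce project. Records that do not have the expected format (7 comma-separated fields) are saved to the dirtydata output, records from anywhere other than the Software College of Hebei Normal University are saved to the otherschool output, and the remaining records are saved to the default output file.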
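The routing rule in step 4 can be checked without a cluster. The following is a minimal plain-Java sketch with no Hadoop dependency that applies the same three tests as the mapper: field count first, then school and college. The class name RecordRouter, the method route, and the English school and college strings are assumptions of this sketch. Because it uses the same line.split(","), a line with trailing empty fields also ends up with fewer than 7 fields and is treated as dirty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordRouter {
    // Assumed target values, matching the translated sample data in step 1
    static final String TARGET_SCHOOL = "Hebei Normal University";
    static final String TARGET_COLLEGE = "Software College";

    // Returns the output a line would go to: "dirtydata", "otherschool" or "default"
    static String route(String line) {
        String[] details = line.split(",");
        if (details.length != 7) {
            return "dirtydata"; // malformed record
        }
        String school = details[4];
        String college = details[5];
        if (school.equals(TARGET_SCHOOL) && college.equals(TARGET_COLLEGE)) {
            return "default"; // record we want, goes through the reducer
        }
        return "otherschool";
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "11111,usernamedepartment,password,22,Hebei Normal University,Software College,2008",
            "11112,usernamepence,password,22,Hebei Normal University,Computer College,2008",
            "11113,usernamepr,passwordpr,22,xx University,Software College,2008",
            "11114,usernamepr,passwordpr,22,rexxx University,Computer College,2008",
            "11115,Personname,Password,Power,23,2008");
        // Group lines by destination, keeping first-seen order
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String line : lines) {
            buckets.computeIfAbsent(route(line), k -> new ArrayList<>()).add(line);
        }
        // Prints "default: 1", "otherschool: 3", "dirtydata: 1", one per line
        for (Map.Entry<String, List<String>> e : buckets.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size());
        }
    }
}
```

Only the first record matches both the school and the college, so the default output should contain a single line, while the other three well-formed records go to otherschool.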

    package com.wjk.test;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class Multest {

        public static class MultestMapper extends Mapper<Object, Text, Text, NullWritable> {
            private Text outkey = new Text("");
            private MultipleOutputs<Text, NullWritable> mos;

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // One MultipleOutputs instance per task
                mos = new MultipleOutputs<Text, NullWritable>(context);
                super.setup(context);
            }

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                String[] details = line.split(",");
                outkey.set(line);
                if (details.length != 7) {
                    // Malformed record: not exactly 7 fields
                    mos.write("dirtydata", outkey, NullWritable.get());
                } else {
                    String school = details[4];
                    String college = details[5];
                    if (school.equals("Hebei Normal University") && college.equals("Software College")) {
                        // Wanted record: default output, via the reducer
                        context.write(outkey, NullWritable.get());
                    } else {
                        mos.write("otherschool", outkey, NullWritable.get());
                    }
                }
            }

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                // Closing flushes and closes every named-output file
                mos.close();
                super.cleanup(context);
            }
        }

        public static class MultestReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
            @Override
            protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Each distinct line is written once
                context.write(key, NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: Multest <in> <out>");
                System.exit(2);
            }
            Job job = new Job(conf, "multest");
            job.setJarByClass(Multest.class);
            job.setMapperClass(MultestMapper.class);
            job.setReducerClass(MultestReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            // Register the named outputs used by mos.write(...)
            MultipleOutputs.addNamedOutput(job, "dirtydata", TextOutputFormat.class, Text.class, NullWritable.class);
            MultipleOutputs.addNamedOutput(job, "otherschool", TextOutputFormat.class, Text.class, NullWritable.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

5. Compile, export the jar, and run: hadoop jar ./../multest.jar com.wjk.test.Multest multest multestout

6. Run result: [screenshot]

== Attention ==

Defect: when the job runs on a cluster, every task writes its own file for each named output, so the results end up scattered across many files in the output directory.
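To see how quickly the files add up, the sketch below lists the file names to expect for the job above. It assumes the "<name>-m-<5-digit task number>" and "part-r-<number>" naming that FileOutputFormat uses for map-side named outputs and reducer output, with no file extension. The class OutputFileNames and its methods are hypothetical. The list is an upper bound: MultipleOutputs only opens a file for a named output once a task actually writes a record to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutputFileNames {
    // Assumed naming scheme: <name>-<m|r>-<partition padded to 5 digits>
    static String uniqueFile(String name, boolean isMap, int partition) {
        return String.format("%s-%c-%05d", name, isMap ? 'm' : 'r', partition);
    }

    // Upper bound on the files in the job output directory when the
    // named outputs are written from the mapper, as in step 4
    static List<String> expectedFiles(int mapTasks, int reduceTasks, List<String> namedOutputs) {
        List<String> files = new ArrayList<>();
        for (int m = 0; m < mapTasks; m++) {
            for (String named : namedOutputs) {
                files.add(uniqueFile(named, true, m));
            }
        }
        for (int r = 0; r < reduceTasks; r++) {
            files.add(uniqueFile("part", false, r)); // default output from the reducer
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> files = expectedFiles(3, 1, Arrays.asList("dirtydata", "otherschool"));
        for (String f : files) {
            System.out.println(f);
        }
        System.out.println(files.size() + " files");
    }
}
```

With three map tasks and one reducer, this gives up to six named-output files plus part-r-00000, all in the same directory. The number of scattered files grows with the number of map tasks.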

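Addendum: the approach above produces many files that are hard to merge. Instead, you can write each named output into its own subdirectory of the output directory. Merging by directory with getmerge is then easy, for example: hadoop dfs -getmerge multestout/dirtydata dirtydata.txt. The only change is in the mos.write call. Use the overload that takes a baseOutputPath, for example mos.write("dirtydata", outkey, NullWritable.get(), "dirtydata/data"). The relevant code from the official MultipleOutputs source is short and easy to follow: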

    public <K, V> void write(String namedOutput, K key, V value)
            throws IOException, InterruptedException {
        write(namedOutput, key, value, namedOutput);
    }

    public <K, V> void write(String namedOutput, K key, V value,
            String baseOutputPath) throws IOException, InterruptedException {
        checkNamedOutputName(this.context, namedOutput, false);
        checkBaseOutputPath(baseOutputPath);
        if (!this.namedOutputs.contains(namedOutput)) {
            throw new IllegalArgumentException("Undefined named output '" + namedOutput + "'");
        }
        TaskAttemptContext taskContext = getContext(namedOutput);
        getRecordWriter(taskContext, baseOutputPath).write(key, value);
    }

The three-argument write simply passes the named output's own name as baseOutputPath. This is why each named output's files are created directly in the output directory. Passing a baseOutputPath that contains a directory component places them in a subdirectory instead.
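Once each named output has its own subdirectory, hadoop dfs -getmerge <dir> <local file> concatenates that directory's files into one local file. The sketch below reproduces the concatenation on a local directory using java.nio, for example after copying the directory down with hadoop dfs -get. It illustrates the idea and is not Hadoop's implementation. The class LocalGetMerge, merging in name order, and skipping names that start with "_" or "." (such as _SUCCESS or .crc checksum files) are choices of this sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LocalGetMerge {
    // Concatenates the regular files in dir, sorted by name, into target
    static void merge(Path dir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // Skip marker and checksum files such as _SUCCESS or .xxx.crc
                if (name.startsWith("_") || name.startsWith(".")) {
                    continue;
                }
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts); // data-m-00000, data-m-00001, ...
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dirtydata");
        // Two part files created out of order, plus a checksum file to skip
        Files.write(dir.resolve("data-m-00001"), "line from task 1\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("data-m-00000"), "line from task 0\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve(".data-m-00000.crc"), new byte[] {1, 2, 3});
        Path merged = Files.createTempFile("dirtydata", ".txt");
        merge(dir, merged);
        // Prints the task 0 line, then the task 1 line
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

Because each named output now lives in its own directory, one merge per directory yields one file per output, with no need to pick files out by name.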
