
Common classes of MapReduce in Hadoop (1)


Cloud Wisdom (Beijing) Technology Co., Ltd. Chen Xin

When I wrote this article, I realized that the new API and the old API coexist in Hadoop 1.1.2. I used to wonder why jobs were sometimes submitted through JobClient and sometimes through Job. Whether or not the API has been updated, the following classes still exist in the API, and if you trace through the source code yourself you will find the principles are the same; the code has simply been reorganized and encapsulated to make it more extensible. So I am moving these notes here from my notepad.

As for the introduction and usage of these classes, some of it comes from my own debugging, and most of it is a direct translation of the API comments, but the translation process taught me a great deal.

GenericOptionsParser

parseGeneralOptions(Options opts, Configuration conf, String[] args) parses the command-line arguments.

GenericOptionsParser is a utility class that parses the command-line arguments generic to the Hadoop framework. It recognizes the standard command-line parameters, making it easy for an application to specify the namenode and jobtracker, as well as additional configuration resources or information. It supports the following options:

-conf specifies a configuration file

-D specifies a configuration property as key=value

-fs specifies the namenode

-jt specifies the jobtracker

-files specifies files to be copied to the MR cluster, separated by commas

-libjars specifies jar packages to be copied to the classpath of the MR cluster, separated by commas

-archives specifies archive files to be copied to the MR cluster, separated by commas; they are unarchived automatically

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}

ToolRunner

A utility to run classes that implement the Tool interface. It works together with GenericOptionsParser to parse the generic command-line arguments, modifying the Configuration only for the current run.

Tool

The interface for handling command-line arguments. Tool is the standard interface for any MapReduce tool or application. Implementations should delegate the handling of the standard command-line arguments to ToolRunner and only deal with their own options. Here is a typical implementation:

public class MyApp extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);

        // Process custom command-line options
        Path in = new Path(args[1]);
        Path out = new Path(args[2]);

        // Specify job-specific parameters
        job.setJobName("my-app");
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(MyApp.MyMapper.class);
        job.setReducerClass(MyApp.MyReducer.class);

        // Submit the job, then monitor progress until the job is complete
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner handle the generic command-line arguments
        // (GenericOptionsParser parsing of args is encapsulated here)
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);

        System.exit(res);
    }
}
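Because ToolRunner.run passes the arguments through GenericOptionsParser first, the args array that reaches run() contains only the leftover application-specific arguments, while the generic options have already been folded into the Configuration returned by getConf(). For instance (the jar and class names here are purely illustrative), a launch such as hadoop jar myapp.jar MyApp -D mapred.reduce.tasks=2 input output would apply the -D setting to the job's Configuration and hand only the remaining path arguments to run().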

MultipleOutputFormat

Customizes the output file name or naming format. Simply call setOutputFormat on the JobConf with a subclass of MultipleOutputFormat, instead of getting default names like part-r-00000, and the results can be spread across multiple files.

MultipleOutputFormat extends FileOutputFormat and allows you to write output data to different output files. There are three application scenarios:

a. A MapReduce job with at least one reducer, where the reducer wants to write the output to different files depending on the actual key. It is assumed that a key encodes both the actual key and the desired location for the actual key.

b. A map-only job that wants to use an output file name that is either a part of the input file name, or some derivation of it.

c. A map-only job that wants to use an output file name that depends on both the keys and the input file name.

// This is where multiple files are generated based on the key; note that the
// value and name parameters are also available.
@Override
protected String generateFileNameForKeyValue(Text key,
        IntWritable value, String name) {
    char c = key.toString().toLowerCase().charAt(0);
    if (c >= 'a' && c <= 'z') {
        // assumed completion: a common pattern names the output file after the key's first letter
        return String.valueOf(c);
    }
    return name;
}
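To show how this plugs into a job, here is a minimal sketch of scenario (a) using the old mapred API; the class name LetterOutputFormat is made up for illustration, and MultipleTextOutputFormat is used as the concrete MultipleOutputFormat subclass.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical subclass name, for illustration only.
public class LetterOutputFormat extends MultipleTextOutputFormat<Text, IntWritable> {

    // Route each record to a file named after the first letter of its key;
    // keys that do not start with a letter fall back to the default leaf name.
    @Override
    protected String generateFileNameForKeyValue(Text key, IntWritable value, String name) {
        char c = key.toString().toLowerCase().charAt(0);
        if (c >= 'a' && c <= 'z') {
            return String.valueOf(c);
        }
        return name;
    }
}

In the driver it would then be registered with job.setOutputFormat(LetterOutputFormat.class) on the JobConf, and the reducer's output key and value types would have to be Text and IntWritable to match.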
