
Learning log-partitioner and sampler

2025-04-05 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03

In MapReduce:

The shuffle phase sits between map and reduce, and it is where you can plug in a custom sort order, a custom partitioner, and custom grouping.
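To make the three hooks concrete, here is a small in-memory simulation in plain Java. It is not Hadoop code, and the names ShuffleSketch, Pair and KeyPartitioner are hypothetical. It routes each map output record with a partitioner, sorts each partition with a sort comparator, and starts a new reduce group whenever the grouping comparator says two adjacent keys differ. In a real job these roles are filled by the classes passed to job.setPartitionerClass, job.setSortComparatorClass and job.setGroupingComparatorClass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of the shuffle; it is not Hadoop's implementation.
class ShuffleSketch {

    // One map output record.
    static final class Pair {
        final String key;
        final String value;

        Pair(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for a Hadoop Partitioner: chooses a reducer for each key.
    interface KeyPartitioner {
        int getPartition(String key, int numPartitions);
    }

    // Returns, for each reducer, the groups it would see (one reduce() call per group).
    static List<List<List<Pair>>> shuffle(List<Pair> mapOutput, int numReducers,
                                          KeyPartitioner partitioner,
                                          Comparator<String> sortComparator,
                                          Comparator<String> groupingComparator) {
        // 1. Partition: route every record to exactly one reducer.
        List<List<Pair>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Pair p : mapOutput) {
            partitions.get(partitioner.getPartition(p.key, numReducers)).add(p);
        }

        List<List<List<Pair>>> result = new ArrayList<>();
        for (List<Pair> partition : partitions) {
            // 2. Sort: order this reducer's records by key.
            partition.sort((a, b) -> sortComparator.compare(a.key, b.key));
            // 3. Group: adjacent keys the grouping comparator treats as equal share one group.
            List<List<Pair>> groups = new ArrayList<>();
            for (Pair p : partition) {
                List<Pair> current = groups.isEmpty() ? null : groups.get(groups.size() - 1);
                if (current == null || groupingComparator.compare(current.get(0).key, p.key) != 0) {
                    current = new ArrayList<>();
                    groups.add(current);
                }
                current.add(p);
            }
            result.add(groups);
        }
        return result;
    }
}
```

With keys such as "a#1" and "a#2", sorting on the whole key but grouping on the part before '#' hands both records to one reduce call, already in order; this is the usual secondary-sort pattern. The grouping comparator has to agree with the sort order, otherwise records of one group would not be adjacent.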

In MapReduce, map output consists of key-value pairs. By default, HashPartitioner decides which reducer each pair goes to, based on the hash of its key.
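HashPartitioner's getPartition computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The plain-Java sketch below reproduces that formula outside Hadoop. The class name HashPartitionSketch is hypothetical, and because it uses String keys, its partition numbers will not match what Hadoop computes for Text keys, whose hashCode is defined differently.

```java
// Plain-Java illustration of the formula in Hadoop's HashPartitioner.getPartition.
class HashPartitionSketch {
    static int getPartition(Object key, int numReduceTasks) {
        // Clearing the sign bit keeps the result in [0, numReduceTasks) even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "apple"}) {
            System.out.println(key + " -> " + getPartition(key, 3));
        }
    }
}
```

Equal keys always land in the same partition, which is what a reducer needs, but the partition numbers say nothing about key order across reducers. That is why a globally sorted output needs a different partitioner.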

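For a total sort, where the outputs of all reducers concatenated in order form one globally sorted result, Hadoop provides TotalOrderPartitioner. It assigns keys to reducers by comparing them with split points stored in a partition file, and InputSampler generates that file from a sample of the input using one of three samplers: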

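RandomSampler(freq, numSamples, maxSplitsSampled) picks records at random, selecting each one it reads with probability freq, and keeps up to numSamples keys from at most maxSplitsSampled splits, e.g. new InputSampler.RandomSampler<Text, Text>(1, 3000, 10). It is the general-purpose choice.

IntervalSampler(freq, maxSplitsSampled) takes records at regular intervals (roughly one in every 1/freq) from at most maxSplitsSampled splits, which suits input that is already sorted, e.g. new InputSampler.IntervalSampler<Text, Text>(0.333, 10).

SplitSampler(numSamples) takes the first records of the splits it samples until it has numSamples keys. It is the cheapest, but only representative when the data is randomly distributed, e.g. new InputSampler.SplitSampler<Text, Text>(reduceNumber).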
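Whichever sampler is used, InputSampler.writePartitionFile sorts the sampled keys and writes numReduceTasks - 1 evenly spaced ones to the partition file as split points. The sketch below shows that idea on String keys in plain Java. SplitPointSketch is a hypothetical name, and unlike Hadoop's version it does not skip duplicate samples or write a SequenceFile.

```java
import java.util.Arrays;

// Plain-Java illustration: choose split points as evenly spaced quantiles of a sorted sample.
class SplitPointSketch {
    static String[] splitPoints(String[] samples, int numPartitions) {
        if (samples.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException("need samples and at least one partition");
        }
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        // n partitions need n - 1 boundaries, roughly one every stepSize samples.
        float stepSize = sorted.length / (float) numPartitions;
        String[] points = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Clamp so a tiny sample cannot index past the end of the array.
            int k = Math.min(Math.round(stepSize * i), sorted.length - 1);
            points[i - 1] = sorted[k];
        }
        return points;
    }
}
```

For the eight sample keys a to h and four reducers, the split points are c, e and g, so each reducer gets about a quarter of the key range. A larger, more representative sample gives more evenly sized reducers.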

Implementation details

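```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSortMR {

    // args: <input path> <output path> <partition file> <number of reducers>
    public static int runTotalSortJob(String[] args) throws Exception {
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
        Path partitionFile = new Path(args[2]);
        int reduceNumber = Integer.parseInt(args[3]);

        // Three samplers; only one is passed to writePartitionFile (sampler2 or sampler3 also work).
        InputSampler.RandomSampler<Text, Text> sampler = new InputSampler.RandomSampler<>(1, 3000, 10);
        InputSampler.IntervalSampler<Text, Text> sampler2 = new InputSampler.IntervalSampler<>(0.333, 10);
        InputSampler.SplitSampler<Text, Text> sampler3 = new InputSampler.SplitSampler<>(reduceNumber);

        // Job initialization. No mapper or reducer is set, so the identity Mapper and Reducer
        // are used and the sorting itself happens in the shuffle.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("Total-Sort");
        job.setJarByClass(TotalSortMR.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(reduceNumber);

        // Input and output paths. The input must be set before sampling, because the
        // sampler reads the job's input splits.
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true);

        // Use the total-order partitioner.
        job.setPartitionerClass(TotalOrderPartitioner.class);
        // Job.getInstance copies conf, so the partition file must be set on the job's own configuration.
        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
        // Sample the input and write reduceNumber - 1 split points to the partition file.
        InputSampler.writePartitionFile(job, sampler);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(runTotalSortJob(args));
    }
}
```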
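At run time TotalOrderPartitioner reads the split points back from the partition file and sends each key to the reducer whose range contains it. For keys that are not BinaryComparable it does this with a binary search, roughly as in the plain-Java sketch below (TotalOrderLookupSketch is a hypothetical name). For BinaryComparable keys such as Text it builds a trie by default instead, which should give the same assignment. String ordering matches Text's byte ordering for ASCII keys.

```java
import java.util.Arrays;

// Plain-Java illustration of a binary-search partition lookup over sorted split points.
class TotalOrderLookupSketch {
    // n sorted split points define n + 1 partitions; a key equal to a split point goes to the upper one.
    static int getPartition(String key, String[] splitPoints) {
        // binarySearch returns the index if found, otherwise -(insertionPoint) - 1.
        int pos = Arrays.binarySearch(splitPoints, key) + 1;
        return (pos < 0) ? -pos : pos;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"c", "e", "g"};
        for (String key : new String[] {"a", "c", "d", "f", "h"}) {
            System.out.println(key + " -> reducer " + getPartition(key, splitPoints));
        }
    }
}
```

Because every key sent to reducer i sorts before every key sent to reducer i + 1, concatenating the output files part-r-00000, part-r-00001, and so on in order yields a globally sorted result.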

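The default input format of a job is TextInputFormat. It produces key-value pairs in which the key is the byte offset of the line within the file (a LongWritable, not a line number) and the value is the content of the line (a Text). It can be changed with

job.setInputFormatClass(...)

as the job above does with KeyValueTextInputFormat, which splits each line at the first tab into a Text key and a Text value.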
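To show what the mapper actually receives, the plain-Java sketch below computes TextInputFormat-style records, keyed by byte offset, from the raw bytes of a file, and splits a line into key and value the way KeyValueTextInputFormat does with its default tab separator. InputFormatSketch and its methods are hypothetical names. It assumes '\n' line endings and a single split; Hadoop's LineRecordReader also handles '\r\n' and lines that cross split boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the records produced by TextInputFormat and KeyValueTextInputFormat.
class InputFormatSketch {

    // (byte offset of line start, line content) pairs; assumes '\n' line endings.
    static List<Map.Entry<Long, String>> textInputRecords(byte[] data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            if (atEnd || data[i] == '\n') {
                if (atEnd && start == data.length) {
                    break; // the file ended with a newline, so there is no partial last line
                }
                String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                records.add(Map.entry((long) start, line));
                start = i + 1;
            }
        }
        return records;
    }

    // Key is the text before the first tab, value the rest; no tab means the whole line is the key.
    static String[] keyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }
}
```

For the input "\u00e9\tx\nfoo\n" the second record's key is 5 rather than 1 or 4, because the key counts bytes and the first character takes two bytes in UTF-8.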

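You will generally also want to set the mapper's output key and value classes with job.setMapOutputKeyClass and job.setMapOutputValueClass. They default to the job's final output classes, and later stages rely on them: the shuffle serializes and sorts map output with these types, and InputSampler.writePartitionFile writes its split points as keys of the map output key class.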
