Today I would like to share with you some knowledge about how to customize partitions in Hadoop. The content is detailed and the logic is clear; many people still don't know much about this topic, so I'm sharing this article for your reference. I hope you get something out of it. Let's take a look.
The concept of partitioning
Partitioning is not a new idea to many readers: plenty of middleware in the Java world uses it, such as Kafka partitions and MySQL partitioned tables. The point of partitioning is to divide data sensibly according to business rules, so that the data in each partition can be processed efficiently afterwards.
Hadoop partition
Partitioning in Hadoop routes different data to different ReduceTasks, and ultimately into different output files (part-r-00000, part-r-00001, and so on).
Hadoop's default Partition rule
Hash partition
The partition number is computed from the key's hash: partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
The default number of ReduceTasks is 1, and it can also be set on the driver side.
The core of Hadoop's default HashPartitioner, abridged from its source, is easy to follow:
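public class HashPartitioner<K, V> extends Partitioner<K, V> {

    // Use the key's hashCode to pick a partition; the bitmask keeps the value non-negative
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}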
Hash Partition Code Demo
Here is the driver for the WordCount example. By default, without any extra settings, it outputs a single txt file containing the word counts. What happens if we add the line job.setNumReduceTasks(2) to this code, as below, and run the program again?
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DemoJobDriver {

    public static void main(String[] args) throws Exception {
        // 1. Get the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2. Set the jar path
        job.setJarByClass(DemoJobDriver.class);
        // 3. Associate the Mapper and Reducer
        job.setMapperClass(DemoMapper.class);
        job.setReducerClass(DemoReducer.class);
        // 4. Set the key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the key/value types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        String inputPath = "F:\\network disk\\csv\\hello.txt";
        String outPath = "F:\\network disk\\csv\\wordcount\\hello_result.txt";
        // Set the number of ReduceTasks (and thus output files) to 2
        job.setNumReduceTasks(2);
        FileInputFormat.setInputPaths(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outPath));
        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
You can see that two result files are produced, each with different contents. That is, once the number of reducers is set to more than one, the results are split according to the hash partitioning algorithm and written to the file corresponding to each partition.
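To make the rule concrete, here is a minimal stand-alone sketch of the modulo step. Note that it uses java.lang.String's hashCode purely for illustration: Hadoop's Text type computes its own hash over the UTF-8 bytes, so the actual partition numbers in a real job may differ.

public class HashPartitionDemo {
    public static void main(String[] args) {
        int numReduceTasks = 2;
        // Default rule: mask off the sign bit, then take the value modulo the reducer count.
        // String.hashCode() stands in for the key's real hash here.
        int partition = ("hello".hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        System.out.println(partition); // "hello".hashCode() = 99162322; 99162322 % 2 = 0 -> part-r-00000
    }
}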
Steps to customize a Partition
Create a custom class that extends Partitioner
Override the getPartition method, and inside it route different data to different partitions according to business rules
In the Job's driver class, set the custom Partitioner class
After customizing the Partitioner, set a number of ReduceTasks that matches the custom partition logic
Business requirements
Route the person names in the input file into partitions by surname: names with the surname "Ma" go to the first partition, those with the surname "Li" go to the second, and all others go to the third.
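The original sample file is not reproduced here; as an assumed stand-in, input with the same shape (one name per line, mixing the surnames Ma, Li, and others) might look like this:

Ma Yun
Ma Huateng
Li Lei
Li Si
Wang Wu
Zhao Liu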
Custom Partition
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        String key = text.toString();
        // Default partition for empty keys and all other surnames
        int partition = 2;
        if (StringUtils.isNotEmpty(key.trim())) {
            if (key.startsWith("Ma")) {
                partition = 0;
            } else if (key.startsWith("Li")) {
                partition = 1;
            }
        }
        return partition;
    }
}
Associate the custom partitioner in the Driver class, and note that the number of ReduceTasks here matches the number of custom partitions:
job.setNumReduceTasks(3);
job.setPartitionerClass(MyPartioner.class);
Next, run the Driver class and observe the final output: as expected, data for each surname is written to a different file.
Summary of Custom Partition
If the number of ReduceTasks > the number of partitions in the custom partitioner, the job runs but produces that many extra empty output files
If 1 < the number of ReduceTasks < the number of partitions in the custom partitioner, some records cannot find the ReduceTask for their partition, and the job throws an exception while processing data
If the number of ReduceTasks = 1, then no matter how many partitions the custom partitioner defines, all data is handed to that single ReduceTask and only one result file is produced
Partition numbers must start at 0 and increase one by one, without gaps
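As a quick illustration of these rules, here is a hedged sketch against the three-partition MyPartioner above; the comments describe the expected behavior for each setting:

job.setPartitionerClass(MyPartioner.class); // MyPartioner defines partitions 0, 1, 2

// Pick one of the following settings and observe the behavior in the comment:
job.setNumReduceTasks(3);    // matches the partition count: three files, as intended
// job.setNumReduceTasks(5); // > 3: job succeeds, but part-r-00003 and part-r-00004 are empty
// job.setNumReduceTasks(2); // 1 < 2 < 3: partition 2 has no ReduceTask -> "Illegal partition" error at runtime
// job.setNumReduceTasks(1); // = 1: the partitioner is bypassed; a single part-r-00000 holds all results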
That is all for "how to customize partitions in hadoop". Thank you for reading; I hope you gained something from this article.