How to use Partitioner in MapReduce

2025-01-24 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31 report

This article explains how to use Partitioner in MapReduce. The approach described here is simple and practical, so let's walk through it step by step.

Question guide:

1. What is the purpose of the Partitioner class?

2. What are the three parameters of getPartition()?

3. numReduceTasks is the number of Reducer tasks configured for the job. What is its default value?

Extension:

If different types of data are assigned to the same partition, is the output data still ordered?

In a MapReduce job, it is sometimes necessary to split the final output into separate files. For example, when splitting by province, all records from the same province should go into one file; when splitting by gender, all records of the same gender should go into one file. The final output is written by the Reducer tasks, so producing multiple files means running the same number of Reducer tasks. Reducer tasks receive their input from Mapper tasks, so the Mapper side must divide its output and route each portion to the Reducer task responsible for it. This division of the Mapper output is called partitioning, and the class that implements it is called the Partitioner.

The earlier examples never mentioned partitioning because the framework has a built-in default partition class, HashPartitioner. Its source code is shown in Figure 6-6.

Figure 6-6

In Figure 6-6, HashPartitioner processes the output of the Mapper task. The getPartition() method has three formal parameters: key and value are the key and value output by the Mapper task, and numReduceTasks is the number of Reducer tasks configured for the job, which defaults to 1. The remainder of any integer divided by 1 is 0, so with the default setting getPartition() always returns 0. In other words, all output of a Mapper task is sent to a single Reducer task and can only end up in one file.
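
The remainder argument above can be checked with a minimal plain-Java sketch. It is not the Hadoop class itself: the class name HashPartitionSketch and the sample keys are made up for illustration, and the value parameter is dropped because the hash rule does not use it. The expression mirrors the one in Hadoop's HashPartitioner, which masks the key's hash code to a non-negative value and takes it modulo numReduceTasks.

```java
// Plain-Java sketch of the hash partitioning rule (no Hadoop dependency).
// HashPartitionSketch and the sample keys are hypothetical names for illustration.
class HashPartitionSketch {

    // Mask off the sign bit so the result is never negative,
    // then take the remainder by the number of Reducer tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"Beijing", "Shanghai", "Guangdong"};
        for (String key : keys) {
            // With one Reducer task the partition is always 0;
            // with three it is some value in the range 0..2.
            System.out.println(key
                    + " -> 1 reducer: " + getPartition(key, 1)
                    + ", 3 reducers: " + getPartition(key, 3));
        }
    }
}
```

With numReduceTasks left at 1, every key maps to partition 0, which is why the earlier examples produced a single output file.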

Following this analysis, to end up with multiple output files, the Mapper output must be divided into multiple partitions. We only need to follow some rule that makes getPartition() return 0, 1, 2, 3, and so on.

Assuming we partition by gender, we can write a custom class that overrides the getPartition() method of the Partitioner class. The code is shown in Figure 6-7.

Figure 6-7
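
As a rough idea of what a gender-based rule might look like, here is a plain-Java sketch with no Hadoop dependency. The class name GenderPartitionSketch, the string values "male" and "female", and the use of the gender string as the key are all assumptions for illustration. In a real job the class would extend org.apache.hadoop.mapreduce.Partitioner and override getPartition(key, value, numPartitions) with the job's actual key and value types.

```java
// Plain-Java sketch of a gender-based partition rule (no Hadoop dependency).
// The class name and the "male"/"female" labels are hypothetical.
class GenderPartitionSketch {

    // Each category gets a fixed number (0, 1, or 2), taken modulo
    // numPartitions so the result is always a valid partition index.
    static int getPartition(String gender, int numPartitions) {
        if ("male".equals(gender)) {
            return 0 % numPartitions;
        }
        if ("female".equals(gender)) {
            return 1 % numPartitions;
        }
        return 2 % numPartitions; // anything else, e.g. unknown
    }

    public static void main(String[] args) {
        System.out.println(getPartition("male", 3));    // prints 0
        System.out.println(getPartition("female", 3));  // prints 1
        System.out.println(getPartition("unknown", 3)); // prints 2
    }
}
```

Taking the remainder keeps the rule safe when fewer Reducer tasks are configured than there are categories: with numPartitions equal to 1, all three branches return 0.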

In Figure 6-7, we take 0, 1, and 2 modulo numPartitions, respectively. To divide the data into three separate outputs, numPartitions must be 3, so that 0 % 3, 1 % 3, and 2 % 3 yield three different values. So, how do we use the custom partitioner? Only two settings are needed in the driver, as shown in Figure 6-8.

Figure 6-8

In Figure 6-8, we set the custom partition class and specify numReduceTasks. The framework passes this numReduceTasks value to the partition class as the formal parameter numPartitions.
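
In a Hadoop driver, these two settings are typically made with job.setPartitionerClass(...) and job.setNumReduceTasks(...) on the Job object. The effect of the reducer count can be simulated in plain Java, as in the sketch below. It is an illustration only, not how the framework is implemented: the class PartitionRoutingSketch, the sample records, and the gender rule are all hypothetical, and each inner list stands in for one Reducer task's output file.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of routing records to Reducer tasks by partition.
// All names and sample data are hypothetical; no Hadoop dependency.
class PartitionRoutingSketch {

    // Hypothetical gender rule: 0, 1, or 2 modulo the partition count.
    static int getPartition(String gender, int numPartitions) {
        if ("male".equals(gender)) {
            return 0 % numPartitions;
        }
        if ("female".equals(gender)) {
            return 1 % numPartitions;
        }
        return 2 % numPartitions;
    }

    // Each record is {gender, name}. Returns one list of names per
    // simulated Reducer task; numReduceTasks is passed on to the
    // partition rule as numPartitions.
    static List<List<String>> route(String[][] records, int numReduceTasks) {
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            outputs.add(new ArrayList<>());
        }
        for (String[] record : records) {
            int partition = getPartition(record[0], numReduceTasks);
            outputs.get(partition).add(record[1]);
        }
        return outputs;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"male", "Tom"}, {"female", "Ann"}, {"unknown", "Sam"}, {"male", "Bob"}
        };
        System.out.println(route(records, 3)); // prints [[Tom, Bob], [Ann], [Sam]]
        System.out.println(route(records, 1)); // prints [[Tom, Ann, Sam, Bob]]
    }
}
```

With three simulated Reducer tasks each category lands in its own output. With one, everything collapses into a single output, which matches the default behavior described earlier.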

At this point, you should have a better understanding of how to use Partitioner in MapReduce. The best way to reinforce it is to try it out in practice.
