How to Control the Number of Hadoop Map Tasks

Updated 2025-02-23. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

This article explains how to control the number of map tasks in a Hadoop job. Many people are unsure how this works in day-to-day operation, so the editor consulted various materials and put together a simple, practical summary.

Hadoop provides a parameter, mapred.map.tasks, for setting the number of map tasks, and we can use it to control how many maps a job runs. However, setting the number of maps this way does not always take effect. The reason is that mapred.map.tasks is only a hint to Hadoop, and the final number of maps also depends on other factors.

For convenience, let's first define a few terms:

block_size: the HDFS file block size. The default is 64 MB in 1.x and 128 MB in 2.x; it can be set with the parameter dfs.block.size (dfs.blocksize in 2.x).

total_size: the total size of the input files.

input_file_num: the number of input files.

mapred.tasktracker.map.tasks.maximum: the maximum number of map tasks that a TaskTracker will run simultaneously. The default is 2.

(1) Default number of maps

If nothing is set, the default number of maps is determined by block_size.

default_num = total_size / block_size
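
As a quick worked example of the formula above, the following Python sketch computes default_num for a hypothetical 1 GB input. The function name and the sample sizes are illustrative only, and rounding up is an assumption of this sketch (a trailing partial block still needs a map); the text itself gives a plain division.

```python
MB = 1024 * 1024

def default_map_num(total_size: int, block_size: int) -> int:
    """Estimate the default map count: roughly one map per HDFS block of input."""
    # Rounding up is an assumption of this sketch, not part of the article's formula.
    return -(-total_size // block_size)

print(default_map_num(1024 * MB, 128 * MB))  # 8  (2.x default block size)
print(default_map_num(1024 * MB, 64 * MB))   # 16 (1.x default block size)
```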

(2) Expected number of maps

The number of maps the programmer expects can be set with the parameter mapred.map.tasks, but this number only takes effect if it is greater than default_num.

goal_num = mapred.map.tasks

(3) Size of the data processed by each map

You can set the amount of data processed by each task with mapred.min.split.size, but this size takes effect only if it is greater than block_size.

split_size = max(mapred.min.split.size, block_size)

split_num = total_size / split_size
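
The two formulas above can be sketched in Python as follows. The function name and sample sizes are hypothetical, and rounding up is again an assumption of the sketch. The example shows that a minimum split size below the block size changes nothing, while one above it reduces the split count.

```python
MB = 1024 * 1024

def split_num(total_size: int, min_split_size: int, block_size: int) -> int:
    """Number of splits when each split is at least min_split_size and at least one block."""
    split_size = max(min_split_size, block_size)
    # Round up so a trailing partial split is counted (assumption of this sketch).
    return -(-total_size // split_size)

print(split_num(1024 * MB, 32 * MB, 128 * MB))   # 8: 32 MB < block_size, so no effect
print(split_num(1024 * MB, 256 * MB, 128 * MB))  # 4: 256 MB > block_size, fewer splits
```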

(4) Computed number of maps

compute_map_num = min(split_num, max(default_num, goal_num))

In addition to these settings, MapReduce follows some rules of its own. The data processed by a single map cannot span files, that is, min_map_num >= input_file_num. Therefore, the final number of maps is:

final_map_num = max(compute_map_num, input_file_num)
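
To make the chain of formulas concrete, here is a minimal Python sketch that transcribes them literally. The function and argument names are hypothetical, the sizes are made up, and rounding up in the divisions is an assumption of the sketch rather than something the text states.

```python
MB = 1024 * 1024

def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (a rounding assumption of this sketch)."""
    return -(-a // b)

def final_map_num(total_size, block_size, input_file_num,
                  map_tasks=1, min_split_size=1):
    """Literal transcription of the article's formulas, steps (1) to (4)."""
    default_num = ceil_div(total_size, block_size)        # (1)
    goal_num = map_tasks                                  # (2)
    split_size = max(min_split_size, block_size)          # (3)
    split_num = ceil_div(total_size, split_size)
    compute_map_num = min(split_num, max(default_num, goal_num))  # (4)
    # One map never spans files, so there is at least one map per input file.
    return max(compute_map_num, input_file_num)

total, block = 1024 * MB, 128 * MB
print(final_map_num(total, block, input_file_num=1))                            # 8
print(final_map_num(total, block, input_file_num=1, min_split_size=256 * MB))   # 4
print(final_map_num(total, block, input_file_num=10, min_split_size=256 * MB))  # 10
print(final_map_num(total, block, input_file_num=1, map_tasks=20))              # 8
```

One thing the last call makes visible: with the formulas exactly as written, split_size is never smaller than block_size, so split_num never exceeds default_num and the min() always picks split_num. In this simplified model a larger mapred.map.tasks therefore does not change the result, so the formulas should be read as an approximation of Hadoop's behavior rather than an exact description.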

Based on the analysis above, setting the number of maps can be summarized as follows:

(1) If you want to increase the number of maps, set mapred.map.tasks to a larger value.

(2) If you want to reduce the number of maps, set mapred.min.split.size to a larger value.

(3) If the input contains many small files and you still want to reduce the number of maps, you need to merge the small files into larger files and then apply guideline (2).

(4) mapred.tasktracker.map.tasks.maximum >= mapred.map.tasks
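
Guidelines (1) and (2) can be illustrated with a small Python sketch. It assumes the split-size rule used by the old-API FileInputFormat in Hadoop 1.x, where the requested map count is turned into a goal size (total size divided by requested maps) and the split size is max(min split size, min(goal size, block size)). Treat that rule as an assumption to verify against your Hadoop version; the function name and sizes are hypothetical, and the sketch models a single large file while ignoring per-file handling and other details.

```python
MB = 1024 * 1024

def estimate_maps(total_size, block_size, map_tasks=1, min_split_size=1):
    """Estimate the map count for one large file under the assumed old-API rule."""
    # Assumed rule: the requested map count becomes a goal size per split.
    goal_size = total_size // max(map_tasks, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    # Round up so a trailing partial split is counted.
    return -(-total_size // split_size)

total, block = 1024 * MB, 128 * MB
print(estimate_maps(total, block))                           # 8:  default
print(estimate_maps(total, block, map_tasks=32))             # 32: guideline (1)
print(estimate_maps(total, block, map_tasks=4))              # 8:  below default, no effect
print(estimate_maps(total, block, min_split_size=256 * MB))  # 4:  guideline (2)
```

Under this model, a requested map count only matters once it pushes the goal size below the block size, which matches the statement that mapred.map.tasks takes effect only when it is greater than default_num.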

Addendum: some parameter names changed between Hadoop 1.x and 2.x.

1.x name                                 2.x name
mapred.map.tasks                         mapreduce.job.maps
mapred.min.split.size                    mapreduce.input.fileinputformat.split.minsize
mapred.tasktracker.map.tasks.maximum     mapreduce.tasktracker.map.tasks.maximum
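
If job settings are kept in a script, the renames listed above can be applied mechanically. The following Python sketch is only an illustration: the dictionary holds exactly the three pairs listed above, and the helper name and sample values are hypothetical.

```python
# 1.x name -> 2.x name, limited to the three parameters listed above.
OLD_TO_NEW = {
    "mapred.map.tasks": "mapreduce.job.maps",
    "mapred.min.split.size": "mapreduce.input.fileinputformat.split.minsize",
    "mapred.tasktracker.map.tasks.maximum": "mapreduce.tasktracker.map.tasks.maximum",
}

def to_2x_names(props: dict) -> dict:
    """Return a copy of props with known 1.x keys renamed; other keys pass through."""
    return {OLD_TO_NEW.get(key, key): value for key, value in props.items()}

print(to_2x_names({"mapred.map.tasks": 20, "mapred.min.split.size": 268435456}))
# {'mapreduce.job.maps': 20, 'mapreduce.input.fileinputformat.split.minsize': 268435456}
```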
