
What determines the number of map tasks in Hadoop


This article mainly explains what determines the number of map tasks in Hadoop. Many people have questions about how the number of map tasks is decided in daily work, so the editor has consulted various sources and organized them into a simple, easy-to-follow explanation. I hope it helps answer the question of how many map tasks Hadoop will run. Now, please follow the editor and read on!

Hadoop provides the parameter mapred.map.tasks for setting the number of map tasks, and we can use it to try to control how many maps a job runs. However, setting the number of map tasks this way does not always take effect, because mapred.map.tasks is only a hint to Hadoop; the final number of map tasks also depends on other factors.
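As a rough illustration only (assuming the classic org.apache.hadoop.mapred API and a driver written with ToolRunner; MyJob, myjob.jar and the paths below are placeholders, and in newer Hadoop releases the property is named mapreduce.job.maps), the hint can be set either in the job driver or on the command line:

// In the driver, using the old mapred API:
JobConf conf = new JobConf(MyJob.class);
conf.setNumMapTasks(10); // sets the mapred.map.tasks hint to 10

// Or on the command line, if the driver uses ToolRunner/GenericOptionsParser:
// hadoop jar myjob.jar MyJob -D mapred.map.tasks=10 /input /output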

For convenience, let's first define a few terms:

block_size: the HDFS file block size. The default is 64 MB and can be changed with the parameter dfs.block.size.

total_size: the total size of the input files.

input_file_num: the number of input files.

(1) Default number of map tasks

If nothing is set, the default number of map tasks is determined by block_size:

default_num = total_size / block_size
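For example, a 1,024 MB input with the default 64 MB block size gives default_num = 1024 / 64 = 16.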

(2) Expected number of map tasks

The number of map tasks expected by the programmer can be set through the parameter mapred.map.tasks, but this value only takes effect if it is greater than default_num.

goal_num = mapred.map.tasks
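Continuing the example above (default_num = 16), setting mapred.map.tasks to 20 makes goal_num = 20 take effect, while setting it to 10 would be ignored because it is smaller than default_num.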

(3) Setting the split size per map task

You can set the amount of data processed by each map task through mapred.min.split.size, but this size only takes effect if it is greater than block_size.

split_size = max(mapred.min.split.size, block_size)

split_num = total_size / split_size
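For example, with block_size = 64 MB and mapred.min.split.size set to 128 MB, split_size = max(128 MB, 64 MB) = 128 MB, so the 1,024 MB input above yields split_num = 1024 / 128 = 8.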

(4) The calculated number of map tasks

compute_map_num = min(split_num, max(default_num, goal_num))
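Continuing the running example: compute_map_num = min(8, max(16, 20)) = min(8, 20) = 8.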

Besides these settings, MapReduce follows one more rule: the data processed by a single map task cannot span files, so the number of map tasks can never be less than the number of input files, i.e. map_num >= input_file_num. Therefore, the final number of map tasks is:

final_map_num = max(compute_map_num, input_file_num)
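With, say, 4 input files in the running example, final_map_num = max(8, 4) = 8. The whole chain can be sketched as a small helper; this is only a restatement of the formulas above with sizes in MB, not Hadoop's actual split computation, which FileInputFormat handles with more corner cases:

// A minimal sketch of the calculation described in this article.
public class MapNumEstimate {
    static long estimateMapNum(long totalSize, long blockSize, long minSplitSize,
                               long goalNum, long inputFileNum) {
        long defaultNum = totalSize / blockSize;                       // default_num
        long splitSize = Math.max(minSplitSize, blockSize);            // split_size
        long splitNum = totalSize / splitSize;                         // split_num
        long computeMapNum = Math.min(splitNum, Math.max(defaultNum, goalNum));
        return Math.max(computeMapNum, inputFileNum);                  // final_map_num
    }

    public static void main(String[] args) {
        // 1024 MB input, 64 MB blocks, 128 MB min split, mapred.map.tasks = 20,
        // 4 input files -> prints 8, matching the running example.
        System.out.println(estimateMapNum(1024, 64, 128, 20, 4));
    }
}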

Based on the analysis above, setting the number of map tasks can be summarized as follows:

(1) To increase the number of map tasks, set mapred.map.tasks to a larger value.

(2) To reduce the number of map tasks, set mapred.min.split.size to a larger value (see the sketch after this list).

(3) If the input consists of many small files and you still want to reduce the number of map tasks, merge the small files into larger files first and then apply guideline (2).
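As a minimal sketch of guideline (2), again assuming the classic mapred API with the same placeholder driver as before (in newer Hadoop releases the property is named mapreduce.input.fileinputformat.split.minsize):

// Ask for splits of at least 256 MB so that fewer map tasks are created.
// The value is in bytes; 256 MB is just an illustrative choice.
JobConf conf = new JobConf(MyJob.class);
conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);

// Or on the command line:
// hadoop jar myjob.jar MyJob -D mapred.min.split.size=268435456 /input /output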

This concludes the study of what determines the number of map tasks in Hadoop. I hope it has resolved your doubts. Combining theory with practice is the best way to learn, so go and try it! If you want to continue learning more related knowledge, please keep following this site; the editor will keep working hard to bring you more practical articles!
