How to determine the number of map and reduce in hadoop

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to determine the number of map and reduce tasks in Hadoop. The guidance is quite practical, so it is shared here for reference.

1 Number of maps

The number of maps is usually driven by the total size of the input, that is, the total number of DFS blocks in the input files. A reasonable level of parallelism is roughly 10~100 maps per node; for jobs that use little CPU, it can be raised to about 300. However, because every Hadoop task needs some time to initialize, each map should ideally run for at least 1 minute.

Input data is split as follows. By default, the InputFormat splits the input according to the DFS block size of the Hadoop cluster, and each split is processed by one map task. Users can adjust the split size in the job submission client through the parameter mapred.min.split.size, which sets a lower bound on the size of a split.

Another important parameter is mapred.map.tasks. The value it sets is only a hint to the InputFormat: it takes effect only when the number of map tasks the InputFormat would otherwise produce is smaller than mapred.map.tasks. The same hint can be set in code with JobConf's conf.setNumMapTasks(int num) method. This method can increase the number of map tasks, but it cannot reduce the number below what Hadoop obtains by splitting the input data.

To improve the concurrency of the cluster, you can also configure a default number of maps. When a job requests only a few maps, or fewer than the automatically computed number of splits, a relatively large default value improves the efficiency of the Hadoop cluster as a whole.
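The interaction between the block size, mapred.min.split.size, and the mapred.map.tasks hint can be illustrated with a small standalone calculation. The sketch below assumes the split-size rule max(minSize, min(goalSize, blockSize)) used by the old-API FileInputFormat, applied to a single splittable file. The class and method names (SplitEstimator, estimateMaps) are hypothetical and are not part of Hadoop, and edge cases handled by the real implementation are ignored.

```java
// Hypothetical helper, not part of Hadoop: estimates how many map tasks
// a single splittable input file would produce.
class SplitEstimator {

    // Split-size rule assumed here: max(minSize, min(goalSize, blockSize)).
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // fileSize, blockSize and minSplitSize are in bytes;
    // mapTasksHint plays the role of mapred.map.tasks.
    public static long estimateMaps(long fileSize, long blockSize,
                                    long minSplitSize, int mapTasksHint) {
        // The hint turns into a "goal" split size: total size / requested maps.
        long goalSize = fileSize / Math.max(1, mapTasksHint);
        long splitSize = computeSplitSize(goalSize, Math.max(1L, minSplitSize), blockSize);
        // Ceiling division: a trailing partial split still needs its own map.
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;  // 1 GB input file
        long blockSize = 64 * mb;   // 64 MB DFS blocks

        // No hint: one map per block.
        System.out.println(estimateMaps(fileSize, blockSize, 1, 1));         // 16
        // A hint larger than the block count increases the number of maps.
        System.out.println(estimateMaps(fileSize, blockSize, 1, 32));        // 32
        // A hint smaller than the block count has no effect.
        System.out.println(estimateMaps(fileSize, blockSize, 1, 4));         // 16
        // Raising the minimum split size is what reduces the number of maps.
        System.out.println(estimateMaps(fileSize, blockSize, 256 * mb, 1));  // 4
    }
}
```

Under these assumptions, a hint of 4 still yields 16 maps, which matches the statement that mapred.map.tasks can only increase the count, while a larger minimum split size lowers it.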

2 Number of reduces

At runtime, a reduce task usually has to copy data from the relevant map tasks to its own node before it can process it. Compared with map tasks, reduce slots are therefore a scarcer resource and reduce tasks run relatively slowly. A suitable number of reduce tasks is 0.95 or 1.75 × (number of nodes × the value of mapred.tasktracker.reduce.tasks.maximum), that is, 0.95 or 1.75 times the total number of reduce slots in the cluster.

With a factor of 0.95, all reduce tasks can be launched at once and begin copying map output as soon as the map tasks finish. With a factor of 1.75, the faster nodes finish their first round of reduce tasks and then start a second round, which gives better load balancing. Increasing the number of reduces raises the framework overhead, but it improves load balancing and lowers the cost of a task failure.

As with map tasks, the number of reduce tasks can be set with JobConf's conf.setNumReduceTasks(int num) method.
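The rule of thumb above is simple arithmetic, sketched here as a standalone helper. The class and method names (ReduceCountEstimator, estimateReduces) are hypothetical, the factor is passed as a percentage (95 or 175) to keep the calculation in integers, and rounding down is an assumption made for this illustration.

```java
// Hypothetical helper, not part of Hadoop: applies the 0.95 / 1.75 rule of thumb.
class ReduceCountEstimator {

    // nodes: number of worker nodes in the cluster.
    // reduceSlotsPerNode: value of mapred.tasktracker.reduce.tasks.maximum.
    // factorPercent: 95 for a single wave of reduces, 175 for roughly two waves.
    public static int estimateReduces(int nodes, int reduceSlotsPerNode, int factorPercent) {
        int totalSlots = nodes * reduceSlotsPerNode;
        // Integer division rounds down; at least one reduce is always returned.
        return Math.max(1, totalSlots * factorPercent / 100);
    }

    public static void main(String[] args) {
        int nodes = 10;
        int slotsPerNode = 2;
        // 0.95 x 20 slots: every reduce fits in the first wave.
        System.out.println(estimateReduces(nodes, slotsPerNode, 95));   // 19
        // 1.75 x 20 slots: faster nodes pick up a second wave.
        System.out.println(estimateReduces(nodes, slotsPerNode, 175));  // 35
    }
}
```

The resulting value would then be passed to conf.setNumReduceTasks(int num) when configuring the job.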

3 Setting the number of reduces to 0

Some jobs do not need a reduce phase at all, in which case the number of reduces can be set to 0. Such a job runs relatively fast, and the output of each map is written directly to the output directory set by setOutputPath(Path) instead of being stored locally as an intermediate result. The Hadoop framework also does not sort the map output before writing it to the file system.
