How to set the number of map and reduce tasks in Hadoop
This article shows how to set the number of map and reduce tasks in Hadoop. The content is concise and easy to understand, and I hope you get something out of this detailed introduction.
1 Number of map tasks
The number of map tasks is usually determined by the DFS block size of the Hadoop cluster, that is, by the total number of blocks in the input files. The normal degree of map parallelism is roughly 10 to 100 maps per node; for jobs with low CPU consumption, the number can be pushed to around 300. However, since every task in Hadoop takes some time to initialize, it is reasonable for each map to run for at least one minute.

Data splitting works as follows. By default, InputFormat splits the input according to the DFS block size of the cluster, and each split is processed by one map task. Users can adjust this from the job submission client through the mapred.min.split.size parameter. Another important parameter is mapred.map.tasks; the number of maps it sets is only a hint, and it takes effect only when the number of map tasks determined by InputFormat is smaller than the mapred.map.tasks value. Likewise, the number of map tasks can be set manually with JobConf's conf.setNumMapTasks(int num) method. This method can increase the number of map tasks, but it cannot set the number below the value Hadoop obtains by splitting the input data. To improve the concurrency of the cluster, you can also configure a default number of maps: when a user's map count is small, or smaller than the automatically computed split count, a relatively large default value raises the efficiency of the Hadoop cluster as a whole.
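As a concrete illustration, here is a minimal sketch using the classic org.apache.hadoop.mapred API. The driver class name, the target of 300 maps, and the 128 MB minimum split size are illustrative assumptions, not recommendations.

import org.apache.hadoop.mapred.JobConf;

public class MapCountConfig {
    public static JobConf configure() {
        // JobConf from the classic "mapred" API, as referenced in this article.
        JobConf conf = new JobConf(MapCountConfig.class);

        // Hint at a target number of map tasks. This only takes effect when
        // InputFormat would otherwise create fewer splits; it can raise the
        // map count but never lower it below the split count.
        conf.setNumMapTasks(300);

        // Alternatively, raise the minimum split size (in bytes) so that
        // fewer, larger splits -- and therefore fewer maps -- are created.
        // 128 MB here is an assumed value for illustration.
        conf.set("mapred.min.split.size", String.valueOf(128L * 1024 * 1024));
        return conf;
    }
}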
2 Number of reduce tasks
At run time, a reduce task usually has to copy data from the relevant map side over to its reduce node before processing, so compared with map tasks, reduce tasks are short on node resources and run relatively slowly. A sensible number of reduce tasks is therefore 0.95 or 1.75 × (number of nodes × mapred.tasktracker.reduce.tasks.maximum). With 0.95 times the number of slots, all reduce tasks can start running at once as soon as the map tasks finish transferring their output. With 1.75 times, the faster nodes finish their first batch of reduce tasks and move on to a second batch, which gives better load balancing. Note that increasing the number of reduce tasks raises the resource overhead of the framework, but it improves load balancing and lowers the cost of task failures. As with map tasks, the number of reduce tasks can be set through JobConf's conf.setNumReduceTasks(int num) method.
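The rule of thumb above translates into a few lines of driver code. In this sketch, nodes and reduceSlotsPerNode are hypothetical values supplied by the caller; reduceSlotsPerNode stands in for the cluster's mapred.tasktracker.reduce.tasks.maximum setting.

import org.apache.hadoop.mapred.JobConf;

public class ReduceCountConfig {
    public static void setReduces(JobConf conf, int nodes, int reduceSlotsPerNode) {
        // Use 0.95 so all reduces run in a single wave, or 1.75 so the
        // faster nodes pick up a second wave and balance the load.
        int numReduces = (int) (0.95 * nodes * reduceSlotsPerNode);
        conf.setNumReduceTasks(numReduces);
    }
}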
3 Setting the number of reduce tasks to 0
Some jobs do not need a reduce phase at all, in which case the number of reduce tasks can be set to 0. Such a job runs relatively fast: the output of the maps is written directly to the output directory set by setOutputPath(path) instead of being written locally as intermediate results, and the Hadoop framework does not sort the map output before writing it to the file system.
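A map-only job can be configured as below. This is a minimal sketch with the classic mapred API, where the job name and the argument-based input and output paths are assumptions. With no mapper class set, the old API defaults to the identity mapper, so this job simply passes records through.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only example");  // assumed job name

        // Zero reduces: each map writes its output directly to the output
        // directory, and the framework skips the sort phase entirely.
        conf.setNumReduceTasks(0);

        // Input and output paths taken from the command line (assumption).
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}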
The above covers how to set the number of map and reduce tasks in Hadoop. Have you picked up any new knowledge or skills? If you want to learn more or enrich your knowledge, you are welcome to follow the industry information channel.