MapReduce tuning

2025-01-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. Related resource parameters

mapreduce.map.memory.mb: the upper limit of memory, in MB, that a single Map Task may use. The default is 1024. A task that exceeds this limit is forcibly killed.

mapreduce.reduce.memory.mb: the upper limit of memory, in MB, that a single Reduce Task may use. The default is 1024. A task that exceeds this limit is forcibly killed.

mapreduce.map.cpu.vcores: the maximum number of virtual CPU cores that a single Map Task may use. The default is 1.

mapreduce.reduce.cpu.vcores: the maximum number of virtual CPU cores that a single Reduce Task may use. The default is 1.

The following parameters must be set in the server configuration files before YARN starts in order to take effect:

yarn.scheduler.minimum-allocation-mb=1024: the minimum memory, in MB, allocated to an application container

yarn.scheduler.maximum-allocation-mb=8192: the maximum memory, in MB, allocated to an application container

yarn.scheduler.minimum-allocation-vcores=1: the minimum number of virtual CPU cores allocated to an application container

yarn.scheduler.maximum-allocation-vcores=32: the maximum number of virtual CPU cores allocated to an application container

yarn.nodemanager.resource.memory-mb=8192: the total amount of physical memory, in MB, that the NodeManager can allocate to containers on its node

mapreduce.task.io.sort.mb=100: the size, in MB, of the ring buffer used in the shuffle phase

mapreduce.map.sort.spill.percent=0.8: the fill threshold of the ring buffer in the shuffle phase; once the buffer is 80% full, its contents start spilling to disk
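
As a rough illustration of how these values interact, the Python sketch below models three calculations: rounding a container request up to a multiple of the minimum allocation, counting how many such containers fit on one NodeManager, and finding the buffer fill level that triggers a spill. The function names are hypothetical, and the rounding rule is a simplifying assumption; the real behavior depends on which YARN scheduler is in use and how it is configured.

```python
import math

MB = 1024 * 1024


def normalize_request_mb(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Round a memory request up to a multiple of min_alloc_mb.

    Assumption: the scheduler rounds up to multiples of
    yarn.scheduler.minimum-allocation-mb and rejects requests above
    yarn.scheduler.maximum-allocation-mb.
    """
    if requested_mb > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    multiples = max(1, math.ceil(requested_mb / min_alloc_mb))
    return multiples * min_alloc_mb


def containers_per_node(node_memory_mb, container_mb):
    """How many containers of one size fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // container_mb


def spill_threshold_bytes(sort_mb=100, spill_percent=0.8):
    """Buffer fill level, in bytes, at which a spill to disk starts."""
    return round(sort_mb * MB * spill_percent)


if __name__ == "__main__":
    size = normalize_request_mb(1536)        # a 1.5 GB request
    print(size)                              # 2048
    print(containers_per_node(8192, size))   # 4
    print(spill_threshold_bytes())           # 83886080
```

Under these assumptions, a task that asks for 1536 MB occupies a 2048 MB container, so an 8192 MB node runs at most four such tasks at once.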

2. Fault-tolerance parameters

mapreduce.map.maxattempts=4: the maximum number of attempts per Map Task. If every attempt up to this limit fails, the Map Task is considered to have failed.

mapreduce.reduce.maxattempts=4: the maximum number of attempts per Reduce Task. If every attempt up to this limit fails, the Reduce Task is considered to have failed.

mapreduce.map.failures.maxpercent=0: the maximum percentage of Map Tasks that may fail without failing the job. When the percentage of failed Map Tasks exceeds this value, the whole job fails.

mapreduce.reduce.failures.maxpercent=0: the maximum percentage of Reduce Tasks that may fail without failing the job. When the percentage of failed Reduce Tasks exceeds this value, the whole job fails.

mapreduce.task.timeout: the task timeout in milliseconds; the default is 600000. If a task makes no progress for this long, that is, it neither reads new input nor writes output, it is considered blocked and possibly stuck forever. To keep a blocked user program from hanging indefinitely, the framework forcibly terminates the task once the timeout expires.
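
The decision rules described above can be sketched in a few lines of Python. This is a simplified model, not Hadoop's own code: the function names are hypothetical, and the rule that a timeout of 0 disables the check is stated here as an assumption about the framework.

```python
def task_failed(failed_attempts, max_attempts=4):
    """A task counts as failed once max_attempts attempts have failed."""
    return failed_attempts >= max_attempts


def job_failed(failed_tasks, total_tasks, max_failures_percent=0):
    """The job fails when the share of failed tasks exceeds the allowed percentage.

    Integers are compared directly to avoid floating-point rounding.
    """
    return failed_tasks * 100 > max_failures_percent * total_tasks


def task_timed_out(ms_since_last_progress, timeout_ms):
    """True when the task has been silent longer than the timeout.

    Assumption: timeout_ms == 0 disables the timeout.
    """
    return timeout_ms > 0 and ms_since_last_progress > timeout_ms


if __name__ == "__main__":
    print(task_failed(3))          # False: one attempt is still allowed
    print(task_failed(4))          # True
    print(job_failed(1, 100))      # True: with maxpercent=0, any failed task fails the job
    print(job_failed(5, 100, 5))   # False: exactly 5% is still tolerated
    print(job_failed(6, 100, 5))   # True
```

With the value 0 shown above for both failures.maxpercent parameters, a single failed task is enough to fail the whole job; raising the percentage lets a job finish even if a few tasks are lost.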

3. Running MapReduce jobs locally

mapreduce.framework.name=local

mapreduce.jobtracker.address=local
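
Settings like these are normally written as property entries in a Hadoop configuration file such as mapred-site.xml. The sketch below, with a hypothetical helper name, renders a dictionary of settings in the standard configuration/property/name/value layout so the result can be pasted into such a file.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(settings):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    local_mode = {
        "mapreduce.framework.name": "local",
        "mapreduce.jobtracker.address": "local",
    }
    print(to_hadoop_xml(local_mode))
```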

4. Parameters related to efficiency and stability

mapreduce.map.speculative: whether to enable speculative execution for Map Tasks. The default is true.

mapreduce.reduce.speculative: whether to enable speculative execution for Reduce Tasks. The default is true.

mapreduce.input.fileinputformat.split.minsize: the minimum split size used by FileInputFormat when it computes input splits

mapreduce.input.fileinputformat.split.maxsize: the maximum split size used by FileInputFormat when it computes input splits (by default the split size equals the block size, that is, 134217728 bytes)
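
A minimal sketch of how the two bounds shape the split size, assuming it is chosen as max(minsize, min(maxsize, blocksize)). The function name and its default arguments (a minimum of 1 byte and an effectively unlimited maximum) are assumptions made for illustration.

```python
def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the configured minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))


if __name__ == "__main__":
    block = 134217728                                       # 128 MB block
    print(compute_split_size(block))                        # 134217728
    print(compute_split_size(block, max_size=67108864))     # 67108864
    print(compute_split_size(block, min_size=268435456))    # 268435456
```

In this model, lowering split.maxsize below the block size yields smaller splits and therefore more Map Tasks, while raising split.minsize above the block size yields larger splits and fewer Map Tasks.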

Note: tune every parameter against the actual business logic of your own jobs!
