
How to tune Hadoop performance

2026-02-13 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article shows how to tune Hadoop performance at three levels: the operating system, the Hadoop parameters, and the application.

Operating system tuning

Raise the limits on simultaneously open file descriptors and network connections

The kernel parameter net.core.somaxconn caps the backlog of pending connections on a listening socket. Its default is 128 on older kernels (check with sysctl -a | grep net.core.somaxconn). To raise it, add net.core.somaxconn=32767 to /etc/sysctl.conf.

The system-wide limit on open file descriptors, fs.file-max, defaulted to 183731 on the system described here (the default varies with installed memory). To raise it, also add fs.file-max=800000 to /etc/sysctl.conf.

Run sysctl -a to view the current values and sysctl -p to reload the configuration from /etc/sysctl.conf.
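
The two settings above can be merged into an existing sysctl.conf without disturbing other entries. The following is a minimal Python sketch, not a definitive tool: the helper name merge_sysctl and the sample input are hypothetical, and it only transforms text, so writing the result to /etc/sysctl.conf and running sysctl -p as root is left to the reader.

```python
def merge_sysctl(conf_text, settings):
    """Return conf_text with each key in settings set to its new value.

    Existing assignments are rewritten in place, missing keys are appended,
    and comments or unrelated settings are left unchanged.
    """
    remaining = dict(settings)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", ";")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key} = {remaining.pop(key)}")
                continue
        out.append(line)
    # Append any keys that were not already present in the file.
    for key, value in remaining.items():
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Values taken from the text above.
TARGETS = {"net.core.somaxconn": 32767, "fs.file-max": 800000}

if __name__ == "__main__":
    sample = "# kernel settings\nnet.core.somaxconn = 128\n"  # hypothetical file content
    print(merge_sysctl(sample, TARGETS), end="")
```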

Avoid using the swap partition

In a distributed MapReduce environment, users can avoid using the swap partition by controlling the amount of data each job processes and the size of the buffers each task uses.
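
One way to reason about this is a per-node memory budget: if the memory of all concurrently running tasks plus what the OS and daemons need exceeds physical RAM, the node is likely to swap. The Python sketch below illustrates the arithmetic only; the function name and every number in it are hypothetical assumptions, not recommended values.

```python
def spare_memory_mb(total_ram_mb, reserved_mb, map_tasks, reduce_tasks,
                    map_task_mb, reduce_task_mb):
    """Return the memory left over on a node, in MB.

    A negative result means the concurrent tasks need more memory than the
    node has, so the node may be pushed into swap.
    """
    needed = map_tasks * map_task_mb + reduce_tasks * reduce_task_mb
    return total_ram_mb - reserved_mb - needed


if __name__ == "__main__":
    # Hypothetical 64 GB node with 8 GB reserved for the OS and Hadoop daemons.
    print(spare_memory_mb(65536, 8192, 16, 6, 2048, 3072))  # fits, 6144 MB spare
    print(spare_memory_mb(65536, 8192, 24, 8, 2048, 3072))  # oversubscribed, -16384
```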

Set a reasonable read-ahead buffer size

Disk I/O performance lags far behind CPU and memory. A larger read-ahead buffer reduces disk seeks and the time applications spend waiting on I/O. Use the Linux blockdev command to set the read-ahead size.
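
blockdev --setra expects the read-ahead size as a count of 512-byte sectors, which is easy to get wrong. The Python sketch below converts a size in KiB and builds the command line. The device name /dev/sdb and the 1024 KiB size are assumptions for illustration, and the command itself must be run as root (blockdev --getra shows the current value).

```python
SECTOR_BYTES = 512  # blockdev --setra counts in 512-byte sectors


def setra_command(device, readahead_kib):
    """Build the blockdev command that sets read-ahead to readahead_kib KiB."""
    sectors = readahead_kib * 1024 // SECTOR_BYTES
    return ["blockdev", "--setra", str(sectors), device]


if __name__ == "__main__":
    # Hypothetical device and size; pass the list to subprocess.run as root to apply.
    print(" ".join(setra_command("/dev/sdb", 1024)))
```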

File system configuration

Enable the noatime mount option in /etc/fstab so that Linux does not update a file's access time on every read.
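
As an illustration, the Python sketch below adds noatime to the options field of one /etc/fstab entry. The helper name, the device, and the mount point /data1 are hypothetical; the function only rewrites text (normalizing the edited line to tab-separated fields) and leaves saving the file and remounting to the reader.

```python
def add_noatime(fstab_text, mount_point):
    """Return fstab_text with noatime added to the entry for mount_point."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # A data line has at least: device, mount point, fs type, options.
        if len(fields) >= 4 and not is_comment and fields[1] == mount_point:
            options = fields[3].split(",")
            if "noatime" not in options:
                options.append("noatime")
            fields[3] = ",".join(options)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "/dev/sda1 / ext4 defaults 0 1\n/dev/sdb1 /data1 ext4 defaults 0 0\n"
    print(add_noatime(sample, "/data1"), end="")
```

The new option takes effect after the file system is remounted, for example with mount -o remount /data1.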

I/O scheduler selection

See the Hadoop Performance Tuning Guide.
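
Before choosing a scheduler it helps to know which one is active. On Linux, /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets. The Python sketch below parses that format; the function name and sample strings are illustrative assumptions, and it makes no recommendation about which scheduler to pick.

```python
def active_scheduler(scheduler_text):
    """Return the scheduler marked with [brackets], or None if none is marked."""
    for name in scheduler_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


if __name__ == "__main__":
    # Sample content in the format of /sys/block/<device>/queue/scheduler.
    print(active_scheduler("noop deadline [cfq]"))
```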

Hadoop parameter tuning

Disk block configuration

As mentioned in the earlier blog posts that analyze the shuffle process, mapreduce.cluster.local.dir can be set to directories on other local hard drives so that temporary files are spread across disks, which improves I/O.
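
mapreduce.cluster.local.dir takes a comma-separated list of directories. The Python sketch below builds such a value with one directory per disk; the mount points /data1 to /data3 and the subdirectory name are hypothetical and should be replaced with the cluster's real layout.

```python
import posixpath


def local_dir_value(mount_points, subdir="mapred/local"):
    """Build a comma-separated mapreduce.cluster.local.dir value, one directory per disk."""
    return ",".join(posixpath.join(mount, subdir) for mount in mount_points)


if __name__ == "__main__":
    # Hypothetical mount points, each assumed to be a separate physical disk.
    print(local_dir_value(["/data1", "/data2", "/data3"]))
```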

Choose the appropriate compression algorithm

mapreduce.map.output.compress=true

mapreduce.map.output.compress.codec=XXCodec
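
In a site configuration file such as mapred-site.xml, these settings are written as property elements. The Python sketch below renders them with the standard library. The codec class org.apache.hadoop.io.compress.SnappyCodec is an assumption chosen for illustration in place of the XXCodec placeholder; any codec installed on the cluster can be substituted.

```python
import xml.etree.ElementTree as ET

# The codec value is an assumption; replace it with the codec you choose.
SETTINGS = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
}


def to_properties_xml(settings):
    """Render settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_properties_xml(SETTINGS))
```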

Adjust the IFile read-ahead size

The IFile read-ahead buffer size, mapreduce.ifile.readahead.bytes, can be adjusted to suit the workload.

Application tuning

Set up a Combiner
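
A combiner aggregates map output locally before it is shuffled to the reducers, so fewer records cross the network. The following is a plain-Python simulation of that idea using word count, not Hadoop API code; the function names and input lines are hypothetical. It relies on the aggregation being associative and commutative (summing counts is), which is what makes a combiner safe to apply.

```python
from collections import Counter


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner step: sum the counts per word within one map task's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_counts(pairs):
    """Reduce step: sum the counts per word across all map tasks."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    splits = ["a b a a", "b b a"]  # one hypothetical input split per map task
    raw = [pair for s in splits for pair in map_words(s)]
    combined = [pair for s in splits for pair in combine(map_words(s))]
    print(len(raw), len(combined))  # records shuffled without and with a combiner
    print(reduce_counts(raw) == reduce_counts(combined))
```

In a real Java job the combiner is registered on the Job object with setCombinerClass, often reusing the reducer class when the operation allows it.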

Increase the number of replicas of the input files
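
With more replicas, more nodes hold a local copy of each input block, which gives the scheduler more chances to run a map task next to its data. The replication factor of existing files can be changed with hdfs dfs -setrep. The Python sketch below only builds that command line; the path /user/etl/input and the factor 5 are hypothetical, and extra replicas cost additional disk space.

```python
def setrep_command(path, replicas, wait=True):
    """Build the hdfs command that sets the replication factor of path."""
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")  # wait until the target replication is reached
    cmd += [str(replicas), path]
    return cmd


if __name__ == "__main__":
    # Hypothetical input directory and replication factor.
    print(" ".join(setrep_command("/user/etl/input", 5)))
```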

