What is the method of CDH cluster tuning

2025-07-04 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article walks through one method of tuning a CDH cluster: checking how the DRF scheduling policy interacts with the YARN memory and vcore settings, then adjusting those settings so that neither resource sits idle.

DRF and related parameters

DRF (Dominant Resource Fairness) schedules resources fairly according to both CPU and memory. CDH dynamic resource pools use the DRF scheduling policy by default. Put simply: when memory runs out, the remaining CPU cannot be allocated to tasks and sits idle; when CPU runs out, the remaining memory cannot be used to start tasks either.
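As a rough illustration of the idea behind DRF, the sketch below computes a job's dominant resource, meaning the resource of which it holds the largest fraction of cluster capacity. The function name and the capacity and usage figures are hypothetical, and this is a simplified model, not the scheduler's actual implementation.

```python
def dominant_share(usage, capacity):
    """Return (resource_name, share) for the resource with the largest
    usage/capacity ratio. Simplified model of DRF's 'dominant share'."""
    shares = {name: usage[name] / capacity[name] for name in capacity}
    dominant = max(shares, key=shares.get)
    return dominant, shares[dominant]


# Hypothetical cluster capacity and per-job usage, for illustration only.
capacity = {"memory_gb": 100, "vcores": 50}
job_a = {"memory_gb": 20, "vcores": 5}   # memory share 0.20, vcore share 0.10
job_b = {"memory_gb": 10, "vcores": 15}  # memory share 0.10, vcore share 0.30

print(dominant_share(job_a, capacity))  # job_a is memory-dominant
print(dominant_share(job_b, capacity))  # job_b is CPU-dominant
```

DRF compares jobs by these dominant shares, which is why a shortage of either memory or CPU is enough to stop new tasks from starting.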

With this scheduling policy in mind, look at the resource-related parameters that YARN uses when it starts tasks. The following parameters may have an impact:

mapreduce.map.memory.mb: memory for each map task. CDH defaults to 1 GB.

mapreduce.map.cpu.vcores: virtual CPU cores for each map task. CDH defaults to 1.

mapreduce.reduce.memory.mb: memory for each reduce task. CDH defaults to 1 GB.

mapreduce.reduce.cpu.vcores: virtual CPU cores for each reduce task. CDH defaults to 1.

yarn.nodemanager.resource.memory-mb: memory available to containers on each node. CDH defaults to 8 GB.

yarn.nodemanager.resource.cpu-vcores: virtual CPU cores available to containers on each node. CDH defaults to 8, but CM automatically detects the number of cores and adjusts the value. Here, it was automatically changed to 24.

You can see that in the default configuration each task starts with 1 vcore and 1 GB of memory, a 1:1 ratio.

Next, look at the memory allocated to YARN. Sure enough, it is 8 × 15 = 120 GB, so the available memory is far smaller than the available vcores. With each task taking 1 vcore and 1 GB, at most 120 vcores can be used.
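The arithmetic above can be sketched as a small calculation. The node count of 15 is inferred from the 8 × 15 = 120 GB figure, the function name is hypothetical, and the model deliberately ignores ApplicationMaster containers and scheduler allocation increments.

```python
def max_concurrent_tasks(nodes, node_memory_gb, node_vcores,
                         task_memory_gb=1, task_vcores=1):
    """Upper bound on tasks running at once: on each node, whichever of
    memory or vcores runs out first limits the number of containers."""
    by_memory = node_memory_gb // task_memory_gb
    by_cpu = node_vcores // task_vcores
    return nodes * min(by_memory, by_cpu)


NODES = 15        # assumption: inferred from 8 GB x 15 = 120 GB
NODE_VCORES = 24  # value CM detected for yarn.nodemanager.resource.cpu-vcores

for node_memory_gb in (8, 16, 24):
    tasks = max_concurrent_tasks(NODES, node_memory_gb, NODE_VCORES)
    print(f"{node_memory_gb} GB per node -> at most {tasks} concurrent tasks")
```

With 8 GB per node, memory is the binding limit, so most of the 24 vcores on each node cannot be scheduled.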

Test

To confirm this conjecture, I raised yarn.nodemanager.resource.memory-mb to 16 GB (we have 128 GB of memory, which is enough). After restarting YARN, I ran the MR job again, which gave the following figure:

You can see that before the adjustment the memory available to YARN was 120 GB, and after the adjustment it was 240 GB. This confirms the conjecture.

So for this cluster, with 128 GB of memory and 24 cores, you can set yarn.nodemanager.resource.memory-mb to 24 GB so that all of the CPU can be used.
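A minimal sketch of how the 24 GB figure follows from the task settings, assuming the 128 GB and 24 cores are per-node values and that every task uses the default 1 vcore and 1024 MB. The function name is hypothetical.

```python
def memory_mb_to_use_all_vcores(node_vcores, task_memory_mb, task_vcores=1):
    """Smallest yarn.nodemanager.resource.memory-mb value that lets a node
    run enough tasks to occupy all of its vcores."""
    tasks_per_node = node_vcores // task_vcores
    return tasks_per_node * task_memory_mb


needed_mb = memory_mb_to_use_all_vcores(node_vcores=24, task_memory_mb=1024)
physical_mb = 128 * 1024  # assumption: 128 GB of physical memory per node

print(f"memory-mb needed: {needed_mb} MB ({needed_mb // 1024} GB)")
print(f"fits in physical memory: {needed_mb <= physical_mb}")
```

If tasks were configured with more than 1 GB each, the same calculation would give a proportionally larger value, so it is worth re-running whenever the task memory settings change.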

Test result

When yarn.nodemanager.resource.memory-mb is 8G:

Time taken: 3794.17 seconds
Total MapReduce CPU Time Spent: 3 days 10 hours 43 minutes 22 seconds 640 msec

When yarn.nodemanager.resource.memory-mb is 16G:

Time taken: 2077.138 seconds
Total MapReduce CPU Time Spent: 3 days 12 hours 55 minutes 43 seconds 210 msec

As you can see, it is indeed much faster. (Note: the two runs used different data, so that caching could not make the second run faster than the first. The amount of data was about the same in both runs, roughly 650 GB.)
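To put numbers on "much faster", the sketch below computes the wall-clock speedup from the two results above, along with the CPU time divided by the wall-clock time for each run. That ratio is only a rough indicator of how many cores were busy on average, and the helper name is hypothetical.

```python
def cpu_seconds(days=0, hours=0, minutes=0, seconds=0, msec=0):
    """Convert a 'Total MapReduce CPU Time Spent' reading to seconds."""
    return days * 86400 + hours * 3600 + minutes * 60 + seconds + msec / 1000


# Figures copied from the two test runs above.
runs = {
    "8G": {"wall": 3794.17, "cpu": cpu_seconds(3, 10, 43, 22, 640)},
    "16G": {"wall": 2077.138, "cpu": cpu_seconds(3, 12, 55, 43, 210)},
}

speedup = runs["8G"]["wall"] / runs["16G"]["wall"]
print(f"wall-clock speedup: {speedup:.2f}x")

for label, run in runs.items():
    busy = run["cpu"] / run["wall"]  # average CPU-seconds per elapsed second
    print(f"{label}: about {busy:.0f} CPU-seconds consumed per second")
```

The total CPU time is nearly the same in both runs, so the gain comes from running more tasks at once rather than from doing less work.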

Other

View vcores:

SELECT allocated_vcores_cumulative, available_vcores where category=YARN_POOL and serviceName="yarn" and queueName=root

View the memory allocated to YARN:

SELECT allocated_memory_mb_cumulative, available_memory_mb where category=YARN_POOL and serviceName="yarn" and queueName=root

Of course, the easiest way to view these values is on the Dynamic Resource Pools page in CM.
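For scripted checks, similar totals can also be read from the YARN ResourceManager REST API at /ws/v1/cluster/metrics. This is a different route from the CM queries above, not something the original test used. In this sketch the host name and port are placeholders, and the sample payload contains made-up values used only to show the shape of the response.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """GET the cluster metrics document from a ResourceManager.
    rm_url is a placeholder such as 'http://rm-host:8088'."""
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)


def summarize(payload):
    """Pull (allocated, total) pairs for memory and vcores out of the payload."""
    metrics = payload["clusterMetrics"]
    return {
        "memory_mb": (metrics["allocatedMB"], metrics["totalMB"]),
        "vcores": (metrics["allocatedVirtualCores"], metrics["totalVirtualCores"]),
    }


# Made-up sample shaped like the API response, so the sketch runs offline.
sample = {
    "clusterMetrics": {
        "allocatedMB": 122880, "totalMB": 122880,
        "allocatedVirtualCores": 120, "totalVirtualCores": 360,
    }
}
print(summarize(sample))
# Against a live cluster: summarize(fetch_cluster_metrics("http://rm-host:8088"))
```

A summary in which memory is fully allocated while vcores are not is the same symptom described earlier: memory is the limiting resource.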

