How to Configure YARN in Hadoop
This article explains how to configure YARN in Hadoop. The editor finds it quite practical and shares it here for your reference; follow along for the details.
As part of the HDP 2.0 Beta, YARN takes the resource-management capabilities that were embedded in MapReduce and packages them so that they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing a common resource-management layer.
In this blog post, we will show you how to plan and configure processing capacity in an enterprise HDP 2.0 cluster deployment. This covers YARN and MapReduce 2. We will use an example physical cluster whose slave nodes each have 48 GB of RAM, 12 disks, and 2 hex-core CPUs (12 cores in total).
YARN takes into account all of the available compute resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).
Configure YARN
In a Hadoop cluster, it is vital to balance the usage of RAM, CPU, and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, we have found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So for our example cluster node with 12 disks and 12 cores, we will allow a maximum of 20 Containers to be allocated to each node.
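To make the heuristic concrete, here is a minimal Python sketch of the 1-2 Containers per disk and per core rule of thumb. The function name and structure are ours, purely for illustration; this is not a Hadoop API:

def container_range_per_node(disks, cores):
    """Return the (low, high) Container counts suggested by the heuristic."""
    return min(disks, cores), min(disks, cores) * 2

low, high = container_range_per_node(disks=12, cores=12)
print(low, high)  # 12 24 -> we settle on 20, within this range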
Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for the operating system. On each node, we will assign 40 GB of RAM for YARN to use and keep 8 GB for the operating system. The following property sets the maximum memory YARN can use on a node:
In yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
The next step is to provide YARN with guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 Containers) = at least 2 GB per Container:
In yarn-site.xml:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
YARN will allocate Containers with RAM amounts greater than the yarn.scheduler.minimum-allocation-mb.
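With the default scheduler configuration, requests that are not an exact multiple of the minimum are typically normalized upward to one. The short Python sketch below illustrates that rounding arithmetic, assuming the 2048 MB minimum configured above; it is our own illustration, not YARN's actual code:

import math

MIN_ALLOCATION_MB = 2048  # yarn.scheduler.minimum-allocation-mb

def normalize_request(requested_mb):
    """Round a memory request up to the nearest multiple of the minimum."""
    return math.ceil(requested_mb / MIN_ALLOCATION_MB) * MIN_ALLOCATION_MB

print(normalize_request(3000))  # 4096: rounded up to two minimum units
print(normalize_request(2048))  # 2048: already an exact multiple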
Configure MapReduce 2
MapReduce 2 runs on top of YARN and uses YARN Containers to schedule and execute its Map and Reduce tasks.
When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider (each maps to the configuration properties summarized in the short sketch after this list):
Physical RAM limit for each Map and Reduce task
JVM heap size limit for each task
Amount of virtual memory each task will get
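For orientation, here is a plain Python mapping of which configuration property governs each of the three aspects. It is purely an illustrative summary; all of these property names appear in the configuration snippets below:

RESOURCE_KNOBS = {
    "physical RAM per task": ["mapreduce.map.memory.mb",
                              "mapreduce.reduce.memory.mb"],      # mapred-site.xml
    "JVM heap per task":     ["mapreduce.map.java.opts",
                              "mapreduce.reduce.java.opts"],      # mapred-site.xml
    "virtual memory per task": ["yarn.nodemanager.vmem-pmem-ratio"],  # yarn-site.xml
}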
You can define the maximum amount of memory that each Map and Reduce task will consume. Since each Map and each Reduce will run in a separate container, these maximum memory settings should be at least equal to or greater than the YARN minimum container allocation.
For our example cluster, we have a minimum RAM per Container (yarn.scheduler.minimum-allocation-mb) of 2 GB. We will therefore assign 4 GB to Map task Containers and 8 GB to Reduce task Containers.
In mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
Each Container will run a JVM for the Map or Reduce task. The JVM heap sizes should be set below the Map and Reduce memory limits defined above, so that they fall within the bounds of the Container memory allocated by YARN.
In mapred-site.xml:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>
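A minimal Python sketch of the sizing rule implied here: keep the JVM heap below the Container's physical limit. The 0.75 factor is our assumption, inferred from the example values (3072/4096 and 6144/8192); it is not a Hadoop default:

def heap_opts(container_mb, fraction=0.75):
    """Derive a -Xmx value that leaves headroom inside the Container."""
    return "-Xmx%dm" % int(container_mb * fraction)

print(heap_opts(4096))  # -Xmx3072m for Map tasks
print(heap_opts(8192))  # -Xmx6144m for Reduce tasks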
The settings above configure the upper limit of physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This is set by the following configuration, and the default value is 2.1:
In yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
So with the above settings on our example cluster, each Map task will get the following memory allocations:
Total physical RAM allocation = 4 GB
Upper limit of JVM heap space in Map task Container = 3 GB
Upper limit of virtual memory = 4 * 2.1 = 8.4 GB
With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by each job. In our example cluster, with the above configurations, YARN will be able to allocate up to 10 Mappers (40/4) or 5 Reducers (40/8) on each node, or a permutation within that.
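As a sanity check, the example's numbers can be recomputed from the settings above. Variable names in this short Python sketch are ours; the values come from the configuration in this article:

NODE_MB = 40960                  # yarn.nodemanager.resource.memory-mb
MAP_MB, REDUCE_MB = 4096, 8192   # mapreduce.{map,reduce}.memory.mb
VMEM_RATIO = 2.1                 # yarn.nodemanager.vmem-pmem-ratio

print(round(MAP_MB * VMEM_RATIO / 1024, 1))  # 8.4 GB virtual-memory ceiling per Map task
print(NODE_MB // MAP_MB)                     # 10 Mappers fit on one node
print(NODE_MB // REDUCE_MB)                  # 5 Reducers fit on one node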
Thank you for reading! That is all for this article on "how to configure YARN in Hadoop". I hope the content above is helpful and lets you learn something new. If you found the article useful, feel free to share it so more people can see it!