CDH YARN Tuning (YARN in production must be tuned) 004


Tuesday, 2019-3-26

Tuning YARN

This topic applies only to YARN clusters, and describes how to tune and optimize YARN for your cluster.

Note: download the Cloudera YARN tuning spreadsheet to help calculate your YARN configuration. For a short video overview, see Tuning YARN Applications.

Overview

This overview provides an abstract description of the YARN cluster and the goals for YARN tuning.

A YARN cluster consists of hosts. Hosts provide memory and CPU resources. A vcore, or virtual core, is a share of a host's CPU.

Tuning YARN consists primarily of defining containers optimally on your worker hosts. You can think of a container as a rectangle composed of memory and vcores. Containers perform tasks.

Some tasks use a great deal of memory while doing minimal processing on a large volume of data.

Other tasks require a great deal of processing power but use less memory. For example, a Monte Carlo simulation that evaluates many possible "what if?" scenarios uses a great deal of processing power on a relatively small dataset.

The YARN ResourceManager allocates memory and vcores so that all available resources are used in the most efficient way possible. Ideally, few or no resources are left idle.

An application is a YARN client program consisting of one or more tasks. Typically, a task uses all of the resources available in its container. A task cannot exceed its designated allocation, which prevents it from consuming all of the host's CPU cycles or exceeding its memory allotment.

Tune your YARN hosts to optimize the use of vcores and memory by configuring containers to consume all available resources beyond those required for overhead and other services.

YARN tuning has three phases, which correspond to the tabs of the YARN tuning spreadsheet:

1. Cluster configuration, where you configure your hosts.

2. YARN configuration, where you quantify memory and vcores.

3. MapReduce configuration, where you allocate minimum and maximum resources for specific map and reduce tasks.

YARN and MapReduce have many configurable properties. For a complete list, see Cloudera Manager Configuration Properties. The YARN tuning spreadsheet lists the essential subset of these properties, the ones most likely to improve performance for common MapReduce applications.

Cluster configuration

In the Cluster Configuration tab, you define the worker host configuration and cluster size for your YARN implementation.

Machine configuration

Step 1: enter the configuration of a single host into the worker host configuration table

Enter your likely machine configuration in the input fields. If you are not sure what machines you plan to purchase, provide minimum values appropriate for your expected purchases.

As with any system, the more memory and CPU resources are available, the faster the cluster can process large amounts of data. A machine with four hyperthreaded CPUs, each with six cores, provides 48 vcores per host.

Twelve 3 TB hard drives in a 2U server installation, in a JBOD (Just a Bunch Of Disks) configuration, was a reasonable balance between performance and price when the spreadsheet was created. Storage costs decrease over time, so you might consider 4 TB disks instead. Larger disks are expensive and not required by all use cases.

Two 1 Gigabit Ethernet ports provided sufficient throughput when the spreadsheet was published, but 10 Gigabit Ethernet ports are an option where price matters less than speed.
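
As a quick check of the vcore arithmetic above, here is a minimal Python sketch (the multiplier of 2 is an assumption for hyperthreading):

# Step 1 host arithmetic: 4 CPUs x 6 cores x 2 (hyperthreading) = 48 vcores.
physical_cpus = 4        # sockets per host
cores_per_cpu = 6        # physical cores per socket
threads_per_core = 2     # hyperthreading multiplier (assumption)

vcores_per_host = physical_cpus * cores_per_cpu * threads_per_core
print(vcores_per_host)   # 48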

Step 2: worker host planning

Now that you have the basic host configuration from Step 1, use the following table to allocate resources (mainly CPU and memory) among the software components running on each host.

Start by reserving at least 8 GB for your operating system and at least 1 GB for Cloudera Manager. If services other than CDH require additional resources, add those numbers under Other Services.

The HDFS DataNode uses at least 1 core and about 1 GB of memory. The same requirements apply to the YARN NodeManager.

The spreadsheet lists several optional services:

The Impala daemon requires at least 16 GB of memory.

HBase Region Servers require 12-16 GB of memory.

The Solr server requires at least 1 GB of memory.

The Kudu tablet server requires at least 1 GB of memory.

Any remaining resources are available for YARN applications (Spark and MapReduce). In this example, 44 CPU cores are available. Set the desired vcores multiplier per physical core to calculate the total available vcores, as sketched below.
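
A minimal sketch of the Step 2 arithmetic; the host memory figure is a hypothetical placeholder, and the per-service reservations mirror the examples above:

# Step 2: subtract per-host reservations from the totals; the remainder goes to YARN.
host_memory_gb = 256                  # hypothetical; substitute your host's memory
host_vcores = 48                      # from Step 1

reserved_memory_gb = 8 + 1 + 1 + 1    # OS + Cloudera Manager + DataNode + NodeManager
reserved_cores = 1 + 1 + 1 + 1        # example reservations; adjust for your services

yarn_memory_gb = host_memory_gb - reserved_memory_gb   # 245 GB for YARN containers
yarn_cores = host_vcores - reserved_cores              # 44 cores, as in the text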

Step 3: cluster size

After defining the specification for each host in the cluster, enter the number of worker hosts needed to support your business case. To see the benefits of parallel computing, set the number of hosts to a minimum of 10.

YARN configuration

On the YARN Configuration tab, verify the available resources and set the minimum and maximum limits for each container.

Step 4: configure YARN on the cluster

These are the first set of configuration values for your cluster. You can set these values under YARN > Configuration.

Steps 4 and 5: verify the settings

Step 4 pulls the memory and vcore numbers from Step 2.

Step 5 shows the total memory and vcores of the cluster.

Go to the ResourceManager Web UI (usually http://<resource-manager-host>:8088/) and verify that "Memory Total" and "VCores Total" match the values above. If you have no unhealthy nodes, the numbers should match exactly.

Step 6: verify the container settings on the cluster

For YARN jobs to run cleanly, you need to configure the container properties.

In step 6, you can change the values that affect the container size.

The minimum number of vcores should be 1. When additional vcores are required, adding them 1 at a time results in the most efficient allocation. Set the maximum vcore reservation to the size of the node.

Set the minimum and maximum memory reservations. The increment should be the smallest amount that can affect performance. Here, the minimum is approximately 1 GB, the maximum is approximately 8 GB, and the increment is 512 MB.
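
Expressed as the corresponding yarn.scheduler properties, the example limits above might look like the following sketch (values in MB; the maximum-vcores value of 44 assumes the node size from Step 2):

# Step 6 container limits as yarn.scheduler properties (example values).
container_limits = {
    "yarn.scheduler.minimum-allocation-vcores": 1,     # grow 1 vcore at a time
    "yarn.scheduler.maximum-allocation-vcores": 44,    # size of the node (assumption)
    "yarn.scheduler.minimum-allocation-mb": 1024,      # ~1 GB minimum
    "yarn.scheduler.maximum-allocation-mb": 8192,      # ~8 GB maximum
    "yarn.scheduler.increment-allocation-mb": 512,     # 512 MB increments
}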

Step 6A: cluster container capacity

This section reports the capacity of your cluster, expressed in containers.

Step 6A lets you verify the minimum and maximum number of containers in your cluster, based on the numbers you entered (see the sketch after this list).

Maximum number of possible containers, based on memory configuration

Maximum number of possible containers, based on vcore configuration

Number of containers, based on 2 containers per spindle

Minimum number of possible containers, based on memory configuration

Minimum number of possible containers, based on vcore configuration
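
A minimal sketch of how these five figures can be derived; the bounds follow the descriptions above, and the names are illustrative:

# Step 6A: container capacity bounds for the cluster.
def container_capacity(yarn_mb, yarn_vcores, spindles,
                       min_mb, max_mb, min_vcores, max_vcores):
    return {
        "max_by_memory": yarn_mb // min_mb,          # smallest containers, memory-bound
        "max_by_vcores": yarn_vcores // min_vcores,  # smallest containers, vcore-bound
        "by_spindles": 2 * spindles,                 # rule of thumb: 2 per spindle
        "min_by_memory": yarn_mb // max_mb,          # largest containers, memory-bound
        "min_by_vcores": yarn_vcores // max_vcores,  # largest containers, vcore-bound
    }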

Step 6B: container health check

This section performs some basic checks of the Step 6 container parameters against the host.

MapReduce configuration

On the MapReduce Configuration tab, you can plan for increased task-specific memory capacity.

Step 7: MapReduce configuration

For CDH 5.5 and later, we recommend specifying only the heap size or only the container size for map and reduce tasks. Values left unspecified are calculated from the mapreduce.job.heap.memory-mb.ratio setting. This calculation follows Cloudera Manager, deriving the heap size from the ratio and the container size.
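
A one-line sketch of that calculation (0.8 is the default ratio; it also produces the 819 MB heap values in the summary below):

# CDH 5.5+: heap size derived from container size via mapreduce.job.heap.memory-mb.ratio.
def heap_mb(container_mb, ratio=0.8):
    return container_mb * ratio

print(heap_mb(1024))   # 819.2 MB for a 1024 MB map/reduce container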

Step 7A: MapReduce sanity check

The sanity check validates the MapReduce minimum/maximum properties against the container limits.

With Step 7A, you can confirm at a glance that all minimum and maximum resource allocations fall within the parameters you set.
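
The spirit of the check can be sketched as follows (an illustration using the example limits above, not the spreadsheet's exact logic):

# Step 7A-style sanity check: task allocations must fit within the container limits.
def check_task(task_mb, task_vcores, min_mb=1024, max_mb=8192, max_vcores=44):
    assert min_mb <= task_mb <= max_mb, "task memory outside container limits"
    assert 1 <= task_vcores <= max_vcores, "task vcores outside container limits"

check_task(task_mb=1024, task_vcores=1)   # passes with the example values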

Continuous scheduling

Enabling or disabling continuous scheduling changes how often YARN schedules: continuously, or in response to node heartbeats. For larger clusters (more than 75 nodes) running heavy YARN workloads, it is generally recommended to disable continuous scheduling with the following settings:

yarn.scheduler.fair.continuous-scheduling-enabled should be false

yarn.scheduler.fair.assignmultiple should be true

On large clusters, continuous scheduling can make the ResourceManager unresponsive, because continuous scheduling iterates over every node in the cluster.

For more information about tuning continuous scheduling, see the Knowledge Base article: FairScheduler tuning with assignmultiple and continuous scheduling.

Since we have fewer than 75 nodes, we set:

yarn.scheduler.fair.continuous-scheduling-enabled = true

yarn.scheduler.fair.assignmultiple = true

Summary:

How the container resources available to YARN are determined:

Container memory

yarn.nodemanager.resource.memory-mb

Container virtual CPU cores

yarn.nodemanager.resource.cpu-vcores

// These two values equal the host's total resources minus what the other services use; whatever remains is what is left for YARN.

Tip: if yarn.nodemanager.resource.memory-mb = 2G and yarn.nodemanager.resource.cpu-vcores = 2 on each of 4 nodes, then the 192.168.0.142:8088/cluster page will show "Memory Total = 8G" and "VCores Total = 8".
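
The tip's numbers can be reproduced with a trivial sketch:

# Cluster totals in the ResourceManager UI = per-node NodeManager resources x node count.
memory_mb_per_node = 2048   # yarn.nodemanager.resource.memory-mb = 2G
vcores_per_node = 2         # yarn.nodemanager.resource.cpu-vcores = 2
nodes = 4

print(memory_mb_per_node * nodes / 1024)  # 8.0  -> "Memory Total = 8G"
print(vcores_per_node * nodes)            # 8    -> "VCores Total = 8"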

Minimum container memory

yarn.scheduler.minimum-allocation-mb = 1G // recommended value

Minimum container virtual CPU cores

yarn.scheduler.minimum-allocation-vcores = 1 // recommended value

Maximum container memory

yarn.scheduler.maximum-allocation-mb <= yarn.nodemanager.resource.memory-mb

Maximum container virtual CPU cores

yarn.scheduler.maximum-allocation-vcores <= yarn.nodemanager.resource.cpu-vcores // recommended value

Container memory increment

yarn.scheduler.increment-allocation-mb = 512M

Container virtual CPU core increment

yarn.scheduler.increment-allocation-vcores = 1 // recommended value

ApplicationMaster virtual CPU cores

yarn.app.mapreduce.am.resource.cpu-vcores = 1 // recommended value

ApplicationMaster memory

yarn.app.mapreduce.am.resource.mb = 1024

ApplicationMaster Java options

yarn.app.mapreduce.am.command-opts = -Djava.net.preferIPv4Stack=true -Xmx768m

Ratio of heap to container size

mapreduce.job.heap.memory-mb.ratio = 0.8 // default

Map task CPU vcores

mapreduce.map.cpu.vcores = 1

Map task memory

mapreduce.map.memory.mb = 1024

Map task Java options

mapreduce.map.java.opts // leave unset; it is ignored and the heap is derived from the ratio

I/O sort memory buffer (MiB)

mapreduce.task.io.sort.mb = 400

Reduce task CPU vcores

mapreduce.reduce.cpu.vcores = 1

Reduce task memory

mapreduce.reduce.memory.mb = 1024

Reduce task Java options

mapreduce.reduce.java.opts // leave unset; ignored

Map task maximum heap = 819 MB // mapreduce.map.memory.mb x 0.8

Reduce task maximum heap = 819 MB // mapreduce.reduce.memory.mb x 0.8

ApplicationMaster Java maximum heap = 819.2 MB // yarn.app.mapreduce.am.resource.mb x 0.8

The values above are suggestions, but they come from a test cluster with very limited resources. For a production cluster, see the recommended values at the end of the reference link below.

Reference link:

https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cdh_ig_yarn_tuning.html

CDH5 performance tuning: http://blog.hylstudio.cn/archives/488
