What are the frequently asked questions about Hadoop YARN

2025-01-18 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

This article answers common questions about Hadoop YARN. The methods described are simple, quick, and practical; let's walk through them one by one.

(1) By default, the load across nodes is uneven: some nodes run many tasks while others run none. How can the number of tasks on each node be balanced as much as possible?

A: By default, the resource scheduler works in batch mode: on each heartbeat it assigns as many tasks as possible. As a result, the nodes whose heartbeats arrive first grab most of the tasks (assuming the total number of tasks is much smaller than the number the cluster can run concurrently). To avoid this, configure the following:

If you are using the Fair Scheduler, set yarn.scheduler.fair.max.assign to 1 in yarn-site.xml (the default is -1, meaning no limit).

If you are using the Capacity Scheduler (the default scheduler), there is no equivalent setting; it currently lacks this kind of load-balancing feature.
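For the Fair Scheduler case, the setting looks like this in yarn-site.xml (a minimal fragment, to be merged into your existing configuration):

```xml
<!-- yarn-site.xml: limit the Fair Scheduler to one container assignment per heartbeat -->
<property>
  <name>yarn.scheduler.fair.max.assign</name>
  <value>1</value>
</property>
```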

Of course, from the perspective of cluster utilization this is usually not a problem: in general, the number of user tasks far exceeds the cluster's concurrent processing capacity, so the cluster is busy all the time and no node stays idle for long.

(2) If a node runs too many tasks and its resource utilization is too high, how can the number of tasks on a node be controlled?

A: The number of tasks running on a node is determined mainly by two factors: the total resources available to the NodeManager, and the resource requirements of a single task. For example, if a NodeManager has 8 GB of memory and 8 CPU vcores available, and a single task requires 1 GB of memory and 1 vcore, the node can run at most 8 tasks.

The resources available on a NodeManager are set by the administrator in yarn-site.xml; the relevant parameters are:

yarn.nodemanager.resource.memory-mb: total physical memory available (MB). Default is 8192.

yarn.nodemanager.resource.cpu-vcores: total number of CPU vcores available. Default is 8.
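In yarn-site.xml, the NodeManager capacity from the example above would be declared like this (the values shown are illustrative):

```xml
<!-- yarn-site.xml: resources this NodeManager offers to the cluster -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```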

For MapReduce, the per-task resource requirements of each job can be set with the following parameters:

mapreduce.map.memory.mb: physical memory per map task (MB). Default is 1024.

mapreduce.map.cpu.vcores: number of CPU vcores per map task. Default is 1.

Note: for a detailed description of these configuration properties, see the article "Hadoop YARN configuration parameters (1): parameters related to the RM and NM".

By default, each scheduler only schedules memory and does not consider CPU; to schedule CPU as well, you need to make the relevant settings in the scheduler's configuration file.

(3) How do you set the amount of memory and the number of CPUs consumed by a single task?

A: For MapReduce, the per-task resource requirements of each job can be set with the following parameters:

mapreduce.map.memory.mb: physical memory per map task (MB). Default is 1024.

mapreduce.map.cpu.vcores: number of CPU vcores per map task. Default is 1.
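These can be set per job in mapred-site.xml or on the job submission; the corresponding reduce-side parameters (mapreduce.reduce.memory.mb and mapreduce.reduce.cpu.vcores) follow the same pattern. The values below are illustrative, not defaults:

```xml
<!-- mapred-site.xml: per-task resource requests for a job (illustrative values) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
```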

It is important to note that by default each scheduler only schedules memory and does not consider CPU; to schedule CPU as well, you need to make the relevant settings in the scheduler's configuration file.
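For the Capacity Scheduler, for instance, CPU scheduling is enabled by switching its resource calculator to the DominantResourceCalculator in capacity-scheduler.xml:

```xml
<!-- capacity-scheduler.xml: account for CPU as well as memory when scheduling -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```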

(4) When a user sets a task's memory to 1000 MB, why is the final allocated memory 1024 MB?

A: To simplify resource management and scheduling, Hadoop YARN has a built-in resource normalization algorithm, defined by three quantities: the minimum amount of resources that can be requested, the maximum amount that can be requested, and a normalization factor. If an application requests less than the minimum, YARN rounds the request up to the minimum; in other words, an application never receives less than it requested, though the amount is not necessarily equal to the request. If an application requests more than the maximum, an exception is thrown and the request fails. The normalization factor rounds requests up to a multiple of itself: if a request is not an integral multiple of the factor, it is increased to the smallest such multiple, i.e. ceil(a / b) * b, where a is the requested amount and b is the normalization factor.

The parameters described above are set in yarn-site.xml; the relevant ones are:

yarn.scheduler.minimum-allocation-mb: minimum amount of memory that can be requested (MB). Default is 1024.

yarn.scheduler.minimum-allocation-vcores: minimum number of CPU vcores that can be requested. Default is 1.

yarn.scheduler.maximum-allocation-mb: maximum amount of memory that can be requested (MB). Default is 8192.

yarn.scheduler.maximum-allocation-vcores: maximum number of CPU vcores that can be requested. Default is 4.
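Put together, these allocation bounds would be set in yarn-site.xml like this (the values shown are the defaults):

```xml
<!-- yarn-site.xml: per-container allocation bounds enforced by the scheduler -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
```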

The normalization factor differs by scheduler, as follows:

FIFO and Capacity Scheduler: the normalization factor equals the minimum amount of resources that can be requested and cannot be configured separately.

Fair Scheduler: the normalization factor is set by the parameters yarn.scheduler.increment-allocation-mb and yarn.scheduler.increment-allocation-vcores; the defaults are 1024 and 1.

As you can see from the above, an application may receive more resources than it requested. For example, if YARN's minimum memory allocation is 1024 and the normalization factor is 1024, an application that requests 1500 MB of memory will receive 2048 MB; if the normalization factor were 512, it would receive 1536 MB.
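The normalization rule can be sketched in a few lines of Python. This is a toy illustration of the formula described above, not YARN's actual code; the function name and signature are my own:

```python
import math

def normalize(requested, minimum, maximum, factor):
    """Illustrate YARN-style resource normalization: reject requests above
    the maximum, clamp to the minimum, then round up to a multiple of the
    normalization factor via ceil(a / b) * b."""
    if requested > maximum:
        raise ValueError(f"request {requested} exceeds maximum allocation {maximum}")
    amount = max(requested, minimum)
    return math.ceil(amount / factor) * factor

# The examples from the text: a 1500 MB request with minimum 1024
print(normalize(1500, 1024, 8192, 1024))  # 2048
print(normalize(1500, 1024, 8192, 512))   # 1536
print(normalize(1000, 1024, 8192, 1024))  # 1024 (rounded up to the minimum)
```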

(5) The Fair Scheduler is in use with multiple queues configured. When a user submits a job to a queue that does not exist, the Fair Scheduler automatically creates a new queue instead of reporting an error (for example, "error: queue XXX does not exist"). How can this be avoided?

A: In yarn-site.xml, set yarn.scheduler.fair.allow-undeclared-pools to false (the default is true).
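The corresponding fragment in yarn-site.xml:

```xml
<!-- yarn-site.xml: reject jobs submitted to queues that were not declared -->
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>
```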

(6) When errors occur while using Hadoop 2.0, how do you troubleshoot them?

A: Start with the Hadoop logs: the daemon logs of the ResourceManager and NodeManager, and the container logs of the failed application (with log aggregation enabled, the latter can be fetched with "yarn logs -applicationId <appId>").

At this point you should have a deeper understanding of the common problems with Hadoop YARN; the best way to consolidate it is to try these settings in practice.
