Detailed explanation of Yarn Architecture Design

2025-02-24 Update From: SLTechnology News&Howtos

1. Yarn basic service components

YARN is the resource manager introduced in Hadoop 2. It is a general-purpose resource management system that provides unified resource management and scheduling for upper-level applications. Its introduction improved cluster utilization, unified resource management, and data sharing.

ResourceManager (RM): responsible for the unified management and scheduling of resources across all NMs. It assigns an idle Container to run each AM and monitors the AM's status, and it allocates free Containers in response to the resource requests an AM makes. It consists mainly of two components: the Scheduler and the Applications Manager.

Scheduler: the scheduler allocates the system's resources to running applications subject to constraints such as capacity and queues (for example, each queue is assigned a certain share of resources and may run at most a certain number of jobs). It allocates purely on the basis of each application's resource requirements, and the unit of allocation is the Container, which limits the amount of resources each task can use. The Scheduler is not responsible for monitoring or tracking application status, nor for restarting tasks that fail for whatever reason; that is the ApplicationMaster's job. In short, the scheduler hands out resources, encapsulated in Containers, according to what each application asks for and what the cluster machines have available.
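The allocation idea described above can be illustrated with a toy model. This is only a sketch, not YARN's scheduler: the `Node`, `Container`, and `allocate` names are hypothetical, and the first-fit policy is an assumption chosen for brevity (real schedulers such as CapacityScheduler also apply queue and capacity rules).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Free resources one NodeManager has reported (hypothetical model)."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class Container:
    """A bundle of resources on one node, handed to an application."""
    node: str
    memory_mb: int
    vcores: int


def allocate(nodes, memory_mb, vcores):
    """Return a Container on the first node with enough free resources, else None."""
    for node in nodes:
        if node.free_mb >= memory_mb and node.free_vcores >= vcores:
            # Reserve the resources so later requests cannot reuse them.
            node.free_mb -= memory_mb
            node.free_vcores -= vcores
            return Container(node.name, memory_mb, vcores)
    # Nothing fits right now; this toy scheduler neither tracks nor restarts tasks.
    return None


nodes = [Node("nm1", 2048, 2), Node("nm2", 8192, 8)]
first = allocate(nodes, 4096, 2)
second = allocate(nodes, 4096, 2)
third = allocate(nodes, 4096, 2)
print(first, second, third)
```

In this run the first two requests land on `nm2` and the third gets nothing, because no node has 4096 MB free any more. The function only hands out resources; following up on what runs inside a Container is left to the caller, mirroring the split between the Scheduler and the ApplicationMaster.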

The scheduler is pluggable; CapacityScheduler and FairScheduler are two implementations.

Applications Manager: responsible for managing all applications in the system, including accepting application submissions, negotiating resources with the scheduler to start the AM, and monitoring the AM and restarting it on failure. Tracking the progress and status of the Containers allocated to an application is handled by that application's AM.

NodeManager (NM): the NM is the resource and task manager on each node. It periodically reports the node's resource usage and the running status of each Container to the RM, and it receives and processes requests from AMs, such as starting and stopping Containers.

ApplicationMaster (AM): every application submitted by a user includes an AM, which monitors the application, tracks its execution status, restarts failed tasks, and so on. The AM is framework-specific: it negotiates resources with the ResourceManager and works with the NodeManagers to execute and monitor tasks. MapReduce is a natively supported framework, so MapReduce jobs can run on YARN. Many other distributed systems, such as Spark and Storm, provide their own application frameworks for running tasks on YARN. If necessary, you can also write your own YARN application that conforms to the specification.

Container: the resource abstraction in YARN, which encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network. When an AM requests resources from the RM, the RM returns them as Containers. YARN assigns a Container to each task, and the task can only use the resources described in that Container.

2. Yarn resource management

1. Resource scheduling and resource isolation are the two most important and basic functions of YARN as a resource management system. Resource scheduling is done by the ResourceManager, while resource isolation is implemented by each NodeManager.

2. After the ResourceManager assigns resources on a NodeManager to a task (this is "resource scheduling"), the NodeManager must provide those resources to the task as required, and even guarantee that they are exclusive to it, giving the task a reliable basis to run on. This is "resource isolation".

3. When we talk about resources, we usually mean memory, CPU, and I/O. So far, Hadoop YARN supports management and scheduling of CPU and memory only.

4. The amount of memory determines whether a task lives or dies: without enough memory, the task may fail. CPU, by contrast, only determines how fast the task runs, not whether it survives.

Related parameters:

Memory parameters:

1. yarn.nodemanager.resource.memory-mb

The total amount of physical memory that YARN can use on this node. The default is 8192 MB. Note that if the node has less than 8 GB of memory, you need to reduce this value, because YARN does not detect the node's total physical memory on its own. The value can be set to about 80% of the machine's memory.

2. yarn.nodemanager.vmem-pmem-ratio

The maximum amount of virtual memory a task may use for every 1 MB of physical memory it uses. The default is 2.1.

3. yarn.nodemanager.pmem-check-enabled

Whether to start a thread that checks the amount of physical memory each task is using. If a task exceeds its allocation, it is killed. The default is true.

4. yarn.nodemanager.vmem-check-enabled

Whether to start a thread that checks the amount of virtual memory each task is using. If a task exceeds its allocation, it is killed. The default is true.

5. yarn.scheduler.minimum-allocation-mb

The minimum amount of physical memory a single task can request. The default is 1024 MB. If a task requests less than this, the request is raised to this value.

6. yarn.scheduler.maximum-allocation-mb

The maximum amount of physical memory a single task can request. The default is 8192 MB.
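To see how these memory settings interact, here is a small illustrative model using the default values listed above. It is not YARN code: the function names `normalize_request_mb` and `should_kill` are hypothetical, and rejecting a request above the maximum with an exception is an assumption about behavior that the text above does not spell out.

```python
MIN_ALLOCATION_MB = 1024   # yarn.scheduler.minimum-allocation-mb
MAX_ALLOCATION_MB = 8192   # yarn.scheduler.maximum-allocation-mb
VMEM_PMEM_RATIO = 2.1      # yarn.nodemanager.vmem-pmem-ratio


def normalize_request_mb(requested_mb):
    """Raise a too-small request to the minimum; reject one above the maximum."""
    if requested_mb > MAX_ALLOCATION_MB:
        # Assumption: an oversized request is refused rather than silently capped.
        raise ValueError(
            f"{requested_mb} MB exceeds the {MAX_ALLOCATION_MB} MB maximum"
        )
    return max(requested_mb, MIN_ALLOCATION_MB)


def should_kill(container_mb, used_pmem_mb, used_vmem_mb,
                pmem_check=True, vmem_check=True):
    """Model the two NodeManager checks: kill when an enabled limit is exceeded."""
    vmem_limit_mb = container_mb * VMEM_PMEM_RATIO
    if pmem_check and used_pmem_mb > container_mb:
        return True
    if vmem_check and used_vmem_mb > vmem_limit_mb:
        return True
    return False


granted = normalize_request_mb(512)   # raised to the 1024 MB minimum
print(granted)
print(should_kill(granted, used_pmem_mb=900, used_vmem_mb=2000))
print(should_kill(granted, used_pmem_mb=900, used_vmem_mb=2200))
```

With a 1024 MB Container the virtual memory limit works out to 1024 × 2.1 = 2150.4 MB, so the task using 2200 MB of virtual memory would be killed even though its physical usage is within bounds. Passing `vmem_check=False` models turning off yarn.nodemanager.vmem-check-enabled.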

CPU parameters:

1. yarn.nodemanager.resource.cpu-vcores

The number of virtual CPUs that YARN can use on this node. The default is 8. It is recommended to set this value to the number of physical CPU cores. If the node has fewer than 8 cores, you need to reduce this value, because YARN does not detect the node's physical CPU count on its own.

2. yarn.scheduler.minimum-allocation-vcores

The minimum number of virtual CPUs a single task can request. The default is 1. If a task requests fewer than this, the request is raised to this value.

3. yarn.scheduler.maximum-allocation-vcores

The maximum number of virtual CPUs a single task can request. The default is 32 (defaults can differ between Hadoop releases, so check yarn-default.xml for your version).
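As a rough worked example of what the node-level settings imply, the sketch below estimates how many Containers of a given size fit on one node. The function `containers_per_node` is hypothetical, and it assumes the scheduler accounts for both memory and vcores; a scheduler configured to consider memory only would ignore the CPU bound.

```python
NODE_MEMORY_MB = 8192   # yarn.nodemanager.resource.memory-mb
NODE_VCORES = 8         # yarn.nodemanager.resource.cpu-vcores


def containers_per_node(container_mb, container_vcores,
                        node_mb=NODE_MEMORY_MB, node_vcores=NODE_VCORES):
    """Upper bound on Containers per node: the tighter of the two resource limits."""
    by_memory = node_mb // container_mb
    by_cpu = node_vcores // container_vcores
    return min(by_memory, by_cpu)


print(containers_per_node(1024, 1))  # memory and CPU both allow 8
print(containers_per_node(2048, 1))  # memory allows only 4
print(containers_per_node(1024, 4))  # CPU allows only 2
```

Whichever resource runs out first sets the limit, which is why it pays to size Containers so that memory and vcores are used up at about the same rate.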
