Shulou (Shulou.com) — SLTechnology News & Howtos, updated 2025-01-19
This article explains the basic structure and operating principles of Yarn in the Hadoop framework. After reading it, you should have a working understanding of how Yarn schedules cluster resources.
I. The Basic Structure of Yarn
Hadoop has three core components: the distributed file system HDFS, the distributed computing framework MapReduce, and the distributed cluster resource scheduling framework Yarn. Yarn did not exist in the early versions of Hadoop; it was introduced as Hadoop evolved, and it follows a typical Master-Slave architecture.
Yarn includes two main processes: the resource manager (ResourceManager) and the node manager (NodeManager).
ResourceManager
Usually deployed on a separate server; handles client requests
Handles resource allocation and scheduling across the cluster
NodeManager
Manages the resources on its own node
Executes and handles specific commands from the ResourceManager
Monitors node resources and reports them to the ResourceManager
ApplicationMaster
Splits input data and provides fault tolerance
Requests resources for the application and assigns tasks to it
Container
The unit of dynamic resource allocation in Yarn
A container encapsulates a certain amount of memory, CPU, and other computing resources
Containers are started and managed by the NodeManager process
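The division of labor among these components can be sketched as a toy model. All class and method names below (`ResourceManager.allocate`, `NodeManager.launch`, and so on) are illustrative Python stand-ins, not the real Hadoop Java API:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Container:
    # A container is a bundle of resources on one node.
    memory_mb: int
    vcores: int

@dataclass
class NodeManager:
    # Each NodeManager tracks the free resources of its own node
    # and launches containers on behalf of the ResourceManager.
    node_id: str
    free_memory_mb: int
    free_vcores: int
    running: list = field(default_factory=list)

    def launch(self, memory_mb: int, vcores: int) -> Container | None:
        if memory_mb <= self.free_memory_mb and vcores <= self.free_vcores:
            self.free_memory_mb -= memory_mb
            self.free_vcores -= vcores
            c = Container(memory_mb, vcores)
            self.running.append(c)
            return c
        return None  # not enough room on this node

class ResourceManager:
    # The ResourceManager has the cluster-wide view and picks a
    # node with enough free resources to host the container.
    def __init__(self, nodes: list[NodeManager]):
        self.nodes = nodes

    def allocate(self, memory_mb: int, vcores: int) -> Container | None:
        for nm in self.nodes:
            c = nm.launch(memory_mb, vcores)
            if c is not None:
                return c
        return None  # cluster has no node with enough room

rm = ResourceManager([NodeManager("node1", 4096, 4)])
print(rm.allocate(2048, 2))  # Container(memory_mb=2048, vcores=2)
```

The point of the sketch is the split of responsibilities: the ResourceManager only decides *where* a container goes, while the NodeManager actually reserves the local resources and starts it.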
II. Basic Execution Flow
A client submits a MapReduce application to Yarn for scheduling
The ResourceManager returns a resource submission path and an ApplicationId
The client uploads the resources needed for the run to HDFS and applies to run the MRAppMaster
The ResourceManager communicates with a NodeManager and allocates a container according to available cluster resources
The MRAppMaster is distributed to the allocated container; through the steps above, the ResourceManager has converted the client request into a Task
Each container runs a Map or Reduce task
While running, tasks communicate with the MRAppMaster to report their status
After the tasks finish, the processes deregister and the container resources are released
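The flow above can be traced end to end with a toy function. Everything here (the staging path, the ApplicationId format, the log lines) is made up for illustration; only the order of the steps follows the text:

```python
def submit_mapreduce_app(num_map_tasks: int, num_reduce_tasks: int) -> list[str]:
    """Trace the YARN submission flow for a toy MapReduce app."""
    log = []
    # 1. Client submits; RM registers the app and returns an
    #    ApplicationId plus a staging path on HDFS.
    app_id = "application_0001"
    log.append(f"RM: registered {app_id}, staging dir /staging/{app_id}")
    # 2. Client uploads jars/config to the staging dir and applies
    #    to run the MRAppMaster.
    log.append("client: resources uploaded, requesting MRAppMaster")
    # 3. RM asks an NM for a container; the MRAppMaster starts there.
    log.append("NM: container_01 started, running MRAppMaster")
    # 4. The MRAppMaster requests one container per task; tasks
    #    report status back to it while they run.
    for i in range(num_map_tasks):
        log.append(f"map task {i}: running -> done")
    for i in range(num_reduce_tasks):
        log.append(f"reduce task {i}: running -> done")
    # 5. On completion the MRAppMaster deregisters and containers
    #    are released.
    log.append(f"RM: {app_id} finished, containers released")
    return log

for line in submit_mapreduce_app(2, 1):
    print(line)
```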
MapReduce applications can run on Yarn because MapReduce provides an ApplicationMaster (MRAppMaster) that follows the Yarn specification. Any other computing framework that follows the same specification can likewise have its resources scheduled and managed by Yarn in a unified way.
III. Resource Scheduler
The basic job of the scheduler is to dispatch tasks to nodes for execution according to each node's resource usage and each job's requirements. The key properties of a task queue to understand are its enqueue/dequeue policy, priority, capacity, and so on.
Hadoop ships three main job schedulers: the FIFO scheduler, the CapacityScheduler, and the FairScheduler. The default resource scheduler is the CapacityScheduler.
FIFO scheduler
FIFO is a batch scheduler. Its strategy is to order jobs first by priority, and then by arrival time among jobs of equal priority, and to execute them in that order.
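This ordering rule is easy to sketch with a heap. The function below is an illustration of the ordering only, not Hadoop's implementation; lower priority numbers are treated as higher priority:

```python
import heapq

def fifo_schedule(jobs):
    """FIFO scheduling sketch: higher priority first, then earlier arrival.

    jobs: list of (name, priority, arrival_time) tuples, where a
    lower priority number means a higher priority.
    Returns job names in execution order.
    """
    # heapq always pops the smallest tuple, so (priority, arrival)
    # orders by priority with arrival time as the tie-breaker.
    heap = [(prio, arrival, name) for name, prio, arrival in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

print(fifo_schedule([("job2", 1, 5), ("job1", 1, 3), ("job0", 0, 9)]))
# ['job0', 'job1', 'job2']
```

job0 runs first despite arriving last because it has the highest priority; job1 and job2 share a priority, so arrival time decides between them.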
Capacity scheduler
The CapacityScheduler supports multiple queues, and each queue can be configured with a share of the cluster's resources. Within each queue, jobs are scheduled FIFO. To pick a queue, the scheduler computes the ratio of running tasks to allocated resources in each queue and selects the relatively idle queue with the smallest ratio; within that queue, jobs are ordered by priority and submission time. To prevent the jobs of a single user from monopolizing a queue's resources, the scheduler limits the amount of resources that jobs submitted by the same user may occupy.
For example, suppose cluster resources are divided among three queues (A, B, C) with the following shares: queue A gets 20%, queue B gets 50%, and queue C gets 30%. All three queues execute their jobs in order, so job11, job21, and job31 are the first to run, and they run in parallel.
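The queue-selection rule described above (smallest running-tasks-to-capacity ratio wins) can be sketched in a few lines. The numbers are invented for illustration; only the queue shares match the example:

```python
def pick_queue(queues):
    """Choose the queue whose running-task-to-capacity ratio is
    smallest, i.e. the relatively most idle queue.

    queues: dict of name -> (running_tasks, capacity_share)
    """
    return min(queues, key=lambda q: queues[q][0] / queues[q][1])

# Shares as in the example: A=20%, B=50%, C=30%; task counts invented.
queues = {"A": (3, 0.20), "B": (5, 0.50), "C": (6, 0.30)}
print(pick_queue(queues))
# 'B' — ratios are A=15.0, B=10.0, C=20.0, so B is the most idle
```

Note that B is chosen even though it has more running tasks than A: the ratio is relative to each queue's configured share, so a large queue with moderate load counts as idler than a small, busy one.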
Fair scheduler
The FairScheduler is similar in principle to the CapacityScheduler: it supports multiple queues and multiple users, and the amount of resources in each queue is configurable. Jobs in the same queue share all of the queue's resources fairly.
For example, with three queues (A, B, C), the jobs in each queue are allocated resources by priority: the higher the priority, the more resources a job receives, but every job is allocated some resources to ensure fairness. When resources are limited, there is a gap between the computing resources a job would ideally receive and what it actually holds; this gap is called the deficit (vacancy). Within a queue, the job with the largest deficit gets priority for execution, so jobs run in order of their deficits.
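The deficit rule can be sketched as follows. The fair-share and allocation numbers are invented for illustration:

```python
def fair_pick(jobs):
    """Pick the job with the largest deficit between its fair share
    and what it actually holds (the 'vacancy' described above).

    jobs: dict of name -> (fair_share, actual_allocation)
    """
    return max(jobs, key=lambda j: jobs[j][0] - jobs[j][1])

# Three jobs with equal fair shares but different current allocations.
jobs = {"job1": (10, 7), "job2": (10, 2), "job3": (10, 9)}
print(fair_pick(jobs))
# 'job2' — its deficit (10 - 2 = 8) is the largest of the three
```

Scheduling by deficit means the job furthest below its fair share catches up first, which is what keeps the sharing fair over time.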
Source code: GitHub https://github.com/cicadasmile/big-data-parent | Gitee https://gitee.com/cicadasmile/big-data-parent
That covers the basic structure and operating principles of Yarn in the Hadoop framework. I hope the content above is helpful.