2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains what YARN, Hadoop's resource management module, is used for: why it was introduced, what it is made of, how it works, and how it handles failures and scheduling.
1. The origin of YARN
As described in the previous article, the difference between the Hadoop 1 and Hadoop 2 architectures is that Hadoop 2 separates resource management from the MapReduce framework; the separated part is the YARN module.
Before YARN, each computing framework ran on its own cluster: one cluster for MapReduce, one for Spark, one for HBase, and so on.
As a result, managing each cluster was complex and resource utilization was very low. For example, during a given period the Hadoop cluster might be busy while the Spark cluster was idle, or vice versa; because clusters could not share resources, capacity across clusters was underused.
The "one framework, one cluster" model also needs several administrators to manage the clusters, which raises operation and maintenance costs, whereas a shared cluster usually needs only a few administrators to manage multiple frameworks in one place. In addition, as data volumes surge, moving data across clusters takes longer and drives up hardware costs. A shared cluster lets multiple frameworks share data and hardware resources, which greatly reduces the cost of data movement.
Solution:
Run all computing frameworks in one cluster, share that cluster's resources, and allocate them on demand: if Hadoop needs resources, allocate them to Hadoop; if Spark needs resources, allocate them to Spark. The resource utilization of the single shared cluster is then higher than that of several small clusters.
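The utilization argument can be illustrated with a small calculation. The sketch below is only a toy model: the demand figures, the two 100 GB clusters, and the function names are made-up assumptions, not measurements from any real cluster.

```python
# Toy comparison of dedicated clusters vs. one shared cluster.
# All numbers are invented for illustration.

def served_dedicated(demands, capacities):
    """Each framework can only use its own cluster."""
    return sum(min(d, c) for d, c in zip(demands, capacities))


def served_shared(demands, capacities):
    """All frameworks draw from one pooled cluster."""
    return min(sum(demands), sum(capacities))


# Memory demand in GB for (MapReduce, Spark) over three time slots.
demand_per_slot = [(150, 20), (30, 160), (90, 90)]
capacities = (100, 100)  # two 100 GB clusters, or one 200 GB shared cluster

dedicated_total = sum(served_dedicated(d, capacities) for d in demand_per_slot)
shared_total = sum(served_shared(d, capacities) for d in demand_per_slot)
total_capacity = sum(capacities) * len(demand_per_slot)

dedicated_utilization = dedicated_total / total_capacity
shared_utilization = shared_total / total_capacity
print(f"dedicated: {dedicated_utilization:.0%}, shared: {shared_utilization:.0%}")
# prints "dedicated: 72%, shared: 90%"
```

In the first two slots one framework is overloaded while the other is mostly idle, so pooling lets the busy framework use the idle capacity; in the third slot both models serve the same amount.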
2. The basic components of YARN
Master/Slave structure: one ResourceManager (RM) corresponds to multiple NodeManagers (NM). YARN is composed of the Client, ResourceManager, NodeManager, and ApplicationMaster (AM). The Client submits tasks to the RM, kills tasks, and so on.
The AM is implemented by the corresponding application.
Each application has its own AM. The AM applies to the RM for resources and then launches the corresponding tasks on the NMs. Each NM reports to the RM through heartbeat messages: NM health status, task execution status, and so on.
RM: there is only one per cluster; it is responsible for the unified management and scheduling of cluster resources
1) process requests from the client (start / kill the application)
2) start / monitor AM; once an AM is down, RM will start the AM on another node
3) monitor the NMs: receive heartbeat reports from each NM and assign tasks to it; once an NM dies, mark the tasks on that NM and tell the corresponding AM, which decides how to handle them
4) responsible for resource allocation and scheduling of the whole cluster
NM: there are many per cluster; each is responsible for managing and using the resources of a single node
1) periodically report to RM the resource usage on this node and the operation status of each Container
2) receive and process all kinds of Container start / stop commands from RM
3) handle commands from AM
4) responsible for resource management and task scheduling on a single node
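The heartbeat relationship between NM and RM described above can be sketched as follows. This is a minimal illustration, not YARN's actual protocol: the class names, the fields of `Heartbeat`, and the 10-second expiry used in the example are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """Hypothetical NM -> RM report; real YARN heartbeats carry more fields."""
    node_id: str
    healthy: bool
    used_memory_mb: int
    container_states: dict  # container id -> "RUNNING" / "COMPLETE" / "FAILED"


@dataclass
class ToyResourceManager:
    expiry_seconds: float = 600.0  # assumed timeout for this sketch
    last_seen: dict = field(default_factory=dict)
    reports: dict = field(default_factory=dict)

    def on_heartbeat(self, hb, now):
        """Record the latest report and the time it arrived."""
        self.last_seen[hb.node_id] = now
        self.reports[hb.node_id] = hb

    def lost_nodes(self, now):
        """Nodes whose last heartbeat is older than the expiry interval."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.expiry_seconds)

    def containers_on(self, node_id):
        """Containers last reported by a node; their AMs must be told on loss."""
        return sorted(self.reports[node_id].container_states)


rm = ToyResourceManager(expiry_seconds=10)
rm.on_heartbeat(Heartbeat("nm1", True, 2048, {"c1": "RUNNING"}), now=0)
rm.on_heartbeat(Heartbeat("nm2", True, 1024, {"c2": "RUNNING"}), now=0)
rm.on_heartbeat(Heartbeat("nm1", True, 2048, {"c1": "RUNNING"}), now=8)

lost = rm.lost_nodes(now=15)          # nm2 has been silent for 15 s
affected = [c for n in lost for c in rm.containers_on(n)]
print(lost, affected)
```

Time is passed in explicitly as `now` so the example is deterministic; a real implementation would use a clock and a background monitor.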
AM: one for each application, responsible for application management
1) data segmentation
2) request resources (Container) from RM for applications / jobs and assign them to internal tasks
3) communicate with NM to start / stop tasks
4) task monitoring and fault tolerance (when a task fails, request resources for it again and restart it)
5) deal with the commands sent by RM: kill Container, restart NM, etc.
Container: abstraction of the environment in which the task runs
1) Task running resources (node, memory, CPU)
2) Task start command
3) Task running environment; every task runs inside a Container, and a Container can host either an AM or a specific Map/Reduce/MPI/Spark task
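The three parts of a Container listed above map naturally onto a small record type. The following is a sketch only: `ContainerSpec`, `fits`, and the command strings are hypothetical names chosen for illustration and are not the real YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerSpec:
    """Hypothetical stand-in for a container; not the real YARN Container type."""
    node: str          # 1) resources: which node ...
    memory_mb: int     #    ... how much memory ...
    vcores: int        #    ... and how many CPU cores
    command: list      # 2) the command that starts the task
    environment: dict = field(default_factory=dict)  # 3) the task's environment


def fits(spec, free_memory_mb, free_vcores):
    """True if a node with this much free capacity could host the container."""
    return spec.memory_mb <= free_memory_mb and spec.vcores <= free_vcores


# The same abstraction describes an AM and an ordinary task.
am_container = ContainerSpec("nm1", 1024, 1, ["run-app-master"], {"ROLE": "AM"})
map_container = ContainerSpec("nm2", 2048, 2, ["run-map-task", "--split", "0"])

print(fits(am_container, free_memory_mb=4096, free_vcores=4))
print(fits(map_container, free_memory_mb=1024, free_vcores=4))
```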
3. The working principle of YARN
1) The user submits an application / job to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, the user program, and so on.
2) ResourceManager allocates the first Container for the job and communicates with the corresponding NodeManager, asking it to start the job's ApplicationMaster in this Container.
3) ApplicationMaster first registers with ResourceManager, so that users can query the running status of the job through ResourceManager. It then applies for resources for each task and monitors each task until it finishes, that is, it repeats steps 4-7.
4) ApplicationMaster applies for and receives resources from ResourceManager by polling through RPC requests.
5) Once ApplicationMaster has obtained a resource, it communicates with the corresponding NodeManager, asking it to start the task.
6) NodeManager starts the task.
7) Each task reports its status and progress to ApplicationMaster through the RPC protocol, so that ApplicationMaster can track the running status of every task and restart a task when it fails. While the job is running, users can query its current status from ApplicationMaster through RPC at any time.
8) After the job is completed, ApplicationMaster unregisters from ResourceManager and shuts itself down.
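The eight steps above can be traced with a toy, single-process simulation. This is a sketch under heavy simplifying assumptions: the class names are invented, allocation is plain round-robin, there is no RPC or polling, and every task succeeds immediately.

```python
class ToyNodeManager:
    def __init__(self, name, log):
        self.name, self.log = name, log

    def start(self, what):
        self.log.append(f"{self.name}: start {what}")


class ToyResourceManager:
    def __init__(self, nodes, log):
        self.nodes, self.log = nodes, log
        self.registered = set()
        self._next = 0

    def allocate(self):
        """Hand out containers round-robin; a real scheduler is far richer."""
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node

    def submit(self, app_name, tasks):
        self.log.append(f"client: submit {app_name}")          # step 1
        node = self.allocate()                                  # step 2
        node.start(f"ApplicationMaster for {app_name}")
        ToyApplicationMaster(app_name, tasks, self).run()


class ToyApplicationMaster:
    def __init__(self, app_name, tasks, rm):
        self.app_name, self.tasks, self.rm = app_name, tasks, rm

    def run(self):
        self.rm.registered.add(self.app_name)                   # step 3
        self.rm.log.append(f"AM: register {self.app_name}")
        for task in self.tasks:
            node = self.rm.allocate()                           # step 4
            node.start(task)                                    # steps 5-6
            self.rm.log.append(f"AM: {task} reported done")     # step 7
        self.rm.registered.discard(self.app_name)               # step 8
        self.rm.log.append(f"AM: unregister {self.app_name}")


log = []
nodes = [ToyNodeManager("nm1", log), ToyNodeManager("nm2", log)]
rm = ToyResourceManager(nodes, log)
rm.submit("wordcount", ["map-0", "map-1", "reduce-0"])
print("\n".join(log))
```

The printed log starts with the client submission and the AM being started on `nm1`, and ends with the AM unregistering, which mirrors the order of the steps.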
4. Fault tolerance of YARN
ResourceManager implements HA based on ZooKeeper to avoid a single point of failure.
When a NodeManager fails, ResourceManager tells the corresponding ApplicationMaster which tasks failed.
It is up to ApplicationMaster to decide how to handle the failed tasks.
When an ApplicationMaster fails, ResourceManager is responsible for restarting it.
ApplicationMaster itself must handle the fault tolerance of its internal tasks.
MRAppMaster (the MapReduce ApplicationMaster) records the tasks that have already completed, so they do not need to be rerun after a restart.
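The last point, that completed tasks are not rerun after an ApplicationMaster restart, can be sketched as below. This is an illustration of the idea only: the `completed` set stands in for whatever durable record a real ApplicationMaster keeps, and `AppMasterCrash`, `run_attempt`, and `crash_plan` are names invented for this example.

```python
class AppMasterCrash(Exception):
    """Simulated ApplicationMaster failure."""


def run_attempt(tasks, completed, run_log, crash_after=None):
    """Run every task not yet in `completed`.

    `completed` plays the role of state that survives an AM restart.
    If `crash_after` is set, the attempt fails after that many new tasks.
    """
    ran = 0
    for task in tasks:
        if task in completed:
            continue  # finished in an earlier attempt, so skip it
        if crash_after is not None and ran == crash_after:
            raise AppMasterCrash(task)
        run_log.append(task)
        completed.add(task)
        ran += 1


tasks = ["map-0", "map-1", "map-2", "reduce-0"]
completed = set()
run_log = []
attempts = 0
crash_plan = [2, None]  # first attempt crashes after 2 tasks, second succeeds

for crash_after in crash_plan:
    attempts += 1
    try:
        run_attempt(tasks, completed, run_log, crash_after)
        break
    except AppMasterCrash:
        continue  # the "ResourceManager" restarts the ApplicationMaster

print(attempts, run_log)
```

Each task appears exactly once in `run_log` even though two attempts were needed, because the second attempt skips `map-0` and `map-1`.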
5. The scheduling framework of YARN
1. Two-tier scheduling framework
1) ResourceManager allocates resources to ApplicationMaster
2) ApplicationMaster further allocates resources to each task
2. Scheduling strategy based on resource reservation
1) When resources are insufficient, resources are reserved for the task until enough have accumulated. Example: a task needs 10 GB, but every node has less than 10 GB free. The scheduler picks a NodeManager, say one with only 2 GB free, and places a reservation on it. As other tasks on that NodeManager release resources, the freed resources are held for the 10 GB task, and the task starts once 10 GB has accumulated. Disadvantage: the reserved resources sit idle until the full 10 GB is available, which lowers the resource utilization of the cluster.
2) This differs from the "all or nothing" policy (Apache Mesos). Example: a job needs 10 GB, but every node has less than 10 GB free, so the job simply waits until some node has 10 GB free. Disadvantage: the task may well starve.
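The trade-off between the two policies above can be made concrete with a toy single-node simulation. It is a sketch under stated assumptions, not either scheduler's real algorithm: one node starts with 2 GB free, 2 GB is released at every step, and under "all or nothing" an endless queue of small tasks takes any free memory at once. The function name and parameters are invented for this example.

```python
def simulate(policy, steps=20, big_need_gb=10, free_gb=2, released_per_step_gb=2):
    """Return (start_step, idle_gb_steps) for one large task on one node.

    policy: "reservation" holds freed memory for the large task;
            "all_or_nothing" lets waiting small tasks take any free memory.
    start_step is None if the large task never starts within `steps`.
    idle_gb_steps counts memory held idle while waiting (GB x steps).
    """
    idle_gb_steps = 0
    for step in range(steps + 1):
        if free_gb >= big_need_gb:
            return step, idle_gb_steps
        if policy == "all_or_nothing":
            free_gb = 0                 # small tasks use what is free right away
        else:
            idle_gb_steps += free_gb    # reserved memory sits unused this step
        free_gb += released_per_step_gb
    return None, idle_gb_steps


print(simulate("reservation"))      # (4, 20)
print(simulate("all_or_nothing"))   # (None, 0)
```

With reservation the large task starts at step 4, at the cost of 20 GB-steps of idle memory; with "all or nothing" no memory is wasted, but the large task never sees 10 GB free and starves.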
© 2024 shulou.com SLNews company. All rights reserved.