2025-04-06 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article introduces the YARN architecture in Hadoop. It is fairly detailed and should serve as a useful reference for interested readers.
Let's first take a look at the basic architecture of the Yarn platform:
In the YARN architecture, the two responsibilities of the original JobTracker (resource management and task scheduling) are split apart: resource scheduling is handled by the ResourceManager, and per-application task scheduling is handled by the ApplicationMaster. The advantage is that each module focuses on doing one thing well, much as a team lead who spends every day writing code instead of managing the team will eventually run into trouble. Back to the point: to understand the YARN architecture, it helps to walk through what happens when a user submits a task:
When a NodeManager starts, it registers its available resources (memory, CPU, and so on) with the ResourceManager, which uses this information when scheduling work onto the node.
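This registration step can be sketched as a toy model. All class and method names below are made up for illustration; they are not the real Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: each NodeManager reports its total resources to the
// ResourceManager on startup, and the RM records them for scheduling.
public class RegistrationSketch {
    static class NodeResource {
        final int memoryMb;
        final int vcores;
        NodeResource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // The RM side: a registry of node -> reported resources.
    static final Map<String, NodeResource> registry = new HashMap<>();

    // The NM side: on startup, report memory and CPU to the RM.
    static void registerNode(String nodeId, int memoryMb, int vcores) {
        registry.put(nodeId, new NodeResource(memoryMb, vcores));
    }

    // Total cluster memory the scheduler can work with.
    static int totalMemoryMb() {
        return registry.values().stream().mapToInt(n -> n.memoryMb).sum();
    }

    public static void main(String[] args) {
        registerNode("node1:8042", 8192, 8);
        registerNode("node2:8042", 4096, 4);
        System.out.println("Cluster memory: " + totalMemoryMb() + " MB");
    }
}
```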
2. ApplicationMaster
The ApplicationMaster manages the entire life cycle of an application; each application has its own AM. Its main functions are:
(1) Communicates with the RM scheduler to negotiate and manage resource allocation.
(2) Works with the NMs to run tasks in the appropriate containers and monitors task execution.
(3) If a container fails, the AM re-applies to the scheduler for resources.
(4) Computes the amount of resources the application needs and translates it into a protocol the scheduler understands.
(5) If the AM itself fails, the ApplicationsManager (ASM) restarts it, and the AM restores the application from its previously saved execution state.
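Duty (3) above, re-applying for resources when a container fails, can be sketched as a toy model. Everything here is hypothetical names, not Hadoop code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: the AM keeps a queue of pending container asks; when a running
// container fails, it puts an equivalent ask back so the next negotiation
// with the scheduler re-requests those resources.
public class AmRetrySketch {
    // Container requests the AM still needs granted (memory in MB each).
    static final Queue<Integer> pendingAsks = new ArrayDeque<>();
    static int running = 0;

    static void requestContainers(int n) {
        for (int i = 0; i < n; i++) pendingAsks.add(1024);
    }

    // The scheduler granted one ask; the AM launches a task in it.
    static void onAllocated() {
        if (pendingAsks.poll() != null) running++;
    }

    // A container died: drop it and re-apply for the same resources.
    static void onContainerFailed() {
        running--;
        pendingAsks.add(1024);
    }
}
```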
3. NodeManager
NodeManager replaces the TaskTracker of Hadoop v1; each node runs one NM. Its main functions are:
(1) Starts containers for applications, while ensuring that a requested container cannot use more than the node's total resources.
(2) Builds the container environment for tasks, including binary executables, jars, and so on.
(3) Provides a simple service for managing local storage on the node; an application can continue to use local storage even beyond what it requested from the RM. For example, MapReduce uses this service to store the intermediate output of map tasks.
A NodeManager can run multiple Containers, whose resources are isolated from one another, much like several virtual machines each using its own allocated resources. The NodeManager starts a monitor that watches the Containers running on it; when a Container consumes more resources than agreed, the NodeManager kills it.
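The monitoring behavior just described can be sketched as a simplified toy model (the real NodeManager implementation is far more involved; names and structure here are illustrative only):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch: the NM tracks memory used by each container and kills any
// container that exceeds the limit agreed at allocation time.
public class NmMonitorSketch {
    // containerId -> {usedMb, limitMb}
    static final Map<String, int[]> containers = new HashMap<>();

    static void launch(String id, int limitMb) {
        containers.put(id, new int[]{0, limitMb});
    }

    static void reportUsage(String id, int usedMb) {
        containers.get(id)[0] = usedMb;
    }

    // One pass of the monitor: kill anything over its agreed limit.
    static int monitorPass() {
        int killed = 0;
        Iterator<Map.Entry<String, int[]>> it = containers.entrySet().iterator();
        while (it.hasNext()) {
            int[] c = it.next().getValue();
            if (c[0] > c[1]) {
                it.remove();   // "kill" the offending container
                killed++;
            }
        }
        return killed;
    }
}
```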
4. Container
A Container is essentially a resource description attached to an application. It can be thought of as a serializable Java object that encapsulates descriptive information such as:
message ContainerProto {
  optional ContainerIdProto id = 1;                       // container id
  optional NodeIdProto nodeId = 2;                        // the node where the container (resource) resides
  optional string node_http_address = 3;
  optional ResourceProto resource = 4;                    // amount of container resources
  optional PriorityProto priority = 5;                    // container priority
  optional hadoop.common.TokenProto container_token = 6;  // container token for security authentication
}
Some basic concepts and workflows of Container are as follows:
(1) A Container is YARN's abstraction of resources; it encapsulates a certain amount of resources on a node (CPU and memory). It has nothing to do with Linux containers; it is just a concept introduced by YARN (in implementation terms, it can be thought of as a serializable/deserializable Java class).
(2) Containers are requested from the ResourceManager by the ApplicationMaster and assigned to the ApplicationMaster asynchronously by the resource scheduler inside the ResourceManager.
(3) A Container is launched by the ApplicationMaster contacting the NodeManager where the resources reside. When launching a Container, the AM must provide the command for the task to execute inside it (which can be any command, such as a java, Python, or C++ process start command), together with the environment variables and external resources (dictionary files, executables, jar packages, and so on) that the command needs.
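The information listed in (3) can be sketched as a simple launch-context object. Field names below are illustrative; the real YARN API (a class named ContainerLaunchContext) has a different exact shape.

```java
import java.util.List;
import java.util.Map;

// Sketch of what the AM hands the NodeManager when starting a container:
// the command to run, the environment it needs, and external resources
// (jars, dictionaries, executables) to localize onto the node.
public class LaunchContextSketch {
    final List<String> commands;           // e.g. a java/python process start command
    final Map<String, String> environment; // env vars the command needs
    final List<String> localResources;     // jars, dictionary files, executables

    LaunchContextSketch(List<String> commands,
                        Map<String, String> environment,
                        List<String> localResources) {
        this.commands = commands;
        this.environment = environment;
        this.localResources = localResources;
    }
}
```

For example, an AM might build `new LaunchContextSketch(List.of("java -Xmx512m Task"), Map.of("CLASSPATH", "app.jar"), List.of("hdfs:///apps/app.jar"))` and send it to the NM hosting the allocated resources.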
In addition, the Containers required by an application fall into two main categories:
(1) The Container that runs the ApplicationMaster: this one is requested from the internal resource scheduler and started by the ResourceManager. When submitting the application, the user can specify the resources required by this unique ApplicationMaster.
(2) The Containers that run the application's various tasks: these are requested from the ResourceManager by the ApplicationMaster and started through communication between the ApplicationMaster and the NodeManager.
Either kind of Container can end up on any node, and their placement is usually effectively random; in particular, the ApplicationMaster may run on the same node as the tasks it manages.
5. Resource management scheme of YARN platform
In YARN, users are organized into queues: each user can belong to one or more queues, and can submit applications only to those queues. Each queue is allotted a certain proportion of the cluster's resources.
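Dividing cluster resources by per-queue percentages can be sketched as follows (queue names and percentages are invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: compute each queue's share of cluster memory from its
// configured percentage of the total.
public class QueueShareSketch {
    static Map<String, Integer> shares(int clusterMemMb, Map<String, Double> queuePercent) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : queuePercent.entrySet()) {
            out.put(e.getKey(), (int) Math.round(clusterMemMb * e.getValue() / 100.0));
        }
        return out;
    }
}
```

With a 10000 MB cluster and queues "prod" at 70% and "dev" at 30%, the shares come out to 7000 MB and 3000 MB respectively.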
YARN's resource allocation is asynchronous: after the resource scheduler allocates resources to an application, it does not immediately push them to the corresponding ApplicationMaster. Instead it parks them in a buffer and waits for the ApplicationMaster to fetch them through its periodic heartbeat RPC. In other words, YARN uses a pull-based model rather than a push-based one, which is consistent with MRv1.
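The pull-based exchange can be sketched as a toy buffer (illustrative names only, not the Hadoop implementation):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch: the scheduler parks allocated containers in a per-application
// buffer; the AM drains that buffer on its next periodic heartbeat
// instead of being pushed to.
public class PullModelSketch {
    // RM-side buffer of containers allocated but not yet pulled by the AM.
    static final Queue<String> buffer = new ArrayDeque<>();

    // Scheduler side: allocation completes asynchronously.
    static void schedulerAllocates(String containerId) {
        buffer.add(containerId);
    }

    // AM heartbeat (periodic RPC): pull whatever is waiting.
    static List<String> amHeartbeat() {
        List<String> granted = new ArrayList<>();
        String c;
        while ((c = buffer.poll()) != null) granted.add(c);
        return granted;
    }
}
```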
Like the resource scheduler in MRv1, YARN's scheduler is pluggable, but because YARN uses an event-driven model, the scheduler is more complex to write and considerably harder to implement than in MRv1.
Like MRv1, YARN ships with three commonly used schedulers: FIFO, the Capacity Scheduler, and the Fair Scheduler. The first is the default and is a batch scheduler; the latter two are multi-tenant schedulers that organize resources as a tree of queues, which is better suited to enterprise scenarios. Note that the algorithms used by these three schedulers are exactly the same as in MRv1; they were simply reimplemented against the resource scheduler interface in YARN, so they are not repeated here.
That concludes "What is the YARN Architecture in Hadoop?" Thank you for reading, and I hope the content was helpful.