What are the advantages of Apache Hadoop's MapReduce?

2025-01-23 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the advantages of Apache Hadoop's MapReduce by walking through the redesigned architecture and the role of each of its components.

MapReduce underwent a thorough overhaul in hadoop-0.23, and the result is a new framework called MapReduce 2.0 (MRv2), also known as YARN.

The basic idea of MRv2 is to split the two main functions of the JobTracker (resource management and job scheduling/monitoring) into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single Map-Reduce job in the traditional sense or a DAG of jobs.

The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework. The RM is the ultimate authority that arbitrates system resources among all applications.

The per-application ApplicationMaster is, in effect, a framework-specific library. It negotiates resources from the RM and works with the NodeManager(s) to execute and monitor tasks.

ResourceManager has two important components: Scheduler and ApplicationsManager.

The Scheduler is responsible for allocating limited resources to the various running applications, subject to familiar constraints such as capacities and queues. It is a pure scheduler: it does not monitor or track the status of applications, and it offers no guarantee of restarting tasks that fail because of application or hardware failure. The Scheduler makes its decisions based on the resource requirements of the applications, expressed through the abstract notion of a resource Container, which combines elements such as memory, CPU, disk, and network. In the first version, only memory is supported.
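The allocation behavior described above can be sketched as a toy model. This is not the YARN API: `Node`, `Container`, `ToyScheduler`, and the first-fit strategy are all assumptions made for illustration. The sketch only shows a scheduler that grants memory-sized containers and keeps no record of what the application does afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this node


@dataclass
class Container:
    app_id: str
    node: str
    memory_mb: int


@dataclass
class ToyScheduler:
    """Hypothetical memory-only scheduler; it never monitors applications."""
    nodes: list
    granted: list = field(default_factory=list)

    def allocate(self, app_id, memory_mb):
        # First-fit: grant a container on the first node with enough memory.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container = Container(app_id, node.name, memory_mb)
                self.granted.append(container)
                return container
        # No node can satisfy the request; the caller must ask again later.
        return None


scheduler = ToyScheduler([Node("node1", 2048), Node("node2", 4096)])
first = scheduler.allocate("app_1", 1536)   # fits on node1
second = scheduler.allocate("app_2", 1024)  # node1 has 512 MB left, so node2
third = scheduler.allocate("app_3", 8192)   # larger than any single node
```

Note that the sketch has no code path for failed tasks: in this design, tracking and restarting them is left to other components.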

The Scheduler has a pluggable policy plug-in, which is responsible for partitioning cluster resources among the various queues and applications. The current Map-Reduce schedulers, such as the CapacityScheduler and the FairScheduler, are examples of this plug-in.
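One way to picture a pluggable policy is a scheduling loop that takes the policy as a parameter. The sketch below is a simplified illustration, not how the CapacityScheduler or FairScheduler are implemented; `fifo_policy`, `fair_policy`, and `schedule` are hypothetical names.

```python
def fifo_policy(pending, running_count):
    """Pick the oldest pending request."""
    return pending[0]


def fair_policy(pending, running_count):
    """Pick the request whose application holds the fewest containers."""
    return min(pending, key=lambda app: running_count.get(app, 0))


def schedule(requests, slots, policy):
    """Hand out `slots` containers, letting `policy` choose each winner."""
    pending = list(requests)
    running_count = {}
    order = []
    for _ in range(slots):
        if not pending:
            break
        app = policy(pending, running_count)
        pending.remove(app)  # removes the first matching request
        running_count[app] = running_count.get(app, 0) + 1
        order.append(app)
    return order


# Application A asked three times before B and C asked once each.
requests = ["A", "A", "A", "B", "C"]
fifo_order = schedule(requests, 3, fifo_policy)
fair_order = schedule(requests, 3, fair_policy)
```

With three slots, the FIFO policy gives all of them to A, while the fair policy spreads them across A, B, and C. The loop itself does not change, only the plug-in.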

The CapacityScheduler supports hierarchical queues, which make the sharing of cluster resources more predictable.
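A small sketch may help show why hierarchical queues make sharing predictable. It assumes each queue is given a percentage of its parent's capacity, so a leaf queue's share of the whole cluster is the product of the percentages along its path. The queue names and the `absolute_capacities` helper are hypothetical.

```python
def absolute_capacities(tree, parent_share=1.0, prefix="root"):
    """Flatten a queue tree into {queue_path: fraction_of_cluster}.

    `tree` maps a queue name to (percent_of_parent, children), where
    `children` is another tree of the same shape.
    """
    result = {}
    for name, (percent, children) in tree.items():
        path = f"{prefix}.{name}"
        share = parent_share * percent / 100.0
        result[path] = share
        # Child queues divide up their parent's share, not the whole cluster.
        result.update(absolute_capacities(children, share, path))
    return result


# Hypothetical layout: prod gets 60% of the cluster, split evenly in two.
queues = {
    "prod": (60, {"etl": (50, {}), "reports": (50, {})}),
    "dev": (40, {}),
}
shares = absolute_capacities(queues)
```

In this layout `root.prod.etl` ends up with roughly 30% of the cluster regardless of what happens inside `root.dev`.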

The ApplicationsManager is responsible for accepting job submissions, negotiating the first container in which to run the application-specific ApplicationMaster, and providing the service that restarts the ApplicationMaster container on failure.
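The restart-on-failure responsibility can be sketched as a bounded retry loop. This is a toy model rather than YARN code: `ToyApplicationsManager`, the `max_attempts` limit, and the use of `RuntimeError` to signal a failed ApplicationMaster are assumptions for illustration.

```python
class ToyApplicationsManager:
    """Hypothetical model: accept a submission and relaunch a failed master."""

    def __init__(self, max_attempts=2):
        self.max_attempts = max_attempts
        self.attempts = {}  # app_id -> number of launch attempts so far

    def submit(self, app_id, launch_master):
        """Run `launch_master(attempt)`, retrying when it raises RuntimeError."""
        for attempt in range(1, self.max_attempts + 1):
            self.attempts[app_id] = attempt
            try:
                return launch_master(attempt)
            except RuntimeError:
                continue  # the master's container failed; launch it again
        return "FAILED"


def flaky_master(attempt):
    # Stand-in ApplicationMaster that dies on its first launch only.
    if attempt == 1:
        raise RuntimeError("container lost")
    return "SUCCEEDED"


def broken_master(attempt):
    # Stand-in ApplicationMaster that never starts successfully.
    raise RuntimeError("always fails")


manager = ToyApplicationsManager(max_attempts=2)
status = manager.submit("app_1", flaky_master)
failed_status = manager.submit("app_2", broken_master)
```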

The NodeManager is the per-machine framework agent. It is responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting it to the ResourceManager/Scheduler.
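The monitor-and-report role can be illustrated with a toy agent that records per-container usage and summarizes it for the scheduler. The class, its fields, and the shape of the report are assumptions for this sketch and do not reflect the real NodeManager protocol. Only memory and CPU are modeled, to keep it short.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    container_id: str
    memory_mb: int
    cpu_percent: float


class ToyNodeManager:
    """Hypothetical per-node agent that tracks container resource usage."""

    def __init__(self, node, total_memory_mb):
        self.node = node
        self.total_memory_mb = total_memory_mb
        self.containers = {}  # container_id -> latest ContainerUsage

    def update(self, usage):
        # Record the most recent measurement for one container.
        self.containers[usage.container_id] = usage

    def heartbeat(self):
        # Summarize the node's state in the form a scheduler would consume.
        used = sum(c.memory_mb for c in self.containers.values())
        return {
            "node": self.node,
            "used_mb": used,
            "free_mb": self.total_memory_mb - used,
            "containers": sorted(self.containers),
        }


nm = ToyNodeManager("node1", total_memory_mb=4096)
nm.update(ContainerUsage("c1", memory_mb=1024, cpu_percent=35.0))
nm.update(ContainerUsage("c2", memory_mb=512, cpu_percent=10.0))
report = nm.heartbeat()
```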

The ApplicationMaster of each application is responsible for negotiating appropriate resource containers with Scheduler, tracking their status and monitoring progress.

MRv2 maintains compatibility with the previous stable release (hadoop-1.x), which means that existing Map-Reduce jobs only need to be recompiled to run on MRv2.

Understanding: the YARN framework builds on the earlier Map-Reduce design. The two main functions of the old JobTracker are split apart, with one side handling resources (the RM, the boss) and the other handling monitoring (the NM and the ApplicationMaster), so the division of labor is clear.

The RM delegates its work to two team leads, the Scheduler and the ApplicationsManager. Accepting jobs is handed to the ApplicationsManager and scheduling jobs is handed to the Scheduler. The ApplicationsManager is also responsible for restarting the ApplicationMaster if it fails.

The monitoring work is subdivided as well. The NM monitors the state of its node (memory, CPU, disk, network, and so on) and reports it to the boss (the RM). More precisely, it reports to the Scheduler, so that the Scheduler can take each node's state into account when assigning work. The status and progress of each application are monitored by its own ApplicationMaster. If the ApplicationMaster itself fails, that is fine: the ApplicationsManager will restart it.
