Big data IMF-L38-MapReduce Insider declassified lecture notes and summary

2025-01-29 Update From: SLTechnology News&Howtos

Contents of this lecture:

1 The MapReduce architecture explained

2 A look at MapReduce running on a cluster

3 Hands-on MapReduce through Java programming

From Hadoop 2.0 onward, MapReduce has had to run on Yarn from the start; in Hadoop 1.0, Yarn was not involved at all.

We are now covering MR, which also involves Yarn, so this is already the basic entry stage; starting from zero is a thing of the past.

Starting tomorrow: a collection of about 20 MapReduce code examples.

One: MapReduce architecture based on Yarn

1. An MR program is built on implementations of Mapper and Reducer. The Mapper divides a computing job into many small tasks that run in parallel, and the Reducer does the final aggregation.
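The division of labour between Mapper and Reducer can be sketched in plain Java. This is only an illustration of the idea, not the Hadoop API: the class WordCountSketch and its map, reduce and run methods are hypothetical names, and a parallel stream stands in for the map tasks that a real cluster would spread across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java imitation of the Mapper/Reducer split; this is NOT the Hadoop API.
class WordCountSketch {

    // "Mapper": turns one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // "Reducer": sums all the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Maps each line in parallel, groups the pairs by key (the shuffle), then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = lines.parallelStream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello yarn", "hello mapreduce")));
    }
}
```

Each input line plays the role of one small task, and the grouping step in run corresponds to the shuffle that sits between the map and reduce phases.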

2. Starting with Hadoop 2.x, MapReduce runs on Yarn.

Yarn manages all the resources of the cluster (such as memory and CPU) through the ResourceManager (RM). Each node runs a NodeManager, a JVM process that receives requests and wraps the node's resources in Containers.

3. When the ResourceManager receives a request submitted by the Client, it looks at the state of the cluster's resources and orders a NodeManager to start the program's first Container on the node where that NodeManager runs. This Container is the program's ApplicationMaster, which is responsible for scheduling the program's tasks during execution. The ApplicationMaster registers itself with the ResourceManager and, once registered, applies to the ResourceManager for the specific Container computing resources it needs.
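The sequence in step 3 can be sketched as a toy model. Everything here is an assumption made for illustration: the classes YarnSubmissionSketch, NodeManager and ResourceManager and the methods launch, pickNode and submit are hypothetical and do not match the real Yarn API, and "pick the node with the most free Containers" is a stand-in for whatever the actual scheduler does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the submission flow; every class and method here is hypothetical, not the Yarn API.
class YarnSubmissionSketch {

    static class NodeManager {
        final String host;
        int freeContainers;

        NodeManager(String host, int freeContainers) {
            this.host = host;
            this.freeContainers = freeContainers;
        }

        // Wraps some of this node's resources in a Container and starts the given role in it.
        String launch(String role) {
            freeContainers--;
            return role + "@" + host;
        }
    }

    static class ResourceManager {
        final List<NodeManager> nodes;
        final List<String> log = new ArrayList<>();

        ResourceManager(List<NodeManager> nodes) {
            this.nodes = nodes;
        }

        // Chooses a node from the current cluster state: here, the one with the most free Containers.
        NodeManager pickNode() {
            NodeManager best = null;
            for (NodeManager nm : nodes) {
                if (nm.freeContainers > 0 && (best == null || nm.freeContainers > best.freeContainers)) {
                    best = nm;
                }
            }
            if (best == null) {
                throw new IllegalStateException("no free resources in the cluster");
            }
            return best;
        }

        List<String> submit(String app, int tasksNeeded) {
            // The client request makes the RM start the first Container: the ApplicationMaster.
            log.add("RM received " + app);
            log.add("started " + pickNode().launch("ApplicationMaster"));
            // The ApplicationMaster registers itself with the RM.
            log.add("ApplicationMaster registered");
            // After registering, it applies to the RM for the Containers that run the tasks.
            for (int i = 0; i < tasksNeeded; i++) {
                log.add("started " + pickNode().launch("task-" + i));
            }
            return log;
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager(
                List.of(new NodeManager("node1", 2), new NodeManager("node2", 3)));
        rm.submit("wordcount", 3).forEach(System.out::println);
    }
}
```

The log makes the ordering visible: the ApplicationMaster's Container always comes first, and task Containers are requested only after registration.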

4. How many Containers does a program's ApplicationMaster need?

When the Application starts, it runs the program's main method, which contains the data input and the related configuration. From these, the number of Containers needed can be worked out.

(A Container is a unit of computing resources. The cluster parses the computing job requested by the client, and the result of that parsing includes the Container resources the job requires.)

The Application has to run the main method to learn how many splits the data to be analyzed has. Each split corresponds to a Container, and on top of that some extra resources are allocated for other work such as Shuffle.
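As a rough worked example of that arithmetic, the sketch below counts splits by ceiling division and adds Containers for the reduce side and for the ApplicationMaster itself. The class ContainerEstimateSketch, its method names, and the figures (a 1000 MB input, 128 MB splits, 2 reducers) are assumptions for illustration. Hadoop's own split calculation has more rules than this, so treat it as an estimate rather than the framework's logic.

```java
// Back-of-the-envelope Container estimate; not how Hadoop or Yarn computes it internally.
class ContainerEstimateSketch {

    // Number of input splits: the ceiling of fileSize / splitSize.
    static long countSplits(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    // One Container per split (map task), plus the reducers, plus one for the ApplicationMaster.
    static long estimateContainers(long fileSizeBytes, long splitSizeBytes, int reducers) {
        return countSplits(fileSizeBytes, splitSizeBytes) + reducers + 1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed figures: a 1000 MB input, 128 MB splits, 2 reducers.
        System.out.println("splits: " + countSplits(1000 * mb, 128 * mb));
        System.out.println("containers: " + estimateContainers(1000 * mb, 128 * mb, 2));
    }
}
```

With these assumed figures, 1000 MB at 128 MB per split gives 8 splits, so the estimate is 8 + 2 + 1 = 11 Containers over the life of the job.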

5. Summary of MapReduce running on Yarn

Master-slave structure

Master node, only one: ResourceManager

Control node, one per Job: MRAppMaster

Slave nodes, many: YarnChild

ResourceManager is responsible for:

Receiving computing tasks submitted by clients

Assigning each Job to an MRAppMaster for execution

Monitoring the execution of the MRAppMaster

MRAppMaster is responsible for:

Scheduling the tasks that make up a Job

Assigning the Job's tasks to YarnChild for execution

Monitoring the execution of YarnChild

YarnChild is responsible for:

Perform computing tasks assigned by MRAppMaster
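The middle layer of this hierarchy, where MRAppMaster hands out tasks and watches them finish, can be sketched with a thread pool. The names AppMasterSketch and runJob are hypothetical, worker threads stand in for YarnChild processes, and unlike a real job the sketch does not make reduce tasks wait for the map tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of MRAppMaster handing tasks to YarnChild workers; names are hypothetical, not Hadoop classes.
class AppMasterSketch {

    // Assigns each task to a worker thread (standing in for a YarnChild) and monitors completion.
    static List<String> runJob(List<String> tasks, int workers) {
        ExecutorService yarnChildren = Executors.newFixedThreadPool(workers);
        try {
            // Assignment: every task is handed to the pool of workers.
            List<Future<String>> running = new ArrayList<>();
            for (String task : tasks) {
                running.add(yarnChildren.submit(() -> task + ": SUCCEEDED"));
            }
            // Monitoring: wait for each task in submission order and record its status.
            List<String> report = new ArrayList<>();
            for (Future<String> status : running) {
                try {
                    report.add(status.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while monitoring tasks", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a task failed", e);
                }
            }
            return report;
        } finally {
            yarnChildren.shutdown();
        }
    }

    public static void main(String[] args) {
        runJob(List.of("map-0", "map-1", "reduce-0"), 2).forEach(System.out::println);
    }
}
```

The ResourceManager sits one level above this: it only tracks the MRAppMaster, while per-task bookkeeping stays inside the job.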

In a production environment, the RM should be set up for HA (high availability).

6. MRAppMaster in Hadoop MapReduce is equivalent to the Driver in Spark, and YarnChild in Hadoop MapReduce is equivalent to CoarseGrainedExecutorBackend in Spark.

(Compared with Spark, Hadoop MapReduce consumes a lot of resources.)
