Introduction to MapReduce Architecture

2025-04-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Foreword:

MapReduce is a programming model for data processing. It is simple, yet powerful enough for the parallel processing of big data.

MapReduce processing is divided into two phases: map and reduce. The input and output of each phase are key-value pairs, and you choose the key and value types yourself. In the map phase, the input is split and the splits are processed in parallel. The intermediate results are then passed to the reduce phase, where the reduce function produces the final summary.
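To make the two phases concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's API: the names `map_fn`, `reduce_fn` and `run_job` are made up for this illustration, and the "shuffle" step that groups values by key is simulated with a dictionary.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def run_job(splits):
    # Map phase: each split is independent, so a real cluster
    # could process them in parallel on different nodes.
    intermediate = []
    for offset, line in enumerate(splits):
        intermediate.extend(map_fn(offset, line))

    # Shuffle: group the intermediate values by key before reduce sees them.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce phase: one call per distinct key produces the final summary.
    return dict(reduce_fn(k, v) for k, v in sorted(grouped.items()))


result = run_job(["big data is big", "data is everywhere"])
print(result)  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Both the input to `map_fn` (line offset, line text) and its output (word, count) are key-value pairs, which is the shape described above.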

Since Hadoop 2.0, a MapReduce job can be understood as a JAR package, or simply a program, that runs on YARN. YARN has two kinds of processes: ResourceManager and NodeManager. ResourceManager contains two modules: the Application Manager and the Scheduler. NodeManager is responsible for running containers. Each container bundles CPU and memory and runs one encapsulated task, either a MapTask (map task) or a ReduceTask (reduce task).
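The relationship between the scheduler, the nodes and their containers can be sketched as a toy model. Everything below is an assumption made for illustration: the class and field names are invented, and the first-fit placement rule is a stand-in, not the policy of any real YARN scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    vcores: int
    memory_mb: int
    task: str  # e.g. "MapTask" or "ReduceTask"


@dataclass
class NodeManager:
    name: str
    free_vcores: int
    free_memory_mb: int
    containers: list = field(default_factory=list)

    def launch(self, vcores, memory_mb, task):
        """Carve a container out of this node's free resources, or return None."""
        if vcores > self.free_vcores or memory_mb > self.free_memory_mb:
            return None
        self.free_vcores -= vcores
        self.free_memory_mb -= memory_mb
        container = Container(vcores, memory_mb, task)
        self.containers.append(container)
        return container


@dataclass
class ResourceManager:
    nodes: list

    def schedule(self, vcores, memory_mb, task):
        """Scheduler role (toy first-fit): place the task on the first node with room."""
        for node in self.nodes:
            container = node.launch(vcores, memory_mb, task)
            if container is not None:
                return node.name, container
        return None  # no node currently has enough free CPU and memory


rm = ResourceManager([NodeManager("nm1", 2, 2048), NodeManager("nm2", 4, 4096)])
placements = [
    rm.schedule(1, 1024, "MapTask"),
    rm.schedule(2, 2048, "MapTask"),
    rm.schedule(1, 1024, "ReduceTask"),
]
print([name for name, _ in placements])  # ['nm1', 'nm2', 'nm1']
```

The second request does not fit on `nm1` once one core is taken, so it lands on `nm2`. The third fits in what is left on `nm1`.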

The following briefly describes the MapReduce 2 architecture, that is, the workflow of submitting a MapReduce application to YARN:

1: The user submits an application to YARN, including the ApplicationMaster program, the command to start ApplicationMaster, the user program, and so on.

2: ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster in this Container.

3: ApplicationMaster first registers with ResourceManager, so that users can view the running status of the application directly through ResourceManager. It then requests resources for each task and monitors their running status until the run ends, that is, it repeats steps 4 to 7.

4: ApplicationMaster requests and receives resources from ResourceManager by polling over an RPC protocol.

5: Once ApplicationMaster has obtained a resource, it communicates with the corresponding NodeManager, asking it to start the task.

6: After setting up the running environment for the task (including environment variables, JAR packages, binary programs, etc.), NodeManager writes the task startup command into a script and starts the task by running the script.

7: Each task reports its status and progress to ApplicationMaster over an RPC protocol, so that ApplicationMaster always knows the running state of each task and can restart a task if it fails. While the application is running, users can query ApplicationMaster over RPC at any time to get the application's current running state.

8: When the application has finished, ApplicationMaster unregisters from ResourceManager and shuts itself down.
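The eight steps can be traced with a small simulation. This is only a sketch of the control flow, not YARN code: `run_application` and its `fail_once` parameter are hypothetical, no RPC takes place, and tasks run one at a time, whereas a real ApplicationMaster tracks many containers concurrently.

```python
def run_application(tasks, fail_once=()):
    """Toy trace of the submission workflow; returns the ordered event log."""
    log = [
        "1 client submits application to YARN",
        "2 RM allocates first container; NM starts ApplicationMaster",
        "3 AM registers with RM",
    ]
    pending = list(tasks)
    already_failed = set()

    # Steps 4-7 repeat until every task has succeeded.
    while pending:
        task = pending.pop(0)
        log.append(f"4 AM polls RM for a container for {task}")
        log.append(f"5 AM asks NM to start {task}")
        log.append(f"6 NM sets up environment and runs launch script for {task}")
        if task in fail_once and task not in already_failed:
            # The status report lets the AM notice the failure and retry the task.
            already_failed.add(task)
            log.append(f"7 {task} reports FAILED; AM reschedules it")
            pending.append(task)
        else:
            log.append(f"7 {task} reports SUCCEEDED")

    log.append("8 AM unregisters from RM and exits")
    return log


log = run_application(["map-0", "map-1", "reduce-0"], fail_once={"map-1"})
for event in log:
    print(event)
```

In this run `map-1` fails on its first attempt, so the loop over steps 4 to 7 executes four times for three tasks before step 8 closes the application.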
