2025-04-04 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/03 Report--
Foreword:
MapReduce is a programming model for data processing. It is simple yet powerful, and it is designed for parallel processing of big data.
MapReduce processing is divided into two phases: map and reduce. The input and output of each phase are key-value pairs, and you choose the key and value types yourself. In the map phase, the input is split into pieces that are processed in parallel; the intermediate results are then passed to the reduce phase, where the reduce function produces the final summary.
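To make the two phases concrete, here is a minimal in-memory sketch of a word count in Python. It is not Hadoop code: the names map_fn, reduce_fn and run_job are made up for illustration, and the "shuffle" is just a dictionary that groups values by key.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: sum all counts collected for one word."""
    return word, sum(counts)

def run_job(lines):
    """Run map, group by key (a stand-in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    # Map phase: every input record is handled independently of the others,
    # which is what allows a real cluster to process splits in parallel.
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            grouped[key].append(value)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(k, v) for k, v in sorted(grouped.items()))

result = run_job(["big data big model", "data processing"])
print(result)
```

Here the input key is the line offset and the value is the line text, while the output key is a word and the value is its count. This shows how the key and value types can differ from phase to phase.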
Since Hadoop 2.0, MapReduce can be understood as a jar package, or simply a program, that runs on YARN. YARN has two kinds of processes: ResourceManager and NodeManager. ResourceManager contains two modules: the Applications Manager and the Scheduler. NodeManager hosts containers, each of which bundles CPU and memory. A container runs one encapsulated task, either a MapTask (mapping task) or a ReduceTask (reduction task).
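The idea of a container as a slice of CPU and memory on a NodeManager can be sketched with a toy model. This is only an illustration under simple assumptions: the class names mirror the YARN terms, but the fields, the try_launch method and the first-fit schedule function are hypothetical and are not YARN's API or its actual scheduling policy.

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    node: str        # name of the NodeManager hosting this container
    vcores: int      # CPU share reserved for the task
    memory_mb: int   # memory reserved for the task
    task: str        # "MapTask" or "ReduceTask"

@dataclass
class NodeManager:
    name: str
    vcores: int      # free CPU on this node
    memory_mb: int   # free memory on this node
    containers: list = field(default_factory=list)

    def try_launch(self, task, vcores, memory_mb):
        """Reserve resources and return a Container, or None if they do not fit."""
        if vcores > self.vcores or memory_mb > self.memory_mb:
            return None
        self.vcores -= vcores
        self.memory_mb -= memory_mb
        container = Container(self.name, vcores, memory_mb, task)
        self.containers.append(container)
        return container

def schedule(nodes, task, vcores, memory_mb):
    """Hypothetical first-fit scheduler: use the first node with enough room."""
    for node in nodes:
        container = node.try_launch(task, vcores, memory_mb)
        if container is not None:
            return container
    return None

nodes = [NodeManager("nm1", 2, 2048), NodeManager("nm2", 4, 4096)]
first = schedule(nodes, "MapTask", 2, 1024)
second = schedule(nodes, "ReduceTask", 1, 1024)
too_big = schedule(nodes, "MapTask", 8, 1024)
```

In this run the first request fits on nm1 and uses up its CPU, so the second request lands on nm2, and the oversized request is rejected.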
The following briefly describes the MapReduce 2 architecture, that is, the workflow of submitting a MapReduce job to YARN:
1: The user submits an application to YARN, including the ApplicationMaster program, the command to start ApplicationMaster, the user program, and so on.
2: ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster in this Container.
3: ApplicationMaster first registers with ResourceManager, so that the user can view the application's running status directly through ResourceManager. It then requests resources for each task and monitors the tasks until the run ends, that is, it repeats steps 4 to 7.
4: ApplicationMaster requests and receives resources from ResourceManager by polling over an RPC protocol.
5: Once ApplicationMaster has obtained a resource, it communicates with the corresponding NodeManager and asks it to start the task.
6: After setting up the running environment for the task (including environment variables, JAR packages, binary programs, etc.), NodeManager writes the task startup command into a script and starts the task by running that script.
7: Each task reports its status and progress to ApplicationMaster through an RPC protocol, so that ApplicationMaster always knows the running state of each task and can restart a task if it fails. While the application is running, the user can query ApplicationMaster over RPC at any time for the application's current running state.
8: When the application finishes, ApplicationMaster unregisters from ResourceManager and shuts itself down.
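The eight steps can be replayed as a small simulation that only records which step happens in which order. This is a sketch, not YARN code: the run_application function, the task names and the fail_once parameter are all hypothetical, and a failed task is simply put back in the queue to show the retry in step 7.

```python
def run_application(tasks, fail_once=()):
    """Return a log of the steps for running `tasks`; tasks in `fail_once` fail one time."""
    log = [
        "1 client submits application to YARN",
        "2 ResourceManager allocates first container; NodeManager starts ApplicationMaster",
        "3 ApplicationMaster registers with ResourceManager",
    ]
    pending = list(tasks)
    already_failed = set()
    # Steps 4 to 7 repeat until no task is left to run.
    while pending:
        task = pending.pop(0)
        log.append(f"4 ApplicationMaster requests container for {task}")
        log.append(f"5 ApplicationMaster asks NodeManager to start {task}")
        log.append(f"6 NodeManager runs launch script for {task}")
        if task in fail_once and task not in already_failed:
            already_failed.add(task)
            log.append(f"7 {task} reports FAILED; ApplicationMaster retries it")
            pending.append(task)  # restart the failed task later
        else:
            log.append(f"7 {task} reports SUCCEEDED")
    log.append("8 ApplicationMaster unregisters from ResourceManager and exits")
    return log

log = run_application(["map-0", "reduce-0"], fail_once={"map-0"})
for entry in log:
    print(entry)
```

Because map-0 fails once in this run, steps 4 to 7 appear three times in the log: once for the failed attempt, once for reduce-0, and once for the retried map-0.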