2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how MapReduce executes a job. The process is easy to lose track of in real cases, so the sections below follow one concrete example from input to output.
MapReduce execution flow chart
Overview
MapReduce is a distributed computing model proposed by Google. It is used mainly in the search field to solve computing problems over massive data.
MapReduce is distributed and consists of two phases, Map and Reduce. The Map phase is an independent program that runs on many nodes at the same time, and each node processes part of the data.
The Reduce phase is an independent program with many nodes running at the same time, and each node processes part of the data.
Use
The MapReduce framework provides a default implementation. Users only need to override two functions, map() and reduce(), to get a distributed computation.
The formal parameters and return values of both functions are <key, value> pairs, so pay attention to how these pairs are constructed when using them.
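The key/value shape of the two functions can be sketched in plain Java. This is only an illustration under stated assumptions: the interfaces `MapFunction` and `ReduceFunction` and the class `KeyValueShapes` are hypothetical names made up here, not the Hadoop API, and a `BiConsumer` stands in for whatever the framework uses to collect output pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class KeyValueShapes {
    // Hypothetical stand-in for map(): one <k1, v1> in, zero or more <k2, v2> out.
    interface MapFunction<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> output);
    }

    // Hypothetical stand-in for reduce(): one key with all of its values in, <k3, v3> out.
    interface ReduceFunction<K2, V2, K3, V3> {
        void reduce(K2 key, List<V2> values, BiConsumer<K3, V3> output);
    }

    // Runs one map call and one reduce call and returns the emitted pairs as text.
    static List<String> demo() {
        List<String> emitted = new ArrayList<>();

        // Toy mapper: treats the whole line as a single word and emits <line, 1>.
        MapFunction<Long, String, String, Integer> mapper =
                (offset, line, out) -> out.accept(line, 1);
        // Toy reducer: emits the key together with the number of values it received.
        ReduceFunction<String, Integer, String, Integer> reducer =
                (word, counts, out) -> out.accept(word, counts.size());

        mapper.map(0L, "hello", (k, v) -> emitted.add("<" + k + ", " + v + ">"));
        reducer.reduce("hello", Arrays.asList(1, 1, 1),
                (k, v) -> emitted.add("<" + k + ", " + v + ">"));
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [<hello, 1>, <hello, 3>]
    }
}
```

The point of the sketch is only that both functions consume and produce pairs, so the key and value types of the map output have to match the key and value types of the reduce input.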
Execution process
The example used here counts the number of times each word appears in a text saved on HDFS in two blocks:
hello you hello marry hello me really -> block-1
hello kate ready xiao wang hello tomcat -> block-2
1. Read the text in each block and iterate over it line by line. Each iteration returns one line as a string, str.
To count how many times each word appears, you first need to know which words the line contains. Since the words are separated by spaces, split() can cut the string into words.
String[] words = str.split(" ");
According to the requirement, each word then needs to be converted into the form <k, v>, where k is the word itself and v is the number of times the word appears.
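A minimal sketch of this step, assuming one line of text is already available as a Java string. The class name `LineToPairs` and the method `toPairs` are hypothetical, and the sketch splits on runs of whitespace (`"\\s+"`) rather than a single space so that repeated spaces do not produce empty words.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LineToPairs {
    // Splits one line into words and emits a <word, 1> pair for every occurrence.
    static List<Map.Entry<String, Integer>> toPairs(String str) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : str.trim().split("\\s+")) {
            if (!word.isEmpty()) {               // an empty line yields one empty token
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [hello=1, you=1, hello=1, marry=1]
        System.out.println(toPairs("hello you hello marry"));
    }
}
```

Note that the same word is emitted once per occurrence at this point; adding the counts together is left to the later stages.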
2. Because MapReduce (MR) computing is distributed, each map task (called a mapper task) processes one block of the data.
Map stage: the input is <k1, v1>, where k1 is the line offset and v1 is the text content of the current line. The map() function processes it and outputs <k2, v2>, where k2 is a specific word and v2 is the statistic for that word, such as its count.
Shuffle stage: if the map output were sent to reduce as it is, there would be a lot of redundant data. For example, if hello appears five times in the map phase, the pair for hello is output five times, which puts unnecessary pressure on the network. Can these five pairs be combined locally before they enter reduce, for example into a single pair for hello that carries either the total or the list of individual counts? This local combining is called reduction, and it takes place in the shuffle stage. More generally, the shuffle stage reorganizes the output of map by partitioning, grouping, and sorting it.
Reduce stage: the output of map is collected and aggregated. For word counting, the values are simply accumulated to give the number of occurrences. The reduce() function is called once for each key. The input of the reduce phase is <k2, v2s>, where k2 is the k2 output by map and v2s is the set of map results for that key after the shuffle. The reduce() function converts this input into <k3, v3>.
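The three stages can be simulated in a single process to make the data flow concrete. This is a sketch under assumptions, not how Hadoop implements them: the class `WordCountFlow` and its methods are hypothetical, the two blocks are passed in as plain strings, there is only one reducer (so partitioning is skipped), and no local combining is done before the shuffle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlow {
    // Map stage for one block: emit <word, 1> for every word in the block.
    static List<Map.Entry<String, Integer>> map(String blockText) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : blockText.trim().split("\\s+")) {
            out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Shuffle stage: group the values of all mappers by key; TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce stage: called once per key, accumulates the values for that key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map on every block, then shuffle, then reduce on every grouped key.
    static Map<String, Integer> run(String... blocks) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String block : blocks) {
            mapOutputs.add(map(block));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapOutputs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hello=5, kate=1, marry=1, me=1, ready=1, really=1, tomcat=1, wang=1, xiao=1, you=1}
        System.out.println(run(
                "hello you hello marry hello me really",
                "hello kate ready xiao wang hello tomcat"));
    }
}
```

With the two example blocks, the shuffle hands the reducer the key hello together with five values of 1, which is exactly the redundancy that local combining on the map side would reduce.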
After the steps above, the system outputs the calculation results. Generally they are first stored (persisted) to HDFS and then returned to the user.
This concludes the walkthrough of the MapReduce execution process. Thank you for reading.