2025-02-25 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article walks through the steps of MapReduce programming.
Hadoop has three core modules: HDFS for distributed storage, MapReduce for distributed computing, and YARN for resource scheduling.
In previous lessons we covered how Hadoop stores data (HDFS). Starting today, the next few lessons cover the MapReduce distributed computing framework, which is both hard to understand and very important. In most cases we use tools such as Hive and Spark to handle business logic instead of writing MapReduce programs directly, but these tools are still built on the ideas of MapReduce. A good grasp of MapReduce programming now will therefore pay off in later study.
1. Definition of MapReduce
MapReduce is a programming framework for writing distributed computing programs, and it is the core framework for developing data analysis applications on Hadoop.
The core job of MapReduce is to combine the business logic code written by the user with its own built-in default components into a complete distributed program, which then runs in parallel on a Hadoop cluster.
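To make that division of labor concrete, here is a minimal local simulation in plain Python. It is not Hadoop's API: run_job, word_map and sum_reduce are hypothetical names chosen for illustration. The user writes only the two business-logic functions; run_job plays the framework's part by calling them and grouping the intermediate pairs by key in between.

```python
from collections import defaultdict

def run_job(records, map_fn, reduce_fn):
    """Toy 'framework': call the user's map_fn on every input record,
    group the intermediate pairs by key, then call reduce_fn once per key."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):  # keys reach reduce in sorted order
        results.extend(reduce_fn(out_key, groups[out_key]))
    return results

# Business logic the user writes (word count).
def word_map(offset, line):
    for word in line.split():
        yield word, 1

def sum_reduce(word, counts):
    yield word, sum(counts)

records = [(0, "hello hadoop"), (13, "hello mapreduce")]
print(run_job(records, word_map, sum_reduce))
# → [('hadoop', 1), ('hello', 2), ('mapreduce', 1)]
```

On a real cluster the framework also splits the input, runs many map and reduce tasks on different machines, and handles failures; this sketch keeps only the calling pattern.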
2. The core idea of MapReduce
The idea behind MapReduce shows up everywhere in daily life, and most people have met it in some form. The core of MapReduce is "divide and conquer", which suits large-scale data processing.
Map is responsible for "dividing": it splits a complex task into several simple tasks that are processed in parallel. (The precondition for splitting is that these small tasks can be computed in parallel and depend little on each other.)
Reduce is responsible for "combining": it produces a global summary of the results of the map phase.
Together, these two stages embody the idea of MapReduce.
A more concrete example helps explain MapReduce:
Example 1: we need to count all the books in a library. Student A counts bookshelf 1 and student B counts bookshelf 2. This is "Map", and the more students there are, the faster the counting goes.
Then the students' counts are added together. This is "Reduce".
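The library example maps directly onto code. In this sketch (plain Python; the shelf contents and function names are made up), each worker thread plays a student counting one shelf independently, and summing their counts is the reduce step.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up library: each inner list is one bookshelf, each string one book.
shelves = [
    ["book-1", "book-2", "book-3"],  # shelf 1
    ["book-4", "book-5"],            # shelf 2
    ["book-6"],                      # shelf 3
]

def count_shelf(shelf):
    """'Map': one student counts the books on a single shelf."""
    return len(shelf)

def count_library(shelves, students=3):
    """Count every shelf in parallel, then add up the partial counts."""
    # Each worker thread stands in for one student; shelves are independent.
    with ThreadPoolExecutor(max_workers=students) as pool:
        per_shelf = list(pool.map(count_shelf, shelves))
    # 'Reduce': add up every student's count.
    return sum(per_shelf)

print(count_library(shelves))  # → 6
```

Threads are used here only to show that the shelves are counted independently; in Python they will not speed up CPU-bound counting, and in Hadoop the "students" would be map tasks spread across cluster nodes.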
3. MapReduce programming model
MapReduce consists of two phases:
Map phase (split into small tasks)
Reduce phase (summarizing the results of small tasks)
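The two phases are commonly summarized by their types. Map turns one input pair into any number of intermediate pairs, the framework groups those pairs by key, and reduce receives one key together with all of its values. (In Hadoop the reduce output types may differ from the intermediate ones, hence k_3 and v_3.)

```latex
\begin{aligned}
\text{map}    &: (k_1, v_1) \rightarrow \text{list}(k_2, v_2) \\
\text{reduce} &: (k_2, \text{list}(v_2)) \rightarrow \text{list}(k_3, v_3)
\end{aligned}
```

For word count, for example, map takes (line offset, line text) and emits (word, 1) pairs, and reduce takes (word, list of 1s) and emits (word, total count).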
4. MapReduce programming steps
This part is not easy to understand. If it doesn't click right away, don't worry: note it down and take your time.
MapReduce programming is roughly divided into three phases with eight steps in total. Below is a brief description of each step.
In the next lesson we will go through these eight steps in detail with an example.
1. Map phase (2 steps)
Step 1: Set the InputFormat class, which reads the input data, turns it into key-value pairs, and passes them to step 2.
Step 2: Write the custom map logic, which processes the key-value pairs from step 1 and converts them into new key-value pairs for output.
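Steps 1 and 2 can be simulated locally with a word-count example. This is plain Python, not Hadoop code: text_input_format only imitates the pairs that Hadoop's TextInputFormat produces (the byte offset of each line as the key and the line text as the value), and word_count_map is a hypothetical user map function.

```python
from typing import Iterator, List, Tuple

def text_input_format(data: bytes) -> Iterator[Tuple[int, str]]:
    """Step 1 (simulated): yield (byte offset, line text) for each line,
    imitating the pairs Hadoop's TextInputFormat hands to the mapper."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)  # next line starts after this line's bytes

def word_count_map(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Step 2 (hypothetical user logic): emit (word, 1) for every word.
    The offset key is not needed for word count, so it is ignored."""
    for word in line.split():
        yield word, 1

data = b"hello hadoop\nhello mapreduce\n"
map_output: List[Tuple[str, int]] = [
    pair for key, value in text_input_format(data) for pair in word_count_map(key, value)
]
print(map_output)
# → [('hello', 1), ('hadoop', 1), ('hello', 1), ('mapreduce', 1)]
```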
2. Shuffle phase (4 steps)
Step 3: Partition the key-value pairs output by the previous step. (Pairs with the same key go to the same partition.)
Step 4: Sort the data in each partition by key.
Step 5: Optionally run a local reduction (the combine operation) on the data in each partition to reduce the amount of data copied over the network.
Step 6: Group the sorted key-value pairs. Pairs with the same key form one group, and all values in a group are put into a single collection. (The reduce method is called once for each group.)
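Steps 3 to 6 can be sketched the same way, again as a local simulation with hypothetical function names. Hadoop's default HashPartitioner derives the partition from the key's Java hashCode() modulo the number of reduce tasks; here zlib.crc32 stands in for hashCode(), because Python's built-in hash() of strings changes between runs. The combiner is safe here only because summing counts gives the same total whether it is done early or late.

```python
import zlib
from itertools import groupby
from operator import itemgetter

def partition(pairs, num_reducers):
    """Step 3: send each pair to a partition chosen by hashing its key,
    so all pairs with the same key end up in the same partition."""
    parts = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        # crc32 is deterministic and non-negative; it stands in for Java's hashCode().
        parts[zlib.crc32(key.encode("utf-8")) % num_reducers].append((key, value))
    return parts

def sort_partition(part):
    """Step 4: sort one partition's pairs by key."""
    return sorted(part, key=itemgetter(0))

def combine(sorted_part):
    """Step 5 (optional): sum the counts of each key locally, so fewer
    pairs have to be copied over the network to the reducers."""
    return [(key, sum(v for _, v in grp))
            for key, grp in groupby(sorted_part, key=itemgetter(0))]

def group(sorted_part):
    """Step 6: gather all values of one key into a list; the reduce
    function is then called once per (key, values) group."""
    return [(key, [v for _, v in grp])
            for key, grp in groupby(sorted_part, key=itemgetter(0))]

map_output = [("hello", 1), ("hadoop", 1), ("hello", 1), ("mapreduce", 1)]
for index, part in enumerate(partition(map_output, num_reducers=2)):
    sorted_part = sort_partition(part)
    print(index, "combined:", combine(sorted_part), "grouped:", group(sorted_part))
```

Which partition a given word lands in depends on the hash function, so no specific assignment is shown; what matters is that equal keys always land together.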
3. Reduce phase (2 steps)
Step 7: Merge and sort the outputs of the multiple map tasks, then write the reduce function logic, which processes the input key-value pairs and converts them into new key-value pairs for output.
Step 8: Set the OutputFormat class, which saves the output key-value pairs to files.
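Finally, steps 7 and 8 on the reduce side. In this sketch (plain Python, hypothetical names), heapq.merge combines the already-sorted runs coming from several map tasks without re-sorting them, the values of each key are reduced, and the results are written as key<TAB>value lines, similar to the default layout of Hadoop's TextOutputFormat. The file name part-r-00000 mirrors Hadoop's naming for the first reducer's output; writing it to a temporary directory is an assumption of the demo.

```python
import heapq
import os
import tempfile
from itertools import groupby
from operator import itemgetter

def merge_map_outputs(sorted_runs):
    """Step 7, first half: lazily merge the key-sorted outputs of several
    map tasks into one key-sorted stream."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))

def sum_reduce(key, values):
    """Step 7, second half (hypothetical user logic): total the counts."""
    return key, sum(values)

def write_text_output(pairs, path):
    """Step 8: write one 'key<TAB>value' line per result pair."""
    with open(path, "w", encoding="utf-8") as out:
        for key, value in pairs:
            out.write(f"{key}\t{value}\n")

def run_reduce_task(sorted_runs, path):
    merged = merge_map_outputs(sorted_runs)
    results = [sum_reduce(key, [v for _, v in grp])
               for key, grp in groupby(merged, key=itemgetter(0))]
    write_text_output(results, path)
    return results

# Made-up, already-sorted outputs of two map tasks for the same partition.
run_a = [("hadoop", 1), ("hello", 2)]
run_b = [("hello", 1), ("mapreduce", 1)]
out_path = os.path.join(tempfile.mkdtemp(), "part-r-00000")
print(run_reduce_task([run_a, run_b], out_path))
# → [('hadoop', 1), ('hello', 3), ('mapreduce', 1)]
```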
Don't worry if this is confusing. In this lesson you only need to understand why a computing framework like MapReduce exists: it makes full use of cluster resources by breaking a large job into several small tasks and then merging the results of those small tasks into the final result.
This way, the cluster's computing resources all work at the same time instead of tasks waiting in a queue.
© 2024 shulou.com SLNews company. All rights reserved.