What are the MapReduce programming steps?

Updated: 2025-02-25. From: SLTechnology News&Howtos

This article walks through the steps of MapReduce programming. The approach described here is simple and practical, so let's go through the steps one by one.

Hadoop has three core modules: HDFS for distributed storage, MapReduce for distributed computing, and YARN for resource scheduling.

Earlier lessons introduced how Hadoop stores data (HDFS). Over the next few lessons, starting today, we will study the MapReduce distributed computing framework. It is hard to understand but very important. In most cases we handle business workloads with tools such as Hive and Spark rather than writing MapReduce programs directly, but these tools still build on the ideas behind MapReduce. Understanding and mastering MapReduce programming now will therefore pay off in later study.

1. Definition of MapReduce

MapReduce is a programming framework for distributed computing programs, and the core framework users rely on to develop Hadoop-based data analysis applications.

Its core function is to combine the business logic code written by the user with the framework's own default components into a complete distributed program that runs concurrently on a Hadoop cluster.

2. The core idea of MapReduce

The idea behind MapReduce shows up everywhere in daily life, and most people have encountered it in some form. At its core, MapReduce is "divide and conquer", which suits large-scale data processing.

Map is responsible for the "divide" part: it splits a complex task into several simple tasks that are processed in parallel. (The precondition for splitting is that the small tasks can be computed in parallel and have little or no dependence on one another.)

Reduce is responsible for the "combine" part: it produces a global summary of the results of the map phase.

Together, these two phases embody the MapReduce idea.

A concrete example makes this easier to picture:

Example 1: we want to count all the books in a library. Student A counts bookshelf 1 and student B counts bookshelf 2. This is "Map". The more students there are, the faster the counting goes.

Then we add up the counts from all the students. This is "Reduce".
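
The library example can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop code: the `shelves` data and the names `count_shelf` and `count_library` are made up for the example, and a thread pool stands in for the students.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each shelf is a list of book titles.
shelves = {
    "shelf-1": ["Hadoop Basics", "HDFS in Depth", "YARN Notes"],
    "shelf-2": ["MapReduce Patterns", "Hive Guide"],
    "shelf-3": ["Spark Primer"],
}

def count_shelf(books):
    """'Map': one student counts the books on one shelf."""
    return len(books)

def count_library(shelves, students=3):
    # Each shelf is counted independently, so the work can run in parallel.
    with ThreadPoolExecutor(max_workers=students) as pool:
        per_shelf = list(pool.map(count_shelf, shelves.values()))
    # 'Reduce': add up the per-shelf counts into one total.
    return sum(per_shelf)

total = count_library(shelves)
print(total)  # 6
```

The shelves do not depend on each other, which is exactly the precondition for splitting the work.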

3. MapReduce programming model

From the programmer's point of view, MapReduce consists of two phases:

Map phase (split into small tasks)

Reduce phase (summarizing the results of small tasks)
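
As a rough sketch of this two-phase model, the following plain-Python word count separates the user-written map and reduce functions from a tiny driver. It runs in a single process and does not use the Hadoop API; `run_map_reduce`, `map_words`, and `reduce_counts` are hypothetical names chosen for the illustration.

```python
from collections import defaultdict

def map_words(_key, line):
    """Map: turn one input record into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(word, counts):
    """Reduce: summarize all values collected for one key."""
    return (word, sum(counts))

def run_map_reduce(records, map_fn, reduce_fn):
    # Map phase: every record is processed independently.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    # Between the phases, values are collected by key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)
    # Reduce phase: one call per distinct key, in key order.
    return [reduce_fn(key, grouped[key]) for key in sorted(grouped)]

records = [(0, "hello hadoop"), (1, "hello mapreduce")]
result = run_map_reduce(records, map_words, reduce_counts)
print(result)  # [('hadoop', 1), ('hello', 2), ('mapreduce', 1)]
```

The user only supplies `map_fn` and `reduce_fn`; everything in `run_map_reduce` corresponds to work the framework does on the user's behalf.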

4. MapReduce programming steps

This part is not easy to understand. If it does not make sense right away, don't worry: note it down first and take your time.

MapReduce programming is roughly divided into three phases with eight steps in total. What follows is a brief description of the eight steps.

The next lesson walks through them in detail with an example.

1. Map phase (2 steps)

Step 1: set the InputFormat class, which splits the input data into key,value pairs and passes them to step 2

Step 2: write the custom map logic, which processes the key,value pairs from step 1 and converts them into new key,value pairs for output
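
Steps 1 and 2 can be imitated in plain Python as follows. The sketch assumes text input and mimics the behavior of Hadoop's default TextInputFormat, which hands each line to the mapper as a (byte offset, line) pair; `read_records` and `map_line` are hypothetical names, not Hadoop classes.

```python
def read_records(text):
    """Step 1 (InputFormat-like): yield (byte_offset, line) pairs."""
    offset = 0
    for raw in text.splitlines(keepends=True):
        yield offset, raw.rstrip("\r\n")
        offset += len(raw.encode("utf-8"))

def map_line(offset, line):
    """Step 2 (custom map logic): emit a (word, 1) pair for each word."""
    for word in line.split():
        yield word, 1

text = "hello hadoop\nhello mapreduce\n"
records = list(read_records(text))
map_output = [pair for offset, line in records for pair in map_line(offset, line)]

print(records)     # [(0, 'hello hadoop'), (13, 'hello mapreduce')]
print(map_output)  # [('hello', 1), ('hadoop', 1), ('hello', 1), ('mapreduce', 1)]
```

The map function ignores the offset key here, which is common for word-count style jobs: the input key only identifies the record.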

2. Shuffle phase (4 steps)

Step 3: partition the key,value pairs output by step 2 (pairs with the same key go to the same partition)

Step 4: sort the data in each partition by key

Step 5: locally pre-aggregate the data in each partition (the combine operation) to reduce the amount of data copied over the network (optional)

Step 6: group the sorted key,value pairs: pairs with the same key form one group, and all the values of a group are put into one collection (the reduce method is called once per group)
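
The four shuffle steps can be sketched in plain Python, following the order in which they are listed above and working on the output of a single map task. This is a simplified single-process illustration, not how Hadoop implements shuffle internally; `NUM_PARTITIONS`, `partition_for`, `combine`, `group_by_key`, and `shuffle` are hypothetical names, and a CRC32-based hash is an assumption standing in for the framework's partitioner.

```python
import zlib
from itertools import groupby
from operator import itemgetter

NUM_PARTITIONS = 2  # hypothetical number of reduce tasks

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Step 3: the same key always maps to the same partition."""
    # crc32 is used because Python's built-in hash() for str varies between runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def combine(sorted_pairs):
    """Step 5 (optional): pre-sum values per key before they cross the network."""
    return [(key, sum(v for _, v in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]

def group_by_key(sorted_pairs):
    """Step 6: collect all values of one key into a list."""
    return [(key, [v for _, v in group])
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]

def shuffle(map_output, use_combiner=True):
    partitions = {p: [] for p in range(NUM_PARTITIONS)}
    for key, value in map_output:              # step 3: partition
        partitions[partition_for(key)].append((key, value))
    result = {}
    for p, pairs in partitions.items():
        pairs.sort(key=itemgetter(0))          # step 4: sort by key
        if use_combiner:
            pairs = combine(pairs)             # step 5: combine
        result[p] = group_by_key(pairs)        # step 6: group
    return result

map_output = [("hello", 1), ("hadoop", 1), ("hello", 1), ("mapreduce", 1)]
for partition, groups in shuffle(map_output).items():
    print(partition, groups)
```

With the combiner enabled, the two `("hello", 1)` pairs are collapsed into `("hello", 2)` before grouping, so less data would need to be copied; calling `shuffle(map_output, use_combiner=False)` leaves them as the group `("hello", [1, 1])`.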

3. Reduce phase (2 steps)

Step 7: merge and sort the results from multiple map tasks, then write the reduce function logic, which processes the input key,value pairs and converts them into new key,value pairs for output

Step 8: set the OutputFormat class, which saves the output key,value pairs to a file
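
Steps 7 and 8 can be sketched in plain Python as well. The example assumes that each map task has already produced output sorted by key, and it writes one tab-separated key,value pair per line, which is how Hadoop's default text output is laid out. `reduce_sum`, `run_reduce`, and the sample map outputs are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import groupby
from operator import itemgetter

def reduce_sum(key, values):
    """Step 7 (custom reduce logic): turn (key, [values]) into a new pair."""
    return key, sum(values)

def run_reduce(sorted_map_outputs, out_path):
    # Step 7: merge the already-sorted outputs of several map tasks.
    merged = heapq.merge(*sorted_map_outputs, key=itemgetter(0))
    with open(out_path, "w", encoding="utf-8") as out:
        for key, group in groupby(merged, key=itemgetter(0)):
            new_key, new_value = reduce_sum(key, [v for _, v in group])
            # Step 8: write each output pair as a tab-separated line.
            out.write(f"{new_key}\t{new_value}\n")

# Hypothetical sorted outputs from two map tasks for one partition.
map_task_1 = [("hadoop", 1), ("hello", 2)]
map_task_2 = [("hello", 1), ("mapreduce", 1)]

out_path = os.path.join(tempfile.mkdtemp(), "part-r-00000")
run_reduce([map_task_1, map_task_2], out_path)
with open(out_path, encoding="utf-8") as f:
    output_text = f.read()
print(output_text)
```

Because every input is already sorted, `heapq.merge` only has to interleave them, and equal keys from different map tasks end up next to each other, ready to be handed to the reduce function as one group.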

It doesn't matter if this is still confusing. For this lesson it is enough to understand why a computing framework like MapReduce exists: to make full use of cluster resources. A task that processes a large amount of data is first broken into several small tasks, and the results of those small tasks are then merged into the final result. This way the cluster's computing resources are all put to work at the same time instead of tasks waiting in a queue.

