Based on the five steps of MapReduce programming, the working principle of MapReduce is discussed.

Updated: 2025-01-18. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/03.

The previous Hadoop article explained that MapReduce follows a divide-and-conquer approach. MapReduce has two main parts: Map and Reduce.

Throughout the whole MapReduce process, data exists in the form of key-value pairs.

First, assume we have a file with the following contents:

hive spark hive hbase

hadoop hive spark

sqoop flume scala

Each line has an offset (every character or space takes 1 byte, and so does the newline that ends a line).

The offset of the first line is 0 and the content is "hive spark hive hbase"

The offset of the second line is 22 and the content is "hadoop hive spark"

The offset of the third line is 40 and the content is "sqoop flume scala"

Map

Input

The data processed by MapReduce is read from HDFS.

With the offset as the key and the line content as the value, the input is:

(0, "hive spark hive hbase")

(22, "hadoop hive spark")

(40, "sqoop flume scala")
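The way offsets become keys can be sketched in a few lines of Python. This is only an illustration, not Hadoop's own reader: the function name read_records is hypothetical, and the sketch assumes single-byte characters and exactly one newline byte at the end of every line, which puts the three lines at byte offsets 0, 22, and 40.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of the input."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Key: the offset at which the line starts.
        # Value: the line content without its trailing newline.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        # Advance past the whole line, including its newline byte.
        offset += len(raw_line)


# Assumed sample file: every line ends with a single "\n" byte.
sample = b"hive spark hive hbase\nhadoop hive spark\nsqoop flume scala\n"
records = list(read_records(sample))
for key, value in records:
    print((key, value))
```

Because the newline is counted, each offset equals the previous offset plus the previous line's length plus one.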

Output

Split each input value on spaces, then emit every word as a key with 1 as its value:

(hive,1)

(spark,1)

(hive,1)

(hbase,1)

(hadoop,1)

Note: the map function is called once per input line, so it runs as many times as there are lines.
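A minimal Python sketch of this map step follows. The name map_line is hypothetical, and the input pairs are written out by hand; the offset keys are placeholders that the function ignores.

```python
def map_line(key, value):
    """Map step: ignore the offset key and emit (word, 1) for each word."""
    # split() with no argument splits on any run of whitespace.
    for word in value.split():
        yield word, 1


# Hand-written (offset, line) pairs; the offsets are not used by the map step.
input_pairs = [
    (0, "hive spark hive hbase"),
    (22, "hadoop hive spark"),
    (40, "sqoop flume scala"),
]

map_output = []
for offset, line in input_pairs:  # one map call per input line
    map_output.extend(map_line(offset, line))

for pair in map_output:
    print(pair)
```

The three input lines produce ten (word, 1) pairs in total, in the order the words appear.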

Shuffle (explained briefly here; covered in more detail later)

Input

Output of map

Output

Merge the values that share the same key

Merging here does not mean summing or any other calculation; the values are simply collected into a set.

(hive, [1,1,1])

(spark, [1,1])

(hbase, [1])

(hadoop, [1])

...
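The grouping can be sketched in Python as follows. The function name shuffle is hypothetical and this is a single-process illustration only; a real shuffle also partitions and moves data between machines. Sorting the keys mimics the sorted order in which Hadoop hands keys to the reducer.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group the values of identical keys into lists, without aggregating."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Keys are unique, so sorting the items orders them by key only.
    return sorted(grouped.items())


# Map output for the three sample lines, written out by hand.
map_output = [
    ("hive", 1), ("spark", 1), ("hive", 1), ("hbase", 1),
    ("hadoop", 1), ("hive", 1), ("spark", 1),
    ("sqoop", 1), ("flume", 1), ("scala", 1),
]

shuffled = shuffle(map_output)
for key, values in shuffled:
    print((key, values))
```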

Reduce

Input

Output of shuffle

Output

Merge the values according to the business logic

For example, in this case the business logic is to sum the values, so (hive, [1,1,1]) becomes (hive, 3).
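A minimal Python sketch of a summing reduce step is shown here. The name reduce_counts is hypothetical, and the shuffle output is written out by hand for the sample file.

```python
def reduce_counts(key, values):
    """Reduce step: collapse the list of values for one key into their sum."""
    return key, sum(values)


# Shuffle output for the sample file, written out by hand.
shuffled = [
    ("flume", [1]),
    ("hadoop", [1]),
    ("hbase", [1]),
    ("hive", [1, 1, 1]),
    ("scala", [1]),
    ("spark", [1, 1]),
    ("sqoop", [1]),
]

# One reduce call per key.
result = [reduce_counts(key, values) for key, values in shuffled]
for pair in result:
    print(pair)
```

A different business rule, such as taking the maximum, would only change the body of reduce_counts; the shuffle output stays the same.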

MapReduce takes five steps to process data

Throughout the MapReduce program, all data flows in the form of (key,value)

Step 1: input

Normally, you don't need to write code.

Just specify a path when the MapReduce program is running

Step 2: map (Core)

map(key, value, output, context)

key: the offset of each line of data; usually not needed

value: the content of each line of data; this is what actually gets processed

Step 3: shuffle

You don't need to write code.

Step 4: reduce (Core)

reduce(key, value, output, context)

key: the key defined by the business requirement

value: the values to be aggregated for that key

Step 5: output

Normally, you don't need to write code.

Just specify a path when the MapReduce program is running
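The five steps can be tied together in one small Python sketch. It is a single-process illustration under stated assumptions, not the Hadoop API: run_job, word_mapper, and sum_reducer are hypothetical names, the paths point to a temporary local directory rather than HDFS, and the tab-separated output only resembles Hadoop's default text output. Only the mapper and reducer carry job-specific logic, which matches the point that steps 1, 3, and 5 normally need no code.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def run_job(input_path, output_path, mapper, reducer):
    """Run input -> map -> shuffle -> reduce -> output in a single process."""
    # Step 1: input. Read (byte offset, line) pairs from the given path.
    records, offset = [], 0
    with open(input_path, "rb") as f:
        for raw in f:
            records.append((offset, raw.rstrip(b"\r\n").decode("utf-8")))
            offset += len(raw)
    # Step 2: map. User code runs once per line.
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    # Step 3: shuffle. Group values by key; no user code involved.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Step 4: reduce. User code runs once per key, in sorted key order.
    reduced = [reducer(key, values) for key, values in sorted(grouped.items())]
    # Step 5: output. Write one "key<TAB>value" line per result.
    with open(output_path, "w", encoding="utf-8") as f:
        for key, value in reduced:
            f.write(f"{key}\t{value}\n")
    return reduced


def word_mapper(key, value):
    # The offset key is ignored; each word becomes (word, 1).
    return [(word, 1) for word in value.split()]


def sum_reducer(key, values):
    return key, sum(values)


with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.txt")
    out_path = os.path.join(tmp, "output.txt")
    Path(in_path).write_text(
        "hive spark hive hbase\nhadoop hive spark\nsqoop flume scala\n",
        encoding="utf-8",
    )
    counts = run_job(in_path, out_path, word_mapper, sum_reducer)
    output_text = Path(out_path).read_text(encoding="utf-8")

print(output_text)
```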

Working principle

[Figure: diagram of the MapReduce working principle (image not included)]
