2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Shulou(Shulou.com)06/03 Report--
As mentioned in the earlier Hadoop article, MapReduce follows the divide-and-conquer idea. It consists of two main phases: Map, which splits the work into pieces, and Reduce, which combines the partial results.
Throughout the whole MapReduce process, data exists in the form of key-value pairs.
First of all, let's assume we have a file with the following content:
hive spark hive hbase
hadoop hive spark
sqoop flume scala
Each line is identified by its byte offset in the file (every character, space, or line break counts as 1 byte)
The offset of the first line is 0 and the content is "hive spark hive hbase"
The offset of the second line is 22 (21 characters plus the line break) and the content is "hadoop hive spark"
The offset of the third line is 40 (22, plus 17 characters and the line break) and the content is "sqoop flume scala"
Map
Input
The data processed by MapReduce is read from HDFS.
With the offset as the key and the line content as the value, the input is:
(0, "hive spark hive hbase")
(22, "hadoop hive spark")
(40, "sqoop flume scala")
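As a rough illustration of how such (offset, line) records could be produced, here is a minimal Python sketch. It is not Hadoop's own reader; the function name `read_records` is made up for this example, and it assumes each line ends with a single "\n" byte that also counts toward the offset. Under that assumption the three lines start at 0, 22 and 40.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per input line.

    Hypothetical helper for illustration only; assumes UTF-8 text.
    """
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The key is where the line starts; the value is the line without its line break.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        # The line break is part of the file, so it moves the next offset too.
        offset += len(raw_line)


sample = b"hive spark hive hbase\nhadoop hive spark\nsqoop flume scala\n"
records = list(read_records(sample))
for key, value in records:
    print(key, repr(value))
```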
Output
Split each input value on spaces to get the individual words. For every word, emit a pair with the word as the key and 1 as the value.
(hive,1)
(spark,1)
(hive,1)
(hbase,1)
(hadoop,1)
Note: map is called once per input line, so it runs as many times as there are lines.
Shuffle (explained in more detail later; only a brief overview here)
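The map step described above can be sketched in plain Python. This is only a simulation of the idea, not Hadoop code: `map_line` is a hypothetical name, and the driver loop stands in for the framework calling map once per line. The offsets are computed assuming one newline byte per line, and map ignores them anyway.

```python
def map_line(key, value):
    """Emit a (word, 1) pair for every space-separated word in one line.

    key is the line's offset and is not used; value is the line content.
    """
    return [(word, 1) for word in value.split(" ") if word]


lines = ["hive spark hive hbase", "hadoop hive spark", "sqoop flume scala"]

map_output = []
offset = 0
for line in lines:
    # One call per input line, as the note above says.
    map_output.extend(map_line(offset, line))
    offset += len(line) + 1  # assumption: each line ends with one newline byte

print(map_output)
```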
Input
Output of map
Output
Group together the values that share the same key
Merging here does not mean summing or any other computation; the values are simply collected into a list.
(hive, [1,1,1])
(spark, [1,1])
(hbase, [1])
(hadoop, [1])
...
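A minimal Python sketch of this grouping follows. The name `shuffle` is just a label for the example, and the sketch runs on one machine, whereas the real shuffle moves data between nodes. Real Hadoop also sorts the keys before handing them to reduce; this sketch keeps first-seen order so it lines up with the listing above.

```python
from collections import defaultdict


def shuffle(pairs):
    """Collect the values of identical keys into one list per key (no summing)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


map_output = [
    ("hive", 1), ("spark", 1), ("hive", 1), ("hbase", 1),
    ("hadoop", 1), ("hive", 1), ("spark", 1),
    ("sqoop", 1), ("flume", 1), ("scala", 1),
]

shuffle_output = shuffle(map_output)
for key, values in shuffle_output.items():
    print(key, values)
```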
Reduce
Input
Output of shuffle
Output
Combine the values for each key according to the business logic
For example, in this word-count case the values are summed.
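For the word-count case, the reduce step might look like the following Python sketch. `reduce_counts` is a hypothetical name, and the input dictionary is the grouped output of the shuffle step for the sample file, written out by hand here.

```python
def reduce_counts(key, values):
    """Sum the collected values for one key and return the (key, total) pair."""
    return key, sum(values)


shuffle_output = {
    "hive": [1, 1, 1],
    "spark": [1, 1],
    "hbase": [1],
    "hadoop": [1],
    "sqoop": [1],
    "flume": [1],
    "scala": [1],
}

# reduce is called once per key, with all of that key's values.
reduce_output = [reduce_counts(key, values) for key, values in shuffle_output.items()]
print(reduce_output)
```

A different business requirement would only change the body of `reduce_counts`, for example taking `max(values)` instead of the sum.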
MapReduce takes five steps to process data
Throughout the MapReduce program, all data flows in the form of (key,value)
Step 1: input
Normally, you don't need to write any code for this step.
Just specify the input path when running the MapReduce program
Step 2: map (Core)
map(key, value, output, context)
key: the offset of each line of data; rarely needed in practice
value: the content of each line of data; this is what actually gets processed
Step 3: shuffle
You don't need to write code.
Step 4: reduce (Core)
reduce(key, value, output, context)
key: the key defined by the business requirement
value: the values to be aggregated for that key
Step 5: output
Normally, you don't need to write any code for this step.
Just specify the output path when running the MapReduce program
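To tie the five steps together, here is a small single-process Python simulation. It is only a sketch of the flow, not the Hadoop API: `Context`, `map_fn`, `reduce_fn` and `run_job` are hypothetical names, the separate output parameter is folded into `context` for brevity, and the tab-separated output format and sorted key order are choices made for this example. Only `map_fn` and `reduce_fn` contain business logic; the other three steps are generic, which is why the real framework can provide them for you.

```python
import os
import tempfile
from collections import defaultdict


class Context:
    """Hypothetical stand-in for the framework object that collects emitted pairs."""

    def __init__(self):
        self.pairs = []

    def write(self, key, value):
        self.pairs.append((key, value))


def map_fn(key, value, context):
    # key (the offset) is ignored; value is one line of text.
    for word in value.split():
        context.write(word, 1)


def reduce_fn(key, values, context):
    context.write(key, sum(values))


def run_job(input_path, output_path):
    # Step 1: input. Read (offset, line) records from the given path.
    map_context = Context()
    offset = 0
    with open(input_path, "rb") as source:
        for raw_line in source:
            # Step 2: map. Called once per line.
            map_fn(offset, raw_line.decode("utf-8").rstrip("\r\n"), map_context)
            offset += len(raw_line)
    # Step 3: shuffle. Group the values by key.
    groups = defaultdict(list)
    for key, value in map_context.pairs:
        groups[key].append(value)
    # Step 4: reduce. Called once per key, in sorted key order.
    reduce_context = Context()
    for key in sorted(groups):
        reduce_fn(key, groups[key], reduce_context)
    # Step 5: output. Write the final pairs to the given path.
    with open(output_path, "w", encoding="utf-8") as sink:
        for key, value in reduce_context.pairs:
            sink.write(f"{key}\t{value}\n")
    return reduce_context.pairs


with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "input.txt")
    output_path = os.path.join(workdir, "output.txt")
    with open(input_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("hive spark hive hbase\nhadoop hive spark\nsqoop flume scala\n")
    result = run_job(input_path, output_path)
    with open(output_path, encoding="utf-8") as f:
        output_text = f.read()

print(output_text)
```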
Working principle
[Figure: MapReduce working principle (image not available)]