2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the role of Hadoop MapReduce. Many people have questions about what Hadoop MapReduce actually does, so the sections below summarize it in simple, practical terms. I hope it answers your doubts. Follow along and let's learn together!
Hadoop MapReduce is a software framework that makes it easy to write applications that process terabyte-scale data sets in parallel, in a reliable and fault-tolerant manner, on large clusters of thousands of commodity machines.
The key points in that definition are: 1. software framework; 2. parallel processing; 3. reliability and fault tolerance; 4. large-scale clusters; 5. massive data sets.
So MapReduce can simply be thought of as a software framework whose "dish" is massive data: it cooks that dish in parallel, reliably and fault-tolerantly, on a large cluster.
What can MapReduce do? Simply put, big data processing, that is, the many ways of cooking this dish: data cleaning, data mining, and data analysis.
MapReduce's idea is divide and conquer. The Mapper is responsible for "splitting": breaking a complex task down into several "simple tasks." "Simple task" carries three meanings: first, the scale of data or computation is greatly reduced compared with the original task; second, computation happens near the data, that is, a task is assigned to the node that stores the data it needs; third, these small tasks can be computed in parallel with almost no dependence on each other. The Reducer is responsible for summarizing the results of the map phase. As for how many reducers are needed, users can set the parameter mapred.reduce.tasks (mapreduce.job.reduces in newer Hadoop versions) in the mapred-site.xml configuration file according to the specific problem. The default value is 1.
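As an illustration, the reducer count could be set in mapred-site.xml with a fragment like the one below; the value 4 is just an example, and newer Hadoop releases use the property name mapreduce.job.reduces instead:

```xml
<!-- mapred-site.xml: number of reduce tasks per job (example value) -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>
```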
MapReduce distributes operations on a large data set to the sub-nodes managed by a master node, then merges the intermediate results from all sub-nodes to obtain the final result. In short, MapReduce is "task decomposition and result aggregation." It abstracts this processing into two functions: map and reduce. Map decomposes a task into multiple sub-tasks, and reduce summarizes the results of those sub-tasks. The other hard problems of parallel programming, such as distributed storage, job scheduling, load balancing, fault tolerance, and network communication, are handled by the MapReduce framework itself.
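The "decompose then aggregate" idea can be sketched in plain Python. This is a toy simulation, not the Hadoop API; the names map_fn, reduce_fn, and mapreduce are our own:

```python
from collections import defaultdict

def map_fn(line):
    # Decompose: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Aggregate: sum all partial counts for one word.
    return key, sum(values)

def mapreduce(lines):
    # Map phase: each line is an independent "simple task".
    intermediate = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            intermediate[key].append(value)
    # Reduce phase: summarize the grouped intermediate results.
    return dict(reduce_fn(k, v) for k, v in intermediate.items())

counts = mapreduce(["hello hadoop", "hello mapreduce"])
print(counts)  # {'hello': 2, 'hadoop': 1, 'mapreduce': 1}
```

Here word counting stands in for any job: the map side never needs to see the whole data set, and the reduce side only sees values grouped by key.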
A data set (or task) that MapReduce processes must have one key characteristic: it can be decomposed into many small data sets, and each small data set can be processed completely in parallel.
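Because the small data sets are independent, they really can be processed in parallel. A minimal sketch using Python's standard thread pool (the splits and the process_split function are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def process_split(split):
    # Each split is handled with no dependence on the others.
    return sum(x * x for x in split)

# A data set decomposed into small, independent pieces.
splits = [[1, 2], [3, 4], [5, 6]]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(process_split, splits))

total = sum(partials)  # aggregate the independent partial results
print(partials, total)  # [5, 25, 61] 91
```

The point is structural: nothing inside process_split reads another split's data, which is exactly what lets Hadoop scatter the splits across cluster nodes.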
(1) Split the task's input data into fixed-size segments (splits).
(2) Decompose each split further into a batch of key-value pairs.
(3) Hadoop creates one Map task for each split and feeds it the key-value pairs of the corresponding split as input.
(4) Each Map task produces intermediate results, which are sorted by the intermediate key k2; values sharing the same key are put together into a new list, forming <k2, list(v2)> tuples. These tuples are then partitioned by key range, each partition going to a different Reduce task.
(5) Each Reduce task merges and sorts the data it receives from the different Mappers, then processes the input tuples with the reduce function to obtain final key-value pairs.
(6) The final key-value pairs are written to HDFS.
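The steps above can be sketched end to end in plain Python. Again this is a simulation, not Hadoop itself; the split size of 2 and all function names are illustrative:

```python
from itertools import groupby

def split_input(text, size=2):
    # Split phase: cut the input into fixed-size splits of lines.
    lines = text.splitlines()
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def run_map(split):
    # Map phase: one Map task per split, emitting (k2, v2) pairs.
    return [(word, 1) for line in split for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort phase: sort by k2 and group values with equal keys
    # into (key, [values]) tuples.
    pairs = sorted(pairs, key=lambda kv: kv[0])
    return [(k, [v for _, v in grp]) for k, grp in groupby(pairs, key=lambda kv: kv[0])]

def run_reduce(grouped):
    # Reduce phase: turn each (key, [values]) tuple into a final pair.
    return {k: sum(vs) for k, vs in grouped}

text = "hello hadoop\nhello world\nhadoop world"
pairs = [kv for split in split_input(text) for kv in run_map(split)]
result = run_reduce(shuffle(pairs))
print(result)  # {'hadoop': 2, 'hello': 2, 'world': 2}
```

In real Hadoop the shuffle also partitions keys across multiple Reduce tasks and moves data over the network; here everything runs in one process to keep the flow visible.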
MapReduce Framework: http://www.cnblogs.com/sharpxiajun/p/3151395.html
This concludes our look at the role of Hadoop MapReduce; I hope it has resolved your doubts. Theory sticks best when paired with practice, so go and try it! To keep learning, please continue to follow the site for more practical articles.
© 2024 shulou.com SLNews company. All rights reserved.