What is MapReduce?

Updated 2025-01-19. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

This article introduces "what is MapReduce". Many people have questions about what MapReduce is when they meet it in day-to-day work, so the editor consulted a range of materials and collected the key points in a simple, practical form. Hopefully the notes below answer those questions.

1. The output of the map function is first processed by the MapReduce framework and only then sent to the reduce function. This processing sorts and groups the key/value pairs by key.
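The sort-and-group step between map and reduce can be sketched in a few lines of plain Python. This is only a local simulation, not Hadoop code: the `year,temperature` input format and the names `map_fn`, `reduce_fn`, and `run_job` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(line):
    """Hypothetical map function: parse 'year,temperature' into (year, temp)."""
    year, temp = line.split(",")
    yield int(year), int(temp)


def reduce_fn(year, temps):
    """Hypothetical reduce function: keep the maximum temperature per year."""
    return year, max(temps)


def run_job(lines):
    # Map phase: apply map_fn to every input record.
    mapped = [pair for line in lines for pair in map_fn(line)]
    # Framework step, part 1: sort by key so equal keys become adjacent.
    mapped.sort(key=itemgetter(0))
    # Framework step, part 2: group by key, so reduce_fn is called once
    # per key with all of that key's values.
    return [
        reduce_fn(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


result = run_job(["1950,0", "1950,22", "1950,-11", "1949,111", "1949,78"])
print(result)
```

The reduce function never sees an unsorted stream: by the time it runs, every value for a key has been collected under that key.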

2. An example of the entire data flow is shown in figure 2-1. At the bottom of the diagram is a Unix pipeline that mimics the whole MapReduce process.

Figure 2-2 data flow diagram of a single reduce task in MapReduce

10. The number of reduce tasks is not determined by the size of the input; it is specified independently. When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task. A partition can contain many keys (and their associated values), but the records for any given key are all in the same partition. Partitioning can be controlled by a user-defined partitioner, but normally the default partitioner is used; it assigns keys to buckets with a hash function, which works well.
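A minimal sketch of hash partitioning, assuming two reduce tasks. Hadoop's default partitioner works along these lines on the key's Java hash code; here `zlib.crc32` is a stand-in chosen only because it is deterministic across Python runs, and `partition_for` and `partition_map_output` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a partition index for a key.

    Assumption: crc32 stands in for the key's hash code; any deterministic
    hash would illustrate the same idea.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Split one map task's output into one bucket per reduce task."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return dict(partitions)


pairs = [(1949, 111), (1950, 0), (1949, 78), (1950, 22), (1951, 5)]
parts = partition_map_output(pairs, num_reducers=2)
for index, records in sorted(parts.items()):
    print(index, records)
```

Because the partition index depends only on the key, every map task sends a given key to the same reduce task, which is what lets a reducer see all the values for that key.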

In general, the data flow for multiple reduce tasks is shown in figure 2-3. The figure makes it clear why the data flow between map and reduce tasks is called the "shuffle": the input to each reduce task comes from many map tasks. The shuffle is more complex than the figure suggests, and tuning it can have a significant impact on job execution time.

Figure 2-3 MapReduce data flow for multiple reduce tasks

11. It is also possible to have no reduce tasks at all. This is appropriate when no shuffle is needed because the processing can be carried out entirely in parallel. In that case, the only off-node data transfer happens when the map tasks write their output to HDFS (see figure 2-4).

Figure 2-4 data flow without reduce tasks in MapReduce

12. Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map and reduce tasks. Hadoop lets the user specify a combiner function that runs on the map output; the combiner's output then forms the input to the reduce function. Because the combiner is an optimization, Hadoop gives no guarantee of how many times it will call it for a given map output record, if at all. In other words, calling the combiner zero, one, or many times must produce the same output from the reducer.

This contract limits the kinds of function that can serve as a combiner. The maximum and mean temperature examples in Hadoop: The Definitive Guide illustrate it well. A combiner works for the maximum temperature, because the maximum of the partial maxima equals the overall maximum. Applying the same approach to the mean temperature can give a wrong answer, because the mean of partial means is not in general the overall mean. The combiner therefore cannot replace the reduce function. It can help cut the amount of data transferred between map and reduce, but whether to use one in a MapReduce job needs careful consideration.
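The difference between the two cases can be checked with a small worked example. The sketch below is plain Python, not Hadoop code. It assumes two map tasks whose outputs for one year are `[0, 20, 10]` and `[25, 15]`, and all function names are hypothetical. The last variant, a combiner that emits (sum, count) pairs, is a common workaround added here for illustration; the text above does not describe it.

```python
def combine_max(values):
    """Combiner for max: emit one partial maximum per map task."""
    return [max(values)]


def reduce_max(values):
    return max(values)


def combine_mean(values):
    """Naive combiner for mean: emit one partial mean (this is the bug)."""
    return [sum(values) / len(values)]


def reduce_mean(values):
    return sum(values) / len(values)


def combine_sum_count(values):
    """Workaround combiner: emit a (sum, count) pair instead of a mean."""
    return [(sum(values), len(values))]


def reduce_sum_count(pairs):
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count


map1, map2 = [0, 20, 10], [25, 15]

# Max: the result is the same with or without the combiner.
max_plain = reduce_max(map1 + map2)
max_combined = reduce_max(combine_max(map1) + combine_max(map2))

# Mean: the naive combiner changes the answer.
mean_plain = reduce_mean(map1 + map2)
mean_naive = reduce_mean(combine_mean(map1) + combine_mean(map2))

# Mean via (sum, count) pairs: the answer is preserved.
mean_fixed = reduce_sum_count(combine_sum_count(map1) + combine_sum_count(map2))

print(max_plain, max_combined)             # 25 25
print(mean_plain, mean_naive, mean_fixed)  # 14.0 15.0 14.0
```

The (sum, count) variant also shows why a combiner is not a drop-in copy of the reducer: here the combiner and the reducer do different work, and the reducer is still required to produce the final mean.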

13. Hadoop provides an API to MapReduce that lets you write your map and reduce functions in languages other than Java. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can use any language that can read standard input and write to standard output. Streaming is naturally suited to text processing; in text mode it has a line-oriented view of the data.
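A Streaming-style mapper and reducer might look like the sketch below. It assumes input lines of the form `year,temperature` and relies on Streaming's default convention of tab-separated key and value on each output line. The function names are hypothetical, and the streams are passed in as parameters so the pair can be exercised locally, in the spirit of a `cat input | mapper | sort | reducer` pipeline.

```python
import io


def run_mapper(stdin, stdout):
    """Read 'year,temperature' lines; write 'year<TAB>temperature' lines."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        year, temp = line.split(",", 1)
        stdout.write(f"{year}\t{int(temp)}\n")


def run_reducer(stdin, stdout):
    """Input arrives sorted by key; write the max temperature per year."""
    current_year, current_max = None, None
    for line in stdin:
        year, temp = line.rstrip("\n").split("\t", 1)
        temp = int(temp)
        if year != current_year:
            # A new key begins: flush the previous key's result first.
            if current_year is not None:
                stdout.write(f"{current_year}\t{current_max}\n")
            current_year, current_max = year, temp
        else:
            current_max = max(current_max, temp)
    if current_year is not None:
        stdout.write(f"{current_year}\t{current_max}\n")


# Local simulation of: cat input | mapper | sort | reducer
raw = io.StringIO("1950,0\n1950,22\n1949,111\n1949,78\n")
mapped = io.StringIO()
run_mapper(raw, mapped)
shuffled = io.StringIO("".join(sorted(mapped.getvalue().splitlines(keepends=True))))
reduced = io.StringIO()
run_reducer(shuffled, reduced)
output = reduced.getvalue()
print(output, end="")
```

To use either function as an actual Streaming script, the file would end with a call such as `run_mapper(sys.stdin, sys.stdout)` after `import sys`. Unlike a Java reducer, the Streaming reducer receives a flat sorted stream of lines and has to detect key boundaries itself.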

This concludes the look at "what is MapReduce". Hopefully it has answered your questions. Theory is easier to absorb when paired with practice, so go and try it out.
