How to simply explain MapReduce algorithm

Updated 2025-01-20. Source: SLTechnology News&Howtos

This article gives a simple explanation of the MapReduce algorithm, which many people find hard to understand. The summary below is meant to make it easier to follow.

While I was a mentor at Hackbright, I was asked to explain the MapReduce algorithm to students with limited technical background, so I came up with an interesting example of how it works.

An example of the MapReduce algorithm

You want to count how many spades there are in a deck of cards. The intuitive way is to go through the cards one by one and count the spades.

The MapReduce approach is:

1. Distribute the deck among all the players present.

2. Have each player count the spades in their own hand and report that number to you.

3. Add up the numbers the players report to get the final answer.
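The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how a real cluster works: the deck representation and the helper names (`build_deck`, `deal`, `count_spades`) are assumptions invented for this example, and the "players" are plain lists.

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

def build_deck():
    """Return a 52-card deck as (rank, suit) tuples, grouped by suit."""
    return [(rank, suit) for suit in SUITS for rank in RANKS]

def deal(deck, num_players):
    """Step 1: hand the cards out round-robin to the players."""
    return [deck[i::num_players] for i in range(num_players)]

def count_spades(hand):
    """Step 2: each player counts the spades in their own hand."""
    return sum(1 for _rank, suit in hand if suit == "spades")

deck = build_deck()
hands = deal(deck, num_players=4)
reports = [count_spades(hand) for hand in hands]

# Step 3: add up the numbers the players reported.
total_spades = sum(reports)
print(reports)       # [4, 3, 3, 3]
print(total_spades)  # 13
```

No player ever sees the whole deck; each one only reports a single number.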

Background of the MapReduce algorithm

In 2004, Google published the MapReduce algorithm, which can be used to analyze large amounts of data. Whenever you hear the term "big data", it refers to a problem too large for a single machine to store or analyze effectively. By distributing the computation across a cluster of computers, MapReduce can solve most analytical problems involving big data. Hadoop provides an open-source way to manage big data using the MapReduce algorithm, and MapReduce is now the mainstream approach.

So, generally speaking, when you hear "big data", it probably means that Hadoop is used to store the data, and usually that MapReduce is used to extract and retrieve it.

Breaking down the MapReduce algorithm

MapReduce combines two classic functions:

Mapping applies the same operation to every element of a collection. For example, if you want to multiply each cell in a spreadsheet by two, applying that function to each cell separately is a mapping.

Reducing traverses the elements of a collection and returns a single combined result. For example, computing the sum of a column of numbers in a spreadsheet is a reduction.
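The two functions can be tried out with Python's built-in `map` and `functools.reduce`. The list `cells` is made-up sample data standing in for a spreadsheet column.

```python
from functools import reduce

cells = [3, 5, 8]  # hypothetical spreadsheet column

# Mapping: apply the same operation (multiply by two) to every element.
doubled = list(map(lambda x: x * 2, cells))

# Reducing: walk through the elements and fold them into one result (a sum).
column_total = reduce(lambda acc, x: acc + x, cells, 0)

print(doubled)       # [6, 10, 16]
print(column_total)  # 16
```

Mapping returns one output per input, while reducing collapses the whole collection into a single value.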

Revisiting the MapReduce example

Revisiting our original card-dealing example, we now have the basic method of MapReduce data analysis. (Note: this is not a rigorous example. Here, people stand in for computers, and because they work at the same time, they form a cluster. In most practical applications we assume the data is already on each computer; that is, handing out the cards is not a MapReduce step. In fact, how files are stored across a cluster of computers is the real core of Hadoop.)

By dividing the cards among multiple players and asking each of them to count, you perform the work in parallel, because every player counts at the same time. The work is also distributed, because the people solving the same problem do not need to know what their neighbors are doing.
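The "everyone counts at the same time" idea can be imitated on one machine with a thread pool. This is a rough sketch under loud assumptions: threads stand in for separate computers, the four hands are invented sample data, and each card is reduced to its suit for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

def count_spades(hand):
    """Count the spades in one hand; needs no knowledge of other hands."""
    return sum(1 for suit in hand if suit == "spades")

# Four hypothetical hands, one per player.
hands = [
    ["spades", "hearts", "clubs"],
    ["diamonds", "spades", "spades"],
    ["hearts", "hearts", "clubs"],
    ["spades", "diamonds", "clubs"],
]

# Each worker handles one hand independently of the others.
with ThreadPoolExecutor(max_workers=len(hands)) as pool:
    per_player = list(pool.map(count_spades, hands))

print(per_player)       # [1, 2, 0, 1]
print(sum(per_player))  # 4
```

Because `count_spades` only looks at its own hand, the workers never need to coordinate until the final sum.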

By telling everyone to count, you map the task of checking each card onto the players. Instead of having them hand you the spades, you ask them to reduce what you want to a single number.

Another interesting aspect is how evenly the cards are distributed. MapReduce assumes that the data are shuffled: if all the spades go to one person, that person may take much longer to count than the others.
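The effect of an uneven deal can be seen with a small experiment. This is a sketch with assumed helper names (`deal_in_blocks`, `spades_per_player`); it compares dealing a deck that is still sorted by suit with dealing the same deck after shuffling.

```python
import random

SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = [suit for suit in SUITS for _ in range(13)]  # sorted by suit

def deal_in_blocks(cards, num_players):
    """Give each player one contiguous block of the deck."""
    size = len(cards) // num_players
    return [cards[i * size:(i + 1) * size] for i in range(num_players)]

def spades_per_player(hands):
    """Report how many spades each player has to deal with."""
    return [hand.count("spades") for hand in hands]

# Unshuffled: the first player receives every spade.
unshuffled = spades_per_player(deal_in_blocks(deck, 4))

# Shuffled copy: the spades are spread across the players.
shuffled_deck = deck[:]
random.Random(0).shuffle(shuffled_deck)
shuffled = spades_per_player(deal_in_blocks(shuffled_deck, 4))

print(unshuffled)  # [13, 0, 0, 0]
print(shuffled)    # exact split depends on the shuffle; it still sums to 13
```

In the unshuffled case one player does all the spade-related work while the other three have nothing to report.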

If there are enough people, it is fairly simple to ask more interesting questions, such as "what is the average value of the cards in the deck (using blackjack values)?". You can get the answer by combining "what is the sum of the values of all the cards?" and "how many cards do we have?". Dividing the sum by the number of cards gives the average.
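A minimal sketch of the average calculation follows. The card values are an assumption for illustration: face cards count as 10 and the ace is fixed at 11, which is one common blackjack convention. The helper names `summarize` and `combine` are also invented here.

```python
from functools import reduce

# Assumed blackjack values: face cards count 10, the ace is fixed at 11.
VALUES = {"A": 11, "J": 10, "Q": 10, "K": 10}
VALUES.update({str(n): n for n in range(2, 11)})

deck = [rank for rank in VALUES for _ in range(4)]  # ranks only, four suits
hands = [deck[i::4] for i in range(4)]              # deal to four players

def summarize(hand):
    """Map step: each player reports (sum of values, number of cards)."""
    return (sum(VALUES[rank] for rank in hand), len(hand))

def combine(a, b):
    """Reduce step: add two partial (sum, count) reports together."""
    return (a[0] + b[0], a[1] + b[1])

total_value, total_count = reduce(combine, map(summarize, hands), (0, 0))
average = total_value / total_count

print(total_value, total_count)  # 380 52
print(round(average, 2))         # 7.31
```

Each player reports a sum and a count rather than a local average, because averages of hands of different sizes cannot simply be averaged again.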

Conclusion

The actual mechanism of the MapReduce algorithm is much more complex than this, but the main idea is the same: analyze large amounts of data by spreading the computation across many machines. Whether it is Facebook, NASA, or a small startup, MapReduce is currently the mainstream method for analyzing Internet-scale data. Interestingly, MapReduce tends to slow down with more than 10 PB of data, so Google reported at its I/O conference this year that MapReduce is no longer enough for them.
