The figure above is the execution flow chart given in the paper. Everything starts with the user program at the top: it links against the MapReduce library and implements the basic map and reduce functions. The numbers in the figure mark the order of execution.
1. The MapReduce library first divides the user program's input files into M pieces (M is user-defined), each typically 16 MB to 64 MB; these appear as split 0 through split 4 on the left of the figure. It then uses fork to start copies of the user program on other machines in the cluster.
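As a rough sketch of the split step (the byte-range layout and the function name make_splits are illustrative, not the library's actual I/O path), dividing one file into fixed-size pieces might look like this in Python:

```python
import os

def make_splits(path, split_size=64 * 1024 * 1024):
    """Divide one input file into (path, offset, length) byte ranges.

    64 MB is the upper end of the paper's typical 16-64 MB split size.
    """
    total = os.path.getsize(path)
    splits, offset = [], 0
    while offset < total:
        length = min(split_size, total - offset)
        splits.append((path, offset, length))
        offset += length
    return splits
```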
2. One of these copies becomes the master; the rest are workers. The master is responsible for scheduling: it assigns jobs (Map jobs or Reduce jobs) to idle workers. The number of workers can also be specified by the user.
3. A worker assigned a Map job reads the input data of the corresponding split. The number of Map jobs is determined by M and corresponds one-to-one with the splits. The Map job parses key-value pairs out of the input data and passes each one as an argument to the map function; the intermediate key-value pairs produced by the map function are buffered in memory.
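The paper's running example is word count. A minimal sketch of the map side, assuming a map function that receives a document name and its contents (the names map_fn and buffered are illustrative):

```python
def map_fn(key, value):
    """key: document name, value: document contents (word count example)."""
    for word in value.split():
        yield (word, 1)

# A Map worker calls map_fn once per input key-value pair and buffers
# the intermediate pairs in memory.
buffered = list(map_fn("doc1", "the quick brown fox the"))
# buffered == [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('the', 1)]
```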
4. The buffered intermediate key-value pairs are periodically written to local disk, partitioned into R regions. R is defined by the user, and each region will later correspond to one Reduce job. The locations of these intermediate pairs are reported to the master, which forwards the information to the Reduce workers.
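The paper's default partitioning function is hash(key) mod R. A sketch of how a Map worker might route its buffered pairs into R regions (the value R = 4 is illustrative):

```python
R = 4  # user-defined number of Reduce jobs (illustrative value)

def partition(key):
    # Python's str hash is randomized per process; a real system would
    # use a stable hash so all workers agree on the mapping.
    return hash(key) % R

regions = {i: [] for i in range(R)}
for key, value in [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('the', 1)]:
    regions[partition(key)].append((key, value))
# Every pair with a given key lands in the same region, so exactly one
# Reduce job will ever see that key.
```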
5. The master tells a worker assigned a Reduce job where the partition it is responsible for is located (there is usually more than one location, since every Map job may produce intermediate pairs for all R partitions). Once the Reduce worker has read all the intermediate key-value pairs it is responsible for, it sorts them so that pairs with the same key end up adjacent. Sorting is necessary because many different keys can map to the same partition, that is, to the same Reduce job (there are far fewer partitions than keys).
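A sketch of the sort step on a Reduce worker, using Python's standard library (the fetched data is illustrative):

```python
from itertools import groupby
from operator import itemgetter

# Pairs fetched from several Map workers for one partition (illustrative).
fetched = [('the', 1), ('fox', 1), ('the', 1), ('brown', 1)]
fetched.sort(key=itemgetter(0))  # sort so equal keys become adjacent
grouped = [(k, [v for _, v in g]) for k, g in groupby(fetched, key=itemgetter(0))]
# grouped == [('brown', [1]), ('fox', [1]), ('the', [1, 1])]
```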
6. The Reduce worker iterates over the sorted intermediate key-value pairs; for each unique key, it passes the key and the associated values to the reduce function, and the output produced by the reduce function is appended to that partition's output file.
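For word count, the reduce function simply sums the values for each key. A sketch, continuing the illustrative names from above:

```python
def reduce_fn(key, values):
    """Word count: emit the total number of occurrences of key."""
    yield str(sum(values))

# The Reduce worker calls reduce_fn once per unique key and appends each
# emitted value to its partition's output file (printed here instead).
for key, values in [('brown', [1]), ('fox', [1]), ('the', [1, 1])]:
    for out in reduce_fn(key, values):
        print(key, out)
```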
7. When all Map and Reduce jobs have completed, the master wakes up the user program, and the MapReduce call returns to the user's code.
After execution finishes, the MapReduce output sits in R output files, one per Reduce job. Users usually do not merge these R files; instead they pass them as input to another MapReduce program for further processing. Throughout the process, the input data comes from the underlying distributed file system (GFS), the intermediate data lives on the workers' local file systems, and the final output is written back to the distributed file system (GFS).

Note the distinction between a Map/Reduce job and the map/reduce function: a Map job processes one split of the input data and may call the map function many times, once per input key-value pair; a Reduce job processes the intermediate key-value pairs of one partition, calling the reduce function once for each distinct key, and each Reduce job ultimately produces one output file.
Note: between the map function and the reduce function there is a sort-and-group step, which collects all values sharing the same key and passes each aggregated (key, values) pair as the arguments to the reduce function.
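To make the job-versus-function distinction concrete, here is a single-process simulation of the whole pipeline (all names, data, and the value R = 2 are illustrative; the real system distributes these steps across machines and GFS):

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

R = 2  # number of Reduce jobs / output partitions (illustrative)

def map_fn(name, text):
    for word in text.split():
        yield (word, 1)

def reduce_fn(key, values):
    return sum(values)

splits = {"doc1": "the quick brown fox", "doc2": "the lazy dog the"}

# Map phase: each iteration is one Map job; map_fn itself is called
# once per input key-value pair inside that job.
regions = defaultdict(list)
for name, text in splits.items():
    for key, value in map_fn(name, text):
        regions[hash(key) % R].append((key, value))

# Reduce phase: each iteration is one Reduce job covering one partition;
# reduce_fn is called once per distinct key, and each job produces one
# output "file" (a dict here).
outputs = {}
for r, pairs in regions.items():
    pairs.sort(key=itemgetter(0))
    outputs[r] = {k: reduce_fn(k, [v for _, v in g])
                  for k, g in groupby(pairs, key=itemgetter(0))}

print(outputs)  # R partitioned outputs, e.g. {0: {...}, 1: {...}}
```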