2025-04-07 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --
This article explains the MapTask and ReduceTask process in Hadoop MapReduce. The material is straightforward and practical, so let's walk through what the MapTask and ReduceTask pipeline actually looks like.
The phase between map and reduce is called shuffle, as described in the official diagram (although that diagram is not entirely accurate).
MapTask
Each map task has a circular (ring) in-memory buffer that stores the task's output. The default size is 100 MB and can be changed via MRJobConfig.IO_SORT_MB (mapreduce.task.io.sort.mb).
Once the buffer reaches the spill threshold (MRJobConfig.MAP_SORT_SPILL_PERCENT, default 0.8), a background thread spills the contents to disk, writing the buffer to the directory specified by MRJobConfig.JOB_LOCAL_DIR.
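The spill behavior can be sketched conceptually in Python (this is a toy model of the mechanism, not Hadoop code; the capacity and threshold stand in for mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent):

```python
BUFFER_CAPACITY = 10   # stand-in for the 100 MB default
SPILL_PERCENT = 0.8    # default spill threshold

class SpillBuffer:
    def __init__(self):
        self.records = []
        self.spills = []   # each entry models one sorted spill file on disk

    def collect(self, key, value):
        self.records.append((key, value))
        # spill in the background once the buffer is 80% full
        if len(self.records) >= BUFFER_CAPACITY * SPILL_PERCENT:
            self.spill()

    def spill(self):
        if self.records:
            # each spill file is sorted by key before hitting disk
            self.spills.append(sorted(self.records))
            self.records = []

buf = SpillBuffer()
for i in range(20):
    buf.collect(f"k{i % 4}", 1)
buf.spill()  # flush whatever remains when the map task finishes
print(len(buf.spills))  # -> 3 (two threshold spills of 8 records, one final flush of 4)
```

In the real MapTask the buffer is a byte array and spilling happens concurrently with collection, but the trigger condition is the same.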
MRJobConfig.JOB_LOCAL_DIR has the value mapreduce.job.local.dir. To trace where that points, open mapred-default.xml (in hadoop-mapreduce-client-core-2.7.1.jar, under the org.apache.hadoop.mapreduce package) and search for local.dir; you will find:

mapreduce.cluster.local.dir = ${hadoop.tmp.dir}/mapred/local -- the local directory where MapReduce stores intermediate data files. It may be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored.

Now search for hadoop.tmp.dir in core-default.xml (in hadoop-common-2.7.1.jar):

hadoop.tmp.dir = /tmp/hadoop-${user.name} -- a base for other temporary directories.

So the temporary spill path resolves to /tmp/hadoop-${user.name}/mapred/local.
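The substitution chain above works like Hadoop's Configuration variable expansion. A minimal Python sketch of that resolution (the property table is hypothetical, with user.name fixed to "hadoop" for illustration):

```python
import re

# Hypothetical property table mirroring the defaults traced above.
props = {
    "user.name": "hadoop",
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapreduce.cluster.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

def resolve(key):
    value = props[key]
    # repeatedly substitute ${name} references until none remain
    while True:
        m = re.search(r"\$\{([^}]+)\}", value)
        if not m:
            return value
        value = value.replace(m.group(0), props[m.group(1)])

print(resolve("mapreduce.cluster.local.dir"))
# -> /tmp/hadoop-hadoop/mapred/local
```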
Before spilling, the data is first partitioned; within each partition the records are sorted, and if a combiner is configured it runs on the sorted output.
If there are at least three spill files (JobContext.MAP_COMBINE_MIN_SPILLS, default 3), the combiner is executed again when the spill files are merged.
Source code in MapTask.MapOutputBuffer:

    if (combinerRunner == null || numSpills < minSpillsForCombine) {
        Merger.writeFile(kvIter, writer, reporter, job);
    } else {
        combineCollector.setWriter(writer);
        combinerRunner.combine(kvIter, combineCollector);
    }
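The branch above decides whether the combiner (essentially a local reduce) runs during the merge of spill files. A Python sketch of that decision, using a word-count-style combiner (a conceptual model, not Hadoop code):

```python
MIN_SPILLS_FOR_COMBINE = 3  # mirrors MAP_COMBINE_MIN_SPILLS

def combine(records):
    # word-count style combiner: sum values per key (a local reduce)
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())

def merge_spills(spills, combiner=None):
    # merge all sorted spill runs into one sorted stream
    merged = sorted(rec for spill in spills for rec in spill)
    if combiner is None or len(spills) < MIN_SPILLS_FOR_COMBINE:
        return merged           # the Merger.writeFile path: plain merge
    return combiner(merged)     # the combinerRunner.combine path

spills = [[("a", 1), ("b", 1)], [("a", 1)], [("b", 1), ("b", 1)]]
print(merge_spills(spills, combiner=combine))
# -> [('a', 2), ('b', 3)]
```

With only two spill files the combiner is skipped and the merged records are written as-is, which matches the numSpills < minSpillsForCombine branch.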
Note: when the map task spills to disk, you can enable compression to save disk space and network I/O. Set MRJobConfig.MAP_OUTPUT_COMPRESS to true and MRJobConfig.MAP_OUTPUT_COMPRESS_CODEC to the codec class name.

For example:

    conf.set(MRJobConfig.MAP_OUTPUT_COMPRESS, "true");
    conf.set(MRJobConfig.MAP_OUTPUT_COMPRESS_CODEC, "org.apache.hadoop.io.compress.DefaultCodec");
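To see why this helps: intermediate map output (many repeated keys and values) tends to compress extremely well. A quick illustration with Python's zlib, the same DEFLATE family of compression that Hadoop's DefaultCodec uses (the simulated output below is illustrative, not real Hadoop data):

```python
import zlib

# Simulated map output: word-count pairs are highly repetitive text.
map_output = "\n".join(f"word{i % 10}\t1" for i in range(1000)).encode()
compressed = zlib.compress(map_output)

print(len(map_output), len(compressed))  # compressed is a small fraction of raw
```

The savings apply twice: less local disk written during spills, and fewer bytes copied over the network during the reduce-side shuffle.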
ReduceTask

ReduceTask reads data from each MapTask; the ReduceTask process is generally divided into five stages.
Shuffle
ReduceTask copies data remotely from each MapTask, holding it in memory and spilling to disk once a threshold is exceeded.
Merge
ReduceTask starts two background threads that merge the in-memory and on-disk data.
Sort
The MapTask outputs are merge-sorted into a single sorted stream.
Reduce
The user-defined reduce function runs on each key group.
Write
The reduce results are written to HDFS.
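The five stages can be sketched end to end with a toy word count in Python (a conceptual model of map -> shuffle -> merge/sort -> reduce -> write, not Hadoop code; the hash partitioning mimics Hadoop's HashPartitioner):

```python
from collections import defaultdict
from itertools import groupby

def map_fn(line):
    for word in line.split():
        yield word, 1

def reduce_fn(key, values):
    return key, sum(values)

lines = ["hello world", "hello hadoop", "world of hadoop"]
NUM_REDUCERS = 2

# Shuffle: each map output record is assigned to a reducer by key hash.
partitions = defaultdict(list)
for line in lines:
    for key, value in map_fn(line):
        partitions[hash(key) % NUM_REDUCERS].append((key, value))

# Merge + sort, then reduce per partition; "write" is collecting into a dict here.
results = {}
for part in partitions.values():
    part.sort()  # merge-sort stand-in: one sorted run per reducer
    for key, group in groupby(part, key=lambda kv: kv[0]):
        k, total = reduce_fn(key, (v for _, v in group))
        results[k] = total

print(sorted(results.items()))
# -> [('hadoop', 2), ('hello', 2), ('of', 1), ('world', 2)]
```

Because sorting groups identical keys together, each reducer can stream through its partition and emit one result per key, which is exactly why the sort stage precedes the reduce stage.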
At this point you should have a clearer picture of how the MapTask and ReduceTask process works; trying it out in practice is the best way to consolidate it.