2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains what Shuffle is in Hadoop. Many people are not very familiar with it, so it is shared here for your reference. We hope you learn a lot from it.
Shuffle describes the process of data output from Map Task to Reduce Task input.
Map side:
1. Each Map task has a circular memory buffer that stores the task's output. The default size is 100 MB (io.sort.mb property). Once the buffer contents reach the threshold of 0.8 (io.sort.spill.percent), a background thread spills the contents to disk, writing them to a newly created spill file under the specified directory (mapred.local.dir).
2. Before the data is written to disk, it is divided into partitions by the Partitioner and sorted by key within each partition. If a Combiner (local aggregation) is configured, it runs on the sorted output, and its result is what gets written.
3. After the last record has been written, all spill files are merged into a single partitioned and sorted output file.
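The three Map-side steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Hadoop's implementation: the names `ToyMapBuffer`, `partition_for`, and `sort_key` are hypothetical, the buffer counts records instead of bytes, spills are kept in memory rather than in files under mapred.local.dir, the spill runs synchronously rather than in a background thread, and the combiner is simply re-applied during the final merge.

```python
import heapq
from itertools import groupby


def partition_for(key, num_partitions):
    # Hypothetical, deterministic stand-in for a Partitioner.
    return sum(key.encode("utf-8")) % num_partitions


def sort_key(record):
    # Records are (partition, key, value); order by partition, then by key.
    return (record[0], record[1])


class ToyMapBuffer:
    def __init__(self, capacity=8, spill_percent=0.8, num_partitions=2, combiner=None):
        # Assumption: capacity is a record count, not a size in MB.
        self.threshold = max(1, int(capacity * spill_percent))
        self.num_partitions = num_partitions
        self.combiner = combiner
        self.records = []
        self.spills = []  # each spill is a list sorted by (partition, key)

    def collect(self, key, value):
        part = partition_for(key, self.num_partitions)
        self.records.append((part, key, value))
        if len(self.records) >= self.threshold:
            self._spill()

    def _combine(self, sorted_records):
        # Collapse all values of one (partition, key) into a single record.
        out = []
        for (part, key), group in groupby(sorted_records, key=sort_key):
            out.append((part, key, self.combiner([v for _, _, v in group])))
        return out

    def _spill(self):
        if not self.records:
            return
        self.records.sort(key=sort_key)  # step 2: partition, then sort
        spill = self._combine(self.records) if self.combiner else list(self.records)
        self.spills.append(spill)        # step 1: one new spill "file"
        self.records = []

    def close(self):
        # Step 3: flush what is left, then merge the sorted spills into one output.
        self._spill()
        merged = list(heapq.merge(*self.spills, key=sort_key))
        return self._combine(merged) if self.combiner else merged


buf = ToyMapBuffer(combiner=sum)
for word in "a b a c b a d a b c a b".split():
    buf.collect(word, 1)
final_output = buf.close()
print(len(buf.spills), final_output)
```

With a capacity of 8 records and a threshold of 0.8, a spill is triggered after every 6 collected records, so the 12 input words produce two spills that are merged into one output sorted by partition and key.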
Reduce side:
1. Each Reduce task fetches its partition of the Map output files over HTTP.
2. The TaskTracker runs the Reduce task for those partition files. In the copy phase, the Map output is copied into the Reduce task's memory or onto its local disk. The Reduce task starts copying a Map task's output as soon as that Map task completes.
3. The sort phase merges the copied Map outputs, and then the Reduce phase begins.
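The Reduce-side steps can be sketched the same way. Again this is only an illustrative model: `fetch_partition` is a hypothetical stand-in for the HTTP copy, each Map output is assumed to be an in-memory list of (partition, key, value) records already sorted by partition and key, and the merge is done entirely in memory.

```python
import heapq
from itertools import groupby


def fetch_partition(map_output, partition):
    # Hypothetical stand-in for fetching one partition of a Map output over HTTP.
    return [(key, value) for part, key, value in map_output if part == partition]


def reduce_side(map_outputs, partition, reduce_fn):
    # Copy phase: pull this reducer's partition from every finished Map task.
    segments = [fetch_partition(output, partition) for output in map_outputs]
    # Sort phase: the segments are already sorted, so this is a merge, not a full sort.
    merged = heapq.merge(*segments, key=lambda kv: kv[0])
    # Reduce phase: call the reduce function once per key with all of its values.
    results = []
    for key, group in groupby(merged, key=lambda kv: kv[0]):
        results.append((key, reduce_fn(key, [value for _, value in group])))
    return results


map_a = [(0, "b", 4), (0, "d", 1), (1, "a", 5), (1, "c", 2)]
map_b = [(0, "b", 1), (1, "a", 2), (1, "e", 7)]
print(reduce_side([map_a, map_b], 1, lambda key, values: sum(values)))
```

Because every Map output arrives already sorted, the Reduce side only needs to merge the segments before grouping values by key.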
Note: in some cases there may be no Reduce task at all. When the data can be processed completely in parallel, no shuffle is required and the job can run without any Reduce task. In that case, the only data transfer to non-local nodes is the Map tasks writing their results to HDFS.
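In Hadoop, a map-only job is one whose number of Reduce tasks is set to zero (for example with `job.setNumReduceTasks(0)` in the Java API). The difference can be illustrated with a small, hypothetical job runner: `run_job` is not a Hadoop function, and the grouping step here only stands in for the whole shuffle.

```python
from collections import defaultdict


def run_job(records, map_fn, reduce_fn=None):
    # Map phase: every input record yields zero or more (key, value) pairs.
    mapped = [pair for record in records for pair in map_fn(record)]
    if reduce_fn is None:
        # Map-only job: no partitioning, sorting, or merging takes place;
        # the Map output is the final output, in the order it was produced.
        return mapped
    # Otherwise, group values by key (the shuffle) and reduce each group.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return [(key, reduce_fn(key, groups[key])) for key in sorted(groups)]


def tokenize(line):
    return [(word, 1) for word in line.split()]


print(run_job(["b a", "a"], tokenize))                        # map-only
print(run_job(["b a", "a"], tokenize, lambda k, vs: sum(vs))) # with a reduce step
```

In the map-only call the pairs come back unsorted and ungrouped, which mirrors the note above: with no Reduce task there is nothing to shuffle.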
That is all for the article "What is Shuffle in Hadoop?" Thank you for reading! We hope it has given you a basic understanding of the topic and that the content is helpful.
© 2024 shulou.com SLNews company. All rights reserved.