2025-04-07 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article explains how to deal with data skew in MapReduce. The methods introduced here are simple, fast and practical.
When a MapReduce program runs, most reduce nodes finish, but one or a few reduce nodes run slowly, so the whole job takes a long time. This happens because one key has far more records than the other keys (sometimes a hundred or a thousand times more). The reduce node that receives this key processes much more data than the other nodes and finishes late. This is called data skew.
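To make the cause concrete, here is a small Python simulation, not Hadoop code. It assumes the formula of Hadoop's default HashPartitioner, which sends a record to reducer (hashCode & Integer.MAX_VALUE) % numReduceTasks; zlib.crc32 stands in for the key's hashcode so the result is stable across runs. The input records are made up.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    # Same formula as Hadoop's default HashPartitioner:
    # (hashCode & Integer.MAX_VALUE) % numReduceTasks.
    # crc32 is only a stable stand-in for the key's real hashcode.
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def reducer_loads(keys, num_reducers):
    """Return how many records each reducer would receive."""
    counts = Counter(partition(k, num_reducers) for k in keys)
    return [counts[r] for r in range(num_reducers)]


# Hypothetical input: "hot" occurs 1000 times, every other key once.
records = ["hot"] * 1000 + [f"user{i}" for i in range(100)]
loads = reducer_loads(records, num_reducers=4)
print(loads)  # one reducer holds all 1000 "hot" records plus its share of the rest
```

Because every record with the same key hashes to the same reducer, adding reducers does not split the "hot" key; only changing the key itself does.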
Solution:
(1) Choose a number of hash copies N, used to break up the key that has a large number of records.
(2) Process the data set that contains the heavily repeated key: append a number from 1 to N to each record's key and use the result as the new key. If this data must be joined with another data set, override the comparator class and the partitioner class. This distributes the records of that key evenly across reduce nodes.
(3) If the data must be joined with other data, every reduce node must receive the matching key. To ensure this, process the other data set, in which each key appears only once: loop from 1 to N, append each number to the key, and emit one copy of the record per new key.
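The salting steps above can be sketched as a small Python simulation (not Hadoop API code). Grouping by the salted key stands in for the custom partitioner and comparator a real job would use; N, the record layout, and the function names are hypothetical. Suffixes on the skewed side are assigned round-robin here, which is an assumption (random suffixes would also spread the key), while the other side is copied once per suffix so every salted key finds its match.

```python
from collections import defaultdict

N = 3  # hypothetical number of hash copies


def salt_skewed(records, n):
    """Append a suffix 1..n to each key of the skewed data set (round-robin)."""
    return [(f"{key}_{i % n + 1}", value) for i, (key, value) in enumerate(records)]


def replicate_other(records, n):
    """Emit each record of the other data set once per suffix 1..n."""
    return [(f"{key}_{s}", value) for key, value in records for s in range(1, n + 1)]


def reduce_side_join(left, right):
    """Group both sides by salted key and join them, as the reducers would."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    joined = []
    for salted_key, (left_vals, right_vals) in groups.items():
        original_key = salted_key.rsplit("_", 1)[0]  # strip the salt suffix
        for lv in left_vals:
            for rv in right_vals:
                joined.append((original_key, lv, rv))
    return joined, groups


# Hypothetical data: "hot" is the skewed key.
clicks = [("hot", i) for i in range(6)] + [("cold", 100)]
users = [("hot", "profile-hot"), ("cold", "profile-cold")]

salted = salt_skewed(clicks, N)
replicated = replicate_other(users, N)
joined, groups = reduce_side_join(salted, replicated)
print(len(joined))  # same 7 rows a plain join would give
```

Each of the three "hot_*" groups now carries two click records instead of one group carrying six, at the cost of the user data being emitted N times.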
The drawback is that this copies every record of the other data set N times. If that data set is large, the amount of data shuffled to the reducers becomes so huge that the cost outweighs the gain, and the slow running time is not solved.
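A rough count makes the cost explicit. Let |A| be the number of records in the skewed data set and |B| the number in the other data set; this ignores combiners and compression, which is a simplifying assumption. Salting sends each record of A through the shuffle once, but replicating B with suffixes 1 to N sends each record of B N times:

$$
S_{\text{plain}} = |A| + |B|, \qquad
S_{\text{salted}} = |A| + N\,|B|, \qquad
S_{\text{salted}} - S_{\text{plain}} = (N - 1)\,|B|
$$

For instance, with N = 10 the other data set is shuffled ten times instead of once, so the extra traffic grows linearly with both N and |B|.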
Another approach is to find common ground between the two data sets. For example, besides the join field, both data sets may contain another field with the same meaning. If that field is a number, take it modulo the number of hash copies N; if it is a string, take its hashcode modulo N. (Even for a number, you can use the hashcode to avoid too much data falling on the same reduce node.) If the values of this field are distributed evenly enough, this solves the problem described above.
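A minimal Python sketch of this second approach, assuming records that should be joined also agree on the shared field (for example, the same numeric session id in both logs). The field values, payloads, and function names are hypothetical, and zlib.crc32 stands in for a string's hashcode. Because both sides derive the new key the same way, neither side has to be replicated.

```python
import zlib
from collections import defaultdict

N = 4  # hypothetical number of hash copies


def bucket(field_value, n):
    """Map the shared field to one of n buckets."""
    if isinstance(field_value, int):
        # Numeric field: take it modulo n directly.
        return field_value % n
    # String field: a stable hash modulo n (crc32 stands in for hashCode()).
    return (zlib.crc32(str(field_value).encode("utf-8")) & 0x7FFFFFFF) % n


def composite_key(join_key, field_value, n):
    """New key = join key plus the bucket of the shared field."""
    return f"{join_key}_{bucket(field_value, n)}"


def join_on_composite(left, right, n):
    """Both sides build the key identically, so records meet without copying."""
    groups = defaultdict(lambda: ([], []))
    for key, field, value in left:
        groups[composite_key(key, field, n)][0].append(value)
    for key, field, value in right:
        groups[composite_key(key, field, n)][1].append(value)
    joined = []
    for new_key, (left_vals, right_vals) in groups.items():
        original_key = new_key.rsplit("_", 1)[0]  # strip the bucket suffix
        for lv in left_vals:
            for rv in right_vals:
                joined.append((original_key, lv, rv))
    return joined


# Hypothetical records: (join key, shared field, payload).
clicks = [("hot", 7, "click-a"), ("hot", 12, "click-b"), ("hot", 7, "click-c")]
ads = [("hot", 7, "ad-x"), ("hot", 12, "ad-y")]
joined = join_on_composite(clicks, ads, N)
print(joined)
```

The "hot" records are split between the keys "hot_3" and "hot_0"; how evenly they spread depends entirely on how evenly the shared field's values are distributed.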
Other solutions:
(1) Increase the JVM memory of the reduce tasks.
(2) Increase the number of reduce tasks.
© 2024 shulou.com SLNews company. All rights reserved.