2025-04-04 Update From: SLTechnology News&Howtos
Shulou (Shulou.com), 05/31
This article explains how the split policy in HBase 0.94 works and what effect it has on region splitting.
Split Policy in HBase 0.94
In versions prior to HBase 0.94, splits were governed by ConstantSizeRegionSplitPolicy: a region is split when the size of a store in the region exceeds the maximum file size specified in the configuration (hbase.hregion.max.filesize).
Starting with version 0.94, the default split policy is IncreasingToUpperBoundRegionSplitPolicy. It calculates the split threshold differently, so the configured maximum file size no longer decides on its own when a region is split; it only serves as an upper bound.
The threshold is the square of the number of regions of the table on the same region server, multiplied by the memstore flush size. A region is split once one of its stores grows beyond this value or beyond the configured maximum file size, whichever is smaller.
Assuming the memstore flush size is configured as 128 MB, the threshold with a single region is 1 * 1 * 128 MB = 128 MB, so the first split can happen as soon as the memstore has flushed its first data to an HFile.
When the number of regions reaches 2, the threshold is 2 * 2 * 128 MB = 512 MB.
When the number of regions reaches 3, the threshold is 3 * 3 * 128 MB = 1152 MB.
And so on.
When the number of regions reaches 29, the threshold is 29 * 29 * 128 MB = 107648 MB ≈ 105.1 GB. At this point the calculated split size exceeds the 100 GB we originally set for ConstantSizeRegionSplitPolicy, so from here on the threshold is capped at the configured maximum file size.
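The arithmetic above can be checked with a small standalone sketch. It only models the formula described here, not HBase's own classes. The class and method names are made up for illustration, and the 128 MB flush size and 100 GB maximum file size are the assumed values from the example.

```java
// Sketch only: models the threshold formula described above, not HBase's own classes.
final class SplitThresholdSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Split threshold in bytes: min(maxFileSize, regionCount^2 * flushSize). */
    static long sizeToCheck(int regionCount, long flushSize, long maxFileSize) {
        if (regionCount == 0) {
            return maxFileSize; // same zero-region guard as in the policy
        }
        return Math.min(maxFileSize, flushSize * ((long) regionCount * regionCount));
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;   // assumed memstore flush size
        long maxFileSize = 100 * GB; // assumed maximum region file size
        for (int n : new int[] {1, 2, 3, 28, 29, 30}) {
            long uncapped = flushSize * ((long) n * n);
            System.out.printf("regions=%d uncapped=%d MB effective=%d MB%n",
                n, uncapped / MB, sizeToCheck(n, flushSize, maxFileSize) / MB);
        }
    }
}
```

With these assumed values, the uncapped result stays below 100 GB up to 28 regions (100352 MB) and passes it at 29 regions (107648 MB), after which the effective threshold stays at 102400 MB.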
Simple analysis
This brief analysis shows that in the early stage of writing data, the policy splits the existing regions quickly, so a hot region can be spread across multiple servers from the very beginning. Because the regions are still small at that point, blocking of writes by split operations can be avoided.
In the later stage, as the number of regions and the size of each region grow, the split frequency drops rapidly, which avoids frequent splits of very large regions.
On the one hand, the policy reduces the number of region splits as the data volume grows, which meets our goal of minimizing splits and avoids their impact on writes. On the other hand, the fast splitting in the initial stage does not affect writes and reduces the need to trigger splits manually. The policy therefore appears to fit our needs, although further tests are needed to verify this.
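To make the slowdown concrete, here is a rough sketch of how much the uncapped threshold grows each time a region is added, and at which region count it reaches the configured maximum. The names are hypothetical, and the 128 MB flush size and 100 GB maximum are assumed values, not taken from any particular cluster.

```java
// Sketch only: illustrates why splits become rarer as the region count grows.
final class SplitGrowthSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Uncapped threshold in bytes for a given region count. */
    static long uncapped(int regionCount, long flushSize) {
        return flushSize * ((long) regionCount * regionCount);
    }

    /** Smallest region count whose uncapped threshold reaches maxFileSize. */
    static int firstCappedRegionCount(long flushSize, long maxFileSize) {
        int n = 1;
        while (uncapped(n, flushSize) < maxFileSize) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;   // assumed memstore flush size
        long maxFileSize = 100 * GB; // assumed maximum region file size
        for (int n = 1; n <= 5; n++) {
            long step = uncapped(n + 1, flushSize) - uncapped(n, flushSize);
            System.out.printf("%d -> %d regions: threshold grows by %d MB%n",
                n, n + 1, step / MB);
        }
        System.out.println("cap reached at "
            + firstCappedRegionCount(flushSize, maxFileSize) + " regions");
    }
}
```

Each added region raises the threshold by (2n + 1) times the flush size, so a store has to grow further and further before the next split is triggered.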
Source code
The relevant methods of IncreasingToUpperBoundRegionSplitPolicy are as follows:
    /**
     * @return Region max size or count of regions squared * flushsize, which ever is
     * smaller; guard against there being zero regions on this server.
     */
    long getSizeToCheck(final int tableRegionsCount) {
      return tableRegionsCount == 0 ? getDesiredMaxFileSize() :
        Math.min(getDesiredMaxFileSize(),
          this.flushSize * (tableRegionsCount * tableRegionsCount));
    }

    @Override
    protected boolean shouldSplit() {
      if (region.shouldForceSplit()) return true;
      boolean foundABigStore = false;
      // Get count of regions that have the same common table as this.region
      int tableRegionsCount = getCountOfCommonTableRegions();
      // Get size to check
      long sizeToCheck = getSizeToCheck(tableRegionsCount);

      for (Store store : region.getStores().values()) {
        // If any of the stores is unable to split (eg they contain reference files)
        // then don't split
        if ((!store.canSplit())) {
          return false;
        }

        // Mark if any store is big enough
        long size = store.getSize();
        if (size > sizeToCheck) {
          LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
            " size=" + size + ", sizeToCheck=" + sizeToCheck +
            ", regionsWithCommonTable=" + tableRegionsCount);
          foundABigStore = true;
          break;
        }
      }

      return foundABigStore;
    }
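The decision logic of shouldSplit() can be tried out without an HBase cluster using the simplified model below. It is a sketch only: StoreInfo is a hypothetical stand-in for HBase's Store, and the force-split check and logging are left out.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: StoreInfo is a hypothetical stand-in for HBase's Store class.
final class ShouldSplitSketch {

    /** Hypothetical, minimal view of one column-family store. */
    static final class StoreInfo {
        final String family;
        final long sizeBytes;
        final boolean canSplit;

        StoreInfo(String family, long sizeBytes, boolean canSplit) {
            this.family = family;
            this.sizeBytes = sizeBytes;
            this.canSplit = canSplit;
        }
    }

    /** Mirrors the loop in shouldSplit(): veto first, then look for a big store. */
    static boolean shouldSplit(List<StoreInfo> stores, long sizeToCheck) {
        boolean foundABigStore = false;
        for (StoreInfo store : stores) {
            if (!store.canSplit) {
                return false; // e.g. the store still holds reference files
            }
            if (store.sizeBytes > sizeToCheck) {
                foundABigStore = true;
                break; // one oversized store is enough
            }
        }
        return foundABigStore;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<StoreInfo> stores = Arrays.asList(
            new StoreInfo("cf1", 40 * mb, true),
            new StoreInfo("cf2", 200 * mb, true));
        System.out.println(shouldSplit(stores, 128 * mb));
        System.out.println(shouldSplit(stores, 512 * mb));
    }
}
```

With a 128 MB threshold the 200 MB store triggers a split; with a 512 MB threshold no store is large enough, so nothing is split. The comparison is strict, so a store exactly at the threshold does not trigger a split.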