HBASE REGION SPLIT strategy

2025-01-19 Update From: SLTechnology News&Howtos

HBase 0.94.0 introduced a convenient SplitPolicy mechanism for region splitting. With a SplitPolicy you can actively control how regions are split. The org.apache.hadoop.hbase.regionserver package contains several built-in split policies: ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, and KeyPrefixRegionSplitPolicy.

The applicable scenarios of these three split strategies can be distinguished from their names:

ConstantSizeRegionSplitPolicy: splits a region once it reaches a fixed size. The threshold is taken from the table's "MAX_FILESIZE" attribute if it is set; otherwise the hbase.hregion.max.filesize value configured in hbase-site.xml is used. In version 0.94 the default for this setting was changed to 10 * 1024 * 1024 * 1024L, that is, 10 GB. Many online articles that quote a 1 GB default for hbase.hregion.max.filesize are probably based on HBase 0.92, so check the default against the HBase version you are actually running. This was the default policy before version 0.94. Under this policy, a region of a table is split when it exceeds the configured maximum size. The split point is chosen on a "half-and-half" principle: the row key at the midpoint of the region's largest store is used as the split point.
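The precedence between the table attribute and the site-wide setting can be sketched in plain Java. This is only an illustration of the rule described above, not HBase's own code: the class and method names are hypothetical, a null table value stands for "MAX_FILESIZE not set", and the size check is applied to a single size figure as the text describes it.

```java
/** Illustrative sketch of the fixed-size split rule; names are hypothetical, not HBase APIs. */
final class ConstantSizeSplitSketch {
    /** Default of hbase.hregion.max.filesize in 0.94 as stated in the text: 10 GB. */
    static final long SITE_DEFAULT_MAX_FILESIZE = 10L * 1024 * 1024 * 1024;

    /** The table-level MAX_FILESIZE wins; null means the table does not set it. */
    static long effectiveMaxFileSize(Long tableMaxFileSize, long siteMaxFileSize) {
        return tableMaxFileSize != null ? tableMaxFileSize : siteMaxFileSize;
    }

    /** A split is due once the size exceeds the effective maximum. */
    static boolean shouldSplit(long sizeBytes, Long tableMaxFileSize, long siteMaxFileSize) {
        return sizeBytes > effectiveMaxFileSize(tableMaxFileSize, siteMaxFileSize);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // No table attribute: the 10 GB site value applies.
        System.out.println(shouldSplit(11 * gb, null, SITE_DEFAULT_MAX_FILESIZE));
        // Table attribute of 1 GB overrides the site value.
        System.out.println(shouldSplit(2 * gb, 1 * gb, SITE_DEFAULT_MAX_FILESIZE));
        // Below both limits: no split.
        System.out.println(shouldSplit(512L * 1024 * 1024, null, SITE_DEFAULT_MAX_FILESIZE));
    }
}
```

Representing "not set" as null keeps the precedence rule in one place, which makes it easy to see which of the two settings is in effect for a given table.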

IncreasingToUpperBoundRegionSplitPolicy: chooses the split threshold from the number of regions; this is the default policy in HBase 0.94. Regions produced by this policy are not equal in size: the split threshold grows as the number of regions grows. The threshold is min(R^2 * flushSize, maxFileSize), where R is the number of regions of this table on the region server that hosts the current region, flushSize is the table's "MEMSTORE_FLUSHSIZE" attribute (specified when the table is created) or, if the table does not set it, hbase.hregion.memstore.flush.size, and maxFileSize is hbase.hregion.max.filesize.

hbase.hregion.memstore.flush.size is set in hbase-site.xml; its default is 128 MB.

hbase.hregion.max.filesize is the maximum size of a single region, also set in hbase-site.xml; its default is 10 GB.

Each time, the split threshold is the smaller of these two values.

Assuming hbase.hregion.memstore.flush.size is 128 MB and hbase.hregion.max.filesize is 10 GB, the threshold grows as follows: 128 MB (R = 1), 512 MB, 1152 MB, 2 GB, 3.125 GB, 4.5 GB, 6.125 GB, 8 GB. When the number of regions reaches 9, 9 * 9 * 128 MB / 1024 = 10.125 GB exceeds 10 GB, so from then on the split size is fixed at 10 GB.
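The growth of the split threshold can be reproduced with a few lines of plain Java. This is a minimal sketch of the formula as described in this article for HBase 0.94 (R squared times the flush size, capped at the maximum file size); the class and method names are hypothetical, and other HBase versions may use a different growth formula.

```java
/** Illustrative sketch of the threshold formula described above; not HBase's own code. */
final class IncreasingSplitSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    /**
     * regionCount is R: the number of regions of the table on the same region server.
     * flushSize is MEMSTORE_FLUSHSIZE or hbase.hregion.memstore.flush.size.
     * maxFileSize is hbase.hregion.max.filesize and acts as the upper bound.
     */
    static long splitThreshold(int regionCount, long flushSize, long maxFileSize) {
        long grown = (long) regionCount * regionCount * flushSize;
        return Math.min(grown, maxFileSize);
    }

    public static void main(String[] args) {
        // Defaults quoted in the text: 128 MB flush size, 10 GB maximum file size.
        for (int r = 1; r <= 10; r++) {
            long thresholdMb = splitThreshold(r, 128 * MB, 10 * GB) / MB;
            System.out.println("R=" + r + " -> " + thresholdMb + " MB");
        }
        // From R=9 onward the printed value stays at 10240 MB (the 10 GB cap).
    }
}
```

With the defaults, the printed thresholds are 128, 512, 1152, 2048, 3200, 4608, 6272 and 8192 MB for R = 1 to 8, after which the 10 GB cap takes over.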

KeyPrefixRegionSplitPolicy: splits regions according to a row key prefix of a given length. The length is read from the table's prefix_split_key_policy.prefix_length attribute, which is a numeric value giving the prefix length.

When a split is performed, the split point is truncated to this length. My understanding is that a region is split only at a boundary where the row key prefix changes, so rows that share a prefix stay in the same region. This policy is best suited to row keys with a fixed prefix. If the table does not set prefix_split_key_policy.prefix_length, or the value is not an integer, specifying this policy has the same effect as using IncreasingToUpperBoundRegionSplitPolicy.
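The truncation step can be illustrated with a small stand-alone example. This is a sketch under assumptions, not the policy's actual source: the class and method names are hypothetical, row keys are treated as plain byte arrays, and a non-positive length simply leaves the split point unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative sketch of truncating a split point to a key prefix; not HBase's own code. */
final class KeyPrefixSplitSketch {
    /**
     * Cuts the candidate split point down to prefixLength bytes.
     * If prefixLength is not positive, the candidate is returned unchanged (assumed fallback).
     */
    static byte[] truncateToPrefix(byte[] splitPoint, int prefixLength) {
        if (prefixLength <= 0) {
            return splitPoint;
        }
        return Arrays.copyOf(splitPoint, Math.min(prefixLength, splitPoint.length));
    }

    public static void main(String[] args) {
        // Hypothetical row key: a 2-character prefix followed by an id.
        byte[] candidate = "ab12345".getBytes(StandardCharsets.UTF_8);
        byte[] truncated = truncateToPrefix(candidate, 2);
        System.out.println(new String(truncated, StandardCharsets.UTF_8)); // prints "ab"
    }
}
```

Because the split point becomes the bare prefix "ab", every row key starting with "ab" sorts at or after it and therefore lands in the same daughter region.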

The following code specifies the split policy when creating or modifying a table:

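import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy;
import org.apache.hadoop.hbase.util.Bytes;

// Update the split policy of an existing table.
// conf is an existing HBase Configuration object.
HBaseAdmin admin = new HBaseAdmin(conf);
HTable hTable = new HTable(conf, "test");
HTableDescriptor htd = hTable.getTableDescriptor();
HTableDescriptor newHtd = new HTableDescriptor(htd);
newHtd.setValue(HTableDescriptor.SPLIT_POLICY, KeyPrefixRegionSplitPolicy.class.getName()); // specify the policy
newHtd.setValue("prefix_split_key_policy.prefix_length", "2");
newHtd.setValue("MEMSTORE_FLUSHSIZE", "5242880"); // 5 MB
// The table is disabled while its descriptor is modified, then re-enabled.
admin.disableTable("test");
admin.modifyTable(Bytes.toBytes("test"), newHtd);
admin.enableTable("test");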

The region split policy currently used by HBase 1.0.1.1 is IncreasingToUpperBoundRegionSplitPolicy.

This can be verified as follows: inspecting the TDC_TWEETS_201604 table through the HBase web UI shows that the table has been split into 18 regions.

Checking the size of each region with the hadoop command shows that the largest is 7.4 GB and the smallest is 88 MB, which is consistent with this split logic.
