HBase Internals: All the Details of Region Splitting

Automatic Region splitting is one of the most important reasons HBase scales so well, and automatic sharding is something every distributed system that pursues unlimited scalability has to get right. How does HBase implement automatic Region splitting? There are many details involved: What triggers a Region split? Where is the split point? How is a split performed so that Region availability is maximized? How are exceptions during a split handled? Is any data moved during a split? This article walks through all of these details. On the one hand, it gives a deeper understanding of automatic Region splitting in HBase; on the other hand, if you ever need to build similar functionality, HBase's implementation is a good reference.

Region split trigger policies

As of the latest stable release (1.2.6), HBase ships with no fewer than six split trigger policies. Each has its own applicable scenario, and users can choose a split policy per table according to the workload. The common split policies are described below.

ConstantSizeRegionSplitPolicy: the default split policy before version 0.94. This is the easiest policy to understand but also the most misleading. Read literally, it suggests a split is triggered once the region grows beyond a threshold (hbase.hregion.max.filesize). That is not what happens: in the actual implementation the threshold applies to a store, meaning a split is triggered once the largest store in the region exceeds the configured threshold. Another common question is whether the store size here is the compressed or uncompressed total file size; in the implementation it is the size of the files as stored, i.e. the compressed size when compression is in use. ConstantSizeRegionSplitPolicy is the most obvious design, but in production it has a serious drawback: it makes no distinction between large and small tables. A large threshold (hbase.hregion.max.filesize) is friendly to large tables, but small tables may never trigger a split, in the extreme ending up with a single region, which is bad for the business. A small threshold is friendly to small tables, but a large table then produces a huge number of regions across the whole cluster, which is bad for cluster management, resource usage, and failover.
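To make the decision concrete, here is a minimal sketch of the check just described. It is not HBase's real class (that lives in org.apache.hadoop.hbase.regionserver); the storeSizes array is a stand-in for the per-store on-disk sizes:

public class ConstantSizeSketch {
    // 10 GB, the hbase.hregion.max.filesize default in 1.2.x
    static final long MAX_FILE_SIZE = 10L * 1024 * 1024 * 1024;

    // storeSizes: on-disk (compressed) byte counts, one per store in the region
    static boolean shouldSplit(long[] storeSizes) {
        long largest = 0;
        for (long size : storeSizes) {
            largest = Math.max(largest, size);
        }
        // The threshold applies to the LARGEST store, not the whole region
        return largest > MAX_FILE_SIZE;
    }
}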

IncreasingToUpperBoundRegionSplitPolicy: the default split policy from version 0.94 through 2.0. This policy is slightly more involved, but broadly it works like ConstantSizeRegionSplitPolicy: a split triggers when the largest store in the region exceeds a threshold. The difference is that the threshold is not a fixed value; it is continually adjusted according to the number of regions the table currently has on the same regionserver: threshold = (#regions)^3 * flush size * 2. The threshold does not grow without bound, of course; it is capped at the user-configured MaxRegionFileSize. This policy fixes the shortcoming of ConstantSizeRegionSplitPolicy and adapts to both large and small tables. On large clusters it works well for the many large tables, but it is still not perfect: many small tables then produce large numbers of tiny regions scattered across the whole cluster, and region migration may also trigger further region splits.

SteppingSplitPolicy: the default split policy in version 2.0. The split threshold changes yet again and is simpler than IncreasingToUpperBoundRegionSplitPolicy's, while still depending on the number of regions the table being split has on the current regionserver: if there is exactly one region, the threshold is flush size * 2; otherwise it is MaxRegionFileSize. For both large and small tables on big clusters this is friendlier than IncreasingToUpperBoundRegionSplitPolicy: small tables no longer spawn large numbers of tiny regions, just as many as needed.
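The two newer policies differ from ConstantSizeRegionSplitPolicy only in how the threshold is computed. A sketch of both formulas, assuming regionCount is the number of regions this table has on the current regionserver (both methods are illustrations, not the real HBase code):

public class SplitThresholdSketch {
    // IncreasingToUpperBoundRegionSplitPolicy:
    // threshold = min((#regions)^3 * 2 * flushSize, maxFileSize)
    static long increasingToUpperBound(int regionCount, long flushSize, long maxFileSize) {
        long cubed = (long) regionCount * regionCount * regionCount;
        return Math.min(maxFileSize, cubed * 2 * flushSize);
    }

    // SteppingSplitPolicy: 2 * flushSize while the table has a single region
    // on this server, then jump straight to maxFileSize
    static long stepping(int regionCount, long flushSize, long maxFileSize) {
        return regionCount == 1 ? 2 * flushSize : maxFileSize;
    }
}

With the default 128 MB flush size, IncreasingToUpperBoundRegionSplitPolicy fires its first split at 256 MB, the next round at 2 GB (2^3 * 256 MB), and so on until the cap is reached.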

There are other split policies besides these. DisableSplitPolicy, for instance, prevents a region from splitting at all, while KeyPrefixRegionSplitPolicy and DelimitedKeyPrefixRegionSplitPolicy keep the default trigger behavior but have their own opinions about the split point; KeyPrefixRegionSplitPolicy, for example, requires that rows sharing the same PrefixKey stay in the same region.

In practice the default split policy is usually fine, but the region split policy can also be set at the cf level. The command is:

create 'table', {NAME => 'cf', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}

Preparing for a Region split: finding the SplitPoint

The split policy triggers the region split, and the first thing to do once the split begins is to find the split point, the splitpoint. All the default split policies, whether ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, or SteppingSplitPolicy, define the split point the same way. (A user can of course specify an explicit split point when splitting manually; that case is not covered here.)

So how is the split point located? It is the first rowkey of the block closest to the midpoint of the largest store file in the largest store of the entire region: a sentence worth reading twice. For example, if the largest file of the largest store has its middle block starting at rowkey 'm', then 'm' is the split point. HBase additionally stipulates that if the located rowkey is the first or last rowkey of the entire file, there is no split point.

When is there no split point? The most common case is a file with only one block: execute split and you will find the region cannot be split. Newcomers often hit this when testing split: create a new table, insert a few rows, flush, then split, and find to their surprise that the table does not actually split. This is why. Looking closely at the debug log at that moment, you will see a message to the effect that the region cannot be split because the midkey is the same as the first or last row of the file.
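As a sketch of the selection rule and the no-split-point case (the FileStub interface below is hypothetical, standing in for what HBase actually reads out of an HFile's block index):

import java.util.Arrays;

public class SplitPointSketch {
    // Hypothetical per-file view of the rowkeys HBase consults
    interface FileStub {
        byte[] midBlockFirstRowKey(); // first rowkey of the block at the file's midpoint
        byte[] firstRowKey();
        byte[] lastRowKey();
    }

    // Returns the split point, or null when the region is not splittable
    static byte[] findSplitPoint(FileStub largestFileOfLargestStore) {
        byte[] mid = largestFileOfLargestStore.midBlockFirstRowKey();
        if (Arrays.equals(mid, largestFileOfLargestStore.firstRowKey())
                || Arrays.equals(mid, largestFileOfLargestStore.lastRowKey())) {
            return null; // e.g. a single-block file: no usable split point
        }
        return mid;
    }
}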

The core Region split process

HBase wraps the whole split in a transaction in order to guarantee atomicity. The split transaction runs in three phases: prepare - execute - (rollback). The template is as follows.

Prepare phase: initialize the two daughter regions in memory. Concretely, two HRegionInfo objects are generated, carrying tableName, regionName, startkey, endkey, and so on. A transaction journal is also created; this object records the progress of the split and appears again in the rollback phase.
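In code, the prepare phase boils down to something like the following (assuming the HBase 1.x HRegionInfo API; no data is touched at this point):

import org.apache.hadoop.hbase.HRegionInfo;

public class PrepareSketch {
    // Build the in-memory metadata for the two daughters: daughter A covers
    // [startkey, splitRow) and daughter B covers [splitRow, endkey)
    static HRegionInfo[] prepareDaughters(HRegionInfo parent, byte[] splitRow) {
        HRegionInfo daughterA = new HRegionInfo(parent.getTable(), parent.getStartKey(), splitRow);
        HRegionInfo daughterB = new HRegionInfo(parent.getTable(), splitRow, parent.getEndKey());
        return new HRegionInfo[] { daughterA, daughterB };
    }
}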

Execute phase: the core work of the split, illustrated in the original article by a figure from Hortonworks. The steps are:

1. The regionserver changes the region's state in the ZK node /region-in-transition to SPLITTING.

2. The master learns of the region state change through its watch on /region-in-transition and updates the region's state in memory. In the RIT module of the master web UI you can see the region's split status.

3. Create a new temporary folder named .split under the parent region's storage directory, to hold the daughter region information produced by the split.

4. Close the parent region: the parent region stops accepting writes and triggers a flush so that all data written to the region so far is persisted to disk. For a short window, client requests that land on the parent region throw NotServingRegionException.

5. The core split step: create two new subfolders under the .split folder, called daughter A and daughter B, and generate reference files inside them, each pointing to a corresponding file in the parent region. This step is the heart of the whole process. A reference file appears in the log like this:

2017-08-12 11:53:15 DEBUG [StoreOpener-0155388346c3c919d3f05d7188e885e0-1] regionserver.StoreFileInfo: reference 'hdfs://hdfscluster/hbase-rsgroup/data/default/music/0155388346c3c919d3f05d7188e885e0/cf/d24415c4fb44427b8f698143e5c4d9dc.00bb6239169411e4d0ecb6ddfdbacf66' to region=00bb6239169411e4d0ecb6ddfdbacf66 hfile=d24415c4fb44427b8f698143e5c4d9dc

The reference file is named d24415c4fb44427b8f698143e5c4d9dc.00bb6239169411e4d0ecb6ddfdbacf66, a rather special-looking format. What does the name mean? Look at the parent region file that the reference points to: the log shows the parent of the split is region 00bb6239169411e4d0ecb6ddfdbacf66 and the file being split is d24415c4fb44427b8f698143e5c4d9dc. The reference file name is thus highly informative: it is the referenced hfile name, followed by a dot, followed by the encoded name of the parent region.

Also pay attention to the content of the reference file. It is a reference file (not a linux link file), and its content is clearly not user data. It is in fact very simple, consisting of two parts: the split point splitkey, and a boolean (true or false), where true means the reference covers the upper half (top) of the parent file and false the lower half (bottom). Why store these two things? More on that below.
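Modeled as a tiny class, the entire content of a reference file is just these two fields (an illustration; the real class is org.apache.hadoop.hbase.io.Reference):

public class ReferenceSketch {
    final byte[] splitKey; // the split point of the parent region
    final boolean top;     // true = upper half (top) of the parent file, false = lower half (bottom)

    ReferenceSketch(byte[] splitKey, boolean top) {
        this.splitKey = splitKey;
        this.top = top;
    }
}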

You can inspect a reference file's details yourself with the hadoop command:

hadoop dfs -cat /hbase-rsgroup/data/default/music/0155388346c3c919d3f05d7188e885e0/cf/d24415c4fb44427b8f698143e5c4d9dc.00bb6239169411e4d0ecb6ddfdbacf66

6. After the parent region splits into the two daughters, copy daughter A and daughter B into the HBase root directory, forming two new regions.

7. Take the parent region offline: hbase:meta is updated and the parent stops serving. Its row in the meta table is not deleted immediately after it goes offline; instead the split and offline columns are set to true and the two daughter regions are recorded. Why not delete it right away? More on that below.

8. Open daughter A and daughter B: hbase:meta is updated again, and the two daughters formally begin serving requests.

Rollback phase: executed if an exception occurs during the execute phase. To make rollback possible, the whole split is divided into many sub-stages, and the rollback routine cleans up the corresponding intermediate data according to which sub-stage has been reached. In the code, JournalEntryType marks each sub-stage.
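The mechanism can be pictured as a journal that is replayed backwards on failure. The stage names below are illustrative, not HBase's actual JournalEntryType values:

import java.util.ArrayDeque;
import java.util.Deque;

public class SplitJournalSketch {
    enum Stage { SET_SPLITTING_IN_ZK, CREATED_SPLIT_DIR, CLOSED_PARENT, CREATED_DAUGHTERS }

    private final Deque<Stage> journal = new ArrayDeque<>();

    // Record each sub-stage as it completes
    void record(Stage s) { journal.push(s); }

    // On failure, undo the completed stages in reverse order
    void rollback() {
        while (!journal.isEmpty()) {
            switch (journal.pop()) {
                case CREATED_DAUGHTERS:   /* delete daughter dirs */  break;
                case CLOSED_PARENT:       /* reopen parent region */  break;
                case CREATED_SPLIT_DIR:   /* remove .split folder */  break;
                case SET_SPLITTING_IN_ZK: /* reset ZK node state */   break;
            }
        }
    }
}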

Transactional guarantees for Region splits

A region split is a complex process involving many sub-steps: splitting the parent region's HFiles, generating the two daughter regions, changing the system meta metadata, and so on. The whole process must therefore be transactional: either the split succeeds completely or it never happens at all; in no case may it be left half finished.

To achieve this, HBase uses a state machine (see the SplitTransaction class) to record the state of each sub-step of the split, so that on an exception the system can decide whether and how to roll back based on the current state. Unfortunately, in the current implementation these intermediate states live only in memory, so if the regionserver goes down mid-split, the split can be left in an intermediate, RIT state. In that case the hbck tool is needed to inspect the situation and work out a repair. From version 2.0, HBase introduces a new distributed transaction framework, Procedure V2 (HBASE-12439). The new framework uses HLog to persist the intermediate states of these single-node transactions (DDL operations, Split operations, Move operations, and so on), so even if a participant goes down during the transaction, the HLog can be used to roll the transaction back or retry the commit, greatly reducing, even eliminating, RIT. This is the most anticipated availability highlight of 2.0!

The impact of Region splits on other modules

From the walkthrough of the split process we know that a split moves no data, so the split itself is cheap and completes quickly. The daughter regions' files contain no user data at all, only a little metadata: the split point rowkey and so on. That raises three questions: how is data found through the reference files? When is the daughter regions' data actually migrated? And once migration completes, when is the parent region deleted?

How do I find data through reference files?

This is where the real meaning of the reference file's name and content shows up. The lookup works as follows:

(1) From the reference file name (parent region name + real file name), work out the path of the file that holds the real data.

(2) Having located the real data file, can the whole file simply be scanned for the KVs being queried? No. A reference file covers only half of the data file, bounded by the split point: either the top half or the bottom half. Which half, and where is the split point? Recall the content of the reference file described above: both answers are recorded right there in the file.
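Putting steps (1) and (2) together: a sketch of decoding the reference file name from the log above back into the parent data file's path (the directory layout string is the one visible in the log, not an authoritative rule):

public class ReferenceNameSketch {
    // "<hfile>.<parent region>" -> path of the real data file in the parent
    static String parentHFilePath(String tableDir, String family, String refFileName) {
        int dot = refFileName.lastIndexOf('.');
        String hfile = refFileName.substring(0, dot);         // d24415c4fb44427b8f698143e5c4d9dc
        String parentRegion = refFileName.substring(dot + 1); // 00bb6239169411e4d0ecb6ddfdbacf66
        return tableDir + "/" + parentRegion + "/" + family + "/" + hfile;
    }

    public static void main(String[] args) {
        System.out.println(parentHFilePath(
            "/hbase-rsgroup/data/default/music", "cf",
            "d24415c4fb44427b8f698143e5c4d9dc.00bb6239169411e4d0ecb6ddfdbacf66"));
        // -> /hbase-rsgroup/data/default/music/00bb6239169411e4d0ecb6ddfdbacf66/cf/d24415c4fb44427b8f698143e5c4d9dc
    }
}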

When is the parent region's data migrated to the daughter region directories?

The answer: when a major_compaction runs on the daughter region. A compaction reads all the small files in a store, KV by KV, writes them sequentially into one large file, and deletes the small files when done, so compaction already reads and writes a lot of data. When a daughter region executes a major_compaction, all the data in the parent directory that belongs to that daughter is read out and written into data files in the daughter's own directory. Data migration is thus deferred to the compaction stage and happens as a side effect.
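So rather than waiting, a major compaction can be requested manually once the split settles. A sketch with the HBase 1.x client API, using the music table from the earlier log:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TriggerMajorCompaction {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Asynchronous request: the daughters rewrite the referenced
            // parent data into their own files as the compaction runs
            admin.majorCompact(TableName.valueOf("music"));
        }
    }
}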

When will the parent region be deleted?

HMaster runs a thread (the CatalogJanitor) that periodically iterates over all parent regions in the splitting state to determine whether each can be cleaned up. It first finds all rows in the meta table whose split column is true and loads the two daughter regions recorded at split time (the splitA and splitB columns in the meta table). Then it only needs to check whether the two daughters still contain reference files: once neither does, the parent region's files can be deleted. Recall the parent's row in the meta table from step 7 above; this is exactly why that information is kept around.
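A sketch of that check. The real CatalogJanitor scans HDFS and the meta table; this illustration just pattern-matches the two-part "hfile.parentRegion" reference name in a local directory tree:

import java.io.File;

public class ParentCleanupSketch {
    // Reference files carry the two-part "<hfile>.<parent region>" name
    static boolean hasReferenceFiles(File daughterRegionDir) {
        File[] familyDirs = daughterRegionDir.listFiles(File::isDirectory);
        if (familyDirs == null) return false;
        for (File family : familyDirs) {
            File[] files = family.listFiles();
            if (files == null) continue;
            for (File f : files) {
                if (f.getName().matches("[0-9a-f]+\\.[0-9a-f]+")) {
                    return true; // looks like a reference file
                }
            }
        }
        return false;
    }

    // The parent's files may be deleted only once NEITHER daughter references them
    static boolean parentIsDeletable(File daughterA, File daughterB) {
        return !hasReferenceFiles(daughterA) && !hasReferenceFiles(daughterB);
    }
}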

Pitfalls of the split module in production

Users sometimes report that some regions in the cluster stay in RIT for a long time, with the region state stuck at splitting. In general, the advice is to run hbck to see what errors are reported, and then repair them with the tools hbck provides. hbck offers several commands for repairing regions stuck in the split RIT state, chiefly:

-fixSplitParents Try to force offline split parents to be online.

-removeParents Try to offline and sideline lingering parents and keep daughter regions.

-fixReferenceFiles Try to offline lingering reference store files

One of the most common problems is:

ERROR: Found lingering reference file hdfs://mycluster/hbase/news_user_actions/3b3ae24c65fc5094bc2acfebaa7a56de/meta/0f47cda55fa44cf9aa2599079894aed6.b7b3faab86527b88a92f2a248a54d3dc

In short, this error means the parent region file that the reference file points to no longer exists. Checking the logs may then turn up an exception like this:

java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/news_user_actions/b7b3faab86527b88a92f2a248a54d3dc/meta/0f47cda55fa44cf9aa2599079894aed6

Why would the parent region's file inexplicably be missing? After discussion with colleagues, it was confirmed to be caused by an upstream bug; see HBASE-13331 for details. The issue: when HMaster checks whether a parent directory can be deleted, if checking a reference file (does it exist, can it be opened normally) throws an IOException, the function reports that there are no reference files, causing the parent region to be deleted. The safe behavior would be to report that reference files still exist, keep the parent region, and log the problem for manual inspection. If you hit a similar problem, look at that issue, and either backport the fix to your running version or upgrade.

Conclusion

Thanks for reading. Criticism and corrections of any shortcomings are most welcome.
