Shulou (Shulou.com) 06/01 report. Updated 2025-02-02; source: SLTechnology News&Howtos.
This article explains how Region splitting works in HBase 1.x. The content is quite detailed; interested readers can use it as a reference, and I hope it helps.
Part I: Implementation of Region Split
When a client writes data to HBase, it first obtains the location of the meta table (hbase:meta) from ZooKeeper, looks up the region that owns the rowkey, and sends the write to the RegionServer hosting that region. Inside the region, the data goes to the Store of the target column family: if the WAL (write-ahead log) is enabled, the edit is written to the WAL first and then to the MemStore. As writes accumulate, a flush is triggered once the region's MemStore reaches hbase.hregion.memstore.flush.size (128 MB by default), and its contents are written to disk as a new StoreFile (HFile). As StoreFiles pile up, the RegionServer compacts them into fewer, larger files. The amount of data stored in the region changes after every flush or compaction, and at that point the RegionServer decides, based on the configured region split policy, whether to request a split.
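The write path above (WAL, then MemStore, then a flush to a StoreFile, then a split check) can be modelled with a toy Store. This is a minimal sketch under simplifying assumptions: sizes are plain integers in MB, one Store stands in for the whole region, compaction just merges files, and `ToyStore`, `write` and `compact` are hypothetical names, not HBase APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List

FLUSH_SIZE_MB = 128  # default hbase.hregion.memstore.flush.size


@dataclass
class ToyStore:
    """Toy Store for one column family; all sizes are in MB (hypothetical model)."""
    wal: List[int] = field(default_factory=list)            # logged edit sizes
    memstore_mb: int = 0                                    # unflushed data
    storefiles_mb: List[int] = field(default_factory=list)  # flushed StoreFiles

    def size_mb(self) -> int:
        """Size used by the split check: total size of the StoreFiles."""
        return sum(self.storefiles_mb)


def write(store: ToyStore, edit_mb: int, wal_enabled: bool,
          should_split: Callable[[ToyStore], bool]) -> bool:
    """Apply one edit; return True if the post-flush split check fires."""
    if wal_enabled:
        store.wal.append(edit_mb)           # 1. WAL first (if enabled)
    store.memstore_mb += edit_mb            # 2. then the MemStore
    if store.memstore_mb >= FLUSH_SIZE_MB:  # 3. flush once the MemStore is full
        store.storefiles_mb.append(store.memstore_mb)
        store.memstore_mb = 0
        return should_split(store)          # 4. split check after the flush
    return False


def compact(store: ToyStore) -> None:
    """Merge all StoreFiles into one larger file (total size unchanged in this toy)."""
    if len(store.storefiles_mb) > 1:
        store.storefiles_mb = [sum(store.storefiles_mb)]


store = ToyStore()
fired = [write(store, 32, True, lambda s: s.size_mb() >= 256) for _ in range(8)]
compact(store)
print(fired[-1], store.storefiles_mb)  # True [256]
```

Eight 32 MB edits produce two 128 MB flushes; the split check only fires after the second one, when the Store reaches the (arbitrary) 256 MB threshold used here.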
The split policy is configured in hbase-site.xml. In HBase 1.x the default split policy is IncreasingToUpperBoundRegionSplitPolicy:
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy</value>
</property>
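To check which policy a given hbase-site.xml actually selects, the file can be read with the standard library. This is a sketch that only inspects one file: HBase itself also layers hbase-default.xml underneath, and a table-level split policy set in the table descriptor takes precedence over this property. `configured_split_policy` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SPLIT_POLICY_KEY = "hbase.regionserver.region.split.policy"
# Fallback when the file does not set the property (the 1.x default named above).
DEFAULT_SPLIT_POLICY = (
    "org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy")


def configured_split_policy(site_xml: str) -> str:
    """Return the split policy class named in an hbase-site.xml document."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == SPLIT_POLICY_KEY:
            return (prop.findtext("value") or "").strip()
    return DEFAULT_SPLIT_POLICY


site = """<configuration>
  <property>
    <name>hbase.regionserver.region.split.policy</name>
    <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
  </property>
</configuration>"""
print(configured_split_policy(site))
```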
Whether to split a region is decided locally by the RegionServer, but many participants coordinate the split. The RegionServer notifies the HMaster before and after the split, updates the .META. table (hbase:meta), and rearranges the directory structure and data files in HDFS. It also keeps a journal of the execution state so that the transaction can be rolled back if an error occurs. The following flow chart from the official HBase documentation shows how a region split is carried out; operations performed by RegionServers or the Master are shown in red, and operations performed by clients in green:
1. The RegionServer decides locally to split the region and starts the split transaction. As a first step, it acquires a shared read lock on the table to prevent schema modifications during the split. It then creates a znode at /hbase/region-in-transition/region-name in ZooKeeper and sets the znode's state to SPLITTING.
2. The Master learns about this znode, because it has a watcher on the region-in-transition znode.
3. The RegionServer creates a subdirectory named .splits under the parent region's directory in HDFS.
4. The RegionServer closes the parent region and marks it offline in its local data structures. The region being split is now offline: client requests to the parent region throw a NotServingRegionException, and clients retry with some backoff. The closing region is flushed.
5. The RegionServer creates region directories for child regions A and B under the .splits directory, together with the necessary data structures. It then "splits" the store files, in the sense that it creates two reference files for each store file in the parent region. These reference files point to the files in the parent region.
6. The RegionServer creates the actual region directories in HDFS and moves the reference files for each child region into them.
7. The RegionServer sends a Put request to the .META. table that sets the parent region offline and adds information about the child regions. At this point the child regions do not yet have separate entries in .META.: clients scanning .META. will see that the parent region has been split, but will not know about the child regions until they appear in .META.. If this Put to .META. succeeds, the parent region is effectively split. If the RegionServer fails before this RPC succeeds, the Master and the next RegionServer that opens the region clean up the dirty state left by the split. Once .META. has been updated, however, the Master rolls the split forward.
8. The RegionServer opens child regions A and B in parallel.
9. The RegionServer adds child regions A and B to .META., together with information that it hosts them. The child regions are now online. From this point on, clients can discover the new regions and send requests to them. Clients cache .META. entries locally, but when they make a request to the RegionServer or to .META., their cached entries are invalidated and they learn about the new regions from .META..
10. The RegionServer updates the znode /hbase/region-in-transition/region-name in ZooKeeper to the state SPLIT so that the Master can learn about it. The balancer is free to reassign the child regions to other RegionServers if necessary. The split transaction is now complete.
11. After the split, .META. and HDFS still contain references to the parent region. These references are removed when compactions in the child regions rewrite the data files. Garbage-collection tasks on the Master periodically check whether the child regions still refer to the parent region's files; once they no longer do, the parent region is removed.
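The execution journal and the rule "roll back before the .META. update, roll forward after it" can be illustrated with a small journaled runner. This is a sketch of the general technique, not HBase's split transaction code: the step names, `run_split` and the string statuses are hypothetical, and a step that fails is assumed to have cleaned up after itself, so only completed steps are undone.

```python
from typing import Callable, List, Tuple

# (journal name, action, undo action); names used below are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_split(steps: List[Step], ponr: str) -> Tuple[str, List[str]]:
    """Run steps in order, journaling each completed one.

    If a step fails before the point-of-no-return step `ponr` has completed,
    the journaled steps are undone in reverse order. If it fails afterwards,
    nothing is undone and the split is left to be rolled forward.
    """
    journal: List[str] = []
    undo_stack: List[Callable[[], None]] = []
    for name, action, undo in steps:
        try:
            action()
        except Exception:
            if ponr in journal:
                return "roll-forward", journal
            for undo_fn in reversed(undo_stack):
                undo_fn()
            return "rolled-back", journal
        journal.append(name)
        undo_stack.append(undo)
    return "done", journal


def fail() -> None:
    raise OSError("simulated failure")


log: List[str] = []
status, journal = run_split([
    ("SET_SPLITTING", lambda: log.append("+znode"), lambda: log.append("-znode")),
    ("CREATE_SPLIT_DIR", lambda: log.append("+.splits"), lambda: log.append("-.splits")),
    ("CLOSE_PARENT", fail, lambda: log.append("reopen parent")),
], ponr="UPDATE_META")
print(status, log)
```

Here the failure happens before the .META. update, so the two completed steps are undone in reverse order.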
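Steps 5 and 11 rely on reference files: a split copies no data, each child only points at one half of every parent store file, and the parent can be deleted once compactions have replaced all references with real files. The sketch below models that idea with in-memory lists; `Reference`, `split_store_files`, `read_reference` and `parent_removable` are hypothetical names, and the top/bottom convention (child B sees rows at or after the split key) is an assumption of this model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Cell = Tuple[bytes, str]  # (rowkey, value)


@dataclass(frozen=True)
class Reference:
    """Pointer to one half of a parent store file; no data is copied."""
    parent_file: str
    split_key: bytes
    top: bool  # True: rows >= split_key (child B); False: rows < split_key (child A)


def split_store_files(parent_files: List[str], split_key: bytes):
    """Create two references per parent store file, one for each child."""
    child_a = [Reference(f, split_key, top=False) for f in parent_files]
    child_b = [Reference(f, split_key, top=True) for f in parent_files]
    return child_a, child_b


def read_reference(ref: Reference, files: Dict[str, List[Cell]]) -> List[Cell]:
    """Cells of the parent file that are visible through this reference."""
    cells = files[ref.parent_file]
    if ref.top:
        return [c for c in cells if c[0] >= ref.split_key]
    return [c for c in cells if c[0] < ref.split_key]


def parent_removable(*children: List[Union[Reference, str]]) -> bool:
    """GC check: the parent can go once no child still holds a Reference."""
    return not any(isinstance(f, Reference) for child in children for f in child)


files = {"hf1": [(b"a", "1"), (b"m", "2"), (b"z", "3")]}
a, b = split_store_files(["hf1"], b"m")
print(read_reference(a[0], files), read_reference(b[0], files))
```

After compaction rewrites child A's data into its own file (represented here by a plain file name such as "hfA"), child B still holds a reference, so the parent must stay until B is compacted too.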
Part II: Region Split Methods
There are three main ways to split regions: pre-splitting, automatic splitting, and manual forced splitting.
1. Pre-splitting:
Regions are split in advance, when the table is created. Pre-partitioning according to the expected distribution of the data reduces rowkey hot spots on the one hand, and on the other hand reduces the short periods of unavailability caused by region splits.
There are two ways to pre-split:
Method 1:
hbase org.apache.hadoop.hbase.util.RegionSplitter table_spilt_test1 HexStringSplit -c 10 -f info1:info2:info3
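Table name: table_spilt_test1
Split algorithm: HexStringSplit (split points spread evenly over 8-character hex strings from 00000000 to FFFFFFFF)
Number of regions (-c): 10
Column families (-f, colon-separated): info1, info2, info3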
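HexStringSplit spreads split points evenly over the 8-character hex keyspace, so it only balances load if rowkeys start with a uniformly distributed hex string (for example the first 8 characters of `hashlib.md5(key).hexdigest()`). The sketch below reproduces that even division under our own assumptions (inclusive range 00000000 to FFFFFFFF, last region absorbing the remainder, lowercase hex); verify the exact keys against the RegionSplitter version you run. `hex_string_splits` is a hypothetical name.

```python
from typing import List


def hex_string_splits(num_regions: int, first: int = 0x00000000,
                      last: int = 0xFFFFFFFF, width: int = 8) -> List[str]:
    """Split keys dividing [first, last] into num_regions equal hex ranges.

    Returns num_regions - 1 keys; the last region absorbs any remainder.
    """
    if num_regions < 2:
        raise ValueError("need at least 2 regions to have a split key")
    step = (last - first + 1) // num_regions
    return [format(first + step * i, "0{}x".format(width))
            for i in range(1, num_regions)]


print(hex_string_splits(10))
```

With `-c 10`, this yields nine split keys and therefore ten regions, matching the command above.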
Method 2:
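Specify the split keys yourself in the HBase shell. Each key becomes the start rowkey of a pre-split region (the first region starts at the empty key), so three keys create four regions. Here test_table is the table name and info1 is a column family:
create 'test_table', 'info1', SPLITS => ['1001', '2001', '3001']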
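Rowkeys are compared as unsigned bytes, not as numbers, which matters when choosing split keys such as '1001'. The sketch below finds which pre-split region a rowkey lands in, assuming region i covers [split_keys[i-1], split_keys[i]) with inclusive start keys; Python's `bytes` comparison is also lexicographic over unsigned bytes. `region_index` is a hypothetical helper, not an HBase API.

```python
from bisect import bisect_right
from typing import List


def region_index(split_keys: List[bytes], row: bytes) -> int:
    """Index of the region that holds `row` (start keys are inclusive)."""
    return bisect_right(split_keys, row)


keys = [b"1001", b"2001", b"3001"]
rows = (b"0500", b"1001", b"2500", b"999", b"10000")
print([region_index(keys, r) for r in rows])  # [0, 1, 2, 3, 0]
```

Note the last two rows: '999' sorts after '3001' and '10000' sorts before '1001', so numeric rowkeys of varying length should be zero-padded to a fixed width if they are meant to follow numeric order.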
2. Automatic splitting:
The maximum region size defaults to 10 GB; once a region grows beyond it, the region is split automatically. The limit is controlled by the parameter below. In production, if the table has been pre-split, each region is expected to hold a relatively large amount of data, and the limit can be raised to 20 GB or 30 GB:
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
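The property takes a byte count, so "10G" is 10 × 1024³ = 10737418240 bytes. A small helper (hypothetical names, plain arithmetic) can compute the values for 20 GB or 30 GB and render the property block:

```python
def gib_to_bytes(gib: int) -> int:
    """Convert GiB to the byte count that hbase.hregion.max.filesize expects."""
    return gib * 1024 ** 3


def max_filesize_property(gib: int) -> str:
    """Render the hbase-site.xml property for a maximum size given in GiB."""
    return ("<property>\n"
            "  <name>hbase.hregion.max.filesize</name>\n"
            "  <value>{}</value>\n"
            "</property>").format(gib_to_bytes(gib))


print(max_filesize_property(20))
```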
3. Manual forced splitting:
You can force a region to split with the split command in the HBase shell, following the usage shown in its help text.
Examples:
split 'tableName'
split 'namespace:tableName'
split 'regionName' # format: 'tableName,startKey,id'
split 'tableName', 'splitKey'
split 'regionName', 'splitKey'
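When scripting manual splits it helps to pull a region name of the form 'tableName,startKey,id' apart. Because a start key may itself contain commas, the sketch splits off the table at the first comma and the id at the last one. This is a hypothetical helper based only on the format shown above; real region ids may carry an encoded-name suffix, which this parser simply keeps inside `region_id`, and the example name is made up.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: str


def parse_region_name(name: str) -> RegionName:
    """Split 'tableName,startKey,id' (startKey may contain commas)."""
    table, sep1, rest = name.partition(",")         # table names have no commas
    start_key, sep2, region_id = rest.rpartition(",")
    if not sep1 or not sep2:
        raise ValueError("expected 'tableName,startKey,id': %r" % name)
    return RegionName(table, start_key, region_id)


print(parse_region_name("ns:t1,row,5,1352931396592.abc."))
```

An empty start key (the first region of a table) parses as `start_key == ""`.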
Part III: Trigger Conditions for Region Split
Under the default split policy, the size at which a region of an HBase table is split is given by the following formula:
min(X^2 × hbase.hregion.memstore.flush.size, hbase.hregion.max.filesize)
Where:
1) X is the number of regions of the same table hosted on the current RegionServer;
2) hbase.hregion.memstore.flush.size defaults to 128 MB;
3) hbase.hregion.max.filesize defaults to 10 GB.
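For a newly created table that starts receiving writes, the first split therefore happens at 128 MB, followed by 512 MB, 1152 MB, 2048 MB, 3200 MB, 4608 MB, 6272 MB and 8192 MB as the number of the table's regions on the RegionServer grows. Once the RegionServer hosts 9 or more regions of the table, X^2 × 128 MB exceeds 10 GB, so the threshold stays at hbase.hregion.max.filesize and a region is split whenever its store reaches that size. Note that this squared formula is the one described for HBase 0.94; as far as I know, later releases (including 1.x) changed IncreasingToUpperBoundRegionSplitPolicy to use the cube of the region count with an initial size of twice the flush size, so check the source of the exact version you run.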
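The thresholds listed above follow directly from the formula as stated in this section. A minimal sketch in MB, with the defaults of 128 MB and 10 GB as parameters (`split_threshold_mb` is a hypothetical name):

```python
def split_threshold_mb(region_count: int, flush_size_mb: int = 128,
                       max_filesize_mb: int = 10 * 1024) -> int:
    """min(X^2 * flush size, max file size) from the formula above, in MB."""
    if region_count < 1:
        raise ValueError("the region being checked counts, so X >= 1")
    return min(region_count ** 2 * flush_size_mb, max_filesize_mb)


print([split_threshold_mb(x) for x in range(1, 11)])
# [128, 512, 1152, 2048, 3200, 4608, 6272, 8192, 10240, 10240]
```

The cap takes over at X = 9, because 81 × 128 MB = 10368 MB is already above 10240 MB.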
Note that the size being compared is the size of a single Store (the StoreFiles of one column family), not the size of the entire region. A region contains one Store per column family of the table. As soon as the Store of any one column family meets the condition above, the whole region is split, regardless of whether the Stores of the other column families have reached it.
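The per-Store rule means a region holding several moderately sized column families may never split, while a single large family triggers a split of the whole region. A sketch of that check, assuming a strict "greater than" comparison and recomputing the threshold from the formula in this section (`should_split` is a hypothetical name, not the Java method):

```python
from typing import Dict


def should_split(store_sizes_mb: Dict[str, int], region_count: int,
                 flush_size_mb: int = 128, max_filesize_mb: int = 10 * 1024) -> bool:
    """Split the whole region if ANY single Store exceeds the threshold."""
    threshold = min(region_count ** 2 * flush_size_mb, max_filesize_mb)
    return any(size > threshold for size in store_sizes_mb.values())


# Two regions on this RegionServer -> threshold 512 MB.
print(should_split({"info1": 600, "info2": 10}, 2))   # True: info1 alone exceeds it
print(should_split({"info1": 500, "info2": 500}, 2))  # False: 1000 MB total, but per Store
```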
That is all on how Region splitting works in HBase 1.x. I hope the content above is helpful; if you found the article useful, please share it so that more people can see it.
© 2024 shulou.com SLNews company. All rights reserved.