What are the design principles of RowKey in HBase 03/22 Update SLTechnology News&Howtos


2026-03-22 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This article introduces the design principles of RowKey in HBase for interested readers' reference. I hope you get a lot out of it.

First, let's look at the three principles of RowKey design.

1. Hashing principle: do not use monotonically increasing data such as a timestamp directly as the RowKey. If you really need a timestamp in the key, put it in the low-order (trailing) bytes and occupy the high-order (leading) bytes with a hash.

2. Length principle: in a word, a RowKey is only a unique identifier and carries no further practical meaning, so do not make it too long; HBase stores the RowKey again in every cell, so a long key wastes storage and cache space. That said, I have to ask: if my RowKey is meaningful, is it acceptable to let it grow a little longer?

3. Uniqueness principle: there is little to say here. A RowKey must identify exactly one row, so it must be unique; writing with a RowKey that already exists updates that row instead of creating a new one.
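The three principles can be combined in a single key layout. The following is a minimal sketch, not the article's own method: it assumes a hypothetical row identified by a fixed-width entity ID (for example a user ID) plus a millisecond timestamp. A one-digit salt derived from an MD5 hash of the ID occupies the leading byte, the timestamp sits in the trailing bytes, and fixed-width fields keep every key equally short. The function name `salted_rowkey` and the 4-bucket default are assumptions made for illustration.

```python
import hashlib


def salted_rowkey(entity_id: str, ts_ms: int, buckets: int = 4) -> bytes:
    """Build a hypothetical RowKey laid out as <salt><entity_id><timestamp>.

    - salt: one decimal digit derived from an MD5 hash of entity_id
      (hashing principle: the leading byte is spread out, not time-ordered)
    - entity_id: assumed fixed-width, so every key has the same short length
    - ts_ms: zero-padded to 13 digits and placed last (the "low bits")
    entity_id + ts_ms is assumed to identify exactly one row (uniqueness).
    """
    if not 1 <= buckets <= 10:
        raise ValueError("a single decimal salt digit supports 1 to 10 buckets")
    if not 0 <= ts_ms < 10**13:
        raise ValueError("timestamp must fit in 13 decimal digits")
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    # Salt comes from the ID, not the time, so a reader can recompute it.
    salt = int.from_bytes(digest[:4], "big") % buckets
    return f"{salt}{entity_id}{ts_ms:013d}".encode("utf-8")


key = salted_rowkey("user0042", 1711065600000)
print(key, len(key))
```

Because the salt depends only on the entity ID, all rows of one entity share a prefix and stay sorted by time, so a time-range scan for that entity is still one contiguous scan, while different entities are spread over the buckets.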

There is not much more to say about the last two principles, so let's focus on the hashing principle and why it is recommended.

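In HBase, a table is divided into 1...n Regions, which are hosted on RegionServers. Each Region has two important attributes, StartKey and EndKey, which define the range of RowKeys it serves (StartKey inclusive, EndKey exclusive). When we read or write data, the Region whose range contains the RowKey is located, and the read or write is sent to that Region. Because rows are stored sorted by RowKey and each Region covers one contiguous range, monotonically increasing keys such as timestamps always fall into the same Region (the last one), so a single RegionServer receives all the writes while the others sit idle. Putting a hash in the leading bytes spreads consecutive keys across Regions instead.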
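The Region lookup described above amounts to a binary search over the sorted split keys. Here is a minimal Python model of that rule, assuming the table's split keys are known as a sorted list of byte strings; it illustrates the semantics and is not the HBase client's actual code. Python compares `bytes` lexicographically as unsigned bytes, which matches how HBase orders RowKeys.

```python
import bisect
from typing import List, Tuple


def locate_region(split_keys: List[bytes], rowkey: bytes) -> Tuple[int, bytes, bytes]:
    """Return (region index, StartKey, EndKey) for rowkey.

    split_keys must be sorted; n split keys define n + 1 Regions.
    An empty StartKey/EndKey (b"") stands for "unbounded", as in HBase.
    A Region contains its StartKey and excludes its EndKey, which is why
    bisect_right is used: a key equal to a split point belongs to the
    Region that starts at that point.
    """
    idx = bisect.bisect_right(split_keys, rowkey)
    start = split_keys[idx - 1] if idx > 0 else b""
    end = split_keys[idx] if idx < len(split_keys) else b""
    return idx, start, end


splits = [b"1", b"2", b"3"]
for k in (b"0abc", b"1", b"1999", b"3", b"abc"):
    print(k, locate_region(splits, k))
```

Note that `b"abc"` lands in the last Region because the byte `a` sorts after `3`, which is exactly the kind of surprise that byte-wise ordering causes when keys are not designed with the split points in mind.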

If we do not pre-split the table when creating it, HBase splits it automatically as the data grows, but this has several problems. First, the split points may not be the ones we want. Second, a new table starts in a single Region, so while it holds little data the benefits of distributed, concurrent processing are not realized. Third, splitting consumes considerable resources, and if the data grows quickly, splits may happen frequently.

So how is pre-splitting done? Consider the following statement:

create 'testtable', 'common', 'data', {SPLITS => ['1', '2', '3']}
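Hand-picked split points such as '1', '2' and '3' only balance the load if the leading bytes of the RowKeys are actually spread over those ranges. When the leading bytes are a lowercase hex hash prefix (an assumption here, for example the first characters of an MD5 hex digest), evenly spaced split keys can be computed instead. This is a minimal sketch; the helper names `hex_split_keys` and `splits_clause` are hypothetical, and the output is meant to be pasted into the SPLITS option of the shell's create command.

```python
from typing import List


def hex_split_keys(num_regions: int, prefix_len: int = 2) -> List[str]:
    """Evenly spaced split keys over the lowercase hex prefix space.

    With prefix_len hex characters there are 16**prefix_len possible
    prefixes; num_regions - 1 split keys cut that space into num_regions
    ranges of (nearly) equal size.
    """
    space = 16 ** prefix_len
    if not 2 <= num_regions <= space:
        raise ValueError("num_regions must be between 2 and 16**prefix_len")
    return [format(i * space // num_regions, f"0{prefix_len}x")
            for i in range(1, num_regions)]


def splits_clause(keys: List[str]) -> str:
    """Render the keys in the HBase shell's {SPLITS => [...]} syntax."""
    return "{SPLITS => [" + ", ".join(f"'{k}'" for k in keys) + "]}"


print(splits_clause(hex_split_keys(4)))
# prints {SPLITS => ['40', '80', 'c0']}
```

As an alternative to computing the keys yourself, the HBase shell's create command also accepts NUMREGIONS together with SPLITALGO (for example SPLITALGO => 'HexStringSplit') to generate split points.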

After the table is created, its Region distribution can be viewed on the table's page in the HBase web UI.

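As we can see, the split points 1, 2 and 3 divide the table into four Regions. Each Region includes its StartKey and excludes its EndKey, and an empty key means the range is unbounded on that side:

Region 1: StartKey (empty), EndKey '1'
Region 2: StartKey '1', EndKey '2'
Region 3: StartKey '2', EndKey '3'
Region 4: StartKey '3', EndKey (empty)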
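To see why the hashing principle matters for this table, the sketch below counts how many keys would land in each of its four Regions (index 0 is the first Region). It compares raw 13-digit millisecond timestamps used directly as RowKeys with the same timestamps behind a one-digit salt (0 to 3) taken from an MD5 hash; both key formats are illustrative assumptions, not part of the original example. The salt digits line up with the split points 1, 2 and 3, so a key with salt n lands in Region index n.

```python
import bisect
import hashlib
from collections import Counter
from typing import Iterable

SPLITS = [b"1", b"2", b"3"]  # split points of 'testtable'


def region_index(rowkey: bytes) -> int:
    """0-based index of the Region whose [StartKey, EndKey) range holds rowkey."""
    return bisect.bisect_right(SPLITS, rowkey)


def timestamp_key(ts_ms: int) -> bytes:
    """Timestamp used directly as the RowKey (what the hashing principle warns against)."""
    return str(ts_ms).encode("ascii")


def hashed_key(ts_ms: int, buckets: int = 4) -> bytes:
    """Hash-derived salt digit in the high byte, timestamp in the low bytes."""
    digest = hashlib.md5(str(ts_ms).encode("ascii")).digest()
    salt = int.from_bytes(digest[:4], "big") % buckets
    return f"{salt}{ts_ms:013d}".encode("ascii")


def region_load(keys: Iterable[bytes]) -> Counter:
    """Number of keys per Region index."""
    return Counter(region_index(k) for k in keys)


start = 1711065600000  # an arbitrary epoch time in milliseconds
window = range(start, start + 1000)  # 1,000 consecutive writes
print("timestamp keys:", region_load(timestamp_key(t) for t in window))
# all 1,000 keys start with "17", so they all land in Region index 1 ['1', '2')
print("hashed keys:   ", region_load(hashed_key(t) for t in window))
```

The trade-off of salting by a hash of the timestamp is that a time-range scan must now be issued once per salt value and the results merged.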

