How to use PolarDB-X to realize efficient and flexible Partition Management


In this issue, the editor brings you an introduction to how PolarDB-X achieves efficient and flexible partition management. The article is rich in content and analyzed from a professional point of view; I hope you get something out of it after reading.

01

Hash partition vs. Range partition

When using a distributed database, what users want most is to spread the computing load across different compute nodes (CN) and, at the same time, scatter the data across different storage nodes (DN) as evenly as possible, so that the storage load is also shared by the DNs. For spreading the computing load across CN nodes, the industry solution is largely uniform: load-balanced scheduling dispatches business requests evenly to different CN nodes. For scattering data across DN nodes, vendors differ, mainly falling into two schools: Hash partitioning by split key and Range partitioning by split key. The mapping between DN nodes and shards is handled by the database's storage scheduler; in general, as long as the data can be spread evenly across partitions, the data across DN nodes is basically balanced. As shown in the following figure, table A on the left creates four partitions by Hash partitioning on the pk column, while table A on the right creates four partitions by Range partitioning on the values of the pk column:

With Hash partitioning, the data of table A is scattered randomly across the 4 partitions, and there is no ordering relationship between the data of these four partitions. The advantages of this method are:

As long as the partition key has high cardinality, the data is guaranteed to be well scattered.

Whether reads and writes are random or follow primary-key order, the traffic is evenly distributed across the four partitions.

The disadvantage of Hash partitioning is that range queries are very inefficient. Because the data is scattered randomly across different shards, a range query can only find all the required data by scanning every partition; only equality queries can do partition pruning.
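To make the pruning contrast concrete, here is a minimal Python sketch (not PolarDB-X's actual hash function or routing logic, just the idea) showing that with hash partitioning an equality predicate can be pruned to a single partition while a range predicate may touch every partition:

# Minimal sketch: partition pruning under hash partitioning.
NUM_PARTITIONS = 4

def hash_partition(pk: int) -> int:
    # Route a primary-key value to a partition by hashing (toy hash).
    return hash(pk) % NUM_PARTITIONS

# Equality predicate pk = 42: exactly one partition needs to be read.
print("pk = 42 lives in partition", hash_partition(42))

# Range predicate pk BETWEEN 1 AND 10: adjacent keys hash to unrelated
# partitions, so every partition may hold matching rows and all are scanned.
touched = {hash_partition(pk) for pk in range(1, 11)}
print("partitions touched by the range scan:", sorted(touched))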

With Range partitioning, table A is by definition divided into 4 partitions: values in the range [1, 1000] go to partition 1, values in [1001, 2000] to partition 2, values in [2001, 3000] to partition 3, and values in [3001, 4000] to partition 4. Because the data within a partition is contiguous, a nice property of Range partitioning is that range queries are very efficient, for example:

select * from A where pk > 2 and pk < 500

For this query we only need to scan partition 1; all other partitions can be pruned.

The disadvantages of Range partitioning are:

If the data across partition ranges is unbalanced, for example there are only 10 rows with pk in [1, 1000] but 1000 rows with pk in [1001, 2000], data skew occurs. So whether the data scatters evenly depends on its distribution.

For reads or writes in the order of the split column (such as the pk column in the example), the read or write traffic always lands on the last partition, and the last shard becomes a hot shard.

02

Default split method

To let users evolve from a standalone database to a distributed database at low cost, it should be enough to import the schema of the existing tables into the distributed system and then import the data, and the existing data will be scattered across different DN nodes, without having to add partition by hash/range clauses as in the earlier examples. Distributed databases therefore generally scatter data by some default strategy. There are two default strategies in the industry: split by primary key Hash by default (e.g., YugabyteDB), and split by primary key Range by default (e.g., TiDB). What are the pros and cons of these two split methods, and which strategy does PolarDB-X adopt? Let's explore together.

2.1 Primary key Hash split

Splitting by primary key Hash by default means that users do not need to explicitly specify a split method when creating a table: the primary key of every inserted row is hashed into a HashKey, which is then mapped to a specific DN node according to some strategy, thereby scattering the data across different DN nodes.

There are two common strategies for mapping a HashKey to a DN: modulo on the hash result (hashKey % n), and consistent Hash (dividing the HashKey space into different ranges, each range corresponding to a different DN).

Modulo on the hash result (hashKey % n)

Here n is the number of storage nodes. The method is simple: compute a hashKey from the value of the split key with the hash function, then take that hashKey modulo the number of storage nodes n; the result is the index of the storage node. So the mapping from data to DN node is:

DN = F(input) ==> DN = Hash(pk) % n

For example, suppose there are four DN nodes in the system. If, for an inserted row with pk = 1, the result of hash(1) is 200, then the row ends up on DN node 0 (200 % 4 = 0).
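As a toy illustration of the DN = Hash(pk) % n routing above (the hash value 200 is taken from the example in the text, not from any real hash function):

# Toy modulo routing: DN = Hash(pk) % n.
NUM_DN = 4

def route_by_modulo(hash_key: int, num_dn: int = NUM_DN) -> int:
    # Map an already-hashed key to a storage-node index.
    return hash_key % num_dn

# The example above: hash(pk = 1) == 200 lands on DN 0, because 200 % 4 == 0.
print(route_by_modulo(200))  # -> 0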

The advantage of modulo on the hashKey is that, given the hashKey value and the number of DNs, users can compute exactly which DN a row lands on, and can flexibly control which DN to read and write through hints.

The disadvantage of the hashKey modulo method is that when DN nodes are added to or removed from the cluster, the value of n changes, so the existing data has to be rehashed and rescattered across the current DN nodes, which is very expensive. At the same time, this split method is not friendly to range queries: since the data is hashed to different DNs by hashKey, the desired data can only be found with a full table scan.
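A quick back-of-the-envelope sketch (using Python's built-in hash as a stand-in for the real hash function) of why resizing hurts: when the DN count goes from 4 to 5, most keys map to a different node and would have to be moved:

# Roughly 80% of keys map to a different node when n changes from 4 to 5.
keys = range(100_000)
moved = sum(1 for k in keys if hash(k) % 4 != hash(k) % 5)
print(f"{moved / 100_000:.0%} of rows would have to be relocated")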

Consistent Hash

Consistent Hash is a special Hash algorithm. It first computes a hashKey from the value of the split key (the primary key) with the hash function, and then locates the partition whose range the hashKey falls into; in effect it is similar to Range By (hashFunction). Suppose all computed HashKeys fall in the range [0x0, 0xFFFF] and the current system has four DN nodes; four partitions can then be created by default when creating a table, so that each partition can be assigned to a different DN node. The interval corresponding to each partition is as follows:

(0x0, 0x4000] (left-open, right-closed intervals)
(0x4000, 0x8000]
(0x8000, 0xC000]
(0xC000, 0x10000]

The correspondence between partitions and DNs is saved as metadata of the table. So, after computing the HashKey of a primary key, we can binary-search the partition metadata by the HashKey's value to determine which partition the row belongs to. The formula is:

DN = F(input) ==> DN = BiSearch(Hash(pk))
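A small sketch of the BiSearch(Hash(pk)) lookup, assuming the four partition boundaries listed above and using Python's built-in hash as a stand-in for the real hash function:

import bisect

# Upper bounds of the four left-open, right-closed hashKey ranges.
PARTITION_UPPER_BOUNDS = [0x4000, 0x8000, 0xC000, 0x10000]

def route_consistent_hash(pk: int) -> int:
    hash_key = hash(pk) % 0x10000                       # stand-in hash function
    return bisect.bisect_left(PARTITION_UPPER_BOUNDS, hash_key)

print(route_consistent_hash(12345))                     # a partition index in 0..3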

The advantage of consistent Hash is that when adding DN nodes, we can move part of one shard's data to the new DN by splitting or migration and update the table metadata at the same time, while the data of the other shards does not need to change; when removing DN nodes, we only need to migrate the data on the DN nodes being removed to other nodes and update the metadata, which is very flexible. The drawback of consistent Hash is, again, that it is not friendly to range queries.
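The rebalancing argument can also be sketched: adding a DN only splits one boundary range, so only rows of that one shard are candidates to move while the other ranges (and their data) stay untouched. The split_range helper below is purely illustrative, not PolarDB-X's actual metadata operation:

old_bounds = [0x4000, 0x8000, 0xC000, 0x10000]

def split_range(bounds, index):
    # Split the range ending at bounds[index] into two halves.
    lower = bounds[index - 1] if index > 0 else 0x0
    midpoint = (lower + bounds[index]) // 2
    return bounds[:index] + [midpoint] + bounds[index:]

# Split the first shard for a newly added DN; the other three shards keep
# exactly the same boundaries, so none of their data moves.
new_bounds = split_range(old_bounds, 0)
print([hex(b) for b in new_bounds])   # ['0x2000', '0x4000', '0x8000', '0xc000', '0x10000']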

2.2 Primary key Range split

The essential difference between primary key Range split and consistent Hash is that consistent Hash divides partitions by the value ranges of the HashKey obtained by hashing the split key, whereas primary key Range split divides partitions by the value ranges of the split key's actual values. For a table split by primary key Range, the advantage is that range queries are very efficient, because rows with adjacent primary keys sit in the same or adjacent partitions. Fast deletion can also be achieved: for example, for a table partitioned by time range, we can easily delete all data before a certain point in time, because we only need to drop the corresponding partitions while the data of the other partitions stays untouched. These capabilities cannot be achieved with Hash partitions. The disadvantage is that in a scenario with an auto-increment primary key and continuous inserts, the last shard inevitably becomes a write hotspot.

2.3 Default split method of PolarDB-X

Now that we understand these two default primary-key split methods, let's talk about how PolarDB-X chooses. In essence, range queries and sequential writes are in conflict: if you want efficient range queries, writing in increasing primary-key order will necessarily create a hotspot. After all, range queries are efficient precisely because adjacent primary keys are also adjacent in physical storage, which means writes in primary-key order only hit the last shard. This may not be a big problem for OLAP scenarios, where the workload is mostly reads, but OLTP scenarios are different: many businesses need to quickly generate a unique ID, and generating a UUID in the business layer is inefficient and costs more storage than an AUTO_INCREMENT column.

Range queries on a primary key are not common unless the primary key is of a time type. For example, an order table whose primary key is the creation time gmt_create may well need range queries in order to efficiently find the orders within a certain time range.

Based on the above analysis, PolarDB-X splits by primary key Hash by default. For the Hash algorithm we choose the consistent Hash routing strategy, because in a distributed database system node changes and partition splits and merges are very common, and as analyzed earlier, the hashKey modulo approach makes these operations too expensive. Consistent Hash ensures that the cost of splitting and merging partitions and of adding and removing DN nodes is comparable to that of Range partitions: data can be moved on demand, without rehashing everything.

In particular, when the primary key is of a time type, by default we apply the YYYYDD time expression to the primary key and then scatter the result by consistent Hash. The purpose is that data from the same day falls into the same partition, so data is scattered at day granularity. This is efficient for range queries on the (time) primary key, which, as mentioned earlier, are a strong demand for tables with a time primary key; at the same time, historical data (for example, data from a year ago) can be archived more efficiently.
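Conceptually, the time-based split works by collapsing a timestamp to a per-day value before hashing, so all rows from the same day land in the same partition. The sketch below illustrates the idea only; it is not the actual YYYYDD implementation in PolarDB-X:

from datetime import datetime

def day_bucket(ts: datetime) -> int:
    # Collapse a timestamp to a (year, day-of-year) bucket, YYYYDD-style.
    return ts.year * 1000 + ts.timetuple().tm_yday

def route(ts: datetime, num_partitions: int = 4) -> int:
    return hash(day_bucket(ts)) % num_partitions

a = datetime(2021, 6, 3, 9, 30)
b = datetime(2021, 6, 3, 21, 15)     # same day, different time of day
print(route(a) == route(b))          # True: same-day rows share a partition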

03

Table group

In PolarDB-X, in order to accelerate the execution efficiency of SQL, the optimizer optimizes the Join operation between partition tables to Partition-Wise Join for computational push-down. However, when the topology of the partition table changes, for example, after the partition is split or merged, the partition of the two partition tables that were originally partitioned in exactly the same way may be inconsistent. As a result, the calculation push-down between the two tables will fail, which will have a direct impact on the business.

For the following two tables T1 and T2, because their partition type / split key type / number of partitions are the same, we think that the partition rules of the two tables are exactly the same.

create table t1 (c1 int auto_increment, c2 varchar(20), c3 int, c4 date, primary key(c1)) PARTITION BY HASH(c1) PARTITIONS 4

create table t2 (c2 int auto_increment, c3 varchar(20), primary key(c2)) PARTITION BY HASH(c2) PARTITIONS 4


Therefore, on these two tables, execute sql1: select t1.c1, t2.c1 from t1 join t2 on t1.c1 = t2.c2. For this kind of SQL, an equi-join on the partition key, PolarDB-X optimizes it into a Partition-Wise Join, pushes it down to the storage nodes, and returns the join result directly to the CN, without pulling the data up to the CN node to do the join, thus greatly reducing the cost of the join (both I/O and computation cost drop significantly).

However, if partition p1 of table t1 is split, the number of partitions changes from 4 to 5 and sql1 can no longer be pushed down, because the partitioning of t1 and t2 is no longer identical: the data needed by the join of the left and right tables now sits on multiple DN nodes, and it must be pulled from the DN nodes to the CN node to do the join.

To solve the problem of push-down failure caused by partition splits or merges, we introduce the concepts of table group (Table Group) and partition group (Partition Group), which let users place two or more tables with the same partition definition into the same table group. All tables in the same table group share the same partitioning rules, and partitions with the same rule belong to the same partition group. All partitions of a partition group sit on the same DN node (the prerequisite for join push-down), and splits, merges, and migrations of the partition tables belonging to the same table group are carried out at partition-group granularity: they split together, merge together, and migrate together, always staying in sync. So even if the partitions of a table in the table group change, joins within the table group that can be pushed down are not affected.
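As a conceptual illustration (not PolarDB-X code), the invariant a table group maintains can be thought of as: the two tables' partition groups line up one-to-one and each matching pair of partitions lives on the same DN, which is exactly what partition-wise join push-down needs:

def can_push_down_join(table_a: dict, table_b: dict) -> bool:
    # Each table is modeled as {partition_name: dn_node}, partitions in
    # boundary order; identical names stand in for identical boundaries.
    if len(table_a) != len(table_b):
        return False           # e.g. t1 was split into 5 partitions, t2 still has 4
    return all(a == b for a, b in zip(table_a.values(), table_b.values()))

t1 = {"p1": "dn0", "p2": "dn1", "p3": "dn2", "p4": "dn3"}
t2 = {"p1": "dn0", "p2": "dn1", "p3": "dn2", "p4": "dn3"}
print(can_push_down_join(t1, t2))    # True: every partition pair is co-located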

In particular, to reduce the learning cost, users do not need to pay attention to table groups at first: by default we place each table in its own table group, and users do not need to be aware of it. Table groups only need to be considered, as a best practice, when performance tuning is required or when certain business tables need their joins to be reliably pushed down.

For table groups, we support the following management methods:

Table group partition group split:

In general, in PolarDB-X it is recommended to keep a single partition under 5 million rows. When the amount of data in a partition grows too large, we can split the partition.

Alter tablegroup split partition p1 to p10, p11


Table group partition group merge:

We can merge partitions when the number of rows in some partitions of a partitioned table is far below 5 million.

Alter tablegroup merge partition p1,p2 to p10


Table group partition group migration:

We mentioned earlier that in a distributed database system, adding or removing nodes is very common. For example, for an online sales promotion, a merchant may temporarily add a batch of nodes and hope to scale back down to the normal amount after the promotion. How does PolarDB-X support this requirement?

The CN nodes of PolarDB-X are stateless; adding or removing them only requires registering with the system and involves no data movement, so here we mainly discuss adding and removing DN nodes. When a user adds DN nodes through scale-out, a new DN node holds no data at first. How can this new DN quickly start sharing the system's traffic? Once the DN node is ready, our backend management and control system can migrate data from the old DN nodes to the new DN in batches, as needed, through the partition migration command provided by PolarDB-X. The command is as follows:

Alter tablegroup move partition p1,p2 to DNi


Add table D to the table group tg1:

Alter tablegroup tg1 add D

There is a prerequisite for adding table D to table group tg1: the partitioning of table D must be exactly the same as that of the tables in tg1. In addition, if the data of table D's partitions is not on the same DN nodes as the corresponding partition groups of tg1, data migration of table D will be triggered.

Remove table B from the table group tg1:

Alter tablegroup tg1 remove B

04

Other partitioning methods

Earlier we compared the difference between consistent Hash and Range, and we adopted the strategy of default split by primary key, but we still implemented Range partition and List partition to meet the different demands of customers in different scenarios.

4.1 Range Partition

In particular, beyond the range-query advantages mentioned above, the PolarDB-X storage engine supports not only InnoDB but also X-Engine, our self-developed LSM-tree based engine whose tiered structure supports mixed storage media. With Range partitions, the "old partitions" of business tables can be migrated to X-Engine as needed; the migrated cold data can be stored on relatively cheap HDDs while hot data stays on SSDs, achieving the separation of hot and cold data.

4.2 List partition

List partitioning is a strategy for partitioning by discrete values. With list partitioning, Geo Partition can be implemented in PolarDB-X. For example, a system holding data from countries around the world can be split along regional dimensions such as Europe, America, Asia-Pacific, and Africa; different partitions can be deployed in physical data centers in different regions, placing the data closer to its users and reducing access latency.

CREATE TABLE users (country varchar, id int, name varchar, …) PARTITION BY LIST (country) (PARTITION Asia VALUES IN ('CN', 'JP', …), PARTITION Europe VALUES IN ('GE', 'FR', …), …)


4.3 Composite partitions

As mentioned earlier, we support Hash/Range/List partitioning in PolarDB-X, and we also support any pairwise combination of these three partitions to meet the different demands of different businesses. Here are a few common examples of how to solve different problems through the combination of these three partitions.

Scenario 1: explicitly creating list-partitioned tables, for example using the province as the split key, so that data of different provinces is stored in different shards and shards of different provinces can be placed on different DNs. The advantage is that data can be isolated by province and stored in the nearest data center (such as a South China / North China data center). The drawback of this partitioning is that it is too coarse: one partition per province easily produces a huge partition that cannot be split further directly. For this scenario you can use a list+hash combination: after the first-level list partitioning, the data can be scattered evenly by primary-key hash, for example:

create table AA (pk bigint, provinceName varchar, ...) PARTITION BY LIST (provinceName) SUBPARTITION BY HASH (pk) SUBPARTITIONS 2 (PARTITION p1 VALUES IN ('Guangdong', 'Fujian'), PARTITION p2 VALUES IN ('Beijing', 'HeBei', 'TianJin'))


The first-level partitions p1/p2 are list partitions. The subpartitions of p1 can be pinned to region1 and the subpartitions of p2 to region2, as shown below:

Scenario 2: a single-key hotspot. When one key holds a lot of data, the shard containing that key can become a hotspot. PolarDB-X splits by primary key by default, so such hotspots do not arise on the primary table; hot keys therefore come from secondary indexes. Because the primary table is split by primary key Hash, the split key of a secondary index table is chosen from columns other than the primary key, and splitting by a non-primary-key column may produce a hot key. For a hot key, PolarDB-X first splits it out into a separate shard. As the load on that shard grows, PolarDB-X gradually migrates the other shards on the same DN to other DNs, until the hot shard monopolizes a DN node. If that still cannot meet the demand, PolarDB-X hashes the partition into multiple shards so that the hot key can be spread across multiple DNs. Once the shards are scattered, queries for that key need to aggregate data from shards on multiple DNs, which costs some query performance. PolarDB-X manages shards flexibly and allows different splitting strategies for different shards of the same table: for example, shard p1 can be split into 2 shards, shard p2 into 3 shards, and shard p4 left untouched, so that hot shards do not affect non-hot shards.

05

Summary

PolarDB-X provides a partition management strategy that partitions by primary key Hash by default, and also supports Range and List partitioning to meet the needs of different businesses. The three partition strategies can be flexibly combined into two-level composite partitions. To enable computation push-down, the concept of the table group is introduced.

The above is what the editor has shared on how to use PolarDB-X to achieve efficient and flexible partition management. If you happen to have similar doubts, you may refer to the analysis above. If you want to learn more, you are welcome to follow the industry information channel.
