How to use sub-database sub-table Sharding-JDBC 07/02 Update SLTechnology News&Howtos

How to use sub-database sub-table Sharding-JDBC

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article introduces the relevant knowledge of "how to use sub-database sub-table Sharding-JDBC". In the operation of actual cases, many people will encounter this dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Core concept

Before using Sharding-JDBC, be sure to understand the following core concepts.

Logical table

The general term for the same logical and data structure tables of a horizontally split database (table). Example: the order data is split into 10 tables, t_order_0 to t_order_9, based on the Mantissa of the primary key, and their logical table name is t_order.

Real table

A physical table that actually exists in a sharded database. That is, t_order_0 to t_order_9 in the previous example.

Data node

The smallest unit of data slicing. It consists of a data source name and a data table, for example: ds_0.t_order_0.

Binding table

Refers to the master and child tables with consistent sharding rules. For example, if the t _ order table and the t_order_item table are sliced according to order_id, the two tables are bound to each other. The multi-table association query between bound tables will not have Cartesian product association, and the efficiency of association query will be greatly improved. For example, if SQL is:

SELECT i.* FROM t_order o JOIN t_order_item I ON o.order_id=i.order_id WHERE o.order_id in (10,11)

Assuming that there are two real tables for t_order and t_order_item, then the real tables are t_order_0, t_order_1, t_order_item_0, and t_order_item_1. When the binding table relationship is not configured, assuming that the sharding key order_id routes the number 10 to slice 0 and the number 11 to slice 1, then the routed SQL should be 4, which are Cartesian products:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 I ON o.order_id=i.order_id WHERE o.order_id in (10,11); SELECT i.* FROM t_order_0 o JOIN t_order_item_1 I ON o.order_id=i.order_id WHERE o.order_id in (10,11); SELECT i.* FROM t_order_1 o JOIN t_order_item_0 I ON o.order_id=i.order_id WHERE o.order_id in (10,11) SELECT i.* FROM t_order_1 o JOIN t_order_item_1 I ON o.order_id=i.order_id WHERE o.order_id in (10,11)

After configuring the bound table relationship, the SQL of the route should be 2:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 I ON o.order_id=i.order_id WHERE o.order_id in (10,11); SELECT i.* FROM t_order_1 o JOIN t_order_item_1 I ON o.order_id=i.order_id WHERE o.order_id in (10,11); broadcast table

A table that exists in all sharded data sources, and the table structure and the data in the table are exactly the same in each database. It is suitable for scenarios where the amount of data is small and needs to be associated with tables with massive data, such as dictionary tables. Follow the official account: programmer Bai Nannan, get a 2020-face test pdf

Data sharding key

The database field used for sharding is the key field to split the database (table) horizontally. Example: if you slice the Mantissa of the order primary key in the order table, the order primary key is a slicing field. If there is no sharding field in SQL, full routing will be performed and the performance is poor. In addition to the support for single shard fields, Sharding-JDBC also supports sharding based on multiple fields.

Slicing algorithm

The data is sliced by the slicing algorithm, which is supported by =, > =, {u_id% 8} means that the t_user table is divided into 8 tables according to u_id module 8, and the table name is t_user_0 to t_user_7. It can be considered as a simple implementation of accurate slicing algorithm.

Hint fragmentation strategy

Corresponding to HintShardingStrategy. A strategy for sharding by specifying the sharding value through Hint rather than extracting the sharding value from SQL.

Distributed primary key

Used to generate globally unique id in a distributed environment. Sharding-JDBC provides built-in distributed primary key generators, such as UUID, SNOWFLAKE. The interface of the distributed primary key generator is also extracted, which is convenient for users to implement a self-defined self-increasing primary key generator. In order to ensure database performance, the primary key id must also increase in trend to avoid frequent data page splits.

Separation of reading and writing

It provides a read-write separation configuration of one master and multi-slave, which can be used independently or with sub-library and sub-table.

In the same thread and in the same database connection, if there is a write operation, the subsequent read operations are read from the main database to ensure data consistency.

Forced routing of the main library based on Hint.

In the master-slave model, the master library is used for both reading and writing in transactions.

Execution process

The principle of Sharding-JDBC is simple to sum up: the core consists of the process of SQL parsing = > actuator optimization = > SQL routing = > SQL rewriting = > SQL execution = > result merging.

Project actual combat

Spring-boot project actual combat

Introduce dependent org.apache.shardingspheresharding-jdbc-spring-boot-starter4.0.1 data source configuration

If sharding-jdbc-spring-boot-starter is used, and the data source and data shards are configured using shardingsphere, the corresponding data source is automatically created and injected into the spring container.

Spring.shardingsphere.datasource.names=ds0 Ds1spring.shardingsphere.datasource.ds0.type=org.apache.commons.dbcp.BasicDataSourcespring.shardingsphere.datasource.ds0.driver-class-name=com.mysql.jdbc.Driverspring.shardingsphere.datasource.ds0.url=jdbc:mysql://localhost:3306/ds0spring.shardingsphere.datasource.ds0.username=rootspring.shardingsphere.datasource.ds0.password=spring.shardingsphere.datasource.ds1.type=org.apache.commons.dbcp.BasicDataSourcespring.shardingsphere.datasource.ds1.driver-class-name=com.mysql.jdbc.Driverspring.shardingsphere.datasource.ds1.url=jdbc:mysql://localhost:3306/ Ds1spring.shardingsphere.datasource.ds1.username=rootspring.shardingsphere.datasource.ds1.password=# other sharding configuration

But in our existing projects, the data source configuration is separate. Therefore, to disable automatic assembly in sharding-jdbc-spring-boot-starter, but refer to the source code to rewrite the data source configuration. You need to add @ SpringBootApplication (exclude = {org.apache.shardingsphere.shardingjdbc.spring.boot.SpringBootConfiguration.class}) to the startup class to exclude. Then customize the configuration class to assemble the DataSource.

@ Configuration@Slf4j@EnableConfigurationProperties ({SpringBootShardingRuleConfigurationProperties.class,SpringBootMasterSlaveRuleConfigurationProperties.class, SpringBootEncryptRuleConfigurationProperties.class, SpringBootPropertiesConfigurationProperties.class}) @ AutoConfigureBefore (DataSourceConfiguration.class) public class DataSourceConfig implements ApplicationContextAware {@ Autowiredprivate SpringBootShardingRuleConfigurationProperties shardingRule;@Autowiredprivate SpringBootPropertiesConfigurationProperties props;private ApplicationContext applicationContext;@Bean ("shardingDataSource") @ Conditional (ShardingRuleCondition.class) public DataSource shardingDataSource () throws SQLException {/ / get other configured data sources Map beans = applicationContext.getBeansOfType (DruidDataSourceWrapper.class); Map dataSourceMap = new HashMap (4); beans.forEach (dataSourceMap::put) / / create shardingDataSourcereturn ShardingDataSourceFactory.createDataSource (dataSourceMap, new ShardingRuleConfigurationYamlSwapper (). Swap (shardingRule), props.getProps ());} @ Beanpublic SqlSessionFactory sqlSessionFactory () throws SQLException {SqlSessionFactoryBean sqlSessionFactoryBean = new SqlSessionFactoryBean (); / / set shardingDataSource to sqlSessionFactoryBean.setDataSource (shardingDataSource ()) in SqlSessionFactory; / / other settings return sqlSessionFactoryBean.getObject ();}} distributed id generator configuration

Sharding-JDBC provides UUID, SNOWFLAKE generators, and also enables users to implement custom id generators. For example, a distributed id generator with type as SEQ can be implemented, which can call a unified distributed id service to obtain id.

@ Datapublic class SeqShardingKeyGenerator implements ShardingKeyGenerator {private Properties properties = new Properties (); @ Overridepublic String getType () {return "SEQ";} @ Overridepublic synchronized Comparable generateKey () {/ / get distributed id logic}}

Because the extension ShardingKeyGenerator is implemented through the SPI mechanism of JDK's serviceloader, you also need to configure the org.apache.shardingsphere.spi.keygen.ShardingKeyGenerator file in the resources/META-INF/services directory. The content of the file is the full path name of the SeqShardingKeyGenerator class. When used in this way, it would be fine to specify SEQ as the type of the distributed primary key generator.

At this point, Sharding-JDBC is integrated into the spring-boot project, and data fragmentation-related configuration can be carried out later.

Data slicing actual combat

If the data magnitude of the table can be estimated at the beginning of the project, of course, it can be processed according to this estimate at the beginning. But in most cases, we are not ready to estimate the quantity level at the beginning. The usual practice at this time is:

Online data query performance of a table begins to decline, which is caused by the excessive amount of data.

The future data magnitude is estimated according to the amount of historical data, and the strategy of sub-database and sub-table is determined according to the specific business scenario.

Automatic sub-library sub-table code implementation.

The following is a specific example to illustrate the actual combat of specific data fragmentation. For example, the data structure of a table is as follows:

CREATE TABLE `hc_question_reply_ record` (`id`time` self-added ID', `reply_ text`varchar) NOT NULL DEFAULT''COMMENT' reply content', `reply_wheel_ time`datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'reply time', `ctime`datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time', `mtime`datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time', PRIMARY KEY (`id`) INDEX `reply_wheel_ time` (`reply_wheel_ time`) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT=' reply details record' Determination of sharding scheme

First, check the new trends of the current target table month:

SELECT count (*), date_format (ctime,'% Ymuri% m') AS `Date`FROM hc_question_reply_recordGROUP BY date_format (ctime,'% Ymuri% m')

At present, the monthly increase is about 180W, and it is estimated that it will reach more than 300W (basically calculated on a double basis) in the future. If you expect the amount of data in a single table to be no more than 1000W, you can use reply_wheel_time as the sharding key to archive quarterly.

Sharding configuration spring: # sharing-jdbc configuration shardingsphere:# data source name datasource: names: defaultDataSource SlaveDataSource sharding: # master-slave node configuration master-slave-rules:defaultDataSource: # maser data source master-data-source-name: defaultDataSource # slave data source slave-data-source-names: slaveDataSource tables:# hc_question_reply_record sub-library sub-table configuration hc_question_reply_record: # real data node hc_question_reply_record_2020_q1 Actual-data-nodes: defaultDataSource.hc_question_reply_record_$- > {2020.. 2025} _ Qfragments-> {1.. 4} # Table sharding strategy table-strategy:standard: # sharding key sharding-column: reply_wheel_time # exact slicing algorithm full path name preciseAlgorithmClassName: com.xx.QuestionRecordPreciseShardingAlgorithm # range sharding algorithm For BETWEEN, optional. This class needs to implement the RangeShardingAlgorithm interface and provide a no-parameter constructor rangeAlgorithmClassName: com.xx.QuestionRecordRangeShardingAlgorithm # default distributed id generator default-key-generator:type: SEQ column: id sharding algorithm implementation

Accurate slicing algorithm: QuestionRecordPreciseShardingAlgorithm

Public class QuestionRecordPreciseShardingAlgorithm implements PreciseShardingAlgorithm {/ * * Sharding. * * @ param availableTargetNames available data sources or tables's names * @ param shardingValue sharding value * @ return sharding result for data source or table's name * / @ Override public String doSharding (Collection availableTargetNames, PreciseShardingValue shardingValue) {return ShardingUtils.quarterPreciseSharding (availableTargetNames, shardingValue);}}

Range slicing algorithm: QuestionRecordRangeShardingAlgorithm

Public class QuestionRecordRangeShardingAlgorithm implements RangeShardingAlgorithm {/ * * Sharding. * * @ param availableTargetNames available data sources or tables's names * @ param shardingValue sharding value * @ return sharding results for data sources or tables's names * / @ Override public Collection doSharding (Collection availableTargetNames, RangeShardingValue shardingValue) {return ShardingUtils.quarterRangeSharding (availableTargetNames, shardingValue);}}

Specific sharding implementation logic: ShardingUtils

@ UtilityClasspublic class ShardingUtils {public static final String QUARTER_SHARDING_PATTERN = "% s_%d_q%d" / * * logicTableName_ {year} _ Q {quarter} * the collection of real tables available by quarterly range * @ param availableTargetNames * @ param shardingValue sharding values * @ return * / public Collection quarterRangeSharding (Collection availableTargetNames, RangeShardingValue shardingValue) {/ / the query conditions are based on the range. Filter out matching real table sets} / * * logicTableName_ {year} _ Q {quarter} * accurate quarterly fragmentation * @ param availableTargetNames available real table sets * @ param shardingValue shard values * @ return * / public static String quarterPreciseSharding (Collection availableTargetNames, PreciseShardingValue shardingValue) {/ / here we calculate the matching real tables}} based on the equivalent query conditions.

At this point, for the hc_question_reply_record table, use reply_wheel_time as the sharding key, and the processing of sharding according to the quarter is completed. It is also important to note that after sub-library and sub-table, it is best to take the sharding key as the query condition, otherwise the whole library routing will be used and the performance is very low.

In addition, Sharing-JDBC does not support the full-text index of mysql very well, so we should pay attention to where the project is used. To sum up, the whole process is relatively simple, followed by other business scenarios, I believe we can certainly solve according to this line of thinking.

This is the end of the content of "how to use Sharding-JDBC". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.