What are the methods for generating distributed primary key IDs when sharding databases and tables?

Updated 2025-07-03, from SLTechnology News&Howtos. Shulou (Shulou.com), 05/31 report.

This article explains the methods for generating distributed primary key IDs when sharding databases and tables. The methods introduced here are simple, fast and practical, so let's walk through them.

Introducing any technology carries risk, and sharding databases and tables is no exception. Unless the data volume keeps growing to the point where the existing high-availability architecture can no longer cope, sharding is not recommended: once the data is sharded you are on a road full of pitfalls, and the distributed primary key ID is the first one you hit.

Generating globally unique primary keys across different data nodes is a thorny problem. A logical table t_order is split into several physical tables t_order_n, which are then distributed across different database shards db_0, db_1, and so on. The auto-increment keys of the physical tables are unaware of each other, so duplicate primary keys appear. At this point the database's own auto-increment key can no longer guarantee global uniqueness across the sharded databases and tables.
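To make the collision concrete, here is a minimal sketch in which two physical tables each keep their own counter. The class names and the in-memory counters are illustrative stand-ins for real database tables, not anything from sharding-jdbc.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

final class DuplicateKeyDemo {

    /** Hypothetical stand-in for one physical table with its own auto-increment counter. */
    static final class ShardTable {
        private final AtomicLong autoIncrement = new AtomicLong(0);

        long insert() {
            return autoIncrement.incrementAndGet();
        }
    }

    /** Inserts rowsPerShard rows into each of two tables and counts IDs seen more than once. */
    static int countDuplicates(int rowsPerShard) {
        ShardTable tOrder0 = new ShardTable(); // e.g. db_0.t_order_0
        ShardTable tOrder1 = new ShardTable(); // e.g. db_1.t_order_1
        Set<Long> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < rowsPerShard; i++) {
            if (!seen.add(tOrder0.insert())) {
                duplicates++;
            }
            if (!seen.add(tOrder1.insert())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Both tables hand out 1, 2, 3, so every ID of the second table is a duplicate.
        System.out.println("duplicate ids: " + countDuplicates(3)); // prints: duplicate ids: 3
    }
}
```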

Duplicate IDs can be avoided by strictly constraining the initial value and step size of each sharded table's auto-increment key, but this sharply increases operation and maintenance costs and scales poorly. As soon as the number of sharded tables has to grow, the data in the original tables is heavily affected, so this approach is not advisable.
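The following sketch illustrates both halves of that argument under simple assumptions: two tables planned together with step 2 never collide, while a third table added later with a start and step chosen for three tables collides with the data already written. The helper names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class StepIdDemo {

    /** n-th ID (n = 0, 1, 2, ...) of a table whose counter starts at start and grows by step. */
    static long idAt(long start, long step, long n) {
        return start + n * step;
    }

    /** True if the first count IDs of table A and table B share at least one value. */
    static boolean overlaps(long startA, long stepA, long startB, long stepB, int count) {
        Set<Long> idsOfA = new HashSet<>();
        for (int n = 0; n < count; n++) {
            idsOfA.add(idAt(startA, stepA, n));
        }
        for (int n = 0; n < count; n++) {
            if (idsOfA.contains(idAt(startB, stepB, n))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two tables planned together: 1, 3, 5, ... and 2, 4, 6, ...
        System.out.println(overlaps(1, 2, 2, 2, 100)); // false
        // A third table added later with start 3, step 3 collides with the old 1, 3, 5, ...
        System.out.println(overlaps(1, 2, 3, 3, 100)); // true
    }
}
```

Avoiding the second case means re-planning the start and step of every existing table, which is the maintenance cost described above.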

There are now many third-party solutions that solve this problem well. Some generate non-repeating keys with a specific algorithm, such as UUID, SNOWFLAKE or segment mode; others are primary key generation services that can be called directly, such as Meituan's Leaf and Didi's TinyId.
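As a rough illustration of the segment idea, the sketch below reserves a whole range of IDs in one step and then serves them from memory. It is not the Leaf or TinyId implementation: the class name is made up, and an AtomicLong stands in for the database row that would normally store the highest allocated ID.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SegmentIdGenerator {
    /** Stand-in for the database row that stores the highest ID allocated so far. */
    private final AtomicLong storedMaxId;
    private final int step;
    private long next;       // next ID to hand out
    private long segmentEnd; // exclusive upper bound of the current segment

    SegmentIdGenerator(AtomicLong storedMaxId, int step) {
        this.storedMaxId = storedMaxId;
        this.step = step;
    }

    synchronized long nextId() {
        if (next >= segmentEnd) {
            // One "database round trip" reserves a whole range of step IDs.
            segmentEnd = storedMaxId.addAndGet(step) + 1;
            next = segmentEnd - step;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong sharedRow = new AtomicLong(0);
        SegmentIdGenerator nodeA = new SegmentIdGenerator(sharedRow, 1000);
        SegmentIdGenerator nodeB = new SegmentIdGenerator(sharedRow, 1000);
        System.out.println(nodeA.nextId()); // 1    (node A owns 1..1000)
        System.out.println(nodeB.nextId()); // 1001 (node B owns 1001..2000)
        System.out.println(nodeA.nextId()); // 2
    }
}
```

Each node only touches the shared store once per segment, and the IDs stay numeric and roughly increasing.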

sharding-jdbc ships with two built-in distributed primary key generation schemes, UUID and SNOWFLAKE. It also abstracts the distributed primary key generator into an interface, so developers can implement a custom primary key generator. Later we will connect TinyId's primary key generation service through such a custom generator.
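The shape of a custom generator can be sketched as follows. To keep the example self-contained, it declares a local KeyGenerator interface that mirrors the four methods visible in the generator classes quoted in this article (getType, generateKey, getProperties, setProperties); it is a stand-in, not the library's ShardingKeyGenerator interface. The CounterKeyGenerator class and its "COUNTER" type name are made up for illustration.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

final class CustomKeyGeneratorSketch {

    /** Local stand-in with the four methods the generator classes in this article expose. */
    interface KeyGenerator {
        String getType();

        Comparable<?> generateKey();

        Properties getProperties();

        void setProperties(Properties properties);
    }

    /** Hypothetical generator backed by a local counter instead of a remote ID service. */
    static final class CounterKeyGenerator implements KeyGenerator {
        private final AtomicLong counter = new AtomicLong();
        private Properties properties = new Properties();

        @Override
        public String getType() {
            return "COUNTER"; // the name a configuration would refer to (made up here)
        }

        @Override
        public Comparable<?> generateKey() {
            return counter.incrementAndGet();
        }

        @Override
        public Properties getProperties() {
            return properties;
        }

        @Override
        public void setProperties(Properties properties) {
            this.properties = properties;
        }
    }

    public static void main(String[] args) {
        KeyGenerator generator = new CounterKeyGenerator();
        System.out.println(generator.getType() + " -> " + generator.generateKey());
    }
}
```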

As mentioned earlier, to have sharding-jdbc generate the primary key ID for a field automatically, you only need the following configuration in the application.properties file:

# Primary key field
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
# Primary key ID generation scheme
spring.shardingsphere.sharding.tables.t_order.key-generator.type=UUID
# Worker machine ID
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123

key-generator.column is the primary key field, key-generator.type is the primary key ID generation scheme (built-in or custom), and key-generator.props.worker.id is the machine ID. When the generation scheme is set to SNOWFLAKE, the machine ID takes part in the bit operations that build the key.

There are two things to be aware of when using sharding-jdbc distributed primary keys:

If the primary key field of the entity object in an insert operation has already been assigned a value, the configured primary key generation scheme is ignored, and the SQL that is finally executed uses the assigned value.

Do not set the auto-increment property on the primary key field, otherwise the primary key ID is generated with the default SNOWFLAKE scheme. For example, if the order_id field is marked as an auto-increment primary key with the @TableId annotation of mybatis plus, the ID is always generated by the snowflake algorithm, no matter which scheme is configured.

Let's look at the source code to see how sharding-jdbc's built-in primary key generation schemes, UUID and SNOWFLAKE, are implemented.

UUID

Open the source of UUIDShardingKeyGenerator, the implementation class for UUID primary keys, and you will find that its generation rule is a single line built on UUID.randomUUID(), which is a little underwhelming.

Although a UUID is globally unique, it is not recommended as a primary key. In real business systems, primary keys such as user_id and order_id are integers, whereas this generator produces a 32-character string.

Storing and querying such keys costs MySQL a lot of performance, and the MySQL documentation also recommends keeping the primary key as short as possible. Because UUIDs are unordered, using them as the primary key also causes rows to be moved around frequently, which seriously hurts performance.

public final class UUIDShardingKeyGenerator implements ShardingKeyGenerator {

    private Properties properties = new Properties();

    public UUIDShardingKeyGenerator() {
    }

    public String getType() {
        return "UUID";
    }

    public synchronized Comparable<?> generateKey() {
        // A random UUID with the dashes removed: 32 hexadecimal characters
        return UUID.randomUUID().toString().replaceAll("-", "");
    }

    public Properties getProperties() {
        return this.properties;
    }

    public void setProperties(Properties properties) {
        this.properties = properties;
    }
}
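A quick standalone check of what such a key looks like, using only java.util.UUID. The class name UuidKeyDemo is illustrative, and the sketch uses String.replace rather than replaceAll since no regular expression is needed.

```java
import java.util.UUID;

final class UuidKeyDemo {

    /** Random UUID without dashes, the same shape of key the class above returns. */
    static String uuidKey() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String first = uuidKey();
        String second = uuidKey();
        System.out.println(first + " (" + first.length() + " characters)");
        // Consecutive keys have no useful ordering, unlike an increasing numeric ID.
        System.out.println("first < second: " + (first.compareTo(second) < 0));
    }
}
```

A 32-character string key is several times larger than an 8-byte long, which is the storage and index cost the text refers to.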

SNOWFLAKE

SNOWFLAKE (the snowflake algorithm) is the default primary key generation scheme. It generates 64-bit long integers (Long).

A primary key generated by the snowflake algorithm in sharding-jdbc consists of four parts: a 1-bit sign bit, a 41-bit timestamp, a 10-bit worker process ID and a 12-bit sequence number.
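The layout can be sketched with plain shifts and masks. The constant and method names below are illustrative rather than taken from the sharding-jdbc source; the shift widths follow directly from the 41/10/12 split described above.

```java
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_ID_BITS = 10;
    static final int WORKER_ID_SHIFT = SEQUENCE_BITS;                  // 12
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS; // 22
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;       // 4095
    static final long WORKER_ID_MASK = (1L << WORKER_ID_BITS) - 1;     // 1023

    /** Packs the three variable parts into one long; the sign bit stays 0. */
    static long compose(long timestampDelta, long workerId, long sequence) {
        return (timestampDelta << TIMESTAMP_SHIFT) | (workerId << WORKER_ID_SHIFT) | sequence;
    }

    static long timestampDelta(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    static long workerId(long id) {
        return (id >>> WORKER_ID_SHIFT) & WORKER_ID_MASK;
    }

    static long sequence(long id) {
        return id & SEQUENCE_MASK;
    }

    public static void main(String[] args) {
        long id = compose(1, 123, 7);
        System.out.println(id); // 4698119
        System.out.println(timestampDelta(id) + " " + workerId(id) + " " + sequence(id)); // 1 123 7
    }
}
```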

Sign bit (1 bit)

The highest bit of the Long type in Java is the sign bit: 0 for positive numbers and 1 for negative numbers. Generated IDs are normally positive, so this bit is 0.

Timestamp bits (41 bits)

A 41-bit timestamp can hold 2 to the 41st power milliseconds. One year has 1000L * 60 * 60 * 24 * 365 milliseconds, so the timestamp lasts for about 69 years.

Math.pow(2, 41) / (365 * 24 * 60 * 60 * 1000L) ≈ 69.7
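The same arithmetic as a runnable check. The class and method names are illustrative, and a year is taken as 365 days, as in the text.

```java
final class TimestampCapacity {
    static final long MILLIS_PER_YEAR = 365L * 24 * 60 * 60 * 1000;

    /** Number of 365-day years that fit into a millisecond counter of the given width. */
    static double yearsCovered(int timestampBits) {
        return (double) (1L << timestampBits) / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f years%n", yearsCovered(41));
    }
}
```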

Worker process bits (10 bits)

These bits hold a unique worker process ID. The default value is 0, and it can be set through the key-generator.props.worker.id property.

spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=0000

Sequence number bits (12 bits)

These bits distinguish IDs generated within the same millisecond; 12 bits allow up to 4096 sequence values per millisecond.
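A small sketch of how a 12-bit sequence counter wraps around. The names are illustrative; the mask value 4095 follows from the 12-bit width.

```java
final class SequenceWrapDemo {
    static final long SEQUENCE_MASK = (1L << 12) - 1; // 4095, twelve 1-bits

    /** Next sequence value; wraps to 0 once all 4096 values of a millisecond are used. */
    static long next(long sequence) {
        return (sequence + 1) & SEQUENCE_MASK;
    }

    public static void main(String[] args) {
        System.out.println(next(0));    // 1
        System.out.println(next(4094)); // 4095
        System.out.println(next(4095)); // 0, the generator must move on to the next millisecond
    }
}
```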

Clock rollback

Once you understand how a snowflake ID is composed, it is easy to see that the algorithm depends heavily on server time, and anything that depends on server time runs into a thorny problem: the clock being set backwards (clock rollback).

Why does clock rollback happen?

The Network Time Protocol (NTP) is used on the Internet to synchronize and calibrate the clocks of the computers in a network.

That is why we no longer have to set the time on our phones manually, yet everyone's phone shows the same time.

A hardware clock can drift (run fast or slow) for various reasons, so an NTP service is needed to calibrate it, and that calibration can make the server clock jump forward or roll back.

How the SNOWFLAKE scheme handles clock rollback

A server clock rollback can make the original snowflake algorithm generate duplicate IDs. The SNOWFLAKE scheme improves on the original algorithm by adding a maximum tolerated number of clock rollback milliseconds.

If the clock is rolled back by more than this threshold, the program reports an error directly. If the rollback is within the tolerated range, the default distributed primary key generator waits until the clock catches up with the time of the last generated primary key before continuing to work.

The default value of the maximum tolerated clock rollback is 0 milliseconds. It can be set through the max.tolerate.time.difference.milliseconds property, which, like worker.id, goes under key-generator.props.

# Maximum tolerated clock rollback in milliseconds
spring.shardingsphere.sharding.tables.t_order.key-generator.props.max.tolerate.time.difference.milliseconds=5

The following is the source of its implementation class, SnowflakeShardingKeyGenerator. The core process is as follows:

The time at which the last primary key was generated, lastMilliseconds, is compared with the current time, currentMilliseconds. If lastMilliseconds > currentMilliseconds, the clock has been rolled back.

Next, the generator checks whether the difference between the two times (timeDifferenceMilliseconds) is within the configured threshold max.tolerate.time.difference.milliseconds. If it is, the thread sleeps for the difference with Thread.sleep(timeDifferenceMilliseconds); if the difference exceeds the threshold, an exception is thrown directly.
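The two steps above can be sketched roughly as follows. This is not the SnowflakeShardingKeyGenerator source: the class name TolerantSnowflake, the injected clock, and the epoch value are assumptions made so that the example is self-contained and can be driven by a fake clock.

```java
import java.util.function.LongSupplier;

/** Illustrative snowflake-style generator; not the sharding-jdbc implementation. */
final class TolerantSnowflake {
    private static final long EPOCH = 1_600_000_000_000L;     // arbitrary epoch (assumption)
    private static final long SEQUENCE_MASK = (1L << 12) - 1; // 4095

    private final LongSupplier clock;     // injected so the clock can be faked
    private final long workerId;          // 0..1023
    private final long maxTolerateMillis; // tolerated backwards jump in milliseconds
    private long lastMillis;
    private long sequence;

    TolerantSnowflake(LongSupplier clock, long workerId, long maxTolerateMillis) {
        this.clock = clock;
        this.workerId = workerId;
        this.maxTolerateMillis = maxTolerateMillis;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (lastMillis > now) {
            // The clock moved backwards since the last key was generated.
            long diff = lastMillis - now;
            if (diff > maxTolerateMillis) {
                throw new IllegalStateException("Clock moved backwards by " + diff + " ms");
            }
            sleep(diff); // wait out the tolerated difference
            now = clock.getAsLong();
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 values of this millisecond are used: spin until the next one.
                while ((now = clock.getAsLong()) <= lastMillis) {
                    // busy wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the clock", e);
        }
    }

    public static void main(String[] args) {
        TolerantSnowflake generator = new TolerantSnowflake(System::currentTimeMillis, 1, 5);
        System.out.println(generator.nextId());
        System.out.println(generator.nextId());
    }
}
```

Like the flow described above, the sketch trusts that the clock has caught up after the sleep; a hardened version would check the time again before issuing a key.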

/**
 * @author xiaofu
 */
public final class SnowflakeShardingKeyGenerator implements ShardingKeyGenerator {

    @Getter
    @Setter
    private Properties properties = new Properties();

    public String getType() {
        return "SNOWFLAKE";
    }

    public synchronized Comparable<?> generateKey() {
        // Current system time in milliseconds
        long currentMilliseconds = timeService.getCurrentMillis();
        // If the clock was rolled back within the tolerated range, wait for the
        // time difference to pass, then read the current system time again
        if (waitTolerateTimeDifferenceIfNeed(currentMilliseconds)) {
            currentMilliseconds = timeService.getCurrentMillis();
        }
        // Same millisecond as the previously generated key
        if (lastMilliseconds == currentMilliseconds) {
            // & is bitwise AND: a result bit is 1 only if both input bits are 1.
            // When sequence is 4095, (sequence + 1) & SEQUENCE_MASK is 0; for any
            // other value the result is non-zero. A result of 0 means all 4096
            // sequence values of this millisecond are used up, so the generator
            // waits for the next millisecond.
            if (0L == (sequence = (sequence + 1) & SEQUENCE_MASK)) {
                currentMilliseconds = waitUntilNextTime(currentMilliseconds);
            }
        } else {
            // A new millisecond has started: reset the sequence to sequenceOffset
            vibrateSequenceOffset();
            sequence = sequenceOffset;
        }
        lastMilliseconds = currentMilliseconds;
        // The key is the bitwise OR (|) of three parts: the time difference
        // from EPOCH, the machine ID and the sequence number. A result bit is 0
        // only if both input bits are 0.
        return ((currentMilliseconds - EPOCH) ...

[... the source text is truncated here; what follows is the tail of the custom TinyId primary key generator ...]

    ... generateKey() {
        Long id = TinyId.nextId("order");
        return id;
    }

    @Override
    public Properties getProperties() {
        return null;
    }

    @Override
    public void setProperties(Properties properties) {
    }
}

Then enable the TinyId primary key generation type in the configuration file. Once the configuration is done, run a quick test.

Test the TinyId primary key

Inserting order records into the database shows that the primary key field order_id follows an increasing trend, so the TinyId service has been connected successfully.

By now you should have a deeper understanding of the distributed primary key ID generation methods for sharded databases and tables. The best next step is to try them out in practice.
