Common distributed primary key ID generation strategies for database and table sharding


2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--


Primary key generation strategy

Generating unique IDs is a problem we often run into when designing a system. Here are some common ID generation strategies.

Sequence ID

UUID

GUID

COMB

Snowflake

To support splitting data across databases, auto-increment IDs can be kept but given a different starting value on each database and a shared step size (for example, DB1 generates 1, 4, 7, 10; DB2 generates 2, 5, 8, 11; DB3 generates 3, 6, 9, 12). This becomes extremely troublesome when the number of databases needs to grow.
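To see why growing the cluster is awkward, here is a minimal sketch. The helper `issued_ids` is hypothetical (it is not part of any database or library), and the "add a fourth database with step 4" move is just one naive way an expansion might be attempted.

```python
def issued_ids(start, step, count):
    """Return the first `count` IDs of a database that starts at `start` and adds `step` each time."""
    return [start + i * step for i in range(count)]

# Three databases sharing step 3: DB1 -> 1, 4, 7, 10; DB2 -> 2, 5, 8, 11; DB3 -> 3, 6, 9, 12
old_ids = set()
for start in (1, 2, 3):
    old_ids.update(issued_ids(start, 3, 4))

# Naively adding DB4 with start 4 and step 4 hands out IDs that already exist.
new_db4_ids = issued_ids(4, 4, 3)
collisions = sorted(old_ids & set(new_db4_ids))
print(collisions)  # [4, 8, 12]
```

Avoiding such collisions means reconfiguring every existing database (new step, new starting values above the current maximum) at the same time, which is where the trouble comes from.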

Compared with auto-increment IDs, UUIDs make it more convenient to generate a unique primary key (with a very large amount of data, duplicates are possible). However, because UUIDs are unordered, they perform worse than auto-increment IDs: they are stored as strings, take up more storage space, and are less efficient to query.

Compared with UUID, COMB adds ordering to the generated IDs, which improves insert and query efficiency. See "Integer GUID and Comb for primary key efficiency test (Delphi+access) (3)".

Snowflake is Twitter's primary key generation strategy. It can be seen as an improvement on COMB that replaces the 128-bit value with a 64-bit long integer. ID composition: a leading 0 bit + a 41-bit timestamp prefix + a 10-bit node identifier + a 12-bit sequence number that keeps IDs generated concurrently in the same millisecond distinct. See "Twitter-Snowflake (64-bit distributed ID algorithm) Analysis and JAVA implementation".
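The bit widths above translate directly into capacity limits. The arithmetic below is a small sketch; the constant names are chosen here for illustration, and the year figure assumes 365-day years.

```python
TIMESTAMP_BITS, NODE_BITS, SEQUENCE_BITS = 41, 10, 12

total_bits = 1 + TIMESTAMP_BITS + NODE_BITS + SEQUENCE_BITS    # leading 0 bit + three fields
max_nodes = 1 << NODE_BITS                                     # distinct node identifiers
ids_per_ms_per_node = 1 << SEQUENCE_BITS                       # sequence values per millisecond
years_of_timestamps = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365)

print(total_bits, max_nodes, ids_per_ms_per_node, round(years_of_timestamps, 1))
# 64 1024 4096 69.7
```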

1. Sequence ID

An auto-increment sequence or column in the database; the most common approach. The database maintains it, and the IDs are unique within that database.

Advantages:

Simple, convenient code, acceptable performance.

Numeric IDs sort naturally, which helps with paging and with results that need to be sorted.

Disadvantages:

Syntax and implementation differ between databases, which has to be handled when migrating to another database or supporting several database products.

With a single database, read-write splitting, or one master with multiple slaves, only the one master can generate IDs, so there is a risk of a single point of failure.

When the performance does not meet the requirements, it is difficult to scale.

It can be painful when multiple systems need to be merged or data has to be migrated.

It causes trouble when splitting tables and databases.

Optimization scheme:

To remove the single point at the master, use multiple master databases. Each master is given a different starting number and the same step size, where the step size can be the number of masters.

For example: Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; Master3 generates 3, 6, 9, 12. In this way, IDs that are unique across the cluster can be generated effectively, and the ID generation load on each database is greatly reduced.
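The start-plus-step scheme can be sketched in a few lines. This only simulates the counters in memory; `make_id_generator` is a hypothetical helper, and in practice the database itself would hold the offset and step configuration.

```python
import itertools

def make_id_generator(start, step):
    """Yield start, start + step, start + 2 * step, ... like an auto-increment column with an offset."""
    return itertools.count(start, step)

master_count = 3  # step size equals the number of masters
generators = {
    f"Master{n}": make_id_generator(n, master_count)
    for n in range(1, master_count + 1)
}

first_four = {name: list(itertools.islice(gen, 4)) for name, gen in generators.items()}
print(first_four)
# {'Master1': [1, 4, 7, 10], 'Master2': [2, 5, 8, 11], 'Master3': [3, 6, 9, 12]}

# No ID appears on more than one master.
all_ids = [i for ids in first_four.values() for i in ids]
print(len(all_ids) == len(set(all_ids)))  # True
```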

2. UUID

A common approach; the ID is 128 bits long. It can be generated by the database or by the program, and is generally globally unique.

Advantages:

Simple, convenient code.

Globally unique, so data migration, merging data from several systems, or changing databases can be handled with ease.

Disadvantages:

Unordered, so there is no guarantee that IDs trend upward.

UUIDs are often stored as strings, so query efficiency is relatively low.

The storage space is relatively large; for a massive database, storage capacity has to be considered.

More data has to be transmitted.

Unreadable.
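The size and ordering points can be checked with Python's standard `uuid` module. This is a small illustration using random (version 4) UUIDs, which is an assumption; the article does not say which UUID version it has in mind.

```python
import uuid

u = uuid.uuid4()        # random (version 4) UUID
as_text = str(u)        # canonical hyphenated hex form, as often stored in a string column
as_bytes = u.bytes      # raw 128-bit value

print(len(as_text), len(as_bytes))  # 36 16

# Random UUIDs carry no creation order, so a batch is generally not already sorted.
batch = [uuid.uuid4() for _ in range(5)]
print(batch == sorted(batch))
```

Storing the 16 raw bytes instead of the 36-character string reduces the space cost, but does not fix the lack of ordering.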

Optimization scheme:

To address the unreadability of UUIDs, you can convert the UUID to an Int64.
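The article does not say how the conversion is done. One possible sketch, purely an assumption, is to XOR the two 64-bit halves of the UUID and clear the top bit so the result fits a signed 64-bit column. The function name `uuid_to_int64` is hypothetical.

```python
import uuid

MASK_64 = (1 << 64) - 1
MASK_63 = (1 << 63) - 1   # keeps the result non-negative in a signed 64-bit column

def uuid_to_int64(u):
    """Fold a 128-bit UUID into a non-negative 64-bit integer (assumed scheme: XOR of both halves)."""
    high = u.int >> 64
    low = u.int & MASK_64
    return (high ^ low) & MASK_63

u = uuid.uuid4()
print(u, "->", uuid_to_int64(u))
```

Folding 128 bits into 63 discards information, so the result is more likely to collide than the original UUID; that trade-off should be weighed before using a scheme like this for primary keys.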

3. GUID

GUID is Microsoft's implementation of the UUID standard. UUID has various other implementations besides GUID. The advantages and disadvantages are the same as for UUID.

4. COMB

COMB (combine) is a design idea specific to databases. It can be understood as an improved GUID that combines a GUID with the system time so that it performs better in indexing and retrieval.

Databases have no COMB type; the idea was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys".

The basic idea behind the COMB data type is as follows. UniqueIdentifier values are irregular, which makes indexing them inefficient and hurts system performance. COMB therefore keeps the first 10 bytes of the UniqueIdentifier and uses the last 6 bytes to represent the time the GUID was generated (DateTime). Combining time information with the UniqueIdentifier adds ordering while preserving its uniqueness, which improves index efficiency.

Advantages:

It solves the problem of UUID disorder: the Comb algorithm (combined guid/timestamp) keeps 10 bytes of the GUID and uses the other 6 bytes to represent the time when the GUID was generated (DateTime).

The performance is better than that of UUID.
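The 10-byte plus 6-byte layout can be sketched as follows. This is an illustration only: it uses random bytes in place of the kept GUID bytes and a Unix millisecond timestamp in place of a database DateTime value, both of which are assumptions, and `new_comb` and `comb_timestamp` are hypothetical names.

```python
import os
import time

def new_comb(now_ms=None):
    """Build a 16-byte COMB-style value: 10 random bytes followed by a 6-byte timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000          # Unix time in milliseconds (assumption)
    random_part = os.urandom(10)                      # stands in for the 10 kept GUID bytes
    time_part = (now_ms & ((1 << 48) - 1)).to_bytes(6, "big")  # 6 bytes = 48 bits
    return random_part + time_part

def comb_timestamp(comb):
    """Read the millisecond timestamp back out of the last 6 bytes."""
    return int.from_bytes(comb[10:], "big")

earlier = new_comb(now_ms=1_700_000_000_000)
later = new_comb(now_ms=1_700_000_000_001)
print(len(earlier), comb_timestamp(earlier) < comb_timestamp(later))  # 16 True
```

Whether the time bytes actually help the index depends on how the database compares 16-byte values; the ordering benefit only appears when the time portion is the most significant part of that comparison.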

5. Snowflake algorithm of Twitter

Snowflake is Twitter's open-source distributed ID generation algorithm; the result is a 64-bit long ID. The core idea is to use 41 bits for a millisecond timestamp, 10 bits for the machine ID (5 bits for the data center and 5 bits for the machine within it), and 12 bits for a sequence number within the millisecond (so each node can generate 4096 IDs per millisecond), plus a sign bit in the highest position, which is always 0. The algorithm can be adapted to a project's needs: for example, estimate the future number of data centers, the number of machines per data center, and the number of concurrent requests per millisecond, then adjust the number of bits given to each field.
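A minimal sketch of a generator with this layout is shown below. It is not Twitter's code: the class name, the custom epoch (2020-01-01 UTC), and the choice to raise an error when the clock moves backwards are all assumptions made for illustration.

```python
import threading
import time

class SnowflakeGenerator:
    """Snowflake-style IDs: 41-bit ms timestamp, 5-bit data center, 5-bit worker, 12-bit sequence."""

    def __init__(self, datacenter_id, worker_id, epoch_ms=1_577_836_800_000):
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must each fit in 5 bits")
        self.node_id = (datacenter_id << 5) | worker_id
        self.epoch_ms = epoch_ms          # assumed custom epoch: 2020-01-01 00:00:00 UTC
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 sequence values used in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.node_id << 12) | self.sequence

gen = SnowflakeGenerator(datacenter_id=1, worker_id=3)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids), len(set(ids)))
```

Because the timestamp occupies the highest bits, IDs from one generator sort by creation time; the sign bit stays 0 as long as the elapsed time since the epoch fits in 41 bits.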

Advantages:

Does not depend on the database; flexible and convenient, and performs better than database-generated IDs.

IDs increase over time on a single machine.

Disadvantages:

IDs increase on a single machine, but in a distributed environment the clocks on different machines cannot be perfectly synchronized, so the IDs are sometimes not globally increasing.
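The effect of clock skew can be shown by packing the fields by hand. The `compose` helper, the timestamps, and the 5 ms skew are hypothetical values chosen for illustration.

```python
def compose(timestamp_ms, node_id, sequence):
    """Pack the fields into one integer: timestamp | 10-bit node | 12-bit sequence."""
    return (timestamp_ms << 22) | (node_id << 12) | sequence

# Node 1's clock runs 5 ms behind node 2's (hypothetical skew).
id_from_node2 = compose(timestamp_ms=1_000_005, node_id=2, sequence=0)  # generated first in real time
id_from_node1 = compose(timestamp_ms=1_000_000, node_id=1, sequence=0)  # generated later, on the slow clock

print(id_from_node1 < id_from_node2)  # True: the later ID is the smaller one
```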

