What is the database primary key ID generation strategy 12/28 Update SLTechnology News&Howtos


2025-12-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--


Foreword:

Generating unique IDs is a problem we often run into when designing a system. Here are some common ID generation strategies:

● Sequence ID

● UUID

● GUID

● COMB

● Snowflake

To support database sharding, auto-increment IDs are sometimes given a different starting point on each shard while still auto-incrementing. This becomes extremely troublesome when the database needs to be expanded. For example, suppose that when we first design a system, its data is split across 10 tables. Each table needs IDs that do not collide with the others, so each one auto-increments from a different starting value. The first table generates 1, 11, 21, 31; the second 2, 12, 22, 32; the third 3, 13, 23, 33; and the tenth 10, 20, 30. But if one day these 10 tables are no longer enough and we want to add another one, how should its primary keys be allocated? Likewise, if data from multiple databases has to be merged, this simple way of generating IDs will almost certainly produce duplicates. Clearly, this approach scales poorly.
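To make the expansion problem concrete, here is a small JavaScript sketch of the offset-and-step scheme. It uses plain Node.js with no database, and table i hands out i, i + 10, i + 20, and so on. The helper names idsForTable and originalOwner are hypothetical. The sketch shows that a naively added 11th table, with offset 11 and the same step of 10, produces exactly the IDs that table 1 already owns.

```javascript
// Hypothetical helpers for the offset-and-step scheme described above.
// Table i (1-based) hands out offset, offset + step, offset + 2*step, ...
function idsForTable(offset, step, count) {
  return Array.from({ length: count }, (_, k) => offset + k * step);
}

// Which of the original tables (1..step) would have generated this ID?
function originalOwner(id, step) {
  return ((id - 1) % step) + 1;
}

const STEP = 10; // 10 tables, so the step is 10
console.log(idsForTable(1, STEP, 4));  // [ 1, 11, 21, 31 ]
console.log(idsForTable(10, STEP, 3)); // [ 10, 20, 30 ]

// Naively adding an 11th table with offset 11 and the same step
const newTable = idsForTable(11, STEP, 4);
console.log(newTable); // [ 11, 21, 31, 41 ]
console.log(newTable.map((id) => originalOwner(id, STEP))); // [ 1, 1, 1, 1 ]: all clash with table 1
```

With a step of 10, any offset above 10 lands inside the sequence of one of the existing tables.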

Compared with auto-increment IDs, UUIDs make it much easier to generate unique primary keys, although with an extremely large amount of data a duplicate is theoretically possible. However, because UUIDs are unordered, they perform worse than auto-increment IDs. They are usually stored as strings, take up more space, and are slower to query. In short, the key drawback of UUIDs is low query efficiency.

Compared with UUID, COMB adds ordering to the generated IDs, which improves insert and query efficiency. It is briefly analyzed below.

Snowflake is Twitter's primary key generation strategy. It can be seen as an improvement on COMB that replaces the 128-bit string with a 64-bit long integer. An ID is composed of a leading 0 bit, a 41-bit timestamp prefix, a 10-bit node ID, and a 12-bit sequence number that keeps IDs generated concurrently on the same node from colliding.

Part I: Sequence ID

Auto-increment columns or database sequences are the most common approach. The IDs are maintained by the database and are unique within it.

Advantages:

Simple and convenient to code, with acceptable performance.

Numeric IDs sort naturally, which is very helpful for pagination or any results that need to be sorted.

Disadvantages:

Syntax and implementation differ from database to database. This has to be dealt with when migrating databases or supporting multiple database products or versions.

With a single database, read/write splitting, or a one-master-multiple-slave setup, only the master database can generate IDs. This creates a risk of a single point of failure.

When performance does not meet requirements, it is hard to scale.

Merging data from multiple systems or migrating data can be painful.

Sharding databases and tables becomes troublesome.

Optimization scheme:

To remove the single point of failure on the master, use multiple masters. Give each master a different starting value and the same step size, which can be the number of masters.

For example, with three masters, Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; and Master3 generates 3, 6, 9, 12. This effectively produces IDs that are unique across the cluster and greatly reduces the ID-generation load on each database.
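As a rough illustration of the multi-master scheme, the following JavaScript sketch models each master's auto-increment counter in memory. MasterCounter is a hypothetical class, not a database API. In MySQL, for instance, the same effect is usually configured on each server with the auto_increment_offset and auto_increment_increment system variables.

```javascript
// Hypothetical in-memory model of one master's auto-increment counter.
// offset = this master's starting value, step = total number of masters.
class MasterCounter {
  constructor(offset, step) {
    this.next = offset;
    this.step = step;
  }

  nextId() {
    const id = this.next;
    this.next += this.step;
    return id;
  }
}

const MASTERS = 3;
const masters = [1, 2, 3].map((offset) => new MasterCounter(offset, MASTERS));

// Each master generates four IDs independently
const generated = masters.map((m) => [m.nextId(), m.nextId(), m.nextId(), m.nextId()]);
console.log(generated); // [ [ 1, 4, 7, 10 ], [ 2, 5, 8, 11 ], [ 3, 6, 9, 12 ] ]

// No ID appears twice across the cluster
const all = generated.flat();
console.log(new Set(all).size === all.length); // true
```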

Part II: UUID

npm package: https://www.npmjs.com/package/uuid

Another common approach. A UUID is 128 bits long and can be generated either by the database or by the application. For practical purposes it is globally unique.

A UUID (Universally Unique IDentifier), also known as a GUID, is a 128-bit identifier usually written as 32 hexadecimal digits (36 characters including hyphens). It is designed to be unique across both time and space.

Depending on the algorithm, uniqueness comes from the MAC address, a timestamp, a namespace, or random and pseudo-random numbers.

There are five main UUID algorithms, that is, five ways of generating one. The names below follow Python's uuid module.

(1) uuid1()

Based on a timestamp. It is generated from the MAC address, the current timestamp, and a random number. It guarantees global uniqueness, but using the MAC address also raises a security concern because it identifies the machine. Within a local area network, an IP address can be used instead of the MAC address.
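The version-1 layout can be sketched in JavaScript as follows. This is a simplified illustration, not a complete generator. uuidV1Sketch is a hypothetical function, the caller supplies the clock sequence and node, and it does not handle several UUIDs requested within the same 100-nanosecond tick. The 60-bit timestamp counts 100-nanosecond intervals since 1582-10-15. As an assumption standing in for the MAC address, the node is 6 random bytes with the multicast bit set.

```javascript
const crypto = require('crypto');

// 100-ns intervals between the UUID epoch (1582-10-15 UTC) and the Unix epoch (1970-01-01 UTC)
const UUID_EPOCH_OFFSET = 122192928000000000n;

const hex = (value, width) => value.toString(16).padStart(width, '0');

// Version-1 field layout: 60-bit timestamp + 14-bit clock sequence + 48-bit node
function uuidV1Sketch(unixMs, clockSeq, nodeBytes) {
  const ticks = BigInt(unixMs) * 10000n + UUID_EPOCH_OFFSET; // 1 ms = 10,000 ticks of 100 ns
  const timeLow = ticks & 0xffffffffn;
  const timeMid = (ticks >> 32n) & 0xffffn;
  const timeHiAndVersion = ((ticks >> 48n) & 0x0fffn) | 0x1000n; // version nibble = 1
  const clockSeqAndVariant = (clockSeq & 0x3fff) | 0x8000; // variant bits 10xx
  return [
    hex(timeLow, 8),
    hex(timeMid, 4),
    hex(timeHiAndVersion, 4),
    hex(clockSeqAndVariant, 4),
    Buffer.from(nodeBytes).toString('hex'),
  ].join('-');
}

// Assumption: instead of the real MAC address, use 6 random bytes with the
// multicast bit set (the random-node option described in RFC 4122, section 4.5)
const node = crypto.randomBytes(6);
node[0] |= 0x01;
const clockSeq = crypto.randomInt(0, 0x4000); // random 14-bit clock sequence
console.log(uuidV1Sketch(Date.now(), clockSeq, node));
```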

(2) uuid2()

Based on the DCE (Distributed Computing Environment) security model; Python does not provide this function. The algorithm is the same as uuid1, except that the first 4 bytes of the timestamp are replaced with a POSIX UID. It is rarely used in practice.

(3) uuid3()

Based on the MD5 hash of a name. It computes the MD5 hash of a namespace and a name. Different names in the same namespace, and names in different namespaces, produce different UUIDs, while the same name in the same namespace always produces the same UUID.

(4) uuid4()

Based on random numbers. Because it is built from pseudo-random numbers, there is a small probability of duplicates, and that probability can be calculated.
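A version-4 UUID is essentially 122 random bits plus fixed version and variant bits. The duplicate probability mentioned above can be estimated with the birthday bound p ≈ 1 - exp(-n^2 / (2 * 2^122)). The JavaScript sketch below shows both; uuidV4Sketch and collisionProbability are hypothetical names. Real code should prefer a maintained library or Node's built-in crypto.randomUUID().

```javascript
const crypto = require('crypto');

// Version-4 UUID sketch: 122 random bits plus fixed version/variant bits
function uuidV4Sketch() {
  const b = crypto.randomBytes(16);
  b[6] = (b[6] & 0x0f) | 0x40; // version 4
  b[8] = (b[8] & 0x3f) | 0x80; // variant 10xx
  const h = b.toString('hex');
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

// Birthday-bound estimate of the chance that n random v4 UUIDs contain a duplicate:
// p ≈ 1 - exp(-n^2 / (2 * 2^122))
function collisionProbability(n) {
  const space = 2 ** 122; // number of possible random v4 values
  return -Math.expm1(-(n * n) / (2 * space)); // expm1 keeps precision for tiny p
}

console.log(uuidV4Sketch());
console.log(collisionProbability(1e9)); // roughly 9.4e-20
```

Even a billion UUIDs give a duplicate probability of roughly 10^-19. The chance only approaches 50% at around 2.7 × 10^18 UUIDs.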

(5) uuid5()

Based on the SHA-1 hash of a name. The algorithm is the same as uuid3, except that it uses SHA-1 (Secure Hash Algorithm 1) instead of MD5.
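Name-based UUIDs (uuid3 and uuid5) follow the same recipe with different hash functions. You hash the 16 namespace bytes followed by the name, keep the first 16 bytes, and then set the version and variant bits. The JavaScript sketch below uses Node's crypto module. nameBasedUuid is a hypothetical helper, and NAMESPACE_DNS is the standard DNS namespace UUID from RFC 4122. For the inputs shown, the results should match Python's uuid.uuid3 and uuid.uuid5.

```javascript
const crypto = require('crypto');

// Standard namespace UUID for DNS names (RFC 4122, Appendix C)
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Name-based UUID: hash(namespace bytes + name bytes), keep 16 bytes,
// then overwrite the version and variant bits.
// algorithm 'md5' -> version 3 (uuid3), 'sha1' -> version 5 (uuid5)
function nameBasedUuid(namespace, name, algorithm) {
  const version = algorithm === 'md5' ? 3 : 5;
  const nsBytes = Buffer.from(namespace.replace(/-/g, ''), 'hex');
  const bytes = crypto
    .createHash(algorithm)
    .update(nsBytes)
    .update(Buffer.from(name, 'utf8'))
    .digest()
    .subarray(0, 16); // SHA-1 yields 20 bytes; only the first 16 are used
  bytes[6] = (bytes[6] & 0x0f) | (version << 4);
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant 10xx
  const h = bytes.toString('hex');
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

console.log(nameBasedUuid(NAMESPACE_DNS, 'python.org', 'md5'));  // 6fa459ea-ee8a-3ca4-894e-db77e160355e
console.log(nameBasedUuid(NAMESPACE_DNS, 'python.org', 'sha1')); // 886313e1-3b8a-5372-9b90-0c9aee199e5d
```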

Advantages:

Simple and convenient to code.

Globally unique, so data migration, merging data from different systems, or changing databases can be handled calmly.

Disadvantages:

Not ordered: there is no guarantee that IDs increase over time.

UUIDs are often stored as strings, which makes queries relatively slow.

They take up a relatively large amount of storage. For a very large database, storage capacity has to be taken into account.

More data has to be transmitted.

Not human-readable.

Optimization scheme:

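To make UUIDs more readable, you can convert them to Int64 values. Note that a 64-bit integer cannot hold all 128 bits of a UUID. The conversion therefore discards information, and uniqueness is no longer guaranteed.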
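One possible way to do the conversion is sketched in JavaScript below. uuidToInt64 is a hypothetical helper that simply reads the first 8 bytes of the UUID as a signed 64-bit big-endian integer. Other folding schemes exist. Whichever scheme is used, two UUIDs that agree on the bytes it keeps map to the same number.

```javascript
// Hypothetical helper: fold a UUID into a signed 64-bit integer (Int64)
// by reading its first 8 bytes big-endian. The other 8 bytes are discarded,
// so two different UUIDs can map to the same number.
function uuidToInt64(uuid) {
  const bytes = Buffer.from(uuid.replace(/-/g, ''), 'hex');
  return bytes.readBigInt64BE(0);
}

const id = uuidToInt64('886313e1-3b8a-5372-9b90-0c9aee199e5d');
console.log(id.toString()); // negative here, because the first byte (0x88) has its top bit set
```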

Part III: GUID

GUID is Microsoft's implementation of the UUID standard; there are many other UUID implementations besides GUID. Its advantages and disadvantages are the same as those of UUID.

Part IV: COMB

COMB (combined GUID/timestamp) is a database-specific design idea that can be understood as an improved GUID. It combines a GUID with the system time so that it performs better for indexing and retrieval.

Databases have no built-in COMB type. The idea was proposed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys".

The basic idea of COMB is as follows. Because UniqueIdentifier values follow no regular order, indexing them is inefficient and hurts system performance. So why not keep the first 10 bytes of the UniqueIdentifier and use the last 6 bytes to store the time the GUID was generated (DateTime)? Combining time information with the UniqueIdentifier adds ordering while preserving uniqueness, and so improves index efficiency.
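The COMB idea can be sketched in JavaScript as follows. This illustration rests on an assumption and is not Nilsson's exact format. His original stored a SQL Server datetime in the last 6 bytes, whereas combSketch (a hypothetical helper) stores milliseconds since the Unix epoch. Because the timestamp bytes are written big-endian at the end, their hex form sorts in generation order.

```javascript
const crypto = require('crypto');

// COMB sketch: bytes 0..9 random, bytes 10..15 = generation time.
// Assumption: the time is stored as milliseconds since the Unix epoch (48 bits, big-endian),
// not the SQL Server datetime encoding used in Nilsson's article.
function combSketch(nowMs = Date.now()) {
  const bytes = crypto.randomBytes(16);
  bytes.writeUIntBE(nowMs, 10, 6); // overwrite the last 6 bytes with the timestamp
  const h = bytes.toString('hex');
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

// The last 12 hex digits hold the 6 timestamp bytes
function combTimestamp(comb) {
  return parseInt(comb.slice(-12), 16);
}

const comb = combSketch();
console.log(comb, new Date(combTimestamp(comb)).toISOString());
```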

Advantages:

It solves the ordering problem of UUIDs. The COMB (combined GUID/timestamp) algorithm keeps 10 bytes of the GUID and uses the remaining 6 bytes to store the time the GUID was generated (DateTime).

The performance is better than that of UUID.

Part V: Twitter's Snowflake algorithm

Snowflake is Twitter's open-source distributed ID generation algorithm, and it produces a long (64-bit) ID. The core idea uses 41 bits for the timestamp in milliseconds and 10 bits for the machine ID (5 bits for the data center and 5 bits for the machine). Another 12 bits hold a sequence number within each millisecond, so each node can generate 4096 IDs per millisecond. The remaining bit is the sign bit, which is always 0. The algorithm can be adapted to a project's needs. For example, estimate the future number of data centers, the number of machines per data center, and the peak number of IDs needed per millisecond, and then adjust the number of bits for each field accordingly.
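Putting the bit layout together, a minimal Snowflake-style generator might look like the following JavaScript sketch. It uses BigInt because a 64-bit ID does not fit in a JavaScript number. SnowflakeSketch and the 2020-01-01 EPOCH are hypothetical choices. Throwing an error when the clock moves backwards is only one possible policy for the clock problem listed under the disadvantages.

```javascript
// Minimal Snowflake-style ID generator (sketch, not Twitter's code).
// Bit layout from high to low:
// 1 sign bit (always 0) | 41-bit ms since EPOCH | 5-bit datacenter | 5-bit worker | 12-bit sequence
const EPOCH = 1577836800000n; // hypothetical custom epoch: 2020-01-01T00:00:00Z

class SnowflakeSketch {
  // clock is injectable (returns ms as a BigInt) so the generator can be tested
  constructor(datacenterId, workerId, clock = () => BigInt(Date.now())) {
    for (const v of [datacenterId, workerId]) {
      if (!Number.isInteger(v) || v < 0 || v > 31) {
        throw new RangeError('datacenterId and workerId must be integers in 0..31');
      }
    }
    this.datacenterId = BigInt(datacenterId);
    this.workerId = BigInt(workerId);
    this.clock = clock;
    this.lastMs = -1n;
    this.sequence = 0n;
  }

  nextId() {
    let now = this.clock();
    if (now < this.lastMs) {
      // Clock moved backwards: refuse instead of risking duplicate IDs
      throw new Error('clock moved backwards');
    }
    if (now === this.lastMs) {
      this.sequence = (this.sequence + 1n) & 0xfffn; // wraps to 0 after 4096 IDs
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond: busy-wait for the next one
        while (now <= this.lastMs) now = this.clock();
      }
    } else {
      this.sequence = 0n;
    }
    this.lastMs = now;
    return (
      ((now - EPOCH) << 22n) |
      (this.datacenterId << 17n) |
      (this.workerId << 12n) |
      this.sequence
    );
  }
}

const generator = new SnowflakeSketch(1, 1);
console.log(generator.nextId().toString()); // a 64-bit ID as a decimal string
```

A 41-bit millisecond field lasts about 69 years from the chosen epoch.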

Advantages:

Does not depend on the database, is flexible and convenient, and performs better than database-generated IDs.

On a single machine, IDs increase with time.

Disadvantages:

IDs increase on each machine. In a distributed environment, however, the clocks of different machines cannot be perfectly synchronized, so IDs are sometimes not globally increasing.

Part VI: Usage

Using UUIDs in Node.js is really convenient. First install the package:

npm install uuid --save

Then you can use it!

const { v1: uuidv1 } = require('uuid'); // deep imports such as 'uuid/v1' were removed in uuid v8
console.log('time-based uuid string', uuidv1());

This prints a UUID string, which is different every time.
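Newer versions of Node.js (v14.17.0 and v15.6.0 onward) can also generate a random (version 4) UUID without any package, through crypto.randomUUID(). A minimal example:

```javascript
const { randomUUID } = require('crypto');

// Built-in random (version 4) UUID; no npm package required
const id = randomUUID();
console.log(id);
```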
