What are the ways to generate distributed IDs


2025-07-01 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article explains the main ways to generate distributed IDs. The methods introduced here are simple, fast and practical, so read on if you are interested.

Why use distributed IDs?

Before looking at specific implementations, let's briefly analyze why distributed IDs are needed and what characteristics they should have.

1. What is a distributed ID?

Take MySQL database for example:

When the business data volume is small, a single database with a single table can fully support the business, and MySQL master-slave replication with read-write splitting can handle somewhat larger volumes.

However, as the data keeps growing, master-slave replication can no longer cope, and the data has to be sharded across databases and tables. After sharding, each record still needs a unique ID, and the database's auto-increment ID obviously cannot meet that need across shards. Orders and coupons in particular need unique IDs for identification. A system capable of generating globally unique IDs therefore becomes essential, and such a globally unique ID is called a distributed ID.

2. What conditions do distributed IDs need to satisfy?

Globally unique: the ID must be globally unique; this is the basic requirement

High performance: low latency; ID generation must respond quickly, otherwise it becomes a bottleneck for the business

High availability: 100% availability is unrealistic, but it should be as close to 100% as possible

Easy to integrate: follow a ready-to-use design principle and keep the system design and implementation as simple as possible

Trend increasing: IDs should preferably trend upward. This requirement depends on the specific business scenario and is generally not strict.

What are the ways to generate distributed IDs?

Today we analyze the following 9 ways of generating distributed IDs, together with their advantages and disadvantages:

UUID

Database auto-increment ID

Database multi-master mode

Segment mode

Redis

Snowflake algorithm

Didi (TinyID)

Baidu (Uidgenerator)

Meituan (Leaf)

So how does each of them work, and what are their strengths and weaknesses? Let's take a look.


1. Based on UUID

In the Java world, the first thing that comes to mind for a unique ID is UUID, since it is designed to be globally unique. Can a UUID be used as a distributed ID? The answer is yes, but it is not recommended!

    import java.util.UUID;

    public class UuidDemo {
        public static void main(String[] args) {
            // Random UUID with the hyphens removed
            String uuid = UUID.randomUUID().toString().replace("-", "");
            System.out.println(uuid);
        }
    }

Generating a UUID takes a single line of code, and the output looks like c2b8c2b9e46c47e3b30dca3b0d447718. However, UUIDs do not suit real business requirements. A string such as a UUID used as an order number carries no meaning, and no useful order-related information can be read from it. As a business primary key in a database it is both too long and a string, so storage and queries perform poorly. It is therefore not recommended as a distributed ID.

Advantages:

Simple to generate; generated locally with no network overhead; unique

Disadvantages:

Unordered string with no increasing trend

No specific business meaning

Too long: 16 bytes (128 bits), usually written as a 36-character string, which makes storage and queries in MySQL expensive. The official MySQL documentation explicitly recommends keeping primary keys as short as possible, and because UUIDs are unordered, using them as the primary key causes frequent changes in data location, which seriously hurts performance.
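The size figures above can be checked with a small sketch. The class and method names here (UuidSizeDemo, canonical, compact) are made up for illustration; only java.util.UUID comes from the JDK.

```java
import java.util.UUID;

class UuidSizeDemo {
    // Canonical text form: 32 hex digits plus 4 hyphens = 36 characters.
    static String canonical(UUID id) {
        return id.toString();
    }

    // Compact form with the hyphens stripped, as in the example above: 32 characters.
    static String compact(UUID id) {
        return id.toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        System.out.println(canonical(a) + " length=" + canonical(a).length());
        System.out.println(compact(a) + " length=" + compact(a).length());
        // A UUID is stored as two 64-bit halves: 128 bits, i.e. 16 bytes of raw data.
        System.out.println("bits=" + (Long.SIZE * 2));
        // Random UUIDs carry no ordering: b was created after a, yet may sort before it.
        System.out.println("a.compareTo(b)=" + a.compareTo(b));
    }
}
```

Running it a few times shows that the comparison result changes from run to run, which is the "no increasing trend" problem in practice.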

2. Based on database auto-increment ID

A database auto_increment ID can serve as a distributed ID. The implementation needs a separate MySQL instance dedicated to generating IDs, with the following table structure:

    CREATE DATABASE `SEQ_ID`;
    CREATE TABLE SEQ_ID.SEQUENCE_ID (
        id bigint(20) unsigned NOT NULL auto_increment,
        `value` char(10) NOT NULL default '',
        PRIMARY KEY (id)
    ) ENGINE=MyISAM;
    -- Each insert allocates the next auto-increment ID
    INSERT INTO SEQ_ID.SEQUENCE_ID(`value`) VALUES ('values');

When we need an ID, we insert a record into the table and take the returned primary key as the ID. This method has a fatal disadvantage: when traffic surges, MySQL itself becomes the bottleneck of the system. Using it for a distributed service is risky, so it is not recommended!

Advantages:

Simple to implement; IDs increase monotonically; numeric types are fast to query

Disadvantages:

A single DB instance is a single point of failure and cannot withstand high-concurrency scenarios

3. Based on database cluster mode

As mentioned above, a single-point database is not advisable, so we make the approach highly available by replacing it with a master-slave cluster. If you are worried that a single master node going down makes the service unavailable, you can build a dual-master cluster, in which two MySQL instances each produce auto-increment IDs independently.

This raises a problem: the auto-increment IDs of both MySQL instances start from 1, so how do we avoid generating duplicate IDs?

Solution: set a different starting value and the same auto-increment step size on each instance

MySQL_1 configuration:

    set @@auto_increment_offset = 1;     -- Start value
    set @@auto_increment_increment = 2;  -- Step size

MySQL_2 configuration:

    set @@auto_increment_offset = 2;     -- Start value
    set @@auto_increment_increment = 2;  -- Step size

The self-increasing IDs of these two MySQL instances are:

1, 3, 5, 7, 9

2, 4, 6, 8, 10
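The two sequences follow directly from the offset and step settings: each instance produces offset, offset + step, offset + 2 x step, and so on. A minimal sketch of that arithmetic (StepOffsetDemo and firstIds are hypothetical names, and nothing here talks to MySQL):

```java
import java.util.ArrayList;
import java.util.List;

class StepOffsetDemo {
    // IDs produced by one instance: offset, offset + step, offset + 2 * step, ...
    static List<Long> firstIds(long offset, long step, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(offset + (long) i * step);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("MySQL_1: " + firstIds(1, 2, 5)); // [1, 3, 5, 7, 9]
        System.out.println("MySQL_2: " + firstIds(2, 2, 5)); // [2, 4, 6, 8, 10]
    }
}
```

As long as every instance uses the same step and a distinct offset no larger than the step, the sequences never overlap.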

What if the cluster still cannot withstand high concurrency? Then more MySQL nodes have to be added to scale out, which is rather troublesome.


A horizontally scaled database cluster helps relieve the pressure on a single database node. For ID generation, the auto-increment step size is set according to the number of machines.

To add a third MySQL instance, the starting value and step size of instances 1 and 2 must be changed manually, and the third instance's starting ID must be set well above the current maximum auto-increment ID. The change must be completed before the IDs of instances 1 and 2 grow to the third instance's starting ID, otherwise duplicate IDs will appear. If necessary, the service may have to be stopped while the change is made.

Advantages:

Solves the DB single point problem

Disadvantages:

Hard to scale out later. In practice the pressure on each database is still high, so it still cannot meet high-concurrency scenarios.

4. Based on database segment mode

Segment mode is one of the mainstream implementations of distributed ID generators today. It can be understood as fetching auto-increment IDs from the database in batches: each time, a range of IDs (a segment) is taken from the database. For example, (0, 1000] represents 1000 IDs; the business service loads this segment into memory and generates the auto-increment IDs 1 to 1000 from it. The table structure is as follows:

    CREATE TABLE id_generator (
        id int(10) NOT NULL,
        max_id bigint(20) NOT NULL COMMENT 'current max id',
        step int(20) NOT NULL COMMENT 'length of the segment',
        biz_type int(20) NOT NULL COMMENT 'business type',
        version int(20) NOT NULL COMMENT 'version',
        PRIMARY KEY (`id`)
    );

biz_type: Represents different business types

max_id: the largest currently available id

step: represents the length of the number segment

version: an optimistic lock; it is incremented on every update to keep the data correct under concurrency

    id   biz_type   max_id   step   version
    1    101        1000     2000   0

When the IDs in the current segment are used up, the service applies to the database for a new segment by updating the max_id field once: max_id = max_id + step. If the update succeeds, the new segment has been acquired, and its range is (max_id, max_id + step], where max_id is the value before the update.

    UPDATE id_generator
    SET max_id = #{max_id+step}, version = version + 1
    WHERE version = #{version} AND biz_type = XXX

Since multiple services may operate on the row at the same time, the update uses optimistic locking on the version field. This way of generating distributed IDs does not depend heavily on the database and does not access it frequently, so the pressure on the database is much lower.
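The segment flow described above can be sketched in plain Java. This is only an illustration under stated assumptions: FakeTable is a hypothetical in-memory stand-in for one id_generator row, and its compare-and-set plays the role of the "WHERE version = #{version}" check. A real service would run the SELECT and UPDATE against MySQL instead.

```java
import java.util.concurrent.atomic.AtomicReference;

class SegmentIdDemo {
    // Hypothetical snapshot of one id_generator row.
    static final class Row {
        final long maxId;
        final long step;
        final long version;

        Row(long maxId, long step, long version) {
            this.maxId = maxId;
            this.step = step;
            this.version = version;
        }
    }

    // In-memory stand-in for the table (an assumption, not a real database).
    static final class FakeTable {
        private final AtomicReference<Row> row;

        FakeTable(long maxId, long step) {
            this.row = new AtomicReference<>(new Row(maxId, step, 0));
        }

        Row select() {
            return row.get();
        }

        // Succeeds only if nobody changed the row since it was read (optimistic lock).
        boolean update(Row seen) {
            Row next = new Row(seen.maxId + seen.step, seen.step, seen.version + 1);
            return row.compareAndSet(seen, next);
        }
    }

    // Hands out IDs from memory and goes back to the table only when a segment is used up.
    static final class SegmentIdGenerator {
        private final FakeTable table;
        private long next = 1; // next ID to hand out
        private long max = 0;  // last ID of the current segment (inclusive)

        SegmentIdGenerator(FakeTable table) {
            this.table = table;
        }

        synchronized long nextId() {
            if (next > max) {
                loadSegment();
            }
            return next++;
        }

        private void loadSegment() {
            while (true) {
                Row seen = table.select();
                if (table.update(seen)) {
                    next = seen.maxId + 1;        // segment is (max_id, max_id + step]
                    max = seen.maxId + seen.step;
                    return;
                }
                // Another service won the race: re-read the row and retry.
            }
        }
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable(0, 3);
        SegmentIdGenerator serviceA = new SegmentIdGenerator(table);
        SegmentIdGenerator serviceB = new SegmentIdGenerator(table);
        System.out.println(serviceA.nextId()); // 1, from segment (0, 3]
        System.out.println(serviceB.nextId()); // 4, from segment (3, 6]
        System.out.println(serviceA.nextId()); // 2, served from memory
    }
}
```

With a step of 3, each service touches the table once per three IDs; a larger step reduces database traffic further, at the cost of bigger gaps when a service restarts and discards the rest of its segment.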

5. Based on Redis mode

Redis can also be used. The principle is to use the Redis incr command, which increments the ID atomically.

    127.0.0.1:6379> set seq_id 1     // Initialize the auto-increment ID to 1
    OK
    127.0.0.1:6379> incr seq_id      // Increment by 1 and return the incremented value
    (integer) 2

When using Redis, one point needs attention: Redis persistence. Redis has two persistence modes, RDB and AOF.

RDB periodically takes a snapshot for persistence. If the counter keeps increasing and Redis goes down before the latest values have been persisted, duplicate IDs will occur after Redis restarts.

AOF persists every write command, so no duplicate IDs occur even if Redis goes down. However, because every incr command is logged, restoring the data when Redis restarts can take too long.
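Why a late snapshot leads to duplicates can be seen in a tiny simulation. This sketch does not use Redis or any Redis client; SnapshotCounter is a hypothetical in-memory model in which save() stands for an RDB snapshot and crashAndRestart() stands for a restart that reloads the last snapshot.

```java
class SnapshotLossDemo {
    // Hypothetical model of a counter with periodic snapshots (not a Redis API).
    static final class SnapshotCounter {
        private long value;    // live in-memory value
        private long snapshot; // last value written to "disk"

        // Like INCR: add 1 and return the new value.
        long incr() {
            return ++value;
        }

        // Like an RDB snapshot: persist the current value.
        void save() {
            snapshot = value;
        }

        // Increments made after the last snapshot are lost on restart.
        void crashAndRestart() {
            value = snapshot;
        }
    }

    public static void main(String[] args) {
        SnapshotCounter counter = new SnapshotCounter();
        counter.incr();
        counter.incr();
        counter.incr();           // IDs 1, 2, 3 issued
        counter.save();           // snapshot holds 3
        long beforeCrash = counter.incr(); // ID 4 issued, not yet persisted
        counter.crashAndRestart();
        long afterRestart = counter.incr(); // ID 4 issued again
        System.out.println(beforeCrash + " vs " + afterRestart); // 4 vs 4
    }
}
```

The same ID is handed out twice because the restart rolls the counter back to the snapshot, which is the RDB risk described above.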

6. Based on the Snowflake algorithm

Snowflake is the ID generation algorithm used in Twitter's internal distributed projects. After it was open-sourced it was widely praised by major companies in China, and under its influence they developed distributed ID generators with their own characteristics.


Snowflake generates an ID of type Long. A Long occupies 8 bytes of 8 bits each, that is, 64 bits.

A Snowflake ID is a Long made up of a sign bit (1 bit) + timestamp (41 bits) + machine ID (5 bits) + data center ID (5 bits) + auto-increment sequence (12 bits), 64 bits in total.

The first bit (1 bit): the highest bit of a long in Java is the sign bit, 0 for positive and 1 for negative. Generated IDs are normally positive, so it defaults to 0.

Timestamp part (41 bits): a millisecond timestamp. It is not recommended to store the current timestamp directly; instead, use the difference (current timestamp - fixed start timestamp) so that generated IDs start from a smaller value. A 41-bit timestamp can be used for 69 years,(1L
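The bit layout described in this section can be sketched as follows. This is a minimal illustration, not Twitter's implementation: the class name SnowflakeSketch, the EPOCH value (2020-01-01 UTC) and the choice to throw when the clock moves backwards are all assumptions, and the field order simply follows the order listed in the text above, which other implementations may arrange differently.

```java
class SnowflakeSketch {
    static final int SEQUENCE_BITS = 12;
    static final int DATACENTER_BITS = 5;
    static final int MACHINE_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;        // 4095
    static final int DATACENTER_SHIFT = SEQUENCE_BITS;                  // 12
    static final int MACHINE_SHIFT = SEQUENCE_BITS + DATACENTER_BITS;   // 17
    static final int TIMESTAMP_SHIFT = MACHINE_SHIFT + MACHINE_BITS;    // 22

    // Assumed fixed start timestamp: 2020-01-01T00:00:00Z in milliseconds.
    static final long EPOCH = 1577836800000L;

    private final long machineId;
    private final long datacenterId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long machineId, long datacenterId) {
        if (machineId < 0 || machineId > 31 || datacenterId < 0 || datacenterId > 31) {
            throw new IllegalArgumentException("machineId and datacenterId must fit in 5 bits");
        }
        this.machineId = machineId;
        this.datacenterId = datacenterId;
    }

    // Years covered by a 41-bit millisecond counter (integer division).
    static long yearsCovered() {
        return (1L << TIMESTAMP_BITS) / (1000L * 60 * 60 * 24 * 365);
    }

    // Packs the four fields into one long; the sign bit stays 0.
    static long compose(long elapsedMillis, long machineId, long datacenterId, long sequence) {
        return (elapsedMillis << TIMESTAMP_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | sequence;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // Assumption: refuse to generate IDs rather than risk duplicates.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return compose(now - EPOCH, machineId, datacenterId, sequence);
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(1, 1);
        System.out.println(generator.nextId());
        System.out.println(generator.nextId());
        System.out.println("years covered: " + yearsCovered()); // 69
    }
}
```

Because the timestamp sits in the highest bits, IDs from one generator increase over time, and the machine and data center fields keep IDs from different nodes apart without any coordination.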
