What is the generation method of distributed Unique ID

Updated 2025-07-16. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report.

This article explains in detail how to generate distributed unique IDs; hopefully it will be a useful reference.

Distributed unique IDs are used everywhere, from business object IDs to log TraceIds. This article summarizes a variety of generation algorithms.

1. Number generator

The earliest unique ID I came across was Oracle's auto-increment ID.

It is a quasi-continuous, auto-incrementing number. Why only quasi-continuous? For performance reasons, each client fetches 20 IDs at a time, uses them up at its own pace, and comes back for more only when they run out. A second client that arrives in the meantime takes the next 20.

At Sina Weibo, Tim does the same thing with Redis: one INCR-style call takes back a whole batch of IDs. If you have multiple data centers, distinguish them with a few high-order bits.

As long as you are willing to accept the extra complexity of adding Redis to the overall architecture, a 64-bit long is enough to hold the ID, and duplicates cannot occur.

Batching is the key; otherwise nobody can afford a remote call for every single ID.
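The batching idea can be sketched as follows. This is a minimal illustration, not Oracle's or Weibo's actual code: `BatchIdClient` and `RemoteCounter` are hypothetical names, and an in-memory `AtomicLong` stands in for the remote counter so the example runs on its own.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out IDs locally and contacts the central counter only once per batch. */
class BatchIdClient {
    /** Hypothetical stand-in for the remote counter (a database sequence, a Redis key, ...). */
    interface RemoteCounter {
        /** Atomically reserves {@code count} IDs and returns the last reserved value. */
        long incrementBy(long count);
    }

    private final RemoteCounter counter;
    private final int batchSize;
    private long next = 0;   // next ID to hand out
    private long limit = 0;  // exclusive upper bound of the current batch

    BatchIdClient(RemoteCounter counter, int batchSize) {
        this.counter = counter;
        this.batchSize = batchSize;
    }

    synchronized long nextId() {
        if (next >= limit) {                            // batch used up: one remote call
            long last = counter.incrementBy(batchSize);
            limit = last + 1;
            next = limit - batchSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong();           // in-memory stand-in for the server
        RemoteCounter remote = shared::addAndGet;
        BatchIdClient a = new BatchIdClient(remote, 20);
        BatchIdClient b = new BatchIdClient(remote, 20);
        // Client a owns 1..20, client b owns 21..40.
        System.out.println(a.nextId() + " " + b.nextId() + " " + a.nextId()); // prints "1 21 2"
    }
}
```

Each client's IDs are continuous within a batch, but IDs across clients interleave, which is exactly the "quasi-continuous" behaviour described above.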

2. UUID

2.1 Overview

A Universally Unique IDentifier (UUID) is backed by a formal RFC specification (RFC 4122). It is a 128-bit number, usually written as 32 hexadecimal characters split into groups by "-":

- Timestamp + UUID version number: 16 characters in three groups (60 bits + 4 bits)

- Clock sequence and reserved field: 4 characters (14 bits + 2 bits)

- Node ID: 12 characters (48 bits)

For example: f81d4fae-7dec-11d0-a765-00a0c91e6bf6
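As a quick check of this layout, the JDK's `java.util.UUID` can pull the fields out of the example above. `UuidFields` and `describe` are hypothetical names used only for this sketch; note that `timestamp()`, `clockSequence()` and `node()` are only valid for version 1 UUIDs.

```java
import java.util.UUID;

class UuidFields {
    /** Formats the version 1 fields of a UUID given in its textual form. */
    static String describe(String text) {
        UUID id = UUID.fromString(text);
        return String.format("version=%d timestamp=0x%015x clockSeq=0x%04x node=0x%012x",
                id.version(), id.timestamp(), id.clockSequence(), id.node());
    }

    public static void main(String[] args) {
        System.out.println(describe("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"));
        // prints: version=1 timestamp=0x1d07decf81d4fae clockSeq=0x2765 node=0x00a0c91e6bf6
    }
}
```

The timestamp is reassembled from the three leading groups in reverse order (`1d0`, `7dec`, `f81d4fae`), with the leading `1` of `11d0` being the version number.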

In fact, UUID has several generation algorithms. The ones suitable for a TraceId are:

- Version 1: a time-based algorithm

- Version 4: a random-number-based algorithm

Version 4

Let's start with Version 4, the most brute-force approach and also the algorithm used by the JDK. It ignores the original meaning of each bit: apart from the few bits the specification requires to be set, everything else is filled with random numbers.

The JDK implementation generates 16 random bytes with SecureRandom and stores them in 2 longs. Remember to add -Djava.security.egd=file:/dev/./urandom; otherwise the program may block while waiting for entropy (noise).

See "Random number and Entropy Pool Strategy on JVM" for details.
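The Version 4 recipe is short enough to write out by hand. The sketch below follows the steps described above (16 random bytes, fixed version and variant bits, two longs); it is an illustration under those assumptions rather than a copy of the JDK source, and `RandomUuidSketch` is a hypothetical name.

```java
import java.security.SecureRandom;
import java.util.UUID;

class RandomUuidSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID randomV4() {
        byte[] b = new byte[16];
        RANDOM.nextBytes(b);
        b[6] = (byte) ((b[6] & 0x0f) | 0x40);   // high nibble of byte 6: version 4
        b[8] = (byte) ((b[8] & 0x3f) | 0x80);   // top two bits of byte 8: variant "10"
        long high = 0, low = 0;
        for (int i = 0; i < 8; i++)  high = (high << 8) | (b[i] & 0xff);
        for (int i = 8; i < 16; i++) low  = (low << 8)  | (b[i] & 0xff);
        return new UUID(high, low);             // the two longs mentioned above
    }

    public static void main(String[] args) {
        UUID id = randomV4();
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```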

Version 1

Then there is Version 1, which strictly follows the original meaning of each bit:

The timestamp gets a full 60 bits, so it can be spent lavishly: it counts in units of 100 ns, starting from October 15, 1582.
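To make the unit and the epoch concrete, here is a small sketch that converts an `Instant` into that 60-bit count. `UuidV1Clock` and `toUuidTimestamp` are hypothetical names; the epoch is derived with `java.time` instead of being hard-coded.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class UuidV1Clock {
    /** 1582-10-15T00:00:00Z, the start of the version 1 timestamp. */
    static final Instant GREGORIAN_EPOCH =
            LocalDate.of(1582, 10, 15).atStartOfDay(ZoneOffset.UTC).toInstant();

    /** Number of 100 ns ticks since 1582-10-15, masked to 60 bits. */
    static long toUuidTimestamp(Instant t) {
        Duration d = Duration.between(GREGORIAN_EPOCH, t);
        long ticks = d.getSeconds() * 10_000_000L + d.getNano() / 100;
        return ticks & 0x0FFF_FFFF_FFFF_FFFFL;
    }

    public static void main(String[] args) {
        // Offset between the UUID epoch and the Unix epoch, in 100 ns ticks.
        System.out.println(toUuidTimestamp(Instant.EPOCH)); // prints 122192928000000000
        System.out.println(toUuidTimestamp(Instant.now()));
    }
}
```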

The node identifier also has 48 bits and is generally the MAC address; if there are several network cards, just pick one. If there is no network card, use random numbers, or gather as much other information as possible, such as the host name, and hash it all together.

The 16-bit clock sequence field exists only to avoid collisions when the node identifier changes (for example, the network card is replaced) or the clock misbehaves (for example, it runs fast or slow after a reboot); it is randomized so that repeats are avoided.

However, Version 1 does not seem to consider two processes on the same machine, nor concurrent requests within the same timestamp, so nobody implements Version 1 strictly. Let's move on to the variants.

Version 1 variant: Hibernate

Hibernate's CustomVersionOneStrategy.java solves the two problems of Version 1 mentioned above.

- Timestamp (6 bytes, 48 bits): millisecond precision, lasting 8925 years from 1970.

- Sequence number (2 bytes, 16 bits, maximum value 65535): it is not reset to zero when the timestamp advances; it returns to zero only when the short overflows to a negative number.

- Machine ID (4 bytes, 32 bits): the IP address of localhost. IPv4 is exactly 4 bytes; an IPv6 address has 16 bytes, so only its first 4 bytes are used.

- Process ID (4 bytes, 32 bits): the current timestamp shifted right by 8 bits and truncated to an int stands in for it, on the bet that two threads will not start at the same moment.

Note that the 64-bit long made up of the machine ID and process ID almost never changes; only the other long needs to change.
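The "one fixed long, one changing long" observation can be sketched like this. It is not Hibernate's code: `TwoLongIdSketch` is a hypothetical class, the field order inside each long is an assumption, and the sequence here simply wraps at 16 bits instead of resetting on a negative short.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

class TwoLongIdSketch {
    private final long fixedPart;                          // machine ID + process tag
    private final AtomicInteger sequence = new AtomicInteger();

    /** ip: at least 4 address bytes; processTag: any 32-bit value identifying this JVM. */
    TwoLongIdSketch(byte[] ip, int processTag) {
        long machine = ByteBuffer.wrap(ip, 0, 4).getInt() & 0xFFFF_FFFFL; // first 4 bytes only
        this.fixedPart = (machine << 32) | (processTag & 0xFFFF_FFFFL);
    }

    static TwoLongIdSketch forLocalHost() throws UnknownHostException {
        byte[] ip = InetAddress.getLocalHost().getAddress();          // 4 (IPv4) or 16 (IPv6) bytes
        int processTag = (int) (System.currentTimeMillis() >>> 8);    // startup time as stand-in
        return new TwoLongIdSketch(ip, processTag);
    }

    /** Returns {fixed long, variable long}; only the second one changes between calls. */
    long[] next() {
        long millis = System.currentTimeMillis() & 0xFFFF_FFFF_FFFFL; // 48-bit timestamp
        long seq = sequence.getAndIncrement() & 0xFFFF;               // 16-bit sequence
        return new long[] { fixedPart, (millis << 16) | seq };
    }

    public static void main(String[] args) throws UnknownHostException {
        TwoLongIdSketch gen = forLocalHost();
        long[] id = gen.next();
        System.out.printf("%016x-%016x%n", id[0], id[1]);
    }
}
```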

Version 1 variant: MongoDB

MongoDB's ObjectId.java:

- Timestamp (4 bytes, 32 bits): in seconds this time, lasting 136 years from 1970.

- Auto-increment sequence (3 bytes, 24 bits, about 16 million values): an int that starts from a random number (clever) and keeps incrementing. It is not reset to zero when the second changes; each field minds its own business. Since only 3 bytes are available, the 4-byte int is truncated to its low 3 bytes.

- Machine ID (3 bytes, 24 bits): the MAC addresses of all network cards are concatenated and hashed with hashCode; again the int is truncated to 3 bytes. If no network card can be read, a random number is used to get by.

- Process ID (2 bytes, 16 bits): the process number obtained from JMX; if it cannot be obtained, a hash of the process name or a random number is used instead.

MongoDB's field design is a little more reasonable than Hibernate's; for example, the timestamp is in seconds. The total length also drops to 12 bytes (96 bits), but that is an awkward size: it does not fit in a 64-bit long, so it can only be expressed as a byte array or a hexadecimal string.

In addition, the Java driver seems to have a bug in the auto-increment sequence.
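A 12-byte ID built from these four fields might look like the sketch below. It is not the driver's source: `ObjectIdSketch` is a hypothetical class, the byte order (timestamp, machine, process, counter) is an assumption, and the machine hash and process number are passed in rather than read from the system.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicInteger;

class ObjectIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    private final int machine;      // low 24 bits of the machine hash
    private final short process;    // low 16 bits of the process number
    private final AtomicInteger counter = new AtomicInteger(RANDOM.nextInt()); // random start

    ObjectIdSketch(int machineHash, int pid) {
        this.machine = machineHash & 0xFFFFFF;
        this.process = (short) pid;
    }

    /** Builds the 12 bytes: 4 timestamp + 3 machine + 2 process + 3 counter. */
    byte[] next(int epochSeconds) {
        int c = counter.getAndIncrement() & 0xFFFFFF;   // keep only the low 3 bytes
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putInt(epochSeconds);
        put3(buf, machine);
        buf.putShort(process);
        put3(buf, c);
        return buf.array();
    }

    private static void put3(ByteBuffer buf, int v) {
        buf.put((byte) (v >> 16)).put((byte) (v >> 8)).put((byte) v);
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    public static void main(String[] args) {
        ObjectIdSketch gen = new ObjectIdSketch("example-host".hashCode(), 4242); // made-up inputs
        int now = (int) (System.currentTimeMillis() / 1000);
        System.out.println(toHex(gen.next(now)));       // 24 hexadecimal characters
    }
}
```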

Twitter's Snowflake dispatcher

Snowflake is also a number dispatcher, a Thrift-based service. Rather than simply incrementing a counter as with Redis, it works more like UUID Version 1.

It has only one 64-bit long to work with, so IdWorker packs the fields tightly:

- Timestamp (42 bits): milliseconds since 2012, lasting 139 years (thriftier than the schemes that count from 1970).

- Auto-increment sequence (12 bits, 4096 values): increments within a millisecond and resets to zero when the millisecond changes.

- DataCenter ID (5 bits, 32 values): a configured value.

- Worker ID (5 bits, 32 values): a configured value. Since it identifies a dispatcher, 32 dispatchers per data center are plenty; it is also registered in ZK.

Because it is a dispatcher, it saves the bits for the machine ID and process ID, which is why a single long is enough.

In addition, a client can fetch only one ID per call from this dispatcher, not a batch, so the extra latency is a problem.
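The field layout above fits in a few lines of bit shifting. The sketch below is a Snowflake-style generator written for illustration, not Twitter's IdWorker: `SnowflakeSketch` is a hypothetical class, the exact epoch (2012-01-01 UTC) and the field order (time, data center, worker, sequence) are assumptions, and the Thrift service and ZK registration are left out.

```java
import java.time.Instant;

class SnowflakeSketch {
    static final long EPOCH = Instant.parse("2012-01-01T00:00:00Z").toEpochMilli(); // assumed epoch
    static final int SEQ_BITS = 12, WORKER_BITS = 5, DC_BITS = 5;
    static final long MAX_SEQ = (1L << SEQ_BITS) - 1;

    private final long dataCenterId;
    private final long workerId;
    private long lastMillis = -1;
    private long sequence = 0;

    SnowflakeSketch(long dataCenterId, long workerId) {
        if (dataCenterId < 0 || dataCenterId >= (1 << DC_BITS)
                || workerId < 0 || workerId >= (1 << WORKER_BITS)) {
            throw new IllegalArgumentException("data center and worker IDs must fit in 5 bits");
        }
        this.dataCenterId = dataCenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) throw new IllegalStateException("clock moved backwards");
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQ;
            if (sequence == 0) {                       // all 4096 values used in this millisecond
                while (now <= lastMillis) now = System.currentTimeMillis();
            }
        } else {
            sequence = 0;                              // new millisecond: reset the sequence
        }
        lastMillis = now;
        return ((now - EPOCH) << (DC_BITS + WORKER_BITS + SEQ_BITS))
                | (dataCenterId << (WORKER_BITS + SEQ_BITS))
                | (workerId << SEQ_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeSketch gen = new SnowflakeSketch(1, 7);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId());
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator sort by creation time.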

Finally, the question: can a single long hold a UUID without a dispatcher?

What if your ID type was fixed as long from the start and you do not want a dispatcher?

Going from the 128 bits of a UUID down to the 64 bits of a long, generated locally without a central dispatcher, the hardest part is telling apart the machine + process number.

Idea 1: compress the other fields and leave enough bits to identify the machine + process number.

Timestamp: at second granularity, one year takes 25 bits and two years take 26 bits.

Auto-increment sequence: 16 bits for 60,000 QPS, 17 bits for 100,000.

That leaves roughly 20 to 24 bits, giving a collision rate between 1/1,000,000 and 1/16,000,000. Concatenate the NIC MAC address and process number, hash them, and take the low 20 or 24 bits of the 32-bit result. But if this identifier field does collide, the timestamps and sequences that follow are likely to collide too, over and over again.
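The bit budget is easy to check in code, and the truncated hash is a one-liner. In this sketch `BitBudget`, `bitsFor` and `workerTag` are hypothetical names, and CRC32 is only one possible choice of 32-bit hash; the text above does not name one.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class BitBudget {
    /** Smallest number of bits that can represent {@code count} distinct values (count >= 2). */
    static int bitsFor(long count) {
        return 64 - Long.numberOfLeadingZeros(count - 1);
    }

    /** Hashes MAC address + process number and keeps the low {@code bits} bits of the 32-bit result. */
    static int workerTag(byte[] mac, long pid, int bits) {
        CRC32 crc = new CRC32();                       // assumed hash function
        crc.update(mac);
        crc.update(Long.toString(pid).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() & ((1L << bits) - 1));
    }

    public static void main(String[] args) {
        long secondsPerYear = 365L * 24 * 3600;
        System.out.println("timestamp bits, 1 year:  " + bitsFor(secondsPerYear));
        System.out.println("timestamp bits, 2 years: " + bitsFor(2 * secondsPerYear));
        System.out.println("sequence bits, 60k QPS:  " + bitsFor(60_000));
        System.out.println("sequence bits, 100k QPS: " + bitsFor(100_000));
        byte[] mac = {0x00, (byte) 0xa0, (byte) 0xc9, 0x1e, 0x6b, (byte) 0xf6}; // made-up input
        System.out.println("20-bit worker tag: " + workerTag(mac, 4242, 20));
    }
}
```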

Idea 2: use ZK, MySQL, or Redis to hand out auto-incremented worker IDs.

If the worker field has only 12 bits (4096 values) left, use ZK or etcd so that the ID can be reclaimed when the process shuts down.

If the worker field has plenty of bits, such as 20 (about 1 million values), the simplest approach is to auto-increment a counter in Redis or MySQL and take a new worker ID each time a process starts.

Idea 3: stay with random numbers.

Keep going random: take the low long of the JDK's UUID.randomUUID() (per the UUID specification, the high long has 4 bits fixed to the version and the low long has only 2 bits fixed), or call SecureRandom.nextLong() directly and avoid wasting even those bits.
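Both options are one-liners in Java. `RandomLongId` is a hypothetical class name; the only difference between the two methods is that the low long of a JDK random UUID still carries its fixed variant bits.

```java
import java.security.SecureRandom;
import java.util.UUID;

class RandomLongId {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Low 64 bits of a random UUID; the variant bits at the top are fixed, the rest is random. */
    static long fromUuid() {
        return UUID.randomUUID().getLeastSignificantBits();
    }

    /** A fully random 64-bit value with no fixed bits. */
    static long fromSecureRandom() {
        return RANDOM.nextLong();
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(fromUuid()));
        System.out.println(Long.toHexString(fromSecureRandom()));
    }
}
```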
