The underlying implementation of UUID in JDK 04/19 Update SLTechnology News&Howtos

The underlying implementation of UUID in JDK

2025-04-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article mainly introduces "the underlying implementation of UUID in JDK". In daily operation, I believe many people have doubts about the underlying implementation of UUID in JDK. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful for you to answer the doubts about "the underlying implementation of UUID in JDK". Next, please follow the editor to study!

Premise

UUID is an abbreviation for Universally Unique IDentifier and is translated into a universal unique identifier or a globally unique identifier. For the description of UUID, here are some excerpts from the specification file A Universally Unique IDentifier (UUID) URN Namespace:

UUID (also known as GUID) defines a uniform resource name namespace. The length of UUID is 128bit, which can guarantee the uniqueness in space and time.

"motive:"

One of the main reasons for using UUID is that there is no need for centralized management, one format qualifies the IEEE 802 node identifier, and the other formats do not. You can automatically generate UUID on demand, which can be applied to multiple different scenarios. The UUID algorithm supports extremely high allocation rates, generating more than 10 million UUID per second per machine, so they can be used as transactional ID. UUID has a fixed size of 128bits, compared with other alternatives, it has the advantage of small size, is very suitable for all kinds of sorting, hashing and storage in the database, and is easy to program.

There are only a few core specifics to keep in mind here:

Global uniqueness of space-time

Fixed length 128bits, or 16 bytes (1 byte = 8 bit)

The allocation rate is so high that a single machine can generate more than 10 million UUID per second (actually higher)

Let's take a detailed analysis of the UUID generation algorithm for the UUID implementation in JDK. The JDK chosen when writing this article is JDK11.

Talk about UUID again.

In order to write a simple summary, I only roughly excerpted some chapters in the specification file. Here we will talk about some definitions of UUID, collision probability, and so on.

UUID definition

UUID is a standard for software construction and a part of the Open Software Foundation in the field of distributed computing environments. The purpose of this standard is to make all elements or components in a distributed system have unique discernible information, because of the extremely low conflict frequency and the basis of efficient algorithms, it does not need to centrally control and manage the generation of unique discernible information, so each user is free to create a UUID that does not conflict with others.

"UUID is essentially a 128bit number," which is a huge number of bits. Theoretically, the total number of UUID is 2 ^ 128. This figure can be estimated like this: if "1 trillion" of different UUID is produced "per nanosecond", it will take more than 10 billion years to use up all the UUID.

Variants and versions of UUID

When UUID standards and algorithms are defined, a variety of variants and versions are available for historical compatibility and future extensions. The next variants and version descriptions come from the Versions chapter in Wikipedia and the Variant chapter in RFC 4122.

The known variants are as follows:

Variants 0xx:Reserved, NCS backward compatibility, reserved for backward compatibility

Variant 10x:The IETF aka Leach-Salz variant (used by this class), which is called Leach-Salz UUID or a variant of UUID currently in use in IETF UUID,JDK

Variants 110:Reserved, Microsoft Corporation backward compatibility, Microsoft early GUID reserved variants

Variant 111:Reserved for future definition, a variant reserved for future expansion, which has not been used yet

The known versions are as follows:

Empty UUID (special version 0), represented by 00000000-0000-0000-0000-000000000000, that is, all bits are 0

Date-time and MAC address (version 1): based on the time and the version of the MAC address, it is obtained by calculating the current timestamp, random number, and machine MAC address. Because of the MAC address, this ensures its global uniqueness. However, if the MAC address is used, there will be a MAC address exposure problem. If it is a local area network, you can use the IP address instead.

Date-time and MAC address, DCE security version (version 2): secure UUID in distributed computing environment. The algorithm is basically the same as version 1, but the first four bits of the timestamp are replaced with POSIX's UID or GID.

Namespace name-based MD5 (version 3): obtained by calculating the MD5 hash values of names and namespaces. This version of UUID guarantees the uniqueness of UUID generated by different names in the same namespace, the uniqueness of UUID in different namespaces, and the repeated generation of UUID with the same name in the same namespace.

Random (version 4): generates UUID from random numbers, or pseudorandom numbers. The probability of repetition of this UUID can be calculated, and another feature is that 6 bits are reserved to store variants and version attributes, so there are 122 bits randomly generated, with a total of 2 ^ 122, which is less than the total amount of other variants.

Namespace name-based SHA-1 (version 5): similar to version 3, the hash algorithm is replaced by SHA-1

Among them, the variant applied in JDK is Leach-Salz, which provides two versions of UUID generation implementation of namespace name-based MD5 (version 3) and random (version 4).

Format of UUID

In the specification file description, UUID consists of 16 8-bit digits, or 32 characters in hexadecimal representation, generally 8-4-4-4-12, plus concatenation characters-a total of 36 characters, such as:

# # example 123e4567-e89b-12d3-a456-426614174000 # # General format xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

M of 4 bit length and N of 1 to 3 bit length represent version number and variant identification, respectively. The specific layout of UUID is as follows:

Attribute name length (bytes) length (hexadecimal character) content time_low timestamp low 48 represents a low 32 bit integer of the time_mid timestamp 24 represents the middle 16 bits of the timestamp time_hi_and_version timestamp high bit and version number 24 high bit 4 bit is the version number The rest is a timestamped 12-bit integer representing the clock_seq_hi_and_res clock_seq_low clock sequence and the variant number 24. The highest bit 1 to 3 bits represents the variant number, and the remaining 13 to 15 bits represent the node ID represented by the clock sequence node node ID61248 bits.

Draw a picture based on this table:

"serious attention, repeat three times":

The specific layout of UUID mentioned above is only applicable to date-time and MAC address (version 1) and date-time and MAC address, DCE security version (version 2). Although other versions use basically the same field distribution, they cannot obtain information such as timestamp, clock sequence or node ID.

Only version 3 and version 4 implementations are provided in JDK, but the layout of java.util.UUID uses the fields in the table above

Calculation of collision probability of UUID

Although the total amount of UUID is huge, if you keep using it, assuming that more than 1 trillion UUID are generated per nanosecond and humans are lucky enough to reproduce for 10 billion years, it is always possible to produce repetitive UUID. So, how to calculate the collision rate of UUID? This is a mathematical problem that can be solved using the more famous "birthday paradox":

The above picture is from a search engine encyclopedia. It just so happens that Wikipedia gives the calculation process of collision probability, and its practical method is also the calculation method of birthday paradox. Here is a post:

The above collision probability calculation is based on Leach-Salz variants and version 4, and the conclusion is:

The probability of finding duplicates in 103 trillion UUID is 1/1000000000.

To generate a UUID with a conflict rate of 50%, you need to generate at least 2.71 * 1000000 ^ 3 UUID

There is no need to worry about UUID conflicts in your lifetime, which is less likely than large meteorites hitting the earth.

Usage scenarios of UUID

UUID can be used in almost all scenarios that require globally unique identifiers, unless there is an explicit limit on length. Common scenarios include:

Log Framework Mapping TRACE_ID in Diagnostic context

APM tool or SPAN_ID in OpenTracing specification

Database primary key or virtual foreign key in special scenarios

Deal ID (order ID)

Wait a minute.

Detailed introduction and use of UUID in JDK

Here is the introduction of how to use it. The variant mentioned earlier in JDK is Leach-Salz (variant 2), which provides two versions of UUID generation implementation: namespace name-based MD5 (version 3) and random (version 4). In fact, java.util.UUID provides four ways to generate UUID instances:

The most common is to call the static method UUID#randomUUID (), which is the static factory method of version 4

The second is to call the static method UUID#nameUUIDFromBytes (byte [] name), which is the static factory method of version 3

There is also a call to the static method UUID#fromString (String name), which is a static factory method that parses the 8-4-4-4-4-12 format string to generate a UUID instance

There is also a low-level constructor UUID (long mostSigBits, long leastSigBits), which is not common for users

The most commonly used method is the instance method toString (), which converts UUID into an 8-4-4-4-12 representation of hexadecimal strings, such as:

String uuid = UUID.randomUUID () .toString ()

Other Getter methods:

UUID uuid = UUID.randomUUID (); / / return version number int version = uuid.version (); / / return variant number int variant = timestamp (); / / return timestamp-this method will report an error. Only Time-based UUID, that is, the UUID implementation of version 1 or 2, can return the timestamp long timestamp = uuid.timestamp () / / return clock sequence-this method will report an error. Only the UUID implementation of Time-based UUID, version 1 or 2, can return clock sequence long clockSequence = uuid.clockSequence (); / / return node ID-this method will return error, only Time-based UUID, that is, version 1 or 2 UUID implementation can return node ID long nodeId = uuid.node ()

You can verify the version and variant number of different static factory methods:

UUID uuid = UUID.randomUUID (); int version = uuid.version (); int variant = uuid.variant (); System.out.println (String.format ("version:%d,variant:%d", version, variant)); uuid = UUID.nameUUIDFromBytes (new byte [0]); version = uuid.version (); variant = uuid.variant (); System.out.println ("version:%d,variant:%d", version, variant) / / output result version:4,variant:2 version:3,variant:2

Probe into the implementation of UUID Source Code in JDK

Java.util.UUID is modified by final to implement Serializable and Comparable interfaces. Generally speaking, there are the following specific features:

Immutable. Generally speaking, tool classes are defined like this.

Serializable and deserializable

Different objects can be compared, and the comparison method will be analyzed later

The following will analyze the source code implementation of java.util.UUID from different aspects:

Properties and constructors

Random number version implementation

Namespace name-based MD5 version implementation

Other implementations

Formatted output

A more relevant method

Properties and constructors

It has been repeatedly mentioned earlier that only version 3 and version 4 are provided in JDK, but the layout of java.util.UUID adopts the field definition of UUID specification, with a total length of 128bits, which happens to be stored in two integers of type long, so you can see that there are two integer values of type long in the UUID class:

Public final class UUID implements java.io.Serializable, Comparable {/ / omit other codes temporarily / * * The most significant 64 bits of this UUID. * High 64 bits in UUID * * @ serial * / private final long mostSigBits; / * * The least significant 64 bits of this UUID. * valid low 64 bits in UUID * * @ serial * / private final long leastSigBits; / / temporarily omit other code}

You can see from the comments of the UUID class that the specific field layout is as follows:

"layout of a high 64-bit mostSigBits"

Field bit length hex character length time_low328time_mid164version41time_hi123

"low 64-bit leastSigBits layout"

Field bit length hexadecimal character length variant2 is less than 1clock_seq14variant and clock_seq add up to 4node4812

Then take a look at the other member properties and constructors of UUID:

Public final class UUID implements java.io.Serializable, Comparable {/ / temporarily omit other code / / Java language access classes, which store a lot of underlying related access or conversion methods. In UUID, the toString () instance method is mainly used to format into the form of 8-4-4-4-12, delegated to the Long.fastUUID () method private static final JavaLangAccess jla = SharedSecrets.getJavaLangAccess () / / the static inner class ensures SecureRandom initialization for the UUID version of version 4 to generate the secure random number private static class Holder {static final SecureRandom numberGenerator = new SecureRandom ();} / / initializes the UUID instance private UUID (byte [] data) {long msb = 0; long lsb = 0 by calculating the values of mostSigBits and leastSigBits through a byte array of length 16 Assert data.length = 16: "data must be 16 bytes in length"; for (int item0; I position [19 time_high_and_version 22] formatUnsignedLong0 (lsb > 48, 4, buf, 19, 4); / / low 16 bits of msb converted to hexadecimal format written to buf-time_high_and_version = > position [14 no 17] formatUnsignedLong0 (msb, 4, buf, 14, 4) / / the 16-bit conversion of msb to hexadecimal format is written to buf-time_mid = > position [9meme12] formatUnsignedLong0 (msb > 16,4, buf, 9,4); / / the high 32-bit conversion of msb to hexadecimal format is written to buf-time_low = > position [0Magne7] formatUnsignedLong0 (msb > > 32,4, buf, 0,8) / / insert'- 'into the spare byte slot, which takes up exactly 4 bytes buf [23] =' -'; buf [18] ='-'; buf [13] ='-'; buf [8] ='-'; / / instantiates String based on the processed byte array, and the encoding is specified as LATIN1 return new String (buf, LATIN1) } else {byte [] buf = new byte [72]; formatUnsignedLong0UTF16 (lsb, 4, buf, 24,12); formatUnsignedLong0UTF16 (lsb > > 48,4, buf, 19,4); formatUnsignedLong0UTF16 (msb, 4, buf, 14,4); formatUnsignedLong0UTF16 (msb > > 16,4, buf, 9,4); formatUnsignedLong0UTF16 (msb > > 32,4, buf, 0,8) StringUTF16.putChar (buf, 23,'-'); StringUTF16.putChar (buf, 18,'-'); StringUTF16.putChar (buf, 13,'-'); StringUTF16.putChar (buf, 8,'-'); return new String (buf, UTF16) }} / * format the unsigned long integer and fill it into the byte buffer buf. If the length len exceeds the ASCII format representation of the input value, it will be filled with 0. * this method is to fill the input long integer value val, corresponding to a length of bits, into the byte array buf, and len controls the length of the written characters. Offset controls the starting position where buf is written * while the shift parameter determines the underlying format, 4 is hexadecimal, 1 is 2, and 3 is 8-bit * / static void formatUnsignedLong0 (long val, int shift, byte [] buf, int offset, int len) {int charPos = offset + len Int radix = 1 > > = shift;} while (charPos > offset);}

A more relevant method

The relevant methods of comparison are as follows:

/ / the hashCode method makes a difference between mostSigBits and leastSigBits or obtains an intermediate variable hilo, and then calculates public int hashCode () {long hilo = mostSigBits ^ leastSigBits; return ((int) (hilo > > 32)) ^ (int) hilo with a factor of 32. } / / equals is an example comparison method, which directly compares the mostSigBits and leastSigbits values of two UUID. If they are completely equal, return true public boolean equals (Object obj) {if ((null = = obj) | | (obj.getClass ()! = UUID.class)) return false; UUID id = (UUID) obj; return (mostSigBits = = id.mostSigBits & & leastSigBits = = id.leastSigBits) } / / the rule of comparison is that if the mostSigBits is high, and the high order is equal, the leastSigBits is large public int compareTo (UUID val) {/ / The ordering is intentionally set up so that the UUIDs / / can simply be numerically compared as two numbers return (this.mostSigBits)

< val.mostSigBits ? -1 : (this.mostSigBits >

Val.mostSigBits? 1: (this.leastSigBits

< val.leastSigBits ? -1 : (this.leastSigBits >

Val.leastSigBits? 1: 0));}

All comparison methods are only related to mostSigBits and leastSigBits, after all, these two long integers store all the information about the UUID instance.

At this point, the study of "the underlying implementation of UUID in JDK" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.