The implementation principle of Snowflake algorithm

2025-03-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 Report --

This article explains the implementation principle of the Snowflake algorithm. The content is simple, clear, and easy to learn; please follow along to study how the algorithm works.

Premise

Snowflake is Twitter's open-source, high-performance ID generation algorithm (and service).

The README file in the master branch of the Snowflake GitHub repository indicates that the initial version was released in 2010, was based on Apache Thrift, and predates Finagle (Finagle is Twitter's building block for RPC services), while the Snowflake used internally at Twitter is a complete rewrite that runs largely on Twitter's existing infrastructure.

The first version of the Snowflake source code, released in 2010, was written in Scala and is archived in the scala_28 branch. In other words, the original or improved version of the Snowflake algorithm you are using today is already a ten-year-old product (it is now 2020), and it has to be said that the algorithm holds up remarkably well. The scala_28 branch documents the motivation and requirements for introducing the algorithm; here is a brief excerpt:

Motivation:

Cassandra has no built-in tool for generating sequential IDs. When Twitter migrated from MySQL to Cassandra, it needed a new way to generate IDs (confirming that the architecture was not designed up front but iterated from business scenarios).

Requirements:

High performance: each process generates at least 10K IDs per second, and the response time, including network latency, must be within 2 ms.

Ordering: IDs can be sorted directly, increasing with time.

Compactness: the generated ID is at most 64 bits long.

High availability: the ID generation service must be as highly available as the storage service.

The following analyzes the implementation principle from the Snowflake source code.

A Brief Introduction to the Snowflake Scheme

The design of the first version of Snowflake is:

Timestamp: 41 bits long, millisecond precision, with a custom epoch; this covers about 69 years.

Configurable machine ID: 10 bits long, enough for 1024 machines.

Sequence number: 12 bits long, cycling through 4096 values, which prevents a single machine from generating duplicate sequence numbers within the same millisecond.

However, in the actual source code, Snowflake splits the 10-bit configurable machine ID into a 5-bit worker ID (which can be understood as the original machine ID) and a 5-bit data center ID. For details, see IdWorker.scala:

That is, it supports up to 32 worker IDs and up to 32 data center IDs:

Since the algorithm is written in Scala and runs on the JVM, the returned ID is of type Long, a 64-bit integer. The algorithm itself generates a value only 63 bits long; because the ID is non-negative, the highest bit (the sign bit) is fixed at 0, giving a total ID length of 64 bits:
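The layout described above can be sketched in code. Below is a minimal illustration in Java (chosen because the algorithm targets the JVM); the class name, constant names, and `compose` helper are illustrative assumptions, not Twitter's actual IdWorker:

```java
public class SnowflakeLayout {
    // Bit widths from the scheme described above
    static final long WORKER_ID_BITS = 5L;
    static final long DATACENTER_ID_BITS = 5L;
    static final long SEQUENCE_BITS = 12L;

    // Shift amounts: the sequence occupies the lowest 12 bits,
    // then the worker ID, then the data center ID, then the timestamp.
    static final long WORKER_ID_SHIFT = SEQUENCE_BITS;                                       // 12
    static final long DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;                  // 17
    static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS; // 22

    /** Packs the four fields into one 64-bit ID; the sign bit stays 0. */
    static long compose(long timestamp, long datacenterId, long workerId, long sequence) {
        return (timestamp << TIMESTAMP_SHIFT)
             | (datacenterId << DATACENTER_ID_SHIFT)
             | (workerId << WORKER_ID_SHIFT)
             | sequence;
    }

    public static void main(String[] args) {
        // timestamp=1 lands on bit 22; datacenter=3 on bits 17-18;
        // worker=5 on bits 12-16; sequence=7 on bits 0-2.
        long id = compose(1L, 3L, 5L, 7L);
        System.out.println(Long.toBinaryString(id));
    }
}
```

Because the four fields occupy disjoint bit ranges, bitwise OR here is equivalent to addition, but OR makes the "packing" intent explicit.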

Where:

The 41-bit millisecond timestamp ranges over [0, 2^41 - 1] => 0 ~ 2199023255551, a total of 2199023255552 values.

The 5-bit worker ID ranges over [0, 2^5 - 1] => 0 ~ 31, a total of 32 values.

The 5-bit data center ID ranges over [0, 2^5 - 1] => 0 ~ 31, a total of 32 values.

The 12-bit sequence number ranges over [0, 2^12 - 1] => 0 ~ 4095, a total of 4096 values.

In theory, then, the algorithm can generate 2199023255552 × 32 × 32 × 4096 distinct ID values.
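The ranges above are easy to verify with a few lines of arithmetic (an illustrative Java snippet, not part of the original source):

```java
public class SnowflakeRanges {
    public static void main(String[] args) {
        // (1L << n) - 1 is the maximum value of an n-bit unsigned field.
        long maxTimestamp = (1L << 41) - 1;  // 2199023255551
        long maxMachineId = (1L << 5) - 1;   // 31
        long maxSequence  = (1L << 12) - 1;  // 4095
        System.out.println(maxTimestamp + " " + maxMachineId + " " + maxSequence);

        // 2^41 milliseconds is roughly 69.7 years, matching the
        // "about 69 years" figure for a custom epoch.
        double years = Math.pow(2, 41) / 1000.0 / 60 / 60 / 24 / 365;
        System.out.println(years);
    }
}
```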

The Snowflake algorithm also has an obvious characteristic: it depends on the system clock. The 41-bit millisecond timestamp comes from the system time, so the system time must move strictly forward and must never roll back (that is, the same timestamp or a past timestamp must never be produced again). Once a clock rollback occurs, Snowflake refuses to generate the next ID.
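The refuse-on-rollback behavior can be sketched as follows. This is a minimal Java illustration under stated assumptions: the class name, method name, and the simplification of packing only timestamp and sequence are mine, not Twitter's IdWorker; only the rollback check and same-millisecond sequence handling follow the description above:

```java
public class ClockGuardedGenerator {
    private static final long SEQUENCE_BITS = 12L;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public synchronized long next() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse to generate, as the algorithm requires.
            throw new IllegalStateException("Clock moved backwards; refusing to generate id for "
                    + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            // Same millisecond: advance the 12-bit sequence; the mask makes it wrap.
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 IDs exhausted within this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        // Simplified: timestamp + sequence only; the full layout also packs
        // the worker and data center IDs between them.
        return (now << SEQUENCE_BITS) | sequence;
    }
}
```

Production implementations sometimes wait out a small rollback instead of throwing, but the original design simply rejects generation until the clock catches up.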

Supplementary Knowledge: Bit Operations

The Snowflake algorithm makes heavy use of bit operations. Computers store integers in two's-complement form, so integers in Java and Scala are represented as two's complement; here is a brief refresher on sign-magnitude and two's-complement representation.

Sign-magnitude form is for human reading; two's complement is what the machine computes with.

The two's complement of a positive number is identical to its sign-magnitude form.

The two's complement of a negative number is obtained by inverting every bit except the sign bit and then adding 1 (that is, one's complement plus 1); applying the same steps to a negative two's-complement value restores the sign-magnitude form.

In sign-magnitude, +0 is 0000 0000 while -0 is 1000 0000. Two's complement has only a single zero, represented as 0000 0000. This matters: zero is unambiguous in two's complement.

In a nutshell, it looks like this:

[+11] sign-magnitude = [0000 1011], two's complement = [0000 1011]

[-11] sign-magnitude = [1000 1011], two's complement = [1111 0101]

[-11] two's-complement derivation: sign-magnitude 1000 1011 -> invert all bits except the sign bit -> 1111 0100 -> add 1 -> 1111 0101
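Since Java (like Scala) stores integers in two's complement, the derivation above can be checked directly; a small illustrative snippet:

```java
public class TwosComplementDemo {
    public static void main(String[] args) {
        // Masking -11 with 0xFF keeps its low 8 bits: they should read
        // 1111 0101, matching the derivation above.
        String lowByte = Integer.toBinaryString(-11 & 0xFF);
        System.out.println(lowByte);

        // The rule in reverse: inverting every bit of +11 and adding 1
        // yields the two's-complement representation of -11.
        int reconstructed = (~11) + 1;
        System.out.println(reconstructed);
    }
}
```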

Arithmetic performed directly on sign-magnitude or one's-complement representations does not always yield the correct value, but arithmetic on two's complement does; just remember this conclusion, no worked example is given here. In Snowflake's ID generation scheme, all four parts other than the highest bit are unsigned integers, so performing bit operations directly on their two's-complement representations is highly efficient, which is exactly what Snowflake's high-performance design calls for. The Snowflake algorithm uses several bit operations: XOR (^), bitwise AND (&), bitwise OR (|), and signed left shift (<<).
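As an illustration of how these operators appear in the Snowflake context, the original Scala source builds its field masks with the idiom -1L ^ (-1L << n) and uses & with the mask to wrap the sequence within one millisecond. A Java sketch (variable names are mine):

```java
public class BitOpsDemo {
    public static void main(String[] args) {
        long workerIdBits = 5L;
        long sequenceBits = 12L;

        // -1L is all ones; shifting it left by n clears the low n bits;
        // XOR with all ones then keeps exactly those low n bits set.
        long maxWorkerId  = -1L ^ (-1L << workerIdBits);  // 31
        long sequenceMask = -1L ^ (-1L << sequenceBits);  // 4095
        System.out.println(maxWorkerId + " " + sequenceMask);

        // Bitwise AND with the mask makes the 12-bit sequence wrap:
        // incrementing past 4095 rolls back to 0 within the same millisecond.
        long sequence = 4095L;
        sequence = (sequence + 1) & sequenceMask;
        System.out.println(sequence); // wraps to 0
    }
}
```

The mask idiom and left shifts avoid any division or modulo, which is part of how the algorithm meets its 10K-IDs-per-second-per-process requirement.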
