2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces the differences between common distributed unique ID generation strategies. Many people run into this problem in real-world cases, so let's walk through how each strategy handles it.
A brief analysis of the requirements
A globally unique ID usually corresponds to a business requirement to generate a unique identifier for each record.
This ID is often the primary key of a database table, with a clustered index built on it, so rows are physically stored in the order of this field. Queries on these records often need paging or sorting, so there is usually also a time field with an ordinary (non-clustered) index on it.
An ordinary index stores pointers to the actual records, so access through it is slower than through the clustered index. If record IDs are generated roughly in time order, the index lookup on the time field can be omitted.
This leads to two core requirements for record ID generation:
Globally unique
Trend-increasing (roughly ordered by generation time)
Comparing the advantages and disadvantages of common generation strategies
Method 1: use the database's auto_increment to generate IDs
Advantages:
This method uses the database's built-in functionality, so it is relatively simple.
It guarantees uniqueness.
It guarantees incrementing IDs.
The step size between IDs is fixed and customizable.
Disadvantages:
Availability is difficult to guarantee: the common database architecture is one master with multiple replicas plus read/write separation, and generating an auto-increment ID is a write request. If the master goes down, no IDs can be generated.
Poor scalability and limited performance: because writes go to a single point, the write performance of the master determines the upper limit of ID generation performance, and it is difficult to scale.
Improvement plan:
Add redundant master databases to avoid a single point of writes.
Split the data horizontally and ensure that the IDs generated by each master do not overlap.
As described in the figure, the setup goes from one write database to three. Each write database is configured with a different auto_increment initial value and the same step size, so the IDs generated by each database never collide. (In the figure, DB 01 generates 0, 3, 6, 9, ...; DB 02 generates 1, 4, 7, 10, ...; DB 03 generates 2, 5, 8, 11, ...)
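The offset-plus-step idea can be sketched in a few lines of Java. This is only an illustration: `SteppedIdSource` is a hypothetical in-memory stand-in for one write database, not real database code. In MySQL, for example, the corresponding settings would be `auto_increment_offset` and `auto_increment_increment`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one write database whose auto_increment
// column starts at `offset` and grows by `step`.
class SteppedIdSource {
    private long next;
    private final long step;

    SteppedIdSource(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    // Returns the next ID and advances by the shared step.
    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        int dbCount = 3;
        List<SteppedIdSource> dbs = new ArrayList<>();
        for (int i = 0; i < dbCount; i++) {
            // Different initial value per database, same step for all.
            dbs.add(new SteppedIdSource(i, dbCount));
        }
        // Round-robin over the three sources, as a load balancer might.
        for (int i = 0; i < 9; i++) {
            int db = i % dbCount;
            System.out.println("DB 0" + (db + 1) + " -> " + dbs.get(db).nextId());
        }
    }
}
```

Because each source only ever produces values congruent to its own offset modulo the step, no two sources can return the same ID, even though the interleaved sequence is only trend-increasing.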
The improved architecture ensures availability, but it has drawbacks:
ID generation is no longer absolutely incremental: a request may first hit DB 01 and get 0 and 3, and then hit DB 02 and get 1, so within a very short period the IDs are not strictly increasing. (This is a minor problem, since the goal is an increasing trend, not absolute increment.)
The write pressure on the database is still high, because every ID generated requires a database access.
To solve these problems, the following methods have been proposed:
Method 2: single point batch ID generation service
One important reason distributed systems are hard is that, without a global clock, absolute ordering is difficult to guarantee. To guarantee absolute ordering, we can only use a single-point service and rely on its local clock.
The write pressure on the database is high because the database is accessed every time an ID is generated; fetching IDs in batches reduces that pressure.
As shown in the figure, the database uses a dual-master setup to ensure availability, and it stores only the current maximum ID, such as 4.
Suppose the ID generation service pulls 5 IDs per batch: it accesses the database and sets the current maximum ID to 4. When applications then ask the service for IDs, the service does not need to access the database each time; it hands out 0, 1, 2, 3, and 4 in turn.
When these IDs are used up, the service changes the maximum ID to 9 and can then hand out 5, 6, 7, 8, and 9, so the pressure on the database is reduced to one fifth of the original.
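A minimal sketch of the batch approach, assuming a batch of 5 IDs. `BatchIdService` is a hypothetical name, and the `AtomicLong` stands in for the max-ID row in the database; a real service would update that row in a transaction.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a batch ID service. One "database" write reserves a whole
// batch; IDs are then handed out from memory.
class BatchIdService {
    private final AtomicLong dbMaxId;   // hypothetical stand-in for the stored max ID
    private final int batchSize;
    private long next = 0;              // next ID to hand out from the current batch
    private long batchEnd = -1;         // last ID of the current batch; -1 forces a fetch
    private int dbWrites = 0;           // how many times the "database" was touched

    BatchIdService(AtomicLong dbMaxId, int batchSize) {
        this.dbMaxId = dbMaxId;
        this.batchSize = batchSize;
    }

    synchronized long nextId() {
        if (next > batchEnd) {
            // Batch exhausted: raise the stored max ID by batchSize in one write.
            batchEnd = dbMaxId.addAndGet(batchSize);
            next = batchEnd - batchSize + 1;
            dbWrites++;
        }
        return next++;
    }

    synchronized int dbWrites() {
        return dbWrites;
    }

    public static void main(String[] args) {
        // Start at -1 so the first batch is 0..4 and the stored max becomes 4.
        AtomicLong db = new AtomicLong(-1);
        BatchIdService service = new BatchIdService(db, 5);
        for (int i = 0; i < 10; i++) {
            System.out.println(service.nextId());
        }
        System.out.println("database writes: " + service.dbWrites());
    }
}
```

Ten IDs cost two database writes here instead of ten. Creating a second `BatchIdService` over the same `AtomicLong` simulates a restart: it begins a fresh batch after the stored maximum, so any IDs left unused in the old batch are skipped.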
Advantages:
The absolute incremental order of ID generation is guaranteed.
The pressure on the database is greatly reduced, and the service can generate tens of thousands of IDs per second.
Disadvantages:
The service is still a single point
If the service dies and is restarted, ID generation may be discontinuous, with holes in the middle. (For example, the service holds 0, 1, 2, 3, and 4 in memory and the max-id in the database is 4. If the service restarts after handing out 0, 1, and 2, the next allocation starts from 5, so 3 and 4 become holes. This is not a big problem.)
Although tens of thousands of IDs can be generated per second, there is still a performance limit, and the service cannot be scaled horizontally.
Improvement scheme
The common high-availability solution for a single-point service is a "standby service", also known as a "shadow service", so the shortcomings above can be addressed as follows:
As shown in the figure, the main service handles external requests while a shadow service stays on standby; when the main service goes down, the shadow service takes over. The switch is transparent to the caller and can be done automatically; a common technique is VIP + keepalived. In addition, the ID generation service could be scaled horizontally to address the performance limit, but that can cause consistency problems.
Method 3: uuid / guid
Whether the ID is generated by a database or by a service, the application on the business side needs to make a remote call, which is time-consuming. UUID is a common way to generate IDs locally.
import java.util.UUID;
UUID uuid = UUID.randomUUID();
Advantages:
IDs are generated locally, with no remote calls and low latency.
Good scalability, with essentially no performance limit.
Disadvantages:
There is no guarantee of an increasing trend.
A UUID is long and is often expressed as a string, which makes it inefficient to index as a primary key. Common optimizations are to store it as two uint64 integers or to store only half of it (uniqueness cannot be guaranteed after halving).
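The "two uint64 integers" optimization can be sketched with the standard `java.util.UUID` accessors. The class and method names below are illustrative. Java's `long` is signed, so the values may print as negative numbers, but the 64-bit patterns are preserved exactly.

```java
import java.util.UUID;

class UuidAsLongs {
    // Splits a UUID into its high and low 64-bit halves,
    // suitable for two integer columns instead of a 36-character string.
    static long[] toLongs(UUID uuid) {
        return new long[] { uuid.getMostSignificantBits(), uuid.getLeastSignificantBits() };
    }

    // Rebuilds the original UUID from the two stored halves.
    static UUID fromLongs(long high, long low) {
        return new UUID(high, low);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        long[] parts = toLongs(original);
        UUID restored = fromLongs(parts[0], parts[1]);
        System.out.println(original + " -> " + parts[0] + ", " + parts[1]);
        System.out.println("round trip ok: " + original.equals(restored));
    }
}
```

Keeping both halves preserves uniqueness; dropping one of them is the "half" variant, which loses that guarantee.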
Method 4: take the current number of milliseconds
UUID is a local algorithm with high generation performance, but it cannot guarantee an increasing trend, and string IDs are slow to look up. Is there a local algorithm that can guarantee incrementing IDs? Taking the current number of milliseconds is a common solution.
Advantages:
IDs are generated locally, with no remote calls and low latency.
The generated IDs trend upward.
The generated ID is an integer, so queries are efficient once it is indexed.
Disadvantages:
If more than 1000 IDs are requested per second, duplicate IDs are guaranteed, and any two requests that fall in the same millisecond already collide.
This disadvantage is fatal, because the uniqueness of the ID cannot be guaranteed. Using microseconds reduces the probability of collision, but it caps generation at 1,000,000 IDs per second, beyond which collisions are certain, so it does not fundamentally solve the problem.
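The collision problem is easy to observe. The sketch below (class name is illustrative) requests many millisecond-timestamp IDs in a tight loop and counts how many distinct values come back; the exact count depends on the machine, so no output is shown.

```java
import java.util.HashSet;
import java.util.Set;

class MillisIdCollision {
    // Generates `count` IDs from the wall clock and returns how many are distinct.
    static int distinctIds(int count) {
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < count; i++) {
            // Each "ID" is just the current time in milliseconds.
            seen.add(System.currentTimeMillis());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int count = 100_000;
        int distinct = distinctIds(count);
        System.out.println(count + " requests produced " + distinct + " distinct IDs");
    }
}
```

On any ordinary machine the loop finishes long before 100,000 milliseconds have elapsed, so most requests share a timestamp with an earlier one.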
Method 5: use Redis to generate id
When generating IDs with a database does not meet the performance requirements, we can try using Redis instead. This relies on the fact that Redis executes commands in a single thread, so it can also generate globally unique IDs, using its atomic operations INCR and INCRBY.
Advantages:
It does not depend on the database, is flexible and convenient, and performs better than the database.
Numeric IDs sort naturally, which helps with paging and with results that need to be sorted.
Disadvantages:
If the system does not already use Redis, a new component has to be introduced, which increases the complexity of the system.
The coding and configuration work required is relatively large.
Method 6: Twitter open source Snowflake algorithm
Snowflake is Twitter's open-source distributed ID generation algorithm. Its core idea is to pack several fields into one 64-bit long ID:
1 bit unused (the sign bit, always 0)
41 bits for the millisecond timestamp: enough for about 69 years
10 bits for the machine number (5 bits for the data center, 5 bits for the machine ID): supports deployment of up to 1024 nodes
12 bits for a sequence number within the millisecond: each node can generate 4096 IDs per millisecond
In theory, each node can generate up to 1000 * (2^12) IDs per second, that is, about 4.09 million (4,096,000), which is enough for most business needs.
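The bit layout above can be sketched as follows. This is a simplified illustration, not Twitter's reference implementation: the class name, the custom epoch (assumed here to be 2020-01-01 UTC), and the policy of throwing when the clock moves backwards are all assumptions.

```java
class SnowflakeSketch {
    // Bit widths taken from the layout described above.
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    // Assumed custom epoch: 2020-01-01T00:00:00Z in milliseconds.
    static final long EPOCH_MS = 1577836800000L;

    private final long datacenterId;
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId >= (1L << DATACENTER_BITS)
                || workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("datacenterId and workerId must be in [0, 31]");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Assumption: this sketch refuses to issue IDs if the clock goes backwards.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // 4096 IDs already issued in this millisecond: spin until the next one.
                do {
                    now = System.currentTimeMillis();
                } while (now <= lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // timestamp | data center | worker | sequence, from high bits to low bits.
        return ((now - EPOCH_MS) << (DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS))
                | (datacenterId << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    // Extracts the 5-bit worker ID from a generated ID.
    static long workerOf(long id) {
        return (id >> SEQUENCE_BITS) & ((1L << WORKER_BITS) - 1);
    }

    // Extracts the 5-bit data center ID from a generated ID.
    static long datacenterOf(long id) {
        return (id >> (WORKER_BITS + SEQUENCE_BITS)) & ((1L << DATACENTER_BITS) - 1);
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(3, 7);
        for (int i = 0; i < 5; i++) {
            long id = generator.nextId();
            System.out.println(id + " (dc=" + datacenterOf(id) + ", worker=" + workerOf(id) + ")");
        }
    }
}
```

Because the timestamp occupies the highest bits, IDs from one node are strictly increasing, and IDs across nodes are trend-increasing as long as the node clocks are roughly in sync. Uniqueness across nodes depends on every node being given a distinct data center and worker ID pair.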
This concludes "what are the differences between common distributed unique ID generation strategies". Thank you for reading.