2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces methods for generating distributed IDs in Java. Many people run into this problem in real projects, so let us walk through how to handle it. Please read carefully; I hope you find it useful!
I. The origin of the requirement
Almost every business system needs to generate unique record identifiers, such as:
Message ID: message-id
Order ID: order-id
Post ID: tiezi-id
This record ID is often the primary key in the database, and the database builds a clustered index on it, so rows are physically stored in the order of this field.
Queries on these records often come with paging or sorting requirements, such as:
Pull the latest page of messages
select message-id/ order by time/ limit 100
Pull the latest page of orders
select order-id/ order by time/ limit 100
Pull the latest page of posts
select tiezi-id/ order by time/ limit 100
So there is usually a time field, with an ordinary (non-clustered) index created on it.
An ordinary index stores a pointer to the actual record, so access through it is slower than through the clustered index. If record IDs are generated in roughly chronological order, the index lookup on the time field can be skipped:
select message-id/ (order by message-id)/ limit 100
Note that this only works if message-id values are generated in a roughly increasing order over time.
This leads to two core requirements for generating record identifiers (that is, the three XXX-id values mentioned above):
Globally unique
Trend-ordered (roughly increasing over time)
This is the core question of this article: how to efficiently generate globally unique, trend-ordered IDs.
II. Common methods, their shortcomings, and optimizations
Method 1: use the database's auto_increment to generate globally unique, incrementing IDs
Advantages:
Simple: it uses existing database functionality
Uniqueness is guaranteed
Increasing order is guaranteed
The step size is fixed
Disadvantages:
Availability is hard to guarantee: the common database architecture is one master with multiple slaves plus read/write separation. Generating an auto-increment ID is a write request, so when the master goes down, no IDs can be generated.
Poor scalability and a performance ceiling: because writes go through a single point, the master's write performance caps ID generation throughput, and it is hard to scale out.
Improvement methods:
Add redundant master databases to avoid a single write point
Split the data horizontally so that the IDs generated by each master never overlap
As described in the figure above, going from one write database to three, each write database is configured with a different auto_increment initial value and the same step, so that the IDs generated by each database are distinct (database 0 generates 0, 3, 6, 9, ...; database 1 generates 1, 4, 7, 10, ...; database 2 generates 2, 5, 8, 11, ...).
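The offset-and-step idea can be sketched as a small Java simulation. This is only an illustration under stated assumptions: the class SteppedSequence and its methods are hypothetical in-memory stand-ins, whereas in a real deployment each counter would live in a separate write database (MySQL, for instance, exposes comparable settings named auto_increment_offset and auto_increment_increment).

```java
// Hypothetical in-memory stand-in for one write database whose
// auto-increment column starts at `offset` and grows by `step`.
final class SteppedSequence {
    private final long step;
    private long next;

    SteppedSequence(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    // Returns the next ID of this database and advances by the shared step.
    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        long step = 3; // three write databases share the same step
        for (long offset = 0; offset < step; offset++) {
            SteppedSequence db = new SteppedSequence(offset, step);
            StringBuilder line = new StringBuilder("database " + offset + ":");
            for (int i = 0; i < 4; i++) {
                line.append(' ').append(db.nextId());
            }
            System.out.println(line); // e.g. "database 0: 0 3 6 9"
        }
    }
}
```

Because every database uses the same step but a different starting value, the three sequences never intersect, which is what keeps the IDs unique without any coordination between the databases.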
The improved architecture ensures availability, but it has drawbacks:
The "absolute increment" of ID generation is lost: accessing database 0 first generates 0 and 3, and then accessing database 1 generates 1, so over a very short period the IDs may not be strictly increasing (this is not a big problem, since the goal is trend-increasing, not strictly increasing)
The write pressure on the database is still high, because every generated ID requires a database access.
To solve these two problems, the second common scheme is introduced.
Method 2: single-point batch ID generation service
One important reason distributed systems are hard is that, without a global clock, absolute ordering is difficult to guarantee. To guarantee absolute ordering, we can only use a single-point service and rely on its local clock.
The write pressure on the database is high because the database is accessed for every generated ID; fetching IDs in batches reduces that pressure.
As shown in the figure above, the database uses a dual-master setup for availability, and it stores only the current maximum ID, for example 0.
Suppose the ID generation service pulls 6 IDs per batch: it accesses the database and changes the current maximum ID to 5. When applications then ask the ID generation service for IDs, the service does not need to access the database each time and can hand out the IDs 0, 1, 2, 3, 4, 5 in turn.
Once these IDs have been handed out, the service changes the maximum ID to 11, after which the IDs 6, 7, 8, 9, 10, 11 can be handed out, so the pressure on the database drops to one sixth of the original.
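The batch scheme can be sketched roughly as follows. This is a minimal single-process illustration, not a production service: BatchIdService and its members are hypothetical names, and an AtomicLong stands in for the max-id value stored in the database. As an assumption of the sketch, the stored value starts at -1 so that the first batch is 0 to 5 and the stored maximum becomes 5, matching the numbers in the text.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a batch ID service. The AtomicLong passed in
// stands in for the max-id value that the real service keeps in the database.
final class BatchIdService {
    private final AtomicLong dbMaxId; // simulated database value
    private final int batchSize;
    private long next = 0;            // next ID to hand out
    private long limit = -1;          // last ID of the current batch (none yet)
    private int dbWrites = 0;         // how often the "database" was updated

    BatchIdService(AtomicLong dbMaxId, int batchSize) {
        this.dbMaxId = dbMaxId;
        this.batchSize = batchSize;
    }

    // Hands out the next ID; touches the database only when the batch is used up.
    synchronized long nextId() {
        if (next > limit) {
            // Reserve batchSize IDs with a single atomic update of max-id.
            limit = dbMaxId.addAndGet(batchSize);
            next = limit - batchSize + 1;
            dbWrites++;
        }
        return next++;
    }

    synchronized int dbWrites() {
        return dbWrites;
    }

    public static void main(String[] args) {
        AtomicLong db = new AtomicLong(-1);
        BatchIdService service = new BatchIdService(db, 6);
        for (int i = 0; i < 12; i++) {
            System.out.print(service.nextId() + " ");
        }
        System.out.println();
        System.out.println("database updates: " + service.dbWrites()
                + ", stored max-id: " + db.get());
    }
}
```

Handing out 12 IDs costs only 2 database updates here. Creating a new BatchIdService against the same stored value models a restart: the new instance reserves a fresh batch instead of resuming the old one.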
Advantages:
The absolute increasing order of generated IDs is guaranteed.
The pressure on the database is greatly reduced, and the service can generate tens of thousands of IDs per second
Disadvantages:
The service is still a single point
If the service goes down and is restarted, the generated IDs may be discontinuous, with holes in the sequence (the service holds 0, 1, 2, 3, 4, 5 in memory and the max-id in the database is 5; if the service restarts after allocating 3, allocation resumes from 6 and 4 and 5 become holes, but this is not a big problem)
Although tens of thousands of IDs can be generated per second, there is still a performance ceiling, and the service cannot scale horizontally.
Improvement methods:
The common high-availability optimization for a single-point service is a "standby service", also known as a "shadow service", so the single-point shortcoming above can be addressed as follows:
As shown in the figure above, the main service serves external requests while a shadow service stays on standby; when the main service goes down, the shadow service takes over.
The switchover is transparent to the caller and can be automated. The common technique is vip+keepalived, which is not covered in detail here.
In addition, ID-gen-service can also implement horizontal scaling to address the above shortcomings (3), but can cause consistency problems, as detailed in "."
Method 3: uuid/guid
Whether IDs are generated by a database or by a service, the business application has to make a remote call, which is time-consuming.
Is there a local way to generate IDs with high performance and low latency?
UUID is a common solution:
string ID = GenUUID();
Advantages:
IDs are generated locally, with no remote call and low latency
Good scalability, with essentially no performance ceiling
Disadvantages:
An increasing trend cannot be guaranteed
A UUID is long and is often represented as a string, which makes it an inefficient primary key to index. Common optimizations are "convert it to two uint64 integers for storage" or "store only half of it" (uniqueness can no longer be guaranteed after halving)
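In Java, the local generation and the "two 64-bit integers" optimization can be sketched with the standard java.util.UUID class. The helper class UuidHalves is a hypothetical name used only for this illustration. Note that Java has no unsigned 64-bit type, so the two halves are signed long values holding the same bits.

```java
import java.util.UUID;

final class UuidHalves {
    // Splits a UUID into its two 64-bit halves, e.g. for two integer columns.
    static long[] toLongs(UUID uuid) {
        return new long[] {
            uuid.getMostSignificantBits(),
            uuid.getLeastSignificantBits()
        };
    }

    // Rebuilds the original UUID from the two stored halves.
    static UUID fromLongs(long high, long low) {
        return new UUID(high, low);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();   // generated locally, no remote call
        String text = id.toString();   // 36-character string form
        long[] halves = toLongs(id);
        System.out.println(text + " -> " + halves[0] + ", " + halves[1]);
        System.out.println(fromLongs(halves[0], halves[1]).equals(id));
    }
}
```

Storing both halves keeps the full 128 bits, so the round trip is lossless. Keeping only one of the two values corresponds to the "store only half" variant, which gives up the uniqueness guarantee.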
Method 4: take the current time in milliseconds
UUID is a local algorithm with high generation performance, but it cannot guarantee an increasing trend, and string IDs are slow to look up. Is there a local algorithm that can guarantee increasing IDs?
Taking the current time in milliseconds is a common solution:
uint64 ID = GenTimeMS();
Advantages:
IDs are generated locally, with no remote call and low latency
The generated IDs are trend-increasing
The generated IDs are integers, so queries are efficient once indexed
Disadvantages:
If concurrency exceeds 1000 requests per second, duplicate IDs will be generated
This disadvantage is fatal, because the uniqueness of IDs cannot be guaranteed. Using microseconds reduces the probability of collision, but then at most 1,000,000 IDs can be generated per second, and anything beyond that is certain to collide, so using microseconds does not fundamentally solve the problem.
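The collision problem is easy to reproduce. The sketch below counts how many requested IDs repeat an earlier value when the ID is nothing but the clock reading. TimestampIdCollision and countDuplicates are hypothetical names, and the clock is passed in as a LongSupplier so that the behaviour can also be checked with a fake clock.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

final class TimestampIdCollision {
    // Requests `count` IDs from the clock and returns how many repeated an earlier ID.
    static int countDuplicates(LongSupplier clock, int count) {
        Set<Long> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < count; i++) {
            if (!seen.add(clock.getAsLong())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // A tight loop asks for far more than one ID per millisecond, so many
        // values repeat; the exact count depends on the machine.
        int duplicates = countDuplicates(System::currentTimeMillis, 100_000);
        System.out.println("duplicates among 100000 IDs: " + duplicates);
    }
}
```

Any two requests that land in the same clock tick receive the same ID, which is why a finer clock only raises the ceiling instead of removing it.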
Method 5: snowflake-like algorithm
Snowflake is Twitter's open-source distributed ID generation algorithm. Its core idea is a long (64-bit) ID made up of:
41 bits for the millisecond timestamp
10 bits for the machine number
12 bits for the sequence number within a millisecond
In theory, the algorithm can generate up to 1000 * (2^12), or about 4 million, IDs per second on a single machine, which fully meets business needs.
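A generator with this 41/10/12 layout might look roughly like the sketch below. It is not Twitter's implementation: the class name, the constructor parameters (a custom epoch, a machine number, and an injectable clock), and the handling of sequence overflow and clock rollback are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

final class SnowflakeLikeGenerator {
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;   // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long epochMillis;   // assumed custom epoch, in milliseconds
    private final long machineId;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private long lastMillis = -1;
    private long sequence = 0;

    SnowflakeLikeGenerator(long epochMillis, long machineId, LongSupplier clock) {
        if (machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("machineId must fit in 10 bits");
        }
        this.epochMillis = epochMillis;
        this.machineId = machineId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastMillis) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence numbers of this millisecond are used: wait.
                while ((now = clock.getAsLong()) <= lastMillis) {
                    // busy-wait until the clock reaches the next millisecond
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        // Layout: [timestamp (41 bits)] [machine (10 bits)] [sequence (12 bits)].
        // The sketch does not check that the timestamp still fits in 41 bits.
        return ((now - epochMillis) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```

A caller would create one instance per machine, for example new SnowflakeLikeGenerator(epoch, 5, System::currentTimeMillis), and call nextId() for every new record. Since the timestamp sits in the highest bits, IDs from later milliseconds always compare greater on the same machine.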
Borrowing the idea of snowflake and combining it with each company's business logic and concurrency, we can implement our own distributed ID generation algorithm.
For example, suppose that the requirements for a company's ID generator service are as follows:
Peak concurrency on a single machine is below 10,000 per second, and it is expected to stay below 100,000 per second over the next 5 years.
There are 2 computer rooms, and the number is expected to stay below 4 over the next 5 years.
The number of machines per computer room is less than 100.
Currently 5 business lines need ID generation, and the number of business lines is expected to stay below 10.
...
The analysis process is as follows:
Use the number of milliseconds elapsed since January 1, 2017 (assuming the ID generator service goes online after that date). Assuming the system runs for at least 10 years, that requires 10 years * 365 days * 24 hours * 3600 seconds * 1000 milliseconds, which is about 320 * 10^9, so reserve 39 bits for the milliseconds (2^39 is about 550 * 10^9).
Peak concurrency on a single machine is below 100,000 per second, that is, below 100 per millisecond on average, so reserve 7 bits for the per-millisecond sequence number.
The number of computer rooms stays below 4 within 5 years, so reserve 2 bits for the computer room identifier.
Each computer room has fewer than 100 machines, so reserve 7 bits for the server identifier within each computer room.
The number of business lines is below 10, so reserve 4 bits for the business line identifier.
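The bit counts in this analysis can be double-checked with a few lines of Java. BitBudget and bitsFor are hypothetical helper names; the sketch simply computes the smallest number of bits able to represent a given number of distinct values.

```java
final class BitBudget {
    // Smallest number of bits that can represent `distinctValues` different values.
    static int bitsFor(long distinctValues) {
        if (distinctValues <= 1) {
            return 0;
        }
        return 64 - Long.numberOfLeadingZeros(distinctValues - 1);
    }

    public static void main(String[] args) {
        long tenYearsMillis = 10L * 365 * 24 * 3600 * 1000; // 315,360,000,000
        System.out.println("milliseconds:   " + bitsFor(tenYearsMillis));
        System.out.println("sequence:       " + bitsFor(100));
        System.out.println("computer rooms: " + bitsFor(4));
        System.out.println("machines:       " + bitsFor(100));
        System.out.println("business lines: " + bitsFor(10));
    }
}
```

The five fields need 39, 7, 2, 7 and 4 bits, 59 bits in total, which leaves spare room inside a 64-bit integer.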
A 64-bit ID designed this way guarantees:
IDs generated by different business lines, computer rooms, and machines are all different.
On the same machine, IDs generated in different milliseconds are different.
On the same machine and within the same millisecond, the sequence number field distinguishes the IDs, so they are still different.
The milliseconds occupy the highest bits, which ensures that generated IDs are trend-increasing.
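One possible way to pack these fields into a long is sketched below. The article only states that the milliseconds go in the highest bits; the order of the remaining fields (business line, computer room, machine, sequence) and all names in CustomIdLayout are assumptions chosen for illustration.

```java
final class CustomIdLayout {
    static final int SEQ_BITS = 7;
    static final int MACHINE_BITS = 7;
    static final int ROOM_BITS = 2;
    static final int BIZ_BITS = 4;
    static final int TIME_BITS = 39;

    static final int MACHINE_SHIFT = SEQ_BITS;                  // 7
    static final int ROOM_SHIFT = MACHINE_SHIFT + MACHINE_BITS; // 14
    static final int BIZ_SHIFT = ROOM_SHIFT + ROOM_BITS;        // 16
    static final int TIME_SHIFT = BIZ_SHIFT + BIZ_BITS;         // 20

    // Packs the fields with the milliseconds in the highest used bits.
    static long pack(long millis, int biz, int room, int machine, int seq) {
        requireFits("millis", millis, TIME_BITS);
        requireFits("biz", biz, BIZ_BITS);
        requireFits("room", room, ROOM_BITS);
        requireFits("machine", machine, MACHINE_BITS);
        requireFits("seq", seq, SEQ_BITS);
        return (millis << TIME_SHIFT)
                | ((long) biz << BIZ_SHIFT)
                | ((long) room << ROOM_SHIFT)
                | ((long) machine << MACHINE_SHIFT)
                | seq;
    }

    // Extracts the millisecond field from a packed ID.
    static long millisOf(long id) {
        return id >>> TIME_SHIFT;
    }

    // Extracts the business line field from a packed ID.
    static int businessLineOf(long id) {
        return (int) ((id >>> BIZ_SHIFT) & ((1L << BIZ_BITS) - 1));
    }

    private static void requireFits(String name, long value, int bits) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(name + " does not fit in " + bits + " bits");
        }
    }
}
```

Because the millisecond field occupies the highest used bits, an ID packed with a later millisecond value is always numerically larger than one packed with an earlier value, whatever the other fields contain.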
Disadvantages:
Because there is no global clock, the IDs allocated by each individual server are strictly increasing, but globally the generated IDs are only trend-increasing (some servers' clocks run ahead, others behind).
This concludes the article on methods for generating distributed IDs in Java. Thank you for reading. If you want to learn more about the industry, you can follow the website, where the editor will publish more practical articles.
© 2024 shulou.com SLNews company. All rights reserved.