Shulou (Shulou.com), SLTechnology News&Howtos, 06/02 report, updated 2025-01-18
An ID uniquely identifies a piece of data. The traditional approaches are UUIDs and database auto-increment IDs. Most Internet companies use MySQL and, because they need transaction support, usually the InnoDB storage engine. A UUID is too long and unordered to be a good InnoDB primary key: InnoDB stores rows in primary-key order, so random keys cause frequent page splits. An auto-increment ID is more suitable. As the business grows, however, the data volume keeps increasing and tables have to be split. After splitting, each table increments its IDs independently, so IDs in different tables are likely to collide. At that point a separate mechanism is needed to generate unique IDs, usually called a distributed ID or global ID. The rest of this article analyzes the common mechanisms for generating distributed IDs.
This article does not go into great detail on each scheme; it is mainly a summary. Detailed articles on individual schemes will follow.
Database auto-increment ID
The first scheme is still based on database auto-increment IDs. It requires a dedicated database instance containing a dedicated table.
The table structure is as follows:
CREATE DATABASE `SEQID`;
CREATE TABLE SEQID.SEQUENCE_ID (
    id bigint(20) unsigned NOT NULL auto_increment,
    stub char(10) NOT NULL default '',
    PRIMARY KEY (id),
    UNIQUE KEY stub (stub)
) ENGINE=MyISAM;
The following statements generate a new auto-increment ID and return it:
BEGIN;
REPLACE INTO SEQUENCE_ID (stub) VALUES ('anyword');
SELECT LAST_INSERT_ID();
COMMIT;
The stub field has no special meaning; it exists only so that a row can be inserted, because only an insert produces a new auto-increment id. We use REPLACE rather than INSERT: REPLACE checks whether a row with the same stub value already exists; if it does, the row is deleted and a new one inserted, otherwise the row is simply inserted. The table therefore keeps a single row while the id keeps growing.
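To make the REPLACE behaviour concrete, the following Java sketch models the SEQUENCE_ID table in memory. It is an illustration only: `StubSequenceTable` and `replaceAndGetId` are hypothetical names, not a MySQL or JDBC API. It shows that the auto-increment counter only moves forward while the unique `stub` key keeps at most one row per stub value.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of SEQUENCE_ID (illustration only, not how MySQL stores it).
class StubSequenceTable {
    private long autoIncrement = 0;                                // last id produced by auto_increment
    private final Map<String, Long> rowsByStub = new HashMap<>();  // UNIQUE KEY stub -> id

    // Models: REPLACE INTO SEQUENCE_ID (stub) VALUES (?); SELECT LAST_INSERT_ID();
    synchronized long replaceAndGetId(String stub) {
        rowsByStub.remove(stub);      // an existing row with the same stub is deleted...
        long id = ++autoIncrement;    // ...and the newly inserted row gets the next id
        rowsByStub.put(stub, id);
        return id;                    // what LAST_INSERT_ID() would return
    }

    synchronized int rowCount() {
        return rowsByStub.size();
    }

    public static void main(String[] args) {
        StubSequenceTable table = new StubSequenceTable();
        for (int i = 0; i < 3; i++) {
            System.out.println(table.replaceAndGetId("anyword"));
        }
        System.out.println("rows: " + table.rowCount());
    }
}
```

Running `main` prints 1, 2 and 3 followed by `rows: 1`: the table never grows, yet every call yields a new ID.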
This mechanism works, but it requires a separate MySQL instance and falls short on both performance and reliability. Every time a business system needs an ID it must access the database, which is slow; and if this database instance goes offline, every business system that depends on it is affected.
To address the reliability problem, we can use the second distributed ID scheme.
Database multi-master mode
If two databases form a master-slave cluster, the reliability problem is usually solved, but if the master fails before its data has been replicated to the slave, IDs can be duplicated after failover. Instead we can use a dual-master cluster, in which both MySQL instances generate auto-increment IDs independently, which also improves throughput. Without further changes, however, the two instances are likely to generate the same IDs, so each instance must be configured with its own starting value (auto_increment_offset) and a step size (auto_increment_increment) equal to the number of instances.
Configuration of the first MySQL instance:
SET @@auto_increment_offset = 1;     -- starting value
SET @@auto_increment_increment = 2;  -- step size
Configuration of the second MySQL instance:
SET @@auto_increment_offset = 2;     -- starting value
SET @@auto_increment_increment = 2;  -- step size
With this configuration, the two instances generate the following ID sequences: mysql1, with starting value 1 and step size 2, generates 1, 3, 5, 7, 9, ...; mysql2, with starting value 2 and step size 2, generates 2, 4, 6, 8, 10, ...
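The offset/step rule can be checked with a small Java model. `OffsetStepSequence` is a hypothetical helper that only reproduces the arithmetic (offset, offset + step, offset + 2 × step, ...); it assumes each instance starts from an empty table and that the offset is not larger than the step.

```java
import java.util.ArrayList;
import java.util.List;

// Models the ids of one MySQL instance configured with
// auto_increment_offset = offset and auto_increment_increment = increment.
class OffsetStepSequence {
    private final long increment;
    private long next;

    OffsetStepSequence(long offset, long increment) {
        this.increment = increment;
        this.next = offset;          // the first generated id is the offset itself
    }

    long nextId() {
        long id = next;
        next += increment;           // every later id is offset + k * increment
        return id;
    }

    List<Long> take(int n) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(nextId());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(new OffsetStepSequence(1, 2).take(5)); // mysql1: [1, 3, 5, 7, 9]
        System.out.println(new OffsetStepSequence(2, 2).take(5)); // mysql2: [2, 4, 6, 8, 10]
    }
}
```

Because every ID from mysql1 is odd and every ID from mysql2 is even, the two sequences can never overlap.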
This scheme also needs a separate application that generates distributed IDs, for example DistributIdService, which exposes an interface through which business applications obtain IDs. When a business application needs an ID, it calls DistributIdService via RPC, and DistributIdService randomly picks one of the two MySQL instances to obtain the ID from.
With this scheme in place, even if one MySQL instance goes offline, DistributIdService is not affected: it can still obtain IDs from the other instance.
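A minimal sketch of the selection and failover logic, assuming a hypothetical `IdSource` interface that wraps one MySQL instance (in practice it would run the REPLACE/LAST_INSERT_ID statements over JDBC). The class reuses the article's `DistributIdService` name for readability only; it is not an existing library.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical abstraction over one MySQL instance that can hand out an id.
interface IdSource {
    long nextId() throws Exception;
}

// Sketch: start at a randomly chosen instance, fall back to the others on failure.
class DistributIdService {
    private final List<IdSource> sources;

    DistributIdService(List<IdSource> sources) {
        this.sources = List.copyOf(sources);
    }

    long nextId() {
        int start = ThreadLocalRandom.current().nextInt(sources.size());
        Exception last = null;
        for (int i = 0; i < sources.size(); i++) {
            IdSource source = sources.get((start + i) % sources.size());
            try {
                return source.nextId();
            } catch (Exception e) {
                last = e;  // this instance is unavailable; try the next one
            }
        }
        throw new IllegalStateException("all id sources are unavailable", last);
    }
}
```

Random selection spreads the load across both instances; the loop only matters when an instance is down.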
However, this scheme does not scale well: if two MySQL instances are no longer enough and new instances must be added to improve performance, the change is troublesome.
Suppose we want to add a third instance, mysql3. First, the step size of mysql1 and mysql2 must be changed to 3, and this can only be done manually, which takes time. Second, because mysql1 and mysql2 keep generating IDs in the meantime, mysql3's starting value may need to be set well above their current values to leave enough time to change the step size of mysql1 and mysql2. Third, duplicate IDs are likely while the step size is being changed, and downtime may be required to avoid them.
To solve these problems and further improve the performance of DistributIdService, we can use the third mechanism for generating distributed IDs.
Number segment mode
We can obtain auto-increment IDs in number segments, which can be thought of as batch acquisition. If DistributIdService fetches many IDs from the database in one batch and caches them locally, business applications obtain IDs much more efficiently.
For example, each time DistributIdService goes to the database it obtains a number range such as (0, 1000], which represents the 1000 IDs from 1 to 1000. When a business application asks DistributIdService for an ID, DistributIdService only needs to increment a local counter starting from 1 and return it, instead of querying the database every time. Only when the local counter reaches 1000, that is, when the current range is used up, does it go back to the database for the next range.
The database table therefore changes as follows:
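The segment idea can be sketched in Java as follows. The `fetchNextMaxId` hook is an assumption standing in for the database call that reserves the next segment and returns its upper bound; everything between two such calls is served from memory.

```java
import java.util.function.LongSupplier;

// Sketch of number-segment mode inside DistributIdService.
// fetchNextMaxId is a hypothetical hook: it reserves the next segment in the
// database and returns its upper bound (the new current_max_id).
class SegmentIdAllocator {
    private final long step;                  // segment length (increment_step)
    private final LongSupplier fetchNextMaxId;
    private long current = 0;                 // last id handed out
    private long max = 0;                     // inclusive upper bound; 0 means "no segment yet"

    SegmentIdAllocator(long step, LongSupplier fetchNextMaxId) {
        this.step = step;
        this.fetchNextMaxId = fetchNextMaxId;
    }

    synchronized long nextId() {
        if (current >= max) {
            long newMax = fetchNextMaxId.getAsLong();  // one database round trip per segment
            current = newMax - step;                   // segment is (newMax - step, newMax]
            max = newMax;
        }
        return ++current;                              // served from memory
    }
}
```

With a step of 1000, the first 1000 calls cost a single database round trip, and the 1001st call fetches the next segment. Restarting the process discards whatever is left of the current segment.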
CREATE TABLE id_generator (
    id int(10) NOT NULL,
    current_max_id bigint(20) NOT NULL COMMENT 'current maximum id',
    increment_step int(10) NOT NULL COMMENT 'segment length',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This table records the step size and the current maximum ID (the last value of the most recently allocated segment). Because the increment logic has moved into DistributIdService, the database no longer needs it.
This scheme no longer depends heavily on the database: even if the database is unavailable, DistributIdService can keep serving IDs for a while from its cached segment. But if DistributIdService restarts, the unused part of the current segment is lost, leaving a gap in the IDs.
To make DistributIdService highly available, it is deployed as a cluster, and each business request is routed to a randomly selected DistributIdService node. Because all nodes connect to the same database, several nodes may request a new segment at the same time, so optimistic locking is needed, for example by adding a version column to the table. Segments are then obtained with the following SQL:
UPDATE id_generator SET current_max_id = #{newMaxId}, version = version + 1 WHERE version = #{version}
newMaxId is computed in DistributIdService as oldMaxId + step size, so if this UPDATE succeeds (affects one row), the segment has been obtained. If it affects no rows, another node updated the row first, and this node must re-read the row and retry.
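The optimistic-lock loop can be sketched with an in-memory stand-in for the `id_generator` row. `RowSnapshot`, `IdGeneratorRow` and `SegmentFetcher` are hypothetical names; `tryUpdate` mimics the UPDATE above by returning the number of affected rows, which is what a JDBC `executeUpdate` call would report.

```java
// Snapshot of the columns read before updating (SELECT current_max_id, version ...).
final class RowSnapshot {
    final long currentMaxId;
    final long version;

    RowSnapshot(long currentMaxId, long version) {
        this.currentMaxId = currentMaxId;
        this.version = version;
    }
}

// In-memory stand-in for the single id_generator row shared by all nodes.
class IdGeneratorRow {
    private long currentMaxId;
    private long version;

    IdGeneratorRow(long currentMaxId, long version) {
        this.currentMaxId = currentMaxId;
        this.version = version;
    }

    synchronized RowSnapshot read() {
        return new RowSnapshot(currentMaxId, version);
    }

    // Mirrors: UPDATE id_generator SET current_max_id=#{newMaxId}, version=version+1
    //          WHERE version=#{version}; returns the affected row count.
    synchronized int tryUpdate(long newMaxId, long expectedVersion) {
        if (version != expectedVersion) {
            return 0;                 // another node updated the row first
        }
        currentMaxId = newMaxId;
        version++;
        return 1;
    }
}

class SegmentFetcher {
    // Retries until the optimistic update succeeds; the result is the upper bound
    // of the segment (oldMaxId, newMaxId] now owned by this node.
    static long fetchSegment(IdGeneratorRow row, long step) {
        while (true) {
            RowSnapshot snap = row.read();
            long newMaxId = snap.currentMaxId + step;   // newMaxId = oldMaxId + step
            if (row.tryUpdate(newMaxId, snap.version) == 1) {
                return newMaxId;
            }
        }
    }
}
```

A node that loses the race simply re-reads the row, so no two nodes ever receive the same segment.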
To make the database layer highly available as well, the database is deployed in multi-master mode. To ensure that the segments generated by different databases never overlap, we reuse the earlier idea and add a starting value and a step size to the table. For example, with two MySQL instances, mysql1 generates the segment (1, 1001] and hands out IDs in the sequence 1, 3, 5, 7, ...; mysql2 generates the segment (2, 1002] and hands out 2, 4, 6, 8, 10, ...
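Within a segment, each database then hands out only the IDs that match its own starting value. A hypothetical sketch, treating a segment as the inclusive range [start, end] and assuming one residue class modulo `step` per database (`SteppedSegment` is an illustrative name, not TinyId's code):

```java
import java.util.NoSuchElementException;

// Walks one segment [start, end] from a database that owns the ids
// congruent to `offset` modulo `step`.
class SteppedSegment {
    private final long step;
    private final long end;   // inclusive upper bound of the segment
    private long next;

    SteppedSegment(long start, long end, long step, long offset) {
        this.step = step;
        this.end = end;
        // smallest id >= start with id % step == offset % step
        this.next = start + Math.floorMod(offset - start, step);
    }

    boolean hasNext() {
        return next <= end;
    }

    long nextId() {
        if (!hasNext()) {
            throw new NoSuchElementException("segment exhausted; fetch the next one");
        }
        long id = next;
        next += step;
        return id;
    }
}
```

With step 2, the database using offset 1 only ever yields odd IDs and the one using offset 2 only even IDs, so their segments can overlap numerically without producing duplicate IDs.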
For more details, please refer to Didi's open source TinyId: https://github.com/didi/tinyid/wiki/tinyid%E5%8E%9F%E7%90%86%E4%BB%8B%E7%BB%8D.
TinyId adds one more step to improve efficiency. In the implementation above, the ID increment logic lives in DistributIdService, but it can be moved into the business application itself: the application only needs to obtain a segment and no longer calls DistributIdService for every single ID.
Snowflake algorithm
All three approaches above are essentially based on auto-increment. Next we introduce the well-known Snowflake algorithm.
Distributed IDs can also be approached from another angle: if every machine that generates IDs can be told apart, and each machine never produces the same ID twice within a millisecond, then all IDs are globally unique.
Snowflake is Twitter's open-source distributed ID generation algorithm. Because it is an algorithm rather than a storage-backed service, it differs from the three mechanisms above: it does not depend on a database.
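A minimal Java sketch of the idea, assuming the commonly cited layout of the original algorithm: 1 sign bit, a 41-bit millisecond timestamp measured from a custom epoch, a 10-bit machine ID and a 12-bit per-millisecond sequence. The epoch (2020-01-01 UTC) and the class and method names are illustrative choices, and this version simply refuses to issue IDs if the clock moves backwards.

```java
// Snowflake-style generator (sketch). Assumed layout, high bit to low bit:
// 1 sign bit (0) | 41-bit ms since EPOCH_MILLIS | 10-bit machine id | 12-bit sequence
class SnowflakeIdGenerator {
    private static final long EPOCH_MILLIS = 1577836800000L;  // assumed epoch: 2020-01-01T00:00:00Z
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1;  // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machine id must be in [0, " + MAX_MACHINE_ID + "]");
        }
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: fail rather than risk duplicate ids.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 ids already issued this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;  // new millisecond: restart the sequence
        }
        lastTimestamp = now;
        return ((now - EPOCH_MILLIS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }

    // Extracts the machine-id bits from an id produced with this layout.
    static long machineIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_MACHINE_ID;
    }
}
```

Repeated calls on one generator yield strictly increasing values, and generators with different machine IDs cannot collide because their machine bits differ.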
The core idea: a distributed ID is always a long integer, and a long occupies 8 bytes, i.e. 64 bits. The original Snowflake algorithm allocates these bits as follows:
The first bit is the sign bit. In Java the highest bit of a long is the sign bit (0 for positive numbers, 1 for negative numbers); since generated IDs should be positive, this bit is fixed at 0.
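The timestamp part takes 41 bits and holds a time in milliseconds. Implementations usually store not the current timestamp itself but the difference between the current time and a fixed start time, so that the resulting IDs start from a smaller value. A 41-bit timestamp lasts about 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69.
The machine ID takes 10 bits, so up to 1024 nodes can generate IDs at the same time.
The sequence number takes 12 bits and distinguishes IDs generated by the same node within the same millisecond, so each node can generate up to 4096 IDs per millisecond.
Redis
Redis can also generate auto-increment IDs with its atomic INCR command:
127.0.0.1:6379> set seq_id 1     // initialize the auto-increment ID to 1
OK
127.0.0.1:6379> incr seq_id      // increment by 1 and return the new value
(integer) 2
127.0.0.1:6379> incr seq_id      // increment by 1 and return the new value
(integer) 3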
Generating IDs with Redis is very efficient, but persistence must be considered. Redis supports two persistence mechanisms, RDB and AOF.
RDB persistence periodically takes a snapshot of the data. If INCR is called several times after a snapshot and Redis crashes before the next snapshot is written, those increments are lost; after the restart Redis continues from the older value and hands out duplicate IDs.
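The RDB failure mode can be reproduced with a plain Java simulation (no Redis client involved; `SnapshotPersistedCounter` is a hypothetical class): increments made after the last snapshot are lost on a crash, so the restored counter issues the same IDs again.

```java
// Simulation only, no Redis involved: a counter persisted by periodic snapshots,
// mimicking INCR combined with RDB persistence.
class SnapshotPersistedCounter {
    private long value;      // in-memory value of seq_id
    private long snapshot;   // value stored in the last snapshot

    long incr() {            // like INCR seq_id
        return ++value;
    }

    void takeSnapshot() {    // like a periodic RDB save
        snapshot = value;
    }

    void crashAndRestart() { // memory is lost; the last snapshot is loaded again
        value = snapshot;
    }

    public static void main(String[] args) {
        SnapshotPersistedCounter seqId = new SnapshotPersistedCounter();
        seqId.incr();                  // 1
        seqId.takeSnapshot();          // snapshot holds 1
        long before = seqId.incr();    // 2, not yet persisted
        seqId.crashAndRestart();       // back to 1
        long after = seqId.incr();     // 2 again
        System.out.println(before == after);  // true: a duplicate id
    }
}
```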
AOF persistence logs every write command, so with appendfsync set to always a Redis crash will not cause duplicate IDs. However, because the log contains one entry per INCR command, replaying it makes restarting and recovering the data slow.