
MySQL primary key design method

2025-02-24 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report

This article covers how to design MySQL primary keys and stays close to that topic. Without further ado, let's get started. I believe you will benefit from this article on MySQL primary key design.

First, why do you need a primary key

Data records need to be unique (first normal form)

Data needs to be associated with other data (joins)

The database uses the primary key as its underlying index for retrieving data

The following digression is optional. Feel free to skip to the next section.

"Information is that which removes random uncertainty" (Shannon). People tell things apart by acquiring and recognizing different information about nature and society, and in this way they understand and transform the world. Data is the concrete form of information: a record of the attributes of objective things. Processed data becomes information, and information must be digitized into data before it can be stored and transmitted. A database stores data records. It follows that a record carries (relatively) certain information, and that certainty means each record is unique. This gives the first reason:

1. Data records need to be unique.

The world consists of objective things and the relations among them. Data digitizes and models those things and relations. Beyond its own descriptive value, much of the value of data lies in how records relate to one another. To make those relations precise, each record needs an identifier that other data can refer to. In data storage this is the second role of the primary key, and the second reason it exists:

2. Data needs to be associated.

Data describes objective reality but has no meaning in itself. It becomes meaningful only after it is organized to serve people's need to understand things in a particular way. So data needs to be retrieved and organized, which leads to the third role of the primary key:

3. The primary key serves as the database's underlying index for retrieving data.

Second, why the primary key should not be too long

The key word here is "long". What advantage does short have over long? (Pun intended.) A shorter key takes up less space. But the disk space saved is negligible compared with the total volume of data, and we rarely use the primary key column directly. So the real reason must be speed, and that has little to do with the raw data itself. It comes down to the index, and to how the index is read. So why does a long primary key hurt index performance?

The figure above shows InnoDB's index structure. On the left is the clustered index, which locates a data record by its primary key. On the right is a secondary index, which indexes column values and maps each value to the primary key of its row. A query through a secondary index follows the path shown in the figure: it first finds the primary key in the secondary index tree, then uses that primary key to find the row in the clustered index. The leaf nodes of a secondary index store the primary key value itself, not a pointer to it. So if the primary key is long, each secondary index page holds fewer entries and the index needs more pages. With a limited buffer pool, that means more disk reads and lower performance.
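To put rough numbers on this, the sketch below estimates how many secondary-index leaf entries fit in one page for different primary key lengths. It is a simplified model, not InnoDB's real page format. The 15/16 fill factor and the 5-byte per-entry overhead are assumptions, and a CHAR(36) UUID is assumed to take about 36 bytes (single-byte characters).

```python
# Rough, illustrative model only: real InnoDB pages also carry page headers,
# a page directory, variable-length field info, and so on.
PAGE_SIZE = 16 * 1024     # InnoDB default page size (16KB)
FILL_FACTOR = 15 / 16     # assumed: pages are filled to about 15/16
RECORD_OVERHEAD = 5       # assumed per-entry header bytes (simplification)


def entries_per_leaf_page(secondary_key_bytes: int, primary_key_bytes: int) -> int:
    """Approximate number of secondary-index entries per leaf page.

    Each secondary-index leaf entry stores the indexed column value plus a
    full copy of the row's primary key, so the primary key length is paid
    once per entry.
    """
    entry_bytes = secondary_key_bytes + primary_key_bytes + RECORD_OVERHEAD
    return int(PAGE_SIZE * FILL_FACTOR) // entry_bytes


def leaf_pages_needed(rows: int, secondary_key_bytes: int, primary_key_bytes: int) -> int:
    """Leaf pages needed to index `rows` rows (ceiling division)."""
    per_page = entries_per_leaf_page(secondary_key_bytes, primary_key_bytes)
    return -(-rows // per_page)


# Secondary index on a 4-byte INT column over 10 million rows.
for label, pk_bytes in [("BIGINT primary key", 8), ("CHAR(36) UUID primary key", 36)]:
    print(label, entries_per_leaf_page(4, pk_bytes), leaf_pages_needed(10_000_000, 4, pk_bytes))
```

Under these assumptions the BIGINT-keyed index fits 903 entries per page (11,075 leaf pages), while the UUID-keyed one fits 341 (29,326 pages). That is roughly 2.6 times as many pages to cache or read for the same secondary index.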

Third, why an auto-increment ID is recommended

InnoDB uses a clustered index. As shown in the figure above, the data records themselves are stored in the leaf nodes of the primary index (a B+tree). Records within each leaf node (one memory or disk page, 16KB by default) must be kept in primary key order. Each time a new record is inserted, MySQL therefore places it at the appropriate node and position according to its primary key. When a page reaches its fill factor (InnoDB's default is 15/16 of the page), a new page (node) is opened.

If the table uses an auto-increment primary key, each new record is appended after the last record on the current index page, and when that page is full a new page is opened automatically. This produces a compact index structure that is filled in approximately sequential order. Because existing data never has to be moved during an insert, inserts are efficient and add little index-maintenance overhead, as shown on the left side of the following figure. Otherwise, if primary key values are roughly random (UUIDs, for example), each new record must be inserted somewhere in the middle of an existing index page, and MySQL has to move data to make room at the right position, as shown on the right side of the following figure. This adds overhead. MySQL may have to flush buffers frequently to maintain the index, which increases disk I/O, and the index structure is often reorganized through page splits, which leave pages fragmented and only partly filled.
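The difference between the two insert patterns can be illustrated with a toy model of a B+tree's leaf level. It is not InnoDB's real algorithm: pages hold a fixed number of keys (16 here, an arbitrary choice), and the 15/16 rule and InnoDB's own split heuristics are ignored. Appending past the right edge opens a new page. Any other overflow splits the page in half.

```python
import bisect
import random


def simulate_leaf_inserts(keys, capacity=16):
    """Toy B+tree leaf level. Returns (pages, page_splits, avg_fill)."""
    pages = [[]]  # each page is a sorted list of keys; pages are kept in key order
    splits = 0
    for k in keys:
        # Target page: the last page whose smallest key is <= k (or the first page).
        firsts = [p[0] for p in pages if p]
        i = max(bisect.bisect_right(firsts, k) - 1, 0)
        page = pages[i]
        bisect.insort(page, k)
        if len(page) > capacity:
            splits += 1
            if i == len(pages) - 1 and page[-1] == k:
                # Appending past the right edge (sequential keys): keep this page
                # full and open a new page holding only the new key.
                pages.append([page.pop()])
            else:
                # Inserting into the middle of the key range: split the page in half.
                mid = len(page) // 2
                pages.insert(i + 1, page[mid:])
                del page[mid:]
    avg_fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, avg_fill


sequential = simulate_leaf_inserts(range(1000))

shuffled_keys = list(range(1000))
random.Random(42).shuffle(shuffled_keys)
shuffled = simulate_leaf_inserts(shuffled_keys)

print("sequential:", sequential)
print("random:    ", shuffled)
```

With sequential keys every page except the last ends up completely full (63 pages, 62 splits). With random keys the same 1,000 keys are spread over noticeably more, partly empty pages, which matches the fragmentation described above.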

Fourth, business key vs. logical key

A business key is an identifier with business meaning used as the key, for example the order serial number used as the primary key of the order table. A logical key (also called a surrogate key) has nothing to do with the business and is generated by some rule, for example an auto-increment key.

Benefits of Business Key

The key has business meaning and can be used directly as a search term in queries

No extra column or index space is required

Some joins can be avoided

Disadvantages of Business Key

When the business changes, the primary key sometimes has to change too

Composite (multi-column) keys are awkward to work with

Business keys tend to be longer and take more space, which means more disk I/O

A record cannot be persisted until its key is known. Sometimes we want to insert a record before its business key is settled and fill in the key later

It is hard to design a key-generation scheme that is both easy to use and fast

Advantages of logical Key

The key never has to change when the business changes

Easy to operate and easy to manage

Logical keys tend to be shorter and perform better

Uniqueness is easier to guarantee

Easier to optimize

Disadvantages of logical Key

The primary key column and its index take extra disk space

Inserts and updates incur extra I/O

More joins may be needed

Without a unique constraint on the business columns, duplicate data slips in easily

Key values differ between the test and production environments, which makes troubleshooting harder

The key value is unrelated to the data itself, which does not conform to the three normal forms

It cannot be used as a search keyword

Key generation relies on database-specific features (such as auto-increment), which makes it harder to switch the underlying database
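A common way to address the duplicate-data concern is to combine the two approaches: use a logical key as the primary key and put a UNIQUE constraint on the business key. The minimal sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The `orders` table, its columns, and the order numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- logical (surrogate) key
        order_no TEXT NOT NULL UNIQUE,               -- business key, still searchable
        amount   INTEGER NOT NULL
    )
    """
)


def add_order(order_no: str, amount: int) -> int:
    """Insert an order and return its generated surrogate id."""
    cur = conn.execute(
        "INSERT INTO orders (order_no, amount) VALUES (?, ?)", (order_no, amount)
    )
    return cur.lastrowid


first = add_order("SO-0001", 100)
second = add_order("SO-0002", 250)

# The UNIQUE constraint rejects a second row with the same business key.
try:
    add_order("SO-0001", 999)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookups by business key use the UNIQUE index; joins can use the compact id.
row = conn.execute("SELECT id FROM orders WHERE order_no = ?", ("SO-0002",)).fetchone()
print(first, second, duplicate_rejected, row)
```

In MySQL the same idea would be an AUTO_INCREMENT primary key plus a UNIQUE KEY on `order_no`. Because the primary key stays short, the secondary index on `order_no` stays compact as well.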

Fifth, primary key generation

In general, we use MySQL's auto-increment ID as a table's primary key. It is simple and, for the reasons above, it performs best. However, once data is sharded across multiple databases and tables, a per-table auto-increment ID can no longer guarantee global uniqueness. It helps to look at how different databases generate IDs, along with some distributed ID generation schemes, when thinking about or even building our own distributed ID generation service.

Database implementations

MySQL auto-increment

MySQL maintains an auto-increment counter in memory (since MySQL 8.0, its current maximum value is also persisted across restarts). When inserting into a table with an AUTO_INCREMENT column, InnoDB may take a special table-level lock called AUTO-INC. When it does, the lock is held until the end of the statement, not the end of the transaction. The innodb_autoinc_lock_mode setting controls when this lock is used: the lighter modes avoid it for most inserts, trading strictly consecutive values for better concurrency.

In a distributed setting, ID generation can be split out into a standalone service with its own database. That service still relies on MySQL's table auto-increment to hand out IDs to other services. For performance, different businesses can use different tables.
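The statement-level lock can be pictured with a toy model. It is not InnoDB's code: a single lock stands in for the AUTO-INC table lock. A "statement" holds the lock only while it reserves a consecutive block of values and releases it immediately, whenever its transaction later commits. The class and method names are hypothetical.

```python
import threading


class AutoIncCounterSketch:
    """Toy table-level auto-increment counter guarded by one lock."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()  # stands in for the AUTO-INC table lock

    def reserve_for_statement(self, n_rows: int) -> range:
        """Reserve n_rows consecutive values for one multi-row INSERT statement."""
        with self._lock:  # held only while this "statement" takes its values
            first = self._next
            self._next += n_rows
        return range(first, first + n_rows)


counter = AutoIncCounterSketch()
results = []
results_lock = threading.Lock()


def run_statements() -> None:
    # Each simulated statement inserts 5 rows.
    for _ in range(100):
        ids = counter.reserve_for_statement(5)
        with results_lock:
            results.extend(ids)


threads = [threading.Thread(target=run_statements) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), len(set(results)))
```

Every value is handed out exactly once and each statement receives a consecutive block. In this model, as in InnoDB, values taken by a transaction that later rolls back are not returned to the counter, which is one reason auto-increment sequences can have gaps.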
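One way to build such a service is a pattern often called a ticket server. Each business has a tiny table holding a single row, and each request replaces that row and reads back the newly generated ID. In MySQL this is typically `REPLACE INTO ... (stub) VALUES ('a')` followed by `SELECT LAST_INSERT_ID()`. The sketch below mimics it with Python's sqlite3 as a stand-in. The table names and the list of businesses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
BUSINESSES = ("orders", "users")  # hypothetical businesses, one ticket table each

for biz in BUSINESSES:
    # AUTOINCREMENT stops SQLite from reusing an id after REPLACE deletes the old row.
    conn.execute(
        f"CREATE TABLE tickets_{biz} ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, stub TEXT NOT NULL UNIQUE)"
    )


def next_id(biz: str) -> int:
    """Return the next ID for a business by replacing its single ticket row."""
    if biz not in BUSINESSES:  # table names cannot be bound as SQL parameters
        raise ValueError(f"unknown business: {biz}")
    with conn:  # commit each allocation
        cur = conn.execute(f"REPLACE INTO tickets_{biz} (stub) VALUES ('a')")
    return cur.lastrowid


order_ids = [next_id("orders") for _ in range(3)]
user_id = next_id("users")
print(order_ids, user_id)
```

Each table never grows beyond one row, and separate tables give each business its own independent sequence and its own hot spot.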

MongoDB ObjectId

To avoid primary key conflicts, MongoDB uses an ObjectId as the default primary key (_id). In the classic layout it is a 12-byte value, usually written as a 24-character hexadecimal string, made up of the following parts (newer MongoDB versions replace the machine and process fields with a single 5-byte random value):

Time: timestamp, 4 bytes, in seconds.

Machine: machine identifier, 3 bytes. It is usually a hash of the machine's hostname, so different hosts very likely get different values and IDs generated on different machines do not collide, while the same machine always produces the same value.

PID: process ID, 2 bytes. Machine keeps ObjectIds generated on different machines from colliding, while PID keeps ObjectIds generated by different MongoDB processes on the same machine from colliding.

INC: auto-increment counter, 3 bytes. The first nine bytes keep ObjectIds generated by different processes on different machines from colliding within the same second. The counter keeps ObjectIds generated by the same process within the same second from colliding, allowing 256^3 = 16,777,216 unique values per process per second.
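The classic layout can be packed with the standard `struct` module, as shown below. This is a sketch of the field layout only, not MongoDB's driver code. Using the first 3 bytes of an MD5 hash of the hostname as the machine ID and starting the counter at a random value are assumptions, and the generator is meant for single-threaded use.

```python
import hashlib
import itertools
import os
import socket
import struct
import time


def pack_object_id(timestamp: int, machine: bytes, pid: int, counter: int) -> bytes:
    """Pack the classic 12-byte ObjectId layout, all fields big-endian."""
    if len(machine) != 3:
        raise ValueError("machine id must be exactly 3 bytes")
    return (
        struct.pack(">I", timestamp & 0xFFFFFFFF)  # 4 bytes: seconds since the Unix epoch
        + machine                                   # 3 bytes: machine identifier
        + struct.pack(">H", pid & 0xFFFF)           # 2 bytes: process id
        + (counter & 0xFFFFFF).to_bytes(3, "big")   # 3 bytes: counter, wraps at 2**24
    )


class ObjectIdSketch:
    """Single-threaded generator using the classic layout (illustrative only)."""

    def __init__(self) -> None:
        # Assumption: first 3 bytes of an MD5 hash of the hostname.
        self._machine = hashlib.md5(socket.gethostname().encode()).digest()[:3]
        self._pid = os.getpid()
        # Assumption: start the counter at a random 3-byte value.
        self._counter = itertools.count(int.from_bytes(os.urandom(3), "big"))

    def next_id(self) -> str:
        raw = pack_object_id(int(time.time()), self._machine, self._pid, next(self._counter))
        return raw.hex()  # 24 hexadecimal characters


gen = ObjectIdSketch()
print(gen.next_id(), gen.next_id())
```

Because the timestamp occupies the leading bytes, ObjectIds sort roughly by creation time, which keeps inserts close to the right edge of the index.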

Cassandra TimeUUID

Cassandra's TimeUUID (a version 1 UUID) builds a unique ID from three parts: time + MAC address + clock sequence.
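Python's standard `uuid.uuid1()` produces the same kind of version 1 (time-based) UUID, so it can be used to look at these three parts. In the sketch below, the helper name is hypothetical. The offset constant converts the UUID epoch (1582-10-15, counted in 100 ns intervals) to the Unix epoch. When no MAC address is available, `uuid1()` substitutes a random node value.

```python
import time
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def describe_time_uuid(u: uuid.UUID) -> dict:
    """Split a version 1 UUID into its time, node (MAC), and clock sequence parts."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    return {
        "unix_seconds": (u.time - UUID_EPOCH_OFFSET) / 10_000_000,
        "node": u.node,            # 48-bit MAC address, or a random substitute
        "clock_seq": u.clock_seq,  # 14-bit clock sequence
    }


u = uuid.uuid1()
info = describe_time_uuid(u)
print(u, info)
```

The clock sequence exists to prevent duplicates if the clock is set backwards or the node ID changes.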

Distributed schemes

ZooKeeper auto-increment: implemented with ZooKeeper's sequential counter (for example, sequential znodes).

Redis auto-increment: implemented with Redis's atomic increment command (INCR).

UUID: use a UUID string as the key.

Snowflake algorithm: similar in spirit to MongoDB's ObjectId. A 64-bit ID consists of a 1-bit sign bit + 41-bit timestamp (milliseconds) + 10-bit machine ID + 12-bit per-millisecond sequence number.
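A minimal sketch of the 1 + 41 + 10 + 12 bit layout follows. The custom epoch (2020-01-01 UTC) is an arbitrary assumption, and refusing to run when the clock moves backwards is one possible policy among several. Real implementations such as those listed below differ in details.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class Snowflake:
    """64-bit IDs: sign bit 0 | 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence numbers used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def decompose(snowflake_id: int) -> tuple:
    """Return (unix_ms, worker_id, sequence) for an ID produced above."""
    sequence = snowflake_id & MAX_SEQUENCE
    worker = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER
    unix_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return unix_ms, worker, sequence


gen = Snowflake(worker_id=5)
ids = [gen.next_id() for _ in range(10_000)]
print(ids[0], decompose(ids[0]))
```

Because the timestamp sits in the high bits, IDs from one worker are strictly increasing, which preserves the append-only insert pattern that favors auto-increment keys in InnoDB.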

Open-source implementations

Baidu UidGenerator: based on the Snowflake algorithm.

Meituan Leaf: implements both an optimized MySQL auto-increment mechanism (segment mode) and the Snowflake algorithm.
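The idea behind an optimized, database-backed mode like this can be sketched as follows. Instead of one database round trip per ID, the service reserves a whole segment of IDs with a single `UPDATE ... SET max_id = max_id + step` and then hands them out from memory. The `id_alloc` table, its columns, and the class are hypothetical, and sqlite3 stands in for MySQL. The real project adds further optimizations, such as preloading the next segment, that are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE id_alloc (biz_tag TEXT PRIMARY KEY, max_id INTEGER NOT NULL, step INTEGER NOT NULL)"
)
conn.execute("INSERT INTO id_alloc VALUES ('order', 0, 3)")  # tiny step to make the demo visible
conn.commit()


class SegmentIdAllocator:
    """Hand out IDs from number segments reserved in a database table."""

    def __init__(self, conn: sqlite3.Connection, biz_tag: str) -> None:
        self.conn = conn
        self.biz_tag = biz_tag
        self._next = 1  # next ID to hand out
        self._max = 0   # last ID in the current segment (inclusive); empty at start
        self.db_round_trips = 0

    def _fetch_segment(self) -> None:
        with self.conn:  # one transaction: bump max_id, then read it back
            self.conn.execute(
                "UPDATE id_alloc SET max_id = max_id + step WHERE biz_tag = ?",
                (self.biz_tag,),
            )
            max_id, step = self.conn.execute(
                "SELECT max_id, step FROM id_alloc WHERE biz_tag = ?", (self.biz_tag,)
            ).fetchone()
        self.db_round_trips += 1
        self._next, self._max = max_id - step + 1, max_id

    def next_id(self) -> int:
        if self._next > self._max:  # current segment exhausted
            self._fetch_segment()
        value = self._next
        self._next += 1
        return value


alloc = SegmentIdAllocator(conn, "order")
ids = [alloc.next_id() for _ in range(7)]
print(ids, alloc.db_round_trips)
```

Seven IDs cost three database round trips here. With a realistic step (thousands), the database sees only a tiny fraction of the requests. The trade-off is that IDs remaining in a segment are skipped if the service restarts, so the sequence can have gaps.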


© 2024 shulou.com SLNews company. All rights reserved.
