A tutorial on the Design of Index and Sub-Table for MySQL Mass Storage 04/21 Update SLTechnology News&Howtos


2025-04-21 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This tutorial explains, step by step, how to design indexes and sub-tables (split tables) for mass data storage in MySQL.

1. What is the storage mode of InnoDB records?

In the InnoDB storage engine, records are stored in primary key order, and InnoDB relies on this property to organize each table as a clustered index on its primary key.

How does InnoDB implement this "sequential storage" of records? To answer that, we need to look at the order between pages and within a page. The page (16 KB by default) is the basic unit that InnoDB transfers between memory and disk.

Inter-page order: pages are linked to their neighbors by a doubly linked list. Adjacent pages may happen to be physically contiguous in the disk file, but in most cases the order is only logical.

Intra-page order: the records on a page are connected by a singly linked list in key order, so the page is logically ordered. A page directory made of slots lets a lookup achieve efficiency close to binary search.

[Figure: space layout within an InnoDB page, beginning with the Page Header]
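The intra-page ordering described above (records in a singly linked list, plus a directory of slots that is binary-searched) can be illustrated with a toy Python model. The `Record` and `Page` classes and the group size of 4 are hypothetical; real InnoDB pages use a binary record format with infimum and supremum records and groups of varying size, which this sketch simplifies.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    key: int
    next_record: Optional["Record"] = None  # next record in key order (singly linked list)


class Page:
    """Toy model of intra-page order: records form a singly linked list sorted by
    key, and a page directory keeps a slot for every GROUP-th record."""

    GROUP = 4  # hypothetical: one slot per 4 records

    def __init__(self, keys):
        head = None
        for k in sorted(keys, reverse=True):  # build backwards so the list ascends
            head = Record(k, head)
        self.slots = []
        node, i = head, 0
        while node is not None:
            if i % self.GROUP == 0:
                self.slots.append(node)
            node, i = node.next_record, i + 1
        self.slot_keys = [r.key for r in self.slots]

    def find(self, key):
        """Binary-search the slots, then walk the list within one group.
        Returns (record or None, number of list steps taken)."""
        s = bisect.bisect_right(self.slot_keys, key) - 1
        if s < 0:
            return None, 0
        node, steps = self.slots[s], 0
        while node is not None and node.key <= key and steps < self.GROUP:
            if node.key == key:
                return node, steps
            node, steps = node.next_record, steps + 1
        return None, steps


page = Page(range(0, 200, 2))  # 100 records with even keys
print(len(page.slots), page.find(42)[0].key, page.find(43)[0])  # 25 42 None
```

The binary search narrows the lookup to one group, so at most `GROUP` linked-list steps are needed instead of a scan of the whole page.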

Based on the above characteristics, let's analyze the impact of using different primary keys on storage:

Auto-increment primary key: key values only increase, so data is inserted sequentially. Records are physically contiguous within a page, and when one page is full the next page is allocated in sequence. Without deletions, the records of the whole table are stored contiguously in the disk file in the order they were written. This storage pattern gives high disk utilization, very little random I/O, and high insert efficiency.

Business primary key: for example, the user table uses uid as its primary key and the product table uses infoId. A primary key with business meaning like this is called a business primary key. A business primary key cannot keep records physically contiguous, and inserts may cause page splits that leave fragmentation inside pages. For example, suppose a page is full and holds records whose primary keys lie between 0 and 99. To insert a record with key 55, the record must go into the middle of that page; since there is no room, the page has to split into two pages, and the two resulting pages are unlikely to fill up again, which leaves fragmentation within pages. Therefore a business primary key is worse than an auto-increment primary key in write performance and disk utilization.
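The effect of key order on page splits can be sketched with a small simulation. This is a simplified model, not InnoDB's actual split algorithm: the capacity of 100 records per page and the `simulate` helper are hypothetical. Inserting past the largest key opens a new page, and inserting into the middle of a full page splits it in half.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical: records that fit in one leaf page


def simulate(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a simplified leaf level; return (pages, number_of_splits)."""
    pages = [[]]  # leaf pages in key order, each a sorted list of keys
    splits = 0
    for key in keys:
        # find the last page whose first key is <= key
        firsts = [p[0] if p else float("-inf") for p in pages]
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # append at the end of the table: new page, no split
        else:
            mid = len(page) // 2  # insert into the middle of a full page: split it in half
            left, right = page[:mid], page[mid:]
            bisect.insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]
            splits += 1
    return pages, splits


seq_pages, seq_splits = simulate(range(10_000))  # auto-increment style keys
rnd_keys = list(range(10_000))
random.Random(42).shuffle(rnd_keys)  # business keys arriving in random order
rnd_pages, rnd_splits = simulate(rnd_keys)

for name, pages, splits in (("sequential", seq_pages, seq_splits),
                            ("random", rnd_pages, rnd_splits)):
    fill = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
    print(f"{name}: {len(pages)} pages, {splits} splits, average fill {fill:.0%}")
```

Sequential keys fill every page completely with no splits, while random keys leave many half-full pages after splitting, which is the intra-page fragmentation described above.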

From the above analysis, can we conclude that an auto-increment primary key is always the better choice? Not before we have analyzed how InnoDB indexes work.

2. What is the primary key index?

InnoDB automatically creates a B+Tree index on the table's primary key. Because of how it stores data, the primary key index is also called the clustered index: the index structure and the actual data are stored together, and the leaf nodes of the B+Tree hold the actual records, as shown in the figure:

[Figure: clustered index]

3. What is a non-primary key index?

Since records are stored in the primary key index structure, how does an index on another column find records? The natural approach is for the index on a non-primary-key column to first find the primary key value through its own index structure, and then use that value to find the record in the clustered index. That is exactly what InnoDB does, which is why an index on a non-primary-key column is called a secondary index (a query has to search two index trees).

The secondary index has the following characteristics:

It is any index other than the primary key index.

The leaf nodes of its index structure store the primary key value, not the full record.

A query searches two indexes: the secondary index itself and the primary key index.
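The two-step lookup through a secondary index can be sketched in Python. The rows, the dict standing in for the clustered B+Tree, and the `find_by_mobile` helper are all hypothetical; the point is only that the secondary index yields a primary key, which is then used to search the clustered index.

```python
import bisect

# Hypothetical user rows. The clustered index (primary key uid -> full row) is
# modeled as a dict; in InnoDB it is a B+Tree whose leaf pages hold the rows.
clustered_index = {
    1001: {"uid": 1001, "nickname": "alice", "mobile": "13800000001"},
    1002: {"uid": 1002, "nickname": "bob", "mobile": "13800000002"},
    1003: {"uid": 1003, "nickname": "carol", "mobile": "13800000003"},
}

# Secondary index on mobile: sorted (mobile, uid) entries. Each entry stores
# the primary key value, not the row.
secondary_index = sorted((row["mobile"], uid) for uid, row in clustered_index.items())


def find_by_mobile(mobile):
    """Return (row, number_of_index_trees_searched) for a lookup by mobile."""
    i = bisect.bisect_left(secondary_index, (mobile,))
    if i == len(secondary_index) or secondary_index[i][0] != mobile:
        return None, 1  # not found after searching the secondary index only
    uid = secondary_index[i][1]  # first tree: mobile -> primary key
    return clustered_index[uid], 2  # second tree: primary key -> full record


row, trees = find_by_mobile("13800000002")
print(row["nickname"], trees)  # bob 2
```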

4. What is a joint index?

A joint index, also called a multi-column index, has a key made up of several columns. Entries are sorted by the first column; entries whose first columns are equal are compared on the second column, and so on. The structure of a joint index is shown in the figure:

[Figure: joint index structure]

Queries on a joint index follow these rules:

The search must start from the leftmost key column; otherwise the index cannot be used.

Skipping a middle column prevents the columns after it from using the index.

Once a column is used in a range query, the columns after it cannot use the index.

Following this leftmost-prefix rule, a joint index on (a, b, c) can serve three kinds of queries: on (a), on (a, b), and on (a, b, c).
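The leftmost-prefix rule follows directly from the sort order, and Python's tuple comparison (column by column) makes a convenient model of it. The entries and helper names below are hypothetical: a joint index on (a, b, c) is represented as sorted `(a, b, c, pk)` tuples. A leftmost prefix maps to one contiguous run found by binary search; a condition on `b` alone does not.

```python
import bisect

INF = float("inf")


def build_joint_index(entries):
    """Sort (a, b, c, pk) entries. Python compares tuples column by column,
    which mirrors how a joint index on (a, b, c) orders its keys."""
    return sorted(entries)


def seek_leftmost_prefix(index, prefix):
    """Equality lookup on a leftmost prefix such as (a,), (a, b) or (a, b, c).
    Matching entries are contiguous, so two binary searches bound them."""
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix + (INF,))
    return index[lo:hi]


def scan_on_b(index, b):
    """A condition on b alone skips the leftmost column, so the sort order
    does not help: every entry has to be examined."""
    return [e for e in index if e[1] == b]


idx = build_joint_index([(2, 1, 1, 13), (1, 2, 3, 11), (1, 1, 1, 10),
                         (2, 2, 2, 14), (1, 2, 5, 12)])
print(seek_leftmost_prefix(idx, (1, 2)))  # [(1, 2, 3, 11), (1, 2, 5, 12)]
```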

5. Summary

Now that we understand InnoDB indexes, let's analyze the advantages and disadvantages of auto-increment and business primary keys:

Auto-increment primary key: write efficiency, query efficiency and disk utilization are all high, but almost every query has to go through two indexes (a secondary index and then the clustered index), because online business queries seldom filter directly on the auto-increment key.

Business primary key: write efficiency and disk utilization are lower, but queries on the business key need only one index lookup. In some cases, with a covering index, a query on a non-primary-key index can also be completed within a single index (a later case describes this in detail).

On SSDs, the I/O advantage of an auto-increment primary key over a business primary key is almost negligible, while a business primary key has a clear advantage in business query performance, so we use business primary keys in our business databases.

VI. Design and practice of sub-tables in an e-commerce business

Based on the characteristics of the MySQL database and of our own business, we developed a set of database usage specifications that effectively guide R&D engineers (RD) in designing tables and indexes during project development. The following describes the key design principles for tables and indexes in our e-commerce business, followed by two practical cases.

1. Table design principles

Primary key selection: earlier we compared the advantages and disadvantages of business primary keys and auto-increment primary keys and concluded that business primary keys better match the query needs of the business. Moreover, most Internet businesses read far more than they write, so all of our online businesses use business primary keys.

Number of indexes: too many indexes make the index files too large, so a table may have no more than 5 indexes.

Column type selection: in general, the smaller and simpler the type, the better. For example, boolean fields use TINYINT, enumeration fields use TINYINT, and transaction amounts use BIGINT (a long in application code). TINYINT makes boolean and enumeration values easy to extend later. For amounts, InnoDB does provide the DECIMAL type for exact arithmetic, but DECIMAL is a storage format rather than a native machine data type: the CPU cannot compute on it directly, so it is less efficient. We therefore simply convert decimal amounts to integers (for example, amounts in cents) and store them as BIGINT.
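Converting amounts to integers can be sketched as follows, assuming two decimal places so that amounts are stored in cents; the `SCALE` constant and helper names are hypothetical. Parsing goes through `Decimal` so the conversion itself is exact, and all later arithmetic is plain integer arithmetic that fits a BIGINT column.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALE = 100  # assumption: amounts have two decimal places and are stored in cents


def to_minor_units(amount: str) -> int:
    """Convert a decimal string such as "19.99" into an integer count of cents."""
    return int((Decimal(amount) * SCALE).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def from_minor_units(cents: int) -> str:
    """Format an integer count of cents back into a decimal string."""
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), SCALE)
    return f"{sign}{whole}.{frac:02d}"


# Integer sums are exact; binary floating point is not.
total = to_minor_units("0.10") + to_minor_units("0.20")
print(total, from_minor_units(total), 0.1 + 0.2 == 0.3)  # 30 0.30 False
```

Python integers are unbounded, so an application would still need to keep values within the signed 64-bit range of BIGINT.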

Sub-table strategy: first, be clear that database performance problems generally appear once the data volume reaches a certain scale. We therefore require teams to estimate growth in advance and split tables before it becomes necessary, generally keeping each table at the level of 10 million rows. There are two common sub-table strategies: splitting by key modulo, which spreads reads and writes evenly; and splitting by time, which separates hot and cold data clearly.
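The two strategies differ mainly in how a row is routed to a physical table. A minimal sketch, assuming a `<base>_<suffix>` naming convention; the base names `user` and `order` and both helpers are hypothetical.

```python
from collections import Counter
from datetime import date


def table_by_key(base: str, key: int, shards: int) -> str:
    """Modulo split: route a row to one of `shards` tables by its key."""
    return f"{base}_{key % shards}"


def table_by_month(base: str, day: date) -> str:
    """Time split: one table per month, so cold months can be archived or dropped."""
    return f"{base}_{day:%Y%m}"


# With sequential keys, every shard receives the same number of rows.
load = Counter(table_by_key("user", uid, 128) for uid in range(12_800))
print(len(load), set(load.values()))  # 128 {100}
print(table_by_month("order", date(2025, 4, 21)))  # order_202504
```

Modulo routing keeps load even but scatters a time range across all tables; monthly routing keeps recent (hot) data together but concentrates current writes in one table.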

2. Actual cases

Case 1: user table design

The user table contains the fields uid, nickname, mobile, addr, image, ..., switch. uid is the primary key, and the business has two query patterns, by uid and by mobile, so an index must also be created on mobile.

The switch column is special: its type is BIGINT, and it stores the user's boolean attributes, one attribute per bit. For example, the first bit can record whether the user receives push notifications, the second bit whether offline messages are saved, and so on.

This design is highly extensible (a BIGINT has 64 bits and can hold 64 states, which are rarely all used up), but it also brings a problem. switch is queried very frequently, and because InnoDB stores data by row, reading switch requires fetching the whole row.
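Reading and writing individual attributes in the switch column comes down to bit operations. In this sketch the bit positions and helper names are hypothetical (following the article's example of push notifications and offline messages), and it assumes the column is BIGINT UNSIGNED; with a signed BIGINT, bit 63 would make the stored value negative.

```python
SWITCH_BITS = 64  # a BIGINT column holds 64 bits (assumed UNSIGNED here)

# Hypothetical bit positions matching the article's example.
RECEIVE_PUSH = 0      # first bit: whether the user receives push notifications
SAVE_OFFLINE_MSG = 1  # second bit: whether offline messages are saved


def set_flag(switch: int, bit: int, on: bool) -> int:
    """Return a new switch value with `bit` turned on or off."""
    if not 0 <= bit < SWITCH_BITS:
        raise ValueError(f"bit must be in [0, {SWITCH_BITS})")
    mask = 1 << bit
    return (switch | mask) if on else (switch & ~mask)


def has_flag(switch: int, bit: int) -> bool:
    """Test whether `bit` is set in the switch value."""
    return ((switch >> bit) & 1) == 1


switch = set_flag(0, RECEIVE_PUSH, True)
switch = set_flag(switch, SAVE_OFFLINE_MSG, True)
print(switch, has_flag(switch, SAVE_OFFLINE_MSG))  # 3 True
```

MySQL also supports the bitwise operators `&` and `|` on integer columns, so the same test can be expressed in SQL, for example `switch & 2`.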

What can we do in the table design to optimize this scenario? The common solution is to split the table vertically, which is well known, so we will not discuss it further.

Another option uses InnoDB's covering index feature: create a joint index on the uid and switch columns so that both values are included in the secondary index. When querying switch by uid, the value can then be read from the secondary index alone, without accessing the full record in the clustered index.

There is one more point to consider. switch presumably changes quite often. Does a change in the switch value cause changes in the joint index (here "changes" means index node splits or reordering)?

The answer is no. The first column of the joint index, uid, is unique and never changes, so uid alone determines the order of the index entries. A change to switch only changes the value of the second key in an index entry; the entry keeps its position, and the index structure does not change.
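Both points, answering the query from the joint index alone and updating switch without disturbing the order of entries, can be shown with a toy model. The rows, the dict standing in for the clustered index, and the helper names are hypothetical, and the model only tracks the logical order of entries, not how InnoDB physically applies an update.

```python
import bisect

# Hypothetical rows; the clustered index (uid -> full row) is modeled as a dict.
clustered = {
    7: {"uid": 7, "nickname": "alice", "mobile": "13800000007", "switch": 0b01},
    9: {"uid": 9, "nickname": "bob", "mobile": "13800000009", "switch": 0b11},
    12: {"uid": 12, "nickname": "carol", "mobile": "13800000012", "switch": 0b00},
}

# Joint index on (uid, switch): each entry carries both column values.
uid_switch_index = sorted((row["uid"], row["switch"]) for row in clustered.values())


def _position(uid):
    """Binary-search the joint index for the entry whose leading column is uid."""
    i = bisect.bisect_left(uid_switch_index, (uid,))
    if i == len(uid_switch_index) or uid_switch_index[i][0] != uid:
        raise KeyError(uid)
    return i


def read_switch(uid):
    """Covering lookup: the answer comes from the index entry; no row is read."""
    return uid_switch_index[_position(uid)][1]


def update_switch(uid, value):
    """uid is unique and leads the key, so the entry keeps its position in key order."""
    i = _position(uid)
    uid_switch_index[i] = (uid, value)
    clustered[uid]["switch"] = value  # the row itself is updated as well


update_switch(9, 0b10)
print(read_switch(9), [uid for uid, _ in uid_switch_index])  # 2 [7, 9, 12]
```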

Case 2: sub-table scheme of the IM subsystem

The IM subsystem has four main business tables: users, contacts, cloud messages and system messages. The database is split by business, and each business uses a separate instance. Except for the system message table, every table is split by uid modulo 128 into 128 tables. Because the system message service has special characteristics, its sub-table scheme differs from the other services.

First, consider the business characteristics of system messages: the system message table stores notification-type messages sent by the server. Notifications are time-sensitive, so we stipulate that a system message is valid for 30 days. Given these characteristics, we adopt the following sub-table scheme:

The system message table is divided into monthly tables, and each month's data is divided into 128 tables.

Consider one problem: when querying a user's system messages, the tables are monthly and most queries span two months (messages from the last 30 days are needed), so a query takes two database round trips. Can this be optimized?

We can use redundant storage. The optimization works as follows:

When inserting a system message, write it to both the current month's table and last month's table.

When querying, read only last month's table.

[Figure: redundant storage mode]
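The redundant write and single read can be sketched as table routing. The `system_msg_<YYYYMM>_<uid % 128>` naming scheme and the helper names are hypothetical. Each message lands in the table of the month it is sent and in the previous month's table, so last month's table always holds both last month's and this month's messages.

```python
from collections import defaultdict
from datetime import date

SHARDS = 128


def month_key(day: date, offset: int = 0) -> str:
    """YYYYMM of the month `offset` months away from `day`."""
    index = day.year * 12 + (day.month - 1) + offset
    year, month0 = divmod(index, 12)
    return f"{year:04d}{month0 + 1:02d}"


def table_for(uid: int, month: str) -> str:
    return f"system_msg_{month}_{uid % SHARDS}"  # hypothetical table naming


def write_targets(uid: int, today: date) -> list:
    """Redundant write: the current month's table and last month's table."""
    return [table_for(uid, month_key(today)), table_for(uid, month_key(today, -1))]


def read_target(uid: int, today: date) -> str:
    """One read: last month's table holds last month's and this month's messages."""
    return table_for(uid, month_key(today, -1))


# Simulated storage: table name -> list of messages.
tables = defaultdict(list)
uid = 1000
for sent, text in [(date(2025, 2, 20), "feb notice"), (date(2025, 3, 2), "mar notice")]:
    for table in write_targets(uid, sent):
        tables[table].append(text)

target = read_target(uid, date(2025, 3, 5))
print(target, tables[target])  # system_msg_202502_104 ['feb notice', 'mar notice']
```

The table read can also contain messages older than 30 days (for example, one sent at the start of last month), so the query would still filter by send time.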

This scheme guarantees that a single query finds all of a user's system messages within the validity period, at the cost of extra storage space and write efficiency. It is not necessarily the optimal solution, but it is a reasonable choice in business scenarios where the total data volume is small and query performance matters more.

VII. Summary

An auto-increment primary key does not necessarily give higher performance; the choice should be analyzed against the actual business scenario.

In most scenarios, choose data types that are as simple as possible.

More indexes are not always better: too many indexes make the index files too large.

If the data a query needs can be found in the index itself (a covering index), the storage engine does not need to go back to the primary key index to access the actual record.

This concludes the tutorial on index and sub-table design for MySQL mass storage. How to apply these methods in a specific system still needs to be verified in practice.
