MySQL database splitting and table splitting explained

2025-07-19 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Today I will introduce a common distributed storage approach, database and table splitting (often rendered as "sub-database, sub-table"). It falls into two categories: splitting databases and splitting tables.

Splitting databases by business module

Splitting databases by business means distributing data across different database servers according to business module. For example, a simple e-commerce website has three business modules: users, goods, and orders. Instead of putting all the data on one database server, we can put user data, goods data, and order data on three separate database servers.

Although splitting databases by business spreads out the storage and access pressure, it also introduces new problems, which I will analyze in detail next.

1. Join operation problem

After the databases are split by business, tables that used to live in the same database are scattered across different databases, so SQL join queries can no longer be used across them.

Take the function "query the list of female users who have purchased cosmetics". The user's ID is in the order data, but the user's gender is in the user database. In a single database, one simple join query would do the job. Now that the data sits in two different databases, a join is impossible. We first have to query the order database for the IDs of users who have purchased cosmetics, and then query the user database for the female users among those IDs, which is more complicated than a simple join query.
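The two-step query described above can be sketched as follows. This is a minimal illustration, not the article's implementation: two in-memory SQLite databases stand in for the separate order and user database servers, and the table names, column names, and the helper `female_cosmetics_buyers` are all hypothetical.

```python
import sqlite3

# Assumption: two in-memory SQLite databases stand in for the separate
# order and user database servers (the article itself is about MySQL).
order_db = sqlite3.connect(":memory:")
user_db = sqlite3.connect(":memory:")

order_db.execute("CREATE TABLE orders (user_id INTEGER, category TEXT)")
order_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "cosmetics"), (2, "cosmetics"), (3, "books"), (1, "cosmetics")],
)
user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, sex TEXT)")
user_db.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", "F"), (2, "Bob", "M"), (3, "Carol", "F")],
)


def female_cosmetics_buyers(order_db, user_db):
    """Replace a cross-database join with two queries combined in code."""
    # Step 1: IDs of users who bought cosmetics, from the order database.
    rows = order_db.execute(
        "SELECT DISTINCT user_id FROM orders WHERE category = ?", ("cosmetics",)
    ).fetchall()
    buyer_ids = [row[0] for row in rows]
    if not buyer_ids:
        return []
    # Step 2: keep only the female users among those IDs, from the user database.
    placeholders = ",".join("?" for _ in buyer_ids)
    query = (
        "SELECT id, name FROM users "
        f"WHERE sex = 'F' AND id IN ({placeholders}) ORDER BY id"
    )
    return user_db.execute(query, buyer_ids).fetchall()


buyers = female_cosmetics_buyers(order_db, user_db)
```

In this sketch the intermediate ID list travels through the application, which is exactly the extra work a single-database join would have avoided.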

2. Transaction problem

Originally, different tables in the same database could be modified in a single transaction. After splitting, tables in different databases can no longer be modified together in one transaction. Although database vendors provide distributed transaction solutions (for example, MySQL's XA), their performance is too low, which runs counter to the goal of high-performance storage.

For example, placing an order requires deducting goods inventory. If the order data and goods data are in the same database, a transaction can guarantee that deducting inventory and creating the order either both succeed or both fail. After the databases are split, database transactions can no longer be used, and the business program has to simulate the transaction itself. Suppose the program deducts inventory first and creates the order only after the deduction succeeds. If creating the order fails because of an exception in the order database, the business program has to add the inventory back. If creating the order fails because of an exception in the business program itself, the inventory cannot be restored automatically, and the inconsistency has to be repaired manually using logs and other means.
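The "deduct first, compensate on failure" flow described above might look like the following sketch. It assumes two in-memory SQLite databases in place of the goods and order servers; `place_order`, `stock`, and the `fail_order_insert` switch used to simulate an order database failure are hypothetical names introduced only for illustration.

```python
import sqlite3

# Assumption: two independent SQLite databases stand in for the goods and
# order database servers, so no single transaction can span both.
goods_db = sqlite3.connect(":memory:")
order_db = sqlite3.connect(":memory:")
goods_db.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, stock INTEGER)")
goods_db.execute("INSERT INTO inventory VALUES (1, 10)")
goods_db.commit()
order_db.execute("CREATE TABLE orders (user_id INTEGER, item_id INTEGER, qty INTEGER)")
order_db.commit()


def place_order(user_id, item_id, qty, fail_order_insert=False):
    """Deduct inventory, then create the order; add the stock back on failure."""
    # Step 1: deduct inventory in the goods database and commit on its own.
    cursor = goods_db.execute(
        "UPDATE inventory SET stock = stock - ? WHERE item_id = ? AND stock >= ?",
        (qty, item_id, qty),
    )
    goods_db.commit()
    if cursor.rowcount == 0:
        return False  # not enough stock, nothing to compensate
    try:
        # Step 2: create the order in the order database.
        if fail_order_insert:  # hypothetical switch that simulates a failure
            raise sqlite3.OperationalError("simulated order database failure")
        order_db.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (user_id, item_id, qty)
        )
        order_db.commit()
        return True
    except sqlite3.Error:
        order_db.rollback()
        # Compensation: the business program adds the inventory back itself.
        goods_db.execute(
            "UPDATE inventory SET stock = stock + ? WHERE item_id = ?",
            (qty, item_id),
        )
        goods_db.commit()
        return False


def stock(item_id):
    row = goods_db.execute(
        "SELECT stock FROM inventory WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0]


ok = place_order(user_id=1, item_id=1, qty=3)
stock_after_success = stock(1)
failed = place_order(user_id=1, item_id=1, qty=2, fail_order_insert=True)
stock_after_failure = stock(1)
```

The sketch only covers the case where the order database raises an error. If the process running `place_order` died between the two steps, the compensation code would never run, which is the manual-repair case the text describes.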

3. Cost problem

Splitting databases by business also increases cost. Work that was originally done by one server now takes three; if backups are counted, two servers become six.

For a small start-up business, it is not recommended to split databases by business at the very beginning, mainly for the following reasons:

A start-up business carries a lot of uncertainty and may never take off. At the beginning there is no real storage or access pressure, so splitting databases brings no value to the business.

After the databases are split, join queries and database transactions across tables can no longer be implemented easily.

After the databases are split, different data must be read from and written to different databases, so the code needs extra logic to map each type of data to its database, which increases the workload. During the start-up period the most important things are rapid implementation and rapid validation, and splitting databases slows the business down.

Some architects may think: if the business really grows fast, won't the databases have to be split soon anyway? Then why not design for it from the start?

This question is easy to answer with a brief analysis based on the "three principles of architectural design" I mentioned earlier.

First, the probability of this "what if" is actually quite low. Having one business out of ten survive is already a good result, let alone grow rapidly, which is about as likely as winning the lottery. If we design the architecture for the scale of Taobao or WeChat from the start, we will not only exhaust ourselves but also kill the business.

Second, if the business really does grow rapidly, it is not too late to split the databases then. A business that is doing well attracts more investment in people and money, so the code and business complexity introduced by splitting can be handled by adding people, and the cost problem can be handled by adding funding.

Third, a single database server is not as weak as people imagine. Generally speaking, a single database server can support a business on the order of 100,000 users, and a start-up business does not grow from 0 to 100,000 users that quickly.

Large, mature companies are essentially different from the small start-ups discussed above: they already have mature solutions for database splitting, and even an experimental new business starts with a massive number of users. For them it is best to consider splitting databases by business at the start of business design. For example, a new business launched at Taobao can rely on an existing mature database solution and a large user base, so its design should include database splitting from the start, and possibly even the table splitting scheme introduced next.

Table splitting

Storing different business data on different database servers can support a business with millions or even tens of millions of users. If the business keeps growing, however, the data in a single table of one business will also hit the bottleneck of a single database server. For example, Taobao's hundreds of millions of user records would certainly not meet performance requirements if they were all stored in one table on one database server, so the single table's data has to be split.

There are two ways to split a single table: vertical splitting and horizontal splitting. The schematic diagram is as follows:

[Figure: schematic of vertical and horizontal table splitting]

To vividly understand the difference between a vertical split and a horizontal split, you can imagine cutting a cake with a knife in your hand:

Cutting from top to bottom is a vertical cut: the knife moves perpendicular to the cake, so the cake is cut into two pieces of equal height (the areas may or may not be equal). Applied to a table, the resulting tables have the same number of records but contain different columns. For example, the vertical split in the diagram splits the table into two tables, one containing the ID, name, age, and sex columns, and the other containing the ID, nickname, and description columns.

Cutting from left to right is a horizontal cut: the knife moves parallel to the cake, so the cake is cut into two pieces with the same area (the heights may or may not be equal). Applied to a table, the resulting tables have the same columns but contain different rows. For example, the horizontal split in the diagram divides the table into two tables, both of which contain the ID, name, age, sex, nickname, and description columns, but one table contains the rows with ID from 1 to 999999, and the other contains the rows with ID from 1000000 to 9999999.

The example above is deliberately simple and considers only one split. In actual architecture design the number of splits is not limited: a table can be split twice or many times, just as a cake can be cut many times.

After a single table has been split, whether the resulting tables should be placed on different database servers depends on the actual effect of the split; it is not mandatory to spread them across databases. The reason is that splitting one table into several may already bring a considerable performance improvement on the same database server. If that performance meets the business requirements, there is no need to move the tables to multiple database servers; after all, as the section on splitting databases shows, doing so introduces a lot of complexity. If a single server still cannot meet the performance requirements after the table is split, then we have to go back and design a database split as well.

Table splitting can effectively spread out storage pressure and improve performance, but like database splitting, it also introduces various kinds of complexity.
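To make the two cuts concrete, here is a small sketch that applies them to an in-memory list of rows rather than a real table. The sample rows and the helpers `vertical_split` and `horizontal_split` are hypothetical; only the column names and the 1000000 boundary come from the example above.

```python
# Sample rows with the six columns used in the example (the values are made up).
users = [
    {"id": 1, "name": "Ann", "age": 28, "sex": "F",
     "nickname": "annie", "description": "Likes hiking"},
    {"id": 2, "name": "Bo", "age": 23, "sex": "M",
     "nickname": "bobo", "description": "Plays chess"},
    {"id": 1000000, "name": "Cy", "age": 41, "sex": "M",
     "nickname": "cy", "description": "Collects stamps"},
]


def vertical_split(rows, key, left_cols, right_cols):
    """Same number of rows in both results, different columns; both keep the key."""
    left = [{col: row[col] for col in (key, *left_cols)} for row in rows]
    right = [{col: row[col] for col in (key, *right_cols)} for row in rows]
    return left, right


def horizontal_split(rows, key, boundary):
    """Same columns in both results, different rows, divided at a key boundary."""
    low = [row for row in rows if row[key] < boundary]
    high = [row for row in rows if row[key] >= boundary]
    return low, high


base, extra = vertical_split(
    users, "id", ["name", "age", "sex"], ["nickname", "description"]
)
low, high = horizontal_split(users, "id", 1000000)
```

Here `base` and `extra` each hold three rows with different columns, while `low` and `high` hold different rows with all six columns.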

1. Vertical table splitting

Vertical table splitting is suitable for splitting out columns that are rarely used and take up a lot of space. Take the nickname and description fields in the diagram above, and suppose we run a dating website. When users screen other users, they mainly query by the age and sex fields, while nickname and description are mainly used for display and are generally not used in business queries. The description field is also relatively long, so we can move these two fields into a separate table, which brings some performance improvement to queries on age and sex.

The complexity introduced by vertical table splitting is mainly an increase in the number of table operations. For example, getting name, age, sex, nickname, and description used to take only one query; now it takes two, one for name, age, and sex and another for nickname and description.

However, compared with the horizontal table splitting described next, this complexity is negligible.
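The extra query that vertical splitting introduces can be sketched like this. An in-memory SQLite database stands in for MySQL, and the table names `user_base` and `user_profile` and the helper `load_full_user` are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumption: the frequently queried columns stay in user_base, while the
# long, display-only columns move to user_profile.
db.execute(
    "CREATE TABLE user_base (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT)"
)
db.execute(
    "CREATE TABLE user_profile (id INTEGER PRIMARY KEY, nickname TEXT, description TEXT)"
)
db.execute("INSERT INTO user_base VALUES (1, 'Ann', 28, 'F')")
db.execute("INSERT INTO user_profile VALUES (1, 'annie', 'Likes hiking and long walks')")
db.commit()


def load_full_user(db, user_id):
    """Fetch all six fields; this now takes two queries instead of one."""
    base = db.execute(
        "SELECT name, age, sex FROM user_base WHERE id = ?", (user_id,)
    ).fetchone()
    if base is None:
        return None
    profile = db.execute(
        "SELECT nickname, description FROM user_profile WHERE id = ?", (user_id,)
    ).fetchone()
    nickname, description = profile if profile else (None, None)
    return {
        "id": user_id,
        "name": base[0],
        "age": base[1],
        "sex": base[2],
        "nickname": nickname,
        "description": description,
    }


user = load_full_user(db, 1)
```

Screening queries on age and sex touch only `user_base` in this sketch, so they never read the long description column.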

2. Horizontal table splitting

Horizontal table splitting is suitable for tables with a very large number of rows. Some companies require any table with more than 50 million rows to be split. This number can serve as a reference, but it is not an absolute standard; the key is the access performance of the table. A more complex table may need to be split once it exceeds 10 million rows, while a simple table may not need to be split even when it stores more than 100 million rows. In any case, an architect should be on the alert when a table reaches tens of millions of rows, because it is likely to be a performance bottleneck or a hidden risk in the architecture.

Horizontal table splitting introduces more complexity than vertical table splitting, mainly in the following aspects:

Routing

After horizontal splitting, a routing algorithm is needed to work out which sub-table a given piece of data belongs to, which introduces some complexity.

Common routing algorithms are:

Range routing: choose an ordered column (for example, an integer or a timestamp) as the routing condition, and place different segments in different database tables. Taking the most common case, user ID, the routing algorithm can use segments of 1,000,000: IDs 1-999999 go to table 1, IDs 1000000-1999999 go to table 2, and so on.

The complexity of designing range routing lies mainly in choosing the segment size. Segments that are too small produce too many sub-tables and increase maintenance complexity; segments that are too large may cause performance problems within a single table. A segment size between 1 million and 20 million is generally recommended, with the exact value chosen according to the business.

The advantage of range routing is that new tables can be added smoothly as the data grows. For example, if the number of users grows from 1 million to 10 million, we only need to add new tables; the existing data does not need to be moved.

A less obvious disadvantage of range routing is uneven distribution. If the table is split into segments of 10 million, one segment may actually hold only 1,000 rows while another holds 9 million.
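Range routing on user ID reduces to integer division. The following is a minimal sketch using the segment size of 1,000,000 from the example above; the function name `range_route` and the 1-based table numbering are assumptions made for illustration.

```python
SEGMENT_SIZE = 1_000_000  # segment size taken from the user ID example


def range_route(user_id, segment_size=SEGMENT_SIZE):
    """Return the 1-based number of the table that holds this user ID.

    IDs 1-999999 map to table 1, IDs 1000000-1999999 map to table 2, and so on.
    """
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    return user_id // segment_size + 1


table_for_first_user = range_route(1)
table_for_millionth_user = range_route(1_000_000)
```

Because the mapping depends only on the ID and the segment size, a new ID range simply maps to a new table number and rows that are already stored never have to move.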

Hash routing: apply a hash operation to the value of one column (or a combination of several columns), and distribute the rows into different database tables according to the hash result. Taking user ID again, if we plan 10 database tables at the start, the routing algorithm can simply use user_id % 10 as the number of the table a row belongs to. The user with ID 985 is placed in sub-table 5, and the user with ID 10086 is placed in sub-table 6.

The complexity of designing Hash routing lies mainly in choosing the initial number of tables: too many tables are troublesome to maintain, and too few may cause performance problems within a single table. Once Hash routing is in use, increasing the number of sub-tables is very troublesome, because all the data has to be redistributed.

The advantages and disadvantages of Hash routing are basically the opposite of those of range routing. Its advantage is that data is distributed fairly evenly across the tables; its disadvantage is that adding new tables is troublesome because all the data has to be redistributed.
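The modulo rule and its expansion cost can be sketched in a few lines. `hash_route` follows the user_id % 10 example from the text; `moved_fraction` is a hypothetical helper that measures how many IDs would land in a different table if the table count changed.

```python
def hash_route(user_id, table_count=10):
    """Return the sub-table number for a user ID using the modulo rule."""
    return user_id % table_count


def moved_fraction(user_ids, old_count, new_count):
    """Fraction of IDs whose sub-table changes when the table count changes."""
    moved = sum(
        1
        for user_id in user_ids
        if hash_route(user_id, old_count) != hash_route(user_id, new_count)
    )
    return moved / len(user_ids)


table_for_985 = hash_route(985)
table_for_10086 = hash_route(10086)
# Growing from 10 to 11 tables relocates the large majority of these IDs.
fraction = moved_fraction(range(1, 10001), old_count=10, new_count=11)
```

In this sketch more than 90% of the sample IDs change tables when one table is added, which illustrates why expanding a hash-routed layout means redistributing nearly all of the data.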

Configured routing: configured routing uses a routing table, that is, a separate table that records routing information. Taking user ID again, we add a user_router table with two columns, user_id and table_id, and look up the table_id for a given user_id.

Configured routing is simple in design and flexible in use, especially when adding tables: we only need to migrate the specified data and then update the routing table.

The disadvantage of configured routing is that every access requires one extra query, which affects overall performance. Moreover, if the routing table itself grows too large (for example, hundreds of millions of rows), it may become a performance bottleneck in its own right. If we then split the routing table across databases and tables, we face the problem of choosing a routing algorithm all over again, in an endless loop.
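A routing-table lookup might be sketched as follows. The user_router table and its two columns come from the text; the in-memory SQLite database and the helpers `lookup_table` and `move_user` are assumptions, and the migration of the user's actual rows is deliberately left out.

```python
import sqlite3

router_db = sqlite3.connect(":memory:")
router_db.execute(
    "CREATE TABLE user_router (user_id INTEGER PRIMARY KEY, table_id INTEGER)"
)
router_db.executemany(
    "INSERT INTO user_router VALUES (?, ?)", [(985, 1), (10086, 2)]
)
router_db.commit()


def lookup_table(user_id):
    """The extra query: find which sub-table holds this user's data."""
    row = router_db.execute(
        "SELECT table_id FROM user_router WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no route configured for user {user_id}")
    return row[0]


def move_user(user_id, new_table_id):
    """Repoint one user after their rows have been migrated (migration omitted)."""
    router_db.execute(
        "UPDATE user_router SET table_id = ? WHERE user_id = ?",
        (new_table_id, user_id),
    )
    router_db.commit()


before = lookup_table(985)
move_user(985, 3)
after = lookup_table(985)
```

Only the moved user's route changes in this sketch; every other entry, and the data behind it, stays where it is.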

Join operation

After horizontal table splitting, the data is scattered across multiple tables. A join with another table therefore has to be performed as multiple join queries in the business code or database middleware, with the results merged afterwards.

count() operation

After horizontal table splitting, the data is physically spread across multiple tables, but some business logic still treats them as one table. For example, the total number of records is needed for paging or display. Before splitting, a single count() was enough; after splitting it is not so simple. There are two common ways to handle it:

count() addition: run count() on each table in the business code or database middleware, then add up the results. This is simple to implement, but its performance is relatively low. For example, if the table has been split horizontally into 20 tables, 20 count(*) operations are needed, and if they run serially it may take a few seconds to get the result.

Record table: create a new table, say named "record table", with two fields, table_name and row_count, and update it after every insert into or delete from a sub-table.

Getting the number of records this way performs much better than count() addition, because a single simple query returns the data. The disadvantage is a considerable increase in complexity: every operation on a sub-table must also update the record table, and if any piece of business logic forgets to do so, the data becomes inconsistent. In addition, the operation on the record table and the operation on the sub-table cannot be handled in the same transaction, so in an abnormal case the sub-table operation may succeed while the record table operation fails, which also leads to inconsistent data.

The record table approach also increases write pressure on the database, because every insert and delete on a sub-table requires an update to the record table. For businesses that do not need the record count to be accurate in real time, the record table can instead be updated periodically by a background job. Periodic updating is really a combination of "count() addition" and "record table": the job periodically adds up the per-table counts obtained with count() and writes the result to the record table.
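Both counting approaches, plus the periodic refresh that combines them, can be sketched as follows. This is a simplified illustration: all tables live in one in-memory SQLite database, so the case where sub-tables and the record table cannot share a transaction is not modelled. The sub-table names, the modulo placement in `insert_user`, and the helper names are hypothetical; the table_name and row_count fields come from the text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
SUB_TABLES = ["user_0", "user_1", "user_2"]  # hypothetical sub-table names
for table in SUB_TABLES:
    db.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
db.execute(
    "CREATE TABLE record_table (table_name TEXT PRIMARY KEY, row_count INTEGER)"
)
db.executemany(
    "INSERT INTO record_table VALUES (?, 0)", [(table,) for table in SUB_TABLES]
)
db.commit()


def insert_user(user_id):
    """Insert into a sub-table and keep the record table in step."""
    table = SUB_TABLES[user_id % len(SUB_TABLES)]
    db.execute(f"INSERT INTO {table} (id) VALUES (?)", (user_id,))
    db.execute(
        "UPDATE record_table SET row_count = row_count + 1 WHERE table_name = ?",
        (table,),
    )
    db.commit()


def count_by_addition():
    """count() addition: one COUNT(*) per sub-table, summed in code."""
    return sum(
        db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in SUB_TABLES
    )


def count_from_record_table():
    """Record table: a single query over the maintained counters."""
    return db.execute("SELECT SUM(row_count) FROM record_table").fetchone()[0]


def refresh_record_table():
    """Periodic update: recompute each counter with COUNT(*) and store it."""
    for table in SUB_TABLES:
        count = db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        db.execute(
            "UPDATE record_table SET row_count = ? WHERE table_name = ?",
            (count, table),
        )
    db.commit()


for user_id in range(1, 8):
    insert_user(user_id)
total_by_addition = count_by_addition()
total_from_records = count_from_record_table()
```

If some code path wrote to a sub-table without touching `record_table`, the two totals would drift apart until `refresh_record_table` ran again, which is the inconsistency risk the text describes.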

Order by operation

After horizontal table splitting, the data is spread across multiple sub-tables and sorting cannot be completed inside the database. The business code or database middleware has to query each sub-table separately and then sort the combined results.

Implementation methods

As with database read-write separation, database and table splitting can be implemented either by encapsulating it in program code or by encapsulating it in middleware, but the implementation is more complex. For read-write separation it is enough to identify whether an SQL statement is a read or a write, which can be done by simply checking for the keywords SELECT, UPDATE, INSERT, and DELETE. Database and table splitting must determine not only the type of operation but also the specific tables involved, any aggregate functions (such as count), and any order by and group by clauses, and then handle each case differently. For example, an order by query has to fetch data from each of the databases and then sort it again to get the final result.
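One way to combine per-table results is to let each sub-table sort its own rows and then merge the sorted lists in code. The sketch below assumes three hypothetical sub-tables in an in-memory SQLite database and uses the standard library's `heapq.merge`; the helper `top_n_by_age` and the sample rows are made up for illustration.

```python
import heapq
import itertools
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical sub-tables and rows; ages are distinct to keep the result unambiguous.
SAMPLE_ROWS = {
    "user_0": [(3, "Cy", 41), (6, "Flo", 19)],
    "user_1": [(1, "Ann", 28), (4, "Dee", 35)],
    "user_2": [(2, "Bo", 23), (5, "Eve", 30)],
}
for table, rows in SAMPLE_ROWS.items():
    db.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    db.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
db.commit()


def top_n_by_age(n):
    """Emulate 'ORDER BY age LIMIT n' over all sub-tables."""
    per_table = []
    for table in SAMPLE_ROWS:
        # Each sub-table can still sort its own rows; n rows per table suffice.
        rows = db.execute(
            f"SELECT id, name, age FROM {table} ORDER BY age LIMIT ?", (n,)
        ).fetchall()
        per_table.append(rows)
    # Merge the already-sorted lists and keep the first n rows overall.
    merged = heapq.merge(*per_table, key=lambda row: row[2])
    return list(itertools.islice(merged, n))


youngest_three = top_n_by_age(3)
```

Fetching n rows from every sub-table is more work than a single-table query, but it keeps the amount of data pulled into the application bounded.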
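The difference between the two kinds of SQL inspection can be sketched as follows. This is a deliberately naive, keyword-based illustration rather than how any particular middleware works: `classify_sql` and `cross_table_features` are hypothetical helpers, and a real implementation would need a proper SQL parser (for instance, these checks would be fooled by keywords inside string literals).

```python
import re


def classify_sql(sql):
    """Read-write separation: the leading keyword is enough."""
    parts = sql.split(None, 1)
    if not parts:
        raise ValueError("empty SQL statement")
    keyword = parts[0].upper()
    if keyword == "SELECT":
        return "read"
    if keyword in {"INSERT", "UPDATE", "DELETE"}:
        return "write"
    raise ValueError(f"unsupported statement: {keyword}")


def cross_table_features(sql):
    """Table splitting: also detect clauses that need merging across sub-tables."""
    upper = sql.upper()
    features = []
    if re.search(r"\bCOUNT\s*\(", upper):
        features.append("count")
    if re.search(r"\bORDER\s+BY\b", upper):
        features.append("order by")
    if re.search(r"\bGROUP\s+BY\b", upper):
        features.append("group by")
    return features


kind = classify_sql("SELECT COUNT(*) FROM user")
features = cross_table_features("SELECT COUNT(*) FROM user ORDER BY age")
```

Each detected feature would then be dispatched to its own handling, such as summing per-table counts or re-sorting merged rows.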
