MySQL: A summary of database and table splitting schemes commonly used by Internet companies

2026-02-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

I. Database bottlenecks

Whether the bottleneck is IO or CPU, it eventually increases the number of active database connections until that number approaches or reaches the maximum the database can support. From the point of view of a business Service, few or no database connections are then available, and the consequences are easy to imagine: concurrency and throughput drop, and the service may crash.

1. IO bottleneck

First: a disk read IO bottleneck. There is too much hot data for the database cache to hold, so each query generates a large amount of IO and slows down -> split databases and split tables vertically. Second: a network IO bottleneck. Too much data is requested and the network bandwidth is insufficient -> split databases.

2. CPU bottleneck

First: SQL problems. Operations such as join, group by, and order by, or query conditions on non-indexed fields, add to the CPU's workload -> optimize the SQL, create appropriate indexes, and move business calculations into the business Service layer. Second: a single table holds too much data, so queries scan too many rows, SQL efficiency is low, and the CPU becomes the first bottleneck -> split tables horizontally.

II. Database and table splitting

1. Horizontal database splitting

Concept: using a field as the basis, split the data in one database into multiple databases according to a certain strategy (hash, range, etc.). Results:

The structure of each database is the same; the data of each database is different and there is no intersection; the union of all databases is the full data.

Scenario: the absolute concurrency of the system has gone up, table splitting alone cannot fundamentally solve the problem, and there is no obvious business ownership along which to split the database vertically.

Analysis: with more databases, the pressure on IO and CPU is relieved roughly in proportion to the number of databases.
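The hash strategy mentioned in the concept can be sketched as follows. This is only an illustration under assumptions: the database count, the `db_N` naming, and the function `route_db` are hypothetical and do not come from any particular tool.

```python
import zlib

DB_COUNT = 4  # assumed number of databases after the split


def route_db(key) -> str:
    """Return the (hypothetical) database name that holds the row for key."""
    if isinstance(key, int):
        slot = key % DB_COUNT
    else:
        # Use a stable hash: Python's built-in hash() of a str changes per process.
        slot = zlib.crc32(str(key).encode("utf-8")) % DB_COUNT
    return f"db_{slot}"


# Every key maps to exactly one database, so the databases do not overlap
# and their union is the full data.
print(route_db(10001))
print(route_db("alice"))
```

Because the routing is a pure function of the key, reads and writes for the same key always reach the same database.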

2. Horizontal table splitting

Concept: using a field as the basis, split the data in one table into multiple tables according to a certain strategy (hash, range, etc.). Results:

The structure of each table is the same; the data of each table is different and there is no intersection; the union of all tables is the full data.

Scenario: the absolute concurrency of the system has not gone up, but a single table holds too much data, which lowers SQL efficiency and increases the CPU burden until it becomes a bottleneck.

Analysis: with less data in each table, a single SQL statement executes more efficiently, which naturally reduces the CPU burden.
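For the range strategy, a minimal sketch might look like this. The range boundaries, the `t_order` base name, and `route_table` are assumptions made for illustration.

```python
import bisect

# Assumed upper bounds (exclusive) of the id range held by each table.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def route_table(order_id: int, base: str = "t_order") -> str:
    """Return the (hypothetical) table name whose id range contains order_id."""
    if order_id < 0:
        raise ValueError("order_id must be non-negative")
    index = bisect.bisect_right(RANGE_UPPER_BOUNDS, order_id)
    if index >= len(RANGE_UPPER_BOUNDS):
        raise ValueError("order_id is beyond the last configured range")
    return f"{base}_{index}"


print(route_table(999_999))    # falls in the first range
print(route_table(1_000_000))  # first id of the second range
```

Range routing keeps neighbouring ids in the same table, at the cost of the newest table usually receiving most of the writes.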

3. Vertical database splitting

Concept: using tables as the basis, split different tables into different databases according to the business they belong to. Results:

The structure of each database is different; the data of each database is different and there is no intersection; the union of all databases is the full data.

Scenario: the absolute concurrency of the system has gone up, and individual business modules can be abstracted out.

Analysis: at this point the system can essentially move toward a service-oriented design. For example, as the business grows there are more and more common configuration tables, dictionary tables, and so on. These tables can be moved into a separate database, or even turned into a service. Likewise, once a business model has taken shape as the business grows, its tables can be moved into a separate database, or even turned into a service.

4. Vertical table splitting

Concept: using fields as the basis, split the fields of a table into different tables (a main table and an extension table) according to how actively each field is used. Results:

The structure of each table is different, and so is the data of each table. In general the tables share at least one column, usually the primary key, which is used to associate the data; the union of all tables is the full data.

Scenario: the absolute concurrency of the system has not gone up and the table does not have many rows, but it has many fields, hot and non-hot data are stored together, and a single row takes a lot of storage space. As a result, fewer rows fit in the database cache, and queries generate a large number of random read IOs, which leads to an IO bottleneck.

Analysis: list pages and detail pages are a helpful way to think about it. The principle of vertical table splitting is to put hot data (possibly with some redundancy, and often queried together) into the main table and non-hot data into the extension table. More hot data can then be cached, which reduces random read IO. After the split, getting all the data requires associating the two tables. But remember, never use join: join not only burdens the CPU but also couples the two tables, which must then live on the same database instance. Do the association in the business Service layer instead: fetch the main table data and the extension table data separately, then combine them by the associated field.
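The Service-layer association described above could be sketched like this. It uses an in-memory SQLite database purely as a stand-in for MySQL; the table names `item_main` and `item_ext`, their columns, and `load_items` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database(s)
conn.executescript("""
    CREATE TABLE item_main (id INTEGER PRIMARY KEY, title TEXT, price INTEGER);
    CREATE TABLE item_ext  (id INTEGER PRIMARY KEY, description TEXT);
    INSERT INTO item_main VALUES (1, 'pen', 3), (2, 'book', 12);
    INSERT INTO item_ext  VALUES (1, 'blue ink'), (2, 'hardcover');
""")


def load_items(ids):
    """Fetch main and extension rows separately, then merge them by primary key."""
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    main_rows = conn.execute(
        f"SELECT id, title, price FROM item_main WHERE id IN ({marks})", ids
    ).fetchall()
    ext_rows = conn.execute(
        f"SELECT id, description FROM item_ext WHERE id IN ({marks})", ids
    ).fetchall()
    # The association happens here, in application code, not in a SQL join.
    ext_by_id = {row_id: description for row_id, description in ext_rows}
    return [
        {"id": i, "title": t, "price": p, "description": ext_by_id.get(i)}
        for i, t, p in main_rows
    ]


print(load_items([1, 2]))
```

Since the two queries are independent, the main table and the extension table could sit on different database instances, which a join would not allow.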

III. Database and table splitting tools

sharding-sphere: jar, formerly known as sharding-jdbc;

TDDL: jar, Taobao Distribute Data Layer;

Mycat: middleware.

Note: please investigate the pros and cons of these tools yourself; the official websites and communities are the preferred sources.

IV. Steps for database and table splitting

Evaluate the number of databases or tables according to capacity (current capacity and growth) -> choose the key (it should distribute data evenly) -> choose the splitting rule (hash, range, etc.) -> execute (generally with double writing) -> expand capacity (minimizing data movement).

V. Problems with database and table splitting

1. Non-partition key query problem

Based on horizontal database and table splitting, with the commonly used hash method as the splitting strategy. On the client side, besides the partition key, only one non-partition key is used as a query condition.

Mapping method

Gene method

Note: on write, user_id is generated by the gene method, as shown in the figure. Regarding the x-bit gene: for example, to split into 8 tables, 2^3 = 8, so x is 3, that is, a 3-bit gene. When querying by user_id, take the modulo directly and route to the corresponding database or table. When querying by user_name, first generate user_name_code with the user_name_code generating function, then take its modulo and route to the corresponding database or table. The snowflake algorithm is commonly used for id generation.
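A rough sketch of the gene method in the note, under stated assumptions: the user_name_code function is taken to be a CRC32 hash, and a simple counter stands in for a snowflake-style id generator. Neither choice comes from the original article, and the helper names are illustrative.

```python
import itertools
import zlib

TABLE_COUNT = 8                           # 2**3 tables, as in the note
GENE_BITS = TABLE_COUNT.bit_length() - 1  # 3-bit gene

_sequence = itertools.count(1)  # stand-in for a snowflake-style id generator


def user_name_code(user_name: str) -> int:
    """Assumed code function: a stable hash of the user name."""
    return zlib.crc32(user_name.encode("utf-8"))


def new_user_id(user_name: str) -> int:
    """Build a user_id whose lowest GENE_BITS bits are the gene of user_name."""
    gene = user_name_code(user_name) % TABLE_COUNT
    return (next(_sequence) << GENE_BITS) | gene


def table_by_user_id(user_id: int) -> int:
    """Route by user_id: take the modulo directly."""
    return user_id % TABLE_COUNT


def table_by_user_name(user_name: str) -> int:
    """Route by user_name: compute user_name_code, then take the modulo."""
    return user_name_code(user_name) % TABLE_COUNT


uid = new_user_id("alice")
# Both lookups land on the same table because user_id carries the gene.
assert table_by_user_id(uid) == table_by_user_name("alice")
```

The low bits of user_id decide the table, so embedding the name's gene there lets a query by user_name find the row without a separate mapping table.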

On the client side, besides the partition key, more than one non-partition key is used as a query condition.

Mapping method

Redundancy method

Note: queries by order_id or buyer_id are routed to the db_o_buyer database, and queries by seller_id are routed to the db_o_seller database. This feels like putting the cart before the horse! Is there a better way? What about changing the technology stack?
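The routing rule in the note can be sketched as below. The two dictionaries are in-memory stand-ins for db_o_buyer and db_o_seller, and `save_order` and `route` are hypothetical helpers rather than part of any tool.

```python
from collections import defaultdict

# In-memory stand-ins for the two redundant databases named in the note.
db_o_buyer = defaultdict(list)   # orders grouped by buyer_id
db_o_seller = defaultdict(list)  # the same orders grouped by seller_id


def save_order(order: dict) -> None:
    """Write every order twice, once per database (the redundant copy)."""
    db_o_buyer[order["buyer_id"]].append(order)
    db_o_seller[order["seller_id"]].append(order)


def route(field: str) -> str:
    """Pick the database by the field used in the query condition."""
    if field in ("order_id", "buyer_id"):
        return "db_o_buyer"
    if field == "seller_id":
        return "db_o_seller"
    raise ValueError(f"no redundant copy is keyed for {field!r}")


save_order({"order_id": 1, "buyer_id": 10, "seller_id": 20})
print(route("buyer_id"), route("seller_id"))
```

The price of this scheme is that every write is duplicated, which is the cost the note complains about.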

In the back office, besides the partition key, there are all kinds of combined query conditions on non-partition keys.

NoSQL method

Redundancy method

2. Non-partition key cross-database cross-table paging query problem

Based on horizontal database and table splitting, with the commonly used hash method as the splitting strategy.

Note: this is solved with the NoSQL method (ES, etc.).

3. Capacity expansion

Based on horizontal database and table splitting, with the commonly used hash method as the splitting strategy.

Horizontal database expansion (replica promotion method)

Note: capacity is expanded in multiples (for example, doubled).
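One way to see why expanding in multiples is convenient under modulo hashing is the small check below (a sketch with assumed counts; `slot` and `moves` are illustrative names). When the number of databases doubles, every key either keeps its index or moves to its old index plus the old count. A promoted replica that already holds a full copy of its primary's data would therefore already contain every row it becomes responsible for; that reading of the method is an assumption, since the original figure is not available here.

```python
OLD_COUNT = 2              # assumed number of databases before expansion
NEW_COUNT = OLD_COUNT * 2  # doubled after expansion


def slot(key: int, count: int) -> int:
    """Modulo routing: the database index for a non-negative integer key."""
    return key % count


def moves(keys):
    """Map each key to its (old index, new index) pair."""
    return {key: (slot(key, OLD_COUNT), slot(key, NEW_COUNT)) for key in keys}


for key, (old, new) in moves(range(8)).items():
    # After doubling, a key either stays at its old index or lands at old + OLD_COUNT.
    assert new in (old, old + OLD_COUNT)
    print(f"key {key}: db_{old} -> db_{new}")
```

After the switch, each database only needs to delete the half of the rows it no longer owns, which can be done in the background.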

Horizontal table expansion (double-write migration method)

Step 1: (synchronous double writing) modify the application configuration and code to add double writing, and deploy;

Step 2: (synchronous double writing) copy the old data from the old database to the new database;

Step 3: (synchronous double writing) verify the old data in the new database against the old database;

Step 4: (synchronous double writing) modify the application configuration and code to remove double writing, and deploy.
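The four steps can be walked through with a toy model in which dictionaries stand in for the old and new databases. The `Migrator` class and its `mode` switch are hypothetical, and the sketch assumes that once double writing is removed, writes go to the new database only.

```python
class Migrator:
    """Toy model of double-write migration; dicts stand in for the two databases."""

    def __init__(self, old_rows):
        self.old = dict(old_rows)
        self.new = {}
        self.mode = "old"  # "old" -> "double" -> "new"

    def write(self, key, value):
        if self.mode in ("old", "double"):
            self.old[key] = value
        if self.mode in ("double", "new"):
            self.new[key] = value

    def copy_old_data(self):
        # Do not overwrite rows that double writing has already put in place.
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def verify(self):
        """Return keys where new differs from old, repairing them from old."""
        bad = [k for k, v in self.old.items() if self.new.get(k) != v]
        for k in bad:
            self.new[k] = self.old[k]
        return bad


m = Migrator({1: "a", 2: "b"})
m.mode = "double"         # step 1: add double writing
m.write(2, "b2")
m.write(3, "c")
m.copy_old_data()         # step 2: copy old data
assert m.verify() == []   # step 3: verify against the old database
m.mode = "new"            # step 4: remove double writing
m.write(4, "d")
print(m.new)
```

In a real system the copy and verification would run in batches while traffic continues, which is why double writing stays on through steps 1 to 3.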

Note: double writing is a general scheme.

VI. Summary of database and table splitting

Before splitting databases or tables, first find out where the bottleneck is; only then can you split reasonably (databases or tables? horizontally or vertically? into how many?). Do not split just for the sake of splitting. The choice of key is important: it has to account for both even distribution and queries on non-partition keys. As long as the requirements are met, keep the splitting rules as simple as possible.

