What is the amount of data suitable for a single table in Mysql? 07/13 Update SLTechnology News&Howtos

What is the amount of data suitable for a single table in Mysql?

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

What is the amount of data that Mysql is suitable for a single table? in view of this question, this article introduces the corresponding analysis and solution in detail, hoping to help more partners who want to solve this problem to find a more simple and feasible method.

An important factor that affects the optimal maximum number of Mysql single tables is actually the index.

We know that Mysql's main storage engine, InnoDB, uses a B+ tree structure for indexing. (why Mysql chooses b + trees over other data structures to organize indexes is not the topic of this article, which will be discussed in a later article.) So how does the B+ tree index affect the amount of data in a single table in Mysql?

B + tree

A B+ tree is as follows:

The Btree index of Mysql is stored on disk, and the size of disk Page read by Mysql is 16KB each time. In order to ensure the efficiency of each query, it is necessary to ensure the number of times each query accesses the disk, which is generally designed to be 2-3 times of disk access, and more performance will be seriously inadequate. Each node of the Mysql B+ tree index needs to store a pointer (8Byte) and a key value (8Byte). Therefore, calculating 16KB/ (8B+8B) = 1K 16KB can store 1K nodes, and three disk accesses (that is, the depth of B+ tree 3) can store 1K _ 1K _ 1K or 1 billion data.

If the query relies on a non-primary key index, a secondary index is also involved. The amount of data will be smaller in this way.

Split

Divide and conquer-there is no problem that can't be solved by breaking it down once, or many times if you can't.

The amount of data stored in a single Mysql table is limited. One way to solve the problem of large amount of data storage is to divide the database and table. To put it bluntly, one database cannot hold so much data in one table, so it is stored in multiple databases and multiple tables.

Split can be divided into vertical split and horizontal split.

Vertical split is based on different tables (or Schema) to split into different databases (hosts), while horizontal split is to split the data in the same table into multiple databases (hosts) or different tables with the same Schema according to certain conditions according to the logical relationship of the data in the table.

The most important feature of vertical split is that the rules are simple and the implementation is more convenient, which is especially suitable for systems with very low coupling between businesses, little interaction and clear business logic. In this system, it is easy to split the tables used by different business modules into different databases. Splitting according to different tables will have less impact on the application, and the splitting rules will be relatively simple and clear.

Horizontal splitting is slightly more complex than vertical splitting. Because you want to split different data in the same table into different databases, the split rule itself is more complex for the application than splitting based on the table name, and later data maintenance will be more complex.

The most direct way to split vertically is to split the service by domain and isolate the domain database. In this way, the data pressure on each library is reduced.

Horizontal split is to split the data of the same Schema into different databases or different tables, so that the amount of data in each table will be reduced, and the query efficiency will be more efficient. Horizontal splitting involves the slicing rules of the table.

Several typical slicing rules include:

According to the user ID model, the data is distributed to different databases, and the data of users with the same data are scattered into one database.

Spread the data from different months or even days into different databases according to the date.

Touch according to a specific field, or spread to different libraries according to a specific range of segments.

Realize

Facade mode-there is no problem that cannot be solved by adding a middle layer.

One solution for vertical splitting is to use multiple data sources at the application layer to access different data sources by business. Another better solution is actually micro-service. Split micro-services according to different business areas, define domain boundaries, and isolate domain databases. This integrates the access to data into independent services and provides a unified interface to the outside world. When we need to rely on multiple services at the same time, we can combine the data of the underlying services by adding facade applications to provide interfaces that are more in line with the needs of the upper-level business, and these services are often closer to the real business. The underlying services are more cohesive resource services.

Proxy mode-there is no problem that cannot be solved by adding an intermediate layer.

The data access troubles caused by horizontal split should be shielded as far as possible, so that the upper-level business does not need to care about the lower-level data organization. Horizontal splitting often does these things by adding a proxy layer, which provides virtual tables on top of the proxy layer, which are just like the single tables we designed on a single database; the proxy layer parses and splits sql, then executes the corresponding sql requests in different libraries and tables according to the corresponding rules, merges the data, and returns the merged results to the upper callers.

There are two general ways of agency:

In-process agent-in-process agent embeds the agent layer into the business service, intercepts sql requests and handles them accordingly. The advantage is simple, but intrusive and inflexible.

Out-of-process agent the out-of-process agent is about to act as an independent service to proxy requests between real business services and databases. This is more complex and requires a highly available proxy service architecture. But this is less intrusive to the business, and is easy to upgrade and expand.

problem

Distributed transaction problem

What is a distributed transaction? The definition of local transaction is to satisfy the four characteristics of ACID after a series of related database operations are completed, while distributed transaction is to put the operations of the same process into different micro-service processes, that is, the database operations of different micro-service application processes meet the transaction requirements, or a series of operations on different databases need to meet the transaction requirements.

There are two problems that need to be solved. One is caused by the distribution of the application, the other is caused by the distribution of the database itself. The distributed transaction problem of the database itself is generally solved by the database itself, and most distributed databases can guarantee certain data consistency, such as the strong consistency guaranteed by HBase and the final consistency guaranteed by Cassandra.

We can also refer to the implementation principle of distributed database to apply the consistent transaction scheme of data. There are also many solutions to distributed transactions in the industry, such as:

XA scheme

TCC scheme

Local message table

Reliable message final consistency scheme

Best effort notification programme

Multi-table Join problem

By analyzing the Join sql, the sql is split into separate query requests, then executed separately, and the results are merged and calculated back to the caller. There are a lot of optimization issues involved in this place.

Data statistics problem

When the data is sliced into different databases or different tables, some problems will be encountered when doing some global statistics on the data or involving a large amount of data. Such as Max,Min,Sum and other aggregation problems. If the statistical data has certain business rules, for example, it can only be counted by user dimension, such as the order quantity of a user, then the slicing of the order table can actually be sliced by user id, so that such statistical problems can be solved. But this scheme is not universal. Many sharding agent services need to split the sql to different nodes to execute, and then merge the results to return.

ID problem

After using sub-libraries and sub-tables, you cannot use Mysql's table self-increment as id, because the self-increment of different libraries and tables will result in conflicting id. To solve this problem, it is necessary to introduce distributed id generation technology.

The answer to the question about how much data Mysql is suitable for a single table is shared here. I hope the above content can be of some help to you. If you still have a lot of doubts to be solved, you can follow the industry information channel for more related knowledge.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.