What is the basic idea and segmentation strategy of DB library and table? 07/19 Update SLTechnology News&Howtos

What is the basic idea and segmentation strategy of DB library and table?

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

What is the basic idea and strategy of DB sub-database and sub-table? I believe that many inexperienced people do not know what to do about it. Therefore, this paper summarizes the causes and solutions of the problem. Through this article, I hope you can solve this problem.

I. basic ideas

The basic idea of Sharding is to split a database into multiple parts and put them on different databases (server), so as to alleviate the performance problems of a single database. Not strictly speaking, for a database with a large amount of data, if it is because of more tables and more data, it is appropriate to use vertical sharding, that is, to split out tables that are closely related (such as the same module) and put them on a server. If there are not many tables, but there is a lot of data in each table, it is appropriate to split the table's data horizontally, that is, to split the table's data into multiple databases (server) according to certain rules, such as ID hashing. Of course, in reality, these two situations are more likely to be mixed together, and choices need to be made according to the actual situation, or a combination of vertical and horizontal splitting may be used to split the original database into a database (server) array that can be expanded like a matrix. Vertical segmentation and horizontal segmentation are introduced in detail below.

The characteristic of vertical segmentation is that the rules are simple and the implementation is more convenient, especially suitable for systems with very low coupling between businesses, little interaction and clear business logic. In this system, it is easy to split the tables used by different business modules into different databases. Splitting according to different tables will have less impact on the application, and the splitting rules will be relatively simple and clear. This is also known as "share nothing".

Horizontal segmentation is slightly more complex than vertical segmentation. Because you want to split different data in the same table into different databases, the split rule itself is more complex for the application than splitting based on the table name, and later data maintenance will be more complex.

Let's consider data segmentation in terms of the general situation: on the one hand, it is usually impossible for all tables in a library to be concatenated by a single table, which implies that horizontal segmentation is almost always for tables that are closely related to a small rub (which is actually a vertical split block), but not for all tables. On the other hand, for some systems with very high load, even a single table cannot bear its load through a single database host, which means that vertical sharding alone cannot completely solve the problem. Therefore, most systems will combine vertical segmentation and horizontal segmentation, first do vertical segmentation of the system, and then selectively do horizontal segmentation for each small rubbing table. Thus, the whole database is divided into a distributed matrix.

2. Segmentation strategy

As mentioned earlier, slicing is done first vertically and then horizontally. The result of vertical segmentation paves the way for horizontal segmentation. The idea of vertical segmentation is to analyze the aggregation relationship between tables and put closely related tables together. In most cases, it may be the same module, or the same "aggregation". The "aggregation" here is what domain-driven design calls aggregation. Within the vertically split table aggregation, find the "root element" (here the "root element" is the "aggregation root" in domain-driven design), split horizontally according to the "root element", that is, starting with the "root element", put all the data associated with it directly and indirectly into a shard. In this way, the possibility of cross-shard association is very small. The application does not have to break the existing association between tables. For example, for social networking sites, almost all data will eventually be associated with a user, and segmentation based on the user is the choice. For example, the forum system, users and forums two modules should be divided in the two shard vertical syncopation, for the forum module, Forum is obviously the aggregation root, so split horizontally by Forum, all posts and replies in Forum are placed in a shard with Forum is very natural.

For shared data, if it is a read-only dictionary table, it should be a good choice to maintain a copy in each shard so that you do not have to break the relationship. If it is a cross-node association between general data, it must be broken.

It is important to note that there are some subtle changes in the segmentation strategy when both vertical and horizontal segmentation is performed at the same time. For example, when only vertical segmentation is considered, tables that are divided together can maintain arbitrary associations, so you can divide tables according to "functional modules", but once horizontal segmentation is introduced, the association between tables will be greatly restricted, usually allowing only one primary table (a table hashed with that table ID) and its multiple secondary tables to retain the association. That is to say: when vertical and horizontal segmentation is carried out at the same time, the vertical segmentation in the vertical direction will no longer be divided by "functional modules", but will require more fine-grained vertical segmentation, which coincides with, or even completely consistent with, the concept of "aggregation" in domain-driven design, and the main table of each shard is the aggregation root in an aggregation! In this way, you will find that the database split is too scattered (there will be a large number of shard, but there are not many tables in the shard). In order to avoid managing too many data sources and make full use of the resources of each database server, you can consider putting two or more shard that are similar in business and have similar data growth rates (the amount of data in the main table is on the same order of magnitude) into the same data source. Each shard is still independent, they have their own master table and use their own master table ID for hashing, except that their hash mode (that is, the number of nodes) must be consistent.

1. Transaction problem

At present, there are two feasible solutions to solve transaction problems: distributed transactions and transactions through the joint control of applications and databases. Here is a simple comparison between the two schemes.

Scenario 1: using distributed transactions

Advantages: database management, simple and effective

Disadvantages: high performance costs, especially when there are more and more shard

Plan 2: controlled by the application and the database

Principle: split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, and control each small transaction through the application.

Advantages: advantages in performance

Disadvantages: the application needs to be flexible in transaction control. If the transaction management of spring is used, it will be difficult to change.

two。 The problem of cross-node Join

As long as the segmentation is carried out, the problem of cross-node Join is inevitable. But good design and segmentation can reduce the occurrence of this kind of situation. The common way to solve this problem is to implement two queries. Find the id of the associated data in the result set of *, and then initiate a second request to get the associated data based on these id.

3. Cross-node count,order by,group by and aggregate function problem

These are a class of problems because they all need to be calculated based on the entire data set. Most agents do not handle merging automatically. Solution: similar to solving the cross-node join problem, the results are obtained on each node and merged on the application side. Unlike join, queries on each node can be executed in parallel, so in many cases it is much faster than a single large table. However, if the result set is large, the application memory consumption is a problem.

On the granularity of vertical slicing Vertical Sharding

The granularity of vertical segmentation means that several levels of related tables are allowed to be placed in a shard when doing vertical segmentation. This problem has a great impact on applications and sharding implementations.

The more association interruptions, the more join operations affected, and the greater the compromise made by the application, but the simpler the routing of a single table and the less relevant to the business, the easier it is to use the unified mechanism. The extreme solution in this direction is to break all connections, each table is equipped with routing rules, and can be handled automatically using a unified mechanism or framework. For example, in a framework such as amoeba, routes can and only be routed through the characteristics of SQL, such as the id of a table.

On the other hand, if the association interruptions are less, the restrictions on join operation will be smaller, and the less compromise the application needs to make, but the routing of the table will become more complex, the greater the relevance with the business, the more difficult it is to use a unified mechanism to deal with, and it needs to be routed separately for each data request. The extreme solution in this direction is that all tables are in one shard, that is, there is no vertical segmentation, so that no association is broken. Of course, this is very extreme, unless the whole database is simple and the number of tables is small.

The actual granularity control needs to be combined with the two factors of "business compactness" and "tabular data volume". Generally speaking:

If the tables grouped together are closely related, the amount of data is not large, and the growth rate is very slow, it is appropriate to put them in a shard, without the need for horizontal segmentation.

If the table data grouped together is huge and growing rapidly, it is bound to be split horizontally on the basis of vertical segmentation, which means that the original single shard will be subdivided into a number of smaller shard, and each shard has a main table (that is, a table that will be hashed by the table ID) and a number of related associated tables.

In a word, the granularity of vertical segmentation presents the situation of coexistence of advantages and disadvantages and game with each other in two opposite directions. What the architect needs to do is to strike a profitable balance between the two combined with the actual situation of the project.

After reading the above, have you mastered the basic idea of DB sub-library and sub-table and how to split the strategy? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.