How to understand the vertical and horizontal segmentation of MySQL 07/04 Update SLTechnology News&Howtos


Source: Shulou (Shulou.com), SLTechnology News&Howtos > Database. Reported 05/31; updated 2025-07-04.


The limitations of replication: once a database grows too large, and especially once writes are too frequent for a single host to handle, replication alone still leaves us with a scaling bottleneck. Data segmentation (sharding) distributes data that would otherwise live in one database across multiple databases (hosts) according to specific conditions, spreading load that a single machine would otherwise carry. Sharding can also improve overall availability: if one host crashes, only the portion of the data stored on it becomes unavailable, not all of it.

Data sharding modes

There are two approaches. The first splits data across different databases (hosts) by table (or schema); this is called vertical segmentation. The second splits the rows of a single table across multiple databases (hosts) according to conditions based on the logical relationships of the data in that table; this is called horizontal segmentation.

Vertical segmentation:

A well-designed application is composed of many functional modules, and the data each module needs corresponds to one or more tables in the database. The fewer and more consolidated the interaction points between modules, the lower the coupling of the system and the better its maintainability and extensibility. In such a system, splitting the data vertically is relatively easy.

In general, for a system whose load is relatively light but whose tables are joined very frequently, the database design may make a concession: keeping several related modules together in one database reduces the work the application has to do and can save a lot of effort. This is a viable approach. An example of a vertical split:

1. User module tables: user, user_profile, user_group, user_photo_album

2. Group discussion tables: groups, group_message, group_message_content, top_message

3. Album tables: photo, photo_album, photo_album_relation, photo_comment

4. Event information table: event
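As a minimal sketch of how an application might route queries under this vertical split, the Python below maps each table listed above to the module that owns it, and each module to a database host. The host names (`db-user.example.internal` and so on) and the function `host_for_table` are hypothetical placeholders, not part of the original design.

```python
from typing import Dict, Set

# Tables owned by each functional module, taken from the example split.
MODULE_TABLES: Dict[str, Set[str]] = {
    "user": {"user", "user_profile", "user_group", "user_photo_album"},
    "group": {"groups", "group_message", "group_message_content", "top_message"},
    "album": {"photo", "photo_album", "photo_album_relation", "photo_comment"},
    "event": {"event"},
}

# Hypothetical host for each module (placeholder names).
MODULE_HOSTS: Dict[str, str] = {
    "user": "db-user.example.internal",
    "group": "db-group.example.internal",
    "album": "db-album.example.internal",
    "event": "db-event.example.internal",
}

# Reverse index built once: table name -> owning module.
TABLE_TO_MODULE: Dict[str, str] = {
    table: module
    for module, tables in MODULE_TABLES.items()
    for table in tables
}


def host_for_table(table: str) -> str:
    """Return the host that stores `table` under the vertical split."""
    try:
        module = TABLE_TO_MODULE[table]
    except KeyError:
        raise ValueError(f"table {table!r} is not assigned to any module") from None
    return MODULE_HOSTS[module]
```

A query touching `group_message` would be sent to the group host; a query that needs tables from two modules cannot be a single SQL join and must be combined in the application.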

The group discussion module is associated with the user module mainly through user-group relationships, usually via the user's id or nick_name and the group's id. Implementing these associations through interfaces between modules does not cause much trouble.

The photo album module is associated with the user module only through the user. The association between the two modules is essentially just the user id, which is simple and clear, so the interface is clean.

The event module may be associated with every other module, but only through the IDs of the objects in each module, so it can also be split out easily.

Advantages of vertical slicing

Splitting the database is simple, and the splitting rules are clear.

Application modules are clearly delineated and easy to integrate.

Data is easy to maintain and easy to locate.

Disadvantages of vertical slicing

Some table joins cannot be done at the database level and must be done in the application.

Tables with extremely frequent access and large data volumes can still be a performance bottleneck, so vertical splitting alone may not meet requirements.

Transaction processing becomes more complex.

Once splitting reaches a certain degree, scalability becomes limited.

Over-splitting may make the system too complex and difficult to maintain.

Horizontal segmentation

A frequently accessed table is distributed across multiple tables according to a rule applied to one of its fields, with each table containing part of the data.

In the example above, all of the data is related to users, so we can split it horizontally by user, putting different users' data into different databases.

Web 2.0-style websites are now very popular, and in them most data can be related through member (user) information, so many core tables are well suited to horizontal segmentation by member ID. A forum community discussion system, for example, is easy to split horizontally by forum number, and after the split there is basically no interaction between the individual databases.
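A minimal sketch of a horizontal split by user (or member) ID, assuming a simple modulo rule on a numeric ID and a hypothetical fixed count of four shards; the article specifies neither. Modulo is only one possible rule: range-based or lookup-table rules are common alternatives, and with modulo, changing the shard count moves most rows to a different shard.

```python
from typing import Dict, List

SHARD_COUNT = 4  # assumption: the article does not give a shard count


def shard_for_user(user_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Return the shard index that stores all rows belonging to user_id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id % shard_count


def split_rows_by_user(rows: List[Dict], shard_count: int = SHARD_COUNT) -> List[List[Dict]]:
    """Partition rows so that every row of one user lands on the same shard."""
    shards: List[List[Dict]] = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for_user(row["user_id"], shard_count)].append(row)
    return shards
```

Because every row for a given user lands on the same shard, joins among that user's own tables can still run inside one database.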

Advantages of horizontal slicing

Table joins can basically still be done on the database side.

Tables with very large data volumes and high load no longer become bottlenecks.

Changes to the overall architecture on the application side are relatively small.

Transaction processing is relatively simple.

As long as the segmentation rules are well defined, scalability limits are rarely encountered.

Disadvantages of horizontal slicing

The segmentation rules are more complex, and it is difficult to abstract a single rule that fits the whole database.

Data is harder to maintain later on, and locating data manually is more difficult.

The modules of the application are highly coupled, which may make later data migration and splitting harder.

Combining the two kinds of segmentation:

Generally speaking, it is rare for all the tables in a database to be associated through one (or a few) fields, so horizontal segmentation alone cannot solve every problem. Vertical segmentation solves only part of the problem: in systems with very high load, even a single table may be too much for a single database host. We must therefore use vertical and horizontal segmentation together.

The load on an application system grows step by step. When it hits a performance bottleneck, most architects and DBAs split the data vertically first, because at that stage it is the cheapest option and gives the best return on investment. However, as the business keeps expanding and the load keeps growing, after a period of stability the vertically split database cluster may again be overwhelmed and hit a performance bottleneck.

If we keep subdividing modules and splitting data vertically as we did at the beginning, we will probably face the same problems again soon. And as modules become ever finer, the application architecture grows more complex and the whole system may well get out of control.

At this point we must turn to horizontal segmentation. Adopting it does not mean discarding the earlier vertical split; instead, we build on that split, using the strengths of horizontal segmentation to offset the weaknesses of vertical segmentation and to stop the growth in system complexity. Meanwhile, the main weakness of horizontal segmentation (rules that are hard to unify) is already addressed by the earlier vertical split, because each module now has its own natural splitting field, so the horizontal split can proceed smoothly.

Sample database:

Suppose we initially split the data vertically, but as the business grew, the database system hit a bottleneck and we decided to restructure the database cluster. How? The data has already been split vertically and the module structure is clear, while the business is growing faster and faster, so further splitting the modules now would not hold up for long.

We therefore choose horizontal segmentation on top of the vertical split.

After the vertical split, each database cluster holds only one functional module, and essentially all tables in each module are associated through one field. For example, the user module can be segmented by user ID, the group discussion module by group ID, and the photo album module by album ID. The last one, the event notification table, is split with the timeliness of its data in mind (only events from the most recent period are accessed), for example by time.
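The layout above can be sketched as a two-level router: first pick the module's cluster (the vertical split), then pick a shard within that cluster by the module's own key (the horizontal split). The cluster names, shard counts, the modulo rule, and the one-table-per-month naming for events are all illustrative assumptions, not details from the article.

```python
from datetime import date
from typing import Dict, List

# Hypothetical layout: each vertically split module owns its own set of shards.
CLUSTERS: Dict[str, List[str]] = {
    "user": ["user-db-0", "user-db-1"],                   # sharded by user ID
    "group": ["group-db-0", "group-db-1", "group-db-2"],  # sharded by group ID
    "album": ["album-db-0", "album-db-1"],                # sharded by album ID
}


def route(module: str, shard_key: int) -> str:
    """Vertical step: choose the module's cluster. Horizontal step: choose a shard by key."""
    hosts = CLUSTERS[module]
    return hosts[shard_key % len(hosts)]


def event_table_for(day: date) -> str:
    """Events are split by time rather than by ID; one table per month is an assumption."""
    return f"event_{day.year:04d}{day.month:02d}"
```

Since each module keeps its own key, the horizontal rule never has to cover the whole database at once, which is the point made above about the vertical split making horizontal rules easier to define.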

Data segmentation and integration schemes

Once the data has been split vertically and/or horizontally across different database hosts, the biggest problem for the application is how to integrate these data sources well. There are two approaches:

Configure and manage one (or more) data sources in each application module, access each database directly, and complete the data integration within the module.

Manage all data sources uniformly through an intermediate proxy layer, so that the back-end database cluster is transparent to front-end applications.

Although the second approach may cost more in the short term, it greatly helps the scalability of the whole system. For the second approach, the available options are as follows:

1. Using MySQL Proxy to implement data segmentation and integration

MySQL Proxy sits between MySQL clients and MySQL servers and can monitor, analyze, or transform the communication between them. Its flexibility allows it to be used in many ways. The main capabilities currently available are connection routing, query analysis, query filtering and modification, load balancing, and basic HA mechanisms. MySQL Proxy does not provide all of these out of the box; it provides the foundation for implementing them, and we need to write our own Lua scripts to realize them.

Principle: MySQL Proxy effectively sits between client requests and MySQL Server, maintaining a connection pool. All client requests are sent to MySQL Proxy, which analyzes each one to determine whether it is a read or a write and forwards it to the corresponding MySQL Server. For clusters with multiple slave nodes, it can also balance the load across them.
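The read/write decision described above can be illustrated with a small Python router. This is not MySQL Proxy's Lua scripting API; it only sketches the idea under simple assumptions: any statement whose first word is `SELECT` is treated as a read and sent to the next slave in round-robin order, and everything else goes to the master. The class `ReadWriteRouter` is hypothetical. A real proxy would also need to keep reads inside an open transaction, and `SELECT ... FOR UPDATE`, on the master.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Hypothetical read/write splitter: SELECTs to slaves, everything else to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        # itertools.cycle gives simple round-robin load balancing over the slaves.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._slaves)
        # Writes, DDL and anything unrecognized go to the master to stay safe.
        return self.master
```

For example, with slaves `["s1", "s2"]`, three consecutive `SELECT` statements go to `s1`, `s2`, `s1`, while an `UPDATE` always goes to the master.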

2. Using Amoeba to implement data segmentation and integration

Amoeba is an open-source, Java-based framework: a proxy program focused on integrating the data sources of distributed databases. Amoeba already provides query routing, query filtering, read/write splitting, load balancing, and HA mechanisms. Amoeba mainly addresses the following problems:

Integration of complex data sources after data segmentation

Provide data segmentation rules and reduce the impact of data segmentation rules on the database

Reduce the number of connections between database and client

Routing for read/write splitting

Amoeba for MySQL is a solution specifically for MySQL databases: both the protocol used by front-end applications and the back-end data source databases must be MySQL. To any client application, Amoeba for MySQL looks no different from a MySQL database, and any client request using the MySQL protocol can be parsed and handled by Amoeba for MySQL.

Common proxy functions such as read/write splitting and load balancing are configured in amoeba.xml. Amoeba already supports automatic routing for both vertical and horizontal segmentation, with routing rules defined in rule.xml.

3. Using HiveDB to implement data segmentation and integration

HiveDB is also an open-source, Java-based framework for sharding and integrating MySQL data, but HiveDB currently supports only horizontal segmentation. It mainly addresses database scalability and high-performance data access at large data volumes, and it also supports data redundancy and basic HA mechanisms.

HiveDB's implementation differs from that of MySQL Proxy and Amoeba. Rather than relying on MySQL replication for data redundancy, it implements its own redundancy mechanism, and its underlying segmentation is mainly based on Hibernate Shards.

Possible problems in data segmentation and integration

The distributed transaction problem

Once data is split across multiple MySQL Servers, no matter how well we design the sharding rules (in fact, there are no perfect sharding rules), the data involved in some transactions that used to be local may no longer reside on the same MySQL Server.

One solution is to split a distributed transaction that spans multiple databases into several small transactions, each confined to a single database, and have the application control each small transaction.
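A minimal sketch of this idea, using two in-memory SQLite databases as stand-ins for two MySQL shards: a hypothetical transfer of balance between users on different shards runs as two separate local transactions, and if the second one fails, the application issues a compensating transaction on the first shard. The `account` table and the `transfer` and `make_shard` functions are illustrative assumptions. Note that this is not atomic: other readers can briefly see the debit before it is compensated, and a crash between the steps would need a persistent log to recover.

```python
import sqlite3


def make_shard() -> sqlite3.Connection:
    """Hypothetical shard holding an account table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    return conn


def transfer(src: sqlite3.Connection, dst: sqlite3.Connection,
             from_user: int, to_user: int, amount: int) -> None:
    # Small transaction 1: debit on the source shard only.
    # The connection context manager commits on success and rolls back on error.
    with src:
        cur = src.execute(
            "UPDATE account SET balance = balance - ? WHERE user_id = ? AND balance >= ?",
            (amount, from_user, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient balance or unknown source user")
    # Small transaction 2: credit on the destination shard only.
    try:
        with dst:
            cur = dst.execute(
                "UPDATE account SET balance = balance + ? WHERE user_id = ?",
                (amount, to_user),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination user")
    except Exception:
        # Compensating transaction: undo step 1 because step 2 failed.
        with src:
            src.execute(
                "UPDATE account SET balance = balance + ? WHERE user_id = ?",
                (amount, from_user),
            )
        raise
```

The application, not the database, is responsible for ordering the steps and for deciding what undoes what.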

The cross-node join problem

One option is to fetch data from one node first, then use that data to fetch the related rows from the table on another node, and combine the results in the application.

Another option is the FEDERATED storage engine. Its drawback is that if the remote table's structure changes, the local table definition does not change accordingly.
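The first option above, the two-step fetch, can be sketched as follows, again with in-memory SQLite databases standing in for two MySQL nodes. The column layout of `group_message` and `user` and the function name `messages_with_authors` are assumptions for illustration; the join itself happens in application memory.

```python
import sqlite3
from typing import List, Optional, Tuple


def messages_with_authors(group_db: sqlite3.Connection, user_db: sqlite3.Connection,
                          group_id: int) -> List[Tuple[int, Optional[str], str]]:
    """Cross-node 'join' done in the application: fetch from one node, then the other."""
    # Step 1: read the group's messages from the group-discussion node.
    messages = group_db.execute(
        "SELECT id, user_id, content FROM group_message WHERE group_id = ? ORDER BY id",
        (group_id,),
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in messages})
    if not user_ids:
        return []
    # Step 2: use the collected IDs to fetch authors from the user node in one query.
    placeholders = ",".join("?" * len(user_ids))
    names = dict(user_db.execute(
        f"SELECT id, nick_name FROM user WHERE id IN ({placeholders})", user_ids,
    ).fetchall())
    # Step 3: combine the two result sets in memory.
    return [(msg_id, names.get(user_id), content) for msg_id, user_id, content in messages]
```

Batching the IDs into a single `IN (...)` query keeps the second step to one round trip instead of one query per message.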

The cross-node sorting and paging problem

In a join, reads from the tables involved generally depend on one another in sequence. Sorting and paging is different: its data source is essentially a single table (or result set) with no ordering dependency between the pieces, so fetching from multiple data sources can be done fully in parallel. As a result, fetching sorted, paged data across nodes is more efficient than a cross-database join, and the performance loss is relatively small.
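A minimal sketch of parallel cross-node sorting and paging: each shard is asked for its first `offset + limit` rows in sorted order (the equivalent of `ORDER BY ... LIMIT offset + limit` on each node), the sorted lists are merged with `heapq.merge`, and the requested page is sliced out. The fetcher callables and the name `paged_merge` are hypothetical stand-ins for per-shard queries. Because each shard must return `offset + limit` rows, deep pages become progressively more expensive.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence

# A fetcher returns the first n rows of one shard, already sorted by the sort key.
Fetcher = Callable[[int], Sequence[Any]]


def paged_merge(fetchers: List[Fetcher], offset: int, limit: int,
                key: Optional[Callable[[Any], Any]] = None) -> List[Any]:
    """Return rows [offset, offset + limit) of the globally sorted result."""
    if not fetchers or limit <= 0:
        return []
    need = offset + limit  # every shard must supply this many rows
    # Shards do not depend on each other, so query them in parallel.
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        results = list(pool.map(lambda fetch: fetch(need), fetchers))
    # Each result is sorted, so a k-way merge yields the global order.
    merged = heapq.merge(*results, key=key)
    return list(itertools.islice(merged, offset, offset + limit))
```

The parallel fetch is exactly what the paragraph above contrasts with a join: no shard's query has to wait for another shard's result.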
