Evolution of database architecture and sub-database and table 07/01 Update SLTechnology News&Howtos

Evolution of database architecture and sub-database and table

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

When the volume of business in the production environment surges, the amount of database data will also increase greatly. When the amount of data in the database reaches a certain extent (database bottleneck), the load of the database host is too high, which will seriously affect the business and lead to database downtime.

In order to avoid this extreme situation, we should make a plan before it occurs to solve the problem of database data overload.

Here are the solutions I use in my personal work:

1) Master-slave or multi-master and multi-slave scheme of database

2) split hot and cold data of database

3) Database sub-database and sub-table operation

4) add cache redis or memcached to the front end of the database

At the beginning, part of the company's business is structured as follows (all single-node cases)

After emphasizing the serious shortcomings of the architecture, such as a single node, serious service interruption and other serious problems, it evolved into the following schematic architecture:

In the above architecture, we update the code to achieve uninterrupted service, while solving part of the single node problem. Later, for a period of time, the business did promotional activities, the volume of business soared, and the amount of database data increased exponentially. The database load has remained high. The current architecture can no longer meet the business requirements, and then proposes to split hot and cold data and separate data reading and writing from the database (you can consider using middleware Mycat, Atlas, MaxScale, or you can let developers write routing rules, and some frameworks have their own routing rules). Here we use Mycat, such as the architecture shown below, where only the backend is shown:

With the passage of time, single database and single table has reached the limit of business tolerance, at this time, it has been unable to meet the business either in hardware or in mysql parameter tuning. At this time, the sub-database and sub-table of the database is put forward. This paper mainly explains the sub-database and sub-table of the database.

There are two solutions for sub-database and sub-table:

1) Vertical split

2) horizontal split

1. Vertical split

The so-called vertical split is to split a single database into multiple databases, which can be considered:

1) split according to business logic

2) split according to hot and cold data

Here's how to split according to business logic:

Here, take goods-orders-users as an example, when many kinds of goods are stored in a database and a table, with the passage of time, the amount of data increases, the query ability of a single database and single table decreases, and it can be split according to the types of goods. for example, the company's products: high school, junior high school, primary school, by a single table is divided into three independent tables. If it is still big, continue to split, high school split into high school one, high two, high three. Junior high school is divided into Grade one, Grade two and Grade three. The primary school is divided into Primary one and Primary two. Xiao Liu. All sub-databases and tables are derived from the logic of life.

In our production environment, we split the data into cold data (using MyISAM engine) and hot data (using Innodb engine). At the same time, the host of MyISAM engine database is mostly cached by redis or memcache. The host of Innodb engine database is allocated computing and memory physical machines or hybrid ones according to business conditions. At the same time, we can consider using redis for partial storage to ease the pressure on the database.

Vertical split pros and cons:

Advantages:

1) split single database and single table into multiple databases and multiple tables, which reduces the pressure of database addition, deletion, modification and query.

2) split the hot and cold data, reduce the cost and make rational use of the hardware.

3) for the vertically split database, the extensibility of the database architecture should be considered when designing the database (otherwise it will cause great resistance to the later expansion)

4) after dividing the database according to the business, the business logic is clearer and more convenient for operation and maintenance management.

Disadvantages:

The main results are as follows: 1) it brings inconvenience to the join table query, and the operation of the join table query can be triggered by calling the interface, which increases the complexity for the whole system.

2) for some databases, the performance bottleneck of a single database affects the whole business situation.

3) at the same time, the complexity of the transaction is increased.

2. Horizontal split

The so-called horizontal split is simply to split the data in a table into multiple copies according to rows and store them in tables in different databases. For example, the user table has 9000 pieces of data, and the user table is divided into user1, user2 and user3 tables and stored in different databases. User1 stores the first 3000 pieces of data, the middle 3000 pieces of data are stored in the user2 table, and the last 3000 pieces of data are stored in the user3 table.

The fragmentation dimension of horizontal split (there are many algorithms to decide which horizontal split scheme to use)

1) according to the hash slice, the hash value of a field is calculated, and then divided by the number of shards. Finally, the data with the same module is taken as a shard, which is called hash slicing.

This slicing method is not timely. The data is distributed evenly, but the disadvantage is that if you need to query, you need to aggregate the data.

2) slicing according to the time series algorithm, some businesses will have obvious quarterly fluctuations, you can use the time algorithm. This algorithm is that the data is unevenly distributed.

In the production environment, part of our business uses a hybrid of hash algorithm and time series algorithm.

When it comes to horizontal split, we have to mention the routing rules of horizontal split.

When designing the database, we should not only consider the routing rules of various tables in the database, but also need to consider the routing rules of the data tables in the future.

For example, if a new user registers, how is the user assigned to which library and which table? Generally speaking, when registering, the system will automatically assign a uid. According to this uid, it can be allocated according to some algorithm. For example, uid%4, (here divides a table into four tables), if it is 1, it is assigned to the first table, and so on.

Advantages and disadvantages of horizontal split:

1) the maximum amount of data in a single database and single table is that large, and there is a limit on the amount of data. We can allocate it according to the business situation, just reaching the maximum (this amount is difficult to grasp), which can not only improve the operational performance of the table, but also save resources.

2) because there are very few structural changes to the table, the amount of code changes for developers is very small, and only routing rules need to be added.

3) it can greatly improve the stability and load of the whole system.

Disadvantages:

1) jion operation is very difficult, especially the query of joined tables across libraries.

2) some splitting rules are difficult to abstract.

3) the transaction consistency of fragments is difficult to solve.

4) it is also difficult to expand and maintain the database.

In view of the above vertical split and horizontal split, there are the following shortcomings:

1) it is difficult to operate jion

2) transaction consistency is difficult

3) the management of multiple data sources becomes complicated.

How to solve the problem of transaction consistency? (this importance comes first.)

1) adopt the consistency of local transactions (if available)

2) distributed transaction processing

Solutions for distributed transaction processing:

1) two-phase commit scheme (the strictest scheme is rarely used because the blocking protocol causes performance problems): it is divided into two phases: preparation (locking resources) and commit (consumption of resources). This approach depends on the resource manager.

2) best effort to ensure that the pattern (most commonly used, extreme cases require real-time compensation, and rollback of submitted data) does not depend on the resource manager

There are two operations: consuming messages from message queues and updating database operations

Start message transaction  start database transaction  receive message  update database  commit database transaction  commit message transaction

When the database is updated, it will be suddenly interrupted, and will be rolled back to the initial state. In special cases, if the commit database transaction succeeds, but the commit message transaction fails, it will cause the message to be consumed again. This can be done through idempotent processing of messages (sometimes many messages can not meet idempotency such as update operation, you can consider adding a message application state table to record message consumption and database transactions). The above extreme cases need to be compensated in real time. This compensation mechanism is similar to the TCC mode Try-Confirm-Cancel (if the try is successful, it repeats the confirm or cancel until all are successful), and TCC requires that the Confirm/Cancel must be idempotent.

3) transaction compensation mechanism

The first two schemes are not the best, and the transaction compensator can not only ensure the performance but also ensure the consistency of the transaction as much as possible. To put it bluntly, it breaks through the idea that transactions will be rolled back as soon as they fail, specifying that they will continue to commit for a certain period of time until they are successful, and roll back if they time out.

Problems in the process of dividing libraries and tables

1. Capacity expansion and data migration

The data has been sliced, and the amount of data has almost reached the threshold, so it is necessary to expand the capacity of the cluster.

Here are 5 steps to expand the capacity:

The main contents are as follows: 1) add new routing rules, write new routing rules for new database, and write old routing rules for old database. (double write, two sets of routing rules)

2) write the old data before double writing to the new database according to the new rules (a lot of data cleaning needs to be done).

3) change the query according to the old sharding rule to the new sharding rule.

4) offline the double-written routing rule code, and only write data according to the new rule

5) Delete the historical data written according to the old sharding rules.

2. the problem of join table query caused by sub-database and sub-table.

For example:

Sellers and buyers, sellers need to check the sales of the goods, and buyers need to check their own transactions. You can consider using two tables, one in the buyer dimension to record the buyer's commodity transactions, and the other in the seller dimension to record the seller's transactions in the goods. That is, the query transaction and transaction data are stored separately and provide interfaces from different systems.

Method 2: solve it through search engine (big data)

Reference documentation:

Https://blog.csdn.net/it_man/article/details/21593187

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.