Why do we need to split databases and tables?

2025-02-24 Update From: SLTechnology News&Howtos


This article explains why a growing system eventually needs to split its database into several databases and its tables into several tables, and what complexity that split brings with it.

Why split databases and tables?

Why split a database into multiple databases and tables at all? The answer is simple: the database has hit a performance bottleneck. In plain English, it can no longer keep up with the load.

From the outside, a database performance bottleneck shows up in several ways:

Massive request blocking

In high-concurrency scenarios, a large number of requests hit the database at once. The database runs out of available connections, and requests block while they wait.

Slow SQL operations

If a table holds hundreds of millions of rows and a query misses the index, the database scans the whole table and the query takes a long time.

Storage problems

As business volume grows sharply, the amount of data in a single database keeps growing, which puts heavy pressure on storage.

From the machine's point of view, the bottleneck comes down to CPU, memory, disk, or network. The simplest and crudest fix is to upgrade the machine, but the cost is often too high for the benefit gained, so the focus stays on software-level solutions.

Database optimization schemes

Database optimization schemes fall into two broad categories: software level and hardware level.

The software level includes SQL tuning, table structure optimization, read-write separation, database clustering, and splitting databases and tables.

The hardware level mainly means upgrading the machine.

SQL tuning

SQL tuning is usually the first step in solving a database problem, and a small effort often yields a large gain.

The goal is to make slow SQL as fast as possible, and the method is simple: make sure queries hit an index wherever possible.

Enable the slow query log

If you are using MySQL, set a few parameters in the MySQL configuration file. With the settings below, any statement that runs longer than one second is written to the log file.

slow_query_log=on
long_query_time=1
slow_query_log_file=/path/to/log

Tools for tuning

The explain command shows the execution plan of a SQL statement. From its output it is easy to tell whether the statement performs a full table scan and whether it uses an index.

explain select id, age, gender from user where name = 'Laughing Architect';

The output contains a column called "type". Common values are:

ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, worst to best performance)

ALL means the statement scans the entire table and needs to be optimized. As a rule of thumb, a query should reach at least the range level.
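The effect of an index on the execution plan can be reproduced in a few lines. The sketch below is only an illustration and rests on an assumption: it uses Python's built-in SQLite as a stand-in for MySQL, so it calls SQLite's EXPLAIN QUERY PLAN rather than MySQL's explain, and the output wording differs from MySQL's "type" column. The index name idx_user_name and the sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO user (name, age, gender) VALUES (?, ?, ?)",
    [(f"user{i}", 20 + i % 30, "F" if i % 2 else "M") for i in range(1000)],
)

QUERY = "SELECT id, age, gender FROM user WHERE name = ?"


def query_plan(connection, sql, params):
    """Return the human-readable detail of each step in SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the description


# Without an index on name, the plan is a full table scan.
plan_without_index = query_plan(conn, QUERY, ("user42",))

# After adding an index (hypothetical name), the plan becomes an index search.
conn.execute("CREATE INDEX idx_user_name ON user (name)")
plan_with_index = query_plan(conn, QUERY, ("user42",))

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of the user table and the second reports a search using idx_user_name. The exact wording varies between SQLite versions.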

Table structure optimization

Take an example scenario:

The "user" table has user_id, nickname, and other fields, and the "order" table has order_id, user_id, and other fields. How do you get the nickname of the user who placed an order? Normally you join the order table to the user table and read the nickname from the joined row.

However, as business volume grows, the order table and the user table both grow rapidly, and joining the two becomes expensive. To fetch a single nickname field, the query has to join against a user table with hundreds of millions of rows, and you can imagine how slow that is.

At this point, try adding the nickname field to the order table (order_id, user_id, nickname). This is commonly called field redundancy. The advantage is that you no longer need to query the user table when displaying the order list.

Redundant fields also have a drawback: when the field changes, every table that holds a copy must be updated at the same time. So choose fields that are rarely updated when deciding what to duplicate.
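The trade-off can be sketched as follows. This is a minimal illustration, not a recommended schema. It assumes SQLite as a stand-in database, names the order table orders because ORDER is a reserved word in SQL, and uses made-up sample rows and helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, nickname TEXT);
-- 'orders' instead of 'order': ORDER is a reserved word (naming is an assumption).
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, nickname TEXT);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO orders VALUES (100, 1, 'alice'), (101, 2, 'bob'), (102, 1, 'alice');
""")


def order_list_with_join(connection):
    """Normalized read: the nickname comes from the user table through a join."""
    return connection.execute(
        "SELECT o.order_id, u.nickname FROM orders o "
        "JOIN user u ON u.user_id = o.user_id ORDER BY o.order_id"
    ).fetchall()


def order_list_redundant(connection):
    """Redundant read: the nickname is already stored in the orders table."""
    return connection.execute(
        "SELECT order_id, nickname FROM orders ORDER BY order_id"
    ).fetchall()


def rename_user(connection, user_id, new_nickname):
    """The cost of redundancy: every copy is updated in the same transaction."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "UPDATE user SET nickname = ? WHERE user_id = ?", (new_nickname, user_id)
        )
        connection.execute(
            "UPDATE orders SET nickname = ? WHERE user_id = ?", (new_nickname, user_id)
        )
```

Both read functions return the same rows, but only the first touches the user table. rename_user shows why a frequently changing field is a poor candidate for duplication: one logical change turns into a write on every table that holds a copy.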

Architecture optimization

When a single database instance can no longer handle the load, we can add instances and form a cluster that serves requests together.

When read requests significantly outnumber write requests, we can make the master instance responsible for writes and let the slave instances serve reads.
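Read-write separation needs something that decides where each statement goes. The class below is a deliberately simplified sketch of that decision. The class name, the instance names, and the rule of classifying a statement by its first keyword are all assumptions made for illustration. Real middleware also has to deal with transactions, replication lag, and statements such as SELECT ... FOR UPDATE.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves (round-robin)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the instance that should execute the given statement."""
        words = sql.split(None, 1)
        if not words:
            raise ValueError("empty SQL statement")
        if words[0].upper() in self.WRITE_VERBS:
            return self.master
        return next(self._read_cycle)


# Hypothetical instance names, used only for this example.
router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
```

Because slaves apply changes slightly later than the master, a read that must see a write made a moment ago is usually sent to the master as well.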

If the read instances are still under heavy pressure, you can add a cache such as Redis in front of the database, so that requests are served from the cache first and fewer of them reach the database.
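Serving requests from the cache first is commonly implemented with the cache-aside pattern. The sketch below illustrates it under clear assumptions: a plain dict stands in for Redis, FAKE_DB stands in for the database, and the class and method names are invented for this example. Expiry and concurrent updates are left out.

```python
class CacheAsideReader:
    """Read through a cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical loader: key -> value or None
        self._cache = {}                   # a dict stands in for Redis here
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: the database is not touched
        self.db_reads += 1
        value = self._load_from_db(key)    # cache miss: read from the database
        if value is not None:
            self._cache[key] = value       # populate the cache for later reads
        return value

    def invalidate(self, key):
        """Call after a write so that the next read reloads fresh data."""
        self._cache.pop(key, None)


FAKE_DB = {"user:1": "alice"}              # stand-in for a real database
reader = CacheAsideReader(FAKE_DB.get)
```

Repeated calls to reader.get("user:1") reach the database only once. After the underlying row changes, reader.invalidate("user:1") forces the next read to go back to the database.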

If the database is still a bottleneck after the cache has absorbed part of the load, it is time to consider splitting databases and tables, which is described in detail later.

Hardware optimization

Hardware is expensive, so it is generally not feasible to upgrade hardware every time the database hits a performance bottleneck.

In the early days, when business volume is relatively small, a hardware upgrade can improve database performance greatly. Later on, the gain from each upgrade becomes much less noticeable.

Splitting databases and tables in detail

Below we take an online mall system as an example and walk through how its database evolves step by step.

Single application, single database

In the early days of a startup, the mall is basically one system containing several basic function modules, packaged into a single war file for deployment. This is a typical monolithic application.

Figure: the mall project using a single database

As shown in the figure above, the mall system includes the home portal module, user module, order module, inventory module, and so on. All modules share one database, which usually contains many tables.

Because there are few users, this architecture is perfectly adequate early on, and the developers can take the demo around to find (or fool) investors.

Once the investors' money arrives, the business starts large-scale promotion, and the system architecture has to keep pace with its rapid growth.

Multiple applications, single database

To seize the market early, the system was iterated on constantly. The codebase grew larger and the architecture more bloated. Now that access pressure is rising, splitting the system is unavoidable.

To keep the service running smoothly, the re-architecture is carried out in several stages.

In the first stage, the monolithic mall system is split by functional module into sub-services such as the portal service, user service, order service, and inventory service.

Figure: multiple applications sharing a single database

As shown in the figure above, the services still share one database, so the underlying database access logic stays untouched and the impact of the change is minimized.

Multiple applications, multiple databases

As business promotion intensifies, the database finally becomes the bottleneck, and it is no longer practical for multiple services to share one database. The tables belonging to each service need to be moved into a database of their own. This is what "splitting the database" means.

A single database can only support a limited amount of concurrency. Splitting it into several databases keeps services from competing with each other and improves their performance.

Figure: multiple applications with multiple databases

As shown in the figure above, several small databases are carved out of one large database, and each service gets its own. This database split becomes necessary once the system reaches a certain stage of growth.

The same holds for the now-popular microservice architecture: if you split only the applications and not the database, the fundamental problem remains and the whole system still hits a bottleneck easily.

Splitting tables

Once the databases are split, when should the tables be split?

If the system is growing fast, as with the mall system, there may be hundreds of thousands of orders a day. The order table grows especially quickly, and past a certain size query efficiency drops significantly.

So when a single table grows too fast, it is time to consider splitting it. A commonly cited rule of thumb in the industry is to consider it once a table exceeds 5 million rows. That figure is only an empirical value, so decide based on your actual situation.

How to split tables?

Table splitting can be classified along two dimensions: horizontal versus vertical splitting, and splitting within a single database versus across multiple databases.

Horizontal and vertical splitting

Take the user table as an example. It has fields such as id, name, age, sex, nickname, and description. If nickname and description are rarely used, we can move them into another table, the user details table. The user table is thus split into a basic user information table plus a user details table. The two tables have different structures and are independent of each other. This is a vertical split. It does not fundamentally solve the problem of too many rows in a single table, so a horizontal split is still needed.
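A vertical split like the one described can be sketched in a few statements. The example assumes SQLite as a stand-in database. The table names user_base and user_detail, the helper functions, and the sample rows are hypothetical. CREATE TABLE ... AS SELECT is used for brevity and does not carry over the primary key, which a real migration would declare explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT,
                   nickname TEXT, description TEXT);
INSERT INTO user VALUES (1, 'Ann', 30, 'F', 'annie', 'likes hiking'),
                        (2, 'Bo', 25, 'M', 'bobo', 'plays chess');

-- Vertical split: the same rows, different columns, linked by id.
CREATE TABLE user_base AS SELECT id, name, age, sex FROM user;
CREATE TABLE user_detail AS SELECT id, nickname, description FROM user;
""")


def get_basic(connection, user_id):
    """Common case: only the frequently used columns are read."""
    return connection.execute(
        "SELECT id, name, age, sex FROM user_base WHERE id = ?", (user_id,)
    ).fetchone()


def get_full(connection, user_id):
    """Rare case: join the detail table back in when it is needed."""
    return connection.execute(
        "SELECT b.id, b.name, b.age, b.sex, d.nickname, d.description "
        "FROM user_base b JOIN user_detail d ON d.id = b.id WHERE b.id = ?",
        (user_id,),
    ).fetchone()
```

Each of the two tables still has one row per user, which is why a vertical split narrows the rows but does not reduce how many there are.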

Figure: splitting tables

The other way to split works on rows. Suppose a table has ten thousand rows and we split it into two tables: rows with odd ids (1, 3, 5, 7, ...) go into user1, and rows with even ids (2, 4, 6, 8, ...) go into user2. This is a horizontal split.
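The odd/even rule is a special case of routing by id modulo the number of tables. A minimal sketch follows. The function names and the table_count parameter are assumptions made for illustration, and ids are assumed to start at 1 as in the example.

```python
from collections import defaultdict


def table_for_id(user_id, table_count=2, prefix="user"):
    """Map an id to its sub-table: with two tables, odd ids go to user1, even ids to user2."""
    if user_id < 1:
        raise ValueError("ids are assumed to start at 1")
    return f"{prefix}{(user_id - 1) % table_count + 1}"


def split_rows(rows, table_count=2):
    """Group rows (dicts with an 'id' key) by the sub-table they belong to."""
    tables = defaultdict(list)
    for row in rows:
        tables[table_for_id(row["id"], table_count)].append(row)
    return dict(tables)


# Ten thousand rows split into two tables of five thousand rows each.
parts = split_rows([{"id": i} for i in range(1, 10001)])
```

Every read and write for a given id has to go through the same routing function, so the rule must stay stable. Changing table_count later moves most ids to a different table.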

There are many ways to split horizontally. Besides splitting by id as above, you can split by time. For example, the order table can be split by day or by month.

Daily table: stores only the current day's data.

Monthly table: a scheduled task migrates all of the previous day's data into the current month's table.

History table: a scheduled task can also migrate data older than 30 days into the history table.
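With daily, monthly, and history tables, a query first has to work out which table holds an order of a given date. The function below is one possible reading of the scheme above, not a definitive layout. The table naming pattern (orders_YYYYMMDD, orders_YYYYMM, orders_history) is an assumption, and so is the choice to place a row in the monthly table of its own order date.

```python
from datetime import date

HISTORY_AFTER_DAYS = 30  # from the text: data older than 30 days moves to history


def order_table_for(order_date, today):
    """Return the name of the table expected to hold an order placed on order_date."""
    age_days = (today - order_date).days
    if age_days < 0:
        raise ValueError("order date is in the future")
    if age_days == 0:
        return f"orders_{order_date:%Y%m%d}"  # daily table: the current day only
    if age_days <= HISTORY_AFTER_DAYS:
        return f"orders_{order_date:%Y%m}"    # monthly table, filled by the daily job
    return "orders_history"                   # older than 30 days


today = date(2025, 2, 24)
same_day = order_table_for(date(2025, 2, 24), today)
yesterday = order_table_for(date(2025, 2, 23), today)
old = order_table_for(date(2025, 1, 24), today)
```

A query that covers a date range would call this function for each day in the range and read from the distinct tables it returns.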

To summarize the characteristics of the two approaches:

Vertical splitting: divides by table or by field, so the resulting tables have different structures.

Horizontal splitting: divides by data, so the resulting tables have the same structure but hold different rows.

Splitting within a single database and across multiple databases

Take horizontal splitting as an example. Each table is split into several sub-tables that live in the same database. For example, the user table is split into the user1 and user2 tables.

Figure: splitting within a single database

Splitting a table into several sub-tables inside one database eases single-table query performance to some extent, but it runs into another problem: the storage bottleneck of a single database.

So the more common practice in the industry is to spread the sub-tables across multiple databases. For example, the user table is split into two sub-tables that live in different databases.

Figure: splitting across multiple databases

In short: table splitting mainly reduces the size of a single table, which addresses the performance problems caused by having too much data in one table.
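When sub-tables are spread across databases, routing has to return both a database and a table. The sketch below shows one simple way to do that with modulo arithmetic. The database names, the number of tables per database, and the user_N naming are all hypothetical.

```python
DATABASES = ["db0", "db1"]  # hypothetical database names
TABLES_PER_DB = 2           # hypothetical number of sub-tables in each database


def locate(user_id):
    """Map a user id to the (database, table) pair that stores its row."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    slot = user_id % (len(DATABASES) * TABLES_PER_DB)  # global slot number
    database = DATABASES[slot // TABLES_PER_DB]        # which database
    table = f"user_{slot % TABLES_PER_DB}"             # which table inside it
    return database, table
```

With two databases holding two tables each, ids 0 to 3 land in four different slots and id 4 wraps around to the first slot again.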

Complexity introduced by splitting databases and tables

If splitting databases and tables is so good, should we adopt it at the start of a project? Not so fast. Splitting does solve many problems, but it also adds a lot of complexity to the system. The main issues are outlined below.

(1) Cross-database join queries

Before the split, a join can easily combine several tables in one query. After the split, the two tables may no longer be in the same database, so how can you join them?

There are several solutions:

Field redundancy: copy the fields you need into the main table to avoid the join;

Data abstraction: aggregate the data through ETL or similar means to generate new tables;

Global tables: place a copy of certain basic tables in every database;

Application-layer assembly: query the basic data separately and combine it in application code.
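The last option, application-layer assembly, can be sketched as follows. The example assumes two independent in-memory SQLite databases standing in for the order database and the user database. The table layout, the sample rows, and the function name are made up for illustration.

```python
import sqlite3

# Two separate databases: no SQL join can span them.
order_db = sqlite3.connect(":memory:")
user_db = sqlite3.connect(":memory:")
order_db.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO orders VALUES (100, 1), (101, 2), (102, 1);
""")
user_db.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, nickname TEXT);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
""")


def orders_with_nickname(order_conn, user_conn):
    """Replace a cross-database join with two queries and an in-memory lookup."""
    orders = order_conn.execute(
        "SELECT order_id, user_id FROM orders ORDER BY order_id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id in orders})
    if not user_ids:
        return []
    # One batched query for all needed users instead of one query per order.
    placeholders = ",".join("?" * len(user_ids))
    nicknames = dict(
        user_conn.execute(
            f"SELECT user_id, nickname FROM user WHERE user_id IN ({placeholders})",
            user_ids,
        ).fetchall()
    )
    return [(order_id, nicknames.get(user_id)) for order_id, user_id in orders]


result = orders_with_nickname(order_db, user_db)
```

Fetching the users in one batch keeps the number of round trips at two, regardless of how many orders are on the page.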

(2) Distributed transactions

Within a single database, local transactions are enough. Once an operation spans multiple databases, it needs a distributed transaction.

Common solutions include reliable-messaging (MQ) based approaches, two-phase commit, and flexible transactions.
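Of these, two-phase commit is the easiest to show in a few lines. The code below is a toy, in-memory sketch of the control flow only. The Participant class and its can_commit flag are invented for illustration. A real implementation also needs durable logs, timeouts, and recovery when the coordinator or a participant crashes.

```python
class Participant:
    """A toy resource manager that can vote on and then finish a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to simulate a failure
        self.state = "idle"

    def prepare(self):
        """Phase 1: do the work tentatively and vote yes (True) or no (False)."""
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Commit on all participants only if every one of them votes yes."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere, or roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok = two_phase_commit([Participant("order_db"), Participant("stock_db")])
failed = two_phase_commit([Participant("order_db"), Participant("stock_db", can_commit=False)])
```

The key property is that no participant commits until all of them have voted yes, so a single failure leaves every database unchanged.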

(3) Sorting, pagination, and function calculation

Clauses such as order by and limit need special handling. The general approach works shard by shard:

First run the operation on each shard, then merge the result sets from all shards and compute over them again to get the final result.
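The shard-then-merge idea can be sketched as follows. Plain Python lists stand in for shards here, and the function names are assumptions made for illustration. In a real system, step 1 would be an ORDER BY ... LIMIT query sent to each shard.

```python
import heapq
import itertools


def fetch_page(shards, offset, limit):
    """Return one globally sorted page from several unsorted shards."""
    need = offset + limit
    # Step 1: each shard sorts locally and returns only its first `need` rows.
    # Any row on the requested page must be among the first `need` rows of its shard.
    partials = [sorted(shard)[:need] for shard in shards]
    # Step 2: merge the sorted partial results and cut out the requested page.
    merged = heapq.merge(*partials)
    return list(itertools.islice(merged, offset, need))


def global_average(shards):
    """Compute an average from per-shard (sum, count) pairs, not from per-shard averages."""
    partials = [(sum(shard), len(shard)) for shard in shards]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows in any shard")
    return total / count


SHARDS = [[5, 1, 9], [2, 8, 3], [7, 4, 6]]
second_page = fetch_page(SHARDS, offset=3, limit=3)
```

Note that each shard has to return offset + limit rows, not just limit rows, so deep pages get more expensive as the offset grows. global_average shows the same recalculation idea for an aggregate: averaging the per-shard averages would give a wrong answer when the shards hold different numbers of rows.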

(4) Distributed IDs

With MySQL, an auto-increment id works as the primary key for a single database and table. After the split it no longer works, because different sub-tables will generate duplicate ids.

Common distributed ID solutions include:

UUID

A separate ID table based on database auto-increment

Segment mode

Redis cache

Snowflake algorithm

Baidu uid-generator

Meituan Leaf

Tinyid
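As a taste of one item on the list, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly described layout of a 41-bit millisecond timestamp, a 10-bit worker id, and a 12-bit sequence. The custom epoch and the class name are assumptions of this example, and clock rollback is handled only by raising an error.

```python
import threading
import time


class SnowflakeGenerator:
    """Generate 64-bit ids: timestamp (ms) | worker id (10 bits) | sequence (12 bits)."""

    EPOCH_MS = 1_700_000_000_000  # custom epoch in ms: an assumption of this sketch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER_ID = (1 << WORKER_BITS) - 1
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id):
        if not 0 <= worker_id <= self.MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence number.
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS)
            return timestamp_part | (self.worker_id << self.SEQUENCE_BITS) | self._sequence


generator = SnowflakeGenerator(worker_id=1)
ids = [generator.next_id() for _ in range(5000)]
```

Ids from one generator are unique and increase over time. Ids from different workers never collide as long as each worker is given a distinct worker_id, which is the part that needs coordination in a real deployment.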

These solutions will be covered in a dedicated later article and are not expanded on here.

(5) Multiple data sources

After the split, a request may need data from several databases or several sub-tables. The usual solutions are client-side adaptation and proxy-based adaptation.

Middleware commonly used in the industry includes:

ShardingSphere (formerly Sharding-JDBC)

Mycat

