
How to divide the database and table in big data

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many newcomers are unclear about how to split databases and tables when working with big data. This article explains the approach in detail, and we hope it is useful to anyone who needs it.

I. Problems with a single database and single table

Suppose you are designing an e-commerce website. In the beginning, the User table, Order table, Product table, and so on all live in the same database, and each table contains a large number of fields. While the number of users and visits is small, a single database with single tables causes no problems.

However, as the company grows, the number of users increases sharply and the business becomes more and more complex. A table may have dozens or even hundreds of fields and hold tens of millions of rows, and worse, there may be many such tables. The load on the single database, and on each table, becomes too great. Querying a table with tens of millions of rows is already expensive; if the query also has to join other tables, it takes even longer.

(1) The single database is too large: it holds so many tables that the data no longer fits in the server's disk space, and the high volume of I/O keeps the CPU busy.

(2) A single table is too large: it has too many fields and too many rows, so queries are slow.

At this point, we begin to think about how to solve the problem.

II. Master-slave replication architecture

As a single database with single tables finds it harder and harder to meet demand, the first thing to consider is read/write splitting. We separate the database's write and read operations: multiple slave libraries (Slave) serve reads, the master library (Master) handles writes, and the slaves synchronize updates from the master to keep the data consistent.
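As a rough illustration of read/write splitting, the sketch below routes statements to a master or to slaves in round-robin order. The class name `ReadWriteRouter` and the node names are hypothetical, and treating every non-SELECT statement as a write is a simplifying assumption, not a rule from any particular database product.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and reads to the slaves in round-robin order."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Simplifying assumption: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


# Hypothetical node names, used only for illustration.
router = ReadWriteRouter("master-db", ["slave-db-1", "slave-db-2"])
targets = [
    router.route(query)
    for query in (
        "SELECT * FROM orders WHERE id = 1",
        "INSERT INTO orders (id) VALUES (2)",
        "SELECT * FROM orders WHERE id = 2",
    )
]
print(targets)
```

Because the slaves only receive changes after the master has applied them, a read routed to a slave right after a write may not see that write yet.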

This solves the problem to a certain extent. But with a very large user base, say hundreds of millions of users, write operations keep increasing until one master library (Master) can no longer keep up, so the master database has to be split as well. The masters must then synchronize with each other to keep the data consistent, which brings a series of problems:

(1) Write operations are hard to scale, because data consistency must be maintained across multiple master databases.

(2) Replication delay: synchronization takes time.

(3) Table locking becomes more likely: with reads and writes separated, the hit rate drops and the probability of table locks rises.

(4) The cache hit rate falls as tables grow: once it falls, queries take longer.

Note that master-slave replication is still a single database with single tables; it has merely been copied many times, with the copies kept in sync.

As users, visits, and data keep growing, the master-slave replication architecture still runs into many problems, so we need another solution. That is today's topic: sub-database and sub-table.

III. Sub-database and sub-table

Whether splitting a database or a table, there are two ways to do it: horizontal splitting and vertical splitting. Let's look at each.

1. Sub-table

(1) Vertical sub-table

When a table has many fields, the less frequently used ones that hold large or long values are generally split off into an "extended table". A table may have hundreds of columns, for instance, and it is cut vertically by field. Note that vertical splitting applies when a table has many columns.
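A minimal sketch of a vertical table split, using Python's built-in `sqlite3` module so that it runs anywhere. The `product` and `product_ext` tables and their columns are hypothetical examples, not a schema from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently read, narrow columns stay in the main table.
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    -- Rarely used, bulky columns move to the extended table (same key).
    CREATE TABLE product_ext (
        product_id INTEGER PRIMARY KEY REFERENCES product(id),
        description TEXT
    );
""")
conn.execute("INSERT INTO product VALUES (1, 'keyboard', 49.9)")
conn.execute("INSERT INTO product_ext VALUES (1, 'A long description ...')")

# Common query: touches only the narrow main table.
summary = conn.execute(
    "SELECT name, price FROM product WHERE id = 1"
).fetchone()

# Detail view: join the extended table only when its columns are needed.
detail = conn.execute(
    "SELECT p.name, e.description FROM product p "
    "JOIN product_ext e ON e.product_id = p.id WHERE p.id = 1"
).fetchone()
print(summary, detail)
```

Both tables share the same primary key, so the full row can always be reassembled with a one-to-one join.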

(2) Horizontal sub-table

When a single table holds too much data, it is divided into multiple tables according to some rule (RANGE, HASH, etc.). These tables are still in the same database, however, so operations at the database level still hit the I/O bottleneck. This approach is not recommended: the data keeps growing, so once it reaches a certain size the tables must be split again, which is troublesome.
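The RANGE and HASH rules can be sketched as small routing functions that map a key to a physical table name. The table prefix `orders_`, the table count, and the rows-per-table figure are assumptions chosen for illustration.

```python
TABLE_COUNT = 4  # assumed number of physical tables for the HASH rule


def hash_table(user_id: int) -> str:
    """HASH rule: spread rows by user_id modulo the table count."""
    return f"orders_{user_id % TABLE_COUNT}"


def range_table(order_id: int, rows_per_table: int = 10_000_000) -> str:
    """RANGE rule: each table holds one contiguous block of ids."""
    return f"orders_{order_id // rows_per_table}"


print(hash_table(10))           # user 10 lands in table index 10 % 4
print(range_table(25_000_000))  # third block of ten million ids
```

The sketch also hints at why re-splitting is painful: if `TABLE_COUNT` later changes from 4 to 8, `user_id` 7 moves from `orders_3` to `orders_7`, so existing rows have to be migrated.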

2. Sub-database

(1) Vertical sub-database

When one database contains too many tables, it is cut vertically along business lines, for example user-related tables in one database and order-related tables in another. Note that the different databases should be placed on different servers; this relieves the pressure on disk space, memory, TPS, and so on.
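One way to picture a vertical database split is a lookup from table name to business domain, and from domain to the database that hosts it. The host names, connection strings, and table names below are all hypothetical.

```python
# Hypothetical databases, each on its own server (one per business domain).
DATABASES = {
    "user": "mysql://user-db-host/user_db",
    "order": "mysql://order-db-host/order_db",
}

# Hypothetical assignment of tables to business domains.
TABLE_DOMAIN = {
    "users": "user",
    "user_address": "user",
    "orders": "order",
    "order_item": "order",
}


def database_for(table: str) -> str:
    """Return the connection string of the database that owns the table."""
    return DATABASES[TABLE_DOMAIN[table]]


print(database_for("orders"))
print(database_for("user_address"))
```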

(2) Horizontal sub-database

Horizontal database splitting is, in principle, troublesome to carry out. It means distributing the data of a single table across multiple servers: each server has the same database and table, but the table on each server holds a different subset of the data. Splitting databases and tables horizontally effectively relieves the performance bottleneck and load of a single machine and single database, and breaks through limits on I/O, connection count, hardware resources, and so on.
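A minimal sketch of routing under a combined horizontal database and table split, assuming a modulo rule on `user_id`. The counts and the `order_db_` / `orders_` naming are illustrative assumptions.

```python
DB_COUNT = 2        # assumed number of databases (one per server)
TABLES_PER_DB = 4   # assumed number of tables inside each database


def locate(user_id: int) -> tuple[str, str]:
    """Map a user_id to the (database, table) pair that stores its rows."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"order_db_{db_index}", f"orders_{table_index}"


print(locate(13))  # slot 5 -> second database, second table
print(locate(3))   # slot 3 -> first database, fourth table
```

Every row for a given `user_id` lands in exactly one database and table, so single-user queries touch only one server.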

IV. Problems after sub-database and sub-table

1. Joint queries are difficult

Joint queries are not merely difficult but can become impossible, because two associated tables may be located in different databases on different servers.

2. Transactions need to be supported

After splitting, distributed transactions must be supported. The database provides transaction management out of the box, but it no longer applies once the data is spread over several databases and tables. Coordinating transactions in application code is possible, but it makes the code more complicated.
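To show why hand-coordinated transactions are awkward, here is a deliberately naive sketch that updates two separate databases and rolls both back if either statement fails. Two in-memory `sqlite3` databases stand in for two servers; the tables and the `place_order` function are hypothetical. This is not a real distributed transaction protocol: a crash between the two commits would still leave the data inconsistent.

```python
import sqlite3

# Two independent databases stand in for two separate servers.
order_db = sqlite3.connect(":memory:")
stock_db = sqlite3.connect(":memory:")
order_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER)")
stock_db.execute(
    "CREATE TABLE stock (product_id INTEGER PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
stock_db.execute("INSERT INTO stock VALUES (1, 1)")
stock_db.commit()


def place_order(order_id: int, product_id: int) -> bool:
    """Write to both databases; undo both if either statement fails."""
    try:
        order_db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, product_id))
        stock_db.execute(
            "UPDATE stock SET qty = qty - 1 WHERE product_id = ?", (product_id,)
        )
    except sqlite3.Error:
        order_db.rollback()
        stock_db.rollback()
        return False
    # Weak spot: a failure between these two commits leaves the data diverged.
    order_db.commit()
    stock_db.commit()
    return True


first = place_order(100, 1)   # stock goes from 1 to 0
second = place_order(101, 1)  # CHECK constraint fails, both sides roll back
order_count = order_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first, second, order_count)
```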

3. Cross-database joins are difficult

After splitting, join operations between tables are restricted. We cannot join tables located in different sub-databases, nor tables split at different granularity. As a result, work that used to take one query may now take several. One remedy is to use global tables, keeping a copy of them in every database.
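The "several queries instead of one" pattern can be sketched as an application-side join. Two in-memory `sqlite3` databases stand in for a user database and an order database on different servers; the table names and sample rows are made up for illustration.

```python
import sqlite3

# Separate connections stand in for databases on different servers.
user_db = sqlite3.connect(":memory:")
order_db = sqlite3.connect(":memory:")
user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
user_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
order_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
order_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 2, 35.5), (12, 1, 5.0)],
)

# Query 1: fetch the orders from the order database.
orders = order_db.execute(
    "SELECT id, user_id, amount FROM orders ORDER BY id"
).fetchall()

# Query 2: fetch only the users those orders refer to.
user_ids = sorted({user_id for _, user_id, _ in orders})
placeholders = ",".join("?" * len(user_ids))
names = dict(
    user_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall()
)

# The join itself happens in application code rather than in SQL.
joined = [(order_id, names[user_id], amount) for order_id, user_id, amount in orders]
print(joined)
```

A global table avoids the second round trip: if a small, rarely changing table is copied into every database, each database can join against its local copy.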

4. Merging results is troublesome

For example, after we purchase goods, the order data may be spread over several split tables, so merging the results is more difficult.
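As a sketch of result merging, suppose each split table has already returned its rows sorted by creation time, newest first. The application then has to merge those lists itself, for example with `heapq.merge` from the standard library. The sample rows and the `latest_orders` function are hypothetical.

```python
import heapq
import itertools

# Hypothetical per-table results, each already sorted newest first.
shard_results = [
    [
        {"order_id": 7, "created_at": "2024-05-03"},
        {"order_id": 2, "created_at": "2024-05-01"},
    ],
    [
        {"order_id": 9, "created_at": "2024-05-04"},
        {"order_id": 4, "created_at": "2024-05-02"},
    ],
]


def latest_orders(results, limit):
    """Merge pre-sorted per-table results and keep the newest `limit` ids."""
    merged = heapq.merge(
        *results, key=lambda row: row["created_at"], reverse=True
    )
    return [row["order_id"] for row in itertools.islice(merged, limit)]


top_three = latest_orders(shard_results, 3)
print(top_three)
```

Sorting, paging, and aggregation all need this kind of extra step once one logical table is stored in several physical ones.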
