What is the reason why the total storage becomes larger after MySQL sub-database and sub-table? 07/11 Update SLTechnology News&Howtos

What is the reason why the total storage becomes larger after MySQL sub-database and sub-table?

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "what is the reason for the increase of total storage after MySQL sub-database and sub-table". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

1. Background

After completing a sub-table project, it is found that after the data migration of the sub-table, the storage capacity of the new database is much larger than the size of the original two tables. After doing some query and understanding, the optimization is completed.

In retrospect, we need to know more about why this happened.

Similar to the question of the title is why the contents of the table data are deleted and the table size does not change. It's all the same in essence.

To answer these questions, we need to start with mysql's index model.

The Index Model of 2.InnoDB

In MySQL, indexing is implemented at the storage engine layer, so there is no uniform index standard, that is, indexes of different storage engines work differently.

Even if multiple storage engines support the same type of index, the underlying implementation may be different. Since the InnoDB storage engine is the most widely used in MySQL databases, we will take InnoDB as an example to analyze the index model.

In InnoDB, tables are stored in the form of indexes according to the order of primary keys, which is called index organization tables. In InnoDB, the B + tree index model is used, so the data is stored in the B + tree, and each index corresponds to a B + tree.

Suppose we have a table with a primary key listed as ID, a field k in the table, and an index on k, with the following statement

CREATE TABLE `t` (

`id`int (11) NOT NULL

`k` int (11) NOT NULL

`name` varchar (16) DEFAULT NULL

PRIMARY KEY (`id`)

KEY `k` (`k`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

The (ID,k) values of R1~R5 in the table are (10), (20), (30), (5) and (70), respectively. An example diagram of the B+ tree of index id and index k is shown below.

According to the content of leaf node, the index type is divided into primary key index and non-primary key index. The leaf node of primary key index stores the whole row of data R1~R5, and the content of non-primary key index leaf node is the value of primary key.

As can be seen from the figure, queries based on non-primary key indexes need to scan one more index tree to find the corresponding data. As a digression, we should try our best to use primary key query in our application.

3. Index maintenance

In order to maintain the order of the index, B+ tree needs to do necessary maintenance when adding, deleting and modifying data.

Suppose we want to delete the record R4, and the InnoDB engine will only mark the record R4 as deleted. This location may be reused later if you want to insert a record with an ID between 300,600.

If all records on a data page are deleted, the entire data page can be reused. Further, what if we delete the data from the entire table with the delete command? As a result, all data pages associated with this table are marked as reusable.

However, in any case, the size of the disk file does not shrink.

These spaces that are marked as reusable and are not actually used are just "storage holes".

In fact, it is not only deleting data that creates holes, but also inserting data.

As an example in the above figure, if you insert a new row with an ID value of 80, you only need to insert a new record after the record in R5.

If the newly inserted ID value is 60, it is relatively troublesome, and you need to logically move the following data to vacate the location.

To make matters worse, if the data page where R5 is located is full, according to the B+ tree algorithm, you need to apply for a new data page and move some of the data over. This process is called page splitting. In this case, performance will naturally be affected.

In addition to performance, the page split operation also affects the utilization of data pages. The data originally placed on one page is now divided into two pages, and inserting a record actually reduces the overall space utilization by about 50%.

As you can see, because page 2 is full, when you insert a data with an ID of 60, you have to apply for a new page page 3 to save the data.

After the page split is complete, a hole is left at the end of page 2 (note: in fact, there may be more than one record position is empty).

In addition, updating the value on the index can be understood as deleting an old value and inserting a new value. It is not difficult to understand that this can also cause emptiness.

Therefore, after a large number of additions, deletions and changes of the table, there may be a large "data hole".

Therefore, we can explain why the total storage after the sub-table becomes larger.

Because after dividing the table, it is necessary to synchronize all the data from the old database to the new database, and the data synchronization platform opens multiple threads for synchronization, and inserting each sub-table is not inserted in an incremental order, so there will be a huge amount of "data holes". Resulting in larger storage space.

If we can get rid of these holes, we can achieve the purpose of shrinking the table space. Rebuilding the table can achieve this goal.

4. Rebuild the table

If we rebuild a table manually, we can create a new table B with the same structure as table A, and then read the data row by row (that is, incrementally) from table An and insert it into table B in the order in which the primary key ID is incremented. Because Table B is a newly created table, the holes in the index of the primary key of Table A no longer exist in Table B. Obviously, the primary key index of Table B is more compact and the utilization of data pages is higher. If we take Table B as a temporary table, after the operation of importing data from Table A to Table B is completed, replace A with Table B. in terms of effect, it will shrink the space of Table A.

Here, you can use the alter table An engine=InnoDB command to rebuild the table. Before MySQL version 5.5, the execution process of this command was similar to that described earlier, except that the temporary table B does not need to be created by you. MySQL will automatically dump data, exchange table names, and delete old tables. Obviously, the most time-consuming step is the process of inserting data into a temporary table, which can result in data loss if new data is written to Table A. Therefore, there can be no updates in Table A throughout the DDL process. In other words, this DDL is not Online's.

Online DDL, which was introduced in MySQL version 5.6, optimizes this operation process.

Create a temporary file and scan all the data pages of the table A primary key

Generate a B+ tree from the records of Table An in the data page and store it in a temporary file

During the generation of temporary files, all operations on An are recorded in a log file (row log)

After the temporary file is generated, apply the operations in the log file to the temporary file to get a data file with the same logical data as Table A; (the process of applying row log may have page splits again)

Replace the data file in Table A with a temporary file.

As you can see, in this process, due to the existence of log file recording and playback operation, this scheme allows table A to be added, deleted and modified in the process of rebuilding the table. This is where the name Online DDL comes from.

It should be added that the above reconstruction methods scan the original table data and build temporary files. For large tables, this operation consumes IO and CPU resources. Therefore, if it is an online service, you should be very careful to control the operating time.

The differences between optimize table, analyze table, and alter table in rebuilding tables:

Since MySQL version 5.6, alter table t engine = InnoDB (that is, recreate) defaults to the above online DDL process

In fact, analyze table t is not rebuilding the table, but only re-counting the index information of the table, without modifying the data. MDL read locks are added in this process.

Optimize table t equals recreate+analyze.

This is the end of the content of "what is the reason why the total storage has become larger after MySQL sub-database and sub-table". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.