This article introduces InnoDB's on-disk data structures. The material is straightforward and easy to follow; please work through it step by step.
Part Ⅰ Tables and Tablespaces
"Everything is a file..." This wise saying tells us that everything has to start with the document. So for the study of InnoDB out-of-memory data structure, we also start with tables and files.
1. Table (Table)
When we use CREATE TABLE to create a table, MySQL creates a .frm file and an .ibd file. The .frm file describes the table structure definition, while the .ibd file is specific to the InnoDB engine layer and records the data of the InnoDB table. For example, we create a table jersey_test in the database CCCtest with the following statement:
CREATE TABLE `jersey_test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `requestId` char(64) NOT NULL COMMENT 'request',
  `type` smallint(6) NOT NULL DEFAULT '0' COMMENT 'type',
  `name` varchar(64) NOT NULL COMMENT 'name',
  PRIMARY KEY (`id`),
  KEY `request` (`requestId`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
At the same time, we insert a record into the table as follows:
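For illustration, a statement of this shape would do (the values here are invented; the screenshot below shows the actual row):
INSERT INTO jersey_test (requestId, type, name) VALUES ('req-0001', 0, 'jersey');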
[data in jersey_test table]
If we go to MySQL's data directory, we can see the jersey_test.frm and jersey_test.ibd files.
[files in the CCCtest database directory]
The .frm file is in a binary format. After a quick pass with the hexdump tool, we can see that apart from some encoding information, the main content of the file is the table structure definition.
[.frm file contents]
This is consistent with the table structure we see with the DESC statement.
[table structure description]
The analysis of the data file .ibd is similar.
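As a cross-check on what the .frm file encodes, we can also ask the server for the definition it has stored (assuming the table above exists):
SHOW CREATE TABLE CCCtest.jersey_test\G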
2. Row Formats (Row Format)
InnoDB tables are stored in separate files, and the row data of a table is recorded in the file according to a specific format. Here we briefly summarize the file formats and row storage formats supported by InnoDB. The InnoDB file format is specified by the parameter innodb_file_format, and both the Antelope and Barracuda file formats are supported. Barracuda is the newer file format and is a superset of Antelope. The Antelope file format supports two row formats, Redundant and Compact, while the newer row formats supported by Barracuda are Compressed and Dynamic.
[InnoDB row storage formats]
The row format of an InnoDB table is specified by the parameter innodb_default_row_format; the default in version 5.7 is Dynamic. The row format determines how the records of a table are laid out in the file. Each row format has its own trade-offs and also affects the behavior of the database. For example, the Compressed format gives row records a higher compression ratio, so a physical page can hold more rows, which speeds up index and record lookups and reduces memory consumption, but compressing and decompressing the data adds its own CPU overhead. Another thing to note is that when migrating tables, you need to check whether the row formats of the source and target instances match. For example, if you migrate a MyISAM table whose row format is the default Fixed to InnoDB, you need to change it to Dynamic, because the two formats handle variable-length fields such as varchar/blob/text differently.
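A minimal sketch of inspecting and changing a row format (the statements assume the jersey_test table from above; Compressed additionally requires innodb_file_per_table=ON and the Barracuda file format):
SELECT @@innodb_default_row_format;
SELECT table_name, row_format FROM information_schema.tables WHERE table_schema = 'CCCtest';
ALTER TABLE jersey_test ROW_FORMAT = COMPRESSED;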
3. Tablespaces (TableSpace)
As we mentioned earlier, each InnoDB table has its own separate file. This is actually the default behavior, that is, file-per-table tablespaces are used, which is controlled by the parameter innodb_file_per_table. In fact, InnoDB has several kinds of tablespaces, including the system tablespace (System TableSpace), file-per-table tablespaces (File-Per-Table TableSpace), and general tablespaces (General TableSpace).
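Checking and enabling this behavior is straightforward (a sketch; note that enabling it only affects tables created afterwards):
SHOW VARIABLES LIKE 'innodb_file_per_table';
SET GLOBAL innodb_file_per_table = ON;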
The system tablespace stores the InnoDB data dictionary (metadata), system tables, the doublewrite buffer, the change buffer, and so on. If the parameter innodb_file_per_table is set to OFF, all table data is stored in the system tablespace. However, when using InnoDB, the recommended practice is to set innodb_file_per_table to ON, that is, to use file-per-table tablespaces, which has the following benefits:
When you delete a table with the TRUNCATE TABLE or DROP TABLE commands, the system deletes the table's data file directly, that is, it reclaims the physical space. With the system tablespace, this physical space cannot be reclaimed.
Similarly, the system can reclaim physical space when a table is rebuilt, for example with OPTIMIZE TABLE or ALTER TABLE ... ENGINE = InnoDB.
You can place an individual table in a specific storage location, which need not be under the MySQL data directory. For example, if you want to store a table on RAID or SSD, with a file-per-table tablespace you can do this using the CREATE TABLE ... DATA DIRECTORY syntax, as in the sketch below.
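A minimal sketch (the path /ssd/data and the table name are hypothetical; the server must be able to write to that directory):
CREATE TABLE t_on_ssd (
  id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB DATA DIRECTORY = '/ssd/data';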
File-per-table tablespaces also have some potential problems. For example, since each table has its own separate file, physical space is easily wasted, which becomes more noticeable when the database has many small tables. General tablespaces (General TableSpace) can alleviate this problem. A general tablespace can be thought of as a compromise between all-in-one (the system tablespace) and file-per-table: it lets you use the CREATE TABLESPACE syntax to create a large space and then add table data files to it, so that they share the storage.
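A minimal sketch of creating a general tablespace and placing a table in it (the names ts_shared and t_small are made up):
CREATE TABLESPACE ts_shared ADD DATAFILE 'ts_shared.ibd' ENGINE=InnoDB;
CREATE TABLE t_small (id INT PRIMARY KEY) TABLESPACE ts_shared;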
Part Ⅱ Indexes
The index is arguably InnoDB's most important data structure, and there is plenty of material introducing database indexes. What is an index? An index is an auxiliary structure that helps us find data quickly; it is fair to say that wherever there is data, an index is needed. For example, in a file system, the index of a file's data is stored in its inode metadata, which records the locations of all of the file's data pages: if the file has 10 pages, the inode records the physical addresses of those 10 page frames. File system indexes also distinguish direct and indirect entries: when the direct entries cannot hold all the pages, a second-level indirect block is used, and the structure is shown in the following figure.
[file system inode structure]
The most common (and probably fastest) index in an operating system is the virtual-to-physical address mapping table. It is fast, first, because it is contiguous: for cross-page access there is no need to compute the address of the next page; second, because address translation is done by dedicated hardware, the MMU, and hardware is inevitably faster. One can imagine that if the indexes stored in files or databases also used a virtual address mapping table with hardware acceleration, they would surely be faster than the existing approach; mapping the addresses directly into the process address space would also reduce the overhead of entering kernel mode.
Back to InnoDB's index structure: InnoDB indexes use the B+ tree data structure, and the row data of an InnoDB table is organized by the clustered index, also called the primary key index, meaning that the leaf nodes of this B+ tree store the full rows corresponding to the primary key values. It is important for every InnoDB table to have an auto-increment primary key: because its values increase monotonically, new records can be appended at the end of the index, which avoids the extra overhead of index lookups and index page splits on insert. The other indexes in InnoDB are called secondary indexes, and the leaf nodes of a secondary index store primary key values. Therefore, when querying records through a secondary index, InnoDB first finds the primary key value via the secondary index and then finds the row record via the primary key index.
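A small illustration, assuming the jersey_test table from Part Ⅰ: a query on the secondary index request first yields the primary key id, which is then used to fetch the full row from the clustered index; a query that only needs the primary key value can skip that second lookup.
EXPLAIN SELECT * FROM jersey_test WHERE requestId = 'req-0001';
-- covered by the secondary index alone, no lookup back into the clustered index:
EXPLAIN SELECT id FROM jersey_test WHERE requestId = 'req-0001';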
Part Ⅲ Recovery Logs
1. Redo Log and Rollback Log (Redo Log & Undo Log)
In InnoDB, data consistency is guaranteed by the Redo Log, which uses the WAL (Write-Ahead Logging) mechanism: write the log first, then write the data. During crash recovery, InnoDB replays the records in the Redo Log, re-applying the logged changes so that committed transactions whose data pages had not yet been flushed are not lost. By default, the Redo Log is recorded in the ib_logfile0 and ib_logfile1 files on disk, and MySQL writes these two files in a circular fashion, so the Redo Log can become full. Here we need to introduce the checkpoint mechanism. The checkpoint records how far the system's log has been synchronized to the data files: changes before the checkpoint are already persisted, so no inconsistency can arise from them. When the Redo Log write position catches up with the checkpoint, the Redo Log is full, and writes must wait for the checkpoint to advance, that is, for data pages to be flushed and log space to be freed.
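The size and number of redo log files are configurable; a quick way to inspect the current settings (output of course varies per instance):
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';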
On the other hand, in a storage system such as a database, it is common for an update operation to fail and have to be rolled back, so this situation needs special attention, and the Undo Log is what solves it. The Undo Log records the reverse operation to apply when an update fails and must be rolled back: when you insert a record, a corresponding delete is recorded in the Undo Log, and vice versa, as sketched below.
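A conceptual sketch (the physical undo record is a binary structure, not literal SQL):
-- forward operation:
INSERT INTO jersey_test (id, requestId, type, name) VALUES (2, 'req-0002', 0, 'test');
-- logical inverse kept in the undo log, applied on rollback:
DELETE FROM jersey_test WHERE id = 2;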
2. Binlog
The Redo Log solves the consistency of local data (that is, of a single instance). However, to make the database highly available, we also need to consider multiple replicas and cross-region disaster recovery. MySQL's Binlog provides this capability. The Binlog supports the Statement, Row, and Mixed modes; among them, Row mode records the modification of each row of data, and compared with Statement mode it better guarantees the correctness of master-slave replication.
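A sketch of inspecting and switching the mode (a server-wide change normally belongs in my.cnf; the session-level statement is shown for illustration):
SHOW VARIABLES LIKE 'binlog_format';
SET SESSION binlog_format = 'ROW';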
As mentioned earlier, the Redo Log and Binlog must be used together to achieve both data consistency and high availability. Let's take a brief look at how the database handles an insert or update. MySQL uses the WAL mechanism for update operations, that is, it writes the Redo Log and Binlog first and then the data. Writing the Redo Log and the Binlog must be atomic: either both succeed or both fail, otherwise the local data and the replica data would diverge. The procedure that coordinates updating the Redo Log and the Binlog is called the two-phase commit, and its steps are:
First, write the update to the Redo Log and mark the transaction as being in the prepare state.
Then write the Binlog; the Binlog must be flushed to disk for this step to count as successful.
Finally, commit the transaction (the transaction's Undo Log also becomes eligible for cleanup at this point) and mark it as being in the commit state.
Two-phase commit guarantees the consistency of the data: recovery is possible no matter at which stage an abnormal failure occurs. If the transaction is already in the commit state, both the Redo Log and the Binlog have been updated successfully. If it is in the prepare state, recovery checks whether the Binlog contains the complete record for the transaction: if it does, the transaction is committed; if not, the whole transaction is rolled back.
Thank you for reading. This concludes the introduction to InnoDB's on-disk data structures; after studying this article you should have a deeper understanding of them, though the specifics are best verified in practice.