Preface
InnoDB sits in MySQL's storage engine layer and is integrated into the database as a plugin. Since MySQL 5.5.8, InnoDB has been the default storage engine. InnoDB supports transactions and is designed mainly for OLTP workloads. Its main features are: transaction support, a row-lock design for high concurrency, foreign keys, automatic crash recovery, and clustered-index-organized tables.
System structure
The InnoDB storage engine consists of a memory pool, background threads, and on-disk storage.
Threads
InnoDB uses a multithreaded model: several different background threads handle different tasks.
Master Thread
Master Thread is the core background thread. It is mainly responsible for asynchronously flushing data from the buffer pool to disk to keep data consistent, including flushing dirty pages, merging the insert buffer, and recycling undo pages.
IO Thread
Asynchronous IO (Async IO) is used extensively in the InnoDB storage engine to handle IO requests, and the IO threads are mainly responsible for handling the callbacks of these requests.
Purge Thread
After a transaction commits, the undo log it used may no longer be needed, so Purge Threads recycle undo pages that have been allocated and used. InnoDB supports multiple Purge Threads, which speeds up undo page recycling, makes better use of the CPU, and improves storage engine performance.
Page Cleaner Thread
Page Cleaner Thread takes over the dirty page flushing that used to be done in Master Thread. Its purpose is to lighten the load on Master Thread, reduce blocking of user query threads, and further improve the performance of the InnoDB storage engine.
Memory
The memory of the InnoDB storage engine is structured as follows.
Buffer pool
The InnoDB storage engine is disk-based and manages its records in pages. Because of the gap between CPU speed and disk speed, disk-based database systems usually use a buffer pool to improve overall performance.
In effect, the buffer pool uses the speed of memory to compensate for the impact of slow disks on database performance. When the database reads data, the page is first loaded from disk into the buffer pool; the next time the same page is read, it is served from the buffer pool, which acts as a cache.
Data modifications first change the page in the buffer pool and are then flushed to disk by a mechanism called Checkpoint.
The size of the buffer pool directly affects overall database performance. For the InnoDB storage engine it is configured with the parameter innodb_buffer_pool_size, and the current setting can be inspected with SHOW VARIABLES:
mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size'\G
*************************** 1. row ***************************
Variable_name: innodb_buffer_pool_size
        Value: 134217728
1 row in set (0.00 sec)
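Since MySQL 5.7.5 the buffer pool can also be resized online. A minimal sketch, assuming a MySQL 5.7+ server (the 2 GB target is only an illustration):

mysql> SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;   -- request a 2 GB buffer pool
mysql> SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';           -- progress of the online resize

The value is rounded up to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.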
The pages cached in the buffer pool include index pages, data pages, undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, data dictionary information, and so on. Index pages and data pages make up the largest part of the buffer pool.
Redo log buffer
When a page in the buffer pool is newer than its copy on disk, the new data needs to be flushed back. InnoDB uses a Write Ahead Log strategy: when a transaction commits, the redo log buffer is written first; the redo log buffer is flushed to the redo log file at a certain frequency, and dirty pages are later flushed to disk according to the checkpoint mechanism.
The redo log buffer does not need to be very large; 8 MB is suitable for most workloads. The redo log buffer is flushed to the redo log file in three situations:
Master Thread flushes the redo log buffer to the redo log file once per second
the redo log buffer is flushed when its remaining free space falls below half
the redo log buffer is flushed to the redo log file every time a transaction commits
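The commit-time flush behavior is controlled by innodb_flush_log_at_trx_commit, and the buffer size by innodb_log_buffer_size. A minimal sketch for checking and adjusting them (the values shown are only examples):

mysql> SHOW VARIABLES LIKE 'innodb_log_buffer_size';
mysql> SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 (default): write and flush the redo log to disk at every commit (full durability)
-- 0: write and flush roughly once per second; a crash can lose up to about one second of transactions
-- 2: write at commit but flush roughly once per second; an OS crash can lose the last second
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 1;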
Extra memory pool
In the InnoDB storage engine, memory is managed through a mechanism called the memory heap. When memory is allocated for some internal data structures, it is requested from the extra memory pool; when that pool does not have enough memory, it is requested from the buffer pool.
Locks
The locks supported by InnoDB are:
shared and exclusive locks
intention locks
record locks
gap locks
auto-increment locks
Shared lock and exclusive lock
The InnoDB engine implements two standard row-level locks: shared (S) locks and exclusive (X) locks. A shared lock allows the transaction holding it to read a row, while an exclusive lock allows the transaction holding it to write a row.
If a transaction holds a shared lock on a row, other transactions can still acquire a shared lock on that row, but not an exclusive lock. Once a transaction acquires an exclusive lock on a row, other transactions can acquire neither a shared nor an exclusive lock on it.
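A minimal two-session sketch of these compatibility rules, assuming a hypothetical table t with primary key id:

-- session A: take a shared (S) lock on the row id = 1
mysql> START TRANSACTION;
mysql> SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;   -- FOR SHARE in MySQL 8.0

-- session B: another shared lock on the same row is granted immediately
mysql> START TRANSACTION;
mysql> SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;

-- session B: an exclusive (X) lock request now blocks until session A commits or rolls back
mysql> SELECT * FROM t WHERE id = 1 FOR UPDATE;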
Intention lock
In InnoDB, an intention lock is a table-level lock that is divided into shared and exclusive locks:
intention shared lock (IS): the transaction intends to acquire a shared lock on a row
intention exclusive lock (IX): the transaction intends to acquire an exclusive lock on a row
A transaction must acquire the corresponding intention shared/exclusive lock before acquiring a shared/exclusive row lock. Intention locks do not block normal row-level operations on the table; they simply tell other transactions that this transaction intends to acquire a shared or exclusive lock on some row.
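On MySQL 8.0 the intention locks taken by a row-level request can be observed through performance_schema.data_locks. A sketch, again using the hypothetical table t:

-- session A: lock one row; InnoDB first takes an IX lock on the table, then an X lock on the record
mysql> START TRANSACTION;
mysql> SELECT * FROM t WHERE id = 1 FOR UPDATE;

-- session B: list the locks held; expect one TABLE lock in IX mode and one RECORD lock in X mode
mysql> SELECT object_name, lock_type, lock_mode, lock_data
       FROM performance_schema.data_locks;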
Record lock
A record lock is a lock on an index record: it locks the index entry of a record rather than the record itself. If the table has no index, InnoDB creates a hidden clustered index for it, and record locks then lock that hidden clustered index.
Gap lock
Like a record lock, a gap lock acts on the index, but while a record lock covers a single index record, a gap lock covers a range between index records. The only purpose of gap locks in InnoDB is to prevent other transactions from inserting into the gap, thereby preventing phantom reads.
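A sketch of the blocking behavior, assuming a hypothetical table t with an index on col and existing values 10 and 20, running under the REPEATABLE READ level:

-- session A: a locking range scan takes gap locks between the matching index records
mysql> START TRANSACTION;
mysql> SELECT * FROM t WHERE col BETWEEN 10 AND 20 FOR UPDATE;

-- session B: inserting into the locked gap blocks until session A commits or rolls back
mysql> INSERT INTO t (col) VALUES (15);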
Auto-increment lock
The auto-increment lock (AUTO-INC lock) is a special table-level lock that is taken only when inserting into tables with auto-increment columns. While one transaction is inserting, other transactions must wait for that insert statement to complete before they can acquire the lock and perform their own inserts.
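How strict this locking is depends on innodb_autoinc_lock_mode, which can be checked as follows (the meanings of the values are summarized in the comments):

mysql> SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
-- 0 (traditional): a table-level AUTO-INC lock is held until the end of the insert statement
-- 1 (consecutive): simple inserts use a lightweight mutex instead of holding the full lock
-- 2 (interleaved): no table-level AUTO-INC lock; the default from MySQL 8.0

The variable is not dynamic; it has to be set in the server configuration.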
Transactions
ACID
Transactions are the most important feature of a database used for OLTP. When talking about transactions, the four ACID properties have to be mentioned:
Atomicity: a transaction is the smallest unit of work; it either succeeds completely or fails completely
Consistency: the integrity of the database is not broken between the start and the end of a transaction
Isolation: different transactions do not affect each other; the four isolation levels are RU (Read Uncommitted), RC (Read Committed), RR (Repeatable Read), and Serializable
Durability: after a transaction commits, its modifications are permanent and are not lost even if the system fails
InnoDB's atomicity, durability, and consistency are mainly achieved through the Redo Log, the Undo Log, and the Force Log at Commit mechanism. The Redo Log is used to recover data after a crash; the Undo Log is used to undo the effects of a transaction and also supports multi-version concurrency control. The Force Log at Commit mechanism ensures that the redo log has been persisted when a transaction commits. Isolation is guaranteed by locks and MVCC.
Isolation level
In MySQL, transactions have four levels of isolation, which are:
Read Uncommitted
Read Committed
Repeatable Read
Serializable
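The isolation level can be inspected and changed per session or globally; a minimal sketch:

mysql> SELECT @@transaction_isolation;   -- @@tx_isolation on servers older than MySQL 5.7.20
mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
mysql> SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- affects new connections only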
Before looking at the four isolation levels, we need to understand three related phenomena:
Dirty read
Transaction A reads data that transaction B has not yet committed. If transaction B then rolls back for some reason, the data read by transaction A is invalid, which can lead to incorrect results.
Non-repeatable read
Within one transaction, the same data is queried several times while transaction B updates or deletes it in between; the result of each query in transaction A can therefore be different.
Phantom read
The effect of a phantom read is similar to that of a non-repeatable read. The difference is that non-repeatable reads are caused mainly by UPDATE and DELETE in other transactions, while phantom reads are caused mainly by INSERT: during the lifetime of a transaction, rows newly inserted by another transaction show up in its queries.
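A two-session sketch of a non-repeatable read under READ COMMITTED, using a hypothetical account table:

-- session A (READ COMMITTED):
mysql> START TRANSACTION;
mysql> SELECT balance FROM account WHERE id = 1;   -- returns, say, 100

-- session B:
mysql> UPDATE account SET balance = 200 WHERE id = 1;
mysql> COMMIT;

-- session A, still inside the same transaction:
mysql> SELECT balance FROM account WHERE id = 1;   -- now returns 200: a non-repeatable read

If session B had instead inserted a new row matching session A's query, the extra row appearing in a repeated query would be a phantom read.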
Read Uncommitted
Under read uncommitted, transaction A can see data that transaction B has not yet committed. If transaction B rolls back, transaction A has read dirty data; this is exactly the dirty read described above.
This isolation level is generally not recommended in MySQL InnoDB.
Read Committed
Under committed read, changes made by a transaction are invisible to other transactions until it commits. This solves the dirty read problem, but non-repeatable reads and phantom reads can still occur.
Repeatable Read
Repeatable read guarantees that reading the same record multiple times within one transaction returns consistent results. In the InnoDB storage engine, both the non-repeatable read and the phantom read problems are solved at this level.
InnoDB solves the phantom read problem with Next-Key Locks. A Next-Key Lock is the combination of a record lock and a gap lock: when InnoDB scans an index record, it places a record lock (Record Lock) on the index record and a gap lock (Gap Lock) on the adjacent gap. Once the gap is locked, other transactions cannot insert new records into it.
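A sketch of the effect under REPEATABLE READ, again with the hypothetical table t indexed on col:

-- session A: the locking scan takes next-key locks (record locks plus the adjacent gaps)
mysql> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
mysql> START TRANSACTION;
mysql> SELECT * FROM t WHERE col >= 10 AND col < 20 FOR UPDATE;

-- session B: an insert into the locked range blocks, so session A can repeat its query
-- without ever seeing a phantom row
mysql> INSERT INTO t (col) VALUES (15);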
Serializable
Serializable is the highest isolation level. It avoids phantom reads by forcing transactions to execute serially, but it locks every row that is read, which can lead to a large number of timeouts and lock contention, so concurrency drops sharply. It is also not generally recommended with MySQL InnoDB.
Starting a transaction
BEGIN, BEGIN WORK, START TRANSACTION
Executing BEGIN does not actually start a new transaction at the engine level; it only sets a flag on the current thread to mark an explicitly opened transaction.
START TRANSACTION READ ONLY
When a read-only transaction is started, MySQL Server rejects any SQL statement that modifies data and returns an error directly, without ever entering the engine layer.
START TRANSACTION READ WRITE
Allows a super user to start a read-write transaction even when the current thread's read-only flag is true.
START TRANSACTION WITH CONSISTENT SNAPSHOT
Starting a transaction this way enters the engine layer and creates a read view immediately. It only takes effect under the RR isolation level.
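A short sketch of the read-only and snapshot variants, using the hypothetical table t (the exact error text may differ between versions):

mysql> START TRANSACTION READ ONLY;
mysql> SELECT * FROM t WHERE id = 1;             -- reads are allowed
mysql> UPDATE t SET name = 'x' WHERE id = 1;     -- rejected before reaching the engine layer
ERROR 1792 (25006): Cannot execute statement in a READ ONLY transaction.
mysql> COMMIT;

mysql> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;   -- read view created now, not at the first read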
Undo log
When data is modified, a corresponding undo log record is written; if the transaction fails or is rolled back, the recorded undo log can be used to roll the change back. The undo log is a logical log that records the before-image of the data. If a row being modified also needs to be read concurrently, its previous version can be reconstructed from the row's version information and the undo records. In addition, writing the undo log itself generates redo log, because the undo log is also protected for durability.
Transaction commit
Transaction commit proceeds roughly as follows:
1. Use the global transaction ID generator to produce the transaction no, and add the current connection's transaction handle (trx_t) to the global committing-transaction list (trx_serial_list).
2. Mark the undo. If the transaction used only one undo page and less than 3/4 of that page, the page is marked TRX_UNDO_CACHED; otherwise an insert undo is marked TRX_UNDO_TO_FREE and an update undo is marked TRX_UNDO_TO_PURGE. Undo marked TRX_UNDO_CACHED is reused by the engine.
3. Put the update undo into the history list of its undo segment and increment the global rseg_history_len. At the same time update TRX_UNDO_TRX_NO on the page, and if the record was deleted, reset its delete_mark. Remove the update undo from update_undo_list; if it is marked TRX_UNDO_CACHED, add it to the update_undo_cached queue.
4. mtr_commit: the undo/redo log is written to the public log buffer. At this point the transaction is committed at the file level; even if a crash happens now, the transaction can still be committed after restart.
5. Finally, update the in-memory state (trx_commit_in_memory). A read-only transaction simply removes its read view from the global read view list and resets the fields in trx_t. A read-write transaction first sets its state to TRX_STATE_COMMITTED_IN_MEMORY, releases all row locks, removes trx_t from rw_trx_list and its read view from the global read view list; an insert undo is removed here, and if there is an update undo the purge thread is woken up for garbage collection. The fields in trx_t are then reset so the object can be reused by the next transaction.
Rollback
If it is a read-only transaction, return directly. Otherwise determine whether the whole transaction or only part of it is to be rolled back; for a partial rollback, record how many undo log records must be kept. Then take the most recent record from the update undo and insert undo logs and roll it back: for an update undo, clear the delete mark if the record was delete-marked and roll the updated data back to the older version; for an insert undo, simply delete the clustered index and secondary index entries. When all undo records have been rolled back, or rollback has reached the specified undo record, stop and delete the processed undo log.
Indexes
The InnoDB engine uses a B+ tree as its index structure. The leaf nodes of the primary key index store the complete row data, while the leaf nodes of secondary (non-primary-key) indexes store the value of the primary key.
In the InnoDB primary index (which is also the data file), the leaf nodes contain the complete data records; such an index is called a clustered index. Because the InnoDB data file itself is clustered by primary key, InnoDB requires every table to have a primary key. If one is not explicitly specified, MySQL automatically selects a column that can uniquely identify each record as the primary key; if no such column exists, MySQL automatically generates a hidden 6-byte field of type long integer for the InnoDB table to serve as the primary key.
The data field of an InnoDB secondary index stores the primary key value of the corresponding record rather than its address; in other words, all InnoDB secondary indexes reference the primary key. This clustered index design makes lookups by primary key very efficient, but a secondary index lookup requires two index searches: first the secondary index is searched to obtain the primary key, and then the primary key is used to find the record in the clustered index.
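A sketch of the two-step lookup with a hypothetical table (the names are chosen only for illustration):

mysql> CREATE TABLE user (
    ->   id   INT PRIMARY KEY,        -- clustered index: leaf nodes hold the full row
    ->   name VARCHAR(50),
    ->   age  INT,
    ->   KEY idx_name (name)          -- secondary index: leaf nodes hold (name, id)
    -> ) ENGINE = InnoDB;

-- searches idx_name for the primary key, then goes back to the clustered index to fetch age
mysql> SELECT age FROM user WHERE name = 'Tom';

-- covering query: id is already stored in the secondary index entry,
-- so the clustered index is never touched (EXPLAIN shows "Using index")
mysql> SELECT id FROM user WHERE name = 'Tom';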
End
This article covers only a small part of MySQL InnoDB's many features. Interested readers can turn to "MySQL Internals: InnoDB Storage Engine" to learn more.
That's all for this article. I hope it is of some reference and learning value for your study or work. If you have any questions, feel free to leave a message. Thank you for your support.