An overall summary of the concept of InnoDB storage engine 07/08 Update SLTechnology News&Howtos

An overall summary of the concept of InnoDB storage engine

2025-07-08 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/03 Report--

This article mainly introduces the overall summary of the concept of InnoDB storage engine, which involves things, learned from the theoretical knowledge, there are many books, literature for your reference, from a practical point of view, accumulated years of practical experience can be shared with you.

MySQL corresponds to InnoDB version

MySQL 5.1 "InnoDB 1.0.X

MySQL 5.5 "InnoDB 1.1.X

MySQL 5.6 "InnoDB 1.2.X

Background thread

1.Master Thread

It is responsible for asynchronously refreshing the data in the buffer pool to the disk to ensure the consistency of the data, including refreshing dirty pages, merging insert buffers, and recycling undo pages.

2.IO Thread

AIO (Async IO) is widely used in the innodb storage engine to process write IO requests to improve the concurrent performance of the database. There are four types of IO threads: insert buffer thread, log thread, read thread, write thread. Read thread and write thread each have four threads, which can be configured through innodb_read_io_threads and innodb_write_io_threads.

SHOW VARIABLES LIKE 'innodb_%io_threads' or SHOW ENGINE INNODB STATUS\ G

3.Purge Thread thread

The purge Thread thread is used to recycle the undo pages allocated after the transaction is committed, which is turned on by default, and multiple Purge Thread threads can be configured through innodb_purge_threads=1.

Show variables like 'innodb_purge_threads'; is configured with 2 Purge thread. You can only modify the configuration of the configuration file, not innodb_purge_threads=2 online.

4.Page Cleaner Thread

It is used to recycle dirty pages generated by delete and update operations in multi-version control functions, and to perform flushing dirty pages to disk.

5.Binlog Dump thread

When replication is configured, a binlog Dump thread is generated on the primary cloud server to read the binary modification record.

6.lock thread

For lock control and deadlock detection

Memory

Do not understand that there is only innodb buffer in memory, including redo log buffering and additional memory pools (it is not known whether join buffer, order buffer, key buffer, table cache buffer, etc., are in the buffer pool or independent of the buffer pool in memory)

1. Cache pool

The cached data mainly includes data page, index page, redo log page (undolog), node information, system data, insert buffer, adaptive hash index, data dictionary, lock information, etc.

Check the size of the buffer pool (in bytes). Conversion to MB requires / 1024/1024show variables like 'innodb_buffer_pool_size'; default innodb has 8 buffer pools. You can query show engine innodb status\ G by configuring innodb_buffer_pool_instances, or SELECT * FROM information_schema.innodb_buffer_pool_status

Read operation:

The data is stored in pages, and many data pages are cached in the buffer pool. when the first read, the page is first read from the disk to the cache pool. if the next time you read the same data page in the cache pool, it is also in the buffer pool to read directly from the buffer pool without the need to read to the disk, the best way is to cache all the disk data in the buffer pool, but only if the memory is large enough.

Modify operation

The innodb storage engine also modifies the data pages in the buffer pool first (if they exist), and then refreshes the data files to disk according to a certain frequency, which involves the checkpoint mechanism.

Insert operation (insert buffer)

Because the data is arranged in the order of the clustered index, all inserts for the clustered index are generally very fast, while the insertion of the non-clustered index is not necessarily sequential. At this time, discrete access to the nonclustered index page is required. The performance of the insert is often very poor. One exception may be the time field of the nonclustered index, and the time is often sequential, which is relatively fast. Insert buffering is introduced for this case of nonclustered indexes.

Insert buffer (insert buffer) is introduced in innodb. Insert buffer only targets non-unique nonclustered indexes. Inserting and updating nonclustered indexes is not directly inserted into the index file every time, but first determines whether the inserted nonclustered index page exists in the buffer pool. If so, it is directly inserted into the nonclustered index file of the buffer pool, otherwise it is first placed in an insert buffer object. But it gives people the impression that it has been inserted into the index file, but in fact it is not, and then inserted into the index file at a certain frequency. In this process, if there are multiple identical index pages inserted, it will be merged and inserted, which greatly improves the insertion performance of nonclustered indexes.

Because each insert is first inserted into the buffer pool without looking for the index page to determine the uniqueness of the record, because to make the judgment requires discrete lookup, so the insert buffer is not aimed at the unique nonclustered index.

In the case of intensive write operations, inserting buffers will take up too much memory of the buffer pool, which can account for up to 50% by default. If the IBUF_POOL_SIZE_PER_MAX_SIZE=2 in the source code is modified to 3, the maximum memory of the buffer pool of 1max 3 can only be used.

2.LRU List 、 Free List 、 Flush List

The default size of the pages in the innodb buffer pool is 16KB, and the buffer pool is managed by the LRU (the latest Latest Recent Used rarely used) algorithm. The most frequently used pages are placed at the front of the LRU list, while the recently rarely used pages are placed at the end. When there is insufficient space in the cache pool, the pages at the end will be deleted first to free up space. LRU has a midpoint location, which defaults to 37% of LRU. The left side represents the old list and the right side represents the new list (hotspot data). The newly inserted page in the buffer pool is first placed in the midpoint position. If the newly inserted page is moved to the new list, it may cause some activities in the new list to be removed from the old list. For example, a table scan operation may need to access a lot of data pages at one time, and these data pages may be rarely used in the future, when will the newly inserted pages be put into the new list? in order to solve this problem, innodb introduced the innodb_old_blocks_time parameter, which is used to control how long the newly inserted data page is added to the new list after the mid location.

Check the location of midpoint. If you think the hot data space needs more, you can set this value to small show variables like 'innodb_old_blocks_pct' query innodb_old_blocks_ time (in milliseconds). The default is 1000 milliseconds, that is, 1 second show variables like' innodb_old_blocks_time'.

View the information of all the pages in the buffer pool, including free pages, all data pages * 16KB is actually the total size of the buffer pool. Select * from information_schema.INNODB_BUFFER_PAGE Check the LUR list information, including new list and old list but not free list, the fields in the table record the current data page information, including buffer pool ID, page type (data page, index page, undo log, other), table name, index name, whether the page is old list, whether it belongs to compressed page (can compress the original 16K page to 1K, 2K, 4K, 8K), the size of the compressed page, whether it belongs to dirty page. Select POOL_ID,LRU_POSITION,SPACE,PAGE_TYPE,FLUSH_TYPE,NEWEST_MODIFICATION,OLDEST_MODIFICATION,INDEX_NAME,DATA_SIZE,COMPRESSED_SIZE,COMPRESSED,IS_OLD from information_schema.INNODB_BUFFER_PAGE_LRU;OLDEST_MODIFICATION > 0 indicates the number of dirty pages, that is, (modified db pages) IS_OLD='YES' represents OLD List pages, COMPRESSED0 represents compressed pages.

Flush list: the value is the dirty page in LRU, and flust list exists in New List, that is, OLDEST_MODIFICATION > 0 (modified db pages)

3. Log buffer (log buffer): corresponding to innodb log files

View the redo log buffer show variables like 'innodb_log_buffer_size%'

The InnoDB storage engine first puts the redo log information into the redo log buffer, and then flushes it to the redo log file at a certain frequency. The default buffer size is 8m, which can basically meet the requirements, and there is no need to configure too large redo log buffers.

Refresh mechanism:

1.Master Thread flushes the redo log buffer to the redo log file every second

two。 The redo log buffer is flushed to the redo log file when each transaction is committed

3. When the remaining space of the redo log buffer pool is less than 1 / 2

Note: the size of the innodb_log_buffer_size should be larger than the maximum transaction size, otherwise the transaction needs to be refreshed before the innodb_log_buffer_size is full, which will cause a transaction to perform disk log refresh operations multiple times, resulting in low efficiency.

4. Extra memory

Normally, the memory used by our server MySQL process is larger than that of the configured InnoDB buffer pool, because MySQL needs part of the memory to control some resource information inside the buffer pool, such as LRU, lock resources, waiting, etc., in addition to the memory cached in the buffer pool.

CheckPoint mechanism

In order to solve the problem of CPU and disk direct speed, the buffer pool is used, so the operation of the data is completed in the buffer pool first. The data pages in the buffer pool are often newer than the data pages on the disk. The data pages that have been modified in the buffer pool but have not yet been applied to the disk are called "dirty pages". The data pages eventually need to be updated to the disk, which will involve the CheckPoint mechanism.

At the same time, in order to solve the problem of the loss of dirty pages in the buffer pool that have not yet had time to flush to disk due to a sudden server outage, redo log files are added (2 redo log files are configured by default, the default name is ib_logfile, the default size of redo log files is 48m, and the two redo log files are written in a loop), and the redo log is written first when the transaction is committed. When a server outage occurs, you can complete the recovery by redoing the log (you will recover by default after the server restarts), so make sure that the redo log file has free space. The default mechanism is to refresh some dirty pages to disk and empty the corresponding redo log space when the redo log file space reaches 75%-90%.

How many pages are refreshed to disk at a time:

Sharp Checkpoint: refresh all dirty pages back to disk when the database is closed. Default method. Parameter: innodb_fast_shutdown=1.

Fuzzy Checkpoint: refresh some dirty pages, which are divided into the following four situations

1.Master Thread Checkpoint

Master Thread flushes dirty pages back to disk from the buffer pool every few seconds

2.FLUSH_LRU_LIST CheckPoint

After version 5.6, you need to ensure that there are 1024 available pages in LRU by default. If less than 1024 pages are available, refresh some dirty pages back to disk and configure them with the parameter "innodb_lru_scan_dapth".

3.Async/Sync Flush Checkpoint

It refers to the synchronous or asynchronous refresh of dirty pages back to disk caused by insufficient redo log file space. Asynchronous refresh is triggered when the redo log space reaches 75%. Synchronous refresh is triggered if the redo log space is more than 90%. Synchronous refresh operation is generally not triggered unless the redo log file is too small and the BULK INSERT operation of LOAD DATA is performed.

4.Dirty Page too much

Ensure the proportion of dirty pages in the buffer pool. When the proportion of dirty pages in the buffer pool reaches 75%, the operation of refreshing dirty pages is triggered and configured with the parameter "innodb_max_dirty_pages_pct".

After reading the overall introduction of the concept of InnoDB storage engine above, I hope it can bring some help to everyone in practical application. Due to the limited space in this article, there will inevitably be deficiencies and areas that need to be supplemented. You can continue to pay attention to the industry information section and will update your industry news and knowledge regularly. If you need more professional answers, you can contact us on the official website for 24-hour pre-sales and after-sales to help you answer questions at any time.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.