This article introduces doublewrite, one of the key features of the InnoDB storage engine: what problem it solves, how it is structured, and how it behaves during crash recovery. Many DBAs run into partial-write failures in practice, so I hope you read carefully and get something out of it.
I. The classic partial page write problem
Before introducing doublewrite, it is necessary to understand the partial page write (partial write failure) problem.
InnoDB's page size is 16KB by default, and its data checksum is calculated over that entire 16KB. Data is written to disk in units of pages, but the file system and the underlying disk typically only guarantee atomicity for smaller blocks, commonly 4KB. So if the server loses power or the OS crashes in the middle of writing a 16KB page, only part of it, say the first 4KB, may actually reach disk. This torn state is the partial page write problem.
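As a quick check, you can confirm the page size on a running instance; innodb_page_size is a read-only variable fixed when the instance was initialized:

mysql> SHOW VARIABLES LIKE 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+

The 16384 shown is the default; comparing it to the typical 4KB file system block size makes it clear why a single page write is not atomic.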
An experienced DBA might think that if a write fails, MySQL can simply recover from the redo log. That is partly true, but it must be clearly understood that the redo log records physical changes to a page, such as "write 'aaaa' at offset 800". If the page itself is already torn, replaying such a record is pointless. During recovery, MySQL validates each page's checksum, and part of that validation checks the LSN of the last change recorded in the page. When a partial page write has occurred, the page is corrupt, the checksum fails, and the transaction information in the page cannot be trusted. In InnoDB's view such a data page cannot pass checksum validation and cannot be recovered. Even if we forced it past verification, crash recovery would still fail, because some of the log record types in InnoDB describe logical operations, and those are not idempotent.
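The checksum algorithm the pages carry is itself configurable; as a minimal illustration (crc32 is the default on recent MySQL versions, other values include innodb and none):

mysql> SHOW VARIABLES LIKE 'innodb_checksum_algorithm';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| innodb_checksum_algorithm | crc32 |
+---------------------------+-------+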
To solve this problem, InnoDB implements the doublewrite buffer. In short, before writing a data page to its final location, InnoDB first writes it to a separate, contiguous area in the system tablespace (ibdata), and only then writes it to the data file. This way, if a data page is torn when the server restarts after a crash, recovery can first restore the page from its copy in the doublewrite area and then apply the redo log to it; this is what "double write" means. Doublewrite is what gives InnoDB data pages their reliability, and the analysis below should give a full picture of how it achieves that.
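You can verify that the feature is active on your own instance with a minimal check (ON is the default):

mysql> SHOW VARIABLES LIKE 'innodb_doublewrite';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| innodb_doublewrite | ON    |
+--------------------+-------+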
II. Double write architecture and workflow
Doublewrite consists of two parts: a doublewrite buffer in InnoDB memory, 2MB in size, and a 2MB area in the ibdata system tablespace on the physical disk, 128 contiguous pages forming two extents. Of these 128 pages, 120 are used for batch flushing of dirty pages and the other 8 for Single Page Flush. The two are separated because batch flushing is done by background threads and does not block foreground threads, whereas a single page flush is initiated by a user thread that needs a clean page freed up and reusable as soon as possible.
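As a hedged aside, the layout described above is the classic 5.x one; in MySQL 8.0.20 and later the doublewrite area moved out of ibdata into dedicated files, and the sizing became tunable. On such a version you can inspect the knobs with:

mysql> SHOW VARIABLES LIKE 'innodb_doublewrite%';
-- on 8.0.20+ expect innodb_doublewrite, innodb_doublewrite_dir,
-- innodb_doublewrite_files, innodb_doublewrite_pages and
-- innodb_doublewrite_batch_size among the results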
For batch flushing, each time a page eligible for flushing is found, an S lock is taken on it and the page is copied into the dblwr. When the dblwr fills up, all the pages in it are flushed to ibdata, and note that this is a synchronous write; the background IO threads are then woken up to write the data pages to their real locations. When a background IO thread completes its write, it decrements the count in the dblwr to free space, releases the S lock on the block, and the write is complete.
For Single Page Flush, the write is done synchronously: after a flushable page is picked, it is added to the dblwr, flushed to ibdata, and then written to the user tablespace. When that finishes, an fsync is performed on the user tablespace.
Single Page Flush is triggered when there are not enough free pages in the buffer pool, and it is usually initiated by a foreground thread. Because every single page flush causes an fsync, a large number of threads doing these flushes under heavy concurrency causes a serious performance drop. Percona Server optimized this in version 5.6: a background LRU manager thread can do the flushing ahead of time so that user threads are not dragged into it.
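One related way to keep user threads out of single page flushes on stock MySQL is to let the background cleaners keep more free pages ready. A sketch assuming MySQL 5.7 or later (the values are illustrative, not recommendations):

mysql> SET GLOBAL innodb_lru_scan_depth = 2048;  -- pages scanned per buffer pool instance
-- innodb_page_cleaners is not dynamic; it must be set in my.cnf before startup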
In an extreme situation (such as a power outage), if InnoDB finds on restart that a page has been corrupted, the data can be recovered from the doublewrite area.
The double write workflow is as follows:
When one of the various triggers (the master thread, a checkpoint, and so on) causes dirty pages in the buffer pool to be flushed to the data files, they are not written straight to disk. Instead, each page is first copied into the in-memory doublewrite buffer via the memcpy function, then written from the doublewrite buffer to the shared tablespace on the physical disk in two sequential chunks of 1MB each, immediately followed by an fsync to force them to disk. Because the doublewrite pages are stored contiguously, this is sequential disk I/O and performs well. Only after the doublewrite completes are the dirty pages written to their actual tablespace files, and those writes are discrete. The cooperation of the modules is shown below (strictly speaking, the first step is that the dirty page generates redo records in the log buffer, and the log buffer is then written to the redo log file; the diagram connects them directly to simplify those intermediate steps):
To view the operation of doublewrite, you can execute the command:
mysql> show status like '%InnoDB_dblwr%';
+----------------------------+-----------+
| Variable_name              | Value     |
+----------------------------+-----------+
| Innodb_dblwr_pages_written | 216261751 |
| Innodb_dblwr_writes        | 43307580  |
+----------------------------+-----------+
The output above shows that doublewrite has written a total of 216261751 pages in 43307580 writes, and we can analyze this pair of numbers. As mentioned earlier, once doublewrite is enabled, every dirty page flush goes through doublewrite first, and doublewrite exists on disk as two contiguous extents, each made up of contiguous pages. An extent holds at most 64 pages, so a single I/O should be able to write up to 64 pages. On my system the ratio of Innodb_dblwr_pages_written to Innodb_dblwr_writes works out to about 5 pages per write, far below 64, so from this angle too we can see that the write pressure on this system is not high.
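That pages-per-write ratio can be computed on the server itself; a sketch assuming MySQL 5.7+, where these counters are exposed in performance_schema (on 5.6, read SHOW GLOBAL STATUS instead):

mysql> SELECT
    ->   (SELECT variable_value FROM performance_schema.global_status
    ->    WHERE variable_name = 'Innodb_dblwr_pages_written') /
    ->   (SELECT variable_value FROM performance_schema.global_status
    ->    WHERE variable_name = 'Innodb_dblwr_writes') AS pages_per_write;

A result much lower than 64 means the flush batches are small and the write pressure is light.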
If the operating system crashes while a page is being written to disk, then during recovery the InnoDB storage engine can find a copy of the page in the doublewrite area of the shared tablespace, copy it over the torn page in the tablespace file, and then apply the redo log. The following log shows a recovery that used doublewrite:
090924 11:36:32 mysqld restarted
090924 11:36:33 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Crash recovery may have failed for some .ibd files!
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
III. The shortcomings of double write
The dblwr area in the shared tablespace is, in effect, one extra write: every data page must be written twice. Because this requires a large number of fsync operations, it lowers MySQL's overall performance, but nowhere near by 50%. This is mainly because:
1) The doublewrite area is a contiguous region of storage, so the disk writes it sequentially rather than randomly, which performs much better.
2) When the data is written from the doublewrite buffer to the actual tablespace, the system can merge writes to contiguous space, flushing multiple pages in a single I/O.
Doublewrite is enabled by default. Although the startup parameter skip_innodb_doublewrite can disable the feature, it is strongly recommended that you keep doublewrite on to avoid partial write failures. Of course, if your data tablespaces sit on a file system that already provides its own protection against partial writes, such as ZFS, FusionIO, or DirectFS, you can disable doublewrite; a sketch of how to do so is shown below.
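A minimal sketch of disabling it at startup, assuming you have verified your file system's atomic write guarantees (the option is not dynamic, so it goes in the option file):

# my.cnf
[mysqld]
skip_innodb_doublewrite

mysql> SHOW VARIABLES LIKE 'innodb_doublewrite';  -- should now report OFF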
IV. How does double write work during recovery
If the write into the doublewrite buffer itself fails, the data has not yet been written to the data files on disk; InnoDB loads the original page from disk, recomputes the correct contents from its transaction (redo) log, and writes the page through doublewrite again.
If the doublewrite buffer was written successfully but the write to the data file failed, InnoDB does not need to recompute anything from the transaction log; it simply writes the page again from the doublewrite copy. As shown in the figure above, during recovery InnoDB compares the checksum of each page, and if it does not match, the storage engine finds the recent copy of the page in the doublewrite area of the shared tablespace, copies it into the tablespace file, and then applies the redo log to complete recovery. Because a copy always exists, there is no need to worry about whether the data page in the tablespace is corrupt, although InnoDB recovery can still take quite a long time.
This concludes the discussion of doublewrite, one of the key features of InnoDB. Thank you for reading.