Overview of MySQL8.0 redo log optimization and introduction of threading model 07/02 Update SLTechnology News&Howtos

Overview of MySQL8.0 redo log optimization and introduction of threading model

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "Overview of MySQL8.0 redo log optimization and introduction of threading model". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

1 brief Review of MySQL redo_log

Write performance in MySQL 5.7will be limited by the synchronous operation of redo_log, especially in the case of multiple cpu and fast storage devices, the design of MySQL 5.7redo_log cannot effectively take advantage of storage device performance, so it needs to be redesigned. The bottleneck of MySQL 5.7is that mtr will add a log_sys_t::mutex lock when writing log to log buffer, and then when the mtr adds dirty page to flush list, it will add a log_sys_t::flush_order_mutex lock to ensure global order. If the mtr of other user threads wants to write log to log buffer, they need to wait for log_sys_t::mutex, and in order to ensure that the global lsn is orderly, the mtr of other user threads needs to wait for log_sys_t::flush_order_mutex locks even if they want to add data to other flush list. These two locks are the main performance bottlenecks of redo_log in MySQL 5.7.

2 Overview of redo log optimization

Currently, redo log is a lock-free fully asynchronous design, and its process architecture diagram is as follows:

As shown in the figure above, redo log has four asynchronous worker threads and two asynchronous worker threads: log_writer, log_flusher, log_flush_notifier, log_write_notifier, log_checkpointer,log_close,log_flush_notifier / log_write_notifier is the log notifier thread group in the figure, and the worker thread is log_checkpointer,log_ closer.

Log_writer: responsible for writing logs from log buffer to disk and promoting write_lsn (atomic data)

Log_flusher: responsible for fsync and promoting flushed_to_disk_lsn (Atomic data)

Log_write_notifier: listen to the write_lsn and wake up the user thread waiting for the log to drop the disk (according to the flush_log_at_trx_commit setting, the user commit operation will wait for the write_lsn to advance)

Log_flush_notifier: listens to flushed_to_disk_lsn and wakes up user threads waiting for log fsync.

Log_closer:1, clean up all redo_log-related lsn\ log buffer-related data structures during normal exit; 2. Clean up the old data of recent_closer on a regular basis (detailed later for recent_closer)

Log_checkpointer: check checkpoint regularly, push checkpoint according to flush list dirty page, release log buffer, etc.

1 thread synchronization data structure

Asynchronous threads synchronize data through two data structures: atomic read and write (atomic) and Link_buf. Atomic read and write is a new feature of cantilever 11 for multicore cpu, which reads and writes 64-bit data without locking.

Link_buf is a new data structure implemented by MySQL. Logically, it is a circular array, the array subscript represents start lsn, the data content is lsn length, and the data type is atomic. If the data content (lsn length) is 0, the slot is empty. Through reasonable std::atomic_thread_fence (std::memory_order_release) / std::atomic_thread_fence (std::memory_order_acquire) operations, each ensures lock-free concurrency of lsn reads and writes of different slot inside.

3 mtr proc

3.1 process

In order to solve the competition between mtr and redo log of multiple user threads, MySQL introduces two examples of Link_buf: recent_write / recent_close. The data type of these two instances is Link_buf introduced above. Different Mtra threads and log_writer threads can read and write different slot of recent_write / recent_close without lock. The recent_write / recent_close linked list is fixed in length and is determined by innodb_log_recent_closed_size and innodb_log_recent_written_size. The process of mtr is as follows:

Mtr obtains the written through sn and log_len

Start_lsn/end_lsn

Log_buffer_reserve (* log_sys, len)

Memcpy each block of mtr to log_buffer

Push the lsn location of recent_write to end_lsn

Masking implicate-> m_log.for_each_block (write_log)

Check whether the recent_close has space. If the recent_close has no space to store the lsn corresponding to the current mtr because the slot corresponding to the earliest lsn is empty (indicating that the slot is waiting for data to be filled), then the current mtr needs to wait until the earliest slot advance of the recent_close

Log_wait_for_space_in_log_recent_closed (* log_sys, handle.start_lsn)

Add dirty block to flush list

Add_dirty_blocks_to_flush_list (handle.start_lsn, handle.end_lsn)

Writes the current lsn to recent_close

Log_buffer_close (* log_sys, handle)

3.2 Analysis

The conflict between log buffer and flush list among the mtr of each user thread is resolved through the reading and writing of recent_write / recent_close.

1. According to the lsn allocated in step 1, multiple mtr can write log_buffer concurrently without conflict. However, due to concurrent writes, the mtr with smaller lsn (the mtr that applied for lsn earlier) may not be able to memcpy earlier, so there will be a hole in log_buffer at some point in time. The resolution of the void will be described in detail later.

2. Each mtr writes the data to the log_buffer as the producer, the log_writer takes the data out as the consumer, and the log_writer obtains the reading position of the data log_buffer through the recent_write. Because of the lock-free design of the recent_write, there will be no lock waiting process between the mtr and the log_writer.

Log_writer must guarantee continuous log writes, but when analyzed in 1, concurrent mtr may cause log_buffer holes. So recent_write provides the method advance_tail_until, which advances the array to the place where the first slot storage value is 0, and the subscript of that slot corresponds to lsn, where the first hole occurs. Log_writer writes logs from write_lsn (last write disk location) to the corresponding lsn of that slot from log_buffer to disk. This mechanism ensures continuous log writes in log_writer, replacing the log_sys_t::mutex lock in MySQL 5.7.

3. Due to the concurrent writing of multiple mtr, the data added to each flush list is not globally ordered, but only the local order of each flush, so it is necessary to determine a lsn to ensure that all dirty page before the lsn ends flush. In response to this limitation, mtr threads are synchronized through recent_close. Due to the fixed length of the recent_close, if the distance between the smallest lsn in the recent_close and the currently applied lsn is greater than the len (recent_close), you will need to wait, and the promotion of the minimum lsn of the recent_close will be carried out by log_closer (more on later). Therefore, the maximum possible out-of-order length can be guaranteed to be len (recent_close). Once the block enters the flush list, the corresponding lsn minus the lsn at the len (recent_close) position must have been brushed, and checkpoint can be performed at this point. This mechanism ensures that a newer lsn (larger lsn) can be found, while ensuring the freshness of the checkpoint point, which is used to replace the log_sys_t::flush_order_mutex lock in MySQL 5.7.

4 redo log threading model

The redo log thread model is shown in the following figure:

4.1 log_writer

Log_writer is responsible for brushing data from log_buffer to disk. The process is as follows:

Push the tail of recent_write to the largest continuous lsn and get the value of lsn (ready_lsn)

/ * Advance lsn up to which data is ready in log buffer. , /

(void) log_advance_ready_for_write_lsn (log)

Ready_lsn = log_buffer_ready_for_write_lsn (log)

The log between write_lsn and ready_lsn is brushed from log_buffer to FS cache when the data is on the disk.

Write_blocks (log, write_buf, write_size, real_offset)

Push write_lsn, push write_lsn to ready_lsn

Const lsn_t new_write_lsn = start_lsn + lsn_advance

Ut_a (new_write_lsn > log.write_lsn.load ())

Log.write_lsn.store (new_write_lsn)

The whole process of log_writer only synchronizes the continuous log location of log_buffer through recent_write, and polls recent_write using spin lock + pthread_cond_timedwait mode. Previously, the design write operation of MySQL (5.7) was carried out synchronously in mtr, and the write strategy was relatively single, and the mtr write length (too large or too small for the file system) was not suitable, and must be written. If small data is written continuously, it will cause serious IO waste. The modified write strategy can be further optimized, and the write block size and whether or not to do batch can be optimized in log_writer.

4.2 log_flusher

Log_flusher is responsible for fsync the data to disk and pushing flush_up_to_lsn. Only write_lsn is used to synchronize the flushing position between log_flusher and log_writer, and the two threads perform flushing and fsnyc according to their respective speeds. The data synchronization between them is carried out in the OS/FS layer, and there is no lock in user mode.

4.3 log_notifier

Log_notifier includes log_write_notifier and log_flush_notifier, which periodically poll (and wake up for acceleration) their respective lsn locations: log_write_notifier cares about write_lsn,log_flush_notifier cares about flush_up_to_lsn. The waiting user thread will be awakened according to the latest LSN value. The user thread commit and other operations will wait on the log_write_up_to function, and when the flush_up_to_lsn/write_lsn advances to the waiting position, log_write_up_to returns.

4.4 log_closer

Log_closer has two functions:

Push the recent_close regularly to remove the continuous lsn fragments that have been put into the flush list in the recent_close to ensure that the mtr will not keep waiting because there is no room for the recent_close in step 4.

When the database exits normally, do the finishing touches and put the logs in redo_log into flush list.

4.5 New parameters

In the same way as innodb_log_writer_spin_delay/innodb_log_writer_timeout, the closer thread adjusts the polling speed corresponding to the two parameters inno_log_closer_spin_delay/innodb_log_closer_timeout.

4.6 log_checkpointer

Log_checkpointer monitors all flush list, selects the smallest (oldest) lsn of all flush lists, compares this lsn with flushed_to_disk_lsn, etc., and sets a new last_checkpoint_lsn. This saves checkpoint in the main process, changing the original checkpoint in 7 seconds in the main process to once in log_checkpointer1 seconds, saving the time of database crash recovery to a certain extent.

4.7 parameters

1. Innodb_log_xxx_spin_delay & & innodb_log_xxx_timeout

For log threads other than log_checkpointer, each thread sets its own polling parameters: innodb_log_xxx_spin_delay and innodb_log_xxx_timeout

The functions that control the frequency of polling by log threads are:

Template

Inline static Wait_stats os_event_wait_for (os_event_t & event)

Uint64_t spins_limit

Uint64_t timeout

Condition condition = {})

The log thread first waits for innodb_log_xxx_spin_delay (spins_limit parameter) cpu pause instructions, and if the event does not arrive (condition function returns false), it starts to wait for a specific semaphore event. Sleep k innodb_log_xxx_timeout (timeout parameter) us,k increases exponentially with the increase of continuous waiting times, and the total sleep time is limited to 100ms.

The list of parameter names is as follows

2. Innodb_log_spin_cpu_abs_lwm & & innodb_log_spin_cpu_pct_hwm

Innodb_log_spin_cpu_abs_lwm and innodb_log_spin_cpu_pct_hwm are used to control the use of spin lock in os_event_wait_for. Innodb_log_spin_cpu_abs_lwm indicates the lower threshold of using spin lock, and cpu below this threshold indicates that the system is idle and does not need to use spin lock. A cpu higher than innodb_log_spin_cpu_pct_hwm indicates that the system is particularly busy and spin lock is not allowed. Innodb_log_spin_cpu_abs_lwm means the utilization rate of a cpu, for example, 40core cpu,innodb_log_spin_cpu_abs_lwm is 80. The total utilization rate of cpu on MySQLd is less than 80%, and no spin is performed. Innodb_log_spin_cpu_pct_hwm indicates the total utilization of all cpu, for example, 40core cpu,innodb_log_spin_cpu_pct_hwm is 50. When the cpu utilization is greater than 2000%, no spin is performed.

5 conclusion

The optimization of redo_log resolves the previous lock conflict, and the "equal lock-write" mechanism of the user thread is transformed into the "write buffer-view write disk" mechanism. The user thread does not flush the disk, but the background thread does it uniformly. After the write buffer, the user thread can do other operations, so as to achieve the parallelism of log disk writing and mtr doing other things, and improve efficiency.

This is the end of the introduction to "Overview of MySQL8.0 redo log Optimization and threading Model". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.