2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Shulou(Shulou.com)05/31 Report--
This article explains the transaction commit process in the MySQL layer. The method introduced is simple and practical; interested readers may wish to follow along and learn how a transaction is committed in the MySQL layer.
This section explains the detailed commit process in the MySQL layer. Due to limited space it cannot cover every step, only the important ones I have studied. We first need to state the parameter settings, because some parameters directly affect the commit process; the meaning of each is explained below. Most of the content focuses on the functions MYSQL_BIN_LOG::prepare and MYSQL_BIN_LOG::ordered_commit.
I. Parameter setting
This section assumes that the parameters are set to:
binlog_group_commit_sync_delay = 0
binlog_group_commit_sync_no_delay_count = 0
binlog_order_commits = ON
sync_binlog = 1
binlog_transaction_dependency_tracking = COMMIT_ORDER
The parameter binlog_transaction_dependency_tracking needs a detailed explanation. We know that InnoDB row locks are acquired while a statement runs, so if multiple transactions enter the commit flow (the prepare phase) at the same time, they must have had no row conflicts with one another before the InnoDB layer committed and released its row-lock resources, and they can therefore be executed in parallel on the replica. COMMIT_ORDER-based parallel replication generates last_commit and seq_number from exactly this idea: transactions with the same last_commit can be replayed in parallel on the replica. Section 19 explains the rules the replica uses to decide on parallel replay. WRITESET-based parallel replication further reduces last_commit under the influence of write sets to obtain better parallelism on the replica, but it is still built on COMMIT_ORDER and is discussed in the next section. This section only discusses the generation of last_commit and seq_number in COMMIT_ORDER-based parallel replication.
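This idea can be sketched with a small Python model (a simplification for illustration only, not MySQL source; the class and method names are made up): transactions that prepare before any of them commits receive the same last_commit and can therefore be replayed in parallel on the replica.

```python
class CommitOrderTracker:
    """Toy model of COMMIT_ORDER dependency tracking (not MySQL source)."""

    def __init__(self):
        self.next_seq = 1       # next sequence_number to hand out
        self.max_committed = 0  # largest sequence_number already committed

    def binlog_prepare(self):
        # last_commit = largest seq number of already-committed transactions
        return self.max_committed

    def flush(self):
        # the flush stage assigns this transaction its seq number
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def commit(self, seq):
        # the commit stage raises the high-water mark
        self.max_committed = max(self.max_committed, seq)


tracker = CommitOrderTracker()
# Two transactions enter prepare before either commits: they share the
# same last_commit, so a replica may replay them in parallel.
lc1, lc2 = tracker.binlog_prepare(), tracker.binlog_prepare()
s1, s2 = tracker.flush(), tracker.flush()
tracker.commit(s1)
tracker.commit(s2)
# A later transaction prepares after both committed and depends on them.
lc3 = tracker.binlog_prepare()
```

Here lc1 and lc2 are equal, so the first two transactions form one parallel-replay group, while the third transaction must wait for both of them.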
The sync_binlog parameter has two functions:
sync_binlog=0: the binary log is never explicitly synced; flushing is left to the OS. The DUMP thread is notified to send Events after the flush stage.
sync_binlog=1: the binary log is synced every time a sync queue forms, which is roughly once per group commit. The DUMP thread is notified to send Events after the sync stage.
sync_binlog>1: the binary log is synced after the specified number of sync queues have formed, roughly once per that many group commits. The DUMP thread is notified to send Events after the flush stage.
Note that a sync_binlog setting other than 1 may cause the replica to end up with more transactions than the master.
II. Overall flow chart
Here we first present the whole process (figure 15-1; the original high-resolution image is attached at the end of the article):
III. Step analysis of the first stage (the blue part of the figure)
Note: before step 1 there is an operation that acquires the MDL_key::COMMIT lock, so FTWRL (FLUSH TABLES WITH READ LOCK) will block the commit operation with the blocking state 'Waiting for commit lock'. See the function make_global_read_lock_block_commit, which is called by FTWRL.
(1.) binlog prepare. Writes the largest seq_number of the previous COMMIT queue into this transaction's last_commit. See the binlog_prepare function.
(2.) InnoDB prepare. Changes the transaction state to prepared and writes the state and the XID of the transaction to the undo log. See the trx_prepare function.
(3.) the XID_EVENT is generated and written to the binlog cache. As noted in Section 10, the XID actually comes from the query_id and was generated long before; here only the Event is generated. See the MYSQL_BIN_LOG::commit function.
IV. Step analysis of the second stage (the pink part of the figure)
(4.) form the FLUSH queue. In this step transactions are continually added to the FLUSH queue. The first transaction to enter the FLUSH queue becomes the leader; non-leader threads block until the leader thread wakes them after the COMMIT phase.
(5.) acquire the LOCK_log lock.
(6.) pull the FLUSH-stage queue out for processing. From this point on the FLUSH queue can no longer be changed. See the stage_manager.fetch_queue_for function.
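Steps 4 to 6 can be sketched as a toy queue model (an illustration only; the class and names are invented, not MySQL source):

```python
class Stage:
    """Toy model of a group-commit stage queue (not MySQL source)."""

    def __init__(self):
        self.queue = []

    def enroll(self, session):
        # The session that finds the queue empty becomes the leader;
        # the others are followers and would block until woken.
        is_leader = not self.queue
        self.queue.append(session)
        return is_leader

    def fetch_queue(self):
        # Like stage_manager.fetch_queue_for: detach the whole queue at
        # once, after which it can no longer be changed.
        batch, self.queue = self.queue, []
        return batch


flush_stage = Stage()
roles = [flush_stage.enroll(s) for s in ("trx1", "trx2", "trx3")]
batch = flush_stage.fetch_queue()          # trx1 (the leader) processes all three
t4_is_leader = flush_stage.enroll("trx4")  # a new queue starts; trx4 leads it
```

The first enroller becomes the leader of the batch; once the queue is fetched, a fresh queue begins and the next arriving session leads it.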
(7.) here the transaction persists its redo in the InnoDB layer, and also persists the redo of the other transactions in the queue. See the MYSQL_BIN_LOG::process_flush_stage_queue function. Here are the comment and a small piece of code:

/*
  We flush prepared records of transactions to the log of storage
  engine (for example, InnoDB redo log) in a group right before
  flushing them to binary log.
*/
ha_flush_logs(NULL, true); // persist the InnoDB redo
(8.) the GTID and seq_number are generated; the GTID_EVENT is then built together with the last_commit obtained earlier and written directly into the binary log. Note that it is written directly to the binary log file rather than to the binlog cache, so GTID_EVENT is the first Event of the transaction. Refer to the following fragment of the function binlog_cache_data::flush:

trn_ctx->sequence_number = mysql_bin_log.m_dependency_tracker.step(); // int64 state + 1
...
if (!error)
  if ((error = mysql_bin_log.write_gtid(thd, this, &writer))) // generate the GTID, write to the binary log file
    thd->commit_error = THD::CE_FLUSH_ERROR;
if (!error)
  error = mysql_bin_log.write_cache(thd, this, &writer); // write the other Events to the binary log file
For the values of seq_number and last_commit, MySQL actually maintains a global structure, Transaction_dependency_tracker. It can use one of three trackers:
Commit_order_trx_dependency_tracker
Writeset_trx_dependency_tracker
Writeset_session_trx_dependency_tracker
Which tracker is used is determined by the parameter binlog_transaction_dependency_tracking. Here we study the case where the parameter is set to COMMIT_ORDER; WRITESET is discussed in the next section.
When set to COMMIT_ORDER, Commit_order_trx_dependency_tracker is used. It has the following characteristics:
The seq_number is incremented by 1 each time a transaction commits.
The last_commit is assigned to each transaction in the earlier binlog prepare phase, as described above.
The last_commit is the largest seq_number of the previous COMMIT queue, as we will see later.
Second, seq_number and last_commit are both values of type Logical_clock, which maintains a value called offset, recording the relative offset of sequence_number each time the binary log is rotated. As a result, seq_number and last_commit restart counting in every binary log. Here is the source comment for offset:

/*
  Offset is subtracted from the actual "absolute time" value at
  logging a replication event. That is the event holds logical
  timestamps in the "relative" format. They are meaningful only
  in the context of the current binlog.
  The member is updated (incremented) per binary log rotation.
*/
int64 offset;
Here is how seq_number is computed; see the Commit_order_trx_dependency_tracker::get_dependency function:

sequence_number =
    trn_ctx->sequence_number - m_max_committed_transaction.get_offset(); // get the seq number here
We can clearly see the subtraction of the offset, which is why seq_number and last_commit always restart counting in each binary log.
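The effect of the offset can be sketched with a toy Python model (a deliberate simplification, not MySQL source; the names are invented):

```python
class LogicalClock:
    """Toy model of Logical_clock's offset handling (not MySQL source)."""

    def __init__(self):
        self.state = 0   # absolute, ever-increasing counter (step())
        self.offset = 0  # advanced at every binary log rotation

    def step(self):
        self.state += 1
        return self.state

    def rotate_binlog(self):
        # On rotation the offset catches up with the absolute value, so
        # numbers written into the new file restart from 1.
        self.offset = self.state

    def relative(self, absolute):
        # What actually goes into the event: absolute value minus offset
        return absolute - self.offset


clk = LogicalClock()
first_file = [clk.relative(clk.step()), clk.relative(clk.step())]
clk.rotate_binlog()  # switch to a new binary log
second_file = [clk.relative(clk.step())]
```

The absolute counter keeps growing across files, but the relative numbers written into each binary log restart at 1 after every rotation.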
(9.) this step writes all the Events in the binlog cache to the binary log. For a single transaction, the Events included here are:
QUERY_EVENT
MAP_EVENT
DML EVENT
XID_EVENT
Note that the GTID_EVENT was already written into the binary log file earlier. The write here is the Linux write() call, which normally only reaches the OS cache shown in the figure; the data may not actually have reached the disk medium at this point.
Repeat steps 7-9 to do the same for all transactions in the FLUSH queue.
Note: if sync_binlog != 1, the DUMP thread is woken here to send Events.
(10.) this step also determines whether the binary log needs to be rotated and sets a rotation flag. The criterion is whether the total number of bytes of Events written by the transactions in the queue, plus the existing binary log size, reaches max_binlog_size. See the MYSQL_BIN_LOG::process_flush_stage_queue function:

if (total_bytes > 0 && my_b_tell(&log_file) >= (my_off_t) max_size)
  *rotate_var = true; // flag that rotation is needed

Note, however, that this check is made only after all the Events have been written to the binary log. So for a large transaction, all of its Events are necessarily contained in the same binary log.
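The write-then-check order can be sketched as follows (a hypothetical simplification; MAX_BINLOG_SIZE and the function name are invented for illustration):

```python
MAX_BINLOG_SIZE = 1024  # assumed value, for illustration only

def flush_queue_and_check_rotate(binlog_size, transactions):
    """Write every transaction's Events first, then test the size (simplified)."""
    total_bytes = 0
    for tx_bytes in transactions:
        binlog_size += tx_bytes  # all Events land in the current file
        total_bytes += tx_bytes
    # The rotation flag is evaluated only after everything is written,
    # mirroring: if (total_bytes > 0 && my_b_tell(...) >= max_size)
    rotate = total_bytes > 0 and binlog_size >= MAX_BINLOG_SIZE
    return binlog_size, rotate

# A 5000-byte transaction still lands entirely in the current binary log;
# rotation is only flagged afterwards.
size, rotate = flush_queue_and_check_rotate(100, [5000])
```

Because the size check comes after the writes, a single transaction can push a binary log well past max_binlog_size, but its Events never span two files.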
This is the end of the FLUSH phase.
V. Step analysis of the third stage (the purple part of the figure)
(11.) the FLUSH queue joins the SYNC queue. The leader of the first FLUSH queue to arrive becomes the leader of this stage. Subsequent FLUSH queues join the SYNC queue, and their leaders are blocked on LOCK_sync until woken by the leader thread after the COMMIT phase.
(12.) release LOCK_log.
(13.) acquire LOCK_sync.
(14.) decide, according to the delay parameter settings, whether to wait for a period of time. We can see from the figure that the longer the delay, the more FLUSH queues can join the SYNC queue, and the more transactions the SYNC queue contains. This not only improves sync efficiency but also enlarges the group commit (last_commit does not change while waiting, so the longer the wait, the more transactions share a group), thereby improving the parallel replay efficiency of MTS on the replica. The obvious drawback is that even simple DML statements take longer, so the delay must not be set too large. The following is a case from my blog caused by an improperly set delay parameter:
https://www.jianshu.com/p/bfd4a88307f2
The delay behavior involves two parameters:
binlog_group_commit_sync_delay: artificially delays the sync stage to increase the number of transactions in a group commit and reduce the number of sync (disk flush) calls; it is bounded by binlog_group_commit_sync_no_delay_count. The unit is microseconds (1/1,000,000 second), with a maximum of 1,000,000, i.e. 1 second.
binlog_group_commit_sync_no_delay_count: if, within the delay period, the number of transactions in the group commit reaches this value, the wait ends immediately instead of lasting the full binlog_group_commit_sync_delay. The unit is a number of transactions.
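The interaction of the two parameters can be sketched like this (an illustrative model, not MySQL source; the function name and polling loop are invented — the server uses a timed condition wait):

```python
import time

def sync_stage_wait(queue_size_fn, delay_us, no_delay_count):
    """Simplified sync-stage wait (an illustration, not MySQL source).

    Waits up to delay_us microseconds, but returns early once the queue
    reaches no_delay_count transactions (0 disables the early exit).
    """
    if delay_us == 0:
        return
    deadline = time.monotonic() + delay_us / 1_000_000
    while time.monotonic() < deadline:
        if no_delay_count and queue_size_fn() >= no_delay_count:
            return              # enough transactions: stop waiting early
        time.sleep(0.0001)      # poll; the real server uses a timed wait

# With this section's assumed settings (both parameters 0) the leader never waits.
sync_stage_wait(lambda: 1, 0, 0)
```

With a large delay but a small no_delay_count, a busy server exits the wait as soon as enough transactions have gathered, so simple DML is not held for the full delay.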
(15.) pull the SYNC-stage queue out for processing. From this point on the SYNC queue can no longer be changed. This queue is not the same as the FLUSH queue: the transaction order is the same, but the number of transactions may differ.
(16.) decide whether to flush to disk according to the sync_binlog setting. See the function MYSQL_BIN_LOG::sync_binlog_file; the logic is very simple.
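A minimal sketch of that counter-based logic, under the assumption that a count of sync stages is compared against sync_binlog (the class and names are invented, not MySQL source):

```python
class BinlogSyncer:
    """Simplified sync_binlog counting logic (not MySQL source)."""

    def __init__(self, sync_binlog):
        self.sync_binlog = sync_binlog
        self.sync_counter = 0  # sync stages since the last fsync
        self.fsyncs = 0

    def maybe_sync(self):
        # sync_binlog = 0: never fsync here, leave flushing to the OS
        if self.sync_binlog == 0:
            return False
        self.sync_counter += 1
        if self.sync_counter >= self.sync_binlog:
            self.sync_counter = 0
            self.fsyncs += 1   # stands in for fsync() on the binlog file
            return True
        return False


syncer = BinlogSyncer(sync_binlog=3)
results = [syncer.maybe_sync() for _ in range(6)]  # six sync stages
```

With sync_binlog=3, only every third sync stage actually flushes; with sync_binlog=1 (this section's assumed setting) every stage would.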
This is the end of the SYNC phase.
Note: if sync_binlog = 1, the DUMP thread will be awakened to send Event.
VI. Step analysis of the fourth stage (the yellow part of the figure)
(17.) the SYNC queue joins the COMMIT queue. The leader of the first SYNC queue to arrive becomes the leader of this stage. Subsequent SYNC queues join the COMMIT queue, and their leaders are blocked on LOCK_commit until woken by the leader thread after the COMMIT phase.
(18.) release LOCK_sync.
(19.) acquire LOCK_commit.
(20.) decide, according to the binlog_order_commits setting, whether to commit the InnoDB layer in queue order. If binlog_order_commits=1, commits are done in queue order, so the order in which transactions become visible matches the commit order. If binlog_order_commits=0, the following steps 21 to 23 do not happen here, that is, the InnoDB layer commit is not performed at this point (each session commits on its own later; see step 27).
(21.) pull the COMMIT-stage queue out for processing. From this point on the COMMIT queue can no longer be changed. This queue is not the same as the FLUSH and SYNC queues: the transaction order is the same, but the number of transactions may differ.
Note: if the rpl_semi_sync_master_wait_point parameter is set to AFTER_SYNC, the semisync ACK confirmation happens here; note that the actual InnoDB layer commit has not yet taken place. The wait state is 'Waiting for semi-sync ACK from slave'.
(22.) before the InnoDB layer commit, the global maximum committed sequence number (the source of last_commit) must be updated. Every transaction in the COMMIT queue updates it: if its seq_number is larger the value changes, otherwise it stays. See the Commit_order_trx_dependency_tracker::update_max_committed function. Here is a small piece of code:

m_max_committed_transaction.set_if_greater(sequence_number); // update if larger
(23.) each transaction in the COMMIT queue commits in the InnoDB layer in turn. See the innobase_commit function.
The InnoDB layer does a lot of work in this step, for example:
Update of the read view
Status update of the undo
Release of InnoDB lock resources
Once this step completes, the transaction is actually visible at the InnoDB layer. I once encountered a case where the whole commit operation hung because of a problem with the leader thread waking the other threads in the group, yet the modifications made by these transactions were already visible in the database.
Steps 22 and 23 are repeated until the COMMIT queue has been fully processed.
Note: if the rpl_semi_sync_master_wait_point parameter is set to AFTER_COMMIT, the semisync ACK confirmation happens here; note that the actual InnoDB layer commit has already completed. The wait state is 'Waiting for semi-sync ACK from slave'.
(24.) release LOCK_commit.
This is the end of the COMMIT phase.
VII. Step analysis of the fifth stage (the green part of the figure)
(25.) here the leader thread wakes all members of the group to perform their own remaining operations.
(26.) each transaction member resets its binlog cache, emptying the cache and releasing the temporary file.
(27.) if binlog_order_commits is set to 0, each transaction in the COMMIT queue performs its own InnoDB layer commit here (not in the order of the transactions in the binary log).
(28.) according to the rotation flag set in step 10, decide whether to rotate the binary log.
(29.) if the binary log is rotated, also decide, according to the expire_logs_days setting, whether old binary logs need to be purged.
VIII. Summary
Throughout the process we can see that generating last_commit and seq_number adds no extra overhead; the WRITESET-based parallel replication described in the next section, however, does carry some overhead.
What we need to understand is that each of the FLUSH/SYNC/COMMIT stages has its own queue, and the queues differ, but the transaction order within them is the same. Whether transactions can be replayed in parallel on the replica depends entirely on the last_commit obtained in the prepare phase, which Section 19 describes in detail.
There is a relationship between the number of transactions in the three queues: COMMIT queue >= SYNC queue >= FLUSH queue. Under low pressure they may all be equal and contain only a single transaction.
It follows that with COMMIT_ORDER-based parallel replication, if the database is not under much pressure, each queue may contain only one transaction. In that case the replica cannot replay in parallel at all; the WRITESET-based parallel replication discussed in the next section can change that.
It is also clear that the Events of a large transaction are written to the binary log in one go at commit time. If a large transaction sits in the COMMIT queue, the commits of the other transactions in that queue are blocked behind it, and subsequent commits cannot complete either. I think this is an important reason why MySQL is not well suited to large transactions.
At this point, I believe you have a deeper understanding of the transaction commit process in the MySQL layer. You might as well trace it through the source in practice.