How to implement MTS multi-threaded parallel replay on a replica

2025-07-01 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

Today we look at how a MySQL replica implements multi-threaded parallel replay (MTS, multi-threaded slave). Many readers may not be familiar with the details, so the editor has summarized the following content. Hopefully you will take something away from this article.

In covering MTS parallel replay on the replica, this article focuses on the concept of checkpoints in MTS. As we will see in section 25, exception recovery in MTS depends on the checkpoint in many cases: the relay log is scanned from the checkpoint position to perform recovery. This dependency is weakened, however, when GTID AUTO_POSITION mode is used and relay_log_recovery=1 is set.

I. Worker threads execute Events

The coordinator thread only distributes Events to each worker thread's execution queue. The worker thread then takes the Events out of its execution queue and executes them. The whole process is in the function slave_worker_exec_job_group. Because the process is relatively simple, no diagram is needed, but the following points deserve attention:

(1) Read an Event from the execution queue. If the execution queue is empty, the worker thread goes into an idle wait, that is, it has nothing to do, and its state is 'Waiting for an event from Coordinator'.

(2) Reaching an XID_EVENT means the transaction has ended, so the in-memory information must be updated. See the functions Slave_worker::slave_worker_exec_event and Xid_apply_log_event::do_apply_event_worker. The in-memory update itself is in the function commit_positions. The updated fields are listed below, and they are basically the same as the columns of the slave_worker_info table:

1. Update the current position information:

    strmake(group_relay_log_name, ptr_g->group_relay_log_name, sizeof(group_relay_log_name) - 1);
    group_relay_log_pos= ev->future_event_relay_log_pos;
    set_group_master_log_pos(ev->common_header->log_pos);
    set_group_master_log_name(c_rli->get_group_master_log_name());

2. Write the checkpoint information:

    strmake(checkpoint_relay_log_name, ptr_g->checkpoint_relay_log_name, sizeof(checkpoint_relay_log_name) - 1);
    checkpoint_relay_log_pos= ptr_g->checkpoint_relay_log_pos;
    strmake(checkpoint_master_log_name, ptr_g->checkpoint_log_name, sizeof(checkpoint_master_log_name) - 1);
    checkpoint_master_log_pos= ptr_g->checkpoint_log_pos;

3. Set the GAQ sequence number and shift the whole Bitmap, because a checkpoint may have dequeued groups from the GAQ:

    checkpoint_seqno= ptr_g->checkpoint_seqno;
    // ptr_g->shifted is the number of transactions dequeued from the GAQ
    for (uint pos= ptr_g->shifted; pos < c_rli->checkpoint_group; pos++)
    {
      // dequeued transactions are no longer needed for recovery,
      // so the remaining bits are shifted down by that offset
      if (bitmap_is_set(&group_shifted, pos))
        bitmap_set_bit(&group_executed, pos - ptr_g->shifted);
    }

4. Set the bit for this transaction:

    bitmap_set_bit(&group_executed, ptr_g->checkpoint_seqno); // set the bit at this transaction's position to 1

(3) Reaching an XID_EVENT means the transaction has ended, so the in-memory information must also be persisted, that is, forcibly written to the slave_worker_info table (when relay_log_info_repository is set to TABLE). See the call to commit_positions:

    if ((error= w->commit_positions(this, ptr_group, w->is_transactional()))) ...

(4) Reaching an XID_EVENT also means the transaction still has to be committed, that is, committed in the InnoDB layer.
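The bitmap handling in step (2) above can be illustrated with a small, self-contained sketch. It is not the MySQL implementation: the Bitmap alias and the commit_group function are hypothetical names, and a std::vector<bool> stands in for the worker's group_executed bitmap. It only shows the idea of shifting out the positions a checkpoint has already dequeued and then marking the group that just committed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a worker's group_executed bitmap.
using Bitmap = std::vector<bool>;

// Returns the bitmap after one commit:
//  - bits at positions >= shifted move down by `shifted`, because that many
//    groups were dequeued from the GAQ by a checkpoint and are no longer
//    needed for recovery;
//  - the bit at `seqno` (assumed to be relative to the latest checkpoint)
//    is set for the group that has just committed.
Bitmap commit_group(const Bitmap &executed, std::size_t shifted,
                    std::size_t seqno) {
  assert(seqno < executed.size());
  Bitmap result(executed.size(), false);
  for (std::size_t pos = shifted; pos < executed.size(); ++pos) {
    if (executed[pos]) result[pos - shifted] = true;
  }
  result[seqno] = true;
  return result;
}
```

For example, with bits 0, 1 and 3 set, a shift of 2 and a new seqno of 4, the old bit 3 ends up at position 1 and position 4 is newly set.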

From the above, we can see that a transaction commit in MTS does not update the slave_relay_log_info table. It updates the slave_worker_info table instead, writing the worker's latest information there.

As mentioned earlier, the SQL thread has become the coordinator thread, so when is the slave_relay_log_info table updated? As we will see below, it is updated by the coordinator thread after it performs a checkpoint.

II. Important concepts in checkpoints in MTS

Generally speaking, a checkpoint in MTS is the starting point from which MTS performs exception recovery. It marks a position such that every transaction up to and including it has already been executed on the replica, while later transactions may or may not have completed. Checkpoints are performed by the coordinator thread.

(1) The coordinator thread's GAQ queue

We already know that an Event distribution queue is maintained for each worker thread in MTS. In addition, the coordinator thread maintains a very important queue, GAQ, which is a circular queue. The following is the definition in the source code:

    /*
      master-binlog ordered queue of Slave_job_group descriptors of groups
      that are under processing. The queue size is @c checkpoint_group.
    */
    Slave_committed_queue *gaq; // GAQ (group assigned queue)

Each time the coordinator thread distributes a transaction, the transaction is recorded in the GAQ queue, so the order of transactions in the GAQ is always the same as in the relay log files. Checkpoints act on the GAQ queue, and the position of each checkpoint is called the LWM (Low-Water-Mark). Remember the LWM I told you to ignore in the previous section? This is it. It is defined in the source code exactly this way and is maintained in the GAQ queue:

    /* The last checkpoint time Low-Water-Mark */
    Slave_job_group lwm;

A sequence number called checkpoint_seqno is also maintained in the GAQ queue, which is the sequence number of each allocated transaction since the last checkpoint. Here is the definition in the source code:

    uint checkpoint_seqno; // counter of groups executed after the most recent CP

When the coordinator thread reads a GTID_LOG_EVENT, it assigns the transaction a sequence number, which is recorded as checkpoint_seqno, and then increments the counter:

    rli->checkpoint_seqno++; // increment seqno

When the coordinator thread performs a checkpoint, the number of transactions dequeued from the GAQ is subtracted from checkpoint_seqno:

    checkpoint_seqno= checkpoint_seqno - shift; // subtract the number of dequeued transactions
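As a rough illustration of how this counter behaves, here is a minimal sketch. CoordinatorModel and its member functions are hypothetical names, not MySQL code, and the sketch assumes that each dispatched group receives the counter value from before the increment.

```cpp
#include <cassert>

// Hypothetical, simplified model of the coordinator's sequence counter.
struct CoordinatorModel {
  unsigned checkpoint_seqno = 0;  // groups dispatched since the last checkpoint

  // Called when a new transaction (group) is dispatched.
  // Returns the sequence number given to that group (assumption: the
  // pre-increment value), then advances the counter.
  unsigned dispatch_group() { return checkpoint_seqno++; }

  // Called when a checkpoint has dequeued `shift` groups from the GAQ.
  void checkpoint(unsigned shift) {
    assert(shift <= checkpoint_seqno);
    checkpoint_seqno -= shift;
  }
};
```

Dispatching five groups hands out the numbers 0 to 4 and leaves the counter at 5. A checkpoint that dequeues four of them brings the counter back to 1, so numbering stays relative to the most recent checkpoint.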

This sequence number is also used during MTS exception recovery. For each worker thread it marks the upper bound of the transactions that worker has executed, as follows:

    for (uint i= (w->checkpoint_seqno + 1) - recovery_group_cnt, j= 0;
         i <= w->checkpoint_seqno; i++, j++)
    {
      if (bitmap_is_set(&w->group_executed, i)) // if this bit has been set
      {
        DBUG_PRINT("mts", ("Setting bit %u.", j));
        // then set it in the groups bitmap as well; groups ends up as the
        // summary bitmap used to work out which transactions need recovery
        bitmap_fast_test_and_set(groups, j);
      }
    }

The detailed exception recovery process is described in section 25.

(2) Bitmap of worker thread

With the GAQ queue and checkpoints, we know where exception recovery begins. But we still do not know which transactions each worker thread has completed and which it has not, so we cannot yet tell which transactions need to be recovered. In MTS, transactions replayed in parallel are not necessarily committed in dispatch order: large transactions (or transactions delayed for other reasons, such as lock waits) may commit late, while small transactions finish quickly. These late transactions become so-called 'gaps'. If GTID is used, 'holes' may appear in the executed GTID set. To prevent gaps, you usually set the parameter slave_preserve_commit_order. We will see these holes and the effect of slave_preserve_commit_order in the next section. Note that slave_preserve_commit_order requires the replica to write its own binary log, so the log_slave_updates parameter must be enabled. The source code checks this as follows:

    if (opt_slave_preserve_commit_order && rli->opt_slave_parallel_workers > 0 &&
        opt_bin_log && opt_log_slave_updates)
      commit_order_mngr= new Commit_order_manager(rli->opt_slave_parallel_workers); // commit order manager

To anticipate a little, MTS recovery has two key phases:

Scanning phase

The relay log after the checkpoint is scanned. Using each worker thread's Bitmap, MTS distinguishes completed transactions from uncompleted ones, merges the results into a single recovery Bitmap, and obtains the total number of transactions that need to be recovered.

Execution phase

Using the merged recovery Bitmap, the uncompleted transactions are read from the relay log and executed again.

The bits of the Bitmap correspond one to one with the transactions in the GAQ. A bit is set to '1' when the XID_EVENT has been executed and the commit has completed.
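The two phases can be sketched in a simplified form. This is only an illustration under strong assumptions: merge_executed and find_gaps are hypothetical helpers, every worker bitmap is assumed to be aligned to the same checkpoint, and a std::vector<bool> replaces the real bitmap type. The real recovery code has to handle workers whose persisted information refers to different checkpoints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bitmap = std::vector<bool>;  // hypothetical stand-in for a worker bitmap

// Scanning phase (simplified): OR the per-worker bitmaps into one summary
// bitmap covering `n` positions after the checkpoint.
Bitmap merge_executed(const std::vector<Bitmap> &workers, std::size_t n) {
  Bitmap merged(n, false);
  for (const Bitmap &w : workers) {
    for (std::size_t i = 0; i < n && i < w.size(); ++i) {
      if (w[i]) merged[i] = true;
    }
  }
  return merged;
}

// Execution phase input (simplified): every unset position below the highest
// executed position is a 'gap' that has to be executed again.
std::vector<std::size_t> find_gaps(const Bitmap &merged) {
  std::size_t high = merged.size();  // sentinel: nothing executed
  for (std::size_t i = merged.size(); i-- > 0;) {
    if (merged[i]) {
      high = i;
      break;
    }
  }
  std::vector<std::size_t> gaps;
  if (high == merged.size()) return gaps;
  assert(high < merged.size());
  for (std::size_t i = 0; i < high; ++i) {
    if (!merged[i]) gaps.push_back(i);
  }
  return gaps;
}
```

With two workers whose bitmaps cover positions {0, 4} and {1, 3}, the merged bitmap has position 2 unset below the highest executed position, so position 2 is the single gap to replay.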

(3) Persistence of the coordinator thread's information

As mentioned earlier, each time a checkpoint is performed, the checkpoint position is persisted to the slave_relay_log_info table (when relay_log_info_repository is set to TABLE). So what slave_relay_log_info stores is not real-time information but checkpoint information. The following is the table structure of the slave_relay_log_info table

Likewise, some of the information in show slave status is the in-memory checkpoint information. The following fields come from the checkpoint:

Relay_Log_File: the relay log file name of the latest checkpoint.

Relay_Log_Pos: the relay log position of the latest checkpoint.

Relay_Master_Log_File: the master's binary log file name of the latest checkpoint.

Exec_Master_Log_Pos: the master's binary log position of the latest checkpoint.

Seconds_Behind_Master: the delay calculated from the commit time of the transaction at the checkpoint.

Note that the GTID module is independent of this mechanism. In section 3, when we discussed initialization of the GTID module, we said that it completes before the replica's own information is initialized. Therefore, MTS exception recovery is simpler and safer with GTID AUTO_POSITION mode, which is described in detail in section 25.

(4) persistence of worker thread information

The worker thread's information is persisted in the slave_worker_info table, as described earlier in the points to note when a worker thread executes an Event. On executing an XID_EVENT, the information is written to the slave_worker_info table (when relay_log_info_repository is set to TABLE) once the transaction commit has completed. It includes:

Relay_log_name: the relay log file name of the worker thread's last committed transaction.

Relay_log_pos: the relay log position of the worker thread's last committed transaction.

Master_log_name: the master's binary log file name of the worker thread's last committed transaction.

Master_log_pos: the master's binary log position of the worker thread's last committed transaction.

Checkpoint_relay_log_name: the relay log file name of the checkpoint that corresponds to the worker thread's last committed transaction.

Checkpoint_relay_log_pos: the relay log position of the checkpoint that corresponds to the worker thread's last committed transaction.

Checkpoint_master_log_name: the master's binary log file name of the checkpoint that corresponds to the worker thread's last committed transaction.

Checkpoint_master_log_pos: the master's binary log position of the checkpoint that corresponds to the worker thread's last committed transaction.

Checkpoint_seqno: the checkpoint_seqno sequence number of the worker thread's last committed transaction.

Checkpoint_group_size: the size of the worker thread's Bitmap in bytes, approximately the GAQ queue size / 8, because 1 byte is 8 bits.

Checkpoint_group_bitmap: the worker thread's Bitmap.

For how Checkpoint_group_size is derived, see the function Slave_worker::write_info.
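The "GAQ queue size / 8" relationship for Checkpoint_group_size is ordinary bits-to-bytes rounding. A minimal sketch, where bitmap_bytes and make_bitmap are hypothetical helpers rather than MySQL functions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of bytes needed to hold one bit per GAQ slot, rounding up.
constexpr std::size_t bitmap_bytes(std::size_t gaq_size) {
  return (gaq_size + 7) / 8;
}

// Allocates a zeroed byte buffer with one bit available per GAQ slot.
std::vector<unsigned char> make_bitmap(std::size_t gaq_size) {
  assert(gaq_size > 0);
  return std::vector<unsigned char>(bitmap_bytes(gaq_size), 0);
}
```

A GAQ of 512 slots needs 64 bytes, while a size that is not a multiple of 8, such as 10, rounds up to 2 bytes, which is why the text says "approximately".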

(5) Two parameters

slave_checkpoint_group: the GAQ queue size.

slave_checkpoint_period: how often a checkpoint is performed. The default is 300 ms.

(6) When a checkpoint is performed

The interval configured by slave_checkpoint_period has been exceeded. See the next_event function:

    if (rli->is_parallel_exec() && (opt_mts_checkpoint_period != 0 || force))
    {
      ulonglong period= static_cast<ulonglong>(opt_mts_checkpoint_period * 1000000ULL);
      ...
      (void) mts_checkpoint_routine(rli, period, force, true/*need_data_lock=true*/);
      ...
    }

The GAQ queue is full, as follows:

    // if the GAQ has reached its configured size, set force so that a checkpoint is forced
    bool force= (rli->checkpoint_seqno > (rli->checkpoint_group - 1));

A normal stop slave.
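The three triggers above can be combined into one decision function. This is a sketch only: CheckpointState, gaq_full and should_checkpoint are hypothetical names, the timestamps are plain nanosecond counters supplied by the caller, and treating a stop as a forced checkpoint is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the values the coordinator looks at.
struct CheckpointState {
  std::uint64_t now_ns;              // current time, nanoseconds
  std::uint64_t last_checkpoint_ns;  // time of the previous checkpoint
  std::uint64_t period_ms;           // slave_checkpoint_period
  unsigned checkpoint_seqno;         // groups dispatched since last checkpoint
  unsigned checkpoint_group;         // slave_checkpoint_group (GAQ size)
  bool stopping;                     // a normal stop slave is in progress
};

// Same comparison as the source excerpt: the GAQ is considered full when
// checkpoint_seqno > checkpoint_group - 1.
bool gaq_full(const CheckpointState &s) {
  assert(s.checkpoint_group > 0);
  return s.checkpoint_seqno > s.checkpoint_group - 1;
}

bool should_checkpoint(const CheckpointState &s) {
  assert(s.now_ns >= s.last_checkpoint_ns);
  const bool force = gaq_full(s) || s.stopping;  // assumption: stop forces one
  const std::uint64_t period_ns = s.period_ms * 1000000ULL;
  const std::uint64_t diff = s.now_ns - s.last_checkpoint_ns;
  // A forced checkpoint ignores the timer; otherwise the period must be
  // non-zero and must have elapsed.
  return force || (s.period_ms != 0 && diff >= period_ns);
}
```

With a 300 ms period, a state 400 ms after the last checkpoint triggers one, a state only 100 ms after does not, and a full GAQ triggers one regardless of the timer.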

(7) An example

Usually, when the replica is under load, the maximum Checkpoint_master_log_pos across all worker threads in slave_worker_info should equal Master_log_pos in slave_relay_log_info, because both are the position of the last checkpoint.

III. The flow of checkpoints in MTS

This section will describe the steps of checkpoint in detail. For checkpoint, you can refer to the function mts_checkpoint_routine.

Suppose there are seven transactions that can be executed in parallel and four worker threads. The coordinator thread has distributed five of them so far; the first four have completed, and the fifth is a large transaction. The current state might then look like this (Figure 20-1; the high-resolution image is included at the end of the original article):

The first four transactions are assigned to each worker thread, and the last large transaction is assumed to be executed by worker thread 2, which is shown in red in the figure.

(1) Determine whether the interval set by slave_checkpoint_period has been exceeded. If it has, a checkpoint is needed.

    if (!force && diff < period) // is a checkpoint needed, i.e. has slave_checkpoint_period been exceeded?
    {
      /*
        We do not need to execute the checkpoint now because
        the time elapsed is not enough.
      */
      DBUG_RETURN(FALSE);
    }

(2) Scan the GAQ queue and dequeue groups until the first transaction that has not been committed. The red part in the figure is a large transaction, so the checkpoint can only stop just before it.

    cnt= rli->gaq->move_queue_head(&rli->workers); // pass in the workers array; returns the number of groups dequeued

Part of the code of move_queue_head is as follows:

    // has the current GROUP been executed? if not, the checkpoint has to stop here
    if (ptr_g->worker_id == MTS_WORKER_UNDEF ||
        my_atomic_load32(&ptr_g->done) == 0)
      break; /* 'gap' at i'th */

(3) Update the in-memory information and the relay_log_info_repository table to the position this checkpoint points to.

First update the in-memory information, which is what we see in show slave status:

    rli->set_group_master_log_pos(rli->gaq->lwm.group_master_log_pos);
    rli->set_group_relay_log_pos(rli->gaq->lwm.group_relay_log_pos);
    rli->set_group_relay_log_name(rli->gaq->lwm.group_relay_log_name);

Then force a write to the slave_relay_log_info table:

    error= rli->flush_info(TRUE); // write the checkpoint information to the relay_log_info_repository table

(4) Update last_master_timestamp to the timestamp of the XID_EVENT of the transaction at the checkpoint position.

This value, described in detail in section 27, is one of the inputs for calculating Seconds_behind_master:

    /*
      Update the rli->last_master_timestamp for reporting correct Seconds_behind_master.
      If GAQ is empty, set it to zero.
      Else, update it with the timestamp of the first job of the Slave_job_queue
      which was assigned in the Log_event::get_slave_worker() function.
    */
    ts= rli->gaq->empty()
      ? 0
      : reinterpret_cast<Slave_job_group*>(rli->gaq->head_queue())->ts; // rli->gaq->head_queue() is the GROUP at the checkpoint position
    rli->reset_notified_checkpoint(cnt, ts, need_data_lock, true);

The reset_notified_checkpoint function includes:

    last_master_timestamp= new_ts;

Therefore, the calculation of Seconds_behind_master in MTS is closely related to checkpoint.

(5) Finally, the number of transactions just dequeued from the GAQ is recorded, because each worker thread needs to shift its Bitmap by this value. The checkpoint_seqno value of the GAQ mentioned earlier is also maintained here.

This is also done in the function Relay_log_info::reset_notified_checkpoint. A simplified part of the code is as follows:

    for (Slave_worker **it= workers.begin(); it != workers.end(); ++it) // loop over each worker
    {
      Slave_worker *w= *it;
      w->bitmap_shifted= w->bitmap_shifted + shift; // each worker thread increases this offset
    }
    checkpoint_seqno= checkpoint_seqno - shift; // subtract the number of dequeued groups

At this point the basic operation of a checkpoint is complete. There are not many steps. Once a worker thread has received the Bitmap offset, it shifts its bitmap when its next transaction commits, and the checkpoint_seqno count is updated as well.
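The dequeue step at the heart of the checkpoint can be modeled in a few lines. This is a sketch under stated assumptions: Group and Gaq are hypothetical, reduced types, a std::deque replaces the fixed-size circular array, and the LWM is reduced to a single master log position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, reduced version of a group descriptor in the GAQ.
struct Group {
  std::uint64_t master_log_pos;  // end position of the transaction
  bool done;                     // true once the worker has committed it
};

// Hypothetical model of the GAQ. Groups are kept in relay log order.
struct Gaq {
  std::deque<Group> groups;
  std::uint64_t lwm_pos = 0;  // position of the last checkpoint (LWM)

  // Dequeues finished groups from the head until the first unfinished one
  // (the 'gap'), advancing the LWM. Returns the number of groups dequeued.
  std::size_t move_queue_head() {
    std::size_t cnt = 0;
    while (!groups.empty() && groups.front().done) {
      lwm_pos = groups.front().master_log_pos;
      groups.pop_front();
      ++cnt;
    }
    // Either the queue is empty or its head is the unfinished 'gap'.
    assert(groups.empty() || !groups.front().done);
    return cnt;
  }
};
```

Applied to the example above, with four finished transactions followed by an unfinished large one, the call dequeues four groups and leaves the LWM at the end of the fourth transaction. Later transactions stay in the queue behind the large one even if they have already committed.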

In the environment assumed above, suppose a checkpoint is triggered and the coordinator thread then dispatches the last two parallel transactions to worker threads 1 and 3, which execute them to completion. The picture then looks like this (Figure 20-2; the high-resolution image is included at the end of the original article):

In this picture I use different colors for different lines because many of them cross. The red transaction in the GAQ is the large transaction we assumed has not finished executing, and it is what we call a 'gap'. If the MySQL instance restarts abnormally at this moment, the red 'gap' is the transaction that must be found after startup by comparing against the Bitmap, which will be discussed later under exception recovery. If GTID is enabled, this 'gap' can be observed easily, which we will test in the next section.

Note also that worker thread 2 has not been given any new transaction, because it has not finished executing the large transaction, so its row in the slave_worker_info table still shows its last committed transaction. Worker thread 4 has not been assigned a new transaction either, so its row in the slave_worker_info table also shows its last committed transaction. Therefore, the checkpoint information, Bitmap information, and checkpoint_seqno of worker threads 2 and 4 in slave_worker_info are all old information.

So far, I have explained three key points of MTS:

What rules the coordinator thread uses to distribute transactions.

How a worker thread obtains the distributed transactions.

How checkpoints are performed in MTS.
