Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Analysis of binlog and internal XA transaction commit process from an online problem

2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

1. problem

A new order record is added to the business. After receiving the MySQL slave data message pulled by BinLake, users immediately query the same MySQL slave database according to the order number in the message, and find that sometimes the data cannot be found. Wait for about 500ms~1000ms before querying the database, you can query the data.

Note: BinLake is a set of components for subscribing and consuming MySQL database binlog developed by the Database Technology Department of JD.com Mall. The problem described in this example is that the business side wants to obtain real-time orders and other business messages according to the subscribed binlog.

2. Binlog and internal XA

2.1. The concept of XA

The XA (distributed transaction) specification mainly defines the interface between (global) transaction manager (TM: Transaction Manager) and (local) resource manager (RM: Resource Manager). In order to realize the distributed transaction, XA divides the transaction commit into two phases: 2PC (tow phase commit). XA protocol implements the distributed transaction by dividing the transaction commit into two stages.

Two stages

1) prepare stage

The transaction manager sends a prepare "prepare to commit" request to all the database servers involved. After receiving the request, the database performs processing such as data modification and logging. After the processing is completed, the transaction state is changed to "can commit", and then the result is returned to the transaction manager. That is, in the prepare phase, TM issues prepare instructions to RM, RM operates, and then returns information about success or failure to TM.

2) commit stage

After receiving the response, the transaction manager enters the second stage. If an error occurs in the operation of any database in the first stage, or if the transaction manager does not receive a response from a database, the transaction is considered to have failed and all database transactions are withdrawn. The database server does not receive the second phase of the confirmation commit request, and it will withdraw the transactions that can be committed. If all databases in the first phase are successfully committed, the transaction manager issues a "confirm commit" request to the database server, and the database server changes the transaction's "can commit" status to "commit complete" status, and then returns a reply. That is, for the transaction commit or rollback phase, if TM receives success messages from all RM, TM issues a commit instruction to RM; otherwise, a rollback instruction is issued.

External and internal XA

The XA implementation in MySQL is divided into external XA and internal XA. The former refers to our usual sense of distributed transaction implementation; the latter refers to a single MySQL server, the Server layer as a TM (transaction coordinator, usually by the binlog module), and multiple database instances in the server as RM, and a distributed transaction, that is, MySQL cross-database transactions; that is, a transaction involves two innodb databases in the same MySQL server (at present, only innodb seems to support XA). Internal XA can also be used to ensure the consistency of redo and binlog.

2.2. The consistency of redo and binlog

In order to be compatible with the replication of other non-transaction engines, we MySQL introduced binlog at the server level, which can record the modification operations in all engines, so we can use replication for all engines; however, this situation will lead to the consistency problem between redo log and binlog; MySQL solves this consistency problem through the internal XA mechanism.

The first stage: InnoDB prepare, write/sync redo log;binlog do not do any operation

The second stage: contains two steps, 1 > write/sync Binlog; 2 > InnoDB commit (commit in memory)

Of course, the concept of group commit was introduced after 5.6. there can be some improvements in IO performance, but the overall order of execution will not change.

When step 1 of the second phase is complete and the binlog has been written, the MySQL will assume that the transaction has been committed and persisted (at this step the binlog is already ready and can be sent to the subscriber). At this point, even if the database crashes, the transaction can still be correctly resumed after restarting MySQL. Any failure to include this step before this step will result in the rollback of the transaction.

Most of the second part of the second phase is memory operations, such as releasing locks, releasing mvcc-related read view, and so on. MySQL believes that there will be no errors in this step, and once an error occurs, it will be a crash of the database, and MySQL itself cannot handle it. There is no logic that causes the transaction rollback at this stage. At the program running level, only after this step is completed, the transaction-induced change can be reflected through API or client query.

The following diagram shows when MySQL will send binlog to subscribers.

In theory, it is also possible to send the binlog to the subscriber after the commit phase is complete, but this increases the risk of master-slave latency.

3. Related code

Int MYSQL_BIN_LOG::ordered_commit (THD * thd, bool all, bool skip_commit) {. / / enter flush stage,change_stage (thd, Stage_manager::FLUSH_STAGE, thd, NULL, & LOCK_log);. / / notify the underlying storage engine to process_flush_stage_queue (& total_bytes, & do_rotate, & wait_queue) . / / write the binlog of each thread from cache to file flush_cache_to_file (& flush_end_pos);. / / enter Sync stagechange_stage (thd, Stage_manager::SYNC_STAGE, wait_queue, & LOCK_log,&LOCK_sync)) / / binlog fsync off disk sync_binlog_file (false) / / notify binlog sender that a new binlog disk can be sent to subscriber update_binlog_end_pos (tmp_thd- > get_trans_pos ()); / / enter commit statechange_stage (thd, Stage_manager::COMMIT_STAGE, final_queue,leave_mutex_before_commit_stage, & LOCK_commit); / / transaction status submission process_commit_stage_queue (thd, commit_queue);....

}

After the update_binlog_end_pos, the binlog sending thread can read the latest binlog and send it to the subscriber. When the subscriber receives these binlog, if the process_commit_stage_queue is not finished due to system scheduling and other reasons, then the subscriber happens to initiate the query described in the question at this time, and the query will not be available.

Let's take a look at what process_commit_stage_queue has done.

In process_commit_stage_queue, the commit method binlog_commit of binlog and the commit function trx_commit_in_memory of innodb are called respectively.

Static int binlog_commit (handlerton, THD, bool) {DBUG_ENTER ("binlog_commit"); / * Nothing to do (any more) on commit.*/DBUG_RETURN (0);}

Do nothing in binlog_commit because all the operations related to binlog have been done before.

Finally, take a look at what the storage engine innodb's trx_commit_in_memory has done. Static void trx_commit_in_memory (trx_t trx, /!

< in/out: transaction */const mtr_t mtr, /!< in: mini-transaction oftrx_write_serialisation_history(), or NULL ifthe transaction did not modify anything */bool serialised)/*!< in: true if serialisation log waswritten */{....//释放锁lock_trx_release_locks(trx);ut_ad(trx_state_eq(trx, TRX_STATE_COMMITTED_IN_MEMORY));.....//释放mvcc相关的read viewif (trx->

Read_only | | trx- > rsegs.m_redo.rseg = = NULL) {MONITOR_INC (MONITOR_TRX_RO_COMMIT); if (trx- > read_view! = NULL) {trx_sys- > mvcc- > view_close (trx- > read_view, false);} else {ut_ad (trx- > id > 0); MONITOR_INC (MONITOR_TRX_RW_COMMIT) }}.... / / Clean undo log related to insert operations (note that only the undo of insert needs to be cleaned at this time) if (mtr! = NULL) {if (trx- > rsegs.m_redo.insert_undo! = NULL) {trx_undo_insert_cleanup (& trx- > rsegs.m_redo, false);} if (trx- > rsegs.m_noredo.insert_undo! = NULL) {trx_undo_insert_cleanup (& trx- > rsegs.m_noredo, true);}}

After this step is completed, the changes to the transaction can be queried at run time. Keep in mind, however, that MySQL assumes that the persistence of the transaction has been completed after the binlog has been successfully deprecated.

thirty。 Summary

After the binlog goes down, MySQL will assume that the persistence of the transaction has been completed (after this point, even if the database crashes, the transaction can be correctly resumed after reboot). However, the data changes generated by this transaction are queried by other clients after all the commit is completed. MySQL will send the new binlog to subscribers immediately after the binlog is launched to minimize master-slave latency. However, due to reasons such as multithreaded timing, when a subscriber initiates a query operation immediately after receiving the binlog, it may not query any data changes resulting from the transaction (because the thread at the transaction may not have completed the final commit step at this time).

If the application needs to use binlog as the trigger point for some business logic, it still needs to consider introducing some delay retry mechanism or reconsidering the appropriate implementation architecture.

This article is provided by Wang Zhi, Database Technology Department of JD.com Mall.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report