How to understand the distributed transaction of database 07/19 Update SLTechnology News&Howtos

How to understand the distributed transaction of database

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/02 Report--

This article focuses on "how to understand the distributed transactions of the database", interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Now let the editor take you to learn "how to understand the distributed transactions of the database"!

Database transaction

Database transaction (transaction for short) is a logical unit in the execution process of database management system, which is composed of a limited sequence of database operations.

Either all or none of these operations are performed, which is an indivisible unit of work.

Several typical characteristics of database transactions:

Atomicity (Atomicity)

Consistency (Consistency)

Isolation (Isolation)

Persistence (Durabilily)

The abbreviation is ACID:

Atomicity: transactions are executed as a whole, and either all or none of the operations on the database are performed.

Consistency: means that the data will not be destroyed before and after the transaction ends. If An account transfers 10 yuan to B account, regardless of success or not, the total amount of An and B will remain the same.

Isolation: when multiple transactions are accessed concurrently, transactions are isolated from each other, that is, one transaction does not affect the running effect of other transactions. In short, there is no intrusion into the river between affairs.

Persistence: indicates that the operational changes made by the transaction to the database after the transaction is completed will be persisted in the database.

The implementation principle of transaction

Local transaction

The transactions under the traditional single server and single relational database are local transactions. Local transactions are managed by resource managers, and JDBC transactions are a very typical local transaction.

Transaction log

The InnoDB transaction log includes redo log and undo log.

Redo log (redo log): usually a physical log, which records the physical changes to the data page, rather than how a row or rows are modified, it is used to restore the physical data page after the submission.

Undo log (rollback log): it is a logical log, which is different from the physical log recorded by redo log.

It can be thought that when a record is delete, a corresponding insert record is recorded in undo log, and when a record is update, it records a corresponding update record.

The idea of implementing the transaction ACID feature:

Atomicity: is implemented using undo log. If there is an error during transaction execution or the user executes rollback, the system returns the state of the transaction start through the undo log log.

Persistence: using redo log to achieve, as long as the redo log log is persisted, when the system crashes, the data can be recovered through redo log.

Isolation: transactions are separated from each other through locks and MVCC.

Consistency: consistency is achieved through rollback, recovery, and isolation in concurrency.

Distributed transaction

Distributed transaction means that the participants of the transaction, the server supporting the transaction, the resource server and the transaction manager are located on different nodes of different distributed systems.

To put it simply, distributed transaction refers to the transaction in distributed system, and its existence is to ensure the data consistency of different database nodes.

Why do you need distributed transactions? Next, it is divided into two aspects:

Distributed transaction under micro-service architecture

With the rapid development of the Internet, lightweight and well-defined micro-services have stepped onto the stage of history.

For example, the service for a user to place an order and buy a live gift is split into three service, namely Gold Coin Service (coinService), order Service (orderService) and Gift Service (giftService).

These services are deployed on different machines (nodes), and the corresponding databases (gold coin database, order database, gift database) are also on different nodes.

Users order to buy gifts, gift database, gold coin database, order database on different nodes, using local transactions is not allowed, so how to ensure data consistency on different databases (nodes)? This requires distributed transactions!

Distributed transactions under sub-database and sub-table

With the development of business, the data of the database is becoming more and more huge, more than ten million levels of data, we need to divide it into sub-database and sub-table (the company used to use Mycat sub-database sub-table, later using Sharding-JDBC).

As soon as there is a sub-database, the data is distributed on different nodes, for example, some are in the computer room in Shenzhen, some are in the computer room in Beijing, if you want to use local affairs to guarantee, you are indifferent ~ you still need distributed transactions.

For example, the account data of A transferring 10 yuan to BMaga An is in Beijing computer room, while B's account data is in Shenzhen computer room.

The process is as follows:

CAP Theory & BASE Theory

To learn distributed transactions, of course, you need to understand CAP theory and BASE theory.

CAP theory

As the basic theory of distributed system, CAP theory means that in a distributed system, Consistency (consistency), Availability (availability) and Partition tolerance (partition fault tolerance) can only be realized at most two points at the same time.

Consistency: consistency refers to whether data can be consistent across multiple replicas.

For example, after a data is updated in a partition node, the data read in other partition nodes is also the updated data.

Availability (A:Availability): availability means that the service provided by the system must be available all the time, and the result can always be returned in a limited time for every operation request of the user. The focus here is on "limited time" and "return results".

Partition tolerance: when a distributed system encounters any network partition failure, it still needs to be able to provide services that meet the consistency and availability.

BASE theory

BASE theory is an extension of AP in CAP. For our business system, we consider sacrificing consistency for system availability and partition fault tolerance.

BASE is the abbreviation of three phrases: Basically Available (basic available), Soft State (soft state), and Eventually Consistent (ultimate consistency).

Basically Available: basically available. This is achieved by supporting local failures rather than global system failures.

If users are partitioned on five database servers, the failure of one user database affects only 20% of the users on this particular host, and other users are not affected.

Soft State: soft state. States can be out of sync for a period of time.

Eventually Consistent: consistent in the end. In the end, it is fine that the data are consistent, not strongly consistent all the time.

Several Solutions of distributed transaction

There are mainly the following distributed transaction solutions:

2PC (two-phase submission) scheme

TCC (Try, Confirm, Cancel)

Local message table

Best effort notification

Saga transaction

Two-stage submission plan

Two-phase commit scheme is a commonly used distributed transaction solution. The commit of a transaction is divided into two phases: the preparation phase and the commit execution plan.

The successful submission of the second phase:

In the prepare phase, the transaction manager sends a prepare message to each resource manager, which returns success if the resource manager's local transaction operation executes successfully.

During the commit execution phase, if the transaction manager receives success messages from all resource managers, a commit message is sent to each resource manager, and the RM executes the commit according to the instructions of TM.

As shown in the figure:

Failure of the second phase submission:

In the preparation phase, the transaction manager sends a prepare message to each resource manager, which returns success if the resource manager's local transaction operation succeeds and fails if execution fails.

During the commit execution phase, if the transaction manager receives a message that any of the resource managers failed, a rollback message is sent to each resource manager.

The resource manager rolls back the local transaction operation according to the instructions of the transaction manager, releasing all lock resources used during the transaction.

The 2PC scheme is easy to implement and low cost, but it mainly has the following disadvantages:

Single point of problem: if the transaction manager fails, the resource manager will always be locked.

Performance problems: all resource managers are in a synchronous blocking state during the transaction commit phase, occupying system resources, and do not release resources until the commit is completed, which can easily lead to performance bottlenecks.

Data consistency issues: if some resource managers receive submitted messages and some do not, it will lead to data inconsistencies.

TCC (compensation mechanism)

TCC adopts the compensation mechanism, and its core idea is to register a corresponding confirmation and compensation (revocation) operation for each operation.

TCC (Try-Confirm-Cancel) implements distributed transactions through the decomposition of business logic.

For a specific business service, the TCC distributed transaction model requires the business system to implement the following three pieces of logic:

Try phase: try to execute, complete all business consistency checks, and reserve the necessary business resources.

Confirm phase: this phase confirms the submission of the business without any checks, because the Try phase has already been checked, and the default Confirm phase will not make mistakes.

Cancel phase: if the business fails to execute, it releases all business resources occupied by the Try phase and rolls back all operations performed by the Confirm phase.

The TCC distributed transaction model includes the following three parts:

Master business service: the main business service is responsible for initiating and completing the entire business activity.

Slave business service: the slave business service is the participant of the whole business activity, implementing Try, Confirm and Cancel operations for the master business service to invoke.

Business activity manager: the business activity manager manages and controls the entire business activity, including recording transaction status, invoking Confirm operations from business services, invoking Cancel operations from business services, and so on.

Let's take an example of a user placing an order to buy a gift to simulate the process of implementing a distributed transaction in TCC: suppose the user A balance is 100 gold coins and has 5 gifts. A spent 10 gold coins, placed an order and bought 10 roses. Balances, orders and gifts are all in different databases.

The Try phase of TCC:

Generate an order record with the order status to be confirmed.

Update the balance in the gold coin of user A's account to 90 and freeze the gold coin to 10 (reserve business resources).

Increase the number of gifts to 5 and pre-increase the number to 10.

After the Try is successful, it enters the Confirm phase.

Any exception in the Try process will enter the Cancel stage.

The Confirm phase of TCC:

The order status is updated to paid.

The update user balance is 90, which can be frozen to 0.

The number of gifts for users is updated to 15 and pre-increased to 0.

Any exception in the Confirm process will enter the Cancel stage.

If the Confirm procedure executes successfully, the transaction ends.

The Cancel phase of TCC:

Change the order status to cancelled.

Update the user balance back to 100.

Update the number of gifts for users to 5.

The TCC scheme allows applications to customize the granularity of database operations, reduces lock conflicts, and improves performance.

But there are also the following disadvantages:

The application is intrusive, and business logic is needed in the three stages of Try, Confirm and Cancel.

Different rollback strategies need to be implemented according to different reasons for network and system failures, which are difficult to achieve, generally with the help of TCC open source framework, ByteTCC,TCC-transaction,Himly.

Local message table

EBay originally proposed the scheme of local message table to solve the problem of distributed transactions. At present, this scheme is widely used in the industry, and its core idea is to split the distributed transaction into local transactions for processing.

Take a look at the basic implementation flowchart:

The basic implementation ideas are as follows:

Sender:

You need to have a message table that records information about the status of the message.

Business data and message tables are in the same database, that is, to ensure that they are both in the same local transaction.

After processing the business data and writing the message table in the local transaction, write the message to the MQ message queue.

The message is sent to the message consumer, and if the delivery fails, it will be retried.

Message consumer:

Process the messages in the message queue and complete your own business logic.

At this point, if the local transaction is processed successfully, it indicates that it has been processed successfully.

If the local transaction fails, execution will be retried.

If it is a failure of the business, send a business compensation message to the message producer to notify the rollback and other operations.

The producer and consumer scan the local message table regularly and send the unfinished message or failed message again. If there is a reliable automatic reconciliation logic, this scheme is still very practical.

Advantages and disadvantages: the advantage of this scheme is that it solves the problem of distributed transaction and realizes the final consistency. The disadvantage is that the message table is coupled to the business system.

Best effort notification

What is the maximum notice? Best effort notification is also a distributed transaction solution.

The following is an example of a corporate online bank transfer:

The corporate network bank system calls the front interface and jumps to the transfer page.

The corporate network bank calls the transfer system interface.

The transfer system completes the transfer processing and initiates a notification of the transfer result to the enterprise online banking system. If the notification fails, the transfer system will repeat the notification according to the policy.

If the enterprise online banking system does not receive the notification, it will actively call the interface of the transfer system to query the transfer result.

The transfer system will encounter situations such as foreign exchange refund and will come back to reconcile the account regularly.

The goal of the best effort notification program is that the initiator will try its best to notify the recipient of the business processing result through a certain mechanism.

The best effort notification implementation mechanism is as follows:

Best effort notification solution: to achieve best effort notification, you can use MQ's ACK mechanism.

The plan is as follows:

The initiator sends the notice to MQ.

The receiving party listens for MQ messages.

After receiving the message, the receiving party processes the business and responds to the ACK.

If the recipient does not respond to the ACK, the MQ will interval repeated notifications such as 1min, 5min, 10min, etc.

The recipient can use the message proofreading interface to ensure the consistency of the message.

Flow chart of money transfer business realization:

The interaction process is as follows:

The user requests the money transfer system to transfer the money.

The transfer system completes the transfer and sends the transfer result to MQ.

The corporate network banking system monitors the MQ and receives the notification of the transfer result. If the message is not received, the MQ will send the notification repeatedly. Receive the transfer result and update the transfer status.

The enterprise network banking system can also actively query the transfer result query interface of the transfer system and update the transfer status.

Saga transaction

The Saga transaction is proposed by Hector Garcia-Molina and Kenneth Salem of Princeton University.

Its core idea is to split the long transaction into several local short transactions, which is coordinated by the Saga transaction coordinator. If it ends normally, it is completed normally. If a step fails, the compensation operation is called once according to the opposite order.

Introduction to Saga:

Saga = Long Live Transaction (LLT, long-live transaction).

LLT = T1 + T2 + T3 +... + Ti (Ti is a local short transaction).

Each local transaction Ti has a corresponding compensation Ci.

The order in which Saga is executed:

Normal condition: T1, T2, T3. Tn

Abnormal condition: T1 T2 T3 C3 C2 C1

Saga has two recovery strategies:

Restore backwards and compensate for completed transactions if any local subtransactions fail. In the case of abnormal conditions, the execution order is T1 T2 Ti Ci C2 C1.

Forward recovery, that is, retry the failed transaction, assuming that each subtransaction will succeed in the end. The order of execution is: T1, T2. TJ (failed), Tj (retry),..., Tn.

For example, if a user places an order and buys more than 10 roses for 10 yuan, there are:

T1 = place an order

T2 = deduct 10 yuan from the user

T3 = user plus 10 roses

T4 = inventory minus 10 roses

C1 = cancel the order

C2 = add 10 yuan to the user

C3 = user minus 10 roses

C4 = inventory plus 10 roses

Suppose that the transaction executes until T4 has an abnormal rollback, and when C4 wants to add the roses back to the inventory, it is found that the user's roses have been used up. This is a disadvantage of Saga, which is caused by the lack of isolation between transactions.

This problem can be solved through the following solutions:

Add the logic of logic lock at the application level.

The Session layer is isolated to ensure serialization.

At the business level, this part of the funds is isolated by freezing funds in advance.

In the process of business operation, the update is obtained by reading the current status in time.

At this point, I believe you have a deeper understanding of "how to understand the distributed transactions of the database". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.