What is a distributed transaction 07/19 Update SLTechnology News&Howtos


2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--


Transaction

What is a transaction? For back-end developers, transactions are part of daily work whenever code interacts with a database. Here is Wikipedia's explanation of what a transaction is:

A transaction is a logical unit of work in a database management system, made up of a finite sequence of database operations.

Transactions are a property of database systems that distinguishes them from file systems. In a traditional file system, if the operating system crashes while a file is being written, the file may be corrupted. Databases introduce transactions to guarantee that the database moves from one consistent state to another: when you commit your work, either all of the changes are saved or none of them are.

Usually a transaction consists of multiple read and write operations.

Transactions have four basic characteristics, commonly known as ACID.

A (Atomicity): The transaction is treated as a single unit. Either all of its statements succeed or all of them fail; it can never happen that some statements succeed while others fail.

C (Consistency): The database moves from one consistent state to another, and its integrity constraints hold both before and after the transaction. What does it mean for integrity constraints to hold? For example, if a table's name column has a unique constraint and the name column is no longer unique after the transaction commits or rolls back, the transaction has broken the database's integrity constraints.

I (Isolation): Concurrent transactions execute without interfering with each other.

D (Durability): Once a transaction is committed, its changes are permanently saved in the database. This requires that the database never loses committed data, even when it has to recover from a crash.
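To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The `account` table, its CHECK constraint and the `transfer`/`balances` helpers are hypothetical, chosen only for illustration. The connection's context manager commits when the block finishes and rolls back if an exception escapes, so a transfer that would overdraw an account leaves no partial update behind.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts inside one local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so that a failing debit shows the credit being undone.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # CHECK constraint violated: nothing was applied
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

For example, transferring 30 from alice to bob succeeds. A later transfer of 500 fails on the CHECK constraint, and the credit to bob that had already executed is rolled back, so both balances stay at their previous values.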

Therefore, in the early days, when a system had only one data source, it could rely on the database's own transactions to guarantee that the business logic stayed correct.

However, as the business keeps growing, a single table may hold tens of millions of rows, and a single database instance may run into performance problems. At that point we consider sharding: splitting the data across multiple databases and tables. This can lead to a single application connecting to multiple data sources, as in the following figure.

In the figure above, the merchant balance table and the user balance table involved in a purchase live in two separate database instances. A local transaction on each instance can ensure that the update to the merchant balance, or the update to the user balance, either fully succeeds or fully fails. But nothing guarantees that the two transactions succeed or fail together.
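The problem can be reproduced in a few lines. In this sketch, two separate in-memory SQLite databases stand in for the merchant and user database instances. The names `merchant_db`, `user_db` and `purchase` are hypothetical. Each update runs in its own local transaction, so when the second one fails, the first has already committed and nothing undoes it.

```python
import sqlite3


def make_db(owner, amount):
    """Each hypothetical 'instance' is its own database with its own transactions."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE balance (owner TEXT PRIMARY KEY, amount INTEGER CHECK (amount >= 0))"
    )
    db.execute("INSERT INTO balance VALUES (?, ?)", (owner, amount))
    db.commit()
    return db


merchant_db = make_db("merchant", 0)
user_db = make_db("user", 50)


def purchase(price):
    """Credit the merchant, then debit the user, in two independent transactions."""
    with merchant_db:  # local transaction 1 commits on its own
        merchant_db.execute("UPDATE balance SET amount = amount + ?", (price,))
    try:
        with user_db:  # local transaction 2 may fail after transaction 1 committed
            user_db.execute("UPDATE balance SET amount = amount - ?", (price,))
    except sqlite3.IntegrityError:
        return False  # transaction 1 is NOT undone
    return True


def amount(db):
    return db.execute("SELECT amount FROM balance").fetchone()[0]
```

Calling `purchase(80)` against a user balance of 50 credits the merchant with 80 but leaves the user's balance at 50. Each database is internally consistent, yet the two together are not.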

In another case, as the system keeps growing, we may split the application into multiple microservices so that each one operates on only one data source. A single business request then calls several services, each operating on its own data source, as shown in the following figure.

In this case, there is no guarantee that all calls will be successful.

These examples show that as the business grows, traditional single-machine transactions can no longer meet its needs, and distributed transactions are required to provide the guarantee.

Distributed transaction

Here is the definition from Wikipedia:

A distributed transaction is a database transaction in which two or more network hosts are involved.

Let's first talk about some theoretical foundations for the implementation of distributed transactions.

Theory of distributed transaction technology

CAP theorem. In a distributed system (a set of interconnected nodes that share data), read and write operations can guarantee at most two of Consistency, Availability, and Partition tolerance; the third must be sacrificed.

The following explanation is excerpted from chapter 22 of the Geek Time course "Learning Architecture from Zero":

Although CAP in theory lets you pick any two of the three properties, in a distributed environment we must always choose P (partition tolerance). The network is not 100% reliable and can fail, so partitions are inevitable. If we chose CA and gave up P, then when a partition occurred the system would have to reject writes to preserve C: every write request would get an error such as "the system does not currently allow writes". That conflicts with A, which requires that requests return neither errors nor timeouts. Therefore a distributed system cannot in practice choose a CA architecture; it can only choose CP or AP.
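The CP/AP trade-off can be illustrated with a toy two-replica register. This is a simplified model, not a real replication protocol; `ReplicatedRegister` and its `partitioned` flag are hypothetical. In CP mode a write during a partition is rejected, which is exactly the error that violates A. In AP mode the write is accepted on one replica and the replicas diverge, which violates C.

```python
class ReplicatedRegister:
    """Toy register stored on two replicas; `mode` picks CP or AP behavior."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None            # value on replica A
        self.b = None            # value on replica B
        self.partitioned = False  # True when A cannot reach B

    def write(self, value):
        if not self.partitioned:
            self.a = self.b = value  # replicate synchronously
            return "ok"
        if self.mode == "CP":
            # Keep the replicas identical by refusing the write (gives up A).
            return "error: writes not allowed during partition"
        # AP: accept the write on the reachable replica only (gives up C).
        self.a = value
        return "ok"

    def read_both(self):
        return self.a, self.b
```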

BASE is an acronym for the following three terms.

Basically Available: when a distributed system fails, it may give up some of its functionality to keep its core functions available.

Soft state: the system may pass through intermediate states that do not affect its availability; this is the inconsistency referred to in CAP.

Eventually consistent: after some period of time, the data on all nodes becomes consistent.

BASE complements the AP option in CAP: it uses soft state and eventual consistency to reach consistency after a delay. BASE sits at the opposite end from ACID. ACID is a strong consistency model, whereas BASE gives up strong consistency, allowing data to be inconsistent for a short time as long as it eventually becomes consistent.
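A similarly simplified sketch shows the BASE idea. The primary acknowledges writes immediately, replication happens later, and a replica can be stale for a while before it converges. The class and the explicit `sync()` step are hypothetical; a real system would replicate in the background.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy primary/replica store with asynchronous replication (BASE-style)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # replication log not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value  # acknowledged immediately (basically available)
        self.pending.append((key, value))

    def read_replica(self, key):
        return self.replica.get(key)  # may return stale data (soft state)

    def sync(self):
        # Applying the backlog makes the replica converge (eventual consistency).
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value
```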

Next, let's take a look at the implementation of distributed transactions.

Distributed transaction implementation scheme

Database resource level:

2PC two-phase commit protocol

3PC three-phase commit protocol

Business level:

TCC

Schemes at the database resource level involve several transactions, so some role has to manage the state of each of them. We call this role the coordinator, and each transaction that takes part a participant. Coordinators and participants usually communicate through a specific protocol, today generally the XA interface specification. Two protocols were built on the coordinator/participant model: 2PC, which is the commit protocol the XA specification itself defines, and 3PC, an extension of it.

2PC two-phase commit protocol

As the name suggests, the process has two phases.

In the first phase (prepare), the coordinator (transaction manager) asks every participant to pre-commit the transaction. Database resources are locked at this point, and each participant writes undo and redo records to its transaction log. In the second phase (commit), each participant (resource manager) either commits the transaction or uses the undo log to roll it back, and then releases its resources.
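The control flow of the two phases can be sketched as follows. This is an in-memory illustration under stated assumptions, not an XA implementation. `Participant` is a hypothetical stand-in for a resource manager, and its `can_commit` flag simulates the result of the prepare step. Locking, logging and networking are reduced to state changes.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical resource manager; `can_commit` simulates its prepare outcome."""
    name: str
    can_commit: bool = True
    state: str = "INIT"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: lock resources, write undo/redo records, then vote.
        if not self.can_commit:
            self.state = "ABORTED"
            return False
        self.log.append("undo/redo written")
        self.state = "PREPARED"  # locks stay held until phase 2
        return True

    def commit(self) -> None:
        self.state = "COMMITTED"  # make changes durable, release locks

    def rollback(self) -> None:
        self.state = "ROLLED_BACK"  # apply the undo log, release locks


def two_phase_commit(participants) -> str:
    """Coordinator: commit only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare
    if all(votes):
        for p in participants:  # phase 2: commit everywhere
            p.commit()
        return "COMMITTED"
    for p in participants:  # phase 2: roll back everywhere
        p.rollback()
    return "ROLLED_BACK"
```

If both participants vote yes, the result is COMMITTED. If either votes no, every participant is rolled back.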

The whole process is shown in the following picture.

Distributed transaction commit success scenario:

Distributed transaction rollback scenario:

The advantages of this scheme are that it is simple to implement, mainstream databases support it, and it provides strong consistency. MySQL 5.5, for example, supports distributed transactions through the XA protocol.

The scheme also has drawbacks:

Single point of failure at the coordinator. If the coordinator goes down during the commit phase, the participants keep waiting with their resources locked, and everything that needs those resources is blocked. A new coordinator could be elected, but that alone does not resolve the problem.

Synchronous blocking. Throughout execution, the whole transaction is blocked until the commit completes and resources are released. If a participant does not receive the commit or rollback instruction because of network delay, it stays blocked.

Data inconsistency. In the second phase, if the coordinator crashes after sending the commit message to only the first participant, the first participant commits its transaction, while the second participant cannot commit because it never received the coordinator's signal.
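The blocking and inconsistency problems can both be seen in a small simulation of phase two, assuming phase one has already succeeded. The names are hypothetical, and the coordinator crash is modeled simply as stopping after a given number of COMMIT messages.

```python
class PreparedParticipant:
    """Participant that already voted yes; it leaves PREPARED only when told to."""

    def __init__(self, name):
        self.name = name
        self.state = "PREPARED"  # phase 1 succeeded; locks are still held

    def on_commit(self):
        self.state = "COMMITTED"


def coordinator_phase_two(participants, crash_after: int):
    """Send COMMIT in order; simulate a crash after `crash_after` messages."""
    for sent, p in enumerate(participants):
        if sent == crash_after:
            return  # coordinator is down; the rest never hear from it
        p.on_commit()
```

With `crash_after=1`, the first participant commits while the second stays PREPARED, still holding its locks and unable to decide on its own.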

To address these shortcomings of 2PC, an improved protocol was proposed: 3PC.

3PC three-phase commit protocol

Three-phase commit builds on two-phase commit and improves it. Its three phases are as follows.

1. CanCommit: the coordinator asks each participant whether it can commit the transaction.

2. PreCommit: if every participant can commit the transaction, the coordinator sends the PreCommit command, and each participant locks its resources and waits for the final command.

If all participants reply with a confirmation, the coordinator sends the pre-commit request to every participant; each one locks its resources, executes the transaction, and reports the result.

If some participant replies with a refusal, or the coordinator times out waiting for replies, the coordinator concludes that the transaction cannot proceed and sends an abort instruction, and every participant leaves its waiting state.

3. DoCommit: if every participant acknowledged in the second phase, the coordinator sends DoCommit and the transaction is finally committed; otherwise it sends an abort command and every participant rolls back the transaction.

If all participants executed the transaction normally, the coordinator sends the final commit instruction and the locked resources are released.

If some participant failed to execute the transaction, or the coordinator timed out waiting, the coordinator sends a rollback instruction and the locked resources are released.
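The coordinator's side of the three phases can be sketched as follows. It is a simplified model: the participant flags are hypothetical, and a reply that times out is treated the same as a refusal.

```python
class Participant:
    """Hypothetical 3PC participant; flags simulate its answers per phase."""

    def __init__(self, name, can_commit=True, precommit_ok=True):
        self.name = name
        self.can_commit = can_commit
        self.precommit_ok = precommit_ok
        self.state = "INIT"

    def on_can_commit(self):  # phase 1: cheap check, nothing locked yet
        return self.can_commit

    def on_pre_commit(self):  # phase 2: lock resources, execute, acknowledge
        if self.precommit_ok:
            self.state = "PRECOMMITTED"
        return self.precommit_ok

    def on_do_commit(self):  # phase 3: make the change final, release locks
        self.state = "COMMITTED"

    def on_abort(self):
        self.state = "ABORTED"


def three_phase_commit(participants):
    # Phase 1: CanCommit. Any refusal (or timeout) aborts before locking.
    if not all(p.on_can_commit() for p in participants):
        for p in participants:
            p.on_abort()
        return "ABORTED"
    # Phase 2: PreCommit. Every participant must acknowledge.
    acks = [p.on_pre_commit() for p in participants]
    if not all(acks):
        for p in participants:
            p.on_abort()
        return "ABORTED"
    # Phase 3: DoCommit.
    for p in participants:
        p.on_do_commit()
    return "COMMITTED"
```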

See the picture below for details.

Compared with 2PC, 3PC introduces timeouts, which reduces blocking and mitigates the coordinator's single point of failure. In the third phase, if a participant does not receive the coordinator's signal before its timeout expires, it commits by default and releases its resources.

Three-phase commit still does not solve the data consistency problem. Suppose the coordinator issues a rollback command, but because of network problems a participant does not receive it within its waiting time. That participant then commits by default while the others roll back, leaving the transaction inconsistent.
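The participant's timeout rule, and the inconsistency it can cause, can be simulated as follows. The class is hypothetical and time is not modeled; `on_timeout()` stands for "no phase-three message arrived in time".

```python
class ThreePCParticipant:
    """Participant-side 3PC timeout rule (simplified; time is simulated)."""

    def __init__(self, name):
        self.name = name
        self.state = "PRECOMMITTED"  # acked PreCommit, waiting for phase 3

    def receive(self, message):
        if message == "DO_COMMIT":
            self.state = "COMMITTED"
        elif message == "ABORT":
            self.state = "ABORTED"

    def on_timeout(self):
        # No phase-3 message arrived in time: commit by default.
        if self.state == "PRECOMMITTED":
            self.state = "COMMITTED"
```

If participant A receives the ABORT while B's copy is lost, A aborts and B commits by default, so the same transaction ends up in two different states.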

TCC

TCC transaction

To address the coarse-grained resource locking of these protocols, the industry proposed a new transaction model in which transactions are defined at the business level and the business itself fully controls the locking granularity. It is essentially a compensation mechanism. It splits a transaction into a Try phase and a Confirm/Cancel phase, and business code controls the logic of each phase, so the lock granularity of the transaction can be chosen freely. Services gain higher performance at the cost of isolation.

TCC stands for Try, Confirm, Cancel. Unlike 2PC and 3PC, which work at the database level, TCC works at the application level. Its three actions are:

Try:

Complete all business checks (consistency)

Reserve necessary business resources (quasi-isolation)

Confirm:

Actually perform the business operation

Confirm operation should be idempotent

Cancel:

Release business resources reserved during the Try phase

Cancel operation should be idempotent
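One way to meet the idempotency requirements above is to record each transaction's phase and let only the first Confirm or Cancel do any work. The following is a hypothetical sketch of such a base class, not the API of any particular TCC framework. Because `try` is a Python keyword, the Try method is spelled `try_`. `DemoCounter` is a toy subclass that only counts how often each hook really runs.

```python
from abc import ABC, abstractmethod


class TccParticipant(ABC):
    """Hypothetical TCC participant base class with built-in idempotency."""

    def __init__(self):
        self.status = {}  # tx_id -> "TRIED" | "CONFIRMED" | "CANCELLED"

    def try_(self, tx_id, amount):
        # Check business rules and reserve resources; a repeated Try is refused.
        if tx_id in self.status or not self.do_try(tx_id, amount):
            return False
        self.status[tx_id] = "TRIED"
        return True

    def confirm(self, tx_id):
        if self.status.get(tx_id) == "TRIED":  # only the first call does work
            self.do_confirm(tx_id)
            self.status[tx_id] = "CONFIRMED"

    def cancel(self, tx_id):
        if self.status.get(tx_id) == "TRIED":  # confirmed/unknown txs untouched
            self.do_cancel(tx_id)
            self.status[tx_id] = "CANCELLED"

    @abstractmethod
    def do_try(self, tx_id, amount): ...

    @abstractmethod
    def do_confirm(self, tx_id): ...

    @abstractmethod
    def do_cancel(self, tx_id): ...


class DemoCounter(TccParticipant):
    """Toy participant that counts how often each hook actually runs."""

    def __init__(self):
        super().__init__()
        self.calls = {"try": 0, "confirm": 0, "cancel": 0}

    def do_try(self, tx_id, amount):
        self.calls["try"] += 1
        return amount > 0

    def do_confirm(self, tx_id):
        self.calls["confirm"] += 1

    def do_cancel(self, tx_id):
        self.calls["cancel"] += 1
```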

That may sound abstract, so let's walk through a real example.

Let's simulate a payment flow in an online mall. The user pays for an order with a combination of payment methods: account balance plus red packets. The normal flow is:


Create an order

Place an order

Call the balance system to deduct the balance

Call the red packet system to deduct the red packet amount

Change the order status to paid

Payment is complete

The actual process is shown in the following figure.

But this payment flow calls several sub-services, and we cannot guarantee that they all succeed. Suppose, for example, that the call to the red packet system to deduct the red packet amount fails. We then face an awkward situation: the method exits abnormally because of the red packet service failure, the order is still in its initial state, but the user's balance has already been deducted. This is a very poor user experience. So this payment flow needs a mechanism that treats the whole process as a single unit: the service calls must either all succeed or all fail, forming one transaction.

At this point we can introduce a TCC transaction and treat the whole order-placement process as one unit. With TCC in place, when the balance deduction fails, we roll back the order system and the red packet system. The whole process is shown in the following figure.

Because the balance system failed, every change made during this process must be undone, so we send a cancel notice to both the order system and the red packet system.

So after introducing TCC transactions, we need to change how our services are called.

How to introduce TCC transaction into the system

Following the three TCC steps, we must refactor each service into Try, Confirm and Cancel steps:

TCC Try:

For the business above, the order system adds a Try method that changes the order status to PAYING. The balance system adds a Try method that first checks whether the balance is sufficient, then deducts the amount from the balance and adds it to a frozen amount. The red packet system works the same way as the balance system. This shows that a TCC Try method must check business resources, and that this step introduces an intermediate state. Let's look at the whole process in the figure below.
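The Try methods described above might look like this in a minimal in-memory sketch. `BalanceAccount` and `Order` are hypothetical names; a real service would persist these fields in its database and perform the check and the update in one local transaction.

```python
class BalanceAccount:
    """Hypothetical balance service state: available funds plus frozen funds."""

    def __init__(self, balance):
        self.balance = balance
        self.frozen = 0

    def try_deduct(self, amount):
        # Business check: is the available balance sufficient?
        if amount <= 0 or self.balance < amount:
            return False
        # Reserve: move the money into the frozen (intermediate) state.
        self.balance -= amount
        self.frozen += amount
        return True


class Order:
    """Hypothetical order record for the order system's Try step."""

    def __init__(self):
        self.status = "INIT"

    def try_pay(self):
        if self.status != "INIT":
            return False
        self.status = "PAYING"  # intermediate state visible to the business
        return True
```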

TCC Confirm:

If every sub-service call in the TCC Try step succeeds, we then confirm each service. Each service adds a Confirm method. For example, the balance system's Confirm sets the frozen amount to 0, the red packet system does the same, and the order system changes the order status to SUCCESS. Confirm must be idempotent: before updating an order, the order system must check that the order is still in PAYING. The whole process is shown in the following figure.
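The order system's idempotency check can be pushed into the UPDATE itself, so the database only changes a row that is still PAYING. Below is a sketch using an in-memory SQLite database; the `orders` table and the `confirm_order` helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
db.execute("INSERT INTO orders VALUES (1, 'PAYING')")
db.commit()


def confirm_order(db, order_id):
    """Idempotent Confirm: the WHERE clause only matches an order still PAYING."""
    with db:
        cur = db.execute(
            "UPDATE orders SET status = 'SUCCESS' WHERE id = ? AND status = 'PAYING'",
            (order_id,),
        )
    return cur.rowcount == 1  # True only for the call that changed the row


def order_status(db, order_id):
    return db.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
```

The first call returns True and moves the order to SUCCESS. A repeated call matches no row and returns False, so a retried Confirm is harmless.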

At this point a TCC transaction framework is needed to drive the services. Once the TCC transaction manager detects that the Try methods have finished, it automatically invokes the Confirm method of each service to move every service to its final state.
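A minimal sketch of how such a manager could drive the services, assuming every participant exposes `try_()`, `confirm()` and `cancel()`. `TccCoordinator` and `FakeService` are hypothetical. A real framework would also persist transaction state and handle failures of Confirm and Cancel themselves.

```python
class TccCoordinator:
    """Hypothetical minimal TCC transaction manager."""

    def __init__(self, participants):
        self.participants = participants

    def execute(self):
        for p in self.participants:
            if not p.try_():
                # Cancel goes to every participant, as in the example above,
                # so Cancel must tolerate a transaction whose Try never ran.
                for q in self.participants:
                    q.cancel()
                return "CANCELLED"
        # Every Try succeeded: drive all services to their final state.
        for p in self.participants:
            p.confirm()
        return "CONFIRMED"


class FakeService:
    """Stand-in service that records which TCC phases ran."""

    def __init__(self, name, try_ok=True):
        self.name = name
        self.try_ok = try_ok
        self.events = []

    def try_(self):
        self.events.append("try")
        return self.try_ok

    def confirm(self):
        self.events.append("confirm")

    def cancel(self):
        self.events.append("cancel")
```

With a failing balance service, the order and red packet services both receive Cancel, matching the rollback scenario described earlier.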

TCC Cancel:

If freezing the red packet amount fails during TCC Try, we must undo all previous changes and restore the initial state. Like Confirm, the Cancel method must be idempotent, as shown in the following figure:
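For the balance system, an idempotent Cancel can be built by tracking the frozen amount per transaction. In this hypothetical sketch, Cancel returns whatever is still frozen for the transaction id and then forgets it. A repeated Cancel, or one for a transaction whose Try never ran, therefore changes nothing.

```python
class BalanceAccount:
    """Hypothetical balance account that tracks frozen funds per transaction."""

    def __init__(self, balance):
        self.balance = balance
        self.frozen = {}  # tx_id -> amount reserved by Try

    def try_deduct(self, tx_id, amount):
        if tx_id in self.frozen or self.balance < amount:
            return False
        self.balance -= amount
        self.frozen[tx_id] = amount
        return True

    def cancel(self, tx_id):
        # pop() returns 0 when nothing is frozen for tx_id, so a repeated
        # Cancel (or one whose Try never ran) leaves the balance unchanged.
        self.balance += self.frozen.pop(tx_id, 0)
```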

From this we can see that if TCC Try succeeds, Confirm must succeed, and if Try fails, Cancel must succeed, because Confirm is what moves the system to its final state. In reality, however, Confirm or Cancel will sometimes fail in production, so the TCC framework needs to record the result of each Confirm call. If a Confirm call fails, the framework records the failure and retries it at regular intervals.
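The record-and-retry behavior can be sketched as a small log of failed Confirm calls. This is a hypothetical in-memory version. A real TCC framework would persist the log and run `retry()` from a scheduled job, and retrying is only safe because Confirm is idempotent. `make_flaky_confirm` is a test helper that simulates a Confirm failing a given number of times.

```python
class TccRetryLog:
    """Hypothetical sketch: record failed Confirm calls and retry them later."""

    def __init__(self):
        self.pending = []  # (tx_id, confirm_fn) pairs still to be retried

    def confirm(self, tx_id, confirm_fn):
        try:
            confirm_fn(tx_id)
        except Exception:
            self.pending.append((tx_id, confirm_fn))  # record the failure

    def retry(self):
        still_failing = []
        for tx_id, confirm_fn in self.pending:
            try:
                confirm_fn(tx_id)  # safe to repeat because Confirm is idempotent
            except Exception:
                still_failing.append((tx_id, confirm_fn))
        self.pending = still_failing


def make_flaky_confirm(failures):
    """Test helper: a Confirm that raises `failures` times, then succeeds."""
    calls = []

    def confirm_fn(tx_id):
        calls.append(tx_id)
        if len(calls) <= failures:
            raise ConnectionError("simulated network failure")

    return confirm_fn, calls
```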
