How to understand Java distributed transaction

2025-07-08 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

Today, the editor will share with you how to understand Java distributed transactions. The content is detailed and the logic is clear. Most people probably still know little about this topic, so this article is shared for your reference. I hope you get something out of it. Let's take a look.

If a transaction invokes operations on different servers, it becomes a distributed transaction.

Consider the following scenario: on payday, you transfer your monthly salary of ¥1024 from Alipay to Yu'e Bao.

If the Yu'e Bao system fails after ¥1024 has been deducted from the Alipay account, so that the Yu'e Bao account is never credited with ¥1024, the data becomes inconsistent.

The same pattern appears in many systems:

When an order is placed, a row must be inserted into the order table and the inventory must then be reduced by one.

When a user clicks an advertisement in search results, the click event must be recorded first, and the merchant system must then be told to deduct the advertising fee.

When a distributed transaction comes to an end, atomicity requires that either all of the servers participating in the transaction commit it or all of them abort it. To achieve this, one of the servers takes on the role of coordinator, which ensures that every server reaches the same outcome.

How the coordinator works depends on the protocol chosen. Two-phase commit is the most commonly used protocol for distributed transactions.

1. Two-phase commit protocol

The two-phase commit protocol (2PC) is designed to allow any participant to abort its own part of the transaction. Because of the atomicity requirement, if one part of the transaction is aborted, the whole distributed transaction must be aborted as well.

In the first phase of the protocol, each participant votes on whether to commit or abort the transaction. Once a participant has voted to commit, it is no longer allowed to abort. Therefore, before a participant votes to commit, it must make sure that it will eventually be able to carry out its part of the distributed transaction, even if it fails and is replaced in the meantime.

A participant that is guaranteed to be able to commit its part of a transaction eventually is said to be in the prepared state for that transaction. To guarantee this, each participant must save all objects changed by the transaction, together with its own state (prepared), to persistent storage.

In the second phase of the protocol, every participant in the transaction carries out the joint decision. If any participant voted to abort, the decision is to abort the transaction. If all participants voted to commit, the decision is to commit the transaction.

The difficulty lies in ensuring that every participant votes and that all of them reach the same decision. When there are no failures, the protocol is quite simple. However, it must also work correctly when failures occur, such as server crashes, message loss, or servers that are temporarily unable to communicate.

2. Implementation of two-phase commit

To implement the two-phase commit protocol, the coordinator and the participants in a distributed transaction usually communicate through the following operations:

canCommit(trans)?

The coordinator asks a participant whether it can commit the transaction, and the participant replies with its vote.

doCommit(trans)

The coordinator tells the participant to commit its part of the transaction.

doAbort(trans)

The coordinator tells the participant to abort its part of the transaction.

haveCommitted(trans, participant)

The participant uses this operation to confirm to the coordinator that it has committed the transaction.

getDecision(trans)?

When a participant has voted Yes and has not received a reply after some time, it uses this operation to ask the coordinator for the outcome of the vote on the transaction. This operation is used to recover from a server crash or from delayed messages.
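The operations above can be sketched in Java roughly as follows. This is a minimal in-memory illustration under stated assumptions, not a real implementation: the names `Vote`, `Decision`, `TwoPhaseParticipant`, `InMemoryParticipant` and `SimpleCoordinator` are invented for this example, transactions are identified by plain strings, and there is no persistence, networking, timeout handling or crash recovery. haveCommitted is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; the article names only the operations.
enum Vote { YES, NO }
enum Decision { COMMIT, ABORT }

// Operations the coordinator invokes on each participant.
interface TwoPhaseParticipant {
    Vote canCommit(String trans);  // ask for a vote
    void doCommit(String trans);   // commit this participant's part
    void doAbort(String trans);    // abort this participant's part
}

// In-memory participant that votes as configured and records its outcome.
class InMemoryParticipant implements TwoPhaseParticipant {
    private final Vote vote;
    String outcome = "undecided";

    InMemoryParticipant(Vote vote) { this.vote = vote; }

    @Override public Vote canCommit(String trans) {
        // A real participant would save its changed objects and the
        // "prepared" state to persistent storage before answering YES.
        if (vote == Vote.NO) outcome = "aborted"; // a NO voter aborts at once
        return vote;
    }
    @Override public void doCommit(String trans) { outcome = "committed"; }
    @Override public void doAbort(String trans) { outcome = "aborted"; }
}

class SimpleCoordinator {
    private final Map<String, Decision> decisions = new HashMap<>();

    Decision run(String trans, List<TwoPhaseParticipant> participants) {
        // Collect one vote from every participant.
        List<TwoPhaseParticipant> yesVoters = new ArrayList<>();
        boolean allYes = true;
        for (TwoPhaseParticipant p : participants) {
            if (p.canCommit(trans) == Vote.YES) yesVoters.add(p);
            else allYes = false;
        }
        // Commit only if every vote was YES; otherwise abort.
        Decision decision = allYes ? Decision.COMMIT : Decision.ABORT;
        decisions.put(trans, decision);
        // Only YES voters are waiting for the decision.
        for (TwoPhaseParticipant p : yesVoters) {
            if (decision == Decision.COMMIT) p.doCommit(trans);
            else p.doAbort(trans);
        }
        return decision;
    }

    // getDecision: lets a participant that voted YES ask for the outcome later.
    Decision getDecision(String trans) { return decisions.get(trans); }
}
```

With two participants that both vote Yes, `run` returns `COMMIT` and both end up committed. If either votes No, `run` returns `ABORT` and the Yes voter is told to abort.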

Phase 1 (voting phase):

1) The coordinator sends a canCommit request to every participant in the distributed transaction.

2) When a participant receives the canCommit request, it replies to the coordinator with its vote (Yes/No). Before voting Yes, it saves all objects to persistent storage in preparation for committing. If it votes No, the participant aborts immediately.

Phase 2 (commit phase):

1) The coordinator collects all votes (including its own).

a) If there is no failure and all votes are Yes, the coordinator decides to commit the transaction and sends a doCommit request to every participant.

b) Otherwise, the coordinator decides to abort the transaction and sends a doAbort request to every participant that voted Yes.

2) A participant that voted Yes waits for the doCommit or doAbort request from the coordinator. On receiving either request, it commits or aborts the transaction accordingly. If the request is doCommit, it also sends haveCommitted to the coordinator to confirm that the transaction has been committed.

3. Fault model of distributed transaction

While a distributed transaction is executing, disk failures, process crashes, message loss, timeouts and so on may occur.

Two-phase commit is a consensus protocol. In general, consensus cannot be guaranteed in an asynchronous system if processes may crash. Two-phase commit nevertheless reaches consensus under these conditions, because process crashes are masked: a crashed process is replaced by a new process whose state is set from the information saved in persistent storage and the information held by other processes.

3.1. Fault model

Lampson proposed a fault model for distributed transactions that covers hard disk failures, server failures and communication failures. The model claims that the algorithms work correctly in the presence of predictable faults, but it makes no claim about their behavior when an unforeseen disaster occurs. Although errors can occur, they can be detected and handled before they cause incorrect behavior. Lampson's fault model includes the following faults:

A write to persistent storage may fail, either by writing nothing or by writing a wrong value. Writing data to the wrong disk block, for example, is considered a disaster. File storage may also decay. When reading from persistent storage, a checksum can be used to detect whether a block of data is bad.

A server may crash occasionally. When a crashed server is replaced by a new process, its volatile memory is reset and the data held there before the crash is lost. The new process then runs a recovery procedure that sets the values of its objects, including those related to the two-phase commit protocol, from the information in persistent storage and the information obtained from other processes. When a processor is faulty, the server is made to crash so that it cannot send wrong messages or write wrong values to persistent storage; that is, it does not produce arbitrary failures. A crash can occur at any time, in particular during recovery.

Message delivery may be delayed for an arbitrarily long time. Messages may be lost, duplicated or corrupted. The receiver can detect a corrupted message by its checksum. Undetected corrupted messages and forged messages are regarded as disasters.

Using this fault model for persistent storage, processors and communication, it is possible to design a reliable system whose components tolerate any single fault and present a simple failure model. In particular, stable storage (reliable storage) provides atomic writes even when a write fails or the process crashes. It is implemented by replicating each block of data on two disks: a write is applied to both disk blocks, so if one disk fails, the other still supplies the correct data. A stable processor (reliable processor) uses stable storage to recover its objects after a crash. Communication errors can be masked by a reliable remote procedure call mechanism.
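The idea of stable storage can be illustrated with a small Java sketch. Everything here is an assumption made for the example: the class name `StableStore`, the use of two in-memory arrays in place of real disks, and CRC32 as the checksum. It only shows the principle of writing each block twice and using the checksum on read to pick a good copy.

```java
import java.util.zip.CRC32;

// Hypothetical in-memory model of stable storage: every block is written
// to two simulated disks together with a checksum.
class StableStore {
    private static final class Block {
        final byte[] data;
        final long checksum;
        Block(byte[] data, long checksum) {
            this.data = data;
            this.checksum = checksum;
        }
    }

    private final Block[] diskA;
    private final Block[] diskB;

    StableStore(int blocks) {
        diskA = new Block[blocks];
        diskB = new Block[blocks];
    }

    private static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    // Write the first copy completely before starting the second, so that a
    // crash between the two writes still leaves one intact copy.
    void write(int index, byte[] data) {
        diskA[index] = new Block(data.clone(), crc(data));
        diskB[index] = new Block(data.clone(), crc(data));
    }

    // Return the first copy whose checksum matches its data.
    byte[] read(int index) {
        for (Block b : new Block[] { diskA[index], diskB[index] }) {
            if (b != null && crc(b.data) == b.checksum) return b.data.clone();
        }
        throw new IllegalStateException("both copies of block " + index + " are bad");
    }

    // Test hook: simulate decay of the copy on disk A by flipping bits.
    void corruptA(int index) {
        diskA[index].data[0] ^= 0x7F;
    }
}
```

If the copy on one simulated disk is corrupted, `read` detects the checksum mismatch and falls back to the other copy.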

3.2. Timeout of two-phase commit protocol

At various stages of the two-phase commit protocol, the coordinator or a participant can reach a point where it cannot continue its part of the protocol until it receives the next request or reply.

First consider a participant that has voted Yes and is waiting for the coordinator to report the final decision, that is, to tell it whether to commit or abort the transaction. Such a participant is uncertain of the outcome and cannot proceed until it obtains the result of the vote from the coordinator. It cannot decide unilaterally what to do next, and the objects used by the transaction cannot be released for use by other transactions. The participant sends a getDecision request to the coordinator to find out the outcome of the transaction, and it cannot enter the second phase of the protocol until the reply arrives.

If the coordinator has failed, the participant will not be able to obtain the decision until the coordinator is replaced, which may leave participants in the uncertain state for a long time.
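The waiting behavior of an uncertain participant can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: the class name `UncertainParticipant`, the string values "commit" and "abort", the in-memory inbox, and the `Function` standing in for the remote getDecision call are all invented for the example. A `null` answer models a coordinator that cannot be reached yet.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class UncertainParticipant {
    // Inbox for doCommit / doAbort messages, modelled as "commit" / "abort".
    final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

    // Hypothetical stand-in for the remote getDecision(trans) call.
    private final Function<String, String> getDecision;

    UncertainParticipant(Function<String, String> getDecision) {
        this.getDecision = getDecision;
    }

    // Called after voting Yes. Blocks until the outcome is known.
    String awaitDecision(String trans, long timeoutMillis) {
        while (true) {
            String msg;
            try {
                msg = inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while uncertain", e);
            }
            if (msg != null) return msg; // the coordinator's request arrived

            // Timed out: ask the coordinator for the outcome. The participant
            // must not decide on its own, so it keeps waiting if no answer
            // is available.
            String answer = getDecision.apply(trans);
            if (answer != null) return answer;
        }
    }
}
```

The loop never picks an outcome by itself: it returns only a value received from the coordinator, which is why a failed coordinator can keep the participant waiting indefinitely.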

An alternative is for the participants to obtain the final decision cooperatively, without relying on the coordinator. The advantage of this strategy is that it still works when the coordinator has failed.

4. Fault handling for two-phase commit

When the participant fails:

When the coordinator fails:

5. Performance of two-phase commit

Assuming that everything works well, that is, the coordinator and the participants do not fail and communication is normal, a two-phase commit with N participants requires N canCommit messages and N replies, followed by N doCommit messages. The message cost is therefore proportional to 3N, and the time cost is three rounds of messages. haveCommitted messages are not counted in this estimate, because the protocol works correctly without them (their role is to let servers delete stale coordinator information).

In the worst case, an arbitrary number of server and communication failures may occur during the two-phase commit protocol. The protocol cannot put a bound on the time it takes to complete, but it tolerates a succession of failures (server crashes or lost messages) and is guaranteed to complete eventually.

6. Use message queues to avoid distributed transactions

6.1. Message queuing

Because distributed transactions carry a serious performance cost, highly concurrent services often ensure data consistency by other means.

For example, after you order and pay for fried liver at Yaoji, a famous fried-liver restaurant in Beijing, they do not hand you the dish directly. Instead they give you a receipt and ask you to take it to the pickup counter and wait in line. Why do they separate paying from picking up? There are many reasons, and one of them is to serve more customers at once (higher concurrency).

Back to our problem: as long as you hold the receipt, you will eventually get your fried liver. The same is true of the transfer service. When 10,000 is deducted from the Alipay account, we only need to generate a voucher (a message) that says "increase the Yu'e Bao account by 10,000". As long as the voucher (message) is stored reliably, we can eventually use it to increase the Yu'e Bao account by 10,000; that is, the voucher (message) lets us achieve eventual consistency.

With this approach, the transfer above becomes the following process:

Before the debit transaction is committed, Alipay sends the message to the message queue. At this point the message queue only records the message and does not deliver it to Yu'e Bao.

When the Alipay debit transaction commits successfully, Alipay sends a confirmation to the message queue. On receiving the confirmation, the message queue delivers the message to Yu'e Bao.

When the Alipay debit transaction fails to commit, Alipay sends a cancellation to the message queue. On receiving the cancellation, the message queue discards the message, and it is never delivered.

For a message that has been neither confirmed nor cancelled, the message queue needs to query the Alipay system for the status of the message and update it. (Alipay may crash after the debit transaction has committed successfully but before the message status has been updated to "confirmed", in which case the message would otherwise never be delivered.)
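The four steps above can be sketched as a small in-memory queue. This is only an illustration under assumptions: the class name `TransactionalQueueSketch`, its method names, the `State` values and the `Predicate` that stands in for the status query to Alipay are all invented for the example, and a real message queue would persist its state and deliver over the network.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class TransactionalQueueSketch {
    enum State { PENDING, SENT, CANCELLED }

    private final Map<String, String> bodies = new LinkedHashMap<>();
    private final Map<String, State> states = new LinkedHashMap<>();

    // Stands in for delivery to Yu'e Bao.
    final List<String> delivered = new ArrayList<>();

    // Record the message before the debit commits; do not deliver it yet.
    void prepare(String id, String body) {
        bodies.put(id, body);
        states.put(id, State.PENDING);
    }

    // The debit committed: deliver the recorded message.
    void confirm(String id) {
        if (states.get(id) == State.PENDING) {
            states.put(id, State.SENT);
            delivered.add(bodies.get(id));
        }
    }

    // The debit failed: the message will never be delivered.
    void cancel(String id) {
        if (states.get(id) == State.PENDING) states.put(id, State.CANCELLED);
    }

    // For messages still pending, ask the sender whether the debit committed.
    // Simplification: "not committed" is treated as cancelled; a real check
    // would also need an answer for transactions that are still in progress.
    void checkPending(Predicate<String> debitCommitted) {
        for (String id : new ArrayList<>(states.keySet())) {
            if (states.get(id) == State.PENDING) {
                if (debitCommitted.test(id)) confirm(id);
                else cancel(id);
            }
        }
    }

    State stateOf(String id) { return states.get(id); }
}
```

A message is delivered only after `confirm`, either called directly by the sender or triggered by `checkPending` when the sender crashed before confirming.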

6.2. Repeated delivery

Another serious problem is repeated delivery of messages. Taking our transfer from Alipay to Yu'e Bao as an example, if the same message is delivered twice, the Yu'e Bao account will increase by 20,000 instead of 10,000.
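The article does not go on to describe a remedy. One common guard, shown here purely as an assumption and not as the article's own method, is to make the consumer idempotent by remembering which message IDs it has already applied. The class name `YuebaoAccountSketch` and the idea that each message carries a unique ID are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical consumer-side account that applies each message at most once.
class YuebaoAccountSketch {
    private long balance;
    private final Set<String> applied = new HashSet<>();

    // Returns true if the credit was applied, false if the message ID was
    // seen before and the duplicate was ignored.
    synchronized boolean applyCredit(String messageId, long amount) {
        if (!applied.add(messageId)) return false; // duplicate delivery
        balance += amount;
        return true;
    }

    synchronized long balance() { return balance; }
}
```

In a real system the record of applied message IDs and the balance update would have to be stored in the same local transaction; this sketch keeps both in memory only to show the check.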

That is all for the article "How to understand Java distributed transactions". Thank you for reading! I hope you have gained something from it.
