Analysis of how to carry out distributed transaction 07/11 Update SLTechnology News&Howtos

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

In this issue, the editor looks at how to analyze distributed transactions. The article treats the topic in depth from a practitioner's point of view, and we hope you will take something useful away from it.

Understanding distributed systems is essential knowledge for back-end engineers. This series gradually introduces common distributed technologies and the core concepts of distributed systems, including ZooKeeper, distributed transactions, distributed locks, and load balancing. The goal is for you to understand how these techniques are applied in practice and to be prepared to use them in real systems.

As is well known, databases implement local transactions: within a single database, a group of operations either all succeed or none of them take effect. The word "local" matters here, because an ordinary database transaction only covers operations inside that one database. Many of today's systems, however, use a microservice architecture in which each business service owns an independent database. Some business operations therefore need a transaction that spans multiple databases, which is called a "distributed transaction". How can we implement distributed transactions when the databases themselves do not provide cross-database transactions? This article first lays out the basic concepts and theory behind distributed transactions and then introduces several commonly used solutions. Without further ado, let's get started.

What is a transaction?

A transaction consists of a set of operations that we want to execute correctly as a whole. If any step fails, the operations already completed must be rolled back. In other words, the operations in a transaction either all succeed or none of them take effect.
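To make the all-or-nothing rule concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for one service's local database. The `account` table, its `CHECK` constraint, and the `transfer` and `balances` functions are hypothetical illustrations, not part of any system described above. The credit is written first on purpose, so a failure in the second statement has to undo it.

```python
import sqlite3

# In-memory database standing in for a single service's local database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            # If this debit violates the CHECK constraint, the credit above is rolled back too.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer of 30 from alice to bob succeeds. A later transfer of 500 fails on the debit, and bob's balance is restored to what it was before that attempt.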

The four ACID properties of transactions

Any discussion of transactions has to mention their four well-known properties, abbreviated ACID.

Atomicity: a transaction is an indivisible unit of execution. Either all of its operations are performed or none of them are.

Consistency: the database's integrity constraints hold both before the transaction starts and after it ends.

Isolation: concurrently executing transactions are independent and do not interfere with each other. One transaction does not see the uncommitted data of another running transaction.

Durability: once a transaction has committed, its results must be persisted. Even if the database crashes, the committed results are not lost after the database recovers.

Note: transactions only guarantee the "high reliability" of the database. When the database itself fails, committed data can still be recovered. If the failure lies outside the database, for example a damaged hard disk, committed data may still be lost. Protecting against that kind of failure is a matter of "high availability", which requires the cooperation of the whole system, for example keeping copies of the data on separate machines.

Transaction isolation levels

This section looks more closely at transaction isolation.

The isolation required by ACID is isolation in the strictest sense: transactions behave as if they ran one after another, with no interference between them. This fully guarantees data safety, but it performs poorly in real business systems. Databases therefore define four isolation levels, and each level trades isolation against performance. The lower the isolation level, the more concurrency the database can sustain. The higher the level, the lower the performance.

Problems in concurrent execution of transactions

First, let's look at the problems that can occur when transactions run concurrently under different isolation levels:

Lost update: when two concurrent transactions update the same row, one transaction may overwrite the other's update. This happens when the database applies no locking.
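The interleaving that causes a lost update can be sketched as a deterministic simulation. Here a plain dict stands in for one database row, and the two transactions run a read-modify-write with no locking. The `row` dict and the `stock` field are hypothetical.

```python
# Hypothetical in-memory "row": a stock counter shared by two concurrent transactions.
row = {"stock": 10}

def lost_update_demo():
    """Interleave two read-modify-write transactions without any locking."""
    row["stock"] = 10
    t1_seen = row["stock"]       # T1 reads 10
    t2_seen = row["stock"]       # T2 also reads 10, before T1 writes
    row["stock"] = t1_seen - 1   # T1 writes 9
    row["stock"] = t2_seen - 1   # T2 writes 9, silently overwriting T1's update
    return row["stock"]
```

Two decrements were applied, but the final value is 9 rather than 8, because T2's write was based on a stale read.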

Dirty read: one transaction reads data that another transaction has written but not yet committed. That data may later be rolled back and become invalid, so a transaction that acts on it produces wrong results.

Non-repeatable read (in the broad sense): a transaction reads the same data twice and gets different results. There are two specific cases:

Non-repeatable read (in the narrow sense): while transaction 1 reads the same record twice, transaction 2 modifies that record, so transaction 1's second read returns a different value.

Phantom read: between two queries by transaction 1, transaction 2 inserts or deletes rows in the table, so transaction 1's second query returns a different set of rows.

How does a non-repeatable read differ from a dirty read? A dirty read sees data that has not been committed. A non-repeatable read sees committed data; the data changed only because another transaction modified and committed it between the two reads.
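The difference can be shown with a toy key-value store that keeps committed data separate from each transaction's pending writes. `ToyStore` and its methods are hypothetical and only model visibility, not locking. A dirty read observes a pending write that is later rolled back. A non-repeatable read observes two different committed values.

```python
class ToyStore:
    """Hypothetical key-value store with committed data plus per-transaction pending writes."""

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}  # txn id -> {key: value} not yet committed

    def write(self, txn, key, value):
        self.pending.setdefault(txn, {})[key] = value

    def commit(self, txn):
        self.committed.update(self.pending.pop(txn, {}))

    def rollback(self, txn):
        self.pending.pop(txn, None)

    def read_committed(self, key):
        return self.committed[key]

    def read_uncommitted(self, key):
        # Sees any transaction's pending write: this is what makes dirty reads possible.
        for writes in self.pending.values():
            if key in writes:
                return writes[key]
        return self.committed[key]

def dirty_read_demo():
    s = ToyStore({"price": 100})
    s.write("T2", "price", 80)
    seen = s.read_uncommitted("price")  # T1 sees 80 ...
    s.rollback("T2")                    # ... but T2 rolls back, so 80 never existed
    return seen, s.read_committed("price")

def non_repeatable_read_demo():
    s = ToyStore({"price": 100})
    first = s.read_committed("price")   # T1 reads 100
    s.write("T2", "price", 80)
    s.commit("T2")                      # T2 commits between T1's two reads
    second = s.read_committed("price")  # T1 now reads 80
    return first, second
```

`dirty_read_demo()` returns `(80, 100)`: the value T1 saw was never committed. `non_repeatable_read_demo()` returns `(100, 80)`: both values were committed, but they differ.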

Four isolation levels of the database

The database has the following four isolation levels:

Read Uncommitted: at this level, while one transaction is modifying a row, other transactions may not modify that row but may read it. Lost updates therefore do not occur, but dirty reads and non-repeatable reads can.

Read Committed: at this level, a row written by an uncommitted transaction cannot be accessed by other transactions, so dirty reads do not occur. However, a transaction that is only reading a row does not stop other transactions from modifying it, so non-repeatable reads can occur.

Repeatable Read: at this level, a transaction that is reading a row blocks writers of that row but still allows other readers, so the same transaction never reads two different values for the same row. A transaction that is writing a row blocks all other transactions from it. Under the SQL standard's definition of this level, phantom reads can still occur.

Serializable: all transactions must execute serially. This avoids all of the concurrency problems above, but it is inefficient.
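The four levels can be summarized as a lookup table of which read anomalies each level still permits, following the SQL standard's definitions. The table and the `weakest_level_preventing` helper are illustrative assumptions. Real database engines often prevent more than the standard requires, so check your database's documentation before relying on it.

```python
# Anomalies the SQL standard permits at each isolation level (True = may occur).
# Ordered from weakest to strongest; dicts preserve insertion order.
ISOLATION_ANOMALIES = {
    "READ UNCOMMITTED": {"dirty_read": True,  "non_repeatable_read": True,  "phantom_read": True},
    "READ COMMITTED":   {"dirty_read": False, "non_repeatable_read": True,  "phantom_read": True},
    "REPEATABLE READ":  {"dirty_read": False, "non_repeatable_read": False, "phantom_read": True},
    "SERIALIZABLE":     {"dirty_read": False, "non_repeatable_read": False, "phantom_read": False},
}

def weakest_level_preventing(*anomalies):
    """Return the lowest isolation level that rules out every listed anomaly."""
    for level, allowed in ISOLATION_ANOMALIES.items():
        if not any(allowed[a] for a in anomalies):
            return level
    # Only reachable if the table above is edited so that no level prevents them.
    raise ValueError("no level prevents all of: " + ", ".join(anomalies))
```

Picking the weakest sufficient level reflects the trade-off described above: each step up the table removes an anomaly but costs concurrency.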

The higher the isolation level, the better data integrity and consistency are protected, but the greater the cost to concurrent performance. For most applications, Read Committed is a reasonable default: it prevents dirty reads and still offers good concurrency. It does allow non-repeatable reads, phantom reads, and the second type of lost update, in which one transaction's committed update overwrites another's. In the specific cases where these problems matter, the application can use pessimistic or optimistic locking.
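Optimistic locking is often implemented with a version column. The update only succeeds if the row's version is still the one the transaction read earlier. Here is a minimal sketch using sqlite3. The `product` table, the `version` column, and the function names are hypothetical conventions, not a specific framework's API.

```python
import sqlite3

def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
    conn.execute("INSERT INTO product VALUES (1, 10, 0)")
    conn.commit()
    return conn

def read(conn, pid):
    return conn.execute("SELECT stock, version FROM product WHERE id = ?", (pid,)).fetchone()

def decrement_if_unchanged(conn, pid, seen_stock, seen_version):
    """Optimistic lock: the UPDATE matches only if nobody bumped the version since our read."""
    with conn:
        cur = conn.execute(
            "UPDATE product SET stock = ?, version = version + 1 WHERE id = ? AND version = ?",
            (seen_stock - 1, pid, seen_version),
        )
    # 0 rows updated means a concurrent writer won; the caller should re-read and retry.
    return cur.rowcount == 1
```

If two transactions both read `(10, 0)`, the first decrement succeeds and bumps the version to 1. The second then matches no rows and is rejected instead of silently overwriting the first. This is exactly the lost update from the earlier simulation, now detected.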

What is a distributed transaction?

So far we have discussed local transactions within a single database, and a database on its own only supports transactions within itself, not across databases. As microservice architectures have become popular, a large business system is often composed of several subsystems, each with its own independent database. A single business process frequently has to be completed by several of these subsystems, and their operations may need to succeed or fail as one transaction. Such scenarios are common in microservice systems. To support them, we need some mechanism above the individual databases that provides transactions across databases. This is what we call a "distributed transaction".

A typical example of a distributed transaction is a user placing an order. When an e-commerce system adopts a microservice architecture, it is often split into subsystems such as a product system, an order system, a payment system, and a points system. The ordering process is as follows:

The user browses products through the product system, finds an item they like, and clicks to place an order.

At this point, the order system will generate an order.

After the order is created successfully, the payment system processes the payment.

When the payment is completed, the points system adds points to the user's account.

Steps 2, 3, and 4 above must be completed in one transaction. In a traditional monolithic application this is simple: put the three steps in one method A and annotate that method with Spring's @Transactional. Spring then relies on the database's transaction support to ensure that either all three steps complete or none of them take effect. In a microservice architecture, however, these three steps involve three systems and three databases, so we need additional techniques to provide distributed transactions across the databases and application systems.
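The following sketch shows what goes wrong when each step only has its own local transaction. Three separate in-memory sqlite3 databases stand in for the order, payment, and points services. The table names and the `place_order` and `counts` functions are hypothetical. A `CHECK` constraint makes the points step fail for a non-positive amount.

```python
import sqlite3

def make_db(ddl):
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    return conn

# Hypothetical per-service databases; each connection can only run its own local transaction.
order_db = make_db("CREATE TABLE orders (id INTEGER PRIMARY KEY, user TEXT)")
payment_db = make_db("CREATE TABLE payments (order_id INTEGER, amount INTEGER)")
points_db = make_db("CREATE TABLE points (user TEXT, delta INTEGER CHECK (delta > 0))")

def place_order(user, amount, points):
    with order_db:    # local transaction 1 commits at the end of this block
        order_id = order_db.execute("INSERT INTO orders (user) VALUES (?)", (user,)).lastrowid
    with payment_db:  # local transaction 2 commits at the end of this block
        payment_db.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))
    try:
        with points_db:  # local transaction 3 may fail ...
            points_db.execute("INSERT INTO points VALUES (?, ?)", (user, points))
    except sqlite3.IntegrityError:
        # ... but the order and payment are already committed and cannot be undone here.
        return None
    return order_id

def counts():
    tables = ((order_db, "orders"), (payment_db, "payments"), (points_db, "points"))
    return tuple(db.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for db, t in tables)
```

A failed points step leaves one order and one payment committed with no matching points record. The goal of a distributed transaction is to rule out that partial state.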

CAP theory

CAP theory states that a distributed system can satisfy at most two of the three properties C, A, and P at the same time.

The meaning of CAP:

C (Consistency): all copies of the same data are identical in real time.

A (Availability): the system returns a clear, non-error result within a bounded period of time.

P (Partition tolerance): the system keeps working even when a network partition prevents some nodes from communicating with others. In practice, the same service is deployed on multiple nodes, so that when one node or network link fails, the other nodes can still provide the service.

CAP theory tells us that a distributed system can choose at most two of C, A, and P. The question, then, is which two to choose.
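The choice a partitioned node faces can be reduced to a toy model. A replica cut off from the primary either refuses to answer, keeping consistency at the expense of availability (CP), or answers with the possibly stale value it last saw, keeping availability at the expense of consistency (AP). The `Replica` class and its `mode` flag are hypothetical simplifications.

```python
class Replica:
    """Hypothetical replica that copies values from a primary when the network allows."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False

    def sync(self, primary_value):
        if not self.partitioned:  # replication only succeeds when the network is up
            self.value = primary_value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Availability over consistency: answer with the last value seen (maybe stale).
        return self.value

def scenario(mode):
    r = Replica(mode)
    r.sync("v1")
    r.partitioned = True
    r.sync("v2")  # the primary now holds v2, but the replication message is lost
    try:
        return r.read()
    except RuntimeError:
        return "error"
```

During the partition, the AP replica answers `"v1"` although the primary already holds `"v2"`. The CP replica returns an error instead.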

For a business system, availability and partition tolerance must both be satisfied, and the two reinforce each other. Businesses adopt distributed systems for two main reasons:

Improving overall performance: when business volume soars and a single server can no longer keep up, we use a distributed system in which multiple nodes provide the same function. This raises the performance of the system as a whole and is the first reason for adopting distributed systems.

Improving availability: if there is only a single node, or all nodes sit in the same network environment, a power outage in the machine room or a natural disaster in the area would paralyze the entire business system. To prevent this, a distributed system places its subsystems in different regions and different machine rooms, which keeps the system highly available.

This shows that partition tolerance is the foundation of a distributed system. Without it, there is little point in distributing the system at all.

Availability is also especially important for business systems. In an era that emphasizes user experience, frequent "system error" messages and long response times quickly erode users' goodwill. Competition in the Internet industry is fierce and there is no shortage of competitors in any given field, so even intermittent unavailability will drive users to those competitors. We therefore usually keep availability and partition tolerance at the expense of consistency, which is the idea behind the BASE theory introduced below.

BASE theory

CAP theory states an unfortunate but accepted fact: we can only choose two of C, A, and P. Business systems often sacrifice consistency in exchange for availability and partition tolerance. Note, however, that "sacrificing consistency" does not mean giving up data consistency entirely. It means trading strong consistency for weaker consistency. Next, let's introduce the BASE theory.

BA: Basically Available

Even under force majeure conditions, the system as a whole still guarantees "availability": it still returns a definite result within a certain period of time. "Basic availability" differs from "high availability" in two ways:

The "certain period of time" can be extended appropriately. For example, during a large promotion the response time may be allowed to grow.

A degraded page can be returned directly to some users to relieve pressure on the servers. Note that returning a degraded page still counts as returning a clear result.
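A degradation policy of this kind can be as simple as a load check in front of the expensive code path. In this sketch, `handle_request`, the load and capacity numbers, and the response shape are all hypothetical. The point is that the overloaded branch still returns a definite, successful answer rather than an error.

```python
def handle_request(current_load, capacity, render_full_page):
    """Hypothetical 'basically available' handler: degrade instead of failing under load."""
    if current_load > capacity:
        # Still a clear, definite result, just a cheaper one that skips the full rendering work.
        return {"status": 200, "body": "Service is busy; showing a simplified page."}
    return {"status": 200, "body": render_full_page()}
```

Under normal load the full page is rendered. Above capacity, the cheap static response is returned, and the expensive renderer is never called.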

S: Soft State: different copies of the same data do not need to be consistent in real time.

E: Eventual Consistency: different copies of the same data must eventually become consistent. They need not agree in real time, but they are guaranteed to converge after some period of time.
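Soft state and eventual consistency can be illustrated with a primary/replica pair that replicates asynchronously. Writes are acknowledged on the primary and queued in a log, and the replica catches up later. `EventuallyConsistentPair` and its methods are hypothetical; real systems add durability, ordering guarantees, and conflict handling on top of this idea.

```python
from collections import deque

class EventuallyConsistentPair:
    """Hypothetical primary/replica pair with asynchronous (lagging) replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.log = deque()  # changes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.log.append((key, value))  # acknowledged before the replica has the change

    def replicate(self, n=1):
        """Apply up to n pending changes to the replica, in order."""
        for _ in range(min(n, len(self.log))):
            key, value = self.log.popleft()
            self.replica[key] = value

    def consistent(self):
        return self.primary == self.replica
```

After two writes the copies disagree (soft state). Once the log has been drained, they match again (eventual consistency).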

ACID-BASE balance

ACID guarantees strong consistency for transactions, meaning the data is consistent in real time. That is fine for local transactions, but in distributed transactions strong consistency significantly hurts performance, so distributed systems may follow the BASE theory instead. Different business scenarios, however, have different consistency requirements. A payment or trading scenario requires strong consistency and should follow ACID. A scenario such as sending an SMS verification code after a successful registration does not need real-time consistency and can follow BASE. The right balance between ACID and BASE must therefore be chosen according to the specific business scenario.

Distributed transaction protocol

Here are several protocols for implementing distributed transactions.

Understanding the 2PC and 3PC protocols

To solve the problem of consistency in distributed systems, engineers have developed many well-known protocols and algorithms through repeated trade-offs between performance and data consistency. Among the best known are the two-phase commit protocol (2PC, Two-Phase Commit Protocol) and the three-phase commit protocol (3PC, Three-Phase Commit Protocol).

2PC

The most common approach to distributed transactions is two-phase commit. In a distributed system, each node knows whether its own operation succeeded but cannot know whether the operations on other nodes succeeded. When a transaction spans multiple nodes, preserving its ACID properties requires a component called the coordinator. The coordinator collects the outcome from every participant node and ultimately tells those nodes whether to actually commit their results.

The idea of two-phase commit can therefore be summarized as follows: participants report to the coordinator whether their operations succeeded or failed. Based on the feedback from all participants, the coordinator then decides whether every participant should commit or abort.

The two phases are the prepare phase (voting phase) and the commit phase (execution phase).

The first stage: voting stage

The purpose of this phase is to find out whether every participant in the database cluster can execute the transaction normally. The steps are as follows:

The coordinator sends a transaction execution request to all participants and waits for each of them to report its execution result.

On receiving the request, each participant executes the transaction but does not commit it, and it records the transaction in its log.

Each participant reports its execution result to the coordinator and then blocks, waiting for the coordinator's next instruction.

The second phase: transaction commit phase

After the coordinator's inquiry in the first phase, each participant replies with the result of executing its part of the transaction. There are three possible outcomes:

All participants reply that they can execute the transaction normally.

One or more participants replied that the transaction execution failed.

The coordinator times out waiting for replies.

In the first case, the coordinator will issue a notification to all participants to commit the transaction, as follows:

The coordinator sends a commit notification to each participant requesting that the transaction be committed.

After receiving the transaction commit notification, the participant performs the commit operation and then releases the occupied resources.

The participant returns the transaction commit result information to the coordinator.

In the second and third cases, the coordinator concludes that the transaction cannot be executed successfully on every participant. To keep the data of the whole cluster consistent, it sends a rollback notification to each participant. The steps are as follows:

The coordinator sends a transaction rollback notification to each participant, requesting that the transaction be rolled back.

After receiving the transaction rollback notification, the participant performs the rollback operation and then releases the occupied resources.

The participant returns the transaction rollback result information to the coordinator.
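Both phases fit into a short single-process simulation. This is a minimal sketch, not a production coordinator. The `Participant` class, its `vote` knob, where `None` stands for a participant that never replies, and `two_phase_commit` are hypothetical. Real implementations add durable logs, network timeouts, and retries.

```python
class Participant:
    """Hypothetical participant; `vote` is its phase-1 answer (None simulates no reply)."""

    def __init__(self, name, vote="YES"):
        self.name, self.vote, self.state = name, vote, "INIT"

    def prepare(self):
        if self.vote == "YES":
            self.state = "PREPARED"  # work done and logged, locks held, not committed
        return self.vote             # "YES", "NO", or None to simulate a timeout

    def commit(self):
        self.state = "COMMITTED"     # make changes durable, release resources
        return "ACK"

    def rollback(self):
        self.state = "ROLLED_BACK"   # undo changes using the log, release resources
        return "ACK"

def two_phase_commit(participants):
    # Phase 1 (voting): collect votes; a missing reply is treated the same as NO.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ROLLBACK"
    # Phase 2 (commit/rollback): broadcast the single decision and collect acknowledgements.
    for p in participants:
        ack = p.commit() if decision == "COMMIT" else p.rollback()
        assert ack == "ACK"  # a real coordinator would keep retrying until every ack arrives
    return decision
```

With three YES votes, every participant ends in `COMMITTED`. A single NO or a missing reply makes every participant end in `ROLLED_BACK`, matching the three outcomes listed above.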

The two-phase commit protocol addresses strong data consistency in distributed databases. Its principle is simple and easy to implement, but its shortcomings are obvious. The main ones are:

Single point of failure: the coordinator plays a central role in the whole two-phase commit process. If the coordinator's server goes down, the entire database cluster is affected. For example, if the coordinator fails during the second phase and cannot send the commit or rollback notification, the participants remain blocked indefinitely and the cluster cannot provide service.

Synchronous blocking: throughout two-phase commit, all participants must wait for the coordinator's instructions. While waiting they are blocked and cannot do other work, which is very inefficient.

Data inconsistency: although two-phase commit is designed for strong consistency of distributed data, inconsistency is still possible. Suppose that in the second phase the coordinator sends a commit notification, but because of network problems only some participants receive it and commit, while the rest never receive it and remain blocked. At that point the data is inconsistent.
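The partial-delivery failure described in the last point can be sketched by sending the phase-two decision only to the participants the network can reach. `deliver_decision` and its `reachable` set are hypothetical knobs for simulating the lost messages. The unreached participants stay `PREPARED`, which shows both the blocking problem and the inconsistency.

```python
def deliver_decision(states, decision, reachable):
    """Send the phase-2 decision only to participants whose message actually arrives.

    states: hypothetical mapping participant -> "PREPARED" after a unanimous YES vote.
    reachable: participants that receive the coordinator's message.
    """
    for name in states:
        if name in reachable:
            states[name] = "COMMITTED" if decision == "COMMIT" else "ROLLED_BACK"
        # Unreached participants stay PREPARED: they keep holding resources and cannot
        # decide on their own, because they do not know which decision was made.
    return states
```

If only the order service receives `COMMIT`, its data is committed while payment and points are still pending. Other services can therefore observe an order with no payment.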

3PC

To address these problems with two-phase commit, the three-phase commit protocol adds a "pre-inquiry" phase and a timeout mechanism. Together they shorten the time the cluster spends blocked and improve performance. The three phases of three-phase commit are can_commit, pre_commit, and do_commit.

The first stage: can_commit

In this phase, the coordinator asks each participant whether it can execute the transaction normally, and each participant replies with an estimate based on its own situation. This check is lightweight compared with actually executing the transaction. The steps are as follows:

The coordinator sends a transaction inquiry to each participant, asking whether it can perform the transaction, and waits for the replies.

Each participant replies based on its own situation. If it expects to be able to perform the transaction normally, it returns a positive response and enters the prepared state. Otherwise it returns a negative response.

The second stage: pre_commit

In this phase, the coordinator acts on the results of the first-phase inquiry. There are three possible outcomes:

All participants return a positive response.

One or more participants return a negative response.

The coordinator times out waiting for replies.

In the first case, the coordinator sends a transaction execution request to all participants, as follows:

The coordinator sends transaction execution notifications to all transaction participants.

After receiving the notification, the participant executes the transaction, but does not commit.

Each participant reports its execution result to the coordinator.

In the steps above, if a participant times out waiting for the coordinator's notification, it aborts the transaction. In the second and third cases, the coordinator concludes that the transaction cannot be executed properly, so it sends an abort notification to each participant, asking them to leave the prepared state. The steps are as follows:

The coordinator sends abort notifications to all transaction participants.

After receiving the notification, each participant aborts the transaction.

The third stage: do_commit

If the transaction was not aborted in the second phase, the coordinator now decides whether to commit or roll back, based on the execution results the participants returned. There are three cases:

All participants executed the transaction successfully.

One or more participants failed to execute the transaction.

The coordinator times out waiting for replies.

In the first case, the coordinator initiates a transaction commit request to each participant, the specific steps are as follows:

The coordinator sends transaction commit notifications to all participants.

All participants perform the commit operation after receiving the notification and release the occupied resources.

Each participant reports the commit result to the coordinator.

In the second and third cases, the coordinator concludes that the transaction cannot be completed properly, so it sends a rollback request to each participant, as follows:

The coordinator sends transaction rollback notifications to all participants.

All participants perform the rollback operation after receiving the notification and release the occupied resources.

Each participant reports the rollback result to the coordinator.
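The three phases can be condensed into one toy function whose inputs simulate the failures described above. The parameter names and result states are hypothetical. One rule is an assumption that the text above does not state: a participant that has acknowledged pre_commit but never receives do_commit commits by default after its timeout. This rule is commonly described for 3PC and is what lets 3PC avoid indefinite blocking. It is also why 3PC can still diverge when the lost message was a rollback.

```python
def three_phase_commit(can_commit_votes, pre_commit_acks, do_commit_delivered):
    """Toy 3PC run returning the final state of each participant.

    can_commit_votes:    participant -> "YES" / "NO" / None (no reply) in phase 1.
    pre_commit_acks:     participant -> True if it executed the work (uncommitted)
                         and acknowledged in phase 2; missing or False otherwise.
    do_commit_delivered: participants that receive the phase-3 message.
    All three inputs are hypothetical knobs for simulating failures.
    """
    names = list(can_commit_votes)
    # Phase 1, can_commit: any NO or timeout aborts before any real work is done.
    if not all(v == "YES" for v in can_commit_votes.values()):
        return {n: "ABORTED" for n in names}
    # Phase 2, pre_commit: participants execute without committing and acknowledge.
    all_acked = all(pre_commit_acks.get(n, False) for n in names)
    decision = "COMMIT" if all_acked else "ROLLBACK"
    # Phase 3, do_commit: reached participants follow the coordinator's decision.
    result = {}
    for n in names:
        if n in do_commit_delivered:
            result[n] = "COMMITTED" if decision == "COMMIT" else "ROLLED_BACK"
        elif pre_commit_acks.get(n, False):
            # Assumed 3PC timeout rule: a pre-committed participant commits by default.
            result[n] = "COMMITTED"
        else:
            result[n] = "ROLLED_BACK"
    return result
```

When the do_commit message to a pre-committed participant is lost but the decision was COMMIT, the default rule reaches the right outcome without blocking. If the lost message was a rollback, the same rule leaves that participant committed while the others roll back.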

That concludes this analysis of distributed transactions. If you have similar questions, the analysis above may serve as a useful reference.
