
Detailed explanation of distributed transactions

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many mature open source distributed transaction solutions are available today, such as Alibaba's Fescar (later renamed Seata, with Ant Financial as a co-contributor) and LCN (https://github.com/codingapi/tx-lcn), a non-intrusive 2PC-style transaction framework. There are also TCC transaction implementations such as hmily (https://github.com/yu199195/hmily) and tcc-transaction (https://github.com/changmingxie/tcc-transaction).

Seata: https://github.com/seata/seata

Fescar: https://github.com/alibaba/fescar

Tcc-transaction: https://github.com/changmingxie/tcc-transaction

Hmily: https://github.com/yu199195/hmily

LCN: https://github.com/codingapi/tx-lcn

Distributed transactions are subject to the well-known CAP theorem: consistency (C), availability (A), and partition tolerance (P) cannot all be satisfied at the same time; at most two of them can. Cassandra, Dynamo, and similar systems prioritize AP by default and weaken C. HBase, MongoDB, and similar systems prioritize CP by default and weaken A.

The BASE model contains three elements:

BA: Basically Available.

S: Soft State. State may be out of sync for a period of time.

E: Eventually Consistent. The data becomes consistent in the end, rather than being strongly consistent at all times.

The BASE model is the opposite of the ACID model and is compatible with the CAP theorem. It sacrifices strong consistency to gain availability. It is generally applied in the application layer of service-oriented systems or in big data processing systems, where eventual consistency is enough to meet the vast majority of business needs.

The purpose of distributed transactions is to ensure data consistency across distributed storage. Cross-database transactions, however, run into many uncontrollable problems, such as individual nodes going down, so the ACID guarantees of a single-node transaction cannot be expected.

1. Two/Three Phase Commit

2PC stands for two-phase commit. In a distributed system, each node knows whether its own operation succeeded or failed, but it cannot know the outcome on other nodes. When a transaction spans multiple nodes, preserving the ACID properties of the transaction requires a component that acts as the coordinator: it collects the results of all nodes (called participants) and finally tells them whether to actually commit. The two-phase commit algorithm is as follows:

The first stage:

The coordinator asks all participant nodes whether they can perform the commit operation.

Each participant begins the preparatory work for executing the transaction, such as locking or reserving resources.

Each participant responds to the coordinator: "can commit" if its preparation succeeded, otherwise "reject commit".

The second stage:

If all participants respond "can commit", the coordinator sends a "formal commit" command to all participants. Each participant completes the formal commit, releases all resources, and then responds "done". The coordinator collects the "done" response from each node and ends the global transaction.

If any participant responds "reject commit", the coordinator sends a "rollback" command to all participants. Each participant rolls back, releases all resources, and then responds "rollback complete". After the coordinator collects the "rollback complete" response from each node, it cancels the global transaction.
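The two stages above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_prepare` flag, and the `two_phase_commit` function are hypothetical names introduced here, and timeouts, persistence, and coordinator failure are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant that owns one resource."""
    name: str
    can_prepare: bool = True   # simulates whether preparation succeeds
    state: str = "idle"        # idle -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        # Stage 1: lock or reserve resources, then vote.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self) -> None:
        # Stage 2, success path: apply the change and release resources.
        self.state = "committed"

    def rollback(self) -> None:
        # Stage 2, failure path: undo the preparation and release resources.
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if the global transaction committed, False if it rolled back."""
    # Stage 1: ask every participant and collect all votes.
    votes = [p.prepare() for p in participants]
    # Stage 2: commit only if every vote was "can commit".
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single "reject commit" vote is enough to roll back every participant, including those that prepared successfully.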

The biggest problem with two-phase commit arises when a participant has completed the first phase but never receives the decision in the second phase. The node then enters an "uncertain" state, which blocks the entire transaction. In other words, the coordinator is critical to completing the transaction, and the availability of the coordinator is the key.

Three-phase commit was introduced for this reason. As described on Wikipedia, it splits the first phase of two-phase commit into two: first ask, then lock the resources. The last phase is the actual commit. The core idea of three-phase commit is that resources are not locked while participants are being asked, and are locked only once everyone has agreed. Three-phase commit has shortcomings of its own, however. To avoid data inconsistency at the protocol level, consensus algorithms such as Paxos or Raft can be used.

Two-phase commit and three-phase commit have the following limitations, which make them a poor fit for micro-service architectures:

All operations must be on transactional resources (such as databases, message queues, or EJB components). This is limiting because micro-service architectures mostly communicate over HTTP, so these protocols are better suited to traditional monolithic applications.

Because of strong consistency, resources must be held for the duration of the transaction. This hurts performance and lowers throughput, so these protocols are not suitable for high-concurrency, high-performance business scenarios.

2. Try Confirm Cancel (TCC)

A complete TCC service consists of a master business service and several slave business services. The master business service initiates and completes the whole business activity. The TCC model requires that the slave service provides three interfaces: Try, Confirm and Cancel.

Try: complete all business checks and reserve the necessary business resources.

Confirm: actually execute the business without any further business checks; use only the business resources reserved in the Try phase; the Confirm operation must be idempotent.

Cancel: release the business resources reserved during the Try phase; the Cancel operation must be idempotent.

The whole TCC business is divided into two phases:

The first phase: the master business service invokes the Try operation of every slave business service and registers all slave business services with the activity manager. Once every Try operation has succeeded, or any Try operation has failed, the second phase begins.

The second phase: the activity manager performs Confirm or Cancel operations based on the results of the first phase. If all Try operations in the first phase succeeded, the activity manager invokes the Confirm operation of every slave business service. Otherwise, it invokes the Cancel operation of every slave business service.
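The Try, Confirm, and Cancel interfaces and the two phases can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `InventoryService` is a hypothetical slave business service, `run_tcc` plays the role of the master business service and the activity manager together, and, as a simplification, the sketch stops calling Try after the first failure.

```python
class InventoryService:
    """Hypothetical slave business service exposing Try / Confirm / Cancel."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.reserved: dict[str, int] = {}  # tx_id -> reserved quantity

    def try_(self, tx_id: str, qty: int) -> bool:
        # Try: run the business check and reserve the resource.
        if qty > self.stock:
            return False
        self.stock -= qty
        self.reserved[tx_id] = qty
        return True

    def confirm(self, tx_id: str) -> None:
        # Confirm: consume the reservation. A repeated call is a no-op.
        self.reserved.pop(tx_id, None)

    def cancel(self, tx_id: str) -> None:
        # Cancel: give the reserved quantity back. A repeated call is a no-op.
        self.stock += self.reserved.pop(tx_id, 0)


def run_tcc(tx_id: str, steps: list[tuple[InventoryService, int]]) -> bool:
    """Run one TCC activity; return True if it was confirmed."""
    registered = []
    all_tried = True
    # Phase 1: call Try on each service and register it.
    for service, qty in steps:
        registered.append(service)
        if not service.try_(tx_id, qty):
            all_tried = False
            break
    # Phase 2: Confirm everything, or Cancel everything that was registered.
    for service in registered:
        if all_tried:
            service.confirm(tx_id)
        else:
            service.cancel(tx_id)
    return all_tried
```

Keying the reservation by `tx_id` is what makes Confirm and Cancel safe to repeat in this sketch: the second call finds no reservation and changes nothing.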

Compared with 2PC:

It is located in the business service layer rather than the resource layer.

There is no separate prepare phase; the Try operation both operates on the resources and serves as the preparation step.

Try operations can flexibly choose the locking granularity of business resources.

The development cost is high.

Disadvantages:

It is difficult to guarantee the idempotency of Confirm and Cancel.

This approach has many disadvantages and is usually not recommended in complex scenarios. It fits only very simple scenarios in which a rollback (Cancel) is easy to provide and very few services are involved.

This implementation results in a large amount of code and high coupling. It is also quite limited, because many business operations cannot be rolled back simply, and when many services are chained serially the cost of rollback is too high.

3. Asynchronously ensuring eventual consistency

Core ideas:

Dan Pritchett, an architect at eBay, described a solution to the consistency problems of eBay's distributed systems in the article "BASE: An ACID Alternative", which explains the principles of BASE. The core idea is to execute tasks that require distributed processing asynchronously by means of messages or logs. The messages or logs can be stored in local files, databases, or message queues, and failed tasks are retried according to business rules. This requires the interfaces of every service to be idempotent.

Local message table

The basic design idea is to split the remote distributed transaction into a series of local transactions. If performance and design elegance are set aside, this can be implemented with tables in a relational database.

Take the classic example of an interbank transfer.

The pseudocode of the first step is as follows: deduct 100 and, within the same local transaction, insert a credential message into the message table:

begin transaction:
    update User set account = account - 100 where userId = 'A'
    insert into message (msgId, userId, amount, status) values (...)
commit transaction
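The first step can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table is named `user_account` here, and the `'pending'` status value, the starting balance of 500, and the function names are all hypothetical choices made for the example.

```python
import sqlite3
import uuid


def open_bank_a() -> sqlite3.Connection:
    """Create an in-memory database holding user A and an empty message table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_account (userId TEXT PRIMARY KEY, account INTEGER)")
    conn.execute(
        "CREATE TABLE message (msgId TEXT PRIMARY KEY, userId TEXT, "
        "amount INTEGER, status TEXT)"
    )
    conn.execute("INSERT INTO user_account VALUES ('A', 500)")
    conn.commit()
    return conn


def debit_with_message(conn: sqlite3.Connection, user_id: str, amount: int) -> str:
    """Deduct the amount and record the credential message in one local transaction."""
    msg_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE user_account SET account = account - ? "
            "WHERE userId = ? AND account >= ?",
            (amount, user_id, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown user or insufficient balance")
        conn.execute(
            "INSERT INTO message (msgId, userId, amount, status) "
            "VALUES (?, ?, ?, 'pending')",
            (msg_id, user_id, amount),
        )
    return msg_id
```

Because both statements run in the same local transaction, a message row exists if and only if the deduction was committed.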

The second step is to notify the other party so that 100 is added to the receiving bank account. The question is how to notify the other party.

There are usually two ways:

Use an MQ, which delivers with low latency: the other party subscribes to the message and listens, and the event is triggered automatically when a message arrives.

Use scheduled polling to scan the message table and check its data.

Both methods have advantages and disadvantages. Relying solely on MQ risks failed notifications, while polling too frequently is inefficient (about 90% of the scans are wasted). Therefore, the two methods are usually used together.
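One way to combine the two methods is a periodic scan that hands every unsent row of the message table to the MQ. The sketch below assumes a `message` table with a `status` column whose values are `'pending'` and `'sent'`; `relay_pending` and the `publish` callback are hypothetical names, and `publish` stands in for whatever MQ client is actually used.

```python
import sqlite3
from typing import Callable


def relay_pending(conn: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Scan the message table once and publish every pending row.

    Returns the number of messages that were published in this scan.
    """
    sent = 0
    rows = conn.execute(
        "SELECT msgId, userId, amount FROM message WHERE status = 'pending'"
    ).fetchall()
    for msg_id, user_id, amount in rows:
        try:
            publish({"msgId": msg_id, "userId": user_id, "amount": amount})
        except ConnectionError:
            # Notification failed: leave the row pending so the next scan retries it.
            continue
        with conn:
            conn.execute("UPDATE message SET status = 'sent' WHERE msgId = ?", (msg_id,))
        sent += 1
    return sent
```

If the process stops after `publish` succeeds but before the row is marked `'sent'`, the next scan publishes the same message again, which is why the receiving side must tolerate duplicates.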

With notification solved, a new problem appears. If a message is consumed more than once, money is added to the user's account more than once, which is a serious error. The consumption status can be recorded in a "consumption status table": before performing the "increase" operation, check whether the message (identified by its ID) has already been consumed, and after consuming it, update the consumption status table within the same local transaction. This avoids the problem of repeated consumption:

get msgId = '123'
check if msgId is in message_applied (msgId)
if not applied:
    begin transaction:
        update User set account = account + 100 where userId = 'B'
        insert into message_applied (msgId) values ('123')
    commit transaction
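The same deduplication can be sketched with Python's built-in `sqlite3` module. Note one deliberate difference from the pseudocode: instead of checking first and inserting afterwards, this sketch lets the primary key on `message_applied.msgId` reject the duplicate. The table name `user_account`, the starting balance, and the function names are assumptions made for the example.

```python
import sqlite3


def open_bank_b() -> sqlite3.Connection:
    """Create an in-memory database holding user B and an empty message_applied table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_account (userId TEXT PRIMARY KEY, account INTEGER)")
    conn.execute("CREATE TABLE message_applied (msgId TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO user_account VALUES ('B', 0)")
    conn.commit()
    return conn


def apply_credit(conn: sqlite3.Connection, msg_id: str, user_id: str, amount: int) -> bool:
    """Credit the account once per msgId; return False for a duplicate delivery."""
    try:
        with conn:  # one local transaction: both statements commit or neither does
            # The primary key rejects a second insert of the same msgId.
            conn.execute("INSERT INTO message_applied (msgId) VALUES (?)", (msg_id,))
            conn.execute(
                "UPDATE user_account SET account = account + ? WHERE userId = ?",
                (amount, user_id),
            )
    except sqlite3.IntegrityError:
        return False  # already applied, nothing was changed
    return True
```

Relying on the unique constraint avoids the window in which two concurrent consumers both pass the "not applied" check.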

The approach above is a classic implementation that largely avoids distributed transactions and achieves eventual consistency. However, a relational database has limits on throughput and performance, and frequent reads and writes of messages put pressure on the database. In truly high-concurrency scenarios, this scheme therefore also has bottlenecks and limitations.

MQ (non-transactional message)

When using an MQ product that does not support transactional messages, it is generally difficult to put the business operation and the MQ operation into the same local transaction. Taking the interbank transfer above as an example again, it is hard to guarantee that the message is delivered to MQ successfully after the deduction completes, so consistency seems difficult to guarantee.

Let's analyze the possible situation:

The database operation succeeded and the message was delivered to MQ successfully. Everything is fine.

The database operation failed. No message is delivered to MQ.

The database operation succeeded, but delivering the message to MQ failed. An exception is thrown and the database update that was just performed is rolled back.
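The three cases can be reproduced by sending the message inside the local database transaction, so that a failed send rolls the update back. This is a sketch using Python's built-in `sqlite3` module; `BrokerUnavailable`, `debit_then_publish`, the `send` callback, and the `user_account` table are hypothetical names, and `send` stands in for a real MQ client call.

```python
import sqlite3
from typing import Callable


class BrokerUnavailable(Exception):
    """Hypothetical error raised when the message cannot be delivered to MQ."""


def debit_then_publish(
    conn: sqlite3.Connection, user_id: str, amount: int, send: Callable[[dict], None]
) -> bool:
    """Deduct the amount and send the message; return True only if both succeeded."""
    try:
        with conn:  # one local transaction around both steps
            cur = conn.execute(
                "UPDATE user_account SET account = account - ? WHERE userId = ?",
                (amount, user_id),
            )
            if cur.rowcount != 1:
                return False  # case 2: the database operation failed, nothing is sent
            send({"userId": user_id, "amount": amount})  # case 3 raises here
    except BrokerUnavailable:
        return False  # the UPDATE above has been rolled back
    return True  # case 1: both steps succeeded
```

The sketch does not cover a send that times out after the broker has already accepted the message; in that situation the deduction would be rolled back while the message still exists.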

Judging from these cases, the producer side does not look too problematic. The consumer side faces the following problems:

After a message is dequeued, the consumer's corresponding business operation must execute successfully. If the business operation fails, the message must not be invalidated or lost. The message and the business operation need to stay consistent.

Repeated consumption of a message should be avoided as far as possible. If a message is consumed repeatedly, this must not affect the business result.

How to ensure that the message is consistent with the business operation and not lost?

Mainstream MQ products can persist messages. If the consumer goes down or consumption fails, a retry mechanism can be used (some MQ products allow the number of retries to be customized).

How to avoid the problems caused by repeated consumption of messages?

Ensure that the service interface the consumer calls to perform the business operation is idempotent.

Record the consumption status in a consumption log or a similar status table so that duplicates are easy to detect (it is recommended to implement this in your own business code rather than relying on the MQ product to provide it).
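The two answers above, retrying on failure and recording the consumption status, can be combined in one consumer routine. The following is a minimal in-memory sketch: `consume`, the `processed_ids` set, and the `max_retries` parameter are hypothetical, and a real system would keep the processed IDs in a durable status table rather than in memory.

```python
from typing import Callable


def consume(
    message: dict,
    handler: Callable[[dict], None],
    processed_ids: set,
    max_retries: int = 3,
) -> str:
    """Process one message at most once, retrying the handler on failure."""
    if message["msgId"] in processed_ids:
        return "duplicate"  # already consumed: skip so the result is not applied twice
    for _ in range(max_retries):
        try:
            handler(message)
        except Exception:
            # Broad catch for brevity: any handler failure triggers a retry.
            continue
        processed_ids.add(message["msgId"])
        return "done"
    # Not marked as processed, so the message stays eligible for redelivery.
    return "retry_exhausted"
```

A message that exhausts its retries is intentionally not recorded as consumed, so it is not lost and can be delivered again later.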

This approach is common, and its performance and throughput are better than those of a message table in a relational database. If both the MQ and the business services are highly available, it can in theory satisfy most business scenarios. Without adequate testing, however, using it directly in a trading business is not recommended.

MQ (transactional message)

For example, if Bob transfers money to Smith, should we send the message first or deduct the money first?

Either order can go wrong. If the message is sent first and the deduction then fails, Smith's account receives money that was never deducted. Conversely, if the deduction is performed first and the message is sent afterwards, the deduction may succeed while the message is never sent, and Smith does not receive the money. Are there other options besides the catch-the-exception-and-roll-back method described above?

The following takes Alibaba's RocketMQ middleware as an example to analyze its design and implementation ideas.

In the first stage, RocketMQ sends a Prepared message and obtains the address of the message. In the second stage, the local transaction is executed. In the third stage, the message is accessed through the address obtained in the first stage and its status is modified. Careful readers may notice a problem: what if the confirmation message fails to be sent? RocketMQ periodically scans the transactional messages in the message cluster, and when it finds a message still in the Prepared state it asks the sender whether Bob's money has been deducted. Based on the policy set by the sender, RocketMQ then decides whether to roll the message back or to continue sending the confirmation. This ensures that message delivery and the local transaction succeed or fail together.
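The three stages and the periodic check can be modeled with a toy in-memory broker. This is only an illustration of the flow described above and is not the RocketMQ API: `ToyBroker`, `TxState`, `send_in_transaction`, and the `confirm` flag (which simulates a lost confirmation) are all hypothetical.

```python
from enum import Enum
from typing import Callable


class TxState(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"
    UNKNOWN = "unknown"


class ToyBroker:
    """Toy stand-in for a broker that supports prepared messages."""

    def __init__(self) -> None:
        self.prepared: dict[str, str] = {}  # msg_id -> body, hidden from consumers
        self.visible: list[str] = []        # bodies that consumers can read

    def send_prepared(self, msg_id: str, body: str) -> str:
        self.prepared[msg_id] = body
        return msg_id  # stands in for the message address from stage one

    def end_transaction(self, msg_id: str, state: TxState) -> None:
        if state is TxState.UNKNOWN or msg_id not in self.prepared:
            return  # keep waiting, or nothing left to resolve
        body = self.prepared.pop(msg_id)
        if state is TxState.COMMIT:
            self.visible.append(body)  # rollback simply drops the message

    def check_back(self, check_local_tx: Callable[[str], TxState]) -> None:
        # Periodic scan: ask the sender about every message still prepared.
        for msg_id in list(self.prepared):
            self.end_transaction(msg_id, check_local_tx(msg_id))


def send_in_transaction(
    broker: ToyBroker, msg_id: str, body: str,
    local_tx: Callable[[], bool], confirm: bool = True,
) -> None:
    handle = broker.send_prepared(msg_id, body)                  # stage 1
    state = TxState.COMMIT if local_tx() else TxState.ROLLBACK   # stage 2
    if confirm:                                                  # stage 3
        broker.end_transaction(handle, state)
```

When `confirm=False` simulates a lost confirmation, the message stays hidden until `check_back` asks the sender for the outcome of the local transaction.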

Almost all well-known e-commerce platforms and Internet companies adopt similar designs to achieve eventual consistency. This approach suits a wide range of business scenarios and is more reliable, but it is harder to implement. According to the source, the mainstream open source MQ products (ActiveMQ, RabbitMQ, Kafka) do not provide this style of transactional message, so secondary development is needed; RocketMQ's transactional messages can serve as a reference.

Summary:

This article summarizes solutions for distributed transaction consistency based on many articles in this area. Transaction consistency in distributed systems is a hard technical problem in itself, and at present there is no simple, perfect solution that handles every scenario. One difficulty of distributed systems is that network communication is unreliable, so some problems can only be addressed through mechanisms such as confirmation, retry, and compensation. Weighing availability, performance, implementation complexity, and other factors, the better choice is to asynchronously ensure eventual consistency, although the specific implementations differ.

References:

https://www.cnblogs.com/luxiaoxun/p/8832915.html

https://www.cnblogs.com/lori/p/9318892.html

Transaction processing of distributed system

https://coolshell.cn/articles/10910.html

Use message queues and message application state tables to eliminate distributed transactions

https://my.oschina.net/picasso/blog/35306

How many common distributed transaction solutions are introduced?

https://www.zhihu.com/question/64921387/answer/225784480

Distributed transactions: just make a choice between consistency, throughput, and complexity

https://mp.weixin.qq.com/s?__biz=MjM5MDE0Mjc4MA==&mid=2650994325&idx=1&sn=afe66f9cf65ec61aaaf8422a12618fb2

