In this article, we will look at what a distributed transaction solution based on message queues looks like. The article walks through the problem step by step, and I hope you take something away from reading it.
Back when we were still "babbling" beginners, the teacher would often explain transactions to us with the classic money-transfer example. The difference is that the teacher was talking about local transactions, while here we are dealing with distributed transactions. Let's start with a brief review of local transactions.
Local transaction
Most of us are familiar with local transactions, because they are supported at the database-engine level, which is why they are also called database transactions. A database transaction has four well-known properties: atomicity (A), consistency (C), isolation (I), and durability (D). Of these four, I would say consistency is the most fundamental; the other three exist to guarantee consistency.
Going back to the classic example our teacher gave us as students: account A transfers 100 yuan to account B (A and B are in the same bank). If A's account is debited but B's account is never credited, the data is inconsistent. To keep the data consistent, the database's transaction mechanism makes the debit of A and the credit of B succeed or fail together; if either operation fails, both are rolled back. This is the atomicity of a transaction, and to guarantee it the database relies on a log-based REDO/UNDO mechanism.

Atomicity alone is not enough, though, because our systems run in multi-threaded environments. If multiple transactions run in parallel, data can still end up inconsistent even when each transaction is atomic. For example, account A starts with a balance of 200 yuan and transfers 100 yuan to account B: the balance of A is read first, then 100 yuan is subtracted from that value. But between those two operations, account A also transfers 100 yuan to account C, so the final result should be that A loses 200 yuan. In reality, after the transfer to B completes, A has only lost 100 yuan, because the 100 yuan deducted for the transfer to C was overwritten. To stay consistent under concurrency we therefore need isolation, meaning the state after several transactions execute concurrently must be equivalent to the state after they execute serially. Isolation comes in several levels, and to implement it (ultimately, to guarantee consistency) databases also introduce pessimistic locks, optimistic locks, and so on.

The theme of this article is distributed transactions, so this is only a brief review of local transactions. The one thing to keep in mind is that transactions are about guaranteeing data consistency!
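As a concrete illustration of that atomicity, here is a minimal sketch, assuming a JDBC connection and a hypothetical account table with account_no and balance columns (the table and class names are mine, not from the article), of wrapping the debit and the credit in one local transaction:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LocalTransferDemo {
    // Transfer `amount` from account A to account B inside ONE local transaction:
    // either both updates commit, or both are rolled back (atomicity).
    public static void transfer(Connection conn, String fromNo, String toNo, long amount) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start the local transaction
        try (PreparedStatement debit = conn.prepareStatement(
                 "UPDATE account SET balance = balance - ? WHERE account_no = ? AND balance >= ?");
             PreparedStatement credit = conn.prepareStatement(
                 "UPDATE account SET balance = balance + ? WHERE account_no = ?")) {
            debit.setLong(1, amount);
            debit.setString(2, fromNo);
            debit.setLong(3, amount);
            if (debit.executeUpdate() != 1) {
                throw new SQLException("insufficient balance or unknown account " + fromNo);
            }
            credit.setLong(1, amount);
            credit.setString(2, toNo);
            credit.executeUpdate();
            conn.commit();   // both updates become visible together
        } catch (SQLException e) {
            conn.rollback(); // neither update takes effect
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```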
Distributed theory
I still remember joining an Internet company full of enthusiasm right after graduation. The first task my leader gave me was to add a feature for editing the data shown in a list. Could that possibly stump me? I'd have it done in no time! Just add an "Edit" button to the list; clicking it pops up a dialog where you change the record and save it. But things did not go as smoothly as I expected: after clicking Save and refreshing the list, the page still showed the data from before the edit, as if the update had not taken effect. Refresh the list again a little later, and the data displayed correctly. It was the same in test after test. Having never seen anything like this, I started to panic: had I written something wrong? In the end I had to turn to an experienced colleague on the team. He took a deep breath, as if to say "fresh out of school, after all", and explained: "Our database is split into read and write instances, and some of the read and write libraries sit in different network partitions. Your update goes to the write library, but when you read the data it comes from a read library. Synchronizing updates from the write library to the read libraries has a certain delay, which means the read library and the write library can be temporarily inconsistent." "Wouldn't that be a bad experience? Why can't the written data be read back immediately? Then how am I supposed to implement this feature?" Faced with my barrage of questions, my colleague said impatiently, "Have you ever heard of the CAP theorem? Go and read up on it first." And so I started digging through all kinds of material to understand the secret behind that strange acronym.
The CAP theorem was put forward by Professor Eric Brewer of the University of California, Berkeley. It tells us that a distributed system cannot simultaneously satisfy all three of consistency (Consistency), availability (Availability), and partition tolerance (Partition tolerance); it can satisfy at most two of them.
Consistency: here this means strong consistency of data, also called linearizability. It refers to whether data can stay consistent across multiple replicas in a distributed environment; in other words, a read performed immediately after a write must return the value just written. (Any read operation that begins after a write operation completes must return that value, or the result of a later write operation.)
Availability: every request received by a non-failing node must produce a response within a bounded time. (Every request received by a non-failing node in the system must result in a response.)
Partition tolerance: if the machines in the cluster are split into two groups that cannot communicate with each other, the system must still be able to keep working. (The network will be allowed to lose arbitrarily many messages sent from one node to another.)
In a distributed system, partition tolerance is essentially a given, so in practice the trade-off is between consistency and availability. Why can't consistency and availability hold at the same time? Going back to the list-editing example: because the data is spread across different network partitions, it has to be synchronized, and synchronization suffers from network delays, failures, and so on, so temporary inconsistency is unavoidable. If you want to guarantee consistency, then while the write library is being updated you must block reads against the read libraries, and only release reads and writes again once the write has succeeded and the data has been synchronized; during that window the system loses availability. For more details on the CAP theorem, this article (https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/) is a fairly approachable read!
Distributed transaction
A distributed transaction is simply a transaction whose guarantees must hold in a distributed setting. In the previous article we talked about message middleware; in this article we talk about distributed transactions, and combining the two gives us a distributed transaction solution based on message middleware. Both local and distributed transactions exist to solve the problem of data consistency, a word that has already come up many times. Unlike a local transaction, a distributed transaction has to keep data consistent across different databases and tables in a distributed environment. There are many solutions for distributed transactions, such as the XA protocol, TCC, three-phase commit, message-queue-based approaches, and so on. This article covers only the message-queue-based approach!
Local transactions are about consistency, and distributed transactions inevitably face the same consistency problem. Going back to the inter-bank transfer example from the beginning: if a user at bank A transfers money to a user at bank B, the normal flow would be as follows (a minimal sketch of this synchronous flow appears after the list):
1. Bank A checks and verifies the source account and deducts the amount.
2. Bank A synchronously calls bank B's transfer interface.
3. Bank B checks and verifies the destination account and increases the amount.
4. Bank B returns the processing result to bank A.
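A minimal sketch of that naive synchronous flow might look like this; the AccountService and BankBClient interfaces are hypothetical names used only for illustration, not part of the article:

```java
// Naive synchronous transfer: bank A debits locally, then calls bank B inline.
public class SyncTransferService {
    private final AccountService accountService; // local account operations at bank A
    private final BankBClient bankBClient;       // remote transfer interface of bank B

    public SyncTransferService(AccountService accountService, BankBClient bankBClient) {
        this.accountService = accountService;
        this.bankBClient = bankBClient;
    }

    public boolean transfer(String fromAccount, String targetAccount, long amount) {
        // Step 1: check and debit the source account at bank A.
        accountService.debit(fromAccount, amount);
        try {
            // Steps 2-4: synchronously call bank B and wait for its result.
            return bankBClient.credit(targetAccount, amount);
        } catch (RuntimeException e) {
            // If the remote call fails (or merely times out), bank A rolls back the debit.
            // As the article notes, bank B may in fact have credited the account already,
            // which is exactly how the two systems end up inconsistent.
            accountService.rollbackDebit(fromAccount, amount);
            return false;
        }
    }
}

interface AccountService {
    void debit(String accountNo, long amount);
    void rollbackDebit(String accountNo, long amount);
}

interface BankBClient {
    boolean credit(String accountNo, long amount);
}
```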
Under normal circumstances, if the consistency requirements are not high, such a design can meet the need. But a system like a bank would probably have gone bankrupt long ago if it were built this way. Let's look at the main problems with this design:
1. The remote interface is called synchronously; if the call is slow, the main thread blocks for a long time.
2. Traffic cannot be controlled well; a traffic peak in bank A's system may overwhelm bank B's system (of course, bank B will certainly have its own rate-limiting mechanism).
3. If the system goes down right after "step 1" executes, bank A has deducted the money but bank B never receives the interface call, leaving the data of the two systems inconsistent.
4. If, after "step 3" has executed, bank B cannot respond to the request correctly for some reason (even though the transfer has in fact been performed and persisted in bank B's system), bank A sees the interface call fail, mistakenly concludes the transfer failed, and rolls back "step 1", which also leaves the data of the two systems inconsistent.
Problems 1 and 2 are easy to solve. Readers familiar with message queues will quickly think of introducing message middleware for asynchronous processing and peak shaving, so the solution is redesigned as follows (a sketch of the producer and the background worker appears after the list):
1. Bank A checks and verifies the account and deducts the amount.
2. The request to bank B is written asynchronously to the queue, and the main thread returns.
3. A background program pulls the pending data from the queue.
4. The background program makes the remote call to bank B's interface.
5. Bank B checks and verifies the destination account and increases the amount.
6. After finishing, bank B calls back bank A's interface to notify it of the processing result.
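Here is a minimal sketch of the asynchronous version, using an in-memory BlockingQueue as a stand-in for real message middleware; the class and message names are illustrative assumptions, not from the article:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Bank A debits locally, enqueues the request, and a background worker delivers it to bank B.
public class AsyncTransferDemo {

    record TransferMessage(String tId, String targetAccountNo, long amount) {}

    static final BlockingQueue<TransferMessage> queue = new LinkedBlockingQueue<>();

    // Main thread: steps 1 and 2, then return immediately.
    static void submitTransfer(String tId, String targetAccountNo, long amount) {
        // step 1: check and debit the local account (omitted here)
        queue.offer(new TransferMessage(tId, targetAccountNo, amount)); // step 2
    }

    // Background worker: steps 3 and 4.
    static void startWorker() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    TransferMessage msg = queue.take(); // step 3: pull pending data
                    callBankB(msg);                     // step 4: remote call to bank B
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    static void callBankB(TransferMessage msg) {
        // placeholder for the remote call; bank B performs steps 5 and 6
        System.out.println("calling bank B for transfer " + msg.tId());
    }

    public static void main(String[] args) throws Exception {
        startWorker();
        submitTransfer("T-001", "B-ACC-42", 100);
        Thread.sleep(200); // give the worker time to process before the demo exits
    }
}
```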
From the diagram above, we can see that after introducing a message queue, the complexity of the system rises immediately. The new design makes up for several shortcomings of the first one, but it also brings new problems, such as the availability of the message queue itself and the latency it adds. More importantly, it still does not solve the core problem we face: data consistency!
1. If the system goes down right after "step 1" executes, bank A's account has been debited but the message was never written to the queue, so bank B's interface is never called and the data is inconsistent.
2. If bank B's verification fails during "step 5" and the transfer cannot be completed, and the callback to bank A's rollback interface then runs into a network failure or downtime, bank A cannot complete its rollback, and again the data of the two systems is inconsistent.
Faced with these problems, we have to upgrade the system once more. To solve the problem of "bank A's account is debited but the write to the message queue fails", we introduce a transfer log table, also called a transfer flow table, with a simple design like this:
tId: transfer flow id
accountNo: source account card number
targetBankNo: target bank code
targetAccountNo: target account card number
amount: transaction amount
status: transaction status (pending, success, failure)
lastUpdateTime: last update time
So how do we use this flow table? When we debit the money in "step 1", we also write a record into the flow table with status "pending", and the two operations must be atomic; that is, a local transaction must guarantee that they either both succeed or both fail. This ensures that as long as the debit succeeds, a transfer record with status "pending" exists. If this step fails, the transfer simply fails and nothing further happens. It also does not matter if the system goes down after this step and the message is never written to the message queue (that is, "step 2"), because the flow record has already been persisted. We only need to add a background compensation thread that periodically reads records from the flow table whose status is "pending" and whose last update time is older than the current time by some threshold, and puts them back into the message queue. In this way, even if a message is lost, the compensation mechanism will resend it. After bank B finishes processing the transfer request, it calls back bank A's interface to report the transfer status, and bank A updates the status field in its flow table. This neatly fixes the two shortcomings of the previous design. The system now looks like this:
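A minimal sketch of the two key pieces on bank A's side follows, assuming a JDBC DataSource, an account table, and a transfer_flow table matching the fields listed above (all table, column, and class names are my own illustrative choices): the debit plus "pending" flow record in one local transaction, and a compensation poller that re-publishes stale pending records.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import javax.sql.DataSource;

public class BankATransferFlow {
    private final DataSource dataSource;
    public BankATransferFlow(DataSource dataSource) { this.dataSource = dataSource; }

    // Step 1: debit the account AND insert a "pending" flow record in ONE local transaction.
    public void debitAndRecord(String tId, String accountNo, String targetBankNo,
                               String targetAccountNo, long amount) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE account_no = ? AND balance >= ?");
                 PreparedStatement flow = conn.prepareStatement(
                     "INSERT INTO transfer_flow (t_id, account_no, target_bank_no, target_account_no, " +
                     "amount, status, last_update_time) VALUES (?, ?, ?, ?, ?, 'PENDING', CURRENT_TIMESTAMP)")) {
                debit.setLong(1, amount);
                debit.setString(2, accountNo);
                debit.setLong(3, amount);
                if (debit.executeUpdate() != 1) throw new SQLException("debit failed for " + accountNo);
                flow.setString(1, tId);
                flow.setString(2, accountNo);
                flow.setString(3, targetBankNo);
                flow.setString(4, targetAccountNo);
                flow.setLong(5, amount);
                flow.executeUpdate();
                conn.commit();   // debit and flow record succeed or fail together
            } catch (SQLException e) {
                conn.rollback(); // neither the debit nor the record is kept
                throw e;
            }
        }
    }

    // Compensation poller: re-publish PENDING records that have not been updated for a while.
    public void compensate(MessagePublisher publisher, int staleSeconds) throws SQLException {
        String sql = "SELECT t_id, target_bank_no, target_account_no, amount FROM transfer_flow " +
                     "WHERE status = 'PENDING' AND last_update_time < ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, Timestamp.from(Instant.now().minusSeconds(staleSeconds)));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // May re-publish a message that was already sent; bank B must be idempotent.
                    publisher.publish(rs.getString("t_id"), rs.getString("target_bank_no"),
                                      rs.getString("target_account_no"), rs.getLong("amount"));
                }
            }
        }
    }

    public interface MessagePublisher {
        void publish(String tId, String targetBankNo, String targetAccountNo, long amount);
    }
}
```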
So far we have solved the message-loss problem nicely: as long as bank A's debit succeeds, the transfer request will eventually reach bank B. However, the scheme introduces a new problem. Because messages are put into the queue by the polling background thread, the same transfer request may be published and consumed more than once, and if bank B processes the same transfer multiple times, the data becomes inconsistent again. So how do we make bank B's transfer interface idempotent?
Similarly, we can add a transfer log table, or transfer flow table, in bank B's system. Every time bank B receives a transfer request, it inserts a transfer log record in the same operation that updates the account, and again these two operations must be atomic. On receiving a transfer request, bank B first looks up its log table by the unique transfer flow id to check whether the transfer has already been processed; it processes the request only if it has not, and otherwise simply returns the previous result via the callback. The final architecture diagram looks like this:
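Here is a minimal sketch of that idempotent handler on bank B's side, again assuming JDBC and illustrative table and column names (transfer_log with a unique t_id column) rather than anything specified in the article:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// The credit and the log insert share one local transaction, and the unique t_id
// makes repeated deliveries of the same transfer message harmless.
public class BankBTransferHandler {
    private final DataSource dataSource;
    public BankBTransferHandler(DataSource dataSource) { this.dataSource = dataSource; }

    /** Returns true if the transfer is applied (now or previously), so duplicates are safe. */
    public boolean handleTransfer(String tId, String accountNo, long amount) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try {
                // 1. Has this tId been processed already? If so, just acknowledge again.
                try (PreparedStatement check = conn.prepareStatement(
                         "SELECT 1 FROM transfer_log WHERE t_id = ?")) {
                    check.setString(1, tId);
                    try (ResultSet rs = check.executeQuery()) {
                        if (rs.next()) { conn.rollback(); return true; }
                    }
                }
                // 2. Credit the account and record the log atomically.
                try (PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE account_no = ?");
                     PreparedStatement log = conn.prepareStatement(
                         "INSERT INTO transfer_log (t_id, account_no, amount) VALUES (?, ?, ?)")) {
                    credit.setLong(1, amount);
                    credit.setString(2, accountNo);
                    credit.executeUpdate();
                    log.setString(1, tId);
                    log.setString(2, accountNo);
                    log.setLong(3, amount);
                    log.executeUpdate(); // a UNIQUE index on t_id also guards against races
                }
                conn.commit();
                return true;
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```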
So the core of the scheme is this: bank A uses a local transaction to record the flow log and a background polling thread to guarantee that the message is not lost, while bank B uses a local transaction to record its own log and guarantee that the message is not consumed more than once. When bank B calls back bank A's interface it reports the processing result, and if the transfer failed, bank A rolls back according to that result.
Of course, the best solution for distributed transactions is to avoid distributed transactions as much as possible!
That, in short, is the distributed transaction solution based on message queues. If you happen to have similar questions, I hope the analysis above helps clear them up.