How does micro-service architecture ensure data consistency 07/12 Update SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Today I will talk about how to ensure data consistency in a microservice architecture. Many people may not know much about this topic, so I have summarized the following content. I hope you find this article useful.

1. Transaction Management for Traditional Applications

1.1 Local Transactions

Traditional standalone applications use an RDBMS as the data source. The application begins a transaction, performs CRUD operations, and then commits or rolls back. All of this happens within a local transaction, and the resource manager (RM) provides the transaction support directly. Data consistency is guaranteed within a local transaction.

1.2 Distributed Transactions

1.2.1 Two-Phase Commit (2PC)

As an application grows, it may access several data sources at the same time, and local transactions can no longer guarantee data consistency. Transactions must then be managed across multiple data sources, and distributed transactions arose to do this. The most popular approach is two-phase commit (2PC), in which a transaction manager (TM) centrally manages the distributed transaction.

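Two-phase commit is divided into a prepare phase and a commit phase. In the prepare phase, the TM asks every RM to execute its part of the transaction without committing and to vote on whether it can commit. In the commit phase, the TM tells all RMs to commit if every vote was yes, and to roll back otherwise.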

[Figure: Two-phase commit, rollback case]
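The two phases can be illustrated with a minimal in-memory sketch in Java, the language of this article's code. The `Participant` interface, `SimpleParticipant` and `TwoPhaseCoordinator` are hypothetical names made up for this example. They are not the API of any real transaction manager, and the sketch omits logging, timeouts and crash recovery. It also shows where the blocking comes from. Between `prepare()` and the phase-2 call, every participant holds its locks and must wait for the coordinator's decision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource-manager (RM) contract, for illustration only.
interface Participant {
    boolean prepare();  // phase 1: execute the work, hold locks, vote yes (true) or no (false)
    void commit();      // phase 2: make the prepared work permanent
    void rollback();    // phase 2: undo the work and release locks
}

// Toy participant that votes as configured and records its final state.
class SimpleParticipant implements Participant {
    private final boolean vote;
    String state = "init";

    SimpleParticipant(boolean vote) { this.vote = vote; }

    public boolean prepare() { state = vote ? "prepared" : "aborted"; return vote; }
    public void commit() { state = "committed"; }
    public void rollback() { state = "rolledBack"; }
}

// Minimal in-memory transaction manager (TM) running two-phase commit.
class TwoPhaseCoordinator {
    private final List<Participant> participants = new ArrayList<>();

    void enlist(Participant p) { participants.add(p); }

    /** Returns true if the global transaction committed, false if it rolled back. */
    boolean execute() {
        // Phase 1 (prepare): collect votes; a single "no" aborts the whole transaction.
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
                break;
            }
        }
        // Phase 2: broadcast the global decision to every participant.
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.rollback(); // assumed harmless for participants that never prepared
            }
        }
        return allYes;
    }
}
```

With two participants that both vote yes, `execute()` returns true and both end in the "committed" state. If either votes no, every participant is told to roll back.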

However, two-phase commit does not fully guarantee data consistency and suffers from synchronous blocking, so an improved version, three-phase commit (3PC), was devised.

1.2.2 Three-Phase Commit (3PC)

[Figure: Three-phase commit]

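However, 3PC still guarantees data consistency only in most cases. It adds a CanCommit phase before the PreCommit and DoCommit phases and lets participants commit on timeout, which reduces blocking. A network partition, however, can still leave some participants committed and others aborted.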
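The key difference from 2PC is what a participant does when the coordinator stops responding. The simplified sketch below models a single 3PC participant as a state machine. The class, state and method names are hypothetical, and the messaging between coordinator and participants is left out. Once a participant has received PreCommit, it knows that every participant voted yes, so on timeout it commits instead of blocking. The same rule explains why 3PC is not fully safe. If a network partition hides an abort decision from a pre-committed participant, that participant will still commit.

```java
// Simplified states of a 3PC participant (hypothetical model).
enum ParticipantState { INIT, READY, PRECOMMITTED, COMMITTED, ABORTED }

class ThreePhaseParticipant {
    ParticipantState state = ParticipantState.INIT;
    private final boolean canCommit;

    ThreePhaseParticipant(boolean canCommit) { this.canCommit = canCommit; }

    // Phase 1 (CanCommit): a cheap feasibility check; no work is done yet.
    boolean onCanCommit() {
        state = canCommit ? ParticipantState.READY : ParticipantState.ABORTED;
        return canCommit;
    }

    // Phase 2 (PreCommit): execute the work and acknowledge.
    void onPreCommit() {
        if (state == ParticipantState.READY) state = ParticipantState.PRECOMMITTED;
    }

    // Phase 3 (DoCommit): make the work permanent.
    void onDoCommit() {
        if (state == ParticipantState.PRECOMMITTED) state = ParticipantState.COMMITTED;
    }

    void onAbort() {
        if (state != ParticipantState.COMMITTED) state = ParticipantState.ABORTED;
    }

    // The coordinator went silent. A PreCommit means every participant voted yes,
    // so the participant commits on its own instead of blocking as in 2PC.
    void onTimeout() {
        if (state == ParticipantState.PRECOMMITTED) {
            state = ParticipantState.COMMITTED;
        } else if (state != ParticipantState.COMMITTED) {
            state = ParticipantState.ABORTED;
        }
    }
}
```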

For detailed descriptions of 2PC and 3PC, see About Distributed Transactions, Two-Phase Commit Protocol, and Three-Phase Commit Protocol. Distributed transactions are not the focus of this article, so they are not expanded on here.

2. Transaction management under microservices

So, are the distributed transaction protocols 2PC and 3PC suitable for transaction management under microservices? The answer is no, for three reasons:

First, microservices cannot access each other's data directly and usually call one another through RPC (Dubbo) or HTTP APIs (Spring Cloud). A TM can therefore no longer uniformly manage the RMs of all the microservices.

Second, different microservices may use completely different types of data sources. If a microservice uses a database that does not support transactions, as is the case with some NoSQL stores, transactions are out of the question.

Third, the data sources may all support transactions, but managing the transactions of many microservices as one large transaction makes it last several orders of magnitude longer than a local transaction. Such long-running, cross-service transactions hold many locks and leave data unavailable, which seriously hurts system performance.

Clearly, traditional distributed transactions can no longer meet the transaction management requirements of a microservice architecture. Since traditional ACID transactions cannot be used, transaction management under microservices must follow a new principle: BASE theory.

BASE theory was proposed by eBay architect Dan Pritchett as an extension of the CAP theorem. Its core idea is that even when strong consistency cannot be achieved, an application should reach eventual consistency in an appropriate way. BASE stands for Basically Available, Soft state, and Eventual consistency.

Basically available: when a distributed system fails, it may lose part of its availability, as long as its core functions remain available.

Soft state: the system may have intermediate states that do not affect its overall availability. In distributed storage, a piece of data usually has at least three replicas. Allowing replication between nodes to lag is an example of soft state.

Eventual consistency: all replicas of the data in the system reach a consistent state after some period of time. Weak consistency is the opposite of strong consistency, and eventual consistency is a special case of weak consistency.

Eventual consistency in BASE is the fundamental requirement for transaction management under microservices. Transaction management across microservices cannot achieve strong consistency, but it must guarantee eventual consistency. So which methods can guarantee this? By implementation principle there are two main types: event notification and compensation. Event notification divides further into the reliable event notification pattern and the best-effort notification pattern. Compensation divides into the TCC pattern and the business compensation pattern. All four patterns can achieve eventual consistency of data under microservices.

3. Ways to achieve data consistency under microservices

3.1 Reliable event notification patterns

3.1.1 Synchronous events

The reliable event notification pattern is easy to understand. After the master service finishes its business, it passes the result to the slave service as an event, usually through a message queue. The slave service consumes the message and completes its own business, so that the data of the master and slave services becomes consistent. The first and simplest approach is synchronous event notification, in which business processing and message sending are executed synchronously. See the code and sequence diagram below for the implementation logic.

```java
public void trans() {
    try {
        // 1. Operate on the database; throws an exception if the operation fails
        boolean result = dao.update(data);
        // 2. Send the message only if the database operation succeeded
        if (result) {
            mq.send(data); // throws an exception if sending fails
        }
    } catch (Exception e) {
        rollback(); // if any exception occurs, roll back the database operation
    }
}
```

The logic above looks watertight. If the database operation fails, the method exits without sending a message. If sending the message fails, the database rolls back. If both succeed, the business succeeds and the message reaches downstream consumers. On closer inspection, however, synchronous message notification has two shortcomings.

First, network I/O problems or server downtime can occur under a microservice architecture. Such a problem may strike at step 7 of the sequence diagram. The master service may not be notified that the message was delivered (a network problem), or it may be unable to go on and commit its transaction (downtime). Either way, the master service treats the delivery as failed and rolls back its business. In reality, however, the slave service has already consumed the message, so the data of the master and slave services becomes inconsistent. See the following two sequence diagrams for the specific scenarios.


Second, the event service (here, the message service) is too tightly coupled to the business: if the message service is unavailable, the business becomes unavailable too. The event service should be decoupled from the business and run independently and asynchronously. Alternatively, the message can be sent after the business has been executed, falling back to asynchronous sending if that attempt fails.
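The fallback described above might look like the sketch below: send after the business has committed, and degrade to asynchronous sending on failure. `SendAfterCommit` and the `mqSend` callback are hypothetical stand-ins for a real message-queue client. The retry queue here lives in memory, so a crash would lose pending messages. The asynchronous event services in 3.1.2 close exactly that gap by storing events durably.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

// Sketch: publish after the business transaction has committed and, if that
// fails, degrade to asynchronous retries instead of rolling back the business.
class SendAfterCommit {
    private final Predicate<String> mqSend;              // hypothetical MQ client; false = send failed
    final Queue<String> retryQueue = new ArrayDeque<>(); // messages waiting for an async resend

    SendAfterCommit(Predicate<String> mqSend) { this.mqSend = mqSend; }

    // Called once the local transaction has already committed.
    void afterCommit(String message) {
        if (!mqSend.test(message)) {
            retryQueue.add(message); // the business result stays committed
        }
    }

    // Called periodically by a background job; each pending message gets one attempt.
    void retryPending() {
        int pending = retryQueue.size();
        for (int i = 0; i < pending; i++) {
            String message = retryQueue.poll();
            if (!mqSend.test(message)) {
                retryQueue.add(message); // still failing: keep it for the next round
            }
        }
    }
}
```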

3.1.2 Asynchronous Events

3.1.2.1 Local Event Services

The asynchronous event notification pattern was developed to solve the problems of synchronous events described in 3.1.1. Business services and event services are decoupled, events are delivered asynchronously, and a separate event service guarantees reliable delivery.
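One common way to realize a local event service, assumed here, is an event table stored in the same database as the business data. The business row and its event are then written in a single local transaction, and a background job publishes unsent events and deletes them once sent. The sketch below simulates this in memory. `LocalDb`, `LocalEventService` and the `publish` callback are hypothetical, and a `synchronized` method stands in for a real database transaction. An event is deleted only after a successful publish, so a crash between the two steps causes a duplicate rather than a loss. That is why consumers must be idempotent (see 3.1.2.3).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-memory stand-in for ONE local database holding both the business table
// and an event table, so both rows can be written in one local transaction.
class LocalDb {
    final Map<String, String> orders = new HashMap<>(); // business table: orderId -> status
    final List<String> eventTable = new ArrayList<>();   // events recorded but not yet sent

    // Simulated local transaction: the order row and its event are written together.
    synchronized void placeOrder(String orderId) {
        orders.put(orderId, "PLACED");
        eventTable.add("OrderPlaced:" + orderId);
    }
}

// Event service: polls the event table and deletes an event only after publishing it.
class LocalEventService {
    private final LocalDb db;
    private final Predicate<String> publish; // hypothetical MQ client; false = send failed

    LocalEventService(LocalDb db, Predicate<String> publish) {
        this.db = db;
        this.publish = publish;
    }

    void pollAndSend() {
        synchronized (db) {
            Iterator<String> it = db.eventTable.iterator();
            while (it.hasNext()) {
                String event = it.next();
                if (publish.test(event)) {
                    it.remove(); // published: delete it from the event table
                }
                // on failure the event stays and is retried on the next poll
            }
        }
    }
}
```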

3.1.2.2 External Event Services

[Figure: Asynchronous event notification with an external event service]

Before committing, the business service sends the event to the event service, which only records the event and does not send it. After committing or rolling back, the business service notifies the event service, which then sends or deletes the event. The business system might fail to send this confirmation after committing or rolling back, but that is not a concern. The event service periodically fetches all unsent events and queries the business system. Based on the response, it decides whether to send or delete each event.
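The record/confirm/cancel flow and the periodic check described above can be sketched as follows. `ExternalEventService`, `TxStatus` and both callbacks are hypothetical. `publish` stands in for the message-queue client, and `queryBusiness` stands in for the query interface that the business system must expose. Network calls between the two systems are reduced to plain method calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Answer from the business system's query interface (hypothetical).
enum TxStatus { COMMITTED, ROLLED_BACK, UNKNOWN }

class ExternalEventService {
    private final Map<String, String> pending = new HashMap<>(); // eventId -> payload, recorded but not sent
    private final Predicate<String> publish;                     // hypothetical MQ client; false = send failed
    private final Function<String, TxStatus> queryBusiness;      // query interface exposed by the business system

    ExternalEventService(Predicate<String> publish, Function<String, TxStatus> queryBusiness) {
        this.publish = publish;
        this.queryBusiness = queryBusiness;
    }

    // Before the business commits: only record the event.
    void record(String eventId, String payload) { pending.put(eventId, payload); }

    // After the business commits: send the event, then forget it.
    void confirm(String eventId) {
        String payload = pending.get(eventId);
        if (payload != null && publish.test(payload)) {
            pending.remove(eventId);
        }
    }

    // After the business rolls back: delete the event.
    void cancel(String eventId) { pending.remove(eventId); }

    // Periodic job for events whose confirm/cancel notification never arrived.
    void checkPending() {
        for (String eventId : new ArrayList<>(pending.keySet())) {
            TxStatus status = queryBusiness.apply(eventId);
            if (status == TxStatus.COMMITTED) {
                confirm(eventId);
            } else if (status == TxStatus.ROLLED_BACK) {
                cancel(eventId);
            }
            // UNKNOWN: the business transaction is still running; ask again next time
        }
    }

    int pendingCount() { return pending.size(); }
}
```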

An external event service decouples the business system from the event system, but it also brings extra work. Compared with a local event service, it needs two additional network round trips: one before the commit, and one after the commit or rollback. The business system must also provide a separate query interface so that the event system can determine the status of unsent events.

3.1.2.3 Considerations for Reliable Event Notification Mode

There are two things to note about the reliable event notification pattern: 1. correct delivery of events; 2. repeated consumption of events.

An asynchronous message service can ensure that events are delivered correctly, but an event may be delivered more than once. Consumers must therefore ensure that the same event is not processed twice. In short, event consumption must be idempotent.

Some events are idempotent state events, such as order status notifications (ordered, paid, shipped, etc.), and for these the order of events must be determined. This is usually done by timestamp: once a newer message has been consumed, an older message that arrives later is discarded without being consumed. If a global timestamp is not available, consider a globally unified sequence number instead.
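Here is a minimal sketch of discarding stale state events. It assumes each event carries a per-order sequence number; a timestamp would work the same way if the clocks can be trusted. `OrderStatusConsumer` and its method are hypothetical names, and the state is kept in memory for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class OrderStatusConsumer {
    private final Map<String, Long> lastSeq = new HashMap<>(); // orderId -> newest sequence applied
    final Map<String, String> status = new HashMap<>();        // orderId -> current status

    /** Applies the event if it is newer than anything seen; returns false if it was discarded. */
    boolean onEvent(String orderId, String newStatus, long seq) {
        Long last = lastSeq.get(orderId);
        if (last != null && seq <= last) {
            return false; // older (or duplicate) event: discard without consuming
        }
        lastSeq.put(orderId, seq);
        status.put(orderId, newStatus);
        return true;
    }
}
```

For example, "paid" (sequence 2) might arrive before "ordered" (sequence 1). The later-arriving "ordered" event is then ignored, and the order stays "paid".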

Events that are not idempotent are generally action events, for example "deduct 100" or "deposit 200". For these, the event ID and the processing result should be persisted. Before consuming an event, look up its ID. If the event has already been consumed, return the stored result directly. If it is new, process it and store the result.
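The lookup-before-consume rule for action events could look like this. `IdempotentAccountConsumer` is a hypothetical consumer that applies "deduct" or "deposit" events to a balance and stores each event's result under its ID. In a real service, the processed-event record and the balance update would have to be written in the same local transaction; the `synchronized` method only imitates that.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentAccountConsumer {
    private final Map<String, Long> processed = new HashMap<>(); // eventId -> stored result (balance after)
    long balance;

    IdempotentAccountConsumer(long initialBalance) { this.balance = initialBalance; }

    /** delta < 0 deducts, delta > 0 deposits; returns the balance after this event. */
    synchronized long consume(String eventId, long delta) {
        Long previous = processed.get(eventId);
        if (previous != null) {
            return previous; // already consumed: return the stored result, do not apply again
        }
        balance += delta;
        processed.put(eventId, balance); // persist event ID and result together with the update
        return balance;
    }
}
```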

3.2 Best effort notification mode

The best-effort notification pattern is much easier to understand than the reliable event notification pattern. After the business service commits its transaction, it sends the message a limited number of times, up to a configured maximum, for example three times. If all three attempts fail, the message is not sent again, so messages can be lost. The master service therefore needs to provide a query interface that the slave service can use to recover lost messages. The pattern offers weak timeliness guarantees (the soft state may last a long time). It is therefore unsuitable for systems with strict timeliness requirements for data consistency. It is typically used for notifications across business platforms or to third-party services, such as bank notifications and merchant notifications, and is not expanded on here.
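Below is a sketch of the limited-retry behaviour plus the query interface, with hypothetical names throughout. `send` stands in for whatever transport delivers the notification (for example an HTTP callback), and `maxAttempts` corresponds to the "three times" in the text. The sketch leaves out any waiting between attempts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class BestEffortNotifier {
    private final int maxAttempts;                               // e.g. 3
    private final Predicate<String> send;                        // hypothetical; true = receiver acknowledged
    private final Map<String, String> results = new HashMap<>(); // master-side record of each result

    BestEffortNotifier(int maxAttempts, Predicate<String> send) {
        this.maxAttempts = maxAttempts;
        this.send = send;
    }

    /** Called after the business transaction commits; returns false if every attempt failed. */
    boolean notifyResult(String txId, String result) {
        results.put(txId, result);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (send.test(txId + ":" + result)) {
                return true;
            }
            // a real notifier would wait between attempts (often with growing intervals)
        }
        return false; // give up: the notification is lost unless the receiver queries
    }

    // Query interface the receiving (slave) service calls to recover a lost notification.
    String queryResult(String txId) { return results.get(txId); }
}
```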

3.3 Business Compensation Model

The biggest difference between the compensation pattern and the event notification pattern concerns the upstream service. In the compensation pattern, it depends on the result of the downstream service; in the event notification pattern, it does not. The business compensation pattern is a pure compensation pattern. Business operations are committed normally when invoked, and when a service fails, all the upstream services it depends on perform compensating operations. For example, Xiaoming is travelling from Hangzhou to New York on business. He needs a train ticket from Hangzhou to Shanghai and a plane ticket from Shanghai to New York. Suppose he buys the train ticket and then finds that the plane tickets for that day are sold out. Rather than stay in Shanghai for an extra day, he may prefer to cancel the train ticket and instead fly to Beijing and transfer there to New York. So he cancels the train ticket to Shanghai. Here, buying the train ticket is service a, and buying the plane ticket is service b. When service b fails, the business compensation pattern compensates service a by cancelling the train ticket from Hangzhou to Shanghai.
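The train-and-plane example can be reduced to a small compensating flow. Each step commits normally, and when one fails, the steps that already succeeded are compensated. `Step` and `CompensatingFlow` are hypothetical helpers, not part of any framework. Compensating in reverse order is a design choice of this sketch. The compensation only marks the ticket as cancelled instead of deleting it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

// One business step plus the compensating action that undoes its effect.
class Step {
    final BooleanSupplier action; // commits the step normally; returns false on failure
    final Runnable compensation;  // undoes the step's business effect (e.g. marks a ticket cancelled)

    Step(BooleanSupplier action, Runnable compensation) {
        this.action = action;
        this.compensation = compensation;
    }
}

class CompensatingFlow {
    /** Runs the steps in order; on the first failure, compensates finished steps in reverse order. */
    static boolean run(Step... steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (!step.action.getAsBoolean()) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run(); // most recent step first
                }
                return false;
            }
            completed.push(step);
        }
        return true;
    }
}
```

In Xiaoming's case, the first step books the Hangzhou-Shanghai train and the second tries the Shanghai-New York flight. When the flight step returns false, the train ticket's compensation runs and sets its status to "cancelled".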

The compensation pattern requires every service to provide a compensation interface, and the compensation is generally incomplete. Even after compensation, the cancelled train ticket record still exists in the database and can be traced. It is usually marked by a status field set to "cancelled". After all, online data that has been committed generally cannot be physically deleted.

The biggest disadvantage of the business compensation pattern is its relatively long soft-state period, that is, the poor timeliness of its data consistency. Multiple services may often be in an inconsistent state.

3.4 TCC (Try-Confirm-Cancel) mode

TCC mode is an optimized business compensation mode that can achieve complete compensation: after compensation, no trace of the compensated operation remains, as if nothing had happened. The soft-state period of TCC is also very short, because TCC is a two-phase pattern (see 1.2.1 for the two-phase concept). The second-phase Confirm operation runs only if the first phase (Try) succeeds in all services; otherwise, the compensating Cancel operation runs. No real business processing is performed in the Try phase.

[Figure: TCC mode]

The TCC process is divided into two stages:

1. Try: business services complete all business checks and reserve the necessary business resources.

2. Confirm/Cancel: if Try succeeds in all services, the Confirm operation is executed. Confirm performs no business checks, because Try has already done them. It only uses the business resources reserved in the Try phase to carry out the business. Otherwise, the Cancel operation is executed, which releases the business resources reserved in the Try phase.
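The two stages can be sketched as a tiny coordinator. `TccParticipant` and `TccCoordinator` are hypothetical. Real TCC frameworks also persist a transaction log and retry Confirm and Cancel until they succeed, so both operations must be idempotent. This simplified version cancels only the participants whose Try succeeded.

```java
import java.util.List;

// Hypothetical TCC participant contract, for illustration only.
interface TccParticipant {
    boolean tryReserve(); // Try: business checks + reserve resources; false = failed
    void confirm();       // Confirm: use the reserved resources, no business checks
    void cancel();        // Cancel: release the reserved resources
}

class TccCoordinator {
    /** Returns true if every participant was confirmed, false if the transaction was cancelled. */
    static boolean execute(List<TccParticipant> participants) {
        // Phase 1: Try on every participant, stopping at the first failure.
        int reserved = 0;
        for (TccParticipant p : participants) {
            if (!p.tryReserve()) {
                break;
            }
            reserved++;
        }
        // Phase 2a: all Try calls succeeded, so Confirm everywhere.
        if (reserved == participants.size()) {
            for (TccParticipant p : participants) {
                p.confirm();
            }
            return true;
        }
        // Phase 2b: release what the successful Try calls reserved.
        for (int i = 0; i < reserved; i++) {
            participants.get(i).cancel();
        }
        return false;
    }
}
```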

This may still sound abstract, so here is a concrete example. Xiaoming transfers RMB 100 online from his China Merchants Bank account to his Guangfa Bank account. This operation can be regarded as two services. Service a deducts 100 yuan from Xiaoming's China Merchants Bank account, and service b adds 100 yuan to his Guangfa Bank account.

Service a (deducts RMB 100 from Xiaoming's China Merchants Bank account):

try: update cmb_account set balance=balance-100, freeze=freeze+100 where acc_id=1 and balance>=100;

confirm: update cmb_account set freeze=freeze-100 where acc_id=1;

cancel: update cmb_account set balance=balance+100, freeze=freeze-100 where acc_id=1;

Service b (adds RMB 100 to Xiaoming's Guangfa Bank account):

try: update cgb_account set freeze=freeze+100 where acc_id=1;

confirm: update cgb_account set balance=balance+100, freeze=freeze-100 where acc_id=1;

cancel: update cgb_account set freeze=freeze-100 where acc_id=1;

Details:

In the try phase of a, the service does two things: 1. a business check, here verifying that Xiaoming's account holds at least 100 yuan; 2. reserving resources, moving 100 yuan from the balance to frozen funds.

In the confirm phase of a, no business check is performed, because the try phase already did it. Since the transfer has succeeded, the frozen funds are deducted.

In the cancel phase of a, the reserved resources are released: the 100 yuan of frozen funds is returned to the balance.

In the try phase of b, resources are reserved: 100 yuan is added to the frozen funds.

In the confirm phase of b, the resources reserved in the try phase are used: the 100 yuan of frozen funds is moved into the balance.

In the cancel phase of b, the resources reserved in the try phase are released: 100 yuan is subtracted from the frozen funds.
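To check the arithmetic of the example, the six SQL statements can be mirrored on an in-memory account. `Account`, `CmbService` and `CgbService` are hypothetical classes whose fields follow the `balance` and `freeze` columns. Each method applies the same change as the corresponding `update` statement, with the balance check written as "at least 100". It does not have the atomicity a database would provide.

```java
// In-memory stand-in for one row of cmb_account / cgb_account.
class Account {
    long balance;
    long freeze;

    Account(long balance) { this.balance = balance; }
}

// Service a: China Merchants Bank side, money leaves this account.
class CmbService {
    final Account acc;

    CmbService(Account acc) { this.acc = acc; }

    boolean tryTransferOut(long amount) {
        if (acc.balance < amount) {
            return false;           // business check failed: nothing reserved
        }
        acc.balance -= amount;      // reserve: move the amount into frozen funds
        acc.freeze += amount;
        return true;
    }

    void confirm(long amount) { acc.freeze -= amount; }                         // transfer done: drop frozen funds
    void cancel(long amount) { acc.balance += amount; acc.freeze -= amount; }   // return frozen funds to balance
}

// Service b: Guangfa Bank side, money arrives in this account.
class CgbService {
    final Account acc;

    CgbService(Account acc) { this.acc = acc; }

    boolean tryTransferIn(long amount) { acc.freeze += amount; return true; }   // reserve incoming funds
    void confirm(long amount) { acc.balance += amount; acc.freeze -= amount; }  // frozen funds become balance
    void cancel(long amount) { acc.freeze -= amount; }                          // release the reservation
}
```

Start with 500 yuan in the CMB account and 0 in the CGB account. Try on both sides leaves 400 available with 100 frozen on the CMB side, and 0 available with 100 frozen on the CGB side. Confirm then gives 400 and 100 with nothing frozen, while Cancel restores 500 and 0.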

As this simple example shows, the TCC pattern is more complex than the pure business compensation pattern. In terms of implementation, each service must provide two additional interfaces, Confirm and Cancel, alongside Try.

3.5 Summary

The following table compares these four commonly used patterns:
