What are the methods to ensure the data consistency of the server distributed system 07/09 Update SLTechnology News&Howtos

What are the methods to ensure the data consistency of the server distributed system

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "what are the methods to ensure the data consistency of the server distributed system". The content of the explanation in this article is simple and clear, and it is easy to learn and understand. let's study and learn "what are the methods to ensure the data consistency of the server distributed system?"

Six schemes to ensure data consistency in distributed system (reprint)

The origin of the problem

In e-commerce and other services, the system is generally composed of multiple independent services, how to solve the data consistency in distributed invocation?

The specific business scenarios are as follows, for example, for a business operation, if services A, B, and C are called at the same time, either succeed or fail at the same time. A, B, C may be remote services developed by different departments and deployed on different servers.

In distributed systems, if we don't want to sacrifice consistency, CAP theory tells us that we have to give up usability, which is obviously unacceptable. In order to discuss the problem, let's briefly introduce the basic theory of data consistency.

Strong consistency

When the update operation is complete, the access of any subsequent processes or threads returns the latest updated value. This is the most user-friendly, that is, what the user wrote last time, what will be guaranteed to read next time. According to CAP theory, this implementation requires the sacrifice of usability.

Weak consistency

The system does not guarantee that access to continued processes or threads will return the latest updated values. After the data is successfully written, the system does not promise to read the latest written value immediately, nor does it promise how long it will be read.

Final consistency

A specific form of weak consistency. The system ensures that without subsequent updates, the system will eventually return the value of the last update operation. Under the premise of no failure, the time of inconsistent windows is mainly affected by communication delay, system load and the number of replicas. DNS is a typical final consistency system.

In engineering practice, in order to ensure the availability of the system, Internet systems mostly convert strong consistency requirements into final consistency requirements, and ensure the final consistency of data through the implementation of idempotent guarantee. However, in e-commerce and other scenarios, there are some differences between the solutions to data consistency and common Internet systems (such as MySQL master-slave synchronization). The group discussion is divided into the following six solutions.

1. Evading distributed transactions-- Business Integration

The business integration solution mainly adopts the method of integrating the interface into local execution. Taking the problem scenario as an example, service A, B, and C can be integrated into a service D to the business, which is then transformed into a local transaction. For example, service D contains local services and service E, and service E is the integration of local services A ~ C.

Pros: solves (circumvents) distributed transactions.

Disadvantages: it is obvious that the originally planned split of the business, and coupled together, business responsibilities are not clear, not conducive to maintenance.

Because of the obvious shortcomings of this method, it is usually not recommended.

two。 Classic Scheme-eBay Mode

The core of this scheme is to asynchronously execute tasks that require distributed processing through message logs. Message logs can be stored in local text, database, or message queues, and retried automatically or manually through business rules. Manual retry is more applied to the payment scenario, through the reconciliation system to deal with post-event problems.

The core of the message logging scheme is to ensure the idempotency of the service interface.

Considering the failure of network communication, data packet loss and other reasons, if the interface can not guarantee idempotency, the uniqueness of data will be difficult to guarantee.

The main ideas of eBay are as follows.

Base: an alternative to Acid

This scenario, published to ACM by Dan Pritchett, an architect of eBay, in 2008, is a classic article that explains BASE principles, or ultimate consistency. This paper discusses the basic differences between BASE and ACID principles in ensuring data consistency.

If ACID provides consistent choices for partitioned databases, how do you achieve availability? The answer is

BASE (basically available, soft state, eventually consistent)

The availability of BASE is achieved by supporting local failures rather than global system failures. Here's a simple example: if users are partitioned on five database servers, the BASE design encourages a similar approach, where a user database failure affects only the 20% of users on this particular host. There is no magic involved here, but it does lead to higher perceived system availability.

One of the most common scenarios described in this article is that if a transaction is generated, you need to add a record to the transaction table and modify the amount of the user table. These two tables belong to different remote services, so it involves the issue of distributed transaction consistency.

In this paper, a classical solution is proposed, which completes the main modification operation and the message of updating the user table in a local transaction. At the same time, in order to avoid the problems caused by repeated consumption of user table messages and achieve the idempotency of multiple retries, an update record table updates_applied is added to record the messages that have been processed.

The execution pseudo code of the system is as follows

Based on the above method, in the first stage, through the transaction guarantee of the local database, the transaction table and message queue are added.

In the second stage, the message queue is read out (but not deleted), and the relevant records are detected by updating the record table updates_applied. The records that are not executed will modify the user table, then add an operation record to updates_applied, and then delete the queue after the transaction is executed successfully.

Through the above methods, the final consistency of the distributed system is achieved. For more information about eBay's solution, please refer to the link at the end of the article.

3. Distributed transaction Scheme of Qunar Network

With the continuous expansion of the scale of business, e-commerce websites are generally faced with the road of split. Is to split the original single application into multiple subsystems with different responsibilities. For example, in the past, the functions for users, customers and operations may be put into one system, but now it is divided into order center, agent management, operation system, quotation center, inventory management and other subsystems.

What is the first thing you have to face in splitting?

The original monomer applications have all the functions together, and the storage is also together. For example, if the operator wants to cancel an order, then directly update the status of the order table, and then update the inventory table on ok. Because it is a single application, the library together, these can be in a transaction, by the relational database to ensure consistency.

But it is different after the split, and different subsystems have their own storage. For example, the order center only manages its own order library, while inventory management also has its own library. Then when the operation system cancels the order, it invokes the service of the order center and inventory management through interface calls, instead of directly operating the library. This involves a "distributed transaction" problem.

There are two ways to solve distributed transactions

1. Asynchronous messages are preferred.

As mentioned above, using the asynchronous message Consumer side requires idempotency.

There are two ways of idempotency, one is that business logic guarantees idempotence. For example, if you receive a message that payment is successful, the order status becomes payment completion. If the current status is payment completion, then receiving another payment success message means that the message is repeated and is processed successfully as a message.

On the other hand, if the business logic cannot guarantee idempotence, add a de-duplicating table or similar implementation. For the producer side to put a message library on the same instance of the business database, sending messages and business operations are in the same local transaction. When sending a message, the message is not sent immediately, but a message record is inserted into the message library, and then the message is sent asynchronously when the transaction is committed. If the message is sent successfully, the message in the message library will be deleted. If you encounter a message queuing service exception or network problem, the message will stay here, and another service will continue to scan these messages out and resend them.

two。 Some businesses are not suitable for asynchronous messages, and all participants in the transaction need to get the results synchronously. The implementation of this situation is similar to the above, with a transaction record library on top of the same instance of each participant's local business library.

For example, A calls BMageC synchronously. A updates the local transaction record status when the local transaction is successful, as does B and C. If there is a failure of A to call B, the failure may be that B really failed, or it may be that the call timed out and the actual B succeeded. A central service compares the transaction records of the three parties and makes a final decision. Suppose the transaction records of the three parties are A success, B failure, and C success. Then there are two ways to make a final decision, depending on the specific scenario:

Retry B until B succeeds, and information such as call parameters is recorded in the transaction record table.

Perform compensation operations for An and B (one feasible way of compensation is to roll back).

Make a special description for scenario b: for example, B is the withholding inventory service, which fails for some reason when it is called for the first time, but when it is retried, the inventory has become 0 and cannot be retried successfully. At this time, only An and C can be rolled back.

Then some people may think that putting a message library or transaction record database in the same instance of the business database will invade the business, and the business should also care about this library, is it a reasonable design?

In fact, we can rely on the means of operation and maintenance to simplify the intrusion of development. Our method is to let DBA preinitialize the library on all MySQL instances of the company, and operate the library transparently behind the back through the framework layer (message client or transaction RPC framework). Business developers only need to care about their own business logic and do not need to access the library directly.

To sum up, the fundamental principles of the two approaches are similar, that is, to convert a distributed transaction into multiple local transactions, and then rely on ways such as retry to achieve final consistency.

4. Mogujie's distributed consistency Scheme in the process of transaction creation

General process for deal creation

We abstract the transaction creation process into a series of scalable function points, each of which can have multiple implementations (there is a combination / mutual exclusion relationship between specific implementations). The process of transaction creation is completed by stringing each function point together according to a certain process.

The problems faced by

The implementation of each function point may rely on external services. So how do you ensure that the data is consistent across services? For example, if the call to the locked coupon service timed out, what should I do if I am not sure whether the coupon has been locked successfully? For example, if the coupon lock is successful, but the deduction of inventory fails, how to deal with it?

Scheme selection

Excessive dependence on services will lead to increased management complexity and stability risks. Just imagine if we rely on 10 services, 9 of them are successful, and the last one fails, will the first 9 be rolled back? The cost is still very high.

Therefore, under the premise of splitting a large process into multiple small local transactions, for non-real-time, non-strong consistency related business writing, after the successful execution of the local transaction, we choose the scheme of sending message notification and asynchronized execution of related transactions.

Message notification often does not guarantee 100% success, and it is unknown whether the receiver's business will be able to execute successfully after message notification. The former problem can be solved by retry; the latter can be guaranteed by using transaction messages.

However, the transaction message framework itself will bring intrusiveness and complexity to the business code, so we choose to decouple the system based on DB event change notification to MQ. Through the ACK mechanism when subscribers consume MQ messages, we ensure that the messages must be consumed successfully and achieve the final consistency. Because the message may be resent, the business logic processing of the message subscriber should be idempotent.

So at present, there are only business scenarios that require real-time synchronization and strong consistency requirements. During the process of transaction creation, coupon locking and inventory deduction are two typical scenarios.

To ensure data consistency among multiple systems, at first glance, a distributed transaction framework must be introduced to solve the problem. However, the introduction of a very heavy similar two-phase commit distributed transaction framework will bring a sharp increase in complexity; in the field of e-commerce, absolute strong consistency is too idealized, we can choose quasi-real-time final consistency.

In the transaction creation process, we first create an invisible order, and then send a waste order message to MQ in response to the call exception (failure or timeout) when locking the coupon and deducting inventory synchronously. If the message fails, the local will do a time-ladder asynchronous retry; after receiving the message, the coupon system and inventory system will determine whether a business rollback is needed, thus ensuring the final consistency of multiple local transactions in quasi-real-time.

5. Distributed Service DTS Scheme of Alipay and Ant Financial Cloud

There is also a xts scheme of Alipay commonly used in the industry, which is improved by Alipay on the basis of 2PC. The main ideas are as follows, most of the information is quoted from the official website.

Brief introduction of distributed transaction Service

Distributed transaction service (Distributed Transaction Service, DTS) is a distributed transaction framework, which is used to ensure the final consistency of transactions in a large-scale distributed environment. From the architecture, DTS is divided into two parts: xts-client and xts-server. The former is a JAR package embedded in client applications, which is mainly responsible for writing and processing transaction data, while the latter is an independent system, which is mainly responsible for the recovery of abnormal transactions.

Core characteristics

The transaction model of traditional relational database must follow the ACID principle. In the single database mode, the ACID model can effectively protect the integrity of data, but in a large-scale distributed environment, a business often spans multiple databases. How to ensure the data consistency among these multiple databases requires other effective strategies. In the JavaEE specification, 2PC (2 Phase Commit, two-phase commit) is used to deal with transactions across DB environments, but 2PC is an anti-scalable pattern, that is, participants need to hold resources until the end of the entire distributed transaction. In this way, when the business scale reaches tens of millions of levels, the limitations of 2PC will become more and more obvious, and the system scalability will become very poor. Based on this, we adopt the idea of BASE to implement a set of distributed transaction scheme similar to 2PC, which is DTS. DTS not only fully ensures high availability and high reliability in distributed environment, but also takes into account the requirements of data consistency. Its most important feature is to ensure data consistency (Eventually consistent).

Simply put, the DTS framework has the following features:

Final consistency: there will be temporary inconsistencies in the transaction process, but by restoring the system, the transaction data can achieve the ultimate consistent goal.

The protocol is simple: DTS defines a standard two-phase interface similar to 2PC, and the business system only needs to implement the corresponding interface to use the transaction function of DTS.

Independent of RPC service protocol: in SOA architecture, one or more DB operations are often packaged as a Service,Service and Service to communicate through the RPC protocol. The DTS framework is built on the SOA architecture and is independent of the underlying protocols.

Independent of the underlying transaction implementation: DTS is an abstract concept based on the Service layer and has nothing to do with the underlying transaction implementation, that is to say, in the scope of DTS, whether it is a relational database MySQL,Oracle, or KV storage MemCache, or column storage database HBase, as long as the operation is packaged as an DTS participant, it can be accessed into the DTS transaction scope.

The following is the flow chart of the distributed transaction framework

Realize

A complete business activity consists of a master business service and several slave business services.

The main business service is responsible for initiating and completing the entire business activity.

Provide TCC-type business operations from business services.

The business activity manager controls the consistency of business activities, registers operations in business activities, confirms confirm operations for all two-phase transactions when the activity commits, and invokes cancel operations for all two-phase transactions when the business activity is cancelled. "

Compared with 2PC protocol

No separate Prepare phase to reduce the cost of the agreement

The system has high fault tolerance and simple recovery.

6. Data consistency Scheme of Rural Information Network

1. E-commerce business

The company's payment department provides payment services to business units by accessing other third-party payment systems. Payment service is a Dubbo-based RPC service.

For the business department, the order payment of the e-commerce department needs to be called

The payment interface of the payment platform to process the order

At the same time, we need to call the interface of the integration center and add points to the user according to the business rules.

From the business rules, it is necessary to ensure the real-time and consistency of business data at the same time, that is, points must be added to the success of payment.

Our approach is synchronous invocation, which first deals with the local transaction business. Considering that the integral business is relatively single and the impact of the business is lower than that of payment, the integration platform provides the interface for adding and withdrawing.

The specific process is to first call the points platform to increase user points, and then call the payment platform for payment processing. If the processing fails, the catch method calls the withdrawal method of the points platform to withdraw the points orders processed this time.

(click on the picture to zoom in full screen)

two。 User information change

The user information of the company is uniformly maintained by the user center, and the changes of user information need to be synchronized to each business subsystem, and the business subsystem deals with their respective business according to the changed content. User Center, as the producer of MQ, adds notifications to MQ. APP Server subscribes to the message, synchronizes local data information, and then handles related businesses such as APP withdrawal and offline.

We use asynchronous message notification mechanism. At present, we mainly use ActiveMQ, a subscription method based on Virtual Topic, to ensure a single consumption of subscriptions to a single business cluster.

Thank you for your reading. The above is the content of "what are the methods to ensure the data consistency of the server distributed system?" after the study of this article, I believe you have a deeper understanding of the methods to ensure the data consistency of the server distributed system, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.