How to use Seata Saga to design more flexible financial applications

2025-07-01 Update. From: SLTechnology News&Howtos

Seata (Simple Extensible Autonomous Transaction Architecture) is a one-stop distributed transaction solution that provides the AT, TCC, Saga, and XA transaction modes. This article looks at the Saga mode in Seata.

Pain points of distributed application development in finance

An obvious problem in distributed systems is that one business process has to combine a set of services. This is even more pronounced with microservices, because business consistency must still be guaranteed: if one step fails, either the service calls already made are rolled back, or the failed step is retried until every step succeeds.

In the financial sector, business processes under a microservice architecture tend to be more complex and much longer. For example, it is normal for an Internet micro-loan process to call more than a dozen services, and once exception handling is added the flow becomes even more complicated. Anyone who has developed financial business systems will have experienced this first-hand.

So we face the following pain points when developing distributed financial applications:

Business consistency is difficult to guarantee

In most of the business systems we work with (for example, systems in the channel layer, product layer, and integration layer), eventual consistency is usually ensured by "compensation". Without a coordinator to support it, this is hard to develop. Every step has to undo all the previous steps in its catch block, which produces "arrow-shaped" code with poor readability and maintainability. The alternative is to retry the failed operation; if the retry does not succeed, it has to be retried asynchronously or even handled manually. All of this puts a heavy burden on developers, lowers productivity, and is error-prone.
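To make the "arrow-shaped" code concrete, here is a minimal sketch of hand-written compensation without a coordinator. The step names (reserveInventory, deductBalance, createShipment) and the failAtStep switch are hypothetical and exist only for illustration; real steps would be remote service calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical three-step flow with hand-written compensation (no coordinator).
class ArrowShapedFlow {
    final List<String> log = new ArrayList<>();
    private final int failAtStep; // 0 = no failure; 1..3 = the step that throws

    ArrowShapedFlow(int failAtStep) {
        this.failAtStep = failAtStep;
    }

    // Simulates a service call that either succeeds (and is logged) or throws.
    private void step(int number, String name) {
        if (number == failAtStep) {
            throw new IllegalStateException(name + " failed");
        }
        log.add(name);
    }

    // Each nesting level must undo the step that succeeded just outside it.
    boolean execute() {
        try {
            step(1, "reserveInventory");
            try {
                step(2, "deductBalance");
                try {
                    step(3, "createShipment");
                    return true;
                } catch (RuntimeException e) {
                    log.add("cancelDeductBalance");
                    throw e;
                }
            } catch (RuntimeException e) {
                log.add("cancelReserveInventory");
                throw e;
            }
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ArrowShapedFlow flow = new ArrowShapedFlow(3);
        System.out.println(flow.execute() + " " + flow.log);
    }
}
```

With only three steps the nesting is already three levels deep, and every new step adds another level plus another compensation call that must be kept in the right order by hand.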

Business state is difficult to manage

There are many business entities, and each entity has many states. Typically the entity's state is simply written to the database after a business activity completes. With no state machine managing the state transitions, the process is not intuitive and is error-prone, and the business can end up in an incorrect state.

Idempotency is difficult to guarantee

Idempotent services are a basic requirement in a distributed environment. To guarantee idempotency, service developers usually have to design it service by service, using either a unique key in the database or a distributed cache, and there is no unified scheme. This is a heavy burden on developers and easy to overlook, which can lead to financial losses.

Business monitoring and operations are difficult, and there is no unified error-guard capability

Business execution is generally monitored by printing logs and inspecting them on a log monitoring platform. Most of the time this is fine, but when something goes wrong the logs lack the business context of that moment, which makes troubleshooting unfriendly; you often have to go back and check the database. Logging also depends on developers remembering to add it, so it is easy to miss. Compensating transactions often need "error guard triggered compensation" and "manually triggered compensation", yet there is no unified error-guard and handling specification, so developers have to build these one by one, which is a heavy burden.

Theoretical basis

In scenarios with a strong need for data consistency, we adopt a distributed transaction scheme that uses "two-phase commit" at the business tier. In other scenarios such strong consistency is not needed, and it is enough to guarantee eventual consistency.

For example, Ant Financial Services Group currently uses the TCC mode in its financial core systems, which are characterized by high consistency requirements (business isolation), short processes, and high concurrency.

Many business systems above the financial core (such as systems in the channel layer, product layer, and integration layer) are characterized by eventual consistency, many and long processes, and possibly calls to services of other companies (such as financial networks). In this situation it is expensive to develop the three methods Try, Confirm, and Cancel for every service. If the transaction includes services from other companies, it is impossible to require those companies to follow the TCC development model. In addition, long processes mean overly long transaction boundaries, which hurts performance.

We all know ACID for transactions, and we are also familiar with the CAP theorem, under which at most two of the three properties can be satisfied at once. To improve performance, BASE emerged as an alternative to ACID. ACID emphasizes consistency (the C in CAP), while BASE emphasizes availability (the A in CAP). In many cases strongly consistent ACID cannot be achieved, especially when a transaction spans multiple systems that are not all provided by one company. BASE aims at more resilient systems: for a short period, even at the risk of data being out of sync, new transactions are allowed to proceed, and transactions that turn out to be problematic are handled later through compensation so that eventual consistency is reached.

So in practice we make a trade-off: business systems above the financial core can use compensating transactions. The theory of compensating transactions, Saga, was proposed some 30 years ago, and with the rise of microservices it has attracted growing attention in recent years. The industry now generally accepts Saga as the solution for long-lived transactions.

https://github.com/aphyr/dist-sagas/blob/master/sagas.pdf

http://microservices.io/patterns/data/saga.html

Community and industry solutions

Apache Camel Saga

Camel is an open source product that implements EIP (Enterprise Integration Patterns). It is based on an event-driven architecture and has good performance and throughput. It added the Saga EIP in version 2.21.

The Saga EIP provides a way to define a series of related actions through Camel routes; the actions either all execute successfully or are all rolled back. Saga can coordinate distributed or local services over any communication protocol and achieve global eventual consistency. Saga does not require the whole process to finish in a short time, because it holds no database locks; it can support requests that take a long time to process, from a few seconds to a few days. Camel's Saga EIP is based on MicroProfile LRA (Long Running Action), which likewise supports coordinating distributed services implemented in any language over any communication protocol.

A Saga implementation does not lock data. Instead, it defines a "compensation operation" for each operation; when the normal flow fails, the compensation operations of the operations already performed are triggered to roll the process back. Compensation operations can be defined on a Camel route using the Java or XML DSL (Domain Specific Language).

Here is an example of Java DSL:

[Figure: Camel Saga Java DSL example]

XML DSL example:

Eventuate Tram Saga

The Eventuate Tram Saga framework is a Saga framework for Java microservices that use JDBC / JPA. Like Camel Saga, it uses a Java DSL to define compensation operations.

Apache ServiceComb Saga

ServiceComb Saga is also a data consistency solution for microservice applications. Compared with TCC, Saga commits the transaction directly in the try phase, and a later rollback is carried out by a reverse compensation operation. Unlike the previous two frameworks, it uses Java annotations plus interceptors to define the "compensation" service.
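The "annotation + interceptor" idea can be sketched with a JDK dynamic proxy. This is not the ServiceComb API: the @CompensatedBy annotation, the PaymentService interface, and the intercept helper are all hypothetical names invented for this illustration. The proxy records a compensation for every annotated call that succeeds, so the caller can later run them in reverse order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical annotation (not a real framework API): names the method that undoes the annotated one.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CompensatedBy {
    String value();
}

// Hypothetical participant service used only for this sketch.
interface PaymentService {
    @CompensatedBy("refund")
    void pay(String orderId);

    void refund(String orderId);
}

class AnnotationSagaDemo {
    // Wraps a service in a proxy that pushes a compensation for each successful annotated call.
    static <T> T intercept(Class<T> api, T target, Deque<Runnable> compensations) {
        Object proxy = Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
            (p, method, args) -> {
                Object result;
                try {
                    result = method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the service's own exception
                }
                CompensatedBy annotation = method.getAnnotation(CompensatedBy.class);
                if (annotation != null) {
                    // The compensation method is assumed to take the same parameters.
                    Method undo = api.getMethod(annotation.value(), method.getParameterTypes());
                    compensations.push(() -> {
                        try {
                            undo.invoke(target, args);
                        } catch (ReflectiveOperationException e) {
                            throw new IllegalStateException(e);
                        }
                    });
                }
                return result;
            });
        return api.cast(proxy);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        PaymentService real = new PaymentService() {
            public void pay(String orderId) { log.add("pay " + orderId); }
            public void refund(String orderId) { log.add("refund " + orderId); }
        };
        Deque<Runnable> compensations = new ArrayDeque<>();
        PaymentService service = intercept(PaymentService.class, real, compensations);
        service.pay("order-1");
        // Pretend a later step failed: run recorded compensations, most recent first.
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
        System.out.println(log);
    }
}
```

Note that the compensation stack lives in memory only. That is the weakness the comparison later in the article points out: after a crash, this context is gone.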

Architecture

ServiceComb Saga is composed of alpha and omega, where:

Alpha acts as the coordinator and is mainly responsible for managing and coordinating transactions.

Omega is an agent embedded in each microservice; it intercepts network requests and reports transaction events to alpha.

The following figure shows the relationship between alpha, omega, and the microservices:

Usage example

Ant Financial Services Group's practice

Ant Financial Services Group uses TCC-mode distributed transactions on a large scale, mainly in scenarios such as the financial core that require high consistency and high performance. In higher-level business systems, where processes are numerous and long and TCC is relatively expensive to develop, most teams make a trade-off and use the Saga mode to achieve business consistency. For historical reasons, different business units (BUs) have their own "compensation" transaction solutions, which fall into two basic kinds:

The first: when a service may need to be "retried" or "compensated" on failure, a record is inserted into the database before the service is executed to record its state. When an exception occurs, the record is queried and the service is "retried" or "compensated". When the business process succeeds, the record is deleted.

The second: a state machine engine and a simple DSL are designed to orchestrate the business process and record business state. The state machine engine lets you define a "compensation service"; when an exception occurs, the engine invokes the compensation services in reverse order to roll back. There is also an "error guard" platform that monitors business flows whose execution or compensation has failed and keeps "compensating" or "retrying" them.

Scheme comparison

There are generally two kinds of solution in the community and in industry. One is based on a state machine or process engine, with the process and its compensations orchestrated through a DSL. The other uses Java annotations plus interceptors to implement compensation. What are the advantages and disadvantages of each?

State machine + DSL

Advantages:

1. Business processes can be defined with visual tools; they are standardized and highly readable, and service orchestration can be implemented.

2. Improves communication between business analysts and developers.

3. Business state management: a process is essentially a state machine, so it reflects the flow of business state well.

4. More flexible exception handling: "forward retry" or "backward compensation" can be performed after recovery from a crash.

5. Can naturally be executed by asynchronous processing engines such as the Actor model or the SEDA architecture, increasing overall throughput.

Disadvantages:

1. A business process actually consists of Java code plus DSL configuration; because code and configuration are separate, development is more cumbersome.

2. Retrofitting an existing business is highly intrusive.

3. The engine is expensive to implement.

Interceptor + Java annotations

Advantages:

1. Code and annotations live together, so development is simple and the learning cost is low.

2. Easy to plug into existing business code.

3. Based on dynamic-proxy interceptors, so the framework is cheap to implement.

Disadvantages:

1. The framework cannot offer asynchronous processing models such as the Actor model or the SEDA architecture to improve system throughput.

2. The framework cannot provide business state management.

3. "Forward retry" after crash recovery is hard to implement, because the thread context cannot be restored.

Seata Saga scheme

For a brief introduction to Seata Saga, see the Seata Saga documentation on the official website.

Seata Saga adopts the state machine + DSL scheme for the following reasons:

The state machine + DSL scheme is more widely used in real production systems.

It can be executed by an asynchronous processing engine such as the Actor model or the SEDA architecture to improve overall throughput.

Business systems above the core systems usually come with a "service orchestration" requirement, and service orchestration in turn requires eventual consistency of transactions, so the two are hard to separate. The state machine + DSL scheme meets both requirements at once.

Because Saga mode does not, in theory, guarantee isolation, in extreme cases a rollback may be impossible to complete because of dirty writes. For example, suppose a distributed transaction first recharges user A and then deducts from user B's balance. If A is recharged successfully and then spends the money before the transaction commits, the recharge cannot be compensated when the transaction rolls back. Some business scenarios allow the business to succeed eventually: when rollback is impossible, the remaining steps can be retried until they complete. The state machine + DSL scheme can restore the context and continue "forward", so that the business eventually succeeds and eventual consistency is achieved.

When isolation is not guaranteed, business process design should follow the principle of "prefer a surplus to a shortfall" (long money rather than short money). A surplus means the customer ends up with less money and the institution with more; the institution, backed by its credit, can refund the customer. A shortfall is the opposite, and the missing money may never be recovered. Therefore the business process must be designed to deduct first.
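The difference between the two orderings can be shown with a toy in-memory ledger. Everything here (the class, the account names "A" and "B", the amounts) is made up for illustration; the point is only that compensating a credit is a debit, which can fail once the money is spent, while compensating a debit is a refund, which cannot fail for lack of funds.

```java
import java.util.HashMap;
import java.util.Map;

// Toy ledger used to compare "credit first" with "deduct first".
class DeductFirstDemo {
    final Map<String, Long> balances = new HashMap<>();

    // Returns false when the account does not hold enough money.
    boolean debit(String account, long amount) {
        long current = balances.getOrDefault(account, 0L);
        if (current < amount) {
            return false;
        }
        balances.put(account, current - amount);
        return true;
    }

    void credit(String account, long amount) {
        balances.merge(account, amount, Long::sum);
    }

    // Credit-first ordering: A is recharged, spends the money, then the saga rolls back.
    static boolean creditFirstRollbackSucceeds() {
        DeductFirstDemo ledger = new DeductFirstDemo();
        ledger.balances.put("B", 100L);
        ledger.credit("A", 100);       // step 1: recharge A
        ledger.debit("A", 100);        // A spends the money before the saga finishes
        // Assume step 2 (deduct from B) fails, so step 1 must be compensated.
        return ledger.debit("A", 100); // compensation fails: A has nothing left
    }

    // Deduct-first ordering: B is debited first, so the compensation is a plain refund.
    static boolean deductFirstRollbackSucceeds() {
        DeductFirstDemo ledger = new DeductFirstDemo();
        ledger.balances.put("B", 100L);
        boolean deducted = ledger.debit("B", 100); // step 1: deduct from B
        // Assume step 2 (recharge A) fails, so step 1 must be compensated.
        ledger.credit("B", 100);                   // a refund cannot run short of funds
        return deducted && ledger.balances.get("B") == 100L;
    }

    public static void main(String[] args) {
        System.out.println("credit first, rollback ok: " + creditFirstRollbackSucceeds());
        System.out.println("deduct first, rollback ok: " + deductFirstRollbackSucceeds());
    }
}
```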

State definition language (Seata State Language)

The service invocation flow is defined as a state diagram, from which a JSON state language definition file is generated.

A node in the state diagram can invoke a service, and a compensation node can be configured for it.

The state diagram JSON is executed by the state machine engine. When an exception occurs, the engine runs, in reverse order, the compensation nodes of the nodes that have already succeeded, rolling the transaction back.

Note: whether to compensate when an exception occurs can also be decided by the user.

It can meet service orchestration needs and supports single-choice routing, concurrency, asynchronous execution, sub-state machines, parameter transformation, parameter mapping, service execution status evaluation, exception catching, and so on.
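The reverse-compensation behaviour described above can be sketched as a very small engine. This is a toy, not Seata's implementation: the class MiniSagaEngine and its methods are hypothetical, states are registered in code rather than loaded from JSON, and it always compensates on failure (whereas the note above says the user can decide).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Toy engine: walks states via "next" and compensates succeeded states in reverse on failure.
class MiniSagaEngine {
    static final class State {
        final BooleanSupplier action; // returns true on success
        final Runnable compensation;  // may be null when the state has no compensation node
        final String next;            // null marks the last state

        State(BooleanSupplier action, Runnable compensation, String next) {
            this.action = action;
            this.compensation = compensation;
            this.next = next;
        }
    }

    private final Map<String, State> states = new HashMap<>();

    MiniSagaEngine state(String name, BooleanSupplier action, Runnable compensation, String next) {
        states.put(name, new State(action, compensation, next));
        return this;
    }

    // Returns true if every state succeeded, false if the flow was rolled back.
    boolean run(String startState) {
        Deque<State> succeeded = new ArrayDeque<>();
        String current = startState;
        while (current != null) {
            State state = states.get(current);
            if (!state.action.getAsBoolean()) {
                while (!succeeded.isEmpty()) {
                    State done = succeeded.pop(); // most recently succeeded first
                    if (done.compensation != null) {
                        done.compensation.run();
                    }
                }
                return false;
            }
            succeeded.push(state);
            current = state.next;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MiniSagaEngine engine = new MiniSagaEngine()
            .state("ReduceInventory", () -> log.add("reduceInventory"),
                   () -> log.add("compensateReduceInventory"), "ReduceBalance")
            .state("ReduceBalance", () -> false, // simulate a failed balance deduction
                   () -> log.add("compensateReduceBalance"), null);
        System.out.println(engine.run("ReduceInventory") + " " + log);
    }
}
```

Only states that actually succeeded are compensated, so in the demo the failed ReduceBalance state is not compensated while ReduceInventory is.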

Suppose a business process calls two services: first inventory deduction (InventoryService), then balance deduction (BalanceService). In a distributed system, the two must either succeed together or be rolled back together. Both participant services have a reduce method, which deducts inventory or balance, and a compensateReduce method, which compensates the deduction. Take InventoryService as an example of the interface definition.
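The original interface listing is not preserved here, so the following is only a sketch of what such a participant could look like. The method names reduce and compensateReduce come from the text; the parameter lists, the boolean return type, and the in-memory implementation are assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Sketch of a Saga participant; the signatures are assumed, only the method names come from the article.
interface InventoryService {
    /** Deducts the given amount for this business key; returns true on success. */
    boolean reduce(String businessKey, BigDecimal amount);

    /** Undoes an earlier reduce for the same business key; returns true on success. */
    boolean compensateReduce(String businessKey);
}

// Hypothetical in-memory implementation, useful only for trying the interface out.
class InMemoryInventoryService implements InventoryService {
    private BigDecimal stock;
    private final Map<String, BigDecimal> deductions = new HashMap<>();

    InMemoryInventoryService(BigDecimal initialStock) {
        this.stock = initialStock;
    }

    @Override
    public boolean reduce(String businessKey, BigDecimal amount) {
        if (stock.compareTo(amount) < 0) {
            return false; // not enough stock
        }
        stock = stock.subtract(amount);
        deductions.put(businessKey, amount); // remember what to give back on compensation
        return true;
    }

    @Override
    public boolean compensateReduce(String businessKey) {
        BigDecimal amount = deductions.remove(businessKey);
        if (amount != null) {
            stock = stock.add(amount);
        }
        return true;
    }

    BigDecimal stock() {
        return stock;
    }

    public static void main(String[] args) {
        InMemoryInventoryService service = new InMemoryInventoryService(new BigDecimal("10"));
        service.reduce("order-1", new BigDecimal("3"));
        service.compensateReduce("order-1");
        System.out.println(service.stock());
    }
}
```

The business key is what ties a compensation to the deduction it undoes, which is why both methods take it.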

The state diagram corresponding to this business process:

Corresponding JSON:

The state language draws on AWS Step Functions to some extent.

Attributes of the "state machine"

Name: the name of the state machine, which must be unique

Comment: description of the state machine

Version: version of the state machine definition

StartState: the first "state" to run at startup

States: the list of states, as a map. Each key is the name of a "state" and must be unique within the state machine.

Attributes of a "state"

Type: the type of the "state", for example:

ServiceTask: performs a service invocation task

Choice: single-choice conditional routing

CompensationTrigger: triggers the compensation process

Succeed: the state machine ends normally

Fail: the state machine ends abnormally

SubStateMachine: invokes a sub-state machine

ServiceName: the service name, usually the beanId of the service

ServiceMethod: the name of the service method

CompensateState: the compensation "state" for this "state"

Input: the input parameter list for the service call. It is an array that corresponds to the parameter list of the service method. A value beginning with $. is an expression that takes the parameter from the state machine context; the expression language is SpringEL. Constants can be written directly.

Output: assigns values returned by the service to the state machine context. It is a map. Each key is the key under which the value is put into the state machine context (the context is also a map), and each value beginning with $. is a SpringEL expression that takes a value from the service's return value; #root denotes the entire return value of the service.

Status: the service execution status mapping. The framework defines three statuses: SU (success), FA (failure), and UN (unknown). The execution result of the service must be mapped to one of these three so that the framework can judge the consistency of the whole transaction. It is a map. Each key is a conditional expression, usually evaluated against the service's return value or the exception thrown. By default the key is a SpringEL expression over the return value; a key beginning with $Exception{ tests the exception type. Each value is the status assigned when its conditional expression is true.

Catch: the route to take after an exception is caught

Next: the next "state" to execute after the service call completes

Choices: in a "state" of type Choice, the list of possible branches. The Expression of a branch is a SpringEL expression, and Next is the "state" to execute next when the expression holds.

ErrorCode: the error code of a "state" of type Fail

Message: the error message of a "state" of type Fail
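As an illustration of the Status attribute described above, the sketch below maps the outcome of a service call to SU, FA, or UN through an ordered list of conditions. It does not use Seata or SpringEL: plain Java predicates stand in for the conditional expressions, and the class names, the rule set in defaultRules, and the fallback to UN when nothing matches are all assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of a "condition -> status" mapping; predicates stand in for SpringEL expressions.
class StatusMappingDemo {
    enum ExecutionStatus { SU, FA, UN }

    // Outcome of one service call: either a return value or a thrown exception.
    static final class Outcome {
        final Object returnValue;
        final Throwable exception;

        private Outcome(Object returnValue, Throwable exception) {
            this.returnValue = returnValue;
            this.exception = exception;
        }

        static Outcome returned(Object value) {
            return new Outcome(value, null);
        }

        static Outcome thrown(Throwable exception) {
            return new Outcome(null, exception);
        }
    }

    // Insertion-ordered rules: the first condition that holds decides the status.
    private final Map<Predicate<Outcome>, ExecutionStatus> rules = new LinkedHashMap<>();

    StatusMappingDemo rule(Predicate<Outcome> condition, ExecutionStatus status) {
        rules.put(condition, status);
        return this;
    }

    ExecutionStatus map(Outcome outcome) {
        for (Map.Entry<Predicate<Outcome>, ExecutionStatus> rule : rules.entrySet()) {
            if (rule.getKey().test(outcome)) {
                return rule.getValue();
            }
        }
        return ExecutionStatus.UN; // assumption of this sketch: no match means unknown
    }

    // Example rule set: true means success, false means failure, any exception means unknown.
    static StatusMappingDemo defaultRules() {
        return new StatusMappingDemo()
            .rule(o -> Boolean.TRUE.equals(o.returnValue), ExecutionStatus.SU)
            .rule(o -> Boolean.FALSE.equals(o.returnValue), ExecutionStatus.FA)
            .rule(o -> o.exception != null, ExecutionStatus.UN);
    }

    public static void main(String[] args) {
        StatusMappingDemo mapping = defaultRules();
        System.out.println(mapping.map(Outcome.returned(true)));
        System.out.println(mapping.map(Outcome.returned(false)));
        System.out.println(mapping.map(Outcome.thrown(new RuntimeException("timeout"))));
    }
}
```

Treating an exception such as a timeout as UN rather than FA matters: the call may have taken effect on the remote side, so the engine cannot assume nothing needs compensating.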

For more detailed explanations of the state language, please see the Seata Saga official website documentation.

How the state machine engine works

In the figure, the state machine executes stateA, then stateB, and then stateC.

The execution of each "state" is based on an event-driven model. After stateA finishes, a routing message is generated and put into the EventQueue; the event consumer takes the message from the EventQueue and executes stateB.

When the state machine starts, Seata Server is called to begin the distributed transaction and generate the xid, and then the "state machine instance" start event is recorded in the local database.

When a "state" starts executing, Seata Server is called to register a branch transaction and generate the branchId, and then the "state instance" start event is recorded in the local database.

When a "state" finishes executing, the "state instance" end event is recorded in the local database, and then Seata Server is called to report the status of the branch transaction.

When the whole state machine finishes, the "state machine instance" completion event is recorded in the local database, and then Seata Server is called to commit or roll back the distributed transaction.
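The sequence above can be simulated in a few lines. This sketch does not call Seata Server or a database: both are replaced by entries in an in-memory journal, and the class name, journal strings, and fixed stateA, stateB, stateC routing table are assumptions for illustration. It only shows the ordering of the calls and the queue-driven hand-off between states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Simulation of the event-driven flow; "server:" and "db:" entries stand in for real calls.
class EventDrivenDemo {
    final List<String> journal = new ArrayList<>();
    private final Queue<String> eventQueue = new ArrayDeque<>(); // routing messages (next state name)
    private final Map<String, String> nextOf = new HashMap<>();

    EventDrivenDemo() {
        nextOf.put("stateA", "stateB");
        nextOf.put("stateB", "stateC");
    }

    void start(String startState) {
        journal.add("server: begin global transaction");
        journal.add("db: machine instance started");
        eventQueue.add(startState);
        consume();
        journal.add("db: machine instance finished");
        journal.add("server: commit global transaction");
    }

    // Event consumer: takes a routing message, runs the state, then routes to the next one.
    private void consume() {
        String state;
        while ((state = eventQueue.poll()) != null) {
            journal.add("server: register branch " + state);
            journal.add("db: state instance started " + state);
            journal.add("execute " + state);
            journal.add("db: state instance finished " + state);
            journal.add("server: report branch status " + state);
            String next = nextOf.get(state);
            if (next != null) {
                eventQueue.add(next); // routing message for the following state
            }
        }
    }

    public static void main(String[] args) {
        EventDrivenDemo demo = new EventDrivenDemo();
        demo.start("stateA");
        demo.journal.forEach(System.out::println);
    }
}
```

A state never calls the next state directly; it only enqueues a routing message. That indirection is what would let a real engine swap the single-threaded loop here for an asynchronous consumer.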

State machine engine design

The state machine engine is designed in three layers, each depending on the one below it. From the bottom up they are:

Eventing layer:

Implements the event-driven architecture: events can be pushed in and are consumed by the consumer side. This layer does not care what the events are or what the consumer executes; that is implemented by the layer above.

ProcessController layer:

Driven by the Eventing layer below it, this layer runs an "empty" process engine: the behavior and routing of each "state" are not implemented here and are left to the layer above.

On top of these two layers, any kind of "process" engine can be customized and extended. The design of these two layers draws on the design of the internal financial network platform.

StateMachineEngine layer:

Implements the behavior and routing logic of each kind of state in the state machine engine.

Provides the API and the state machine language repository.

Practical experience of service design in Saga mode

Here is some practical experience of microservice design in Saga mode. These are recommended practices, not rules that must be followed 100%; if they are not followed, there are still "workaround" solutions.

The good news is that the Seata Saga mode places no requirements on the interface parameters of microservices, which makes it suitable for integrating services from legacy systems or external institutions.

Allow empty compensation

Empty compensation: the original service was never executed, but the compensation service is executed.

How it happens:

1. The original service call times out (packet loss).

2. The Saga transaction triggers a rollback.

3. The compensation request arrives although the original service request was never received.

The service design therefore needs to allow empty compensation: if the business primary key to be compensated is not found, return success for the compensation and record that business primary key.

Anti-suspension control

Suspension: the compensation service is executed before the original service.

How it happens:

1. The original service call times out (network congestion).

2. The Saga transaction triggers a rollback and the compensation is executed.

3. The congested original service request finally arrives.

The original service therefore needs to check whether the current business primary key is among the keys recorded by empty compensation, and if so, refuse to execute.
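Empty compensation and anti-suspension work as a pair, which a small in-memory sketch can show. The class GuardedBalanceService and its fields are hypothetical, and a HashSet stands in for whatever durable store a real service would use to record compensated business keys.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a participant that allows empty compensation and refuses suspended requests.
class GuardedBalanceService {
    private long balance;
    private final Map<String, Long> deductions = new HashMap<>();  // business key -> deducted amount
    private final Set<String> emptyCompensated = new HashSet<>();  // keys compensated before any deduction

    GuardedBalanceService(long initialBalance) {
        this.balance = initialBalance;
    }

    // Original service: refuses keys that were already compensated (anti-suspension).
    synchronized boolean reduce(String businessKey, long amount) {
        if (emptyCompensated.contains(businessKey)) {
            return false; // a late request for a transaction that was already rolled back
        }
        if (balance < amount) {
            return false;
        }
        balance -= amount;
        deductions.put(businessKey, amount);
        return true;
    }

    // Compensation service: succeeds even when nothing was deducted (empty compensation).
    synchronized boolean compensateReduce(String businessKey) {
        Long amount = deductions.remove(businessKey);
        if (amount == null) {
            emptyCompensated.add(businessKey); // record the key so a late reduce is rejected
            return true;
        }
        balance += amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }

    public static void main(String[] args) {
        GuardedBalanceService service = new GuardedBalanceService(100);
        System.out.println(service.compensateReduce("tx-1")); // compensation arrives first
        System.out.println(service.reduce("tx-1", 30));       // delayed original request is refused
        System.out.println(service.balance());
    }
}
```

Without the recorded key, the delayed reduce in the demo would deduct 30 after the transaction had already been rolled back, and nothing would ever compensate it.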

Idempotency control

Both the original service and the compensation service need to be idempotent. Because network calls may time out, a retry strategy can be configured; when a retry happens, idempotency control prevents the business data from being updated twice.
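Here is a minimal sketch of why retries are only safe on top of an idempotent service. The names are hypothetical, the "timeout" is simulated by throwing after the update has already been applied, and a HashSet stands in for a unique-key constraint in a database. For brevity the sketch skips the insufficient-balance check.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a retry loop over a service that deduplicates by business key.
class IdempotentRetryDemo {
    static class BalanceService {
        long balance = 100;
        int lostResponses; // how many upcoming calls lose their response (simulated timeout)
        private final Set<String> applied = new HashSet<>(); // stands in for a unique key

        boolean reduce(String businessKey, long amount) {
            if (applied.add(businessKey)) { // true only the first time this key is seen
                balance -= amount;
            }
            if (lostResponses > 0) {
                lostResponses--;
                throw new IllegalStateException("timeout: response lost");
            }
            return true;
        }
    }

    // Retries on timeout; safe only because reduce applies each business key at most once.
    static boolean reduceWithRetry(BalanceService service, String businessKey,
                                   long amount, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return service.reduce(businessKey, amount);
            } catch (IllegalStateException timeout) {
                // The deduction may already have been applied; try again with the same key.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BalanceService service = new BalanceService();
        service.lostResponses = 2;
        boolean ok = reduceWithRetry(service, "tx-1", 30, 3);
        System.out.println(ok + " " + service.balance);
    }
}
```

The caller reuses the same business key on every attempt; generating a fresh key per retry would defeat the deduplication.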

In many cases we do not need to insist on strong consistency. Based on BASE and Saga theory, we can design more resilient systems that achieve better performance and fault tolerance in a distributed architecture. There is no silver bullet in distributed architecture, only solutions suited to specific scenarios. Seata Saga is in fact a product that combines "service orchestration" with "Saga distributed transactions". To sum up, it is suitable for:

"Long transaction" processing under a microservice architecture

"Service orchestration" requirements under a microservice architecture

Business systems above the financial core systems that have a large number of composite services (such as systems in the channel layer, product layer, and integration layer)

Business processes that need to integrate legacy systems or services provided by external institutions (services that cannot be changed or asked to change)
