How to analyze the principle, structure and characteristics of GTS

Updated 2025-07-15. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

Tags

Distributed transactions, GTS, Global Transaction Service, flexible transactions, TCC, XA two-phase commit protocol

Global Transaction Service (GTS) is a new distributed transaction processing solution introduced by Alibaba, and in-depth analyses of it are still relatively scarce. This article aims to analyze the technical route of GTS and to clarify its advantages and constraints. It draws on the published GTS patents, product documentation and related web pages. It inevitably contains inaccuracies; readers are welcome to criticize and correct them.

I. The goal of GTS

GTS is a distributed transaction solution for Internet transaction scenarios.

Three factors constraining distributed transactions

Distributed transactions are one of the key problems in Internet transaction scenarios. Unlike search, social networking and online analysis applications, e-commerce and payment are typical transaction scenarios: data errors have serious consequences, so the requirements for data consistency and availability are high. The Internet environment brings huge data volumes, connection counts and request rates that a single database node cannot handle, which makes the database the bottleneck of the whole system.

To remove this bottleneck, database capacity is scaled out linearly through data splitting. Data splitting means sharding databases and tables so that data is stored on multiple database nodes, with a distributed database platform on top. In such a distributed database environment, one transaction can span multiple databases, which raises the problem of distributed transaction processing.

Distributed transaction solutions face challenges in application flexibility, data consistency and performance. Several mature schemes exist today, and each of them is a trade-off among these three aspects.

The three factors that restrict each other are:

Application flexibility: whether the way the application accesses the data needs to be modified, and to what extent.

Consistency: whether the data is strongly consistent or eventually consistent (allowing intermediate inconsistencies).

System performance: the impact of distributed transactions on overall performance.

Existing distributed transaction schemes

The mature distributed transaction solutions available today include XA two-phase commit, reliable messages and the TCC pattern. XA two-phase commit provides strongly consistent transactions, while reliable messages and TCC are flexible transactions.

XA two-phase commit

XA is the distributed transaction processing specification proposed by the X/Open organization. It mainly defines the interface between the Transaction Manager (TM) and the Resource Manager (RM). The structure is shown in the following figure.

The XA protocol flow can be roughly divided into three steps:

Step 1: The APP asks the TM to create a global transaction, and the TM returns the global transaction number to the APP.

Step 2: The APP uses the global transaction number to access the resources of each RM (when the RM is a database, resource access means SQL operations). When an RM receives its first access, it registers with the TM using this global transaction number, and the TM returns a branch transaction number.

Step 3: The APP sends a global transaction commit request to the TM, and the TM communicates with the RMs participating in the transaction to carry out the commit. When all of them have finished, the result is returned to the APP.

The commit processing between the TM and the RMs uses the two-phase commit protocol. In the first phase, the TM asks every RM participating in the transaction to "prepare", so that the participants reach a consensus on whether the distributed transaction will commit or abort. Each participant must complete all constraint checks and ensure that the data needed for a later commit or abort has been persisted. In the second phase, the TM asks every participating RM to carry out the commit or abort that was agreed on.

Committing a transaction requires coordination among multiple resource nodes, and each node can release its locks only after the transaction is finally committed, so two-phase commit takes longer than one-phase commit for the same transaction. Once transaction concurrency reaches a certain level, transactions pile up and may even deadlock, and system performance and throughput drop sharply.
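The two phases described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not an XA implementation: the class names `ResourceManager` and `TransactionManager`, the `can_commit` flag and the state strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical in-memory RM; a real one would be a database."""
    name: str
    can_commit: bool = True   # assumed outcome of the constraint checks
    state: str = "active"     # active -> prepared -> committed / aborted

    def prepare(self) -> bool:
        # Phase 1: check constraints and persist what commit/abort needs.
        # In a real RM, locks stay held from here until phase 2 finishes.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


@dataclass
class TransactionManager:
    branches: list = field(default_factory=list)

    def register(self, rm: ResourceManager) -> int:
        # An RM registers on first access and receives a branch number.
        self.branches.append(rm)
        return len(self.branches)

    def commit_global(self) -> bool:
        # Phase 1: collect a vote from every branch.
        votes = [rm.prepare() for rm in self.branches]
        decision = all(votes)
        # Phase 2: apply the agreed decision to every branch.
        for rm in self.branches:
            if decision:
                rm.commit()
            else:
                rm.rollback()
        return decision
```

In this sketch a single branch voting "no" in `prepare()` causes every branch to roll back, and no branch reaches a final state until all votes are in, which is the blocking behaviour the paragraph above describes.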

Reliable messages

A possible structure for reliable messages is shown in the following figure.

Description:

Before the business transaction commits, the business processing service asks the real-time message service to send a message. The message service only records the message data and does not actually send it.

After the business transaction commits, the business processing service confirms the send to the real-time message service. The message service actually sends the message only after receiving this confirmation.

If the business transaction rolls back, the business processing service cancels the send with the real-time message service.

The message status confirmation system periodically looks for messages that have been neither confirmed nor cancelled and queries the business processing service for their status. The business processing service decides whether each message is valid based on its message ID or content.

This asynchronous, message-based transaction guarantees that the business data operation and the message send succeed or fail together, preserving eventual consistency.

Reliable messages work well for eventual consistency and rollback when a distributed transaction spans two operations. When a transaction context contains more than two operations, however, developers have to record the operation log of the whole transaction, implement the rollback of each transaction branch, and schedule the whole process accurately themselves.
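The record, confirm, cancel and status-check steps of the reliable message scheme can be sketched as follows. This is an illustrative in-memory model only; `MessageService`, its method names and the `is_committed` callback are hypothetical and do not correspond to any particular message queue product.

```python
class MessageService:
    """Hypothetical message service that holds messages until confirmed."""

    def __init__(self):
        self.pending = {}     # message id -> payload, recorded but not sent
        self.delivered = []   # payloads actually sent to consumers

    def prepare(self, msg_id, payload):
        # Called before the business transaction commits: record only.
        self.pending[msg_id] = payload

    def confirm(self, msg_id):
        # Called after the business transaction commits: really send.
        self.delivered.append(self.pending.pop(msg_id))

    def cancel(self, msg_id):
        # Called after the business transaction rolls back: drop it.
        self.pending.pop(msg_id, None)

    def check_pending(self, is_committed):
        # Periodic status check: ask the business service about every
        # message that was neither confirmed nor cancelled.
        for msg_id in list(self.pending):
            if is_committed(msg_id):
                self.confirm(msg_id)
            else:
                self.cancel(msg_id)


# Usage: both the confirm for m1 and the cancel for m2 were lost,
# so the periodic check resolves them from the business state.
service = MessageService()
committed_ids = {"m1"}                 # business transactions that committed
service.prepare("m1", "debit 10")
service.prepare("m2", "debit 20")
service.check_pending(lambda msg_id: msg_id in committed_ids)
```

The `check_pending` step is what keeps the message and the business data consistent when a confirm or cancel call never arrives.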

TCC pattern

The TCC pattern provides a framework for executing global transactions. Developers only need to implement the rollback of each transaction branch and do not need to record the operation log of the entire transaction. The structure of the TCC pattern is shown in the following figure.

Description:

A complete business activity consists of one master business service and several slave business services.

The master business service is responsible for initiating and completing the entire business activity.

The slave business services provide TCC-type business operations.

The business activity manager controls the consistency of the business activity. It registers the operations in the activity, invokes the confirm operation of every TCC operation when the activity is committed, and invokes the cancel operation of every TCC operation when the activity is cancelled.

A TCC transaction consists of two phases:

The first phase: the master business service invokes the try operation of each slave business service and registers every slave business service with the activity manager. Once all try operations have succeeded, or any one of them has failed, the second phase begins.

The second phase: the activity manager performs confirm or cancel operations based on the execution results of the first phase.

If all try operations in the first phase succeeded, the activity manager invokes the confirm operation of every slave business service. Otherwise, it invokes the cancel operation of every slave business service.
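The two TCC phases can be sketched as below. The class and function names (`SlaveService`, `ActivityManager`, `run_activity`) and the `try_ok` flag are assumptions made for the example; a real TCC framework would also persist the activity state and retry confirm and cancel calls.

```python
class SlaveService:
    """Hypothetical slave business service exposing try/confirm/cancel."""

    def __init__(self, name, try_ok=True):
        self.name = name
        self.try_ok = try_ok      # assumed outcome of the try operation
        self.state = "initial"

    def try_(self):
        # Reserve resources and report whether the reservation succeeded.
        self.state = "tried" if self.try_ok else "try_failed"
        return self.try_ok

    def confirm(self):
        self.state = "confirmed"

    def cancel(self):
        # Must tolerate being called after a failed try.
        self.state = "cancelled"


class ActivityManager:
    def __init__(self):
        self.registered = []

    def register(self, service):
        self.registered.append(service)

    def finish(self, all_tried):
        # Phase 2: confirm every registered service or cancel every one.
        for service in self.registered:
            if all_tried:
                service.confirm()
            else:
                service.cancel()


def run_activity(manager, services):
    # Phase 1, driven by the master business service: register each
    # slave service and call its try operation.
    all_tried = True
    for service in services:
        manager.register(service)
        if not service.try_():
            all_tried = False
            break                 # a failed try ends phase 1
    manager.finish(all_tried)
    return all_tried
```

Note that in this sketch only services registered before the failure are cancelled; a service whose try was never invoked is left untouched.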

Summary

Reliable messages and the TCC pattern improve performance by avoiding the long-held locks on data resources that XA two-phase commit requires, and they achieve eventual consistency by implementing the transaction mechanism outside the database. This comes at the expense of application flexibility: developers must implement the details of transaction checking and rollback themselves and spend considerable effort ensuring the application is correct.

The goal of GTS is to handle fault recovery and concurrency control of global transactions uniformly, at an acceptable performance cost, and to hide the details of transaction processing from application development, thereby improving both application flexibility and data consistency.

II. The technical route of GTS

GTS takes the route of optimizing the XA architecture. It keeps the flexibility of XA but decouples the first phase of the XA commit from the second, turning the commit process into a first stage of local transaction commits plus a second stage of asynchronous cleanup, which improves system performance. Rollback and concurrency control of global transactions are implemented by maintaining application-level log and lock information in GTS.

GTS holds that the root cause of XA's poor performance is its blocking protocol. The first phase of a distributed commit has to wait for the slowest transaction branch to finish. Even when there is no lock conflict, the database connection of every branch stays suspended and its resources cannot be released, because releasing them before the global transaction commits could leave the data inconsistent. For large Internet companies with extremely high traffic, the performance cost of XA's two-phase commit protocol is hard to accept.

The GTS scheme contains exactly the same components as XA, as shown in the following figure.

The global transaction flow of GTS is also the same as that of XA, with three steps: global transaction registration, data access and global transaction commit. It differs from XA in how the second and third steps are handled internally:

In the second step, data access, each transaction branch stores its global transaction information (lock and log information) in tables of the current database when it completes its data operations.

In the third step, global transaction commit, GTS uses a first stage of local transaction commits and a second stage of asynchronous cleanup. The local transactions are first committed on each database and system resources such as database connections are released; then a global transaction commit request is sent to the TM. The TM returns success immediately on receiving the request. Its actual work afterwards is to use the global transaction identifier to clean up the global transaction information in each database.
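The "local commit first, asynchronous cleanup later" flow can be sketched with a small in-memory model. Everything here is an assumption for illustration: `BranchDatabase`, `TransactionManager`, the `gts_info` dictionary and the explicit `run_cleanup()` call (which stands in for background work) are not taken from the GTS product.

```python
from collections import deque


class BranchDatabase:
    """Hypothetical branch database holding business rows and GTS info."""

    def __init__(self):
        self.rows = {}        # committed business data
        self.gts_info = {}    # xid -> list of (key, before, after) entries
        self._staged = []     # uncommitted work of the local transaction

    def write(self, xid, key, value):
        # Step 2: the data change and its GTS lock/log entry belong to
        # the same local transaction.
        self._staged.append((xid, key, value))

    def commit_local(self):
        # Stage 1: ordinary local commit; afterwards the connection
        # and other resources can be released.
        for xid, key, value in self._staged:
            before = self.rows.get(key)
            self.rows[key] = value
            self.gts_info.setdefault(xid, []).append((key, before, value))
        self._staged.clear()

    def cleanup(self, xid):
        # Stage 2: drop the global transaction information for this xid.
        self.gts_info.pop(xid, None)


class TransactionManager:
    def __init__(self):
        self.cleanup_queue = deque()

    def commit_global(self, xid, branches):
        # Return success immediately; the cleanup is deferred.
        self.cleanup_queue.append((xid, branches))
        return True

    def run_cleanup(self):
        # Stands in for the asynchronous background cleanup.
        while self.cleanup_queue:
            xid, branches = self.cleanup_queue.popleft()
            for db in branches:
                db.cleanup(xid)
```

The point of the design is visible in `commit_global`: the caller gets its answer without waiting for any branch, because each branch has already committed locally.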

GTS and XA adopt different implementation mechanisms for fault recovery and concurrency control of global transactions:

The XA two-phase protocol implements rollback and concurrency control of global transactions using the log and lock information of the database kernel. In GTS, the first-stage local commit commits the local transaction and releases the connection directly, so the kernel's log and lock tables can no longer serve the global transaction. Instead, GTS stores log and lock information in tables during the second step; this information is persisted when the local transaction commits and is then used for concurrency control and fault recovery of the global transaction.

GTS fault recovery uses only UNDO operations, with no REDO. The log table stores the information UNDO needs: the row identifier, the global transaction number, the mirror query statement, and the before image and after image of the operation. When a failure occurs, for each database that has already committed locally, GTS finds the modified records in the UNDO log table together with their before and after images, and uses the mirror query statement to read each record's current value from the database. If the current value matches the recorded after image, the record is restored from the before image; otherwise an alarm is raised and the record is handled manually.
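The UNDO check described above (compare the current value with the after image, then restore the before image or raise an alarm) can be sketched as follows. The `UndoRecord` fields, the `ManualInterventionRequired` exception and the use of a plain dictionary in place of a database table are all assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class UndoRecord:
    # Hypothetical field names mirroring the items stored in the log table.
    row_id: str
    xid: str
    before_image: object   # value before the operation (None = row absent)
    after_image: object    # value after the operation (None = row deleted)


class ManualInterventionRequired(Exception):
    """Raised when a row changed after the branch committed locally."""


def undo(table: dict, record: UndoRecord) -> None:
    # The dictionary lookup stands in for running the mirror query.
    current = table.get(record.row_id)
    if current != record.after_image:
        # Another writer modified the row: restoring blindly would lose
        # that change, so raise an alarm instead.
        raise ManualInterventionRequired(record.row_id)
    if record.before_image is None:
        table.pop(record.row_id, None)        # undo an INSERT
    else:
        table[record.row_id] = record.before_image


def rollback(table: dict, log: list, xid: str) -> None:
    # Undo this global transaction's changes in reverse order.
    for record in reversed([r for r in log if r.xid == xid]):
        undo(table, record)
```

For example, with `table = {"a": 90}` and a log entry recording a change of row `a` from 100 to 90, `rollback` restores `a` to 100; if some other writer has since changed `a` to 80, the function raises instead of overwriting that change.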

Record locks are stored in the GTS global lock table. The lock granularity is the row (record), and the lock types are shared locks and mutex locks. For a given record, a shared lock does not conflict with another shared lock, while a shared lock conflicts with a mutex lock and a mutex lock conflicts with another mutex lock. INSERT, UPDATE, DELETE and update-mode locking queries (SELECT ... FOR UPDATE) take a mutex lock. Shared-mode locking queries (SELECT ... LOCK IN SHARE MODE) take a shared lock. If there is no lock conflict, a row is added to the GTS lock table to indicate that the lock has been acquired.
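A row-level lock table of this kind can be sketched in memory as below. The sketch assumes the usual compatibility rule (shared locks coexist, while a mutex lock conflicts with any lock held by another transaction); `GlobalLockTable`, `LockConflict` and the `"S"`/`"X"` mode strings are hypothetical names chosen for the example.

```python
class LockConflict(Exception):
    """Raised when another global transaction holds a conflicting lock."""


class GlobalLockTable:
    """Hypothetical in-memory stand-in for the GTS lock table."""

    def __init__(self):
        # (table_name, key_value) -> list of (xid, mode) rows
        self.rows = {}

    def acquire(self, xid, table_name, key_value, mode):
        # mode is "S" (shared) or "X" (mutex).
        held = self.rows.get((table_name, key_value), [])
        for other_xid, other_mode in held:
            if other_xid == xid:
                continue                      # own locks never conflict
            if mode == "X" or other_mode == "X":
                raise LockConflict((table_name, key_value, other_xid))
        # No conflict: add a row to record that the lock is held.
        self.rows.setdefault((table_name, key_value), []).append((xid, mode))

    def release_all(self, xid):
        # Cleanup when the global transaction ends.
        for key in list(self.rows):
            self.rows[key] = [r for r in self.rows[key] if r[0] != xid]
            if not self.rows[key]:
                del self.rows[key]
```

In this model an UPDATE on a row would call `acquire(xid, table, key, "X")`, and a `SELECT ... LOCK IN SHARE MODE` would call it with `"S"`.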

The default isolation level of GTS is read uncommitted, which allows dirty reads. Using SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE raises the isolation level of a query to read committed.

III. The architecture and processing flow of GTS

Architecture

The following figure describes a possible implementation architecture for GTS.

Like the XA architecture, the GTS architecture consists of three parts: the application, the transaction manager and the resource manager. The resource manager consists of a transaction branch processing module, a mirror query construction module, a concurrency control module, a recovery control module, and the GTS transaction information (the GTS lock table and GTS log table) stored in the database.

Transaction branch processing module: the external interface of the resource manager; it invokes the internal modules.

Mirror query construction module: generates, from INSERT, UPDATE and DELETE statements, the mirror query statement for the affected record set. For example, if the table table_name has two fields, column1 and column2, with column1 as the primary key, the mirror query statement is select column1, column2 from table_name where column1 = v1.
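As a rough illustration of what a mirror query is used for, the sketch below builds one for the example table and runs it before and after an UPDATE to capture the two images. The helper `build_mirror_query` is hypothetical, and SQLite (via Python's standard `sqlite3` module) is used only because it is self-contained; it is not the database GTS targets.

```python
import sqlite3


def build_mirror_query(table_name, columns, primary_key):
    """Return SQL that reads back the row identified by its primary key."""
    if primary_key not in columns:
        raise ValueError("primary key must be one of the columns")
    column_list = ", ".join(columns)
    # Identifiers are interpolated directly, so they must come from
    # trusted schema metadata; the key value is bound as a parameter.
    return f"SELECT {column_list} FROM {table_name} WHERE {primary_key} = ?"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column1 INTEGER PRIMARY KEY, column2 TEXT)")
conn.execute("INSERT INTO table_name VALUES (1, 'old')")

mirror_sql = build_mirror_query("table_name", ["column1", "column2"], "column1")

# Run the mirror query around the data change to capture both images.
before_image = conn.execute(mirror_sql, (1,)).fetchone()
conn.execute("UPDATE table_name SET column2 = 'new' WHERE column1 = 1")
after_image = conn.execute(mirror_sql, (1,)).fetchone()
```

The same statement can be run again during recovery to read the row's current value and compare it with the stored after image.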

Concurrency control module: maintains read/write concurrency control based on the GTS transaction lock table. The lock table is defined as follows:

Field name | Field type | Field description
ID | Integer | Auto-increment primary key
TABLE_NAME | String | Table name
KEY_VALUE | Integer | Data row ID
XID | String | Global transaction identifier
XLOCK | Integer | Mutex lock flag
SLOCK | Integer | Shared lock flag
BRANCH_ID | Integer | Transaction branch ID

Recovery control module: performs fault recovery based on the GTS log table. The log table is defined as follows:

Field name | Field type | Field description
ID | integer | Auto-increment primary key
GMT_CREATE | time | Creation time
GMT_MODIFIED | datetime | Modification time
XID | integer | Global transaction ID
BRANCH_ID | integer | Branch transaction ID
ROLLBACK_INFO | longblob | Query statement, before image and after image
STATUS | integer | State
SERVER | string | Branch DB IP

Main process sequence diagrams

The sequence diagrams (one possible implementation) of four operations are described below: the insert/delete/update operation, the read committed operation, the commit operation and the rollback operation.

Sequence diagram of the insert/delete/update operation flow

Sequence diagram of the read committed operation flow

Sequence diagram of the commit operation flow

Sequence diagram of the rollback operation flow

Alibaba official case

The GTS product website presents money transfer, the most typical case in transaction processing.

The data of users A and B are located in two different sub-databases of a DRDS instance. Fifty processes transfer money from A to B concurrently, and each process makes 10 transfers. The amount of each transfer is randomly generated between 1 and 10. Network exceptions are simulated at a rate of 3% during the transfers, and GTS transactions are used to keep the combined balance of A and B unchanged.

The code for the case shows that a single-node transactional application can be promoted to distributed transactions by adding one SQL statement that opens a GTS transaction, which demonstrates good application flexibility. In the test, the transfer transaction was executed 500 times, with 490 successes and 10 failures. The combined account balance queried 10 seconds after the transfers ended was correct.
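The invariant that this test checks, namely that the combined balance of A and B is unchanged no matter how many transfers fail, can be illustrated with a small single-process simulation. This is not the official test code: the function name, the starting balances, the fixed random seed and the snapshot-based rollback are all assumptions, and the transfers run sequentially rather than in 50 concurrent processes.

```python
import random


def run_transfer_test(workers=50, transfers_each=10, failure_rate=0.03, seed=0):
    rng = random.Random(seed)
    balances = {"A": 10_000, "B": 10_000}    # starting balances are assumed
    total_before = sum(balances.values())
    succeeded = failed = 0
    for _ in range(workers * transfers_each):
        amount = rng.randint(1, 10)
        snapshot = dict(balances)            # stands in for the UNDO log
        balances["A"] -= amount              # branch 1: debit A
        if rng.random() < failure_rate:
            # Simulated network exception between the two branches:
            # roll the whole global transaction back.
            balances = snapshot
            failed += 1
            continue
        balances["B"] += amount              # branch 2: credit B
        succeeded += 1
    return succeeded, failed, total_before, sum(balances.values())


succeeded, failed, total_before, total_after = run_transfer_test()
```

Whatever the split between `succeeded` and `failed` turns out to be, `total_before` equals `total_after`, because every failed transfer is rolled back as a whole.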

The GTS product introduction at the 2017 Cloud Congress compares tests using GTS with tests using no distributed transaction (1PC). In the figure below, GTS performance is about 10% lower than 1PC, a far smaller loss than with 2PC, which is an excellent result.

IV. Advantages and constraints of GTS

Compared with distributed transactions based on message queues or the TCC compensation pattern, GTS offers better application flexibility and data consistency, provided its performance meets the requirements.

Flexibility: database applications need essentially no modification. Because GTS is based on the XA model, it can also readily support various kinds of RM, such as message queues and databases.

Data consistency: the default transaction isolation level of GTS is read uncommitted, which maximizes distributed transaction performance but may read dirty data. Applications with high consistency requirements can, if performance permits, use locking read statements (FOR UPDATE, LOCK IN SHARE MODE) to raise the isolation level to read committed.

The implementation mechanism of GTS imposes the following constraints on application scenarios: a transaction should not lock too many records, operations should not conflict too often, and locks should not be held too long. If these constraints are violated, GTS consumes too many internal resources and lock conflicts and rollbacks increase, degrading performance. Core transaction scenarios in e-commerce, logistics, finance and retail feature high concurrency, high performance requirements, small per-operation data sets and sensitivity to transaction response time, so GTS has broad and promising applications in them.
