
An Analysis of Kafka's Transaction Feature


This article analyzes how Kafka implements its transaction feature. The editor finds it very practical and shares it here in the hope that you will take something away after reading it; without further ado, let's have a look.

Feature background

A message transaction is a series of production and consumption operations that either all complete or all fail, similar to a database transaction. The feature is not supported in version 0.10.2 and earlier; it has been supported since version 0.11. Huawei Cloud DMS was the first to provide a dedicated Kafka 1.1.0 edition that supports message transactions.

What is the purpose of supporting transactional messages? Message transactions are one way to implement distributed transactions and can guarantee eventual consistency of data in distributed scenarios. Take the most common transfer scenario: Xiao Wang transfers money to Xiao Ming, which in practice means deducting an amount from Xiao Wang's account and adding the same amount to Xiao Ming's account. With separate databases and tables, the two accounts live in different databases, so a distributed transaction is needed to keep them consistent; a transaction in a single database cannot guarantee atomicity across databases. If Xiao Wang's account is debited first and a message is then sent to Xiao Ming's database to credit the money, then without transactional messages the data will end up inconsistent whichever order is chosen, deducting first or notifying first, because the atomicity of the two operations cannot be guaranteed. With transactional messages, sending the notification and the local transaction (the debit) form one atomic operation: the local transaction and the notification succeed or fail together, which keeps the data consistent.

Besides eventual consistency of data, message transactions also make Exactly once semantics possible. Exactly once is the hardest of the message delivery semantics to implement. The three semantics are: At most once (delivered at most once; no duplicates, but data may be lost); At least once (delivered at least once; nothing is lost, but duplicates may occur); and Exactly once (delivered exactly once; no loss and no duplication, i.e. idempotent delivery). Kafka's idempotence guarantees Exactly once semantics only for a producer writing to a single partition; to extend the semantics across multiple partitions, message transactions must be introduced to guarantee atomicity.

Introduction to distributed transactions

Mainstream system architectures today are distributed and microservice architectures. Under such architectures the data source is no longer a single database, and business logic often needs to perform atomic operations across multiple databases. Local transactions, however powerful within a single database, cannot guarantee atomic operations across nodes, so distributed transactions are needed to ensure data consistency. Several distributed transaction solutions are widely used at present:

1. XA transaction: two-phase / three-phase commit

XA is a distributed transaction specification proposed by the X/Open organization. It mainly defines the interface between the (global) transaction manager (Transaction Manager) and the (local) resource manager (Resource Manager). The XA interface is a bidirectional system interface that forms a communication bridge between the transaction manager and one or more resource managers. The key to implementing XA transactions is the two-phase and three-phase commit protocols.

The two-phase commit protocol (Two-phase Commit, 2PC) is often used to implement distributed transactions. It involves two roles: a coordinator C and several transaction participants Si, where a participant is a concrete database; the coordinator may run on the same machine as a participant.

The protocol consists of two phases: a prepare phase and a commit phase. In the prepare phase, the coordinator sends a prepare message to each participant; each participant executes its local transaction without committing and returns its transaction status to the coordinator. In the commit phase, the coordinator decides, based on the participants' results from the prepare phase, whether the transaction should be committed or rolled back, and sends the corresponding command to every participant.
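To make the two phases concrete, here is a minimal illustrative Java sketch of a coordinator driving several participants. The Participant interface and its methods are hypothetical stand-ins for real resource managers; a real implementation would also need timeouts, retries and crash recovery.

```java
import java.util.List;

// Hypothetical participant interface: a real resource manager would expose
// similar prepare/commit/rollback hooks, typically over the network.
interface Participant {
    boolean prepare();   // execute the local transaction but do not commit; vote yes/no
    void commit();       // make the prepared changes durable
    void rollback();     // undo the prepared changes
}

// Minimal two-phase commit coordinator (illustrative only: no timeouts,
// retries, or crash recovery, which real implementations must handle).
class TwoPhaseCommitCoordinator {
    boolean execute(List<Participant> participants) {
        // Phase 1: prepare. Every participant must vote "yes".
        boolean allPrepared = participants.stream().allMatch(Participant::prepare);

        // Phase 2: commit if everyone prepared, otherwise roll back everyone.
        if (allPrepared) {
            participants.forEach(Participant::commit);
        } else {
            participants.forEach(Participant::rollback);
        }
        return allPrepared;
    }
}
```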

The main problem with two-phase commit is that during execution all participants are subject to the coordinator's scheduling: they are blocked and cannot do other work in the meantime, which is very inefficient. In particular, if the coordinator crashes after sending the commit notification to only some of the participants, the remaining participants block.

To address these problems, the three-phase commit protocol adds a pre-commit phase between prepare and commit. The prepare phase only asks the participants whether they can execute and does not run the transaction; in the pre-commit phase each participant executes its local transaction without committing; the commit phase then commits directly. This way, if the coordinator is slow to send the commit or rollback notification in the final phase, the participants can commit or roll back on their own after a timeout instead of blocking (the prepare phase has already confirmed that every participant can execute, so the third phase can proceed directly). Three-phase commit still has many problems and cannot fully guarantee data consistency; doing so requires an algorithm such as Paxos.

2. TCC (compensating transaction) solution

TCC stands for Try, Confirm and Cancel, with the following meanings:

- Try: reserve the business resources

- Confirm: confirm and execute the business operation

- Cancel: cancel the execution of the business operation

TCC solves the atomicity problem for business operations that span applications and is very practical in scenarios such as combined payments and account splitting. TCC essentially lifts the database's two-phase commit up to the application layer: for the database itself each step is a one-phase commit, which avoids the poor performance of 2PC at the database layer. The downside is that the business code must provide the Try/Confirm/Cancel implementations itself, so development is complex and costly.
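As an illustration of the Try/Confirm/Cancel split, a hypothetical Java interface for the transfer scenario might look like the following; the method names and signatures are made up for this example, and real TCC frameworks define their own annotations and lifecycles.

```java
import java.math.BigDecimal;

// Hypothetical TCC action for reserving funds in a transfer.
// A real TCC framework would wire these three methods into its own
// transaction lifecycle; the signatures here are only illustrative.
interface TransferTccAction {
    // Try: reserve the business resource, e.g. freeze the amount on the account.
    boolean tryReserve(String accountId, BigDecimal amount);

    // Confirm: complete the business operation with the reserved resource,
    // e.g. actually deduct the frozen amount. Must be idempotent.
    void confirm(String accountId, BigDecimal amount);

    // Cancel: undo the reservation if the global transaction aborts,
    // e.g. unfreeze the amount. Must also be idempotent.
    void cancel(String accountId, BigDecimal amount);
}
```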

3. Transactional messages

Here the distributed transaction is completed with the transactional messages of a message middleware. Transactional messages ensure that executing the local transaction and delivering the message are atomic: first a message is sent to the middleware, then the local transaction is executed, and when the local transaction succeeds a commit confirmation is sent to the middleware; only then does the message become visible to downstream consumers, which guarantees atomicity.
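The flow can be sketched as follows; the MessageBroker interface and its method names are hypothetical stand-ins for whatever transactional-message API the middleware actually provides.

```java
// Hypothetical middleware client: sends a "half" message that stays invisible
// to consumers until it is explicitly confirmed or cancelled.
interface MessageBroker {
    String sendHalfMessage(String topic, byte[] payload); // returns a message id
    void confirm(String messageId);                       // make the message visible
    void cancel(String messageId);                        // discard the message
}

class TransferService {
    private final MessageBroker broker;
    TransferService(MessageBroker broker) { this.broker = broker; }

    void transfer(byte[] creditNotification, Runnable localDebitTx) {
        // 1. Send the credit notification as a half message first.
        String id = broker.sendHalfMessage("credit-topic", creditNotification);
        try {
            // 2. Execute the local transaction (debit Xiao Wang's account).
            localDebitTx.run();
            // 3. Local transaction succeeded: confirm so consumers can see the message.
            broker.confirm(id);
        } catch (RuntimeException e) {
            // Local transaction failed: cancel so the credit never happens.
            broker.cancel(id);
            throw e;
        }
    }
}
```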

Kafka message transaction

01 basic concepts

To support transactions, version 0.11.0 of Kafka introduces the following concepts:

1. Transaction coordinator: similar to the load-balancing coordinator of consumer groups, each producer that uses transactions is assigned a transaction coordinator (Transaction Coordinator).

2. An internal Kafka Topic is introduced as the transaction log: similar to the Topic used to manage consumer Offsets, this transaction Topic is itself persistent; its log records transaction status information and is written by the transaction coordinator.

3. Control messages (Control Messages) are introduced: these are special messages written to the Topic but not exposed to consumers. The broker uses them to tell the consumer whether the previously fetched messages have been committed atomically.

4. TransactionId is introduced: different producer instances use the same TransactionId to represent the same transaction, which allows data to be sent idempotently across sessions. When a new Producer instance with the same TransactionId is created and starts working, the old Producer with that TransactionId stops working, which avoids transaction deadlocks.

5. Producer ID: each new Producer is assigned a unique PID during initialization; this PID is not visible to users and is introduced mainly to provide idempotence.

6. Sequence Number: for each PID, every message the Producer sends corresponds to a monotonically increasing Sequence Number starting from 0.

7. Each producer carries an epoch: it identifies the current generation of a given TransactionId and is incremented each time the transaction state is initialized, so the server can tell whether a producer request comes from an old instance.

8. Idempotence: guarantees that messages sent to a single partition are written only once, with no duplicates. It is controlled by the enable.idempotence switch and can be used independently of transactions, i.e. idempotence can be enabled without enabling transactions.
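For reference, a minimal configuration of the standard Java producer that turns on idempotence and, optionally, transactions might look roughly like this; the broker address and the transactional.id value are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Idempotence alone: no duplicates within a single partition and producer session.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        // Setting a transactional.id (the TransactionId above) additionally enables
        // transactions; it requires idempotence. Placeholder id.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transfer-tx-1");

        return new KafkaProducer<>(props);
    }
}
```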

02 transaction process

1. Find the transaction coordinator

The producer first sends a FindCoordinatorRequest to locate its transaction coordinator, which will be responsible for assigning it a PID; this is similar to the coordinator of a consumer group.

2. Obtain the Producer ID

Once the transaction coordinator is known, the producer sends it an InitPidRequest to initialize the PID. There are two cases for this request:

● Without a TransactionId

In this case, a new PID is simply generated and returned to the client.

● With a TransactionId

In this case, Kafka looks up the PID corresponding to the TransactionId, which is stored in the transaction log (figure 2a above). This guarantees that the same TransactionId always maps to the same PID, so previously unfinished transactions can be recovered or aborted.

3. Start the transaction

The producer starts a transaction by calling the beginTransaction interface. This only records the start of the transaction in the producer's internal state; the transaction coordinator considers the transaction started only when the producer sends its first message.
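In terms of the Java client, this corresponds roughly to the following calls; the topic name and record contents are placeholders, and the producer is assumed to be configured with a transactional.id as sketched earlier.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BeginTransactionExample {
    static void startTransaction(KafkaProducer<String, String> producer) {
        // Registers the producer with the transaction coordinator and obtains a PID
        // (FindCoordinator and InitPid happen under the hood on the first call).
        producer.initTransactions();

        // Marks the transaction as started on the client side only; the coordinator
        // treats it as open once the first record is actually sent.
        producer.beginTransaction();

        // The first send to a partition triggers AddPartitionToTxn for that partition.
        producer.send(new ProducerRecord<>("output-topic", "key", "value")); // placeholder topic
    }
}
```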

4. The consume-and-produce coordination process

This step is the interleaved consuming and producing that actually makes up the transaction, and it involves several requests:

● Add partition to transaction request

When the producer has a new partition to write to, it sends an AddPartitionToTxnRequest to the transaction coordinator. The coordinator handles the request mainly by updating the transaction metadata and writing it to the transaction log (the transaction Topic).

● Produce request

The producer sends data to the partitions by calling the send interface; these requests carry the additional pid, epoch and sequence number fields.

● Add consumed offsets to the transaction

The producer sends the Offset information of the consumed partitions to the transaction coordinator through the new sendOffsetsToTransaction interface, and the coordinator adds the partition information to the transaction (see the sketch after this list).

● Transaction offset commit request

When the producer calls the transactional offset commit interface, a TxnOffsetCommitRequest is sent to the consumer group coordinator, which stores the offsets in the __consumer_offsets Topic. The coordinator verifies, based on the PID and epoch in the request, that the producer is allowed to make the request. The consumed offsets become visible only when the transaction is committed.
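A minimal sketch of adding consumed offsets to the transaction with the Java client is shown below; the topic, partition, offset and group id are placeholders, and newer client versions also accept a ConsumerGroupMetadata object instead of the plain group id.

```java
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

public class OffsetInTransactionExample {
    static void addOffsetsToTransaction(KafkaProducer<String, String> producer) {
        // Offsets consumed so far; partition and offset values are placeholders.
        Map<TopicPartition, OffsetAndMetadata> offsets = Collections.singletonMap(
                new TopicPartition("input-topic", 0), new OffsetAndMetadata(42L));

        // Sends the offsets to the transaction coordinator; on commit they are
        // written to __consumer_offsets as part of the same transaction.
        producer.sendOffsetsToTransaction(offsets, "transfer-consumer-group"); // placeholder group id
    }
}
```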

5. Commit or roll back the transaction

The user commits or rolls back the transaction by calling the commitTransaction or abortTransaction method.

● EndTxnRequest

When the producer finishes a transaction, the client must explicitly commit or roll it back. The former makes the messages visible to consumers; the latter marks the produced data as aborted, making it invisible to consumers. In either case an EndTxnRequest is sent to the transaction coordinator, and PREPARE_COMMIT or PREPARE_ABORT information is written to the transaction log (5.1a).

● WriteTxnMarkerRequest

This request is sent by the transaction coordinator to the Leader of each TopicPartition involved in the transaction. On receiving it, each Broker writes a COMMIT (PID) or ABORT (PID) control message to the data log (5.2a).

These control messages tell consumers which transaction the current messages belong to and whether they should be accepted or discarded. For uncommitted messages, the consumer caches the transaction's messages until the transaction is committed or rolled back.

Note that if the transaction also involves __consumer_offsets, i.e. the transaction contains consume operations whose Offsets are stored in __consumer_offsets, the Transaction Coordinator also sends a WriteTxnMarkerRequest to the Leader of each Partition of that internal Topic to write COMMIT (PID) or ABORT (PID) control information (to the left of 5.2a).

● Write the final commit or rollback information

Once the commit or rollback information has been written to the data logs, the transaction coordinator writes the final commit or abort record to the transaction log to indicate that the transaction is complete (figure 5.3). At this point most of the messages related to the transaction can be deleted (their markers will be removed when the log is compacted); only the transaction ID and its timestamp need to be kept.
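Putting the steps above together, a rough sketch of a complete consume-transform-produce loop with the standard Java clients might look like this; the topics, group id and the "transform" step are placeholders, error handling is simplified, and downstream consumers would set isolation.level=read_committed so that they only see committed data.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class ConsumeTransformProduceLoop {
    static void run(KafkaConsumer<String, String> consumer,
                    KafkaProducer<String, String> producer,
                    String groupId) {
        consumer.subscribe(Collections.singletonList("input-topic")); // placeholder topic
        producer.initTransactions();

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    // Placeholder "transform": forward the value unchanged.
                    producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // Consumed offsets commit atomically with the produced records.
                producer.sendOffsetsToTransaction(offsets, groupId);
                producer.commitTransaction();   // EndTxn -> PREPARE_COMMIT -> COMMIT markers
            } catch (KafkaException e) {
                // Simplified: fatal errors such as a fenced producer would instead
                // require closing the producer rather than aborting.
                producer.abortTransaction();    // EndTxn -> PREPARE_ABORT -> ABORT markers
            }
        }
    }
}
```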

The above is an analysis of how Kafka implements its transaction feature. The editor believes it covers knowledge points you may encounter or use in daily work, and hopes you can learn more from this article. For more details, please follow the industry information channel.
