What is the Exactly-Once Principle of Flink?
This article explains how Flink achieves end-to-end exactly-once processing: the guarantees needed on the Source side, inside Flink, and on the Sink side, and the two-phase commit protocol that ties them together.
There are three places in Flink where end-to-end exactly-once processing must be guaranteed:
Source side: when data enters Flink from the upstream stage, each message must be consumed exactly once.
Flink internal side: as we have seen, the Checkpoint mechanism saves state so that it can be restored after a failure, guaranteeing the consistency of Flink's internal state. If you are not familiar with it, you can read my earlier article:
The Cornerstone of Flink Reliability: A Detailed Analysis of the Checkpoint Mechanism
Sink side: when sending processed data downstream, the data must be delivered to the next stage exactly once.
Before Flink 1.4, exactly-once processing was limited to the inside of a Flink application: the state of all operators was saved and managed entirely by Flink. In practice, however, most results must be sent on to external systems after processing, for example via a Sink to Kafka, and Flink could not guarantee exactly-once processing across that boundary.
Flink 1.4 officially introduced a landmark feature: the two-phase commit sink, the TwoPhaseCommitSinkFunction. This SinkFunction extracts and encapsulates the common logic of the two-phase commit protocol; since then, Flink combined with a suitable Source and Sink (such as Kafka 0.11) can achieve exactly-once semantics (EOS).
End-to-End Exactly-Once Processing Semantics (EOS)
The following applies to Flink 1.4 and later
For the Source side: exactly-once processing on the Source side is relatively simple, since the data flows into Flink and Flink only needs to record the offsets of the consumed data. For example, when consuming from Kafka, Flink uses the Kafka consumer as the Source and saves its offsets. If a downstream task fails, the connector resets the offsets during recovery and re-consumes the data, guaranteeing consistency.
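To make the Source side concrete, here is a minimal sketch, assuming the flink-connector-kafka dependency is on the classpath; the broker address, topic name, group id, and checkpoint interval are illustrative. With checkpointing enabled in exactly-once mode, the consumed Kafka offsets are stored as part of the checkpointed state and restored on recovery.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ExactlyOnceSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 10 seconds in exactly-once mode; the Kafka offsets
        // become part of the checkpointed state and are reset on recovery.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // illustrative broker
        props.setProperty("group.id", "flink-eos-demo");          // illustrative group id

        FlinkKafkaConsumer<String> source =
            new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props);

        DataStream<String> stream = env.addSource(source);
        stream.print();
        env.execute("exactly-once source sketch");
    }
}
```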
For the Sink side: the Sink side is the most complex, because the data lands in an external system. Once the data leaves Flink, Flink can no longer monitor it, so exactly-once semantics must extend to the external systems that Flink writes to. These systems must therefore provide a means to commit or roll back the writes, and this must work in concert with Flink's Checkpoint mechanism (Kafka 0.11 implements this with its transaction support).
Let's take the combination of Flink and Kafka as an example: Flink reads data from Kafka, and the processed results are written back into Kafka.
We use Kafka as the example first because most Flink pipelines read and write data through Kafka, and second, most importantly, because Kafka 0.11 officially introduced transaction support, a necessary condition for a Flink application interacting with Kafka to achieve end-to-end exactly-once semantics.
Of course, Flink's exactly-once support is not limited to Kafka; any Source/Sink can be used, as long as it provides the necessary coordination mechanism.
Combining Flink and Kafka
The end-to-end guarantee rests on the classic two-phase commit (2PC) protocol between a coordinator and its participants, which proceeds as follows.
Phase 1: Voting Phase
The coordinator sends a VOTE_REQUEST message to all participants.
When a participant receives the VOTE_REQUEST message, it responds with a VOTE_COMMIT message, telling the coordinator it is ready to commit. If the participant is not ready or encounters a failure, it returns a VOTE_ABORT message instead, telling the coordinator the transaction cannot be committed at this time.
Phase 2: Commit Phase
The coordinator collects the votes from all participants. If every participant agrees that the transaction can be committed, the coordinator decides on a final commit and sends a GLOBAL_COMMIT message to all participants, notifying them to commit locally. If any participant returned VOTE_ABORT, the coordinator cancels the transaction and broadcasts a GLOBAL_ABORT message, notifying all participants to abort it.
Each participant that submitted a vote waits for the coordinator's reply. If a participant receives GLOBAL_COMMIT, it commits its local transaction; if it receives GLOBAL_ABORT, it aborts the local transaction. The decision rule is sketched below.
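The following toy illustration captures the coordinator's decision rule described above. The enum names mirror the protocol messages; this is not Flink API, just a sketch of the 2PC logic.

```java
import java.util.List;

public class TwoPhaseCommitSketch {
    enum Vote { VOTE_COMMIT, VOTE_ABORT }
    enum Decision { GLOBAL_COMMIT, GLOBAL_ABORT }

    // The coordinator commits only if every participant voted to commit;
    // a single VOTE_ABORT cancels the whole transaction.
    static Decision decide(List<Vote> votes) {
        for (Vote vote : votes) {
            if (vote == Vote.VOTE_ABORT) {
                return Decision.GLOBAL_ABORT;
            }
        }
        return Decision.GLOBAL_COMMIT;
    }

    public static void main(String[] args) {
        System.out.println(decide(List.of(Vote.VOTE_COMMIT, Vote.VOTE_COMMIT))); // GLOBAL_COMMIT
        System.out.println(decide(List.of(Vote.VOTE_COMMIT, Vote.VOTE_ABORT)));  // GLOBAL_ABORT
    }
}
```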
Application of the Two-Phase Commit Protocol in Flink
Flink's approach to two-phase commit:
Suppose we start a Flink program that consumes data from Kafka and finally sinks the processed data back to Kafka; we can analyze Flink's exactly-once processing against this pipeline.
When a checkpoint starts, the JobManager injects a checkpoint barrier into the data stream; the barrier is then passed from operator to operator, as shown below:
(Figure: checkpoint barriers and offset preservation in Flink's exactly-once processing)
Sink side: starting from the Source, every internal transform task snapshots its state into the checkpoint when it encounters a checkpoint barrier. When the data reaches the Sink, the Sink task first writes the data to the external Kafka; this write belongs to a pre-committed transaction, so it cannot be consumed yet. During this pre-commit phase, the Data Sink must pre-commit its external transaction while saving its state to the state backend. A sketch of such a sink follows.
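Here is a minimal sketch of an exactly-once Kafka sink, assuming the universal flink-connector-kafka dependency; the broker address, topic name, and timeout are illustrative. The EXACTLY_ONCE semantic makes the producer behave as a TwoPhaseCommitSinkFunction: records are written inside a Kafka transaction that is pre-committed at each checkpoint and committed once the checkpoint completes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceSinkSketch {
    static void attachSink(DataStream<String> stream) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // illustrative broker
        // The Kafka transaction timeout must cover the checkpoint interval,
        // otherwise the broker may abort pre-committed transactions.
        props.setProperty("transaction.timeout.ms", "900000");

        FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
            "output-topic", // illustrative topic
            (KafkaSerializationSchema<String>) (element, timestamp) ->
                new ProducerRecord<>("output-topic", element.getBytes(StandardCharsets.UTF_8)),
            props,
            // Records become visible to consumers only after the transaction commits.
            FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        stream.addSink(sink);
    }
}
```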
Note: checkpointing is coordinated by the JobManager and carried out by each TaskManager, and the checkpoints are written to the StateBackend. The default StateBackend is memory-level; it can be changed to file-level persistent storage, for example:
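A minimal sketch of switching to a persistent, file-based state backend, using the legacy FsStateBackend API that matches the Flink 1.x era of this article; the checkpoint path is illustrative.

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Persist checkpoints to a file system instead of the JobManager's memory,
        // so the state survives process restarts.
        env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));
    }
}
```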
Theory is best learned alongside practice, so go and try it yourself!