How Flink Implements Stateful Computation

How does Flink achieve stateful computation? Many newcomers to stream processing are unsure, so this article summarizes the problem and its solutions; I hope that after reading it you will be able to solve it yourself.
Streaming computation comes in two flavors: stateless and stateful. A stateless computation looks at each event in isolation; Storm is a stateless computing framework, in which each message has no relationship to the messages before or after it. For example, if we receive data from power-grid sensors and raise an alarm whenever a voltage exceeds 240 V, that is stateless processing. But if we need to judge several voltages together, say the three phases of a three-phase circuit, alarming only when all three voltages exceed a certain value, then state must be saved and used in the computation, because the three readings are sent as separate records.
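As a rough sketch of what this three-phase check could look like in Flink, assuming a hypothetical Reading type and Flink's KeyedProcessFunction API (all class, field, and topic names here are illustrative, not from the original article):

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Hypothetical record type: one voltage sample from one phase of a circuit
case class Reading(circuitId: String, phase: Int, voltage: Double)

class ThreePhaseAlarm(threshold: Double)
    extends KeyedProcessFunction[String, Reading, String] {

  // Keyed state: the latest voltage seen on each of the three phases
  private var latest: MapState[Int, Double] = _

  override def open(parameters: Configuration): Unit =
    latest = getRuntimeContext.getMapState(
      new MapStateDescriptor[Int, Double]("latest-voltage", classOf[Int], classOf[Double]))

  override def processElement(
      r: Reading,
      ctx: KeyedProcessFunction[String, Reading, String]#Context,
      out: Collector[String]): Unit = {
    latest.put(r.phase, r.voltage)
    // Alarm only once a reading has been seen for all three phases
    // and every one of them is above the threshold
    val allHigh = (1 to 3).forall(p => latest.contains(p) && latest.get(p) > threshold)
    if (allHigh)
      out.collect(s"circuit ${r.circuitId}: all three phase voltages above $threshold V")
  }
}

// Usage (hypothetical): readings.keyBy(_.circuitId).process(new ThreePhaseAlarm(240.0))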
Storm applications have to implement stateful computation themselves, for example with custom in-memory variables or an external system such as Redis, while still keeping latency low. Flink needs none of this: as a new generation of stream processing system, it treats state as a first-class concern.
Consistency
Consistency here is essentially the correctness of message delivery. In stream processing, it comes in three levels:
At-most-once: each record is processed at most once; data may be lost.
At-least-once: each record is processed at least once; duplicates are possible, so a record may affect the result more than once.
Exactly-once: each record affects the result exactly once; this yields the most accurate results.
The first systems to guarantee exactly-once (Storm Trident and Spark Streaming) paid a high price in performance and expressiveness. To provide the guarantee, they could not apply logic to each record individually; instead they processed multiple records at a time (in batches), ensuring that each batch either succeeded or failed as a whole. This means results only become available once a batch completes. Consequently, users often had to combine two stream-processing frameworks (one to guarantee exactly-once, the other for low-latency per-element processing), which made the infrastructure more complex.
However, Flink solves this problem.
Checkpoint mechanism
Checkpointing is one of the most valuable innovations of Flink because it enables Flink to guarantee exactly-once without sacrificing performance.
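Checkpointing is switched on via the execution environment. A minimal sketch (the 10-second interval is purely illustrative) might look like this:

import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Take a checkpoint every 10 seconds; checkpoint barriers are then
// injected into the input stream at roughly this cadence.
env.enableCheckpointing(10000L)
// Request exactly-once state guarantees (this is also the default mode).
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)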
The core function of Flink's checkpoints is to ensure that state remains correct even if the program is interrupted. With that in mind, let's walk through an example of how checkpointing works. Flink provides users with tools for defining state. For example, the following Scala program keys the input records by their first field (a string) and maintains a running count of the second field as its state.
val stream: DataStream[(String, Int)] = ... // source elided in the original

val counts: DataStream[(String, Int)] = stream
  .keyBy(record => record._1)
  .mapWithState((in: (String, Int), count: Option[Int]) =>
    count match {
      case Some(c) => ((in._1, c + in._2), Some(c + in._2))
      case None    => ((in._1, in._2), Some(in._2))
    })
The program has two operators: the keyBy operator groups the records by their first element (a string), repartitioning the data by key, and sends them on to the next operator, the stateful map operator (mapWithState). On receiving an element, the map operator adds the second field of the input record to the existing total and emits the updated pair.
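As an aside, mapWithState is a convenience of Flink's Scala DataStream API. A sketch of the same running count written against the general ValueState interface (the class name RunningCount is illustrative) might look like this:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

class RunningCount
    extends KeyedProcessFunction[String, (String, Int), (String, Int)] {

  // Keyed state: the running total for the current key, checkpointed by Flink
  private var count: ValueState[java.lang.Integer] = _

  override def open(parameters: Configuration): Unit =
    count = getRuntimeContext.getState(
      new ValueStateDescriptor[java.lang.Integer]("count", classOf[java.lang.Integer]))

  override def processElement(
      in: (String, Int),
      ctx: KeyedProcessFunction[String, (String, Int), (String, Int)]#Context,
      out: Collector[(String, Int)]): Unit = {
    // value() returns null the first time a key is seen
    val updated = Option(count.value()).map(_.intValue).getOrElse(0) + in._2
    count.update(updated)
    out.collect((in._1, updated))
  }
}

// Usage: stream.keyBy(_._1).process(new RunningCount)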
Returning to the example: suppose six records in the input stream are separated by a checkpoint barrier while all map operator states are still 0 (counting has not yet begun). All records with key a will be processed by the top map operator, all records with key b by the middle map operator, and all records with key c by the bottom map operator.
If the input stream comes from the messaging system Kafka, this separation point is simply an offset.
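For illustration, a sketch of such a Kafka source, assuming the FlinkKafkaConsumer connector, the env from the earlier sketch, and placeholder addresses and topic names:

import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092") // placeholder address
props.setProperty("group.id", "sensor-job")              // placeholder group id

// With checkpointing enabled, the consumer's partition offsets are included
// in each checkpoint, so recovery rewinds to the last checkpointed offsets.
val sensorStream = env.addSource(
  new FlinkKafkaConsumer[String]("sensor-topic", new SimpleStringSchema(), props))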
Checkpoint barriers flow between operators like ordinary records. When a map operator has processed the first three records and receives the checkpoint barrier, it writes its state to stable storage asynchronously.
When there is no failure, the overhead of Flink checkpoints is minimal, and the speed of checkpoint operations is determined by the available bandwidth of stable storage.
If the checkpoint operation fails, Flink discards the checkpoint and continues to execute normally because a subsequent checkpoint may succeed.
When a processing failure does occur, Flink redeploys the topology (possibly acquiring new execution resources), rewinds the input stream to the position of the previous checkpoint barrier, and restores the state of the map operators from the checkpoint. Processing then resumes from that point. This guarantees that after the records are reprocessed, the state of the map operators is identical to what it would have been had no failure occurred.
The official name of Flink's checkpointing algorithm is asynchronous barrier snapshotting.
Savepoints
State versioning
Checkpoints are generated automatically by Flink and are used to reprocess records after a failure, thereby correcting the state. Flink users can also deliberately manage versions of this state through another feature, called savepoints.
A savepoint works in exactly the same way as a checkpoint, except that it is triggered manually by the user, via the Flink command-line tool or the web console, rather than automatically by Flink; the user can then restart a job from the savepoint instead of starting from scratch. Another way to think of a savepoint is as a saved version of the application state at an explicit point in time.
Suppose v.0 is a running version of an application, and we trigger savepoints at times T1 and T2. We can then return to either point in time and restart the program. More importantly, we can start a modified version of the program from a savepoint: for example, after modifying the application's code (call the new version v.1), we can run the changed code from the savepoint taken at T1.
Savepoints are thus the mechanism for upgrading a Flink application: the new version can be started from a savepoint generated by the old version.
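For reference, triggering and using savepoints from the Flink command-line tool looks roughly like this (the job id, savepoint path, and jar name are placeholders):

# Trigger a savepoint for a running job (job id from `flink list`)
flink savepoint <jobId> hdfs:///flink/savepoints

# Later, start the new version of the application from that savepoint
flink run -s <savepointPath> new-version.jar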
End-to-end consistency
In this architecture, a stateful Flink application consumes data from a message queue and writes its results to an output system, where they can be queried.
Here the input data comes from Kafka; but how do we guarantee exactly-once when transferring state contents to the output storage system? This is called end-to-end consistency. There are essentially two ways to achieve it, and which one to use depends on the type of output storage system and on the application's requirements.
(1) The first method is to buffer all output in the sink, and when the sink receives a checkpoint record, to commit that output atomically to the storage system. This guarantees that the output storage system only ever contains consistent results and never duplicate data. In effect, the output storage system participates in Flink's checkpointing. For this to work, the output storage system must support atomic commits.
(2) The second approach is to write data to the output storage system eagerly, bearing in mind that this data may be "dirty" and may need to be reprocessed after a failure. If a failure occurs, the output, the input, and the Flink job must all be rolled back, overwriting the dirty data and deleting any dirty data already written to the output. Note that in many cases no deletion actually takes place: if new records simply overwrite old ones (rather than being appended to the output), the dirty data exists only briefly between two checkpoints and is eventually overwritten by the corrected data.
Depending on the type of output storage system, Flink and its corresponding connectors can work together to ensure end-to-end consistency and support multiple isolation levels.
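As a concrete instance of the first method, Flink's Kafka producer can be configured for transactional, exactly-once writes. A minimal sketch, with placeholder addresses and topic names:

import java.util.Properties
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaProducer, KafkaSerializationSchema}
import org.apache.kafka.clients.producer.ProducerRecord

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092") // placeholder address
// Kafka transactions must finish within the broker's transaction timeout,
// so the timeout is commonly raised when using EXACTLY_ONCE.
props.setProperty("transaction.timeout.ms", "600000")

// Minimal serializer: write each string as the Kafka record value
val schema = new KafkaSerializationSchema[String] {
  override def serialize(element: String, timestamp: java.lang.Long): ProducerRecord[Array[Byte], Array[Byte]] =
    new ProducerRecord[Array[Byte], Array[Byte]]("results-topic", element.getBytes("UTF-8"))
}

// EXACTLY_ONCE makes the sink write inside Kafka transactions that are
// committed only when the enclosing Flink checkpoint completes.
val kafkaSink = new FlinkKafkaProducer[String](
  "results-topic", schema, props, FlinkKafkaProducer.Semantic.EXACTLY_ONCE)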