

Introduction to Flink State Management and Fault Tolerance Mechanisms


Author: Shi Xiaogang

This article is based on a talk given at the Flink Meetup held in Beijing on August 11, 2018. It was shared by Shi Xiaogang, who works on Blink in Alibaba's big data team and is mainly responsible for the development of Blink's state management and fault tolerance technologies.

The main contents of this article are as follows:

Stateful stream data processing

Stateful interface in Flink

Implementation of state management and fault tolerance mechanism

Introduction to related work at Alibaba

I. Stateful stream data processing

1.1. What is stateful computing

The result of a computing task depends not only on its input but also on its current state. In fact, most computations are stateful. WordCount, for example, counts the occurrences of given words, which is a very common business scenario. The count is the output, and during the computation the input is continuously added to the count, so the count is a piece of state.
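
As a rough illustration of this idea (a minimal sketch, not code from the talk; the class and state names are made up), the per-word count can be kept in Flink-managed keyed state like this:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Counts occurrences per word; the running count lives in Flink-managed keyed state,
// so it is backed up by checkpoints instead of living only in user memory.
public class StatefulWordCount extends RichFlatMapFunction<String, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap(String word, Collector<Tuple2<String, Long>> out) throws Exception {
        Long current = count.value();               // null on the first occurrence of this key
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);                      // the new count becomes the state
        out.collect(Tuple2.of(word, updated));
    }
}
// Usage: words.keyBy(w -> w).flatMap(new StatefulWordCount())
```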

1.2. Traditional stream computing systems lack effective support for program state, including:

Storage and access of state data

Backup and recovery of state data

Partition and dynamic expansion of state data


In traditional batch processing, the data is divided into shards, each Task processes one shard, and when all shards have finished, their outputs are aggregated to produce the final result. In this process, the demand for state is relatively small.

For stream computing, the requirements on state are very high, because the input of a streaming system is an unbounded stream and a job may run for days or even months. During that time the state data must be managed well. Unfortunately, traditional stream computing systems offer poor support for state management. Storm, for example, provides no support for program state at all. An optional solution is Storm+HBase: the state data is stored in HBase, and during computation the state is read from HBase, updated, and written back. This approach has the following problems:

The tasks of the streaming computing system and the HBase storage may not be on the same machine, so every state access becomes a remote access over the network and storage, resulting in poor performance.

Backup and recovery are difficult, because HBase has no rollback, so it is hard to achieve exactly-once semantics. In a distributed environment, if the program fails, Storm can only be restarted, and the data in HBase cannot be rolled back to the previous state. In advertising billing scenarios, for example, Storm+HBase does not work because customers may be overcharged. An alternative is Storm+MySQL, where consistency can be achieved through MySQL rollback, but the architecture becomes very complex and performance also suffers, since a commit is required for every update to ensure data consistency.

For Storm, partitioning and dynamically scaling state data are also very difficult. An even more serious problem is that every user has to redo this work on Storm, for example in search and advertising, which limits the business development of those teams.

1.3. Flink's rich stateful access and efficient fault tolerance mechanisms

Flink was aware of this problem from the beginning of its design and provides rich state access and efficient fault tolerance mechanisms.

II. State management in Flink

2.1. According to how data is partitioned and scaled, state in Flink is roughly divided into two categories:

Keyed States

Operator States

2.1.1.Keyed States

The use of Keyed States

Flink also provides Keyed States with a variety of data structure types

Dynamic expansion of Keyed States
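
The slides for this part are not reproduced here. As a hedged sketch of typical Keyed State usage, the operator below registers several of the keyed-state structures in the public Flink API (ValueState, ListState, MapState); the class and state names are illustrative, not from the talk:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Illustrative operator showing the main keyed-state structures.
// Every state object below is automatically scoped to the current key.
public class KeyedStateExamples extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Long> lastLength;         // one value per key
    private transient ListState<String> recentEvents;      // a list per key
    private transient MapState<String, Long> countsByTag;  // a map per key

    @Override
    public void open(Configuration parameters) {
        lastLength = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastLength", Long.class));
        recentEvents = getRuntimeContext().getListState(
                new ListStateDescriptor<>("recentEvents", String.class));
        countsByTag = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("countsByTag", String.class, Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        recentEvents.add(value);
        lastLength.update((long) value.length());
        Long seen = countsByTag.get(value);
        countsByTag.put(value, seen == null ? 1L : seen + 1);
        out.collect(ctx.getCurrentKey() + ": " + value);
    }
}
```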

2.1.2.Operator State

The use of Operator States

The data structures of Operator State are not as rich as those of Keyed State; currently only a list structure is supported.

The redistribution modes of Operator States

The dynamic scaling of Operator State is very flexible. Three redistribution modes are currently available, described below:

ListState: when the parallelism changes, the list from each parallel instance is taken out, the lists are merged into a single new list, and the elements are then evenly redistributed to the new Tasks.

UnionListState: compared with ListState, it is more flexible and leaves the redistribution to the user. When the parallelism changes, the original lists are concatenated and handed to each Task in full, without being split.

BroadcastState: for example, when a large table joins a small table, the small table can be broadcast directly to the partitions of the large table, so the data on each parallel instance is exactly the same and is updated in the same way. When the parallelism changes, the data is simply copied to the new Tasks.

These are the three redistribution modes Flink provides for Operator State; users can choose according to their own needs.
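
As an illustrative sketch (not from the talk), Operator State is typically used through Flink's CheckpointedFunction interface; getListState corresponds to the ListState redistribution above, and getUnionListState to UnionListState. All names below are hypothetical:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// A source that keeps one offset per parallel instance in Operator State.
public class OffsetTrackingSource implements SourceFunction<Long>, CheckpointedFunction {

    private transient ListState<Long> offsetState;  // Operator State, list-style redistribution
    private long offset;
    private volatile boolean running = true;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // getListState = even redistribution; getUnionListState would give every Task the full list.
        offsetState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("offsets", Long.class));
        for (Long restored : offsetState.get()) {
            offset = restored;                       // restore after a failure or rescale
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        offsetState.clear();
        offsetState.add(offset);                     // what gets written into the checkpoint
    }

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(offset++);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```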

Using Checkpoint to improve the Reliability of programs

Users can enable checkpointing in the program configuration and specify a time interval; the framework then backs up the program state at that interval. When a failure occurs, Flink restores the state of all Tasks to the most recent Checkpoint and resumes execution from there.
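
A minimal configuration sketch, assuming an illustrative 60-second checkpoint interval and a simple fixed-delay restart strategy (both values are made up for the example):

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointIntervalExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Back up the state of all operators every 60 seconds.
        env.enableCheckpointing(60_000L);

        // On failure, restart up to 3 times with a 10-second delay;
        // every Task is restored from the latest completed checkpoint.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10_000L));

        // ... define the job here, then env.execute("stateful-job");
    }
}
```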

Flink also provides a variety of guarantees of correctness, including:

AT LEAST ONCE

Exactly once
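
As a hedged sketch, the guarantee level can be selected when enabling checkpointing (the interval is illustrative):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GuaranteeLevelExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // EXACTLY_ONCE aligns checkpoint barriers; AT_LEAST_ONCE skips alignment for lower latency.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);
        // or: env.enableCheckpointing(60_000L, CheckpointingMode.AT_LEAST_ONCE);
    }
}
```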

Backing up program state data saved in State

Flink also provides a mechanism that allows this state to be kept in memory; when a Checkpoint is taken, Flink persists it, and after a failure Flink itself completes the recovery.

Recover from the running state of a stopped job

When a component is upgraded, the current job needs to be stopped and then resumed from the state of the previously stopped job. Flink provides two mechanisms for resuming jobs:

Savepoint: a special kind of checkpoint. Unlike checkpoints, which are triggered periodically by the system, a savepoint is triggered by the user through a command, and its storage format is also different: the data is stored in a standard format, so that no matter how the configuration changes, Flink can restore from it. It is a very good tool for version upgrades.

External Checkpoint: an extension of the existing checkpoint mechanism; after an internal Checkpoint completes, an extra copy of the checkpoint data is stored in a user-specified directory.
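
A hedged configuration sketch: the retained (externalized) checkpoint setting below is from the public Flink API, the savepoint CLI commands in the comments are the usual way savepoints are triggered, and the paths are placeholders:

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExternalizedCheckpointExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L);

        // Keep the latest checkpoint data in an external directory (configured via
        // state.checkpoints.dir) even after the job is cancelled, so it can be resumed later.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Savepoints, in contrast, are triggered explicitly by the user, e.g. with the CLI:
        //   flink savepoint <jobId> [targetDirectory]
        //   flink run -s <savepointPath> ...
    }
}
```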

III. Implementation of state management and fault-tolerant mechanism

Let us now look at the implementation of the state management and fault tolerance mechanisms. Flink provides three different StateBackends:

MemoryStateBackend

FsStateBackend

RocksDBStateBackend

Users can choose according to their own needs. If the amount of state is small, it can be kept with MemoryStateBackend or FsStateBackend; if the amount of state is large, it is better to put it in RocksDB.
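
A hedged sketch of selecting a backend (the path is a placeholder; the RocksDB backend additionally requires the flink-statebackend-rocksdb dependency):

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendChoiceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Small to medium state: keep state on the heap at runtime and
        // write checkpoints to a file system.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        // Alternatives:
        //   new MemoryStateBackend()                      - small state, checkpoints held in JobManager memory
        //   new RocksDBStateBackend("hdfs:///flink/ckpt") - large state, kept in embedded RocksDB on local disk
    }
}
```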

HeapKeyedStateBackend and RocksDBKeyedStateBackend are introduced below.

First, HeapKeyedStateBackend

Second, RocksDBKeyedStateBackend

The execution process of Checkpoint

The execution flow of Checkpoint is implemented according to the Chandy-Lamport algorithm.

Alignment of Checkpoint Barrier

Full Checkpoint

When backing up data on each node, a full Checkpoint must traverse all the data once and write it to external storage, which affects backup performance. The following optimization builds on this.

Incremental Checkpoint of RocksDB

RocksDB first applies updates in memory, and when memory is full the data is written to disk as files. The incremental mechanism only copies newly generated files into persistent storage, while files that were already copied in previous checkpoints do not need to be copied again. This reduces the amount of data copied and improves performance.
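
As a hedged sketch, incremental checkpoints can be enabled through the RocksDBStateBackend constructor flag (the checkpoint path is a placeholder; the flink-statebackend-rocksdb dependency is required):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The second argument enables incremental checkpoints: only RocksDB files created
        // since the previous checkpoint are copied to the checkpoint storage.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
    }
}
```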

IV. Related work at Alibaba

4.1. The growth path of Flink at Alibaba

Alibaba began investigating Flink in 2015, launched the Blink project in October 2015, and made a series of optimizations and improvements to Flink for large-scale production. For Double 11 in 2016, the Blink system was adopted to provide services for search, recommendation, and advertising. In May 2017, Blink became Alibaba's real-time computing engine.

4.2. Alibaba's work on state management and fault tolerance

The work currently under way includes refactoring Window on top of State, together with related optimizations, and further improving this functionality. Follow-up work will include improvements to asynchronous Checkpoint, as well as further communication and cooperation with the community to help the Flink community improve the related work.

For more information, please visit the Apache Flink Chinese Community website.
