2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shares a sample analysis of Flink state management. I hope you get something out of it; let's go through it together.
State usage scenarios
Deduplication
For example, data from an upstream system may contain duplicates, and all duplicates should be removed before the data lands in the downstream system. To deduplicate, you need to know which records have already arrived and which have not, which means recording every primary key seen so far. When a record arrives, you check whether its primary key is already in that set.
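The primary-key bookkeeping described here can be sketched in plain Python. This is only a single-process model of the idea, not Flink code: the function name `deduplicate` and the `(key, payload)` record shape are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Iterator, Tuple


def deduplicate(
    records: Iterable[Tuple[Hashable, str]]
) -> Iterator[Tuple[Hashable, str]]:
    """Yield only the first record seen for each primary key."""
    seen_keys = set()  # the "state": every primary key that has arrived so far
    for key, payload in records:
        if key in seen_keys:
            continue  # duplicate from upstream, drop it
        seen_keys.add(key)
        yield key, payload


# Hypothetical input: records 1 and 2 arrive twice.
events = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
unique = list(deduplicate(events))
```

The set of seen keys grows with the number of distinct keys, which is why this bookkeeping has to be treated as state that is stored and recovered rather than as a throwaway local variable.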
Window calculation
For example, count how many times each Nginx log API is accessed per minute. The window fires once a minute. Before a window fires, say the window from 08:00 to 08:01, the data from its first 59 seconds must be held in memory; that is, the window's data is retained until 08:01, when the results for the whole window are output. The data of windows that have not fired yet is also state.
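A minimal sketch of the per-minute count, assuming events arrive ordered by timestamp and are given as hypothetical `(timestamp_seconds, api_path)` pairs. It models the idea in plain Python and does not use Flink's window API; `open_windows` plays the role of the untriggered window state.

```python
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 60  # one-minute tumbling windows


def count_per_minute(
    events: Iterable[Tuple[int, str]]
) -> List[Tuple[int, str, int]]:
    """Return (window_start, api_path, count) tuples in the order windows fire."""
    open_windows: Dict[int, Dict[str, int]] = {}  # state of windows not yet fired
    results: List[Tuple[int, str, int]] = []

    def fire(start: int) -> None:
        counts = open_windows.pop(start)
        results.extend((start, path, n) for path, n in sorted(counts.items()))

    for ts, api in events:
        # Fire every window whose end time has been reached by this event.
        for start in sorted(w for w in open_windows if w + WINDOW_SECONDS <= ts):
            fire(start)
        window_start = ts - ts % WINDOW_SECONDS
        counts = open_windows.setdefault(window_start, {})
        counts[api] = counts.get(api, 0) + 1

    # End of input: flush whatever is still buffered.
    for start in sorted(open_windows):
        fire(start)
    return results


sample = [(0, "/a"), (10, "/a"), (59, "/b"), (60, "/a"), (125, "/a")]
per_minute = count_per_minute(sample)
```

In this sketch a window fires only when a later event shows that its end time has passed; a real streaming engine decides when to fire with its own time and trigger mechanisms.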
Machine learning / deep learning
For example, a trained model and its current parameters are also a kind of state. Each round of machine learning may bring a new data set, which the model learns from and which feeds back into the model.
Access to historical data
For example, to compare against yesterday's data, a job needs access to historical data. Reading it from an external system every time can be expensive, so it is desirable to keep this historical data in state for the comparison.
Ideal state management
Easy to use
Flink provides rich data structures, various forms of state organization and simple extension interfaces to make state management easier to use.
High efficiency
Real-time jobs generally require low latency and fast recovery after a failure. When processing capacity is insufficient, the job should be able to scale out, and taking backups should not affect the processing performance of the job itself.
Reliable
Flink provides state persistence, including no-data-loss semantics and automatic fault tolerance such as HA: when a node dies, the job is restarted automatically without human intervention.
Flink state: Managed State & Raw State
In terms of how state is managed, Managed State is managed by the Flink Runtime: it is stored and recovered automatically, and its memory management is optimized. Raw State must be managed and serialized by the user. Flink does not know the structure of the data stored in Raw State; only the user does, and the user must ultimately serialize it into a storable form.
In terms of data structures, Managed State supports known structures such as Value, List and Map. Raw State supports only byte arrays, so all state must be converted to binary byte arrays.
In terms of recommended usage, Managed State works in most cases. Raw State is recommended only when Managed State is not sufficient, for example when implementing a custom Operator.
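To make the "byte arrays only" point concrete, here is a small sketch of what user-managed serialization looks like. The helper names and the choice of JSON as the encoding are assumptions for illustration; nothing here is a Flink API.

```python
import json
from typing import Dict


def serialize_counts(counts: Dict[str, int]) -> bytes:
    """Turn a user-defined structure into the opaque bytes a runtime would store."""
    return json.dumps(counts, sort_keys=True).encode("utf-8")


def deserialize_counts(raw: bytes) -> Dict[str, int]:
    """Rebuild the structure; only the user's code knows how to read these bytes."""
    return json.loads(raw.decode("utf-8"))


raw_state = serialize_counts({"flink": 3, "state": 1})
restored = deserialize_counts(raw_state)
```

The runtime in this model only ever sees `raw_state`, a `bytes` value with no visible structure, which is the trade-off the text describes for Raw State.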
Keyed State & Operator State
Keyed State can only be used in operators on a KeyedStream. If the program never calls keyBy, there is no KeyedStream and Keyed State cannot be used.
Operator State can be used in all operators and is often used in Sources. Because Operator State has no key, you must choose how to redistribute the state when the parallelism changes. There are two built-in redistribution methods: one splits the state evenly across instances, and the other merges all state into a full copy and hands that full copy to every instance.
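The two redistribution methods can be modelled on plain lists. This is a sketch under assumptions: the function names are made up, and the round-robin assignment in `even_split` is just one way to split evenly, not necessarily the order Flink uses.

```python
from typing import List, TypeVar

T = TypeVar("T")


def even_split(per_instance: List[List[T]], new_parallelism: int) -> List[List[T]]:
    """Pool all list entries, then deal them out evenly to the new instances."""
    all_items = [item for items in per_instance for item in items]
    return [all_items[i::new_parallelism] for i in range(new_parallelism)]


def union(per_instance: List[List[T]], new_parallelism: int) -> List[List[T]]:
    """Pool all list entries, then give every new instance the full list."""
    all_items = [item for items in per_instance for item in items]
    return [list(all_items) for _ in range(new_parallelism)]


# Hypothetical state of a source running with parallelism 2, rescaled to 3.
old_state = [["p0", "p1"], ["p2", "p3"]]
split_state = even_split(old_state, 3)
union_state = union(old_state, 3)
```

With the union method each instance receives everything and must decide for itself which entries to keep, so it costs more memory during recovery than the even split.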
Keyed State is accessed through the RuntimeContext, which requires the operator to be a Rich Function. To use Operator State, the operator must implement the CheckpointedFunction or ListCheckpointed interface itself. In terms of data structures, Keyed State supports ValueState, ListState, ReducingState, AggregatingState and MapState, while Operator State supports relatively few, such as ListState.
Commonly used Keyed State
ValueState stores a single value. In word count, for example, the word is the key and the state is its count. The single value may be a number or a string. As a single value it has two access interfaces, get and set, which appear on the state as T value() and update(T).
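The word-count case can be modelled with a toy single-value state per key. The `ValueState` class below is a stand-in written for this sketch, with `value()` and `update()` named after the interface mentioned in the text; it is not the Flink class, and the dictionary of states stands in for the per-key scoping that a keyed stream would provide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class ValueState:
    """Toy single-value state: read with value(), overwrite with update()."""

    def __init__(self) -> None:
        self._value: Optional[int] = None

    def value(self) -> Optional[int]:
        return self._value

    def update(self, new_value: int) -> None:
        self._value = new_value


def word_count(words: Iterable[str]) -> Dict[str, int]:
    states: Dict[str, ValueState] = defaultdict(ValueState)  # one state per key
    for word in words:
        state = states[word]
        current = state.value() or 0  # an unset state reads as None
        state.update(current + 1)
    return {word: state.value() for word, state in states.items()}


counts = word_count(["flink", "state", "flink"])
```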
MapState has a Map as its state data type, with methods such as put and remove. Note that the key in MapState is not the same as the key in Keyed State.
ListState has a List as its state data type, with access interfaces such as add and update.
ReducingState and AggregatingState share a parent type with ListState, but their state data type is a single value: instead of appending the new element to a list, the add method folds it directly into the reduced result.
The difference between the two is in the access interface. In ReducingState, add(T) and T get() take and return the same element type, whereas in AggregatingState the input type is IN and the output type is OUT.
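The type difference is easiest to see side by side. The two classes below are toy models written for this sketch, not Flink classes: the reducing one keeps a running sum (int in, int out), while the aggregating one keeps a hypothetical `(sum, count)` accumulator and returns an average (int in, float out).

```python
from typing import Any, Callable


class ReducingState:
    """add(T) folds the element into a single result; get() returns the same type T."""

    def __init__(self, reduce_fn: Callable[[Any, Any], Any]) -> None:
        self._reduce_fn = reduce_fn
        self._has_value = False
        self._result: Any = None

    def add(self, value: Any) -> None:
        if self._has_value:
            self._result = self._reduce_fn(self._result, value)
        else:
            self._result, self._has_value = value, True

    def get(self) -> Any:
        return self._result


class AggregatingState:
    """add(IN) updates an accumulator; get() derives an OUT value from it."""

    def __init__(self, create_accumulator, add_fn, get_result) -> None:
        self._acc = create_accumulator()
        self._add_fn = add_fn
        self._get_result = get_result

    def add(self, value: Any) -> None:
        self._acc = self._add_fn(value, self._acc)

    def get(self) -> Any:
        return self._get_result(self._acc)


total = ReducingState(lambda a, b: a + b)
average = AggregatingState(
    create_accumulator=lambda: (0, 0),  # (sum, count)
    add_fn=lambda value, acc: (acc[0] + value, acc[1] + 1),
    get_result=lambda acc: acc[0] / acc[1] if acc[1] else None,
)
for n in (2, 4, 9):
    total.add(n)
    average.add(n)
```

Neither state stores the individual elements, which is what distinguishes both from a list-typed state.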
State saving and recovery
Flink state is preserved mainly through the Checkpoint mechanism. Checkpoints periodically take distributed snapshots that back up the state in the program.
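The save-and-recover cycle can be sketched in a single process. This is a deliberately simplified model: real checkpoints are distributed and time-driven, whereas the hypothetical `CountingJob` below snapshots after every `checkpoint_interval` records and keeps the snapshot in a local attribute.

```python
import copy
from typing import Dict, Tuple


class CountingJob:
    """Counts words and periodically snapshots (state, input position)."""

    def __init__(self, checkpoint_interval: int) -> None:
        self.counts: Dict[str, int] = {}
        self.processed = 0  # how many input records have been consumed
        self._interval = checkpoint_interval
        self._snapshot: Tuple[Dict[str, int], int] = ({}, 0)

    def process(self, word: str) -> None:
        self.counts[word] = self.counts.get(word, 0) + 1
        self.processed += 1
        if self.processed % self._interval == 0:
            # Copy the state so later updates cannot change the snapshot.
            self._snapshot = (copy.deepcopy(self.counts), self.processed)

    def restore(self) -> int:
        """Roll back to the last snapshot and return the position to resume from."""
        counts, processed = self._snapshot
        self.counts = copy.deepcopy(counts)
        self.processed = processed
        return processed


words = ["a", "b", "a", "c", "a"]
job = CountingJob(checkpoint_interval=2)
for w in words:
    job.process(w)

resume_from = job.restore()          # simulate a failure after the fifth record
counts_after_restore = dict(job.counts)
for w in words[resume_from:]:        # replay the input that came after the snapshot
    job.process(w)
```

Storing the input position together with the state is what lets the replay pick up exactly where the snapshot left off, so no record is counted twice or skipped.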
MemoryStateBackend
The first kind of Checkpoint storage is memory storage, that is, MemoryStateBackend. Its constructor takes the maximum state size and a flag for whether to take snapshots asynchronously. The state itself is stored in the memory of the TaskManager node, that is, the execution node. Because memory is limited, maxStateSize for a single state defaults to 5 MB, and it needs to be watched.
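The effect of a size cap on in-memory snapshots can be sketched as follows. Only the 5 MB default comes from the text; the function and exception names are hypothetical, `pickle` is an arbitrary choice of serializer, and rejecting an oversized snapshot (rather than truncating it) is an assumption of this sketch.

```python
import pickle
from typing import Any

DEFAULT_MAX_STATE_SIZE = 5 * 1024 * 1024  # 5 MB, the default mentioned in the text


class StateTooLargeError(Exception):
    """Raised when a serialized state exceeds the configured maximum."""


def snapshot_to_memory(state: Any, max_state_size: int = DEFAULT_MAX_STATE_SIZE) -> bytes:
    """Serialize a state for in-memory storage, refusing anything over the cap."""
    data = pickle.dumps(state)
    if len(data) > max_state_size:
        raise StateTooLargeError(
            f"state is {len(data)} bytes, limit is {max_state_size} bytes"
        )
    return data


small_snapshot = snapshot_to_memory({"flink": 3})
```

A state that fits comfortably today can cross the cap as keys accumulate, which is one reason the text says maxStateSize needs attention.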