In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
A StateBackEnd consists of the following parts:
1. CheckpointStreamFactory construct flow is used to write Checkpoint data
Different StateBackEnd will have different implementations, returning different CheckpointStateOutputStream implementations, such as FsStateBackEnd will construct a file stream, and MemoryStateBackEnd will construct ByteArraOutputStream
CheckpointStateOutputStream is included as an IO proxy in KeyedStateCheckpointOutputStream and OperatorStateCheckpointOutputStream.
KeyedStateCheckpointOutputStream and OperatorStateCheckpointOutputStream need to record additional state, respectively. KeyedStateCheckpointOutputStream needs to record the starting position of each keyGroup in the stream, OperatorStateCheckpointOutputStream needs to record the starting position of each partition in the stream, and this information will be reflected in the corresponding StreamStateHandle.
CheckpointStateOutputStream defines closeAndGetHandle method returns an implementation of StreamStateHandle, this handle will be serialized and passed to JobManager, JobManager will save the handle as part of the snapshot, then when restoring data, you can get InputStream reading data through the handle in reverse.
See AbstractStreamOperator.snapshotState for details
InternalTimerServiceSerializationProxy.write -> HeapInternalTimerService.snapshotTimersForKeyGroup
KeyedStateBackEnd.snapshot OperatorStateBackEnd.snapshot
2.KeyedStateBackEnd
KeyedStateBackEnd is created when StreamTask is created, so a Task corresponds to a KeyedStateBackEnd.
KeyedStateBackEnd defines how to register and generate various States including ListState, MapState, ValueState, Aggregating State, FoldingState, Reducing State
KeyedStateBackEnd currently has two implementations: HeapKeyedStateBackend and RocksDBKeyedStateBackend. HeapKeyedStateBackend stores the state in an internal StateTable, and each State name corresponds to an Entry StateTable in the StateTable, which contains ternary information: Key, Namespace, Value. Key and Value are easy to understand, Namespace currently seems to be only used for Window operators, recording the current Window information, if there is no Window will give a default namespace (VoidNamespace.INSTANCE). RocksDBKeyedStateBackend generates a RocksDB column family based on StateDescription, and then reads and writes directly to RocksDB at each State get/set *
Asynchronous Snapshot: HeapKeyedStateBackend and RocksDBKeyedStateBackend both support asynchronous Snapshot, which is to start a separate thread to write State data to CheckpointStateOutputStream. However, there are requirements for data structure, because the state table itself may continue to change during the snapshot process. So you need to take a snapshot of the data at the beginning of the snapshot. HeapKeyedStateBackend internally uses CopyOnWriteStateTable to ensure thread safety, so that the data of the data snapshot will not corrupt. RocksDBKeyedStateBackend is similar in idea. Snapshot starts by calling RocksDB.snapshot, and then asynchronously writes State data to CheckpointStateOutputStream via threads.
Increment State Snapshot: RocksDBKeyedStateBackend Unique features. Refer to RocksDBIncrementalSnapshotOperation for specific implementation. Here's a quick comparison between RocksDB Full SnapshotOperation and RocksDB IncrementalSnapshotOperation. RocksDBFullSnapshotOperation reads all the KV data in Snapshot in its entirety and writes out all the kvMetadata and kvData to the stream. StateHandle returned is KeyGroupsStateHandle, consistent with HeapKedStateBackend. RocksDB IncrementalSnapshotOperation iterates through all files in the RocksDB checkpoint directory. Each time a Checkpoint is made, RocksDBKeyedStateBackend records the RocksDB ssd file corresponding to the current CheckPointId. In this way, when doing a new Checkpoint, you can compare files to obtain whether there is a new data file. The original data file does not need to be written, but directly returns a PlaceholderStreamStateHandle. Checkpoint does not traverse the KV to write, but directly to the stream to write the RocksDB data file data. StateHandle returned is IncrementalKeyedStateHandle which contains a handle to a set of RocksDB data files.
The data recovery process also needs to distinguish between full/incremental. Corresponding to RocksDBFullRestoreOperation and RocksDBIncrementalRestoreOperation respectively
3.OperatorStateBackEnd
OperatorState. Currently there is only one implementation: DefaultOperatorStateBackend. Construct a PartitionableListState (of ListState). This is an implementation of In Memory. Add operation appends to
A List of memory. Snapshot process is similar to KeyedStateBackEnd, so I won't repeat it here.
StateBackend class structure:
State recovery process:
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.