In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly explains "what is the role of windows in flink". The content of the explanation is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "what is the role of windows in flink".
Window
Window computing is one of the commonly used data calculation methods in streaming computing. By dividing the data stream into different windows according to a fixed time or length, and then aggregating the data, the statistical results within a certain time range are obtained. for example, counting the number of clicks on a website in the last 5 minutes, at this time, the click data is constantly generated, and the data is limited to a fixed time range through the 5-minute window. You can aggregate the bounded data in this range to get the number of site hits in the last 5 minutes.
Code interface rules
Stream.keyBy (...) / / keyed type dataset. Window (...) / specify window allocator type [.trigger (...)] / / specify trigger type (optional) [.evictor (...)] / / specify evictor (optional) [.allowedLateness (...)] / / specify whether to delay data processing (optional) [.sideOutputLateData (...)] / / specify Output Lag (optional) .reduce/aggregate/fold/apply () / / specify the window calculation function [.getSideOutput (...)] / / output data according to Tag (optional)
Operator
Windows Assigner: specifies the window type that defines how data flows are assigned to one or more windows
Windows Trigger: specify the time when the window is triggered, and define what conditions the window meets to trigger calculation
Evictor: for data culling
Lateness: marks whether the late data is processed and whether the calculation is triggered when the late data arrives in the window.
Output Tag: the tag outputs the tag, and then the data in the window is output according to the tag through getSideOutput.
Windows Function: defines the logic of data processing on the window, such as sum operations on the data.
Keyed and Non-Keyed window
When calculating using the window, Flink will have different Window Assigner according to whether the upstream dataset is of KeyedStream type (partitioning the dataset by Key).
If the upstream dataset is of type KeyedStream, call the Windwo () method of DataStream API to specify Windows Assigner, and the data will be calculated separately in parallel in different Task instances according to Key, and finally the statistical results for each Key will be obtained.
If it is of type Non-Keyed, the WindowsAll () method is called to specify the Windows Assigner, and all the data is routed into a Task by the window operator to calculate and get the result.
It is recommended that the data be processed by KeyedStream, so as to start parallel computing and speed up efficiency.
Window Assigner
Flink supports two types of windows, one is time-based, the window size is constrained by the start and end timestamps, and the other is based on the number and defines the window size according to a fixed number.
According to the different ways of Windows Assigner data distribution, Windows is divided into four categories: scroll window (Tumbling Windows), sliding window (Sliding Windows), session window (Session Windows) and global window (Global Windows).
Scroll window
The rolling window is divided according to the fixed time or size, and the elements between the window and the window do not overlap each other, so it is suitable for the window calculation of fixed time size and period statistics of a certain index.
DataStream API provides Tumbling windows based on EventTime and ProcessTime time types, the corresponding Assigner is TumblingEventTimeWindows and TumblingProcessTimeWindows, the window size is specified by child labor of (), the time unit is Time.milliseconds (x), Time.seconds (x) or Time.minutes (x), or it can be a combination of different time units.
In the following example, the window time is divided according to 10s, and the window time is an equal fixed time range from [1lav 00rig 00.000-1RV 00RV 09.999] to [1RV 00R 10.000-1R 00R 19.999].
Val inputStream:DataStream [T] =. / / define EventTime Tumbling Windowsval tumblingEventTimeWindows=inputStream.keyBy (_ .id) / / define EventTime scroll window by using TumblingEventTimeWindows. Window (TumblingEventTimeWindows.of (Time.seconds (10) .process (...) / define window function / / define ProcessTime Tumbling Windowsval tumblingProcessingTimeWindows = inputStream.keyBy (_ .id) / / define Evnet Time scroll window through TumblingProcessTimeWindows. Window (window (Times.seconds (10)). Process (...) / / define window function sliding window
Sliding window is a common type of window, which is characterized by increasing window sliding time (Slide Time) on the basis of scrolling window and allowing window data to overlap. This kind of window does not move forward according to the Windows Size like the scroll window, but slides forward according to the set Slide Time. The size of data overlap between windows is determined by WindowsSize and Slide time. Window overlap occurs when Slide Time is less than WindowsSize, and when Slide Size is greater than WindowsSize, windows are discontinuous, and data may not be calculated in any window.
DataStream API is based on different time types of Assigner for Sliding Windows, including EventTime-based SlidingEventTimeWindows and Process Time-based SlidingProcessingTimeWindows.
The example is as follows, specifying Windows Size as 1h Slide Time as 10m.
Val inputStream:DataStream [T] =. / / define EventTime Sliding Windowsval slidingEventTimeWindows=inputStream.keyBy (_ .id) / / define EventTime scroll window by using SlidingEventTimeWindows. Window (SlidingEventTimeWindows.of (Time.hours (1), Time.minutes (10) .process (...) / define window function / / define ProcessTime Sliding Windowsval slidingProcessTimeWindows = inputStream.keyBy (_ .id) / / define Evnet Time scroll window through SlidingProcessTimeWindows. Window (SlidingProcessTimeWindows.of (Time.hours (1)) Time.minutes (10)) .process (...) / / define window function session window
Aggregate the highly active data in a certain period of time into a window for calculation. The trigger condition of the window is Session Gap, which means that if there is no active access to the data within the specified time, the task window ends and the window calculation is triggered.
Note: if the data is uninterrupted, it will cause the window to remain untriggered.
Unlike sliding and scrolling windows, Session Windows does not need to define Windows Size and Slide Time, but only needs to define session gap to specify the time for inactive data to go online.
Session Windows is suitable for scenarios where non-continuous data processing or periodic data generation occurs. You can create SessionWindows based on EventTime and ProcessTime in DataStream API, corresponding to Assigner EventTimeSessionWindow and ProcessTimerSessionWindows, respectively.
The example code is as follows:
Val inputStream:DataStream [T] =. / / define EventTimeSession Windowsval eventTimeSessionWindows=inputStream.keyBy (_ .id) / / define EventTime scroll window by using EventTimeSessionWindows. Window (EventTimeSessionWindows.withGap (Time.milliseconds (10) .process (...) / define window function / / define ProcessTimeSession Windowsval processTimeSessionWindows = inputStream.keyBy (_ .id) / / define Evnet Time scroll window through ProcessTimeSessionWindows. Window (window (Time.milliseconds (10)). Process (...) / / define window function
Flink supports dynamically adjusted Session Gap. You need to implement the SessionWindowTimeGapExtractor interface, copy the extract method, complete the extraction of Session Gap, and then pass the created Session Gap extractor into the ProcessiongTimeSessionWindows.withDynamicGap () method.
Val inputStream:DataStream [T] =. / / define EventTimeSession Windowsval eventTimeSessionWindows=inputStream.keyBy (_ .id) / / define the EventTime scroll window by using EventTimeSessionWindows (EventTimeSessionWindows.withDynamicGap (/ / instantiate SessionWindowTimeGapExtractor interface new SessionWindowTimeGapExtractor [String] {override def extract (element:String): Long= {/ / dynamically specify and return Session Gap}}) .process (.) / / set Semantic window function / / define ProcessTimeSession Windowsval processTimeSessionWindows = inputStream.keyBy (_ .id) / / define Evnet Time scroll window through ProcessTimeSessionWindows (ProcessTimeSessionWindows.withDynamicGap (/ / instantiate SessionWindowTimeGapExtractor interface new SessionWindowTimeGapExtractor [String] {override def extract (element:String): Long= {/ / dynamically specify and return Session Gap}}) .process (.) / / define window function global window
The global session window allocates all the same key data to a single window for calculation. The window has no start and end time, and the window needs to trigger the calculation with the help of Triger. If not specified, the calculation will not be triggered.
Use the global window very carefully, you must be clear about what your statistics are in the entire window, and specify the corresponding trigger and the corresponding data cleaning mechanism, otherwise the data will remain in memory all the time.
Val inputStream:DataStream [T] =... val globalWindows = inputStream.keyBy (_ .id) .window (GlobalWindows.create ()) / / define Global Windows.process () summary through GlobalWindows
The four windows defined by flink are easy to be confused with time window and event window. They are window definitions of different dimensions and require special attention.
The older you get, the lonelier you get, and cherish the good people around you.
Thank you for your reading, the above is the content of "what is the role of windows in flink". After the study of this article, I believe you have a deeper understanding of what the role of windows in flink is, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.