Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Introduction and usage of flink1.2 version time and water level line

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article mainly explains "the introduction and usage of flink1.2 version time, water level line". Interested friends may wish to have a look at it. The method introduced in this paper is simple, fast and practical. Now let the editor to take you to learn the "flink1.2 version time, water level line introduction and usage" bar!

Water level line

Water level line is a mechanism of flink to deal with delay data, which is mainly fault-tolerant for delay data in set time. The essence of water level line is timestamp, and the calculation formula is: the maximum time of current event-data delay time. (a little confused after watching it several times)

Personal understanding:

The watermark is the logical time note for receiving the data and the basis for dealing with the delay data. The delay data correction is realized by Timestamps with the generation time of the data itself.

Watermarks in Category sequence events

Ideally, the watermark, that is, the event events of the data elements are ordered, and the Watermark timestamp is generated with the event time installation order of the data elements, where the watermark time and time time are consistent.

Watermarks in disordered events

In reality, data elements often do not access Flink according to their production order, but frequently deal with disorder or tardiness, which requires watermark to deal with. When event 8 and event 11 enter the system at the same time, the flink system will calculate their watermark according to the set delay value. When the two events arrive in an operator, the virtual time of the matching event time matches the watermark and triggers the calculation of the response.

Watermarks in parallel data flow

Watermark is generated in Source Operator and independently in the child Task of each Operator.

If a watermark updates the current event time of an operator Task at the same time, Flink selects the minimum watermark to update. When the water mark in a Window operator Task is greater than the end time of Window, the window calculation is triggered immediately.

Time concept

The most important feature of streaming processing is that the data has the attribute of time. According to the location where time is generated, Flink divides time into three concepts: event generation time (Event Time), event access time (Ingestion TIme) and event processing time (Processing Time).

Event generation time: the time consumed by the process in which data is generated from a terminal or system.

Data access time: the time when the data is connected to DataSource.

Event processing time: the host time obtained during processing.

Event Time

Timestamps and Watermark exist in pairs, and when used, both must be specified

Watermark

Watermark sets the default 200ms for Watermark in Flink to generate once, or you can specify it manually. The code is as follows:

/ 1. Create flink running environment StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment (); env.setParallelism (3); / / set parallelism env.setRuntimeMode (RuntimeExecutionMode.AUTOMATIC); / / processing mode setting: the time interval between streams or batches / / generating watermark (per n milliseconds), and set the time interval for periodically generating water level lines. When the data flow is large, if each event produces a watermark, it will affect performance. / / env.getConfig () .setAutoWatermarkInterval (1000); / / automatic watermarking interval is not set for version 12. Timestamps is specified by default.

Here, take the scrolling window as an example. The next time the knowledge is shared in the window, the data is first institutionalized. The data structure is "yyyy-MM-dd HH:mm:ss | type | num". The processing code is as follows:

SingleOutputStreamOperator formatData = text.map (new MapFunction () {/ / data format conversion private static final long serialVersionUID = 1L; @ Override public Tuple3 map (String value) throws Exception {Tuple3 data = new Tuple3 (); String [] dataTmp = value.split ("\\ |"); data.f0 = dataTmp [0]; data.f1 = dataTmp [1]; data.f2 = Integer.parseInt (dataTmp [2]); return data;}})

Set Timestamps and maximum delay

SingleOutputStreamOperator orderDSWithWatemark=formatData .assignTimestampsAndWatermarks (/ / set watermark watemark = maximum event time-maximum delay or disorder time WatermarkStrategy.forBoundedOutOfOrderness (Duration.ofSeconds (3)) / specify maximum maxOutOfOrderness disorder time, that is, maximum delay time / disorder time. WithTimestampAssigner ((data,timestamp)-> Long.parseLong (DateUtil.dateToUTC (Watermarks)) * 1000) / / time is in milliseconds)

Set window size and processing logic

SingleOutputStreamOperator result=orderDSWithWatemark.keyBy (one-> one.f1) .window (TumblingEventTimeWindows.of (Time.seconds (10)) / / set window size / / .allowedLateness (Time.seconds (1)) / / delay processing time / / .sideOutputL ateData (lateOutputTag) / / side output .reduce (new ReduceFunction () {/ / processing logic private static final long serialVersionUID =-6695049408336015245L @ Override public Tuple3 reduce (Tuple3 value1, Tuple3 value2) throws Exception {Tuple3 data = new Tuple3 (); data.f0 = value2.f0; data.f1 = value1.f1; data.f2 = value1.f2 + value2.f2; System.out.println (data); return data;}}); result.print (Scroll event time); env.execute (); Summary

Time and water level line are difficult to understand and important concepts in flink, I also have little knowledge, and then slowly deepen in the process of using, the basic logic is to establish their own time tag for the data, and complete the collection, calculation and output of the data within the event through the time range (window) and data delay, so as to complete a more accurate real-time event data calculation.

Technology is a presentation of requirements, the basic nature of overlap, programming language, technical framework are, the most important details of the optimization and the overall use of simple, stable and powerful functions.

At this point, I believe you have a deeper understanding of "flink1.2 version time, introduction and usage of water level line". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report