How to understand watermark

2025-02-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

In this issue, the editor explains how to understand watermarks. The article approaches the topic from a practical, professional point of view, and I hope you take something useful away from it.

Members of our community and WeChat groups often say that they do not understand how watermarks work, so today the editor is posting a detailed explanation.

Let us start with window-based computation. Both the window size (size) and the sliding interval (slide) are defined along the time dimension. Spark Streaming, for example, works on processing time, that is, the local clock of the machine running the task. When data is processed by this clock, the system has no way of telling whether events are out of order in the time dimension or whether they arrived late. So if we want to preserve the order of the data and handle late data, processing time cannot be used.
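To make size and slide concrete, here is a minimal Python sketch of how a timestamp can be mapped to the sliding windows that contain it. The function name `assign_sliding_windows` and the plain integer time units are assumptions made for illustration; this is not the code of Spark Streaming or Flink. The same function applies whether the timestamp comes from the local machine clock or from the data itself; only the source of the timestamp differs.

```python
def assign_sliding_windows(timestamp, size, slide):
    """Return every [start, end) window that contains `timestamp`.

    Windows are `size` long and a new one starts every `slide` units,
    aligned to multiples of `slide` (an assumption of this sketch).
    """
    # Start of the latest window that begins at or before the timestamp.
    last_start = timestamp - (timestamp % slide)
    windows = []
    start = last_start
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# An event stamped 12 with size=10 and slide=5 falls into two windows.
print(assign_sliding_windows(12, size=10, slide=5))  # [(5, 15), (10, 20)]
```

When size equals slide, each timestamp lands in exactly one window, which is the non-overlapping (tumbling) case.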

Fortunately, data is usually stamped with a collection time when it is gathered. If the window size and the sliding interval are based on that timestamp, we can tell whether events are out of order or late in the time dimension. This is where the watermark comes in. It serves two purposes: it drives the stream computation forward, and it is the basis for handling late data, that is, it defines how much lateness is tolerated.
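The two roles of the watermark can be illustrated with a small simulation. The sketch below assumes one common strategy, in which the watermark trails the largest timestamp seen so far by a fixed tolerated delay. The class `BoundedDelayWatermark` and the helper `split_late` are hypothetical names introduced here, not part of any framework API.

```python
class BoundedDelayWatermark:
    """Watermark = largest event timestamp seen so far minus a fixed delay."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_timestamp = float("-inf")

    def observe(self, timestamp):
        # The watermark only moves forward: it follows the maximum timestamp.
        self.max_timestamp = max(self.max_timestamp, timestamp)

    def current(self):
        return self.max_timestamp - self.max_delay


def split_late(timestamps, max_delay):
    """Classify events as on time or late, in arrival order."""
    wm = BoundedDelayWatermark(max_delay)
    on_time, late = [], []
    for ts in timestamps:
        # An event is late if the watermark has already reached its timestamp.
        if ts <= wm.current():
            late.append(ts)
        else:
            on_time.append(ts)
        wm.observe(ts)
    return on_time, late, wm.current()


# Out-of-order arrivals with a tolerated delay of 3 time units.
on_time, late, watermark = split_late([1, 4, 2, 9, 3, 10], max_delay=3)
print(on_time, late, watermark)  # [1, 4, 2, 9, 10] [3] 7
```

The event stamped 2 is out of order but still within the tolerated delay, while the event stamped 3 arrives after the watermark has already reached 6 and is therefore treated as late.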

1. Concepts of time

A streaming program works with three notions of time:

Processing time

Ingestion time

Event time

Ingestion time can be regarded as a special form of event time. Note, however, that ingestion time cannot handle out-of-order or late events, so the watermark mechanism naturally cannot be used to deal with them.

2. Event time and watermark

A stream processor that supports event time needs a way to measure the progress of event time. For example, a window operator that builds one-hour windows needs to be notified when event time has passed the end of the hour, so that the operator can close the window in progress.
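The one-hour window case can be sketched as a small Python simulation. It assumes the watermark is the largest timestamp seen so far minus a fixed delay, and that a window closes once the watermark reaches its last millisecond. The function `run_hourly_windows` and its return values are hypothetical and only illustrate the idea; they are not a framework API.

```python
HOUR_MS = 3_600_000
MIN_MS = 60_000


def run_hourly_windows(events, max_delay_ms):
    """Group (timestamp_ms, value) events into one-hour event-time windows.

    Returns (fired, dropped, still_open):
      fired      - (window_start, values) pairs in the order windows closed
      dropped    - events that arrived after their window had closed
      still_open - windows the watermark has not passed yet
    """
    open_windows = {}
    fired, dropped = [], []
    max_ts = float("-inf")
    watermark = float("-inf")
    for ts, value in events:
        start = ts - ts % HOUR_MS
        if start + HOUR_MS - 1 <= watermark:
            # The window for this event has already been closed.
            dropped.append((ts, value))
        else:
            open_windows.setdefault(start, []).append(value)
        # Advance the watermark, then close every window it has passed.
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_delay_ms
        for s in sorted(open_windows):
            if s + HOUR_MS - 1 <= watermark:
                fired.append((s, open_windows.pop(s)))
    return fired, dropped, open_windows


events = [
    (5 * MIN_MS, "a"), (50 * MIN_MS, "b"), (62 * MIN_MS, "c"),
    (58 * MIN_MS, "d"), (66 * MIN_MS, "e"), (30 * MIN_MS, "f"),
]
fired, dropped, still_open = run_hourly_windows(events, max_delay_ms=5 * MIN_MS)
print(fired)       # [(0, ['a', 'b', 'd'])]
print(dropped)     # [(1800000, 'f')]
print(still_open)  # {3600000: ['c', 'e']}
```

The first-hour window does not close when the first event of the second hour arrives, but only when the watermark (66 minutes minus the 5-minute delay) passes the end of the hour. That is why "d" is still counted and "f" is not.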

Event time can advance independently of processing time. For example, in one program the current event time of an operator may lag slightly behind the processing time (because of the delay in receiving events). In another, a streaming program may take only a few seconds to process data spanning several weeks of event time, for instance by fast-forwarding through historical data that is already buffered in a Kafka topic (or another message queue).

Flink uses watermarks to measure the progress of event time. A watermark flows as part of the data stream and carries a timestamp t. A Watermark(t) declares that event time has reached t in that stream, which means that no more elements with a timestamp t1 <= t should arrive.
