
What is the multi-partition watermark mechanism of kafka?

2025-02-23 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report --

Today, I will talk to you about the multi-partition watermark mechanism of Kafka, which many people do not understand well. To help you understand it better, the editor has summarized the following for you; I hope you get something out of this article.

Watermarks depend on one background concept: event time. You must understand event time before watermarks make sense.

A watermark is best pictured as a heartbeat in the time series that drives the stream forward; it is not an allowed-lateness delay such as "6 s" -- that by itself is not a watermark.

A watermark acts like a heartbeat that drives processing based on event time, and it behaves like an ordered heartbeat because it carries a timestamp t. Watermark(t) declares that the operator's current event time has reached t and that no more events with a timestamp less than t will arrive; any event whose timestamp is below t is considered late and should be discarded.
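The late/on-time distinction described above is just a comparison against the current watermark. A minimal standalone illustration (the class and method names here are made up for this sketch, not Flink API):

```java
public class WatermarkSemantics {
    // Watermark(t) asserts that event time has reached t: any record whose
    // timestamp is strictly below t arrived too late and may be discarded.
    static boolean isLate(long eventTimestamp, long currentWatermark) {
        return eventTimestamp < currentWatermark;
    }
}
```

An event whose timestamp equals the watermark is still on time; only strictly smaller timestamps count as late.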

The above describes a stream with a single parallelism: there is only one pipeline, so there is not much room for complications. Watermarks are generated at the source (or at some other operator) of each parallel instance and then flow downstream. If the streaming program has no shuffle, there is nothing more to say -- each parallel instance handles its watermarks independently. This is best understood together with the previous article:

Talk about Flink's runtime in combination with Spark

If there is a shuffle, that is, if an operator receives multiple input streams, its current event time is the minimum of the watermarks received across those inputs.
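That minimum rule can be sketched in plain Java (an illustration of the merge behavior, not Flink's actual internal code):

```java
import java.util.Arrays;

public class WatermarkMerge {
    // An operator with several input channels can only advance its event time
    // to the smallest watermark seen across all channels, because an earlier
    // event may still arrive on the slowest channel.
    static long combinedWatermark(long[] inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }
}
```

So one slow input channel holds back the event time of the whole downstream operator.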

Kafka source

When Kafka is used as the data source and you consume multiple topics or multiple partitions, the per-partition ordering of the data is broken, because the client consumes partitions in parallel -- unless each consumer corresponds to exactly one partition. For this case you can use the Kafka-partition-aware watermark generator: it generates a watermark for each partition inside each Kafka consumer, and the per-partition watermarks are then merged the same way watermarks are merged after a stream shuffle.

Code example

FlinkKafkaConsumer09<MyType> kafkaSource =
        new FlinkKafkaConsumer09<>("myTopic", schema, props);

// Assigning the extractor on the consumer itself (rather than on the
// DataStream) enables per-partition, partition-aware watermark generation.
kafkaSource.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyType>() {
    @Override
    public long extractAscendingTimestamp(MyType element) {
        return element.eventTimestamp();
    }
});

DataStream<MyType> stream = env.addSource(kafkaSource);
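What the partition-aware generator does internally can be pictured with this standalone sketch (plain Java, no Flink dependency; the class and method names are hypothetical): track the highest timestamp seen per partition, and report the minimum across partitions as the consumer's watermark.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionAwareWatermarks {
    // Highest event timestamp observed so far in each Kafka partition.
    private final Map<Integer, Long> perPartitionMax = new HashMap<>();

    // Called for every consumed record: advance that partition's clock.
    void onRecord(int partition, long timestamp) {
        perPartitionMax.merge(partition, timestamp, Math::max);
    }

    // The consumer's watermark is the minimum over all partition clocks,
    // mirroring the shuffle merge rule described earlier.
    long currentWatermark() {
        return perPartitionMax.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }
}
```

As with the shuffle case, one partition that lags (or is idle) holds back the watermark of the whole consumer.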

After reading the above, do you have a better understanding of Kafka's multi-partition watermark mechanism? If you want to learn more, please follow the industry information channel. Thank you for your support.
