What should you learn about Flink's DataStream API? (04/26 Update, SLTechnology News&Howtos)


2025-04-26 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01 report

This article explains in detail what you should learn about Flink's DataStream API. I think it is very practical, so I am sharing it here for reference, and I hope you get something out of it.

Flink is one of the most popular stream processing engines today, so it is worth learning well. Many people ask whether Flink will surpass Spark. I don't think it will in the short term: Spark remains very efficient and reliable for batch processing. As big data developers, we don't have to choose between Spark and Flink; we should master both. So please don't ask me (Langjian) whether to skip Spark and learn Flink directly. It makes little sense to do so.

Flink's processing models fall into two categories: event-driven processing and time-based processing. Time-based processing is further divided by time semantics into event time and processing time (ingestion time is a special case of event time, where the timestamp is assigned when a record enters Flink).
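Here is a minimal plain-Python sketch that makes the difference between the two time semantics concrete. It is a model, not Flink API code. It assigns the same three out-of-order records to 10-second tumbling windows twice: once by the event timestamp carried in each record, and once by the record's arrival (processing) time. The record layout, the timestamps, and the window size are all hypothetical.

```python
from collections import defaultdict

WINDOW_MS = 10_000  # assumed tumbling window size: 10 seconds

# Hypothetical records: (key, event_time_ms, arrival_time_ms).
# The second event happened at 9.5 s but only arrived at 12 s.
RECORDS = [
    ("a", 1_000, 1_200),
    ("a", 9_500, 12_000),
    ("a", 11_000, 11_100),
]


def tumbling_start(ts_ms, size_ms=WINDOW_MS):
    """Start of the tumbling window containing a non-negative timestamp."""
    return ts_ms - ts_ms % size_ms


def count_per_window(records, time_index):
    """Count records per window, using the timestamp at position time_index."""
    counts = defaultdict(int)
    for record in records:
        counts[tumbling_start(record[time_index])] += 1
    return dict(counts)


by_event_time = count_per_window(RECORDS, time_index=1)
by_processing_time = count_per_window(RECORDS, time_index=2)
print(by_event_time)       # {0: 2, 10000: 1}
print(by_processing_time)  # {0: 1, 10000: 2}
```

Counting by event time puts the delayed 9.5 s record back into the first window. Counting by processing time reflects only when the record happened to arrive.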

1. Runtime

First, we need a solid understanding of Flink's runtime and of topology-related topics: parallelism settings, how data is partitioned across tasks, operator chaining, slot sharing, and so on.
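Flink partitions keyed data and keyed state through key groups. Each key is hashed into one of maxParallelism key groups, and contiguous ranges of key groups are assigned to the parallel subtasks. The plain-Python sketch below models that mapping under stated assumptions. It uses CRC32 as a deterministic stand-in for the murmur hash Flink applies to key.hashCode(), and the maximum parallelism of 128 is an assumed value, so the concrete assignments will not match a real job.

```python
import zlib

MAX_PARALLELISM = 128  # number of key groups (assumed value)


def key_group(key, max_parallelism=MAX_PARALLELISM):
    """Hash a key into a key group. Flink murmur-hashes key.hashCode();
    CRC32 is only a deterministic stand-in for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index(group, parallelism, max_parallelism=MAX_PARALLELISM):
    """Map a key group to a subtask so that each subtask owns a contiguous range."""
    return group * parallelism // max_parallelism


def subtask_for_key(key, parallelism):
    return operator_index(key_group(key), parallelism)


def key_group_ranges(parallelism, max_parallelism=MAX_PARALLELISM):
    """Return {subtask: (first_group, last_group)} for one parallelism setting."""
    ranges = {}
    for group in range(max_parallelism):
        idx = operator_index(group, parallelism, max_parallelism)
        first, _ = ranges.get(idx, (group, group))
        ranges[idx] = (first, group)
    return ranges


print(key_group_ranges(4))  # {0: (0, 31), 1: (32, 63), 2: (64, 95), 3: (96, 127)}
print(key_group_ranges(2))  # {0: (0, 63), 1: (64, 127)}
```

Rescaling only moves whole key groups between subtasks, so keyed state can be redistributed without re-hashing every key. The model also explains a common skew symptom: a single hot key always lands in one key group, and therefore on one subtask, however high the parallelism is set.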

For Flink's runtime, you can refer to Langjian's earlier article:

Talk about Flink's runtime in combination with Spark

It also helps to compare Flink's execution model with those of Spark Streaming and Structured Streaming. See:

Spark Streaming VS Flink

Structured Streaming VS Flink

These comparisons make it easier to understand how Flink works internally: how data flows, how shuffles work, how state is managed, and so on. That knowledge helps with data skew tuning, parallelism settings, and the design of monitoring and alerting, and ultimately lets us build more stable applications.
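A common way to tune data skew is two-stage aggregation. This is a general technique, not one described in this article. You append a random salt to each key so that a hot key's records spread over several partial aggregates, then strip the salt and merge the partials. The plain-Python sketch below shows only the logic. The `#` separator, the four salt buckets, and the function names are assumptions. In a job, stage 1 would be keyed by the salted key and stage 2 by the original key.

```python
import random
from collections import Counter

SALT_BUCKETS = 4  # assumed number of salt values per key


def salted(key, rng):
    """Stage-1 key: spreads one hot key over SALT_BUCKETS partial aggregates."""
    return f"{key}#{rng.randrange(SALT_BUCKETS)}"


def two_stage_count(keys, seed=0):
    rng = random.Random(seed)
    # Stage 1: pre-aggregate on salted keys
    partial = Counter(salted(k, rng) for k in keys)
    # Stage 2: strip the salt and merge the partial counts per original key
    final = Counter()
    for salted_key, count in partial.items():
        original = salted_key.rpartition("#")[0]
        final[original] += count
    return final, partial


events = ["hot"] * 1_000 + ["cold"] * 10
result, partial = two_stage_count(events)
print(result["hot"], result["cold"])  # 1000 10
```

The final counts are the same as a direct count. The difference is that no single stage-1 partial aggregate has to process every record of the hot key.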

2. Event handling

The event-driven processing model is typical of real-time processing. Here Spark Streaming falls short, because it performs micro-batch processing based on processing time (Structured Streaming, of course, also supports event-time processing).

For event processing in Flink, the runtime topics above are not enough. We also need to understand:

- the DataStream event-time mechanism and watermark generators;
- how parallelism and shuffle partitioning work, and how data flows between tasks;
- state management, including cleanup of state for expired keys.

With this knowledge we can follow how data moves through the Flink runtime and how tasks store state. We can then judge whether data is skewed and whether expired state is being deleted. Data skew and state management are the main focus of Flink job tuning.
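A watermark generator can be illustrated with a plain-Python model of the bounded-out-of-orderness strategy (WatermarkStrategy.forBoundedOutOfOrderness in recent Flink versions). That strategy emits a watermark equal to the highest timestamp seen so far, minus the allowed out-of-orderness, minus one millisecond. The class and function names here are hypothetical. Returning None before the first event is a simplification of this model.

The second function shows why one slow parallel source holds back event time downstream: an operator with several inputs only advances to the minimum of their watermarks.

```python
class BoundedOutOfOrdernessWatermarks:
    """Model: the watermark trails the highest timestamp seen by a fixed bound."""

    def __init__(self, max_out_of_orderness_ms):
        self.bound = max_out_of_orderness_ms
        self.max_timestamp = None  # no events seen yet

    def on_event(self, timestamp_ms):
        if self.max_timestamp is None or timestamp_ms > self.max_timestamp:
            self.max_timestamp = timestamp_ms

    def current_watermark(self):
        if self.max_timestamp is None:
            return None  # nothing to emit before the first event
        return self.max_timestamp - self.bound - 1


def combined_watermark(input_watermarks):
    """An operator with several inputs advances only to the minimum input watermark."""
    return min(input_watermarks)


gen = BoundedOutOfOrdernessWatermarks(max_out_of_orderness_ms=2_000)
for ts in (1_000, 5_000, 3_000):  # 3_000 arrives out of order
    gen.on_event(ts)
print(gen.current_watermark())             # 2999
print(combined_watermark([2_999, 7_999]))  # 2999
```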

Flink also offers plenty of clever tricks, covered in articles such as the following:

Flink's magical stream splitter: side outputs. They let you split a stream and process each part separately.

Flink iteration, the final article: iterative streams, for iterative computation.

Aggregating an entire window at once in Flink: ProcessWindowFunction. This and other low-level APIs, such as process functions, give finer control over state and time; you can even implement your own session windows.

Flink asynchronous I/O, lesson 1: asynchronous I/O enables more efficient joins against dimension tables.

These techniques are well worth mastering.
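As noted in the list above, low-level process functions let you build your own session windows from keyed state and event-time timers. Below is a plain-Python model of that idea, not Flink's KeyedProcessFunction API. The class, its methods, and the 5-second gap are hypothetical. The model also assumes that events for a key arrive roughly in timestamp order; a real session window additionally merges sessions that are bridged by out-of-order events.

```python
import heapq


class SessionProcessFunction:
    """Model of a keyed process function that builds event-time sessions
    from per-key state plus timers (hypothetical, not Flink's API)."""

    def __init__(self, gap_ms=5_000):  # assumed session gap: 5 seconds
        self.gap = gap_ms
        self.state = {}   # key -> [session_start, last_seen, count]
        self.timers = []  # min-heap of (fire_time, key)
        self.output = []  # emitted (key, start, end, count) tuples

    def process_element(self, key, ts):
        session = self.state.get(key)
        if session is not None and ts > session[1] + self.gap:
            self._emit(key)  # gap exceeded: close the previous session first
            session = None
        if session is None:
            self.state[key] = [ts, ts, 1]
        else:
            session[0] = min(session[0], ts)
            session[1] = max(session[1], ts)
            session[2] += 1
        # Timer for "no event for this key within one gap after the latest event"
        heapq.heappush(self.timers, (self.state[key][1] + self.gap, key))

    def advance_watermark(self, watermark):
        # Event-time timers fire once the watermark reaches them
        while self.timers and self.timers[0][0] <= watermark:
            fire_time, key = heapq.heappop(self.timers)
            self.on_timer(key, fire_time)

    def on_timer(self, key, fire_time):
        session = self.state.get(key)
        # Ignore stale timers that a later event for this key superseded
        if session is not None and session[1] + self.gap == fire_time:
            self._emit(key)

    def _emit(self, key):
        start, last, count = self.state.pop(key)  # clearing state frees expired keys
        self.output.append((key, start, last + self.gap, count))


fn = SessionProcessFunction(gap_ms=5_000)
for key, ts in [("u1", 1_000), ("u1", 3_000), ("u2", 2_000), ("u1", 20_000)]:
    fn.process_element(key, ts)
fn.advance_watermark(10_000)
print(fn.output)  # [('u1', 1000, 8000, 2), ('u2', 2000, 7000, 1)]
```

Deleting a key's state as soon as its session is emitted is the same kind of expired-key cleanup discussed above.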

3. Window functions

Window functions are mainly divided into those based on event time and those based on processing time. By window type, there are session windows, sliding windows, and tumbling windows. The more advanced techniques are the lower-level processing functions and mechanisms, namely ProcessFunction and ProcessWindowFunction, which give deeper access to state and time.
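The three window types differ in how a timestamp is mapped to windows. The following plain-Python sketch models that mapping. The tumbling and sliding functions are modeled on the start-of-window arithmetic of Flink's built-in assigners: the timestamp minus its remainder modulo the window size (or slide), shifted by an optional offset. The session function gives each event the window [ts, ts + gap) and merges windows that overlap or touch. Function names and sample values are hypothetical.

```python
def window_start(ts, size, offset=0):
    """Start of the window (of length `size`, shifted by `offset`) containing ts."""
    return ts - (ts - offset) % size  # Python's % is non-negative for size > 0


def tumbling_windows(ts, size, offset=0):
    start = window_start(ts, size, offset)
    return [(start, start + size)]  # half-open [start, end)


def sliding_windows(ts, size, slide, offset=0):
    """Every window of length `size` (one starting every `slide`) that contains ts."""
    windows = []
    start = window_start(ts, slide, offset)  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows


def session_windows(timestamps, gap):
    """Give each event the window [ts, ts + gap) and merge overlapping or touching ones."""
    merged = []
    for ts in sorted(timestamps):
        if merged and ts <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], ts + gap))
        else:
            merged.append((ts, ts + gap))
    return merged


print(tumbling_windows(12_345, 10_000))                # [(10000, 20000)]
print(sliding_windows(12_345, 10_000, 5_000))          # [(10000, 20000), (5000, 15000)]
print(session_windows([1_000, 3_000, 20_000], 5_000))  # [(1000, 8000), (20000, 25000)]
```

A sliding window whose size is a multiple of its slide places each element in size / slide windows. That duplication is why sliding windows cost more state than tumbling ones.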

Then there are the window join operations:

Tumbling window join

Sliding window join

Session window join

Interval join
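Of these, the interval join is the easiest to state precisely. For two keyed streams A and B, a pair (a, b) with the same key is joined when b's timestamp lies in [a.timestamp + lowerBound, a.timestamp + upperBound]; in Flink both bounds are inclusive by default. The plain-Python sketch below applies that condition with nested loops over finite lists. Flink instead buffers records per key in state and drops them once the watermark rules out further matches. The order and payment records and the function name are hypothetical.

```python
def interval_join(left, right, lower_ms, upper_ms):
    """Join (key, ts, value) records with equal keys where
    left.ts + lower_ms <= right.ts <= left.ts + upper_ms."""
    results = []
    for l_key, l_ts, l_val in left:
        for r_key, r_ts, r_val in right:
            if l_key == r_key and l_ts + lower_ms <= r_ts <= l_ts + upper_ms:
                results.append((l_key, l_val, r_val))
    return results


orders = [("o1", 1_000, "order-1"), ("o2", 2_000, "order-2")]
payments = [("o1", 3_000, "pay-1"), ("o2", 20_000, "pay-2"), ("o1", 500, "early-pay")]
print(interval_join(orders, payments, lower_ms=-1_000, upper_ms=5_000))
# [('o1', 'order-1', 'pay-1'), ('o1', 'order-1', 'early-pay')]
```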

Where there is event time, there are late events. Handling late events in event-time windows can be a headache, but in code they can be handled well by combining side outputs, watermarks, allowed lateness, and so on.
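The plain-Python model below sketches how these pieces interact for a tumbling event-time window. There are three cases:

- An element whose window has not yet fired is on time.
- An element that arrives after the window fired, but within the allowed lateness, triggers an updated result.
- Anything later is routed to a side output instead of being silently dropped.

The function and its labels are hypothetical. In Flink itself this behavior is configured with allowedLateness and sideOutputLateData on the windowed stream.

```python
def route_element(ts, watermark, size_ms, allowed_lateness_ms):
    """Classify an element for a tumbling event-time window of size size_ms."""
    window_end = ts - ts % size_ms + size_ms
    max_ts = window_end - 1  # last timestamp that still belongs to the window
    if max_ts + allowed_lateness_ms <= watermark:
        return "side_output"   # window already cleaned up: route instead of dropping
    if max_ts <= watermark:
        return "late_firing"   # window already fired: update it within the lateness
    return "on_time"           # window has not fired yet


SIZE, LATENESS = 10_000, 2_000
print(route_element(12_000, watermark=11_000, size_ms=SIZE, allowed_lateness_ms=LATENESS))  # on_time
print(route_element(5_000, watermark=11_000, size_ms=SIZE, allowed_lateness_ms=LATENESS))   # late_firing
print(route_element(5_000, watermark=13_000, size_ms=SIZE, allowed_lateness_ms=LATENESS))   # side_output
```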

4. Surrounding ecosystem

In the ecosystem commonly used around Flink, the data source is usually Kafka (or data on HDFS for batch processing), and the sinks are HBase, MySQL, MongoDB, and so on.

5. Implementation examples

Below, Langjian shares the related source code on his Knowledge Planet community; interested readers can refer to it:

org.datastream.KafkaProducer

This class is mainly used to produce test data.

Examples of watermarks, custom triggers, and window processing can be found in the following directories:

org.datastream.trigger

org.datastream.watermark

org.datastream.windows
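A custom trigger decides, for each window, when to emit results. The plain-Python model below mirrors the shape of Flink's Trigger callbacks (onElement, onEventTime) and the names of its TriggerResult values, but the trigger class itself is hypothetical. It fires an early, partial result every `every_n` elements. When the event-time timer for the window's last timestamp goes off, it fires the final result and purges the window.

```python
from enum import Enum


class TriggerResult(Enum):
    # Same names as Flink's TriggerResult values
    CONTINUE = "continue"
    FIRE = "fire"
    PURGE = "purge"
    FIRE_AND_PURGE = "fire_and_purge"


class EarlyCountTrigger:
    """Hypothetical trigger for one window: emit a partial result every
    `every_n` elements, then fire and purge when the window-end timer fires."""

    def __init__(self, window_max_ts, every_n):
        self.window_max_ts = window_max_ts
        self.every_n = every_n
        self.count = 0  # per-window trigger state

    def on_element(self):
        self.count += 1
        if self.count % self.every_n == 0:
            return TriggerResult.FIRE  # early, partial result; keep contents
        return TriggerResult.CONTINUE

    def on_event_time(self, timer_ts):
        if timer_ts == self.window_max_ts:
            return TriggerResult.FIRE_AND_PURGE  # final result; clear window contents
        return TriggerResult.CONTINUE


trigger = EarlyCountTrigger(window_max_ts=9_999, every_n=2)
results = [trigger.on_element() for _ in range(5)]
print([r.name for r in results])          # ['CONTINUE', 'FIRE', 'CONTINUE', 'FIRE', 'CONTINUE']
print(trigger.on_event_time(9_999).name)  # FIRE_AND_PURGE
```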

For joins, note that Flink does not directly support joining a DataStream with a static DataSet. For ordinary window joins, please refer to the following source code.

To join with static data sets, you can implement either synchronous or asynchronous lookups. Langjian implemented a synchronous join based on flatMap and an asynchronous join based on Async I/O, which together cover most enterprise development needs.
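The two approaches can be contrasted in a plain-Python sketch:

- The synchronous version mirrors a flatMap. It looks up each record in a locally loaded copy of the static table and emits nothing when there is no match.
- The asynchronous version issues all lookups concurrently with asyncio and keeps results in input order. This is similar in spirit to Flink's ordered async I/O mode (AsyncDataStream.orderedWait).

The dimension table, the simulated lookup delay, and all function names are hypothetical. Unlike Flink's async operator, this sketch applies no capacity limit or timeout.

```python
import asyncio

# Hypothetical dimension table, e.g. user profiles held in an external store
DIM_TABLE = {"u1": "Beijing", "u2": "Shanghai"}


def sync_enrich(events, table):
    """flatMap-style join against a locally loaded copy of the static table."""
    out = []
    for user, amount in events:
        city = table.get(user)
        if city is not None:  # a flatMap may emit zero records for unmatched keys
            out.append((user, amount, city))
    return out


async def lookup(user, table, delay_s=0.01):
    await asyncio.sleep(delay_s)  # stand-in for a non-blocking client call
    return table.get(user)


async def async_enrich(events, table):
    """Async-I/O-style join: issue all lookups concurrently, keep input order."""
    cities = await asyncio.gather(*(lookup(user, table) for user, _ in events))
    return [(u, a, c) for (u, a), c in zip(events, cities) if c is not None]


events = [("u1", 10), ("u3", 5), ("u2", 7)]
print(sync_enrich(events, DIM_TABLE))
print(asyncio.run(async_enrich(events, DIM_TABLE)))
```

With many slow lookups, the asynchronous version's total latency is close to that of the slowest single request rather than the sum of all of them. This is the efficiency gain asynchronous I/O brings to dimension table joins.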

Side outputs split a stream and are very easy to use. They are mainly used to separate late data from normal data.
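In Flink, a process function emits to a side output through an OutputTag, and the split-off stream is read back with getSideOutput. A plain-Python model of the routing logic follows. The classes and the two tags are hypothetical. In this model, a record counts as late when its timestamp is at or below the current watermark.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputTag:
    # Plays the role of Flink's OutputTag: a named side channel
    name: str


LATE = OutputTag("late-data")
ERRORS = OutputTag("malformed")


class SplittingProcessFunction:
    """Hypothetical process function: one main output plus tagged side outputs."""

    def __init__(self):
        self.main = []
        self.side = defaultdict(list)

    def process(self, record, watermark):
        ts = record.get("ts")
        if not isinstance(ts, int):
            self.side[ERRORS].append(record)  # side output: malformed record
        elif ts <= watermark:
            self.side[LATE].append(record)    # side output: behind the watermark
        else:
            self.main.append(record)          # main output


fn = SplittingProcessFunction()
for rec in [{"ts": 12_000}, {"ts": 4_000}, {"ts": "oops"}]:
    fn.process(rec, watermark=10_000)
print(fn.main)          # [{'ts': 12000}]
print(fn.side[LATE])    # [{'ts': 4000}]
print(fn.side[ERRORS])  # [{'ts': 'oops'}]
```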

Iteration: there are three code examples in total, covering iteration in both batch and stream processing. In practice, iteration is used most heavily by machine learning libraries.
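Iteration in a stream job means a feedback edge: records leaving the loop body are either sent back to the loop head or passed downstream. The following plain-Python model uses the classic toy case of repeatedly subtracting 1 until a value reaches zero. It models what DataStream iterate() combined with IterativeStream.closeWith() expresses. The function and its round-by-round scheduling are hypothetical; a real job processes feedback records continuously, not in rounds.

```python
from collections import deque


def iterate_until_zero(values):
    """Loop body subtracts 1; values still above zero are fed back to the
    loop head, all others leave the loop. Returns (finished, rounds)."""
    feedback = deque(values)
    finished = []
    rounds = 0
    while feedback:
        rounds += 1
        next_round = deque()
        for v in feedback:
            v -= 1                    # iteration body
            if v > 0:
                next_round.append(v)  # feedback edge back to the loop head
            else:
                finished.append(v)    # exits the loop and continues downstream
        feedback = next_round
    return finished, rounds


finished, rounds = iterate_until_zero([3, 1, 2])
print(finished, rounds)  # [0, 0, 0] 3
```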

The source is mainly Kafka, and three sinks are implemented: Redis, MySQL, and HBase, which are the most commonly used.

There are also examples of important configuration for practical needs, such as checkpointing, timestamp assigners, event time versus processing time, and automatic failure recovery.
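Checkpointing and automatic failure recovery rely on snapshotting source positions and operator state together. The plain-Python model below (a hypothetical function, not Flink's API) shows why. After a simulated failure, it restores the last snapshot and replays the source from the saved offset, so each record is counted exactly once in the final state. In a real job, checkpointing is enabled with StreamExecutionEnvironment.enableCheckpointing(interval), and replay relies on a rewindable source such as Kafka.

```python
def run_with_checkpoints(records, checkpoint_every, fail_at=None):
    """Count records per key, snapshotting (next offset, counts) together every
    `checkpoint_every` records; simulate one failure when reaching `fail_at`."""
    snapshot = (0, {})  # last completed checkpoint: (next offset to read, counts)
    offset, counts = 0, {}
    failed = False
    while offset < len(records):
        if offset == fail_at and not failed:
            failed = True
            offset, counts = snapshot[0], dict(snapshot[1])  # restore and replay
            continue
        key = records[offset]
        counts[key] = counts.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, dict(counts))  # source position + state, atomically
    return counts


records = ["a", "b", "a", "c", "a"]
print(run_with_checkpoints(records, checkpoint_every=2, fail_at=3))  # {'a': 3, 'b': 1, 'c': 1}
```

If only the source were rewound and operator state were not restored, the records between the checkpoint and the failure would be counted twice. That is why offsets and state must belong to the same consistent snapshot.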


