How to analyze the Application and implementation of Flink window

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to analyze the application and implementation of Flink windows. The content is fairly detailed; interested readers can use it as a reference and will hopefully find it helpful.

Overall approach and learning path

When we come across a new technology, how should we learn and apply it? In my view, the learning path divides into two parts: application and implementation. We should start with the application, and then go deep into the implementation.

The application side breaks down into three parts. First, we should understand the usage scenarios, such as the typical scenarios where windows are used. Then we take a closer look at the programming interfaces, and finally dig into the abstract concepts, because any framework or technology is built from programming interfaces and abstract concepts that together form its programming model. We can become familiar with the application by reading the documentation. Once we have a preliminary understanding of these three parts, we can study the implementation by reading the code.

The implementation side also divides into three stages. First is the workflow: starting from the API level, we can drill down to understand how data flows through the system. Next is the overall design pattern: for a framework to build a mature ecosystem, its design usually has distinctive features that give it good extensibility. Last come the data structures and algorithms: to process large volumes of data with high performance, these are necessarily carefully chosen and are worth studying in depth.

That is roughly the learning path. Understanding the implementation also feeds back into the application: when we first encounter a concept we are often confused by it, and once we understand how it is implemented, those doubts about the application are easily resolved.

Why should we care about implementation?

For example:
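(The example here was an image in the original article and is not reproduced. As a stand-in, below is a minimal Python model, with hypothetical names and not Flink's actual API, of the kind of program such examples show: events keyed by a field, assigned to 1-minute tumbling event-time windows, and incrementally combined with a ReduceFunction-style callback.)

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling windows

def window_start(ts):
    # Tumbling-window assignment: each timestamp maps to exactly one window.
    return ts - (ts % WINDOW_MS)

def reduce_fn(acc, value):
    # ReduceFunction-style callback: combines two values of the same type.
    return acc + value

def run(events):
    # events: (key, timestamp_ms, value) triples
    state = {}  # (key, window_start) -> running reduce result
    for key, ts, value in events:
        k = (key, window_start(ts))
        state[k] = reduce_fn(state[k], value) if k in state else value
    return state

events = [
    ("a", 1_000, 1), ("a", 2_000, 2),  # same window for key "a"
    ("b", 3_000, 5),
    ("a", 61_000, 7),                  # next window for key "a"
]
print(run(events))
# {('a', 0): 3, ('b', 0): 5, ('a', 60000): 7}
```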

After reading this example, we may be left with some questions:

■ Why doesn't ReduceFunction calculate the aggregate value of each key?

■ When the key cardinality is very large, how can each key's window calculation be triggered efficiently?

■ How are the intermediate results of window calculations stored, and when are they cleaned up?

■ How does window computing tolerate late data?

When you understand the implementation and then come back to the application, these questions may suddenly make sense.

Application scenario and programming Model

Typical architecture of real-time data warehouse

■ The first and simplest architecture: Kafka data in the ODS layer is processed by Flink ETL and written to Kafka in the DW layer, then aggregated by Flink into MySQL in the ADS layer for real-time report presentation.

Disadvantages: because MySQL can only store a limited amount of data, the time granularity of the aggregation should not be too fine, and the number of dimension combinations should not be too large.

■ The second architecture introduces an OLAP engine compared with the first: instead of aggregating with Flink, it aggregates through Druid's rollup.

Disadvantages: Druid is a storage and query engine, not a computing engine. When the data volume is huge, for example tens or hundreds of billions of records per day, this increases the ingestion pressure on Druid.

■ The third architecture, building on the second, uses Flink to do the aggregate calculation, writes the results to Kafka, and finally loads them into Druid.

Disadvantages: when the window granularity is coarse, the result output is delayed.

■ The fourth architecture combines Flink aggregation and Druid rollup on top of the third: Flink does a light pre-aggregation, and Druid does the rollup aggregation. The advantage is that Druid can see Flink's aggregate results in near real time.
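The division of labor in the fourth architecture can be sketched numerically: a first stage (standing in for Flink's light pre-aggregation) collapses raw events into fine-grained 1-minute buckets, and a second stage (standing in for Druid's rollup) further collapses those buckets per dimension. All names here are illustrative, not part of either system's API:

```python
from collections import defaultdict

def pre_aggregate(events, bucket_ms=60_000):
    # Stage 1 ("Flink light pre-aggregation"):
    # raw (dimension, timestamp_ms, value) events -> (dim, minute) partial sums.
    out = defaultdict(int)
    for dim, ts, v in events:
        out[(dim, ts - ts % bucket_ms)] += v
    return dict(out)

def rollup(partials):
    # Stage 2 ("Druid rollup"): collapse the time axis, keep the dimension.
    out = defaultdict(int)
    for (dim, _), v in partials.items():
        out[dim] += v
    return dict(out)

events = [("pv", 1_000, 1), ("pv", 59_000, 1), ("pv", 61_000, 1), ("click", 2_000, 1)]
partials = pre_aggregate(events)  # far fewer rows reach the second stage
totals = rollup(partials)
print(partials, totals)
```

The point of the two levels is that the first stage shrinks the data volume before it reaches Druid, while rollup still lets queries aggregate across the remaining time buckets.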

Window application scenario

■ Aggregate statistics: read data from Kafka, do 1-minute or 5-minute aggregate calculations along different dimensions, and write the results to MySQL or Druid.

■ Record merging: merge multiple Kafka data sources within a window and write the results to ES. For example, a user's behavior records can be merged per user to reduce the volume of data written downstream and the write pressure on ES.

■ Dual-stream join: for dual-stream join scenarios, a full-history join is very expensive, so we should consider doing the join within windows.
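The window-based join in the last scenario can be modeled with two small streams. Instead of joining every record against the full history of the other stream, each stream only joins records that land in the same key and tumbling window, so the buffered state stays bounded. This is a hypothetical Python model, not Flink's `join` API:

```python
from collections import defaultdict

def windowed_join(left, right, window_ms=60_000):
    # left/right: (key, timestamp_ms, payload) triples.
    # Bucket each stream by (key, window); only co-bucketed pairs are joined,
    # so state is bounded by the window instead of the full stream history.
    buckets = defaultdict(lambda: ([], []))
    for key, ts, p in left:
        buckets[(key, ts - ts % window_ms)][0].append(p)
    for key, ts, p in right:
        buckets[(key, ts - ts % window_ms)][1].append(p)
    return [(key, l, r)
            for (key, _), (ls, rs) in buckets.items()
            for l in ls for r in rs]

left = [("u1", 1_000, "view"), ("u1", 70_000, "view")]
right = [("u1", 2_000, "click"), ("u2", 3_000, "click")]
print(windowed_join(left, right))
# [('u1', 'view', 'click')] -- only the pair in the same window joins
```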

Window abstract concept

■ TimestampAssigner: timestamp assigner. If we use EventTime semantics, we need a TimestampAssigner to tell the Flink framework which field of the element is the event time, for use in later window calculations.

■ KeySelector: key selector, used to tell the Flink framework which dimensions to aggregate by.

■ WindowAssigner: window assigner, which determines which windows each piece of data is assigned to.

■ State: state, used to store the elements in the window or, if there is an AggregateFunction, the intermediate result of the incremental aggregation.

■ AggregateFunction (optional): incremental aggregate function, mainly used to compute the window incrementally and reduce the storage pressure on the window's State.

■ Trigger: trigger, which determines when to fire the window calculation.

■ Evictor (optional): evictor, used to remove elements that meet the eviction criteria before (or after) the window function runs.

■ WindowFunction: window function, which is used to calculate the data in the window.

■ Collector: collector, which is used to send the calculation results of the window downstream.

The red parts in the figure above are all modules that can be customized. By combining custom implementations of these modules, we can build advanced window applications. At the same time, Flink also provides built-in implementations that cover simple applications.
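One of these customization points, the incremental AggregateFunction, is worth a concrete look: instead of the window state buffering every element for the WindowFunction, it keeps only one small accumulator per key and window. Below is a hypothetical Python model of the callback shape (the method names mirror Flink's `createAccumulator`/`add`/`getResult`/`merge`, but this is not the Flink API), computing an average:

```python
class AvgAggregate:
    # Models an AggregateFunction<IN, ACC, OUT> for an average:
    # the window state is a single (sum, count) pair, not the element list.
    def create_accumulator(self):
        return (0, 0)

    def add(self, value, acc):
        # Called per element as it arrives: incremental, O(1) state per window.
        return (acc[0] + value, acc[1] + 1)

    def get_result(self, acc):
        # Called once when the trigger fires the window.
        return acc[0] / acc[1]

    def merge(self, a, b):
        # Needed when windows (e.g. session windows) are merged.
        return (a[0] + b[0], a[1] + b[1])

agg = AvgAggregate()
acc = agg.create_accumulator()
for v in [1, 2, 3, 6]:
    acc = agg.add(v, acc)
print(acc, agg.get_result(acc))  # (12, 4) 3.0
```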

Window programming interface

stream.assignTimestampsAndWatermarks(…)
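The fragment above is the start of the Java/Scala call chain (timestamp/watermark assignment, then keying, window assignment, and a window function). As a language-neutral sketch, here is a Python model that wires the abstractions from the previous section together; every name is illustrative, and the trigger is simplified to "fire when the watermark passes the window end":

```python
from collections import defaultdict

WINDOW_MS = 60_000

timestamp_assigner = lambda e: e["ts"]            # TimestampAssigner
key_selector = lambda e: e["user"]                # KeySelector
window_assigner = lambda ts: ts - ts % WINDOW_MS  # WindowAssigner (tumbling)
window_function = lambda elems: sum(e["v"] for e in elems)  # WindowFunction

def run(stream, watermark):
    state = defaultdict(list)  # State: (key, window) -> buffered elements
    for e in stream:
        ts = timestamp_assigner(e)
        state[(key_selector(e), window_assigner(ts))].append(e)
    results = []               # Collector
    for (key, w), elems in state.items():
        # Trigger: fire windows whose end has passed the watermark.
        if w + WINDOW_MS <= watermark:
            results.append((key, w, window_function(elems)))
    return results

stream = [{"user": "u1", "ts": 1_000, "v": 2},
          {"user": "u1", "ts": 30_000, "v": 3},
          {"user": "u1", "ts": 70_000, "v": 5}]
print(run(stream, watermark=60_000))
# [('u1', 0, 5)] -- the second window has not yet been triggered
```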
