
Example Analysis of programming Model in Flink


This article presents a sample analysis of the programming model in Flink. The editor finds it quite practical and shares it here as a reference; follow along to learn more.

Flink is an open-source big data stream-processing framework. It supports both batch and stream processing, and offers fault tolerance, high throughput, low latency, and other advantages.

Dataset types:

Unbounded dataset: an infinite, continuously growing collection of data

Bounded dataset: a finite collection of data that will not change

Common unbounded datasets include:

Real-time interaction data between users and clients

Logs generated in real time by applications

Real-time transactions in financial markets

...

What are the data operation models?

Streaming: the computation runs continuously for as long as data is being produced

Batch processing: the computation runs for a predefined period and releases compute resources when it completes

Flink can handle both bounded and unbounded datasets, processing data either as streams or in batches.
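As a minimal sketch (assuming a Flink 1.x setup where the legacy DataSet API is still available; the class and job names are illustrative), the same doubling job can be written against either execution environment:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchAndStreamDemo {
    public static void main(String[] args) throws Exception {
        // Batch: the DataSet API processes a bounded collection.
        ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
        batchEnv.fromElements(1, 2, 3)
                .map(n -> n * 2)
                .print(); // print() triggers execution for DataSet jobs

        // Streaming: the DataStream API treats input as a (potentially unbounded) stream.
        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        streamEnv.fromElements(1, 2, 3)
                 .map(n -> n * 2)
                 .print();
        streamEnv.execute("stream-demo"); // streaming jobs need an explicit execute()
    }
}
```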

What is Flink?

From bottom to top:

1. Deployment: Flink can run locally, on standalone clusters, on clusters managed by YARN or Mesos, or in the cloud.

2. Runtime: Flink's core is a distributed streaming dataflow engine, meaning data is processed one event at a time.

3. APIs: DataStream, DataSet, Table, and SQL.

4. Libraries: Flink also includes dedicated libraries for complex event processing, machine learning, graph processing, and Apache Storm compatibility.

Flink data flow programming model

Levels of abstraction

Flink provides different levels of abstraction for developing streaming or batch applications.

The lowest level of abstraction simply offers stateful streaming. It is embedded into the DataStream API via process functions, which allow users to freely process events from one or more streams and to use consistent, fault-tolerant state. In addition, users can register event-time and processing-time callbacks, allowing programs to implement sophisticated computations.
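As a hedged sketch of this lowest layer (a hypothetical function, not taken from the article), the following KeyedProcessFunction keeps a per-key event count in fault-tolerant keyed state and registers a timer callback for each event:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Usage (illustrative): stream.keyBy(t -> t.f0).process(new CountWithTimer())
public class CountWithTimer extends KeyedProcessFunction<String, Tuple2<String, Long>, String> {
    private transient ValueState<Long> countState; // consistent, fault-tolerant per-key state

    @Override
    public void open(Configuration parameters) {
        countState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(Tuple2<String, Long> value, Context ctx, Collector<String> out)
            throws Exception {
        Long count = countState.value();
        countState.update(count == null ? 1L : count + 1);
        // Register a callback to fire 60 seconds from now.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 60_000);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
            throws Exception {
        out.collect(ctx.getCurrentKey() + " -> " + countState.value());
    }
}
```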

The DataStream / DataSet APIs are the core APIs provided by Flink. The DataSet API handles bounded datasets, while the DataStream API handles bounded or unbounded data streams. Users can transform and compute over the data with a variety of operations (map / flatMap / window / keyBy / sum / max / min / avg / join, etc.).
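For instance, a streaming word count chains several of these operations; this is a sketch assuming the Java DataStream API, with the input line hard-coded for brevity:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("to be or not to be")
           // flatMap: split each line into (word, 1) pairs
           .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
               @Override
               public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                   for (String word : line.toLowerCase().split("\\s+")) {
                       out.collect(Tuple2.of(word, 1));
                   }
               }
           })
           // keyBy: partition the stream by word; sum: rolling aggregate on field 1
           .keyBy(t -> t.f0)
           .sum(1)
           .print();

        env.execute("wordcount");
    }
}
```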

The Table API is a table-centric declarative DSL in which tables may change dynamically (when representing streaming data). It offers operations such as select, project, join, group-by, and aggregate, and is more concise to use (less code).

You can switch seamlessly between tables and DataStream/DataSet, and programs can mix the Table API with the DataStream and DataSet APIs.
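As a sketch of that interplay (assuming Flink 1.12+ with the flink-table-api-java-bridge dependency on the classpath; the column names `name` and `amount` are illustrative):

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TableApiDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        DataStream<Tuple2<String, Integer>> orders = env.fromElements(
                Tuple2.of("alice", 10), Tuple2.of("bob", 5), Tuple2.of("alice", 7));

        // Switch from a DataStream to a Table...
        Table table = tableEnv.fromDataStream(orders).as("name", "amount");

        // ...apply declarative group-by / aggregate operations...
        Table result = table.groupBy($("name"))
                            .select($("name"), $("amount").sum().as("total"));

        // ...and switch back to a DataStream (a retract stream, since results update).
        tableEnv.toRetractStream(result, Row.class).print();
        env.execute("table-api-demo");
    }
}
```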

The highest level of abstraction offered by Flink is SQL. This layer is similar to the Table API in syntax and expressiveness, but represents programs as SQL query expressions. The SQL abstraction interacts closely with the Table API, and SQL queries can be executed directly on tables defined with the Table API.
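Continuing the Table API sketch above, the same aggregation can be expressed as a SQL query over a registered view (names again illustrative):

```java
// Register the Table from the previous sketch under a name...
tableEnv.createTemporaryView("Orders", table);

// ...then query it with plain SQL; the result is again a Table.
Table sqlResult = tableEnv.sqlQuery(
        "SELECT name, SUM(amount) AS total FROM Orders GROUP BY name");

tableEnv.toRetractStream(sqlResult, Row.class).print();
```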

Flink programs and data flow structure

A Flink application is structured as follows:

Source: the data source. In both streaming and batch processing, Flink has roughly four kinds of source: collection-based sources, file-based sources, socket-based sources, and custom sources. Common custom sources include Apache Kafka, Amazon Kinesis Streams, RabbitMQ, Twitter Streaming API, Apache NiFi, and so on; of course, you can also define your own source.
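A brief sketch of the four kinds of source (the path, host/port, topic, and properties are illustrative; the Kafka source requires the flink-connector-kafka dependency):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class SourcesDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1. Collection-based source (handy for tests).
        DataStream<String> fromCollection = env.fromElements("a", "b", "c");

        // 2. File-based source.
        DataStream<String> fromFile = env.readTextFile("/tmp/input.txt");

        // 3. Socket-based source (feed it with: nc -lk 9999).
        DataStream<String> fromSocket = env.socketTextStream("localhost", 9999);

        // 4. Custom source: a Kafka consumer.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "demo");
        DataStream<String> fromKafka = env.addSource(
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props));

        fromSocket.print();
        env.execute("sources-demo");
    }
}
```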

Transformation: the various data-transformation operations, such as Map / FlatMap / Filter / KeyBy / Reduce / Fold / Aggregations / Window / WindowAll / Union / Window Join / Split / Select / Project, etc. There are many operations, and together they can reshape the data into whatever form you need.
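For example, a few of these transformations chained together (a sketch; the sensor data and the 10-second window size are illustrative):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TransformationsDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("sensor-1", 35), Tuple2.of("sensor-2", 12), Tuple2.of("sensor-1", 40))
           .filter(r -> r.f1 > 20)                // Filter: drop low readings
           .map(r -> Tuple2.of(r.f0, r.f1 * 2))   // Map: rescale the value
           .returns(Types.TUPLE(Types.STRING, Types.INT)) // type hint needed for the lambda
           .keyBy(r -> r.f0)                      // KeyBy: partition per sensor
           .window(TumblingProcessingTimeWindows.of(Time.seconds(10))) // Window: 10 s tumbling
           .sum(1)                                // Aggregation within each window
           .print();

        // Note: with a bounded demo source the window may not fire before the job
        // finishes; in practice the input would be unbounded (e.g. a socket source).
        env.execute("transformations-demo");
    }
}
```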

Sink: the receiver, i.e. where Flink sends the transformed data for storage or further use. Common Flink sink types are: write to file, print to stdout, write to socket, and custom sinks. Common custom sinks include Apache Kafka, RabbitMQ, MySQL, ElasticSearch, Apache Cassandra, Hadoop FileSystem, and so on; likewise, you can also define your own sink.
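And a sketch of a few sink types (the path, topic, and properties are again illustrative; the Kafka sink requires the flink-connector-kafka dependency):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class SinksDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("a", "b", "c");

        // Print sink: writes to stdout (task manager logs on a cluster).
        stream.print();

        // File sink: one of the simple built-in writers.
        stream.writeAsText("/tmp/flink-out");

        // Custom sink: a Kafka producer.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        stream.addSink(new FlinkKafkaProducer<>("out-topic", new SimpleStringSchema(), props));

        env.execute("sinks-demo");
    }
}
```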

Thank you for reading! This concludes our sample analysis of the programming model in Flink. We hope the content above has been helpful; if you found the article useful, feel free to share it so more people can see it.
