This article explains the Flink relational API in detail; interested readers can use it as a reference and will hopefully find it helpful.
Before the relational API existed, users typically wrote Flink programs with the DataStream and DataSet APIs, both of which provide rich processing capabilities. Take the DataStream API as an example; it offers the following advantages (a minimal sketch follows the list):
Expressive stream processing, including but not limited to: transforming data, updating state, defining windows, aggregations, event-time semantics, and stateful computation with correctness guarantees.
Highly customizable window logic: assigners, triggers, evictors, allowed lateness, etc.
An asynchronous I/O interface for more efficient interaction with external systems.
ProcessFunction, which gives users access to low-level primitives such as timestamps and timers.
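For example, a minimal DataStream sketch that counts words over 5-second windows might look like the following (the socket source, host, and port are illustrative assumptions, not from the article):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment

// read lines from a socket, split them into words, and count per key in 5-second windows
val counts = env
  .socketTextStream("localhost", 9999)   // hypothetical source
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .timeWindow(Time.seconds(5))
  .sum(1)

counts.print()
env.execute("Windowed word count")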
However, it also has some barriers to entry that make it unsuitable for every user:
Writing DataStream programs is not always easy: stream processing technology evolves rapidly, and new concepts keep emerging, such as time, state, and windows.
It requires specialized knowledge and skills: building continuous stream processing applications demands domain expertise as well as Java/Scala programming experience.
Users want to focus on their business logic, so Flink provides a more expressive API: the relational API. It has several benefits:
It is declarative: users only state what they want, the system decides how to compute it, and users do not have to specify implementation details.
Queries can be optimized and executed efficiently, whereas UDFs in the lower-level APIs are hard to optimize and require manual tuning.
Most people, especially in the data analysis field, are far more familiar with SQL than with any particular programming language.
The relational API is actually an umbrella term for the Table API and the SQL API:
Table API: a LINQ-style (Language Integrated Query) API for Java and Scala (available since version 0.9.0)
SQL API: support for standard SQL (available since version 1.1.0)
As a unified API layer, the relational API can run a query over a batch table and produce a finite result set, or run the same query continuously over a streaming table and produce a result stream; queries over both kinds of tables share the same syntax and semantics. Its most important concept is the Table, which is tightly integrated with DataSet and DataStream: both can easily be converted into a Table, and a Table can just as easily be converted back (a conversion sketch follows the example below). The following code snippet shows how to write a Flink program with the relational API:
val tEnv = TableEnvironment.getTableEnvironment(env)

// configure the data source
val customerSource = CsvTableSource.builder()
  .path("/path/to/customer_data.csv")
  .field("name", Types.STRING)
  .field("prefs", Types.STRING)
  .build()

// register the data source as a Table
tEnv.registerTableSource("cust", customerSource)

// define your table program (Table API and SQL API can be mixed within one Flink program)
val table = tEnv.scan("cust").select('name.lowerCase(), myParser('prefs))
// the same query expressed with the SQL API:
// val table = tEnv.sql("SELECT LOWER(name), myParser(prefs) FROM cust")

// convert the result to a DataStream
val ds: DataStream[Customer] = table.toDataStream[Customer]
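The conversion works in the other direction as well. Below is a minimal sketch of turning a DataStream into a Table (the Customer case class, the sample elements, and the field names are illustrative assumptions, not from the article):

// assume env is a StreamExecutionEnvironment and tEnv its StreamTableEnvironment
case class Customer(name: String, prefs: String)   // hypothetical record type
val customers: DataStream[Customer] = env.fromElements(
  Customer("Alice", "books"), Customer("Bob", "games"))

// register the stream under a name so it can be queried with the Table or SQL API
tEnv.registerDataStream("customers", customers, 'name, 'prefs)

// or obtain a Table directly and convert it back when needed
val custTable: Table = tEnv.fromDataStream(customers)
val backToStream: DataStream[Customer] = custTable.toDataStream[Customer]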
The relational API is built on top of the basic DataStream and DataSet APIs, and its overall layering is shown in the following figure:
The two APIs provide an equivalent set of features, can be mixed in the same program, and are both tightly integrated with Flink's core APIs. The figure above shows two APIs in the upper layer and two basic APIs (DataSet and DataStream) as back ends. Does this mean Flink itself implements the translation paths for all four combinations? In fact, Flink does not implement SQL parsing, execution plan generation, optimization, and similar work on its own; it hands this heavy lifting over to Apache Calcite. The overall structure is shown below:
Apache Calcite is a SQL parsing and query optimization framework (this definition reflects Flink's perspective; Calcite officially describes itself as a dynamic data management framework). It has been adopted by many projects, such as Drill, Hive, and Kylin, to parse and optimize SQL queries.
Let's walk through the architecture diagram above. Starting from the top, a Table, together with its related information such as schema, fields, and types, can be created from a DataSet, a DataStream, a TableSource, and other channels; this information is registered and stored in the Calcite catalog, where it provides metadata for the Table & SQL APIs. Moving down, queries built with the Table API or SQL are translated into a common logical plan representation, which serves as input to the Calcite optimizer. The optimizer applies translation and optimization rules specific to the chosen back end (DataSet or DataStream) and produces a corresponding plan, from which the code generator emits a back-end program. Executing that program returns a DataSet or a DataStream.
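To see this pipeline in action, the TableEnvironment exposes an explain method that prints the abstract syntax tree, the optimized logical plan, and the physical execution plan for a query. A small sketch, assuming the "cust" table registered earlier and a Flink version in which explain(Table) is available:

// build a query against the previously registered "cust" table
val query = tEnv.scan("cust").select('name.lowerCase())

// explain() returns the AST, the logical plan after Calcite's optimization,
// and the physical plan that will be translated into a DataStream/DataSet program
println(tEnv.explain(query))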
This diagram summarizes the overall architecture of the Flink relational API and is the basis for our subsequent analysis of this module.
That concludes this overview of the Flink relational API. I hope the content above is helpful and gives you something new to learn; if you found the article useful, feel free to share it with others.