

What the code framework of the Structured Streaming Kafka SQL source looks like



This article walks through the code framework behind Structured Streaming's Kafka SQL source. It is quite practical, so it is shared here for study; I hope you get something out of it. Let's take a look.

A typical use of Structured Streaming is to read a Kafka stream continuously. The implementation starts with SparkSession.readStream, which returns a DataStreamReader:

def readStream: DataStreamReader = new DataStreamReader(self)

Let's start with DataStreamReader. As you might expect, an RDD must eventually be produced to keep reading the Kafka stream data.

Example:

// Create DataFrame representing the stream of input lines from a connection to localhost:9999
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

load() takes two steps: first find the TableProvider; then, through it, obtain a table that supports reads (SupportsRead) and generate a StreamingRelationV2.

Finally, Dataset.ofRows is called with the StreamingRelationV2 and returns a DataFrame, where DataFrame is an alias for Dataset[Row].
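For reference, the Kafka path through the same entry point looks like this at the user level (the bootstrap server and topic name here are placeholders):

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .load()

// key and value arrive as binary columns; cast them to strings to inspect them
val kv = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")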

Let's first take a look at what the TableProvider and SupportsRead interfaces are.

TableProvider

I could not pin down where the TableProvider interface is defined while reading the code; in Spark 3.x it lives in org.apache.spark.sql.connector.catalog, as part of the DataSource V2 API.
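From memory of the Spark 3.x sources (verify against your version), the interface is small. It is a Java interface in Spark; in Scala terms it is roughly:

import java.util.{Map => JMap}
import org.apache.spark.sql.connector.catalog.Table
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Abridged paraphrase of org.apache.spark.sql.connector.catalog.TableProvider
trait TableProviderSketch {
  // infer the table schema from the user-supplied options
  def inferSchema(options: CaseInsensitiveStringMap): StructType
  // return the Table; the returned object may mix in SupportsRead and/or SupportsWrite
  def getTable(schema: StructType, partitioning: Array[Transform], properties: JMap[String, String]): Table
}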

KafkaSourceRDD

First take a look at the KafkaSourceRDD class. It is the foundation: the most basic RDD for reading Kafka data. Its input parameters include an offset range, which bounds the Kafka offsets to read; if the end of the range is Kafka's latest offset, the RDD effectively reads Kafka indefinitely.

Since it is an RDD, its most important method is compute. The code is simple: it uses the Kafka consumer API to read one Kafka partition's data and expose it as the RDD partition.
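A minimal sketch of that idea using the plain Kafka consumer API; KafkaOffsetRange and readRange are hypothetical names for illustration, not the real Spark internals:

import java.time.Duration
import java.util.Collections
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

case class KafkaOffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// Drain exactly [fromOffset, untilOffset) from one partition, the way a compute() for one
// RDD partition would. Assumes the data exists; a real reader also handles timeouts and retries.
def readRange(range: KafkaOffsetRange,
              consumer: KafkaConsumer[Array[Byte], Array[Byte]]): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]] = {
  val tp = new TopicPartition(range.topic, range.partition)
  consumer.assign(Collections.singletonList(tp))
  consumer.seek(tp, range.fromOffset)
  val buf = ArrayBuffer.empty[ConsumerRecord[Array[Byte], Array[Byte]]]
  var next = range.fromOffset
  while (next < range.untilOffset) {
    for (r <- consumer.poll(Duration.ofMillis(500)).records(tp).asScala
         if r.offset < range.untilOffset) {
      buf += r
      next = r.offset + 1
    }
  }
  buf.iterator
}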

KafkaSource

As the name implies, KafkaSource is the reader of Kafka.

The parent class of KafkaSource is Source, and the most important methods are getOffset and getBatch.

getBatch returns a DataFrame. How? Looking at the code, it creates a KafkaSourceRDD and builds the DataFrame from it. So you can think of KafkaSource as an encapsulation of KafkaSourceRDD.
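For context, the V1 streaming Source contract that KafkaSource implements looks roughly like this (an abridged paraphrase of the trait in org.apache.spark.sql.execution.streaming; there are a few more members such as commit):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Offset
import org.apache.spark.sql.types.StructType

trait SourceSketch {
  // schema of the data produced by this source
  def schema: StructType
  // the latest offset available, or None if no data has arrived yet
  def getOffset: Option[Offset]
  // a DataFrame covering (start, end]; for Kafka it is backed by a KafkaSourceRDD
  def getBatch(start: Option[Offset], end: Offset): DataFrame
  def stop(): Unit
}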

KafkaSourceProvider

This is the provider class for all Kafka readers and writers. It generates the various Kafka readers and writers, so it is worth looking at its definition:

private[kafka010] class KafkaSourceProvider extends DataSourceRegister
    with StreamSourceProvider
    with StreamSinkProvider
    with RelationProvider
    with CreatableRelationProvider
    with TableProvider
    with Logging

It mixes in many traits: StreamSourceProvider, TableProvider, RelationProvider, and so on. Here we only look at the read-related traits; the write-related ones work analogously.

(1) createSource

The createSource method returns a Source. The code actually returns a KafkaSource, which was covered earlier, so it is not repeated here.
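Its shape, as declared by the StreamSourceProvider trait inside KafkaSourceProvider (signature from memory of the Spark sources; the body here is a placeholder, not Spark's real code):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Source
import org.apache.spark.sql.types.StructType

override def createSource(
    sqlContext: SQLContext,
    metadataPath: String,
    schema: Option[StructType],
    providerName: String,
    parameters: Map[String, String]): Source = {
  // validate the Kafka options and resolve the starting offsets,
  // then construct and return a KafkaSource
  ???
}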

(2) createRelation

createRelation returns a BaseRelation; concretely it returns a KafkaRelation.

KafkaRelation extends BaseRelation and overrides the buildScan method, which returns a KafkaSourceRDD as an RDD[Row].
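The contract involved is small; buildScan comes from the TableScan trait (a paraphrase of the relevant pieces of org.apache.spark.sql.sources):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.StructType

abstract class BaseRelationSketch {
  def sqlContext: SQLContext
  def schema: StructType
}

trait TableScanSketch {
  // KafkaRelation implements this by creating a KafkaSourceRDD and mapping its records to Rows
  def buildScan(): RDD[Row]
}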

(3) KafkaTable

KafkaTable extends Table and mixes in the SupportsRead trait. It is defined as:

class KafkaTable(includeHeaders: Boolean) extends Table with SupportsRead with SupportsWrite

Going over the code repeatedly to see how the ContinuousStream is generated, the key method is toContinuousStream, and the ContinuousStream it returns is a KafkaContinuousStream.
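In the DataSource V2 read path, that method hangs off the Scan produced by the table's ScanBuilder. The chain, in outline (method names from the Spark 3.x connector API; assumes provider, schema, partitioning, properties, options, and checkpointLocation are already in scope):

val table = provider.getTable(schema, partitioning, properties)
val scan = table.asInstanceOf[SupportsRead]
  .newScanBuilder(options)   // options: CaseInsensitiveStringMap -> ScanBuilder
  .build()                   // Scan
val stream = scan.toContinuousStream(checkpointLocation) // a KafkaContinuousStream for the Kafka table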

(4) KafkaContinuousStream

KafkaContinuousStream inherits from ContinuousStream. Looking at the code, everything ultimately calls the Kafka consumer API to read data; the only difference between the modes is the external interface.
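At the user level, this path is selected with a continuous trigger. A minimal end-to-end example (server and topic names are placeholders):

import org.apache.spark.sql.streaming.Trigger

val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .load()
  .writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second")) // continuous processing mode, backed by KafkaContinuousStream
  .start()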

The above is what the code framework of the Structured Streaming Kafka SQL source looks like. There are knowledge points here that you may see or use in daily work; I hope you learned more from this article.



