
Lesson 84: In-depth Analysis of StreamingContext, DStream, and Receiver


This lesson is divided into four parts: the first part analyzes the functions and source code of StreamingContext; the second part analyzes the functions and source code of DStream; the third part analyzes the functions and source code of Receiver; and the last part combines StreamingContext, DStream, and Receiver to analyze the overall process.

First, StreamingContext functions and source code analysis:

1. Through the Spark Streaming object jssc, the main entry point of the application is created; it connects to the Driver and receives the source data written to data service port 9999.

2. The main functions of Spark Streaming are listed below (a minimal creation sketch follows the list):

It is the entry point of the main program.

It provides various ways to create DStreams that receive different input data sources (for example Kafka, Flume, Twitter, ZeroMQ, plain TCP sockets, etc.).

When instantiating the Spark Streaming object through its constructors, you can specify the master URL and appName, pass in a SparkConf configuration object, or reuse a SparkContext that has already been created.

It wraps the received data streams into DStream objects.

The streaming computation of the current application is started with the start method of the Spark Streaming object instance and ended with its stop method.
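To make this concrete, here is a minimal Scala sketch of creating a StreamingContext and connecting it to the data service port 9999 mentioned above. The local master, appName, and the 5-second batch interval are illustrative assumptions, not values prescribed by the lesson:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingContextEntry {
  def main(args: Array[String]): Unit = {
    // Option 1: build from a SparkConf carrying the master URL and appName.
    val conf = new SparkConf().setMaster("local[2]").setAppName("Lesson84")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Option 2 (alternative): reuse a SparkContext that has already been created.
    //   val sc  = new SparkContext(conf)
    //   val ssc = new StreamingContext(sc, Seconds(5))

    // Create a DStream that receives text written to data service port 9999.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()

    ssc.start()             // start the streaming computation of the application
    ssc.awaitTermination()  // run until stop() is called or an error occurs
  }
}
```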

Second, DStream functions and source code analysis:

1. DStream is the template of RDD; DStream is abstract, and RDD is abstract as well.

2. The specific implementation subclasses of DStream are shown below:

3. Take the socketTextStream method of the StreamingContext instance as an example; its execution returns a DStream object instance. The call chain through the source code is shown below:

socket.getInputStream obtains the data, and a while loop reads and stores it (in memory and/or on disk), as sketched below.
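The following is a simplified stand-in for what Spark's internal socket receiver does, not the verbatim source: open the socket, read from socket.getInputStream in a while loop, and hand each record to store(), which lets the framework persist it as blocks in memory and/or on disk. Names and error handling are reduced for clarity:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Simplified stand-in for Spark's internal socket receiver
// (the real class also supports reconnection and error reporting).
class SimpleSocketReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {

  override def onStart(): Unit = {
    // Receiving must not block onStart, so it runs on its own thread.
    new Thread("Simple Socket Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = ()  // the loop below exits once isStopped() is true

  private def receive(): Unit = {
    val socket = new Socket(host, port)
    val reader = new BufferedReader(
      new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
    var line = reader.readLine()
    while (!isStopped() && line != null) {
      store(line)               // hand the record to Spark, which stores it
      line = reader.readLine()  // as blocks in memory and/or on disk
    }
    reader.close()
    socket.close()
  }
}
```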

Third, Receiver functions and source code analysis:

1. Receiver represents the input of data; it receives data from external sources, for example fetching data from Kafka.

2. Receiver runs on the Worker node

3. When a Receiver on a Worker node fetches data from the Kafka distributed messaging framework, the concrete implementation class is KafkaReceiver.

4. Receiver is an abstract class; its data-fetching implementation subclasses are shown below:

5. If none of the above implementation classes meets your requirements, you can define your own Receiver class; you only need to extend the Receiver abstract class and implement the business logic your subclass needs, as in the sketch below.
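A minimal sketch of such a custom Receiver follows. PollingReceiver and fetchFromExternalSource are hypothetical names used only for illustration; only the Receiver contract (constructor StorageLevel, onStart, onStop, store, isStopped) comes from Spark:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// A hypothetical custom Receiver that periodically polls an external source.
class PollingReceiver(pollIntervalMs: Long)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Receiving must happen on its own thread: onStart() must return quickly.
    new Thread("Polling Receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          // fetchFromExternalSource() is a placeholder for your own business
          // logic (e.g. calling an HTTP API or reading a message queue).
          fetchFromExternalSource().foreach(record => store(record))
          Thread.sleep(pollIntervalMs)
        }
      }
    }.start()
  }

  override def onStop(): Unit = ()  // nothing to clean up in this sketch

  private def fetchFromExternalSource(): Seq[String] =
    Seq.empty  // placeholder; replace with real fetching code
}
```

On the driver side, such a receiver is registered with ssc.receiverStream(new PollingReceiver(1000)), which returns a DStream just like the built-in sources.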

Fourth, process analysis combining StreamingContext, DStream, and Receiver:

(1) inputStream represents the data input stream (such as Socket, Kafka, Flume, etc.)

(2) Transformation represents a series of operations on data, such as flatMap, map, etc.

(3) outputStream represents the output of data, such as the println used in the WordCount example.

After the data flows in, execution is ultimately carried out on RDDs. When the incoming data is processed, Transformations are applied to the DStream, and the StreamingContext generates a DStreamGraph from those Transformations; the DStreamGraph is the template of the DAG and is managed by the framework. When we specify a batch time interval, the Spark Streaming framework triggers Jobs automatically, so writing Spark code (such as flatMap, collect, print) does not by itself cause a Job to run; Jobs are generated automatically by the Spark Streaming framework. A complete WordCount sketch of this flow follows.
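The sketch below ties the three elements together in Scala; the local master, appName, host, and 5-second batch interval are illustrative assumptions (port 9999 is the one mentioned in the lesson):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCountFlow {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WordCountFlow")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // (1) inputStream: data flows in from a socket source
    val lines = ssc.socketTextStream("localhost", 9999)

    // (2) Transformations: these only describe the DStreamGraph; no Job runs yet
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // (3) outputStream: print() is the output operation; at every batch interval
    //     the framework instantiates the graph as RDDs and submits a Job
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```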

Summary:

Spark Streaming can handle many kinds of data sources, such as databases, HDFS, server logs, and network streams; it is more powerful than you might imagine. When you fail to make good use of it, the real reason is usually that you do not understand Spark and Spark Streaming themselves well enough.

Note:

Source: DT_ Big Data DreamWorks (IMF Legendary Action top-secret course)

For more exclusive content, please follow the WeChat official account: DT_Spark

If you are interested in big data and Spark, you can listen to the free Spark open course offered by teacher Wang Jialin at 20:00 every evening; YY room number: 68917580.
