Demystifying Spark Streaming

This article introduces how Spark Streaming actually works. Many people have doubts about what happens inside Spark Streaming, so the editor has consulted a range of materials and put together a simple, easy-to-follow walkthrough, in the hope of answering those doubts. Now, please follow along with the study!

1. Demystifying the Spark Streaming Job architecture and execution mechanism

First, run the online word-count example and observe the log information that Spark Streaming prints to the console.

The following code listens for client connection requests on port 9999 and then keeps sending words to the client.
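
A minimal Scala sketch of such a word-sending server might look like the following (the object name SocketWordServer, the word list and the 100 ms send interval are illustrative assumptions, not the article's original listing):

```scala
import java.io.PrintWriter
import java.net.ServerSocket
import scala.util.Random

// Listens on port 9999 and keeps sending random words, one per line,
// to every client that connects (the Spark Streaming Receiver will be the client).
object SocketWordServer {
  def main(args: Array[String]): Unit = {
    val words  = Array("spark", "streaming", "job", "receiver", "driver", "executor")
    val server = new ServerSocket(9999)
    println("Waiting for clients on port 9999 ...")
    while (true) {
      val socket = server.accept()            // block until a client connects
      new Thread(new Runnable {
        override def run(): Unit = {
          val out = new PrintWriter(socket.getOutputStream, true)  // autoflush
          while (!socket.isClosed) {
            out.println(words(Random.nextInt(words.length)))       // send one word
            Thread.sleep(100)
          }
        }
      }).start()
    }
  }
}
```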

Start the SocketServer first, and then start the Spark Streaming online word-count program.
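
A minimal Scala sketch of the word-count program, along the lines of Spark's standard NetworkWordCount example (the 5-second batch interval and local[2] master are assumptions), could be:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Classic socket word count: connects to localhost:9999 and counts words per batch.
object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc  = new StreamingContext(conf, Seconds(5))     // batchDuration = 5 seconds

    val lines = ssc.socketTextStream("localhost", 9999)   // Receiver-based input DStream (stream 0)
    val wordCounts = lines.flatMap(_.split(" "))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)
    wordCounts.print()                                    // output action; triggers a Job every batch

    ssc.start()             // starts JobScheduler, JobGenerator and ReceiverTracker under the hood
    ssc.awaitTermination()
  }
}
```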

The running process is summarized as follows

1. After StreamingContext is started, ReceiverTracker, JobGenerator, JobScheduler and BlockGenerator are started. According to the batchDuration specified when the StreamingContext was created, a RecurringTimer is started, which sends a message to JobGenerator at every batch interval.

2. ReceiverTracker receives the registration message from the Receiver (stream 0), which ReceiverSupervisorImpl has launched on the Executor, and the Receiver then starts receiving data.

3. The Receiver connects to the socket server at localhost:9999 and starts receiving data, storing the received data into BlockManager through BlockGenerator.

4. JobScheduler receives the messages sent periodically by JobGenerator and then submits a Job: DStreamGraph obtains the received-block information from ReceiverTracker to generate the RDDs, DAGScheduler schedules the Job's execution, and TaskSchedulerImpl sends the TaskSet to the Executors for execution.

5. The Executors send the results back to the Driver; DAGScheduler and JobScheduler print the Job-completion and elapsed-time information, and the word-count result is finally output on the console.

You can see that Jobs are constantly being generated and run as time passes, so how exactly is a Job generated in Spark Streaming?

When StreamingContext's start method is called, it actually starts the start method of JobScheduler, which begins its message loop. Inside JobScheduler's start, JobGenerator and ReceiverTracker are constructed and their start methods are called in turn.

1. After startup, JobGenerator keeps generating Jobs, one per batchDuration.

2. After the Receiver is started in the Spark cluster (in fact, the ReceiverSupervisor is started first in the Executor and it then starts the Receiver), the Receiver stores the data it receives into the Executor's BlockManager through the ReceiverSupervisor, and sends the metadata of that data to the ReceiverTracker on the Driver, where the received metadata is managed by ReceivedBlockTracker (see the Receiver sketch below).
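
To make this data path concrete, here is a sketch of a custom Receiver modeled on the CustomReceiver example in Spark's documentation (the class name WordReceiver is illustrative). Every call to store() hands the data to the ReceiverSupervisor, whose BlockGenerator batches it into blocks in the Executor's BlockManager and reports the block metadata to the ReceiverTracker on the Driver:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Custom Receiver that reads lines from a socket and stores them as blocks.
class WordReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Receive on a separate thread so onStart() returns immediately.
    new Thread("Word Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = { /* the receiving thread exits once isStopped is true */ }

  private def receive(): Unit = {
    try {
      val socket = new Socket(host, port)
      val reader = new BufferedReader(
        new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
      var line = reader.readLine()
      while (!isStopped && line != null) {
        store(line)                // buffered by BlockGenerator, written to BlockManager
        line = reader.readLine()
      }
      reader.close()
      socket.close()
      restart("Trying to connect again")
    } catch {
      case t: Throwable => restart("Error receiving data", t)
    }
  }
}
```

It would be plugged in with ssc.receiverStream(new WordReceiver("localhost", 9999)), which produces an input DStream equivalent to the socketTextStream used earlier.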

Each batch interval produces a specific Job. The Job here is not a Job in SparkCore, but the DAG of RDDs generated from DStreamGraph for that batch; from a Java point of view, it is equivalent to an instance of the Runnable interface. To run it, the Job is submitted to JobScheduler, which uses a separate thread from its thread pool to submit the Job to the cluster (in fact, it is the RDD-based action executed in that thread that triggers the real job).
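
The following is a deliberately simplified, standalone Scala model of this idea, not Spark's actual code: a recurring timer "generates" a Runnable job every batch interval and hands it to a thread pool, where the job's run method does the work (the pool size, batch interval and printed message are all illustrative):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Toy model of the timer-plus-thread-pool structure described above.
object JobLoopModel {
  def main(args: Array[String]): Unit = {
    val batchDurationMs = 1000L
    val jobExecutor = Executors.newFixedThreadPool(4)         // plays the role of JobScheduler's pool
    val timer = Executors.newSingleThreadScheduledExecutor()  // plays the role of RecurringTimer

    val generateJobs = new Runnable {
      override def run(): Unit = {
        val batchTime = System.currentTimeMillis()
        val job = new Runnable {                              // the "Job" is just a Runnable
          override def run(): Unit =
            println(s"running job for batch $batchTime on ${Thread.currentThread().getName}")
        }
        jobExecutor.submit(job)                               // hand the Job to the pool
      }
    }

    timer.scheduleAtFixedRate(generateJobs, 0L, batchDurationMs, TimeUnit.MILLISECONDS)
    Thread.sleep(5000)                                        // let a few batches run
    timer.shutdown()
    jobExecutor.shutdown()
  }
}
```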

Why use thread pools?

1. Jobs are constantly being generated, so a thread pool is needed for efficiency; this is similar to how Tasks are executed through a thread pool in the Executor.

2. The FAIR scheduling mode may be set for Jobs, which also requires multithreading support.

2. Demystifying the Spark Streaming fault-tolerance architecture and running mechanism

Fault tolerance is divided into Driver level fault tolerance and Executor level fault tolerance.

Executor-level fault tolerance concerns the safety of received data and the safety of task execution.

For the safety of received data, one approach is that Spark Streaming by default receives data with MEMORY_AND_DISK_2, that is, into the memory of two machines; if the Executor on one machine dies, processing immediately switches to the Executor on the other machine. This approach is generally very reliable and involves no switching delay. The other approach is the WAL (Write Ahead Log): when data arrives, it is first written to a log through the WAL mechanism, and if something goes wrong the data is recovered from the log; the data is then saved in the Executor and replicated to other replicas, which has an impact on performance. In production environments Kafka is generally used as the store, and when received data is lost Spark Streaming can replay it from Kafka.

For the safety of task execution, Spark Streaming relies on the fault tolerance of RDDs.
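
As a sketch of how the WAL path described above might be switched on (the checkpoint directory and application name are illustrative assumptions, and the master is expected to be supplied by spark-submit):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal sketch of enabling the receiver write-ahead log.
object WalEnabledContext {
  def create(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("WALWordCount")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")  // log received data before ack
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")  // WAL files are written under this directory
    ssc
  }
}
```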

Driver-level fault tolerance concerns the template from which the DAG is generated, namely DStreamGraph, the metadata stored in ReceiverTracker, and the Job progress tracked in JobScheduler, all of which are protected through checkpointing. A checkpoint is written before each Job is generated and again after the Job is generated, and if something goes wrong the state is recovered from the checkpoint.
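
A sketch of Driver-side recovery from a checkpoint using StreamingContext.getOrCreate (the checkpoint path and application name are illustrative assumptions): on a restart, the context, including DStreamGraph and pending-batch information, is rebuilt from the checkpoint directory instead of being created anew:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Recoverable streaming application: state comes back from the checkpoint after a Driver restart.
object RecoverableContext {
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("RecoverableWordCount")
    val ssc  = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint(checkpointDir)                       // checkpoint DStreamGraph, metadata, progress
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Rebuild from the checkpoint if one exists; otherwise create a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```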

At this point, the study of demystifying Spark Streaming is over. I hope it has helped resolve your doubts. Pairing theory with practice is the best way to learn, so go and try it out! If you want to keep learning related knowledge, please continue to follow the site for more practical articles.
