What is the running process of Spark Streaming?

2025-02-23 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This article explains how Spark Streaming runs. Many people run into this question when working on real cases, so this article walks through it with a concrete example. I hope you read it carefully and get something out of it!

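Let's understand Spark Streaming through the simple example below. It reads lines of text from a socket, counts words in 5-second batches, and writes each batch's counts to a database: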

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OnlineForeachRDD2DB {
  def main(args: Array[String]): Unit = {
    /* Step 1: Create the Spark configuration object SparkConf and set the
     * runtime configuration of the Spark program. For example, setMaster sets
     * the URL of the Master of the Spark cluster the program connects to. If it
     * is set to "local", the Spark program runs locally, which is especially
     * suitable for beginners with very limited machines (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf()                // create the SparkConf object
    conf.setAppName("OnlineForeachRDD")      // application name, shown in the monitoring UI while the program runs
    // conf.setMaster("spark://Master:7077") // run the program on the Spark cluster
    conf.setMaster("local[6]")

    // Set the batchDuration interval, which controls how often Jobs are generated,
    // and create the StreamingContext, the entry point for Spark Streaming execution
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("Master", 9999)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    wordCounts.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        // ConnectionPool is a static, lazily initialized pool of connections
        // (a user-defined helper object, not part of Spark)
        val connection = ConnectionPool.getConnection()
        // One PreparedStatement per partition; bound parameters avoid the broken
        // quoting (and SQL injection risk) of concatenating values into the SQL
        val stmt = connection.prepareStatement(
          "insert into streaming_itemcount(item, count) values (?, ?)")
        try {
          partitionOfRecords.foreach { case (item, count) =>
            stmt.setString(1, item)
            stmt.setInt(2, count)
            stmt.executeUpdate()
          }
        } finally {
          stmt.close()
          ConnectionPool.returnConnection(connection) // return to the pool for future reuse
        }
      }
    }

    ssc.start()            // start receiving data and generating Jobs
    ssc.awaitTermination() // block until the streaming computation stops
  }
}
```

When ssc.start() is called, StreamingContext actually calls the start method of JobScheduler, which starts JobScheduler's message loop. Inside JobScheduler.start, a JobGenerator and a ReceiverTracker are constructed and their start methods are called:

1. Once JobGenerator starts, it keeps generating Jobs, one per batch interval.
2. ReceiverTracker first launches the Receivers in the Spark cluster (in fact, it starts a ReceiverSupervisor in an Executor). When a Receiver receives data, it stores the data in the Executor through the ReceiverSupervisor and sends the data's metadata to the ReceiverTracker in the Driver. Inside ReceiverTracker, this metadata is managed by ReceivedBlockTracker.

Every batch interval produces a concrete Job. This Job is not a Job in the Spark Core sense: it is only the DAG of RDDs generated from the DStreamGraph, and from the Java point of view it is equivalent to an instance of the Runnable interface. To run it, the Job is submitted to JobScheduler, which takes a separate thread from a thread pool to submit the Job to the cluster. (The RDD action executed in that thread is what actually triggers the real Spark Core job.) There are two reasons for using a thread pool:

1. Jobs are generated continuously, so a thread pool is needed for efficiency, much as an Executor runs Tasks through a thread pool.
2. Jobs may be configured with FAIR scheduling, which also requires multithreading support.
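The example calls ConnectionPool.getConnection() and ConnectionPool.returnConnection(...) but never defines them; it only describes ConnectionPool as a static, lazily initialized pool. Below is a minimal sketch of such a helper, assuming plain JDBC. The names SimplePool, borrow, giveBack and the JDBC URL are hypothetical placeholders, not part of Spark or of the original article; a production job would more likely use an established pooling library.

```scala
import java.sql.{Connection, DriverManager}
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical thread-safe pool: borrow() hands out an idle resource if one
// exists, otherwise creates a new one with `create`.
class SimplePool[A <: AnyRef](create: () => A) {
  private val idle = new ConcurrentLinkedQueue[A]()

  def borrow(): A = Option(idle.poll()).getOrElse(create())

  def giveBack(resource: A): Unit = {
    idle.offer(resource) // make the resource available to the next borrower
    ()
  }

  def idleCount: Int = idle.size()
}

// Hypothetical ConnectionPool matching the calls made inside foreachPartition.
// ASSUMPTION: the JDBC URL below is a placeholder, and the matching JDBC driver
// must be on the executors' classpath.
object ConnectionPool {
  private val jdbcUrl = "jdbc:mysql://Master:3306/spark?user=spark&password=spark"

  // Created on first use inside each JVM (i.e. on each executor), so
  // connections are opened where the tasks run, never serialized from the driver.
  private lazy val pool =
    new SimplePool[Connection](() => DriverManager.getConnection(jdbcUrl))

  def getConnection(): Connection = pool.borrow()

  def returnConnection(connection: Connection): Unit = pool.giveBack(connection)
}
```

Because every task on an executor shares the same object, a connection returned by one partition can be reused by the next partition of a later batch, instead of being opened once per record.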
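The receiver side described above (data stored on the Executor, only metadata sent to ReceiverTracker in the Driver, and that metadata managed by ReceivedBlockTracker) can be pictured with a small toy model. This is a simplified stand-in, not Spark's code: BlockMeta, ToyBlockTracker and the block IDs are hypothetical, and the method names only loosely echo those in Spark's ReceivedBlockTracker. It shows the key idea that blocks reported between two batch boundaries become the input of the next batch.

```scala
import scala.collection.mutable

// Hypothetical, simplified metadata a receiver reports for one stored block.
final case class BlockMeta(streamId: Int, blockId: String, numRecords: Long)

// Toy driver-side tracker: reported blocks wait as "unallocated" until the
// next batch interval, when all of them are assigned to that batch.
class ToyBlockTracker {
  private val unallocated = mutable.Map.empty[Int, mutable.ArrayBuffer[BlockMeta]]
  private val batches = mutable.Map.empty[Long, Map[Int, List[BlockMeta]]]

  // Called when a receiver reports a block; the data itself already lives
  // on an executor, only this metadata reaches the driver.
  def addBlock(meta: BlockMeta): Unit = synchronized {
    unallocated.getOrElseUpdate(meta.streamId, mutable.ArrayBuffer.empty[BlockMeta]) += meta
  }

  // Called once per batch: everything received since the previous batch
  // becomes the input of the batch at `batchTime` (milliseconds).
  def allocateBlocksToBatch(batchTime: Long): Unit = synchronized {
    val allocated = unallocated.map { case (streamId, buffer) =>
      val blocks = buffer.toList
      buffer.clear()
      streamId -> blocks
    }.toMap
    batches(batchTime) = allocated
  }

  // Blocks per input stream for a given batch (empty if none were allocated).
  def getBlocksOfBatch(batchTime: Long): Map[Int, List[BlockMeta]] = synchronized {
    batches.getOrElse(batchTime, Map.empty[Int, List[BlockMeta]])
  }
}
```

With a 5-second batch interval, two blocks reported before the 5000 ms boundary both land in that batch, and a block reported afterwards is held until the 10000 ms batch.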
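The hand-off from job generation to JobScheduler's thread pool can likewise be sketched without Spark. This toy model (ToyJob, ToyJobScheduler and ToyStreaming are hypothetical names) treats each batch's Job as a Runnable wrapping the output action and runs it on a fixed-size pool, as the explanation describes. In the Spark source, the size of JobScheduler's pool is read from the spark.streaming.concurrentJobs setting, which defaults to 1, so batches normally run one at a time.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}

// Toy "Job": a Runnable wrapping one batch's output action.
final case class ToyJob(batchTime: Long, action: () => Unit) extends Runnable {
  override def run(): Unit = action()
}

// Toy scheduler: runs each submitted Job on a thread from a fixed-size pool.
final class ToyJobScheduler(numConcurrentJobs: Int) {
  private val jobExecutor = Executors.newFixedThreadPool(numConcurrentJobs)

  def submitJob(job: ToyJob): Unit = jobExecutor.execute(job)

  def stop(): Unit = {
    jobExecutor.shutdown()                             // accept no new jobs; queued ones still run
    jobExecutor.awaitTermination(10, TimeUnit.SECONDS) // wait for submitted jobs to finish
    ()
  }
}

object ToyStreaming {
  // Simulates `numBatches` batch intervals of `batchMs` milliseconds and
  // returns one log line per executed job.
  def simulate(numBatches: Int, batchMs: Long, numConcurrentJobs: Int): List[String] = {
    val log = new ConcurrentLinkedQueue[String]()
    val scheduler = new ToyJobScheduler(numConcurrentJobs)
    for (i <- 0 until numBatches) {
      val batchTime = i * batchMs
      // In Spark, the action inside the Job (e.g. the body of foreachRDD) is
      // what triggers a real Spark Core job; here it only records a log line.
      val job = ToyJob(batchTime, () => {
        log.add(s"batch $batchTime ran on ${Thread.currentThread().getName}")
        ()
      })
      scheduler.submitJob(job)
      Thread.sleep(batchMs) // wait for the next batch interval
    }
    scheduler.stop()
    log.toArray(Array.empty[String]).toList
  }
}
```

Running ToyStreaming.simulate(3, 20L, 2) executes three jobs for batch times 0, 20 and 40 on pool threads rather than on the generating thread, which is the separation the explanation motivates.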

© 2024 shulou.com SLNews company. All rights reserved.