Shulou (shulou.com), 06/01 report · SLTechnology News & Howtos · Updated 2025-01-29
This article explains how Spark Streaming works in Spark and walks through a small word-count example. I hope you get something useful out of it.
Overview
Spark Streaming is a scalable, high-throughput, fault-tolerant stream-processing extension of the core Spark API. It can ingest data from Kafka, Flume, Kinesis, or TCP sockets, and process it with high-level operations such as map, reduce, join, and window. The computed results can be pushed to file systems, databases, and live dashboards. You can also apply Spark's machine-learning and graph-processing algorithms to the data streams.
After receiving live data, Spark Streaming divides it into micro-batches, which are then handed to the Spark engine for batch processing.
Spark Streaming provides a high-level abstraction called a DStream (discretized stream), and all incoming data streams are processed as DStreams. Internally, a DStream is represented as a sequence of RDDs, one per batch interval.
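To make the micro-batch model concrete, here is a plain-Java sketch (no Spark dependency; the class name and the batch contents are made up for illustration) treating a stream as a sequence of per-batch record collections, each processed independently, which is exactly how a DStream maps onto a sequence of RDDs:

import java.util.Arrays;
import java.util.List;

public class MicroBatchSketch {
    // A DStream is internally a sequence of RDDs: one RDD per batch interval.
    // Here each inner list stands in for the records received in one interval.
    public static int totalRecords(List<List<String>> batches) {
        int total = 0;
        for (List<String> batch : batches) {
            total += batch.size(); // each batch is processed independently by the batch engine
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("hello world", "hello spark"), // batch for t = 0s..1s
            Arrays.asList("spark streaming"));           // batch for t = 1s..2s
        System.out.println("records across batches: " + totalRecords(batches));
    }
}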
Quick Start
The first example shows how to count the occurrences of words received over a TCP socket.
First, we create a JavaStreamingContext object, the main entry point for all streaming functionality, configured with two local threads and a batch interval of 1 second.
import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
Next, create a DStream from a TCP source listening on local port 9999:
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
We split each line of received data on spaces:
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator());
Count the words
JavaPairDStream<String, Integer> pairs = words.mapToPair(s -> new Tuple2<>(s, 1));
JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey((i1, i2) -> i1 + i2);
wordCounts.print();
The lines are flat-mapped into words, mapped to (word, 1) pairs, and reduced by key to sum the counts; wordCounts.print() then prints the results to the console for each batch.
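The per-batch logic of this pipeline can be sketched in plain Java, using java.util.stream so it runs without a Spark cluster (the class and method names here are illustrative, not part of the Spark API); flatMap splits lines into words and groupingBy with counting plays the role of reduceByKey:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountBatch {
    // Mirrors one batch of the Spark pipeline: flatMap -> mapToPair -> reduceByKey.
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))  // flatMap: lines -> words
            .collect(Collectors.groupingBy(                   // reduceByKey: sum per word
                word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("hello world", "hello spark")));
    }
}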
Finally, start the whole computation and wait for it to terminate:
jssc.start();              // Start the computation
jssc.awaitTermination();   // Wait for the computation to terminate
To run this example, we also need nc (netcat) acting as a simple data server:
nc -lk 9999
Spark ships with this example; you can run ./bin/run-example streaming.JavaNetworkWordCount localhost 9999 to try WordCount, then type words into the nc terminal and watch the counts appear.
After reading the above, you should have a better understanding of how Spark Streaming works in Spark. Thank you for your support.