This article shares how Spark Streaming works, as part of an introduction to Spark 2.x. The editor finds it very practical and shares it here in the hope that you get something out of it after reading.
A rough translation of the description on the official website is as follows:
Spark Streaming is an extension of the core Spark API that supports scalable, high-throughput, fault-tolerant processing of real-time data streams. Data can be ingested from many sources, such as Kafka, Flume, Kinesis, or TCP sockets, and can be processed with complex algorithms expressed through high-level functions such as map, reduce, join, and window. Finally, the processed data can be pushed out to file systems, databases, and live dashboards. In fact, you can apply Spark's machine learning and graph processing algorithms to data streams.
How it works: Spark Streaming receives real-time input data streams and divides the data into batches, which are then processed by the Spark engine to produce the final result stream, also in batches.
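This micro-batch model can be sketched with the standard Scala Streaming API. The snippet below is a minimal illustration rather than code from this article; the master setting, application name, host, and port are placeholder assumptions. The batch interval passed to the StreamingContext is what divides the incoming stream into batches.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Local mode with 2 threads: one to receive data, one to process it.
// "local[2]" and the application name are placeholders for illustration.
val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingSketch")

// The batch interval (1 second here) controls how the input stream
// is divided into batches before the Spark engine processes them.
val ssc = new StreamingContext(conf, Seconds(1))

// An input DStream built from a TCP socket (host and port are placeholders).
val lines = ssc.socketTextStream("localhost", 9999)
```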
DStream (discretized stream) is the basic abstraction provided by Spark Streaming. It represents a continuous data stream, either an input data stream received from a source or the processed data stream generated by transforming an input stream. Internally, a DStream is represented by a series of consecutive RDDs, Spark's abstraction of an immutable distributed dataset. Each RDD in a DStream contains the data from one batch interval.
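To make the "series of RDDs" concrete, the following continues the sketch above: foreachRDD exposes the RDD behind each batch interval together with the batch time.

```scala
// Continuing the sketch above: each batch interval of the "lines" DStream
// is backed by one RDD, which foreachRDD exposes along with the batch time.
lines.foreachRDD { (rdd, time) =>
  println(s"Batch at $time contains ${rdd.count()} records")
}
```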
Any operation applied to a DStream is translated into operations on the underlying RDDs. For example, in the earlier example of converting a stream of lines into words, the flatMap operation is applied to each RDD of the lines DStream to generate the RDDs of the words DStream.
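In the same sketch, the lines-to-words example looks like this; flatMap, map, and reduceByKey on the DStream are applied, batch by batch, to the underlying RDDs.

```scala
// Still continuing the sketch: transform the lines DStream into a words DStream.
// flatMap is applied to each RDD of "lines" to produce the RDDs of "words".
val words = lines.flatMap(_.split(" "))

// Count the words within each batch.
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

// Print a few counts for every batch, then start the computation.
wordCounts.print()
ssc.start()
ssc.awaitTermination()
```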
These underlying RDD transformations are computed by the Spark engine. The DStream operations hide most of these details and provide developers with a higher-level API. These operations are discussed in detail in later sections.
Comparative analysis of three stream processing frameworks: Spark Streaming, Flink, and Storm
Throughput: Spark Streaming - high; Flink - high; Storm - low
Real-time latency: Spark Streaming - second-level delay; Flink - low latency, milliseconds (around 100 ms); Storm - low latency, milliseconds (tens of ms)
Out-of-order and late data handling: Spark Streaming - none; Flink - supported via watermarks (which Spark Streaming does not have); Storm - none
Processing guarantee: Spark Streaming - exactly-once; Flink - exactly-once; Storm - at-least-once
Dynamic adjustment of parallelism: Spark Streaming - not supported
Fault tolerance: Spark Streaming - checkpoints based on RDDs; Flink - checkpoints based on distributed snapshots; Storm - ack mechanism on individual records
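For the Spark Streaming row, fault tolerance via RDD-based checkpoints is typically enabled as sketched below. This is an illustrative sketch, not code from this article: the checkpoint directory, host, port, and application name are placeholder assumptions, and the driver-recovery pattern relies on StreamingContext.getOrCreate.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical checkpoint directory; in practice usually an HDFS or S3 path.
val checkpointDir = "hdfs:///tmp/streaming-checkpoints"

// Build a fresh context and define the full DStream graph inside this function,
// so the graph can be reconstructed from the checkpoint after a driver failure.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setMaster("local[2]").setAppName("CheckpointSketch")
  val ssc = new StreamingContext(conf, Seconds(5))
  ssc.checkpoint(checkpointDir) // persist streaming metadata (and stateful RDDs) here
  ssc.socketTextStream("localhost", 9999).count().print()
  ssc
}

// Recover from the checkpoint if one exists, otherwise create a new context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```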
This is how Spark Streaming works, as introduced for Spark 2.x. The editor believes it covers knowledge points you may see or use in daily work, and hopes you can learn more from this article.