SLTechnology News & Howtos — Shulou (Shulou.com), 06/03 report, updated 2025-01-30
Demystifying the Spark Streaming runtime mechanism
Demystifying the Spark Streaming framework
Spark Streaming is a sub-framework of Spark, but it behaves more like an application running on Spark Core: at startup it launches many jobs, and these jobs cooperate with one another.
These jobs serve two purposes:
1. Jobs that compute the input data stream.
2. Jobs the framework itself needs to run, such as starting Receivers.
Spark Streaming itself is a very complex application, and knowing it inside out makes writing any streaming application much easier.
Let's take a look at the architecture diagram of Spark:
Four popular frameworks sit on top of Spark Core: Spark SQL, stream computing (Spark Streaming), machine learning (MLlib), and graph computing (GraphX).
Apart from stream computing, these frameworks mostly encapsulate algorithms or interfaces on top of Spark Core. For example, Spark SQL encapsulates SQL syntax, and its main job is to parse SQL into the underlying Spark Core API; the machine learning library encapsulates many mathematical vectors and algorithms; GraphX has seen no major updates recently.
Only a thorough understanding of Spark Streaming will be of real help when we write streaming applications.
Programming on Spark Core means programming against RDDs, while programming on Spark Streaming means programming against DStreams. A DStream adds a time dimension to the RDD:
private[streaming] var generatedRDDs = new HashMap[Time, RDD[T]]()
DStream's compute takes a time parameter: given the time, it obtains the corresponding RDD and then performs the computation on that RDD.
/** Method that generates an RDD for the given time */
def compute(validTime: Time): Option[RDD[T]]
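The interplay between the time-indexed `generatedRDDs` map and the time-parameterized `compute` can be modeled in a few lines of plain Scala. This is a conceptual sketch only, not Spark's actual implementation: `Batch`, the millisecond `Time` alias, and `getOrCompute`'s exact shape are simplifications for illustration.

```scala
import scala.collection.mutable

object DStreamModel {
  type Time = Long         // batch time in milliseconds (stand-in for Spark's Time)
  type Batch = Seq[String] // stand-in for RDD[T]

  // Mirrors DStream.generatedRDDs: one generated batch per batch time
  private val generatedBatches = mutable.HashMap.empty[Time, Batch]

  // Stand-in for DStream.compute(validTime): produce the batch for a given time
  def compute(validTime: Time): Option[Batch] =
    Some(Seq(s"record@$validTime"))

  // The lookup-or-generate pattern: reuse a cached batch, or compute and cache it
  def getOrCompute(time: Time): Option[Batch] =
    generatedBatches.get(time).orElse {
      val batch = compute(time)
      batch.foreach(b => generatedBatches(time) = b)
      batch
    }

  def cachedTimes: Set[Time] = generatedBatches.keySet.toSet
}
```

Calling `getOrCompute` twice with the same batch time returns the same batch without recomputing it, which is exactly why the map is keyed by `Time`.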
Looking at Spark Streaming's run logs, the output is almost identical to that of an ordinary RDD job:
When a Spark Streaming job runs, it first generates the DStream graph, converts the DStream graph into an RDD graph at each specific batch time, and then runs the RDD job. As shown below:
If we regard the RDD as a spatial dimension, then the DStream adds a time dimension on top of it, making it spatio-temporal.
Imagine a two-dimensional plane where the X axis is time and the Y axis is the operations on RDDs — the job logic formed by the RDD dependency chain. As time passes, job instances are generated one after another.
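The picture above — a fixed operator graph instantiated once per batch interval — can be sketched as follows. The names (`JobInstance`, `template`, `generateJobs`) are invented for illustration; in Spark the template role is played by the DStreamGraph.

```scala
object JobGeneration {
  type Time = Long

  final case class JobInstance(time: Time, description: String)

  // The Y axis: the fixed operator logic, applied at a given batch time
  val template: Time => JobInstance =
    t => JobInstance(t, s"flatMap->map->reduceByKey @ $t")

  // The X axis: as time advances, instantiate one job per batch interval
  def generateJobs(start: Time, batchIntervalMs: Long, numBatches: Int): Seq[JobInstance] =
    (0 until numBatches).map(i => template(start + i * batchIntervalMs))
}
```

With a 1-second batch interval, three ticks of the clock yield three job instances sharing the same logic but bound to different times — the "job instances generated one after another" described above.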
Therefore, Spark Streaming must provide the following:
A DStreamGraph that serves as the template from which each batch's RDD graph is generated
A time-based job controller
Input and output streams (e.g. InputDStream and the output operations) to represent data input and output
Submission of concrete jobs to the Spark cluster. Because Spark Streaming runs jobs continuously, it is more prone to failures, so fault tolerance is essential: fault tolerance of a single job relies on Spark Core, and Spark Streaming adds fault tolerance at the framework level.
Transaction handling: data must be processed, and processed exactly once. In other words, each time data is processed, the boundaries of the data must be known — especially in the event of a crash.
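The exactly-once requirement can be illustrated with a minimal model: record the boundary (a batch id) of every batch that completed, so that a replay after a crash is detected and skipped. This is a sketch of the idea only — `ExactlyOnceProcessor` and its members are invented names, and a real system would keep the completed-batch record in durable storage.

```scala
import scala.collection.mutable

final class ExactlyOnceProcessor {
  // Completed batch ids; in a real system this would live in durable storage
  private val completed = mutable.Set.empty[Long]
  var processedCount: Int = 0

  // Returns true if the batch was processed now, false if it was a duplicate replay
  def process(batchId: Long, records: Seq[String]): Boolean =
    if (completed.contains(batchId)) false
    else {
      processedCount += records.size
      completed += batchId
      true
    }
}
```

Replaying batch 1 after a simulated crash leaves `processedCount` unchanged: the data is processed, and processed only once.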
Note:
1. DT Big Data DreamWorks WeChat official account: DT_Spark
2. IMF 8:00 p.m. big-data hands-on YY live channel: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains