Three frameworks of streaming Computing: Storm, Spark and Flink

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Big data computing models fall mainly into batch computing, stream computing, interactive computing, graph computing, and so on. Of these, stream computing and batch computing are the two dominant modes, each suited to different big data application scenarios.

At present, there are three mainstream stream computing frameworks: Storm, Spark Streaming, and Flink. Their basic principles are as follows:

Apache Storm

In Storm, a real-time computation is designed as a structure called a topology. The topology is submitted to a cluster, where the master node is responsible for distributing code to the worker nodes, and the worker nodes are responsible for executing it. A topology contains two kinds of components: spouts and bolts. A spout is a source of data streams and emits data into the topology as tuples; a bolt consumes one or more streams and transforms them.
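The spout/bolt dataflow above can be sketched in plain Python. This is a conceptual illustration only, not the real Storm API (which is Java-based); the function names and the word-count example are invented for this sketch.

```python
# Conceptual sketch of Storm's spout/bolt model (not the real Storm API).

def word_spout():
    """Spout: a source that emits data into the topology as tuples."""
    for sentence in ["the quick fox", "the lazy dog"]:
        yield (sentence,)          # each emission is a tuple

def split_bolt(stream):
    """Bolt: transforms a stream of sentences into a stream of words."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: aggregates the word stream into per-word counts."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wiring the topology: spout -> split bolt -> count bolt
counts = count_bolt(split_bolt(word_spout()))
print(counts)
```

In real Storm, each spout and bolt runs as parallel tasks across worker nodes, and the tuples flow over the network rather than through generator calls.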

Apache Spark

Spark Streaming, an extension of the core Spark API, does not process records one at a time as Storm does. Instead, it slices the data stream into small batches at fixed intervals before processing them. Its abstraction for a continuous data stream is the DStream (Discretized Stream), which is a sequence of small-batch RDDs (Resilient Distributed Datasets). An RDD is a distributed dataset that can be transformed by arbitrary functions and processed over sliding windows (window computations) to achieve parallel operation.
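The micro-batch idea can be sketched in plain Python: the continuous stream is discretized into small batches, and each batch is processed as a unit. This is a conceptual stand-in, not the Spark Streaming API; here batches are cut by count rather than by time interval, and each batch plays the role of one RDD.

```python
# Conceptual sketch of Spark Streaming's micro-batch (DStream) model.
from itertools import islice

def micro_batches(stream, batch_size):
    """Discretize a stream into fixed-size batches (a stand-in for
    Spark's time-interval slicing; each batch stands in for one RDD)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        yield batch

events = range(10)
# Apply a batch-level computation (here a sum) to each micro-batch.
results = [sum(batch) for batch in micro_batches(events, 4)]
print(results)  # -> [6, 22, 17]
```

The trade-off this model makes is latency for throughput: no record is processed until its batch closes, but each batch can be scheduled and computed with Spark's efficient batch machinery.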

Apache Flink

Flink is a computing framework for both stream data and batch data. It treats batch data as a special case of stream data, offers low (millisecond-level) latency, and can guarantee that messages are neither lost nor duplicated (exactly-once delivery).

Flink creatively unifies stream and batch processing. Viewed as a stream, the input data stream is unbounded, while batch processing is treated as a special stream whose input is bounded. A Flink program is built from two basic building blocks, Stream and Transformation: a Stream is an intermediate result, and a Transformation is an operation that computes over one or more input Streams and outputs one or more result Streams.
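The Stream + Transformation model above can be sketched in plain Python. This is a conceptual illustration, not the Flink DataStream API; the point is that a bounded (batch) input flows through exactly the same operators an unbounded stream would, it simply happens to end.

```python
# Conceptual sketch of Flink's Stream + Transformation model:
# a bounded (batch) input is just a stream that terminates.

def transformation_map(stream, fn):
    """A Transformation: consumes one Stream, produces a result Stream."""
    for record in stream:
        yield fn(record)

def transformation_filter(stream, pred):
    """A Transformation: keeps only records matching the predicate."""
    for record in stream:
        if pred(record):
            yield record

# A bounded input stream flowing through a chain of Transformations.
bounded_input = iter([1, 2, 3, 4, 5])
pipeline = transformation_map(
    transformation_filter(bounded_input, lambda x: x % 2 == 1),
    lambda x: x * 10,
)
out = list(pipeline)
print(out)  # -> [10, 30, 50]
```

Because the operators never assume the stream ends, the same pipeline would work unchanged on an unbounded source; termination is a property of the input, not of the program.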

In brief, the three frameworks compare as follows: Storm is a native streaming engine that processes records one at a time, giving very low latency with at-least-once delivery by default; Spark Streaming uses micro-batches, trading somewhat higher latency for high throughput and exactly-once semantics; Flink is a native streaming engine that also handles bounded (batch) data, combining low latency with exactly-once guarantees.

