What are the Storm data flow models? 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--


Storm is an open-source, distributed real-time computation system. It provides a set of basic abstractions for computation:

1 Topology

2 Stream

3 Spout

4 Bolt

Once you submit a topology to the cluster, it keeps running until you explicitly stop it.

A computation in Storm is expressed as a topology: a graph of Spouts and Bolts connected by data Streams. The following is a schematic diagram of the structure of a topology.

A topology contains:

1: Spout: the message source in Storm, which produces messages (data) for the topology, generally by reading from an external data source. In our production environment we use the Kafka-Storm integration, so our Spout is KafkaSpout.

2: Bolt: the message processor in Storm, which processes messages for the topology. A Bolt can perform operations such as:

2.1: filtering

2.2: aggregation

2.3: database queries
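To make the Bolt's role concrete, here is a minimal plain-Java sketch of a filtering Bolt chained to an aggregating Bolt. It does not use the Storm library: Tuple, Bolt, FilterBolt and CountBolt are hypothetical stand-ins, loosely modeled on the idea that a Storm bolt handles one input tuple at a time and may emit new tuples downstream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java simulation of two chained Bolts; this is NOT the Storm API.
// Tuple, Bolt, FilterBolt and CountBolt are hypothetical names for illustration.
class BoltSketch {

    /** A (word, count) pair; real Storm tuples are lists of named fields. */
    static final class Tuple {
        final String word;
        final int count;
        Tuple(String word, int count) { this.word = word; this.count = count; }
    }

    /** Stand-in for a Bolt: handles one input tuple and may emit tuples downstream. */
    interface Bolt {
        void execute(Tuple input, Consumer<Tuple> emit);
    }

    /** Filtering: drop tuples whose word is empty, pass the rest through unchanged. */
    static final class FilterBolt implements Bolt {
        @Override
        public void execute(Tuple input, Consumer<Tuple> emit) {
            if (!input.word.isEmpty()) {
                emit.accept(input);
            }
        }
    }

    /** Aggregation: keep a running total per word and emit the updated total. */
    static final class CountBolt implements Bolt {
        final Map<String, Integer> totals = new HashMap<>();

        @Override
        public void execute(Tuple input, Consumer<Tuple> emit) {
            int total = totals.merge(input.word, input.count, Integer::sum);
            emit.accept(new Tuple(input.word, total));
        }
    }

    /** Feeds a finite batch through FilterBolt -> CountBolt and returns the totals. */
    static Map<String, Integer> run(List<Tuple> inputs) {
        FilterBolt filter = new FilterBolt();
        CountBolt count = new CountBolt();
        List<Tuple> emitted = new ArrayList<>(); // what CountBolt would send further downstream
        for (Tuple t : inputs) {
            filter.execute(t, passed -> count.execute(passed, emitted::add));
        }
        return count.totals;
    }

    public static void main(String[] args) {
        List<Tuple> inputs = new ArrayList<>();
        inputs.add(new Tuple("storm", 1));
        inputs.add(new Tuple("", 1));      // filtered out
        inputs.add(new Tuple("storm", 2));
        inputs.add(new Tuple("kafka", 1));
        System.out.println(run(inputs));   // totals: storm=3, kafka=1 (map order may vary)
    }
}
```

Because CountBolt keeps its totals in memory, all tuples for the same word must reach the same task when the Bolt runs in parallel; in Storm that is the job of a fields grouping on the word field.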

Finally, the topology is submitted to the Storm cluster to run. You can also stop it with a command, which returns the resources it occupied to the Storm cluster.
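The lifecycle above, where a submitted topology keeps running and holding cluster resources until it is explicitly stopped, can be modeled with a toy cluster that tracks worker slots. This is only an illustrative sketch: ClusterSketch, submit and kill are hypothetical names, and for simplicity it rejects a submission that does not fit, whereas assignment on a real cluster is handled by Storm's Nimbus daemon. On a real cluster the stop command is storm kill followed by the topology name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the topology lifecycle on a cluster with a fixed number of worker slots.
// NOT Nimbus or the storm CLI; ClusterSketch, submit and kill are hypothetical names.
class ClusterSketch {
    private final int totalSlots;
    private final Map<String, Integer> running = new HashMap<>(); // topology name -> slots held

    ClusterSketch(int totalSlots) {
        this.totalSlots = totalSlots;
    }

    int freeSlots() {
        int used = 0;
        for (int slots : running.values()) {
            used += slots;
        }
        return totalSlots - used;
    }

    /** Submit: the topology takes its slots and keeps them; nothing here stops it on its own. */
    void submit(String name, int workers) {
        if (running.containsKey(name)) {
            throw new IllegalStateException("already running: " + name);
        }
        if (workers > freeSlots()) {
            throw new IllegalStateException("not enough free slots for " + name);
        }
        running.put(name, workers);
    }

    /** Kill: the explicit stop; the topology's slots are returned to the cluster. */
    void kill(String name) {
        if (running.remove(name) == null) {
            throw new IllegalStateException("not running: " + name);
        }
    }

    boolean isRunning(String name) {
        return running.containsKey(name);
    }

    public static void main(String[] args) {
        ClusterSketch cluster = new ClusterSketch(4);
        cluster.submit("topology1", 3);
        System.out.println(cluster.freeSlots()); // 1
        cluster.kill("topology1");
        System.out.println(cluster.freeSlots()); // 4
    }
}
```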

Storm data flow model

The data flow model is Storm's abstraction of data: a Stream is an unbounded sequence of tuples. In a topology, Spouts are the sources of Streams: a Spout reads from a specific data source and emits a Stream into the topology.

Bolts are the consumers of Streams. A Bolt can accept any number of input Streams and process their data, and, if necessary, emit new Streams to downstream Bolts for further processing.
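The Spout, Bolt, downstream-Bolt chain can be illustrated with a short plain-Java simulation. QueueSpout, splitBolt and run are hypothetical names, and the in-memory queue stands in for an external source such as Kafka. The spout method is named nextTuple only to echo the Storm method of that name, and the demo stops when the queue is empty, whereas a real stream is unbounded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java simulation of Spout -> Bolt -> downstream consumer; NOT the Storm API.
// QueueSpout, splitBolt and run are hypothetical; the queue stands in for an external source.
class StreamSketch {

    /** Stand-in for a Spout: each call emits at most one tuple taken from its source. */
    static final class QueueSpout {
        private final Deque<String> source;

        QueueSpout(List<String> lines) { this.source = new ArrayDeque<>(lines); }

        /** Loosely mirrors Storm's nextTuple(): emit one tuple if one is available. */
        boolean nextTuple(Consumer<String> emit) {
            String line = source.poll();
            if (line == null) {
                return false; // nothing available right now
            }
            emit.accept(line);
            return true;
        }
    }

    /** First-level Bolt: consumes sentences and emits a NEW stream of lower-cased words. */
    static void splitBolt(String sentence, Consumer<String> emit) {
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word.toLowerCase());
            }
        }
    }

    /** Drives the spout until the finite demo source is drained; a real stream never ends. */
    static List<String> run(List<String> lines) {
        QueueSpout spout = new QueueSpout(lines);
        List<String> downstream = new ArrayList<>(); // next-level Bolt: here it only collects words
        while (spout.nextTuple(sentence -> splitBolt(sentence, downstream::add))) {
            // In Storm, nextTuple() keeps being called for as long as the topology runs.
        }
        return downstream;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Storm processes streams", "of tuples")));
        // prints [storm, processes, streams, of, tuples]
    }
}
```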

The following figure shows the data flow between Spouts and Bolts within a topology:

Each component (Spout or Bolt) in a topology has a degree of parallelism, which can be specified when the topology is created. Storm runs that many threads for the component across the cluster, all executing concurrently.

This raises a question: since a Spout or Bolt runs as multiple parallel tasks, how does Storm decide which task of the receiving component each tuple is sent to?

Storm answers this with stream groupings, a set of data distribution strategies. When defining a topology, you specify for each Bolt which Streams it takes as input and how each of those Streams is partitioned among the Bolt's tasks.

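Storm provides seven classic stream groupings (newer releases also add Partial Key grouping, and custom groupings can be written by implementing CustomStreamGrouping):

Shuffle Grouping: tuples are distributed randomly across the Bolt's tasks so that each task receives an equal number of tuples.

Fields Grouping: the stream is partitioned by the specified fields; tuples with the same values for those fields always go to the same task.

All Grouping: every tuple is replicated to all of the Bolt's tasks.

Global Grouping: the entire stream goes to a single task (the one with the lowest task id).

None Grouping: you don't care how the stream is grouped; currently this behaves the same as shuffle grouping.

Direct Grouping: the producer of a tuple decides which task of the consumer receives it (the tuple must be emitted with emitDirect on a stream declared as direct).

Local or shuffle grouping: if the target Bolt has tasks in the same worker process, tuples are shuffled among those in-process tasks only; otherwise it behaves like a normal shuffle grouping.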
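The routing idea behind these groupings can be sketched as functions that pick target task indices for a Bolt with a given number of tasks. This is a simplified plain-Java model, not Storm's implementation: the names are hypothetical, the fields-grouping hash is an assumption (Storm's exact hashing may differ), and index 0 stands in for the lowest task id in global grouping. None grouping would reuse the shuffle logic, Direct grouping needs no routing function because the producer names the task, and local-or-shuffle would add a same-worker preference in front of the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simulation of how a grouping chooses target task indices for a bolt with numTasks tasks.
// This is NOT Storm's implementation; class and method names are hypothetical.
class GroupingSketch {

    /** Shuffle: visit tasks in a random order and reshuffle after each full pass,
     *  so every task gets the same number of tuples per pass. */
    static final class Shuffle {
        private final List<Integer> order = new ArrayList<>();
        private final Random random;
        private int next = 0;

        Shuffle(int numTasks, long seed) {
            for (int i = 0; i < numTasks; i++) {
                order.add(i);
            }
            random = new Random(seed);
            Collections.shuffle(order, random);
        }

        int choose() {
            if (next == order.size()) { // pass finished: start a new random order
                Collections.shuffle(order, random);
                next = 0;
            }
            return order.get(next++);
        }
    }

    /** Fields: equal grouping-field values always map to the same task (hash modulo task count). */
    static int fields(List<?> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    /** All: every task receives its own copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            targets.add(i);
        }
        return targets;
    }

    /** Global: the whole stream goes to one task; index 0 stands in for the lowest task id. */
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 3;
        Shuffle shuffle = new Shuffle(numTasks, 42L);
        int[] counts = new int[numTasks];
        for (int i = 0; i < 9; i++) {
            counts[shuffle.choose()]++;
        }
        System.out.println("shuffle counts: " + Arrays.toString(counts)); // [3, 3, 3]
        System.out.println("fields(user-1): " + fields(List.of("user-1"), numTasks));
        System.out.println("all: " + all(numTasks) + ", global: " + global());
    }
}
```

The seeded shuffle only makes the demo reproducible; what matters is that each full pass visits every task exactly once, so the load is even, while the fields function is deterministic, so equal keys always land on the same task.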

A scenario that cannot be supported by Storm

If you have read this far, recall that each piece of our business logic is carried by a topology. Within a topology, data flows between components (Spouts/Bolts) in a publish-subscribe manner, but Streams cannot flow between topologies.

As a result, you often have to write all of your business logic into a single topology, because Streams cannot flow from one topology to another.

In other words, one business process cannot pass data to another.

Suppose we have a topology, Topology1, in which data passes through a preliminary Filter Bolt, a Join Bolt, and a Business1 Bolt. The Filter Bolt filters the data and the Join Bolt merges (joins) the data streams, as shown in the following figure:
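Topology1's Filter, Join and Business1 chain might look like the following plain-Java simulation. The two sources (orders and payments), the keyed-join semantics, and every class and method name are assumptions made for illustration, not code from the original topology.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of Topology1: Filter Bolt -> Join Bolt -> Business1 Bolt. NOT Storm code.
// The two sources ("orders", "payments") and all class/method names are hypothetical.
class Topology1Sketch {

    /** A keyed event coming from one of the two hypothetical sources. */
    static final class Event {
        final String source;
        final String key;
        final String value;

        Event(String source, String key, String value) {
            this.source = source;
            this.key = key;
            this.value = value;
        }
    }

    /** Filter Bolt: drop events without a usable key. */
    static boolean filter(Event e) {
        return e.key != null && !e.key.isBlank();
    }

    /** Join Bolt: buffer each side by key and emit once both sides for a key have arrived. */
    static final class JoinBolt {
        private final Map<String, String> orders = new HashMap<>();
        private final Map<String, String> payments = new HashMap<>();

        /** Returns "key:order+payment" when a pair completes, otherwise null. */
        String execute(Event e) {
            boolean isOrder = e.source.equals("orders");
            Map<String, String> waiting = isOrder ? payments : orders; // the other side
            String match = waiting.remove(e.key);
            if (match == null) {
                (isOrder ? orders : payments).put(e.key, e.value); // wait for the other side
                return null;
            }
            String order = isOrder ? e.value : match;
            String payment = isOrder ? match : e.value;
            return e.key + ":" + order + "+" + payment;
        }
    }

    /** Runs events through the three stages; Business1 here just records joined results. */
    static List<String> run(List<Event> events) {
        JoinBolt join = new JoinBolt();
        List<String> business1 = new ArrayList<>();
        for (Event e : events) {
            if (!filter(e)) {
                continue;
            }
            String joined = join.execute(e);
            if (joined != null) {
                business1.add(joined);
            }
        }
        return business1;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("orders", "k1", "o1"),
                new Event("payments", "k2", "p2"),
                new Event("payments", "k1", "p1"),
                new Event("orders", " ", "dropped"),
                new Event("orders", "k2", "o2"));
        System.out.println(run(events)); // [k1:o1+p1, k2:o2+p2]
    }
}
```

In a real topology with a parallel Join Bolt, both input streams would need a fields grouping on the join key so that the two halves of a pair meet in the same task.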

This topology has already been submitted to the cluster. Now suppose we need a new piece of business logic that shares Topology1's data source and has exactly the same preprocessing steps.

How can Storm meet this requirement?

1: The first way: kill the original topology, implement the computing logic of the new business Bolt, repackage everything into a new topology jar, and submit it to the Storm cluster again. The resulting structure is shown below:

In this design, data from the different sources is filtered and joined, then sent to both business-logic Bolts.
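Inside one topology, a single stream can have several subscribers, and each subscriber receives every tuple. The plain-Java sketch below models the Join Bolt's output feeding both Business1 and a new Business2 Bolt; NamedStream and the upper-casing done by Business2 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java simulation of one stream with two subscribing Bolts; NOT the Storm API.
// NamedStream, Business1/Business2 and run are hypothetical names for illustration.
class FanOutSketch {

    /** A stream inside one topology: every subscribed Bolt receives every emitted tuple. */
    static final class NamedStream<T> {
        private final List<Consumer<T>> subscribers = new ArrayList<>();

        void subscribe(Consumer<T> bolt) {
            subscribers.add(bolt); // wiring is fixed before the stream starts emitting
        }

        void emit(T tuple) {
            for (Consumer<T> bolt : subscribers) {
                bolt.accept(tuple);
            }
        }
    }

    /** Returns [business1Output, business2Output] for the given joined tuples. */
    static List<List<String>> run(List<String> joinedTuples) {
        NamedStream<String> joined = new NamedStream<>(); // output stream of the Join Bolt
        List<String> business1 = new ArrayList<>();        // existing logic: keep tuples as-is
        List<String> business2 = new ArrayList<>();        // new logic (hypothetical): upper-case them
        joined.subscribe(business1::add);
        joined.subscribe(t -> business2.add(t.toUpperCase()));
        for (String t : joinedTuples) {
            joined.emit(t);
        }
        return List.of(business1, business2);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("k1:o1+p1"))); // [[k1:o1+p1], [K1:O1+P1]]
    }
}
```

The subscriptions are fixed before the stream starts emitting. In Storm the equivalent wiring is part of the submitted topology, which is why adding Business2 this way means killing and resubmitting it.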

Drawbacks of the first way:

The topology has to be redeployed, so its state is lost. You also have to modify the structure of a topology that is already running, which gives up its guarantee of stability.

2: The second way: deploy the new business logic as a separate topology.

The same data source is then consumed by two topologies. This undoubtedly increases the load on the external data source, and two copies of the same data are transmitted through the cluster. As the number of topologies re-reading the same data grows beyond two, Storm's computing slots are seriously wasted.

3: The third way

Having seen the two approaches above, you might propose this solution: put message middleware such as Kafka between the data source and the topologies, so that the Spouts of different topologies share one data source. This works, and the middleware also provides:

3.1: reliable message transmission

3.2: message rewind (replay), etc.

For the Kafka-Storm integration components, please refer to the other Kafka-related blog posts by [Zhijing].

Introducing message middleware removes the pressure of repeated access to the external data source: each topology reads from the middleware instead, which shields the external data source from repeated reads.
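The benefit of the shared middleware can be sketched with an in-memory log that loosely mirrors Kafka's per-consumer-group offsets and seeking. SharedLogSketch and its methods are hypothetical and do not reflect the Kafka or Storm APIs: the external source is read once per record, each topology's Spout keeps its own offset, and seek rewinds a group so it can replay messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a Kafka-like log shared by several topologies' Spouts.
// NOT the Kafka or Storm API; all names here are hypothetical.
class SharedLogSketch {
    private final List<String> log = new ArrayList<>();           // retained messages, in order
    private final Map<String, Integer> offsets = new HashMap<>(); // next offset per consumer group
    int externalReads = 0;                                         // hits on the external data source

    /** A single ingestion step reads the external source once and appends to the log. */
    void ingestFromExternalSource(String record) {
        externalReads++;
        log.add(record);
    }

    /** Each topology's Spout polls with its own group id, so groups never steal each other's data. */
    List<String> poll(String groupId, int maxRecords) {
        int from = offsets.getOrDefault(groupId, 0);
        int to = Math.min(log.size(), from + maxRecords);
        offsets.put(groupId, to);
        return new ArrayList<>(log.subList(from, to));
    }

    /** Rewind: move a group's offset back so it can replay retained messages. */
    void seek(String groupId, int offset) {
        offsets.put(groupId, Math.max(0, Math.min(offset, log.size())));
    }

    public static void main(String[] args) {
        SharedLogSketch broker = new SharedLogSketch();
        for (String r : List.of("a", "b", "c")) {
            broker.ingestFromExternalSource(r);
        }
        System.out.println(broker.poll("topology1", 10)); // [a, b, c]
        System.out.println(broker.poll("topology2", 10)); // [a, b, c]
        broker.seek("topology1", 1);                      // replay from offset 1
        System.out.println(broker.poll("topology1", 10)); // [b, c]
        System.out.println(broker.externalReads);         // 3, not 6
    }
}
```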
