
What's the use of Storm?

2025-04-07 Update From: SLTechnology News&Howtos shulou


Shulou (Shulou.com), 05/31 report --

This article explains in detail what Storm is used for. The editor finds the content very practical and shares it here for your reference; I hope you gain something from reading it.

Storm introduction

Most full-data processing uses the well-known Hadoop or Hive. As a batch-processing system, Hadoop is widely used for massive data processing thanks to its high throughput and automatic fault tolerance. However, Hadoop is not good at real-time computation, because it was built for batch work. Storm is different: in Storm, the computing task of a real-time application is packaged and submitted as a Topology, which is similar to a Hadoop MapReduce job, with one key difference. A MapReduce job eventually finishes and exits, whereas a Storm Topology, once submitted, runs forever unless you explicitly kill it. This is what makes Storm suitable for real-time streaming computation.

Storm is widely used for real-time log processing, real-time statistics, real-time risk control, and similar scenarios. It can also perform real-time preprocessing of data before storing it in a distributed database such as HBase, to support later queries. For real-time computation over large volumes of data, Storm provides a scalable, low-latency, reliable, and fault-tolerant distributed computing platform.

Storm components

A Storm cluster consists of one master node and a group of worker nodes, coordinated through Zookeeper.

Master node:

The master node runs a daemon called Nimbus, which distributes code across the cluster, assigns tasks to machines, and monitors their status.

Worker node:

Each worker node runs a daemon called Supervisor, which listens for work assigned to its machine and starts or stops worker processes as required. Each worker process executes a subset of a topology. Coordination between Nimbus and the Supervisors goes through the Zookeeper service or cluster.

Zookeeper:

Zookeeper is the service that coordinates the Supervisors and Nimbus. The Nimbus and Supervisor daemons are designed to be fail-fast and stateless; all state is kept in Zookeeper or on local disk. This means you can kill -9 the Nimbus or Supervisor processes without keeping backups, and this design gives the Storm cluster its remarkable stability. On the application side, the real-time logic is encapsulated in a Storm "topology": a graph connecting Spouts (data sources) and Bolts (data operations) through Stream Groupings.

The terms introduced above are analyzed in more depth below.

Spout:

In short, a Spout reads data from a source and feeds it into the topology. Spouts are divided into reliable and unreliable: when Storm fails to process a tuple (a named list of values), a reliable Spout re-emits it, while an unreliable Spout does not track whether processing succeeded. The main method of a Spout is nextTuple(), which emits a new tuple into the topology, or simply returns if there is nothing new to emit.
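
This contract is not the real Storm API (which is Java), but the reliable-spout idea (emit in nextTuple(), forget on ack(), replay on fail()) can be sketched in a few lines of Python:

```python
# Minimal Python sketch of a "reliable spout": emitted tuples are tracked
# until acknowledged, and failed tuples are replayed. Class and method names
# here are illustrative, not Storm's actual Java interface.
from collections import deque

class ReliableSpout:
    def __init__(self, source):
        self.queue = deque(source)   # tuples waiting to be emitted
        self.pending = {}            # msg_id -> tuple, awaiting ack
        self.next_id = 0

    def next_tuple(self):
        """Emit one new tuple (with a message id), or None if nothing to emit."""
        if not self.queue:
            return None
        tup = self.queue.popleft()
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = tup   # remember it until acked
        return msg_id, tup

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)   # fully processed, forget it

    def fail(self, msg_id):
        # replay: put the failed tuple back at the front of the queue
        self.queue.appendleft(self.pending.pop(msg_id))
```

Note that a reliable spout must hold every emitted tuple until it is acked, which is why reliability costs memory; an unreliable spout would simply skip the `pending` bookkeeping.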

Bolt:

All processing in a Topology is done by Bolts. A Bolt can do anything: filter, join, aggregate, access files or databases, and so on. A Bolt receives data from a Spout and processes it; for complex stream processing, a tuple can be passed on to another Bolt for further handling. The most important method of a Bolt is execute(), which receives a new tuple as its parameter. For both Spouts and Bolts, if tuples are emitted into multiple streams, those streams can be declared with declareStream().
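
As a rough illustration (again plain Python, not the Java Storm API), a bolt with two declared output streams might route tuples in execute() like this; the stream names "valid" and "errors" are made up for the example:

```python
# Sketch of a bolt that declares two output streams (mirroring
# declareStream("valid") / declareStream("errors")) and routes each
# incoming tuple to one of them in execute().
class RouterBolt:
    def __init__(self):
        # declared output streams, keyed by name
        self.streams = {"valid": [], "errors": []}

    def execute(self, tup):
        """Receive a new tuple and emit it to one of the declared streams."""
        if isinstance(tup, int) and tup >= 0:
            self.streams["valid"].append(tup)
        else:
            self.streams["errors"].append(tup)

b = RouterBolt()
for t in [3, -1, "x", 7]:
    b.execute(t)
print(b.streams["valid"])   # [3, 7]
```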

Stream Groupings:

A Stream Grouping defines how a stream should be partitioned among a Bolt's tasks. Storm provides six Stream Grouping types:

1. Shuffle grouping (random grouping): tuples are distributed to the Bolt's tasks at random, so that each task receives roughly the same number of tuples.

2. Fields grouping: the stream is partitioned by the specified fields. For example, when grouping by the "user-id" field, tuples with the same "user-id" always go to the same task, while tuples with different "user-id" values may go to different tasks.

3. All grouping: every tuple is replicated to all of the Bolt's tasks. Use this type with caution.

4. Global grouping: the entire stream goes to a single one of the Bolt's tasks; specifically, the task with the lowest ID.

5. None grouping: you do not care how the stream is grouped. Currently this is equivalent to shuffle grouping, although eventually Storm will, where possible, push None-grouped Bolts into the same thread as the Bolt or Spout they subscribe to.

6. Direct grouping: a special grouping type in which the producer of a tuple decides which task of the consuming Bolt receives it.

Of course, you can also implement the CustomStreamGrouping interface to define whatever grouping you need.
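
To make the first two groupings concrete, here is a minimal Python sketch of how a grouping decides which task receives a tuple (illustrative only; Storm's real implementations live in the Java runtime):

```python
# Toy grouping functions: each takes a tuple (here a dict) and returns
# the index of the destination task. NUM_TASKS is an arbitrary example value.
import random

NUM_TASKS = 4

def shuffle_grouping(tup):
    # shuffle grouping: pick any task, roughly uniformly at random
    return random.randrange(NUM_TASKS)

def fields_grouping(tup, field="user-id"):
    # fields grouping: the same field value always maps to the same task
    return hash(tup[field]) % NUM_TASKS

t1 = {"user-id": "alice", "action": "click"}
t2 = {"user-id": "alice", "action": "buy"}
assert fields_grouping(t1) == fields_grouping(t2)  # same user-id, same task
```

Hashing the grouping field modulo the task count is the standard way to guarantee that all tuples sharing a key land on the same task, which is what makes per-key state (such as a per-user counter) possible.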

Storm process

Storm is a distributed, reliable, fault-tolerant data stream processing system. It delegates work to different types of components, each responsible for a simple, specific task. The input stream of a Storm cluster is handled by a component called a spout, which passes the data to bolts; a bolt either stores the data somewhere or passes it on to other bolts. You can think of a Storm cluster as a chain of bolt transformations applied to the data emitted by spouts.

Here is a simple example to illustrate the concept. Last night I watched a news anchor talking about politicians and their positions on various political topics. The anchor kept repeating different names, and I began to wonder whether each name was mentioned the same number of times, and how the counts differed.

Imagine the subtitles the announcer reads as your input data stream. A spout can read that text from a file (or a socket, over HTTP, or some other source). The spout passes each line of text to a bolt, which splits it into words. The word stream is passed to another bolt, where each word is compared against a list of politicians' names. For every match, the second bolt increments that name's count in a database. You can query the database at any time to see the results, and the counts are updated in real time as the data arrives. The components (spouts and bolts) and their relationships are shown in the topology diagram, figure 1-1.
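
The pipeline just described can be simulated in a few lines of plain, single-process Python (no Storm involved); the names "smith" and "jones" are hypothetical stand-ins for the list of politicians, and a Counter stands in for the database:

```python
# Toy simulation of the topology: a spout reads lines, a split bolt cuts
# them into words, and a count bolt tallies each matching political name.
from collections import Counter

NAMES = {"smith", "jones"}          # hypothetical list of politicians' names

def line_spout(lines):
    for line in lines:              # spout: feed raw lines into the topology
        yield line

def split_bolt(line_stream):
    for line in line_stream:        # bolt 1: cut each line into words
        for word in line.lower().split():
            yield word

def count_bolt(word_stream):
    counts = Counter()              # bolt 2: count matching names
    for word in word_stream:
        if word in NAMES:
            counts[word] += 1
    return counts

captions = ["Smith met Jones today", "Jones replied to Smith", "Smith agreed"]
print(count_bolt(split_bolt(line_spout(captions))))
# Counter({'smith': 3, 'jones': 2})
```

In real Storm each stage would run as parallel tasks on different machines, with a stream grouping deciding which counting task sees which word; the chained generators here only mimic the data flow.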

[Figure: Storm architecture diagram]

[Figure 1-1: Storm topology diagram]

[Figure: Storm overall architecture diagram]

[Figure: Storm job architecture diagram]

This concludes the article on "what's the use of Storm". I hope the content above has been helpful and that you have learned something from it. If you found the article good, please share it so more people can see it.
