
What are the Storm components


This article explains what the Storm components are. The content is simple, clear, and easy to learn and understand; please follow the editor's line of thought to study what the Storm components are.

Storm Components

A Storm cluster is superficially similar to a Hadoop cluster. On a Hadoop cluster you run MapReduce jobs, while on a Storm cluster you run all kinds of topologies. MapReduce jobs and topologies are, however, very different: the key difference is that a MapReduce job eventually runs to completion, whereas a topology keeps processing messages until you kill it.

There are two types of nodes in a Storm cluster: the master node and the worker nodes. The master node runs a daemon called "Nimbus", which is very similar to the JobTracker of the Hadoop 1.0 era. Nimbus is responsible for distributing code around the cluster, assigning tasks to nodes, and monitoring for failures.

Each worker node runs a daemon called "Supervisor". The Supervisor listens for tasks assigned to its node and starts and stops worker processes accordingly (each worker process handles the tasks Nimbus assigns to it). Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across the cluster.

Coordination between Nimbus and the Supervisors is done entirely through a Zookeeper cluster. In addition, Nimbus and the Supervisors are fail-fast and stateless (TODO: understand this in more depth); all state is kept in Zookeeper or on local disk. In other words, you can kill -9 Nimbus or the Supervisors and they will come straight back as if nothing had happened. This design makes a Storm cluster very stable. I hope so; it will be used in a project soon.

Topology

To do real-time computation on Storm, you create a "topology". Each node in a topology contains processing logic, and the links between nodes describe how data is passed between them.

Running a topology is very simple. First, package all your code and its dependencies into a single jar, then run the following command:

storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2

This command runs the class backtype.storm.MyTopology with the external arguments arg1 and arg2 (passed to its main function). The main method of the class defines the topology and submits it to Nimbus; the storm jar part takes care of connecting to Nimbus and uploading the jar.
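For illustration, the sketch below shows what such a main method might look like, using the backtype.storm API referenced in the command above. The components MySpout and MyBolt (sketched later in the Streams section), their names, and the meaning of the two arguments are hypothetical, not taken from this article.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class MyTopology {
    public static void main(String[] args) throws Exception {
        // Wire a hypothetical spout and bolt into a topology graph.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new MySpout(), 2);
        builder.setBolt("shouter", new MyBolt(), 4).shuffleGrouping("sentences");

        Config conf = new Config();
        conf.setNumWorkers(2);
        // Assume arg1 is the topology name and arg2 a debug flag.
        conf.setDebug(Boolean.parseBoolean(args[1]));

        // Builds the topology structure and uploads it to Nimbus.
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
}

Here createTopology() produces the Thrift structure mentioned below, which StormSubmitter sends to the Nimbus service.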

Because the topology definition is just a Thrift structure (Thrift is a cross-language RPC framework developed by Facebook), and Nimbus is itself a Thrift service, you can create and submit topologies written in any programming language.

Streams

The core abstraction in Storm is the "stream". A stream is an unbounded sequence of tuples. Storm provides primitives for transforming one stream into another new stream, and the transformation is distributed and reliable.

The primitives Storm provides are spouts and bolts. Spouts and bolts both have interfaces, and you implement them to run your application logic.

A spout is a source of streams. For example, a spout may read tuples from a Kestrel queue and emit them as a stream; another spout might connect to the Twitter API and emit a stream of tweets. (For Storm, the spout is the collector of source data, responsible for feeding a continuous stream of data into the whole streaming computation.)
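As a minimal, hypothetical sketch (the field name "sentence" and the emitted values are invented for the example), a spout in Java can extend BaseRichSpout and emit one tuple each time Storm calls nextTuple:

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class MySpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // This is where a real spout would connect to its source,
        // e.g. a Kestrel queue or the Twitter API.
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(100);                          // avoid spinning when there is no data
        collector.emit(new Values("hello storm")); // emit one tuple with a single field
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}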

A bolt, on the other hand, is a consumer of streams: it does some processing and may emit new streams. Complex stream transformations usually require multiple steps and therefore multiple bolts. Bolts can do anything, such as run functions, filter tuples, aggregate streams, join streams, talk to databases, and so on. (For Storm, bolts are the processors of the data flow; there can be many of them, and they are responsible for the aggregations and other computations of the real business logic.)
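A correspondingly minimal, hypothetical bolt sketch consumes the tuples emitted above, transforms them, and emits a new stream:

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class MyBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Any processing could happen here: filtering, aggregation, joins,
        // database calls, and so on. This example just transforms the field.
        String sentence = input.getStringByField("sentence");
        collector.emit(new Values(sentence.toUpperCase()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("shouted"));
    }
}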

Networks of spouts and bolts are packaged into a topology, which you then hand to a Storm cluster to execute. When a spout or bolt emits a tuple into a stream, the tuple is sent to every bolt that subscribes to that stream. (Note that when several bolts handling the same business logic each aggregate data, a single bolt's aggregate may not be the final result, just as a single reducer's output is not the final answer when several reducers compute a Top N.)
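To illustrate the subscription behaviour, the hypothetical wiring below (reusing the MySpout and MyBolt sketches above) has two bolts subscribe to the same spout stream. Every tuple the spout emits is delivered to both bolts; the grouping only decides which task within each bolt receives it (shuffleGrouping distributes tuples randomly, fieldsGrouping by the value of a field):

import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class FanOutExample {
    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new MySpout(), 2);
        // Both bolts subscribe to the "sentences" stream, so each emitted
        // tuple is delivered to both of them.
        builder.setBolt("shouter", new MyBolt(), 4).shuffleGrouping("sentences");
        builder.setBolt("auditor", new MyBolt(), 2)
               .fieldsGrouping("sentences", new Fields("sentence"));
        return builder.createTopology();
    }
}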

In Storm, every node runs in parallel. In your topology you can specify the parallelism of each node (for example, 5), and Storm will spawn that number of threads across the cluster to execute it.

A topology runs forever, or until you kill it. Storm automatically reassigns any tasks that fail. Storm also guarantees that no data is lost, even when machines go offline and messages are dropped (TODO: this remains to be studied).

Thank you for reading. That concludes "what are the Storm components". After studying this article, I believe you have a deeper understanding of what the Storm components are; the specifics still need to be verified in practice. The editor will keep pushing more articles on related knowledge points for you; welcome to follow!
