What are the two modes of data processing in storm

2025-07-02 Update, from SLTechnology News&Howtos

This article introduces the two modes in which Storm processes data. Many people are unsure what these two modes are in day-to-day work, so the editor consulted a range of materials and summarized them in a simple, easy-to-follow form. Let's go through them.

1. Introduction to Storm

Storm is a distributed, fault-tolerant real-time computing system. It was initially hosted on GitHub under the Eclipse Public License 1.0. Storm was developed by BackType and open-sourced by Twitter.

In 2013, Storm entered the Apache community for incubation.

In September 2014, it was promoted to an Apache top-level project.

Official website: http://storm.apache.org/

The differences between Hadoop and Storm: a Hadoop job is started and stopped for every run and spends time repeatedly writing data to disk, while Storm uses stream processing and does not write data to disk during processing.

Data source: Hadoop processes TB-level data already stored on HDFS, while Storm processes new data as it arrives in real time.

Process: Hadoop is divided into fixed stages such as split, map, shuffle, and reduce. A Storm process is user-defined and can contain multiple steps, each of which is either a data source (spout) or processing logic (bolt).

Termination: a Hadoop job eventually ends; a Storm topology does not end, and once it has processed the current data it waits for new data to arrive.

Processing speed: Hadoop is slow because it works through accumulated data, while Storm handles only newly added data and therefore delivers results with low latency.
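The contrast above can be sketched in plain Python. This is only an illustration of the two styles, not Hadoop or Storm code; the function names `batch_word_count` and `stream_word_count` are made up for this example.

```python
from collections import Counter


def batch_word_count(stored_lines):
    """Hadoop style: read the whole stored data set, compute once, then end."""
    counts = Counter()
    for line in stored_lines:
        counts.update(line.split())
    return counts


def stream_word_count(incoming):
    """Storm style: keep state in memory and emit an updated result per record."""
    counts = Counter()
    for line in incoming:          # in a real topology this loop never ends
        counts.update(line.split())
        yield Counter(counts)      # snapshot after each newly arrived line


lines = ["storm storm hadoop", "hadoop storm"]
batch_result = batch_word_count(lines)          # one result, after all data is read
stream_results = list(stream_word_count(lines)) # one result per incoming line
```

The batch function returns a single answer when the job finishes, while the streaming generator already has a usable answer after the first line and keeps refining it.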

2. Storm architecture:

Nimbus: the steward (master) of the cluster, although the cluster can keep working without it. If Nimbus dies, jobs that were submitted earlier continue to be processed, but no new jobs can be submitted. The main function of Nimbus is therefore to accept submitted tasks and to communicate with the Supervisors through ZooKeeper (zk). It is equivalent to the leader.

Supervisor:

Worker

Programming model

DAG: directed acyclic graph

Spout: source

Bolt: processing logic. After a bolt processes the data, it uses the messaging framework to report back to the previous Bolt or Spout.
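The programming model above can be sketched as follows. This is a toy stand-in written with the Python standard library, not the Storm API: the component names (`sentence_spout`, `split_bolt`, `count_bolt`) and the sample sentences are assumptions made for illustration, and `graphlib` is used only to check that the wiring is a directed acyclic graph.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical topology: each key lists the components it receives data from.
topology = {
    "sentence_spout": set(),             # a spout is a source, so no upstream
    "split_bolt": {"sentence_spout"},
    "count_bolt": {"split_bolt"},
}


def processing_order(graph):
    """Return the components so that every upstream component comes first."""
    return list(TopologicalSorter(graph).static_order())


def is_dag(graph):
    """A topology is valid only if its wiring contains no cycle."""
    try:
        processing_order(graph)
        return True
    except CycleError:
        return False


def sentence_spout():
    """Spout: emits the source data."""
    yield from ["storm is fast", "storm is fault tolerant"]


def split_bolt(sentences):
    """Bolt: splits each sentence into words."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words):
    """Bolt: counts how often each word was seen."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


order = processing_order(topology)
word_counts = count_bolt(split_bolt(sentence_spout()))
```

Here the data flows strictly from the spout through the bolts, which is what the acyclic requirement guarantees.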

Data transfer: Storm uses ZeroMQ (zmq) or Netty for its underlying data transfer.

ZeroMQ (zmq)

ZeroMQ is an open-source messaging framework (Storm no longer uses it as of version 0.9).

Netty

Netty is an efficient NIO-based network framework. Storm moved to Netty after joining Apache for licensing reasons: ZeroMQ is released under a GPL-family license, while the license that Netty follows is comparatively permissive.

High availability:

Exception handling: even if a component crashes because of an exception, the cluster as a whole is not affected. For example, if a Supervisor fails, Nimbus reschedules its work.

Message reliability is guaranteed by the ack mechanism.
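The text above only names the ack mechanism. As a minimal sketch, assuming the XOR-based tracking that Storm's documentation describes for its acker tasks, the idea looks roughly like this. The class `ToyAcker` and the fixed tuple ids are hypothetical; Storm itself uses random 64-bit ids.

```python
class ToyAcker:
    """Tracks one spout tuple's whole tuple tree with a single XOR checksum."""

    def __init__(self):
        self.ack_val = 0

    def emitted(self, tuple_id):
        # XOR the id in when the tuple is created.
        self.ack_val ^= tuple_id

    def acked(self, tuple_id):
        # XOR the same id in again when the tuple is acked, cancelling it out.
        self.ack_val ^= tuple_id

    def fully_processed(self):
        # Every id was XORed in exactly twice, so the checksum is back to 0.
        return self.ack_val == 0


# Fixed ids keep the example deterministic (an assumption for illustration).
ROOT, CHILD_A, CHILD_B = 0x1A2B, 0x3C4D, 0x5E6F

acker = ToyAcker()
acker.emitted(ROOT)                 # the spout emits the root tuple
acker.emitted(CHILD_A)              # a bolt emits two tuples anchored to it
acker.emitted(CHILD_B)
acker.acked(ROOT)                   # the bolt acks the root tuple
done_after_root = acker.fully_processed()
acker.acked(CHILD_A)                # downstream bolts ack both children
acker.acked(CHILD_B)
done_after_all = acker.fully_processed()
```

The appeal of this design is that the tracking state stays a single number per spout tuple, no matter how large the tuple tree grows.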

Maintainable:

Storm UI

3. There are two modes in which Storm processes data:

Real-time request-response mode (synchronous)

Client -> DRPC Server -> Spout -> Bolt -> Return (the result returns to the originating DRPC Server and then to the Client)

The DRPC Server has its own dedicated components: DRPC Spout, Topology, and ReturnResults.
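The synchronous flow above can be imitated with the standard library. This is a sketch of the request-response pattern only, not Storm's DRPC API: `ToyDrpcServer`, `run_topology`, and the upper-casing "bolt" are all hypothetical names and logic chosen for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class ToyDrpcServer:
    """Hands requests to the topology and blocks the client until a result returns."""

    def __init__(self):
        self.requests = queue.Queue()   # read by the DRPC spout
        self.pending = {}               # request id -> Future awaited by the client
        self._next_id = 0
        self._lock = threading.Lock()

    def execute(self, args, timeout=2.0):
        """Called by the client; returns only when the result has come back."""
        future = Future()
        with self._lock:
            self._next_id += 1
            request_id = self._next_id
            self.pending[request_id] = future
        self.requests.put((request_id, args))
        return future.result(timeout=timeout)

    def return_result(self, request_id, result):
        """Called by the final step of the topology."""
        with self._lock:
            future = self.pending.pop(request_id)
        future.set_result(result)


def run_topology(server, stop):
    while not stop.is_set():
        try:
            request_id, args = server.requests.get(timeout=0.1)  # DRPC spout
        except queue.Empty:
            continue
        result = args.upper() + "!"                              # bolt
        server.return_result(request_id, result)                 # return step


server = ToyDrpcServer()
stop = threading.Event()
worker = threading.Thread(target=run_topology, args=(server, stop), daemon=True)
worker.start()
answer = server.execute("hello")    # blocks until the topology answers
stop.set()
worker.join()
```

The request id is what lets the return step route each result back to the client that is waiting for it.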

Streaming (asynchronous)

Client -> MQ -> Spout -> Bolt1 -> Bolt2 -> Storage (Redis, HBase, MySQL, MQ, etc.)
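The asynchronous flow above can be imitated in the same way. Again this is only a sketch under assumptions: an in-process `queue.Queue` stands in for the message queue, a dictionary stands in for the storage system, the `"user,amount"` message format is invented, and the `STOP` marker exists only so that the example terminates (a real topology keeps running).

```python
import queue
import threading

mq = queue.Queue()   # stands in for the message queue
storage = {}         # stands in for Redis, HBase, MySQL, and so on
STOP = object()      # demo-only marker so the example can finish


def client_send(message):
    """The client only enqueues the message and does not wait for a result."""
    mq.put(message)


def spout():
    """Spout: pulls messages off the queue as they arrive."""
    while True:
        message = mq.get()
        if message is STOP:
            return
        yield message


def bolt1(messages):
    """Bolt1: parses each hypothetical "user,amount" message."""
    for message in messages:
        user, amount = message.split(",")
        yield user, int(amount)


def bolt2(records):
    """Bolt2: aggregates per user and writes the total to storage."""
    for user, amount in records:
        storage[user] = storage.get(user, 0) + amount


def run_topology():
    bolt2(bolt1(spout()))


worker = threading.Thread(target=run_topology)
worker.start()
for message in ["alice,5", "bob,3", "alice,7"]:
    client_send(message)    # returns immediately
mq.put(STOP)
worker.join()
```

Unlike the request-response mode, the client never receives a reply; anyone who needs the result reads it from storage later.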
