2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces the two modes in which Storm processes data. Many people have questions about these two modes in day-to-day work, so the editor consulted a range of materials and sorted out a simple, practical overview. I hope it helps answer the question "what are the two modes in which Storm processes data?" Next, please follow the editor to study!
1. Introduction to Storm
Storm is a distributed, fault-tolerant real-time computing system. It was developed by BackType and open-sourced by Twitter, initially hosted on GitHub under the Eclipse Public License 1.0.
In 2013, Storm entered the Apache community for incubation.
In September 2014, it was promoted to an Apache top-level project.
Official website: http://storm.apache.org/
The difference between Hadoop and Storm: Hadoop jobs are started and stopped repeatedly and spend time writing data to disk again and again; Storm uses stream processing, and the data is not written to disk along the way.
Data source: Hadoop works on TB-level data already stored on HDFS; Storm works on new data arriving in real time.
Process: Hadoop is divided into split, map, shuffle, reduce and other stages. Storm runs a user-defined process that can contain multiple steps, each of which is either a data source (spout) or processing logic (bolt).
Whether it ends: a Hadoop job eventually finishes; a Storm topology does not finish, and once the current data is processed it waits for new data to arrive.
Processing speed: Hadoop processes accumulated data and is slow; Storm only processes newly added data, so its results are timely.
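The contrast above can be illustrated with a small, self-contained sketch. It uses neither Hadoop nor Storm: `batch_word_count` and `StreamingWordCount` are hypothetical names, and the two sample lines are made up for illustration. The batch function reads a complete, stored data set and then finishes, while the streaming class keeps its state in memory and updates it each time a new record arrives.

```python
from collections import Counter


def batch_word_count(stored_lines):
    """Batch style: read the whole accumulated data set, compute, then end."""
    counts = Counter()
    for line in stored_lines:
        counts.update(line.split())
    return counts


class StreamingWordCount:
    """Streaming style: state stays in memory and is updated per new record."""

    def __init__(self):
        self.counts = Counter()

    def on_record(self, line):
        # Only the newly arrived record is processed; nothing is re-read.
        self.counts.update(line.split())
        return self.counts


batch_result = batch_word_count(["a b", "b c"])

stream = StreamingWordCount()
stream.on_record("a b")
after_first = dict(stream.counts)   # a result is already available here
stream.on_record("b c")
stream_result = stream.counts
```

Both approaches reach the same totals; the difference is that the streaming version has a usable result after every record and never ends on its own.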
2. Storm architecture:
Nimbus: the "steward" of the cluster, although the cluster can keep running without it. If Nimbus dies, jobs that were already submitted continue to be processed, but new jobs cannot be submitted. The main functions of Nimbus are therefore to accept submitted tasks and to communicate with the Supervisors through ZooKeeper (zk). It is equivalent to a leader.
Supervisor:
Worker
Programming model
DAG: directed acyclic graph
Spout: source
Bolt: processing logic. After a bolt processes the data, it uses the messaging framework to return information to the previous Bolt or Spout.
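As a rough illustration of the programming model, the sketch below wires one spout and two bolts into a DAG. It is a toy model in plain Python, not the Storm API: the component names (`sentence_spout`, `split_bolt`, `count_bolt`) and the sample sentences are assumptions made for this example. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) is used only to show that the wiring is acyclic.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical topology: each component maps to the components it reads from.
topology = {
    "sentence_spout": [],
    "split_bolt": ["sentence_spout"],
    "count_bolt": ["split_bolt"],
}

# static_order() raises graphlib.CycleError if the wiring is not a DAG.
order = list(TopologicalSorter(topology).static_order())


def sentence_spout():
    """Source: emits tuples into the topology."""
    yield from ["storm is fast", "storm is fault tolerant"]


def split_bolt(sentences):
    """Processing step: one sentence in, several words out."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words):
    """Processing step: keeps a running count per word."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_bolt(split_bolt(sentence_spout()))
```

In a real topology each component would run as many parallel tasks on different workers; here the generators simply chain the steps in one process.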
Data transfer: at the lowest level, Storm transfers data using ZeroMQ (zmq) or Netty.
Zmq
ZeroMQ is an open-source messaging framework (Storm moved away from it in the 0.9 releases).
Netty
Netty is an efficient NIO-based network framework. The reason Storm uses Netty after moving to Apache is licensing: zmq follows a Linux-style (copyleft) license, while the license Netty follows is relatively permissive.
High availability:
Exception handling: even if a component crashes because of an exception, processing is not affected. For example, if a supervisor fails, Nimbus reschedules its work.
Message reliability guarantee: the ack mechanism.
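The ack mechanism can be sketched as follows. Storm's documentation describes the acker as keeping one running XOR value per spout tuple: every tuple id is XORed in once when the tuple is emitted and once when it is acked, so the value returns to zero exactly when every emitted tuple has been acked. `ToyAcker` below is a hypothetical, simplified model of that idea rather than Storm's implementation, and the ids are fixed small numbers chosen for readability (Storm uses random 64-bit ids).

```python
class ToyAcker:
    """Toy model: one running XOR value per root (spout) message."""

    def __init__(self):
        self.ack_val = {}  # root id -> XOR of all ids seen so far

    def _xor(self, root_id, tuple_id):
        self.ack_val[root_id] = self.ack_val.get(root_id, 0) ^ tuple_id

    def emitted(self, root_id, tuple_id):
        # A new tuple was added to the tuple tree of this root message.
        self._xor(root_id, tuple_id)

    def acked(self, root_id, tuple_id):
        # The same id is XORed again, cancelling the earlier contribution.
        self._xor(root_id, tuple_id)

    def fully_processed(self, root_id):
        return self.ack_val.get(root_id) == 0


acker = ToyAcker()
root = "msg-1"
ids = [0x1A, 0x2B, 0x3C]  # illustrative ids only

for tuple_id in ids:
    acker.emitted(root, tuple_id)
pending = acker.fully_processed(root)   # acks are still outstanding

for tuple_id in ids:
    acker.acked(root, tuple_id)
done = acker.fully_processed(root)      # every emitted tuple was acked
```

The appeal of this design is that the memory needed per message stays constant no matter how large the tuple tree grows.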
Maintainable:
Storm UI
3. There are two modes in which Storm processes data:
Real-time request-response mode (synchronous)
Client -> DRPC Server -> Spout -> Bolt -> Return -> (back to the DRPC Server, and then to the Client)
The DRPC Server has its own corresponding components: DRPC Spout -> Topology -> ReturnResults
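A minimal sketch of the synchronous flow, using only Python's standard library. `ToyDRPCServer` and `run_topology` are hypothetical stand-ins, not Storm's DRPC classes, and the "bolt" logic (upper-casing a string) is made up. The point it shows is that the client call blocks until the topology hands the result back to the server under the same request id.

```python
import itertools
import queue
import threading


class ToyDRPCServer:
    """Toy stand-in for a DRPC server: pairs requests with results by id."""

    def __init__(self):
        self.requests = queue.Queue()   # read by the spout side
        self.results = {}               # request id -> one-slot result queue
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def execute(self, args, timeout=5.0):
        """Called by the client; blocks until a result is returned."""
        with self._lock:
            request_id = next(self._ids)
            slot = self.results[request_id] = queue.Queue(maxsize=1)
        self.requests.put((request_id, args))
        try:
            return slot.get(timeout=timeout)
        finally:
            with self._lock:
                del self.results[request_id]

    def return_result(self, request_id, value):
        """Called by the last step of the topology."""
        with self._lock:
            slot = self.results.get(request_id)
        if slot is not None:
            slot.put(value)


def run_topology(server):
    while True:
        request_id, args = server.requests.get()    # spout: pull a request
        if request_id is None:                      # stop signal
            break
        result = args.upper() + "!"                 # bolt: made-up logic
        server.return_result(request_id, result)    # return step


server = ToyDRPCServer()
worker = threading.Thread(target=run_topology, args=(server,), daemon=True)
worker.start()

reply = server.execute("hello")     # blocks until the topology answers
server.requests.put((None, None))   # shut the toy topology down
worker.join(timeout=5.0)
```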
Stream processing mode (asynchronous)
Client -> MQ -> Spout -> Bolt1 -> Bolt2 -> Storage (Redis, HBase, MySQL, MQ, etc.)
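The asynchronous flow can be sketched in the same toy style. Here a `queue.Queue` stands in for the message queue and a dictionary stands in for the storage layer; the event format (`"user,amount"`) and all names are assumptions for illustration. The client only publishes events and never waits for a reply; the results end up in storage.

```python
import queue
import threading

mq = queue.Queue()   # stands in for the message queue
storage = {}         # stands in for Redis, HBase, MySQL, etc.
STOP = object()      # sentinel used only to end this demo


def spout():
    """Pulls messages from the queue and emits them downstream."""
    while True:
        msg = mq.get()
        if msg is STOP:
            return
        yield msg


def bolt1(stream):
    """Parses each raw 'user,amount' message into typed fields."""
    for msg in stream:
        user, amount = msg.split(",")
        yield user, int(amount)


def bolt2(stream):
    """Aggregates per user and writes the running total to storage."""
    for user, amount in stream:
        storage[user] = storage.get(user, 0) + amount


worker = threading.Thread(target=lambda: bolt2(bolt1(spout())))
worker.start()

for event in ["alice,3", "bob,5", "alice,4"]:
    mq.put(event)    # the client publishes and moves on; no reply is awaited
mq.put(STOP)
worker.join()
```

A real topology would never receive a stop signal; it would keep waiting on the queue for new data.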