
Preliminary Setup for Getting Started with Twitter Storm


In this issue, the editor presents the preliminary setup for getting started with Twitter Storm. The content is explained from a practical, professional point of view; I hope you get something out of reading it.

This blog post is a simple getting-started example for Storm, designed to show readers how Storm works. Several posts on advanced Storm features will follow, ending with how Storm is integrated into YARN in Hadoop 2.x. The target readers are Hadoop and Spark users who want to go further with big data, or readers who want to understand Storm in depth.

Project POM (the Storm jar is not published to the Maven Central repository, so the following repositories need to be added to the project):

<repositories>
    <repository>
        <id>central</id>
        <name>Maven Repository Switchboard</name>
        <layout>default</layout>
        <url>http://maven.oschina.net/content/groups/public/</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>clojars</id>
        <url>https://clojars.org/repo/</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <releases>
            <enabled>true</enabled>
        </releases>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.yaml</groupId>
        <artifactId>snakeyaml</artifactId>
        <version>1.13</version>
    </dependency>
    <dependency>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
        <version>3.3.3</version>
    </dependency>
    <dependency>
        <groupId>org.clojure</groupId>
        <artifactId>clojure</artifactId>
        <version>1.5.1</version>
    </dependency>
    <dependency>
        <groupId>storm</groupId>
        <artifactId>storm</artifactId>
        <version>0.9.0.1</version>
    </dependency>
    <dependency>
        <groupId>storm</groupId>
        <artifactId>libthrift7</artifactId>
        <version>0.7.0</version>
    </dependency>
</dependencies>

The following is a HelloWorld example for Storm. The code has been trimmed; readers familiar with Storm should have no trouble assembling it into a complete example.

public static void main(String[] args) {
    Config conf = new Config();
    conf.put(Config.STORM_LOCAL_DIR, "/Volumes/Study/data/storm");
    conf.put(Config.STORM_CLUSTER_MODE, "local");
    // conf.put("storm.local.mode.zmq", "false");
    conf.put("storm.zookeeper.root", "/storm");
    conf.put("storm.zookeeper.session.timeout", 50000);
    conf.put("storm.zookeeper.servers", "nowledgedata-n15");
    conf.put("storm.zookeeper.port", 2181);
    // conf.setDebug(true);
    // conf.setNumWorkers(2);

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("words", new TestWordSpout(), 2);
    builder.setBolt("exclaim2", new DefaultStringBolt(), 5).shuffleGrouping("words");

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("test", conf, builder.createTopology());
}
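One detail the example glosses over: LocalCluster keeps running after submitTopology returns, so a local test usually lets the topology run for a while and then tears it down. A minimal sketch of the teardown, reusing the topology name "test" from the example; the 10-second wait is an arbitrary choice, and Utils comes from backtype.storm.utils:

Utils.sleep(10000);            // let the topology run for about 10 seconds
cluster.killTopology("test");  // stop the "test" topology
cluster.shutdown();            // shut down the in-process cluster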

Config.STORM_LOCAL_DIR configures a local path where Storm writes some configuration information and temporary data.

Config.STORM_CLUSTER_MODE is the running mode; the two options are local and distributed, that is, local mode and distributed mode. Local mode simulates the cluster with multiple threads at run time and is used for development and testing; distributed mode runs as multiple processes on a real distributed cluster.
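To actually run in distributed mode, the topology is submitted through StormSubmitter instead of LocalCluster, and the jar is handed to the cluster with the storm command-line client. A minimal sketch against the Storm 0.9 API, reusing the spout and bolt from the example; the class name RemoteTopologyMain is just a placeholder:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class RemoteTopologyMain {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setNumWorkers(2);  // run the topology in 2 worker processes on the cluster

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 2);
        builder.setBolt("exclaim2", new DefaultStringBolt(), 5).shuffleGrouping("words");

        // hand the topology to the cluster instead of running it in-process
        StormSubmitter.submitTopology("test", conf, builder.createTopology());
    }
}

The jar containing this class is then submitted with the storm client, for example storm jar my-topology.jar RemoteTopologyMain (the jar name here is a placeholder).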

Storm coordinates the high availability of Spouts and Bolts through ZooKeeper. storm.zookeeper.servers is the ZooKeeper server address, storm.zookeeper.port is its port, and storm.zookeeper.root is the root path Storm uses inside ZooKeeper.
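For reference, these ZooKeeper settings can also be written with the constants that Config defines for them, and storm.zookeeper.servers normally takes a list of host names rather than a single string. A sketch using the same host and port as the example above; java.util.Arrays is assumed to be imported:

conf.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList("nowledgedata-n15"));  // ZooKeeper host list
conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);                                  // ZooKeeper client port
conf.put(Config.STORM_ZOOKEEPER_ROOT, "/storm");                              // root znode used by Storm
conf.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 50000);                      // session timeout in ms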

conf.setDebug(true) turns on debug mode, which produces more detailed log output.

TestWordSpout is an example spout that ships with Storm. It randomly emits strings from the list new String[] {"nathan", "mike", "jackson", "golda", "bertels"} to provide a data source.
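Because TestWordSpout ships with Storm (under backtype.storm.testing in this version), there is no need to write it yourself. For readers who want to see what such a spout looks like, here is a minimal sketch of an equivalent spout rather than the actual Storm source; the class name, the 100 ms pause, and the output field name "word" are assumptions:

import java.util.Map;
import java.util.Random;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class RandomWordSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private final Random random = new Random();
    private final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void nextTuple() {
        Utils.sleep(100);                                   // throttle emission a little
        String word = words[random.nextInt(words.length)];  // pick a random word
        collector.emit(new Values(word));                   // emit it as a one-field tuple
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));               // single output field
    }
}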

The source code of DefaultStringBolt:

OutputCollector collector;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
}

public void execute(Tuple tuple) {
    log.info("rev a message: " + tuple.getString(0));
    collector.emit(tuple, new Values(tuple.getString(0) + "!!"));
    collector.ack(tuple);
}
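The excerpt above omits the surrounding class, the logger, and the output field declaration. A minimal way to complete it, repeating the excerpted methods so the class compiles on its own; the assumptions are that the bolt extends BaseRichBolt, that log is an slf4j logger, and that the single output field is named "word":

import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class DefaultStringBolt extends BaseRichBolt {

    private static final Logger log = LoggerFactory.getLogger(DefaultStringBolt.class);

    private OutputCollector collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple tuple) {
        log.info("rev a message: " + tuple.getString(0));              // log the incoming string
        collector.emit(tuple, new Values(tuple.getString(0) + "!!"));  // anchor to the input and re-emit with "!!"
        collector.ack(tuple);                                          // acknowledge so the tuple is not replayed
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));                          // single output field
    }
}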

Run log:

10658 [Thread-29-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: jackson
10658 [Thread-31-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: jackson
10758 [Thread-26-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: mike
10758 [Thread-33-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: nathan
10859 [Thread-26-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: nathan
10859 [Thread-29-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: bertels
10961 [Thread-31-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: jackson
10961 [Thread-33-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: jackson
11061 [Thread-35-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: nathan
11062 [Thread-35-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: nathan
11162 [Thread-26-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: bertels
11163 [Thread-26-exclaim2] INFO cn.pointways.dstorm.bolt.DefaultStringBolt - rev a message: jackson

The data is produced by what Storm calls a Spout (literally a nozzle or faucet, the source end of the data flow) and is then passed to a series of downstream Bolts, where it is eventually transformed and consumed. Both Spouts and Bolts run in parallel, and the degree of parallelism can be set explicitly (in local mode the parallelism is simulated with multiple threads). For example:

builder.setSpout("words", new TestWordSpout(), 2);
builder.setBolt("exclaim2", new DefaultStringBolt(), 5);

The parallelism of the TestWordSpout spout is 2, and the parallelism of DefaultStringBolt is 5.
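shuffleGrouping distributes the spout's tuples randomly and evenly across the five bolt threads, which is why all five Thread-*-exclaim2 threads in the run log above receive words. If tuples needed to be routed consistently by content, a fields grouping could be used instead; a sketch, assuming the spout declares its output field as "word" (Fields comes from backtype.storm.tuple):

// random, even distribution across the 5 bolt executors (as in the example)
builder.setBolt("exclaim2", new DefaultStringBolt(), 5).shuffleGrouping("words");

// alternative: all tuples with the same "word" value go to the same executor
builder.setBolt("exclaim2", new DefaultStringBolt(), 5).fieldsGrouping("words", new Fields("word"));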

As you can see from the log, the data flows from the spout to the designated bolt, and the log is printed there. The parallelism set in my test code is 5, and counting the thread names in the log confirms that there are indeed 5 threads:

Thread-29-exclaim2
Thread-31-exclaim2
Thread-26-exclaim2
Thread-33-exclaim2
Thread-35-exclaim2

So what exactly is Storm? Here is a more detailed introduction.

To borrow an analogy from an OSC user: Hadoop is like the elevator in a shopping mall, where users have to queue up, choose a floor, and then arrive; Storm is like an escalator, which, once set running, immediately carries away whoever steps on it toward a fixed destination.

As I understand it, Storm and Hadoop are completely different, with no overlap in their design. Storm is more like the Spring Integration I introduced earlier: it is a data flow system. It can transform, route, split, and merge data according to a preset flow and deliver it to back-end storage. The difference is that Storm can run in a distributed fashion, and the degree of distribution can be configured by the user.

These characteristics make Storm well suited to building big data ETL systems.

The above is the preliminary setup for getting started with Twitter Storm. If you happen to have similar questions, you can refer to the analysis above; the follow-up posts on Storm's advanced features will go deeper.
