
Big Data Learning (Storm): A Detailed Explanation of the Principles


Roles

Client

The client's main role is to submit a topology to the cluster.

Worker

A Worker is an independent JVM process running on a Supervisor node. Its main function is to run the topology. A topology can contain multiple Workers, but a Worker can only belong to one topology.
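
The number of workers a topology requests is set through its configuration. A minimal sketch, assuming Storm 1.x package names (org.apache.storm; older releases use backtype.storm):

import org.apache.storm.Config;

public class WorkerCountSketch {
    public static void main(String[] args) {
        // Request 2 worker JVM processes for this topology; each worker
        // will host a share of the topology's executor threads.
        Config conf = new Config();
        conf.setNumWorkers(2);
    }
}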

Executor

An Executor is a thread running inside a Worker. An Executor can correspond to one or more Tasks, and every Task (of a Spout or Bolt) must be assigned to an Executor.

Task

A Task is an instance of independent processing logic. Each Spout or Bolt can correspond to multiple Tasks running across the cluster, and each Task runs inside an Executor thread.
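
To make the executor/task relationship concrete, here is a minimal sketch under the same Storm 1.x assumptions (WordSpout and CountBolt are hypothetical classes): the parallelism hint passed to setSpout/setBolt sets the number of executors, while setNumTasks sets the number of task instances spread across them.

import org.apache.storm.topology.TopologyBuilder;

public class ParallelismSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // "words" and "counter" are hypothetical component ids.
        builder.setSpout("words", new WordSpout(), 2);   // parallelism hint: 2 executor threads
        builder.setBolt("counter", new CountBolt(), 2)   // 2 executor threads...
               .setNumTasks(4)                           // ...running 4 tasks, 2 per executor
               .shuffleGrouping("words");
    }
}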

Stream grouping

A stream grouping defines how data is sent from one set of Tasks to another set of Tasks.
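
As a hedged sketch of two common groupings (component ids and classes are hypothetical): shuffleGrouping distributes tuples randomly and evenly across the target tasks, while fieldsGrouping routes tuples with the same field value to the same task.

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class GroupingSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout());      // hypothetical spout
        builder.setBolt("split", new SplitBolt(), 4)             // hypothetical bolt
               .shuffleGrouping("sentences");                    // tuples spread randomly and evenly
        builder.setBolt("count", new CountBolt(), 4)             // hypothetical bolt
               .fieldsGrouping("split", new Fields("word"));     // same "word" value -> same task
    }
}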

Startup, task submission, and execution in a Storm cluster

Startup

When a user runs storm nimbus or storm supervisor, two Python functions inside the storm script are actually invoked, which ultimately assemble a java command that starts a Storm Java process:

java -server xxxx.xxxx.nimbus/supervisor args


Task submission

Running storm jar xxxx.jar xxxx.MainClass executes the main function of the driver class.

In the driver class, the topologyBuilder.createTopology() method is called, which generates the serialized objects for the spouts and bolts.
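
A minimal driver sketch under the same assumptions (Storm 1.x package names; WordSpout and CountBolt are hypothetical classes): createTopology() packages the serialized spout/bolt graph, and StormSubmitter sends it to the cluster, which triggers the upload described next.

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new WordSpout());                  // hypothetical spout
        builder.setBolt("count", new CountBolt(), 2)                 // hypothetical bolt
               .fieldsGrouping("words", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(2);

        // createTopology() serializes the spout/bolt objects;
        // submitTopology() uploads the jar and topology metadata to nimbus.
        StormSubmitter.submitTopology("wordcount01", conf, builder.createTopology());
    }
}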

The client uploads the jar corresponding to the topology to the storm-local/nimbus/inbox directory on nimbus.

Nimbus then copies storm.jar to the /home/hadoop/storm-local/nimbus/stormdist/wordcount01-2-1525621662 directory (wordcount01-2-1525621662 is a unique topology name generated by Storm) and, based on the serialized objects generated in the second step, writes out the task serialization file and the related configuration serialization file. At this point, nimbus can assign tasks:

-rw-rw-r--. 1 hadoop hadoop 3615 May 6 23:47 stormcode.ser

-rw-rw-r--. 1 hadoop hadoop 733 May 6 23:47 stormconf.ser

-rw-rw-r--. 1 hadoop hadoop 3248667 May 6 23:47 stormjar.jar


Next, tasks are assigned. After the assignment is complete, an Assignment object is generated, serialized, and saved to the /storm/assignments/wordcount01-2-1525621662 node in ZooKeeper.

The Supervisor senses changes to the /storm/assignments directory through ZooKeeper's watch mechanism and pulls the data for its own topologies (when nimbus performs the assignment, it specifies which supervisor each task belongs to).
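
To illustrate the watch mechanism itself (this is not Storm's internal code; the connect string, timeout, and handling are illustrative), a ZooKeeper client registers a watch on the assignments path roughly like this:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class AssignmentWatchSketch {
    private static ZooKeeper zk;

    public static void main(String[] args) throws Exception {
        // Illustrative connect string and session timeout.
        zk = new ZooKeeper("zkhost:2181", 15000, event -> { });
        watchAssignments();
        Thread.sleep(Long.MAX_VALUE);   // keep the process alive to receive events
    }

    static void watchAssignments() throws Exception {
        // getChildren registers a one-shot watch on the path; the callback
        // fires on a change, after which the watch must be re-registered.
        List<String> assignments = zk.getChildren("/storm/assignments", event -> {
            try {
                watchAssignments();     // renew the watch and re-read assignments
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        System.out.println("Topologies with assignments: " + assignments);
    }
}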

Based on the pulled information, the supervisor starts a worker on the designated port; this again actually executes a java command:

java -server xxxxx.xxxx.worker


After the worker starts, it begins executing according to the assigned task information.
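
What a task actually executes is the logic of its Spout (nextTuple) or Bolt (execute). As a minimal sketch, assuming the Storm 1.x bolt API, the executor thread calls execute() once per incoming tuple:

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// A minimal bolt: splits each incoming sentence into words.
public class SplitBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        // Called once when the task is set up inside its worker.
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // The executor thread invokes this once per incoming tuple.
        for (String word : input.getString(0).split(" ")) {
            collector.emit(new Values(word));   // one output tuple per word
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}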

Big Data Learning Exchange Group 766988146: whether you are a rookie or an expert, you are welcome. Today's source code has been uploaded to the group files, and practical materials are shared from time to time.

This includes my own collection of the latest big data development tutorials and zero-foundation introductory tutorials suitable for studying in 2018; beginners and advanced learners alike are welcome.
