How to Use Apache Spark to Build a Real-Time Analytics Dashboard

This article explains, step by step, how to build a real-time analytics dashboard using Apache Spark Streaming, Kafka, Node.js, Socket.IO, and Highcharts.
Problem description
An e-commerce portal wants to build a real-time analytics dashboard that visualizes the number of orders shipped per minute, in order to optimize the efficiency of its logistics.
Solution
Before we solve the problem, let's take a quick look at the tools we will use:
Apache Spark - a fast, general-purpose engine for large-scale data processing. Spark's batch processing is nearly 10 times faster than Hadoop MapReduce, and its in-memory data analysis is nearly 100 times faster.
Python - a widely used high-level, general-purpose, interpreted, dynamic programming language.
Kafka - a high-throughput, distributed publish-subscribe messaging system.
Node.js - an event-driven I/O, server-side JavaScript environment that runs on the V8 engine.
Socket.IO - a JavaScript library for building real-time web applications; it supports real-time, bidirectional communication between web clients and servers.
Highcharts - a library for interactive JavaScript charts on web pages.
CloudxLab - a real, cloud-based environment for practicing and learning these tools; you can register online and start practicing immediately.
How to build the data pipeline?
At a high level, the data pipeline works like this: order events flow from the portal into Kafka, are aggregated by Spark Streaming, are written back to Kafka, and are then pushed by a Node.js server over Socket.IO to the browser, where Highcharts renders a live chart of orders shipped per minute.
Let's walk through each stage of the pipeline and build up the complete solution.
Stage 1
When a customer purchases an item in the system, or an order's status changes in the order management system, the corresponding order ID, order status, and timestamp are pushed to the corresponding Kafka topic.
Dataset
Since we do not have a real online e-commerce portal, we simulate one with a dataset of CSV files. Let's look at the dataset:
The dataset contains three columns: "DateTime", "OrderId", and "Status". Each row represents the status of one order at a specific time. Order IDs are masked as "xxxxx-xxx": we are only interested in the number of orders shipped per minute, so the actual order IDs are not needed.
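For illustration, a few rows might look like this (the timestamps and status values below are invented to match the column description, not copied from the actual files):

DateTime,OrderId,Status
2016-07-28 10:05:36,xxxxx-xxx,order placed
2016-07-28 10:05:51,xxxxx-xxx,shipped
2016-07-28 10:06:10,xxxxx-xxx,shipped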
The source code and datasets of the complete solution can be cloned from the CloudxLab GitHub repository.
The dataset is located in the spark-streaming/data/order_data folder of the project.
Push the dataset to Kafka
A shell script takes each line from these CSV files and pushes it to Kafka. After pushing one CSV file to Kafka, it waits one minute before pushing the next, which simulates a real-time e-commerce portal in which order statuses are updated at irregular intervals. In the real world, the corresponding order details would be pushed to Kafka whenever an order's status changes.
Run our shell script to push the data into the Kafka topic: log in to the CloudxLab web console and run the following command.
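The actual push script is a shell script in the repository; purely as an illustration of what it does, a minimal Python equivalent might look like the sketch below (the kafka-python library, the broker address localhost:9092, and the file glob are assumptions, not values from the article):

# Hypothetical sketch: push each CSV line to the "order-data" topic,
# pausing a minute between files to simulate orders arriving over time.
import glob
import time

from kafka import KafkaProducer  # assumption: kafka-python is installed

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed broker

for path in sorted(glob.glob("spark-streaming/data/order_data/*.csv")):
    with open(path) as f:
        for line in f:
            producer.send("order-data", line.strip().encode("utf-8"))
    producer.flush()   # make sure the whole file has been delivered
    time.sleep(60)     # wait 1 minute before pushing the next file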
Stage 2
After Stage 1, each message in the Kafka "order-data" topic is one such CSV line, carrying a DateTime, an OrderId, and a Status.
Stage 3
The Spark Streaming code reads the data from the "order-data" Kafka topic in 60-second windows and processes it, so that the number of orders in each status can be counted per 60-second window. The total count for each order status is then pushed to the "order-one-min-data" Kafka topic.
Run the Spark Streaming code from the repository in the web console.
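The repository contains the full job; purely as an illustration of the idea, a minimal PySpark version might look like the sketch below (it assumes Spark 2.x with the legacy Kafka 0.8 receiver, which was removed in Spark 3, plus ZooKeeper at localhost:2181 and a broker at localhost:9092; none of these details come from the article):

# Hypothetical sketch: count orders per status in 60-second windows and
# push the counts to the "order-one-min-data" topic as JSON.
import json

from kafka import KafkaProducer
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

def push_counts(rdd):
    # Runs on the driver once per window; forwards each (status, count)
    # pair to Kafka as a small JSON message.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for status, count in rdd.collect():
        message = json.dumps({"order_status": status, "count": count})
        producer.send("order-one-min-data", message.encode("utf-8"))
    producer.flush()

sc = SparkContext(appName="OrderStatusCounts")
ssc = StreamingContext(sc, 60)  # 60-second batches serve as our window

# Each Kafka message is one CSV line: "DateTime,OrderId,Status".
lines = KafkaUtils.createStream(ssc, "localhost:2181", "order-group",
                                {"order-data": 1}).map(lambda kv: kv[1])
counts = (lines.map(lambda line: (line.split(",")[2], 1))
               .reduceByKey(lambda a, b: a + b))
counts.foreachRDD(push_counts)

ssc.start()
ssc.awaitTermination()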
Stage 4
At this stage, each message in the Kafka topic "order-one-min-data" is a JSON string similar to the following.
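For example, a message might resemble this (the field names are carried over from the Stage 3 sketch above and are an assumption; the repository defines the actual schema):

{"order_status": "shipped", "count": 93}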
Stage 5
Run the Node.js server
Now we run a Node.js server that consumes the messages from the "order-one-min-data" Kafka topic and pushes them to the web browser, so that the number of orders shipped per minute can be displayed there.
Run the following command in the web console to start the Node.js server.
The Node server now runs on port 3001. If an "EADDRINUSE" error occurs on startup, edit the index.js file and try the next port: 3002, then 3003, 3004, and so on. Any available port in the range 3001-3010 will do.
Access the dashboard from a browser
After starting the Node server, go to http://YOUR_WEB_CONSOLE:PORT_NUMBER to access the real-time analytics dashboard. For example, if your web console is f.cloudxlab.com and the Node server is running on port 3002, go to http://f.cloudxlab.com:3002.
When we open this URL, the socket.io-client library is loaded into the browser, and it opens a bidirectional communication channel between the server and the browser.
Stage 6
Whenever a new message arrives in the Kafka "order-one-min-data" topic, the Node process consumes it and emits it to the web browser via Socket.IO.
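The tutorial's real server is the index.js file in the repository. To make the consume-and-emit logic concrete without switching examples to JavaScript, here is a rough Python stand-in using python-socketio, eventlet, and kafka-python (the broker address is an assumption; the topic, port, and "message" event name come from the article):

# Hypothetical Python stand-in for the Node.js server of Stages 5 and 6.
import eventlet
eventlet.monkey_patch()  # make blocking sockets cooperative under eventlet

import socketio
from kafka import KafkaConsumer

sio = socketio.Server(cors_allowed_origins="*")
app = socketio.WSGIApp(sio)

def consume_and_emit():
    # Read each per-minute count message from Stage 3 and broadcast it
    # to every connected browser as a "message" event.
    consumer = KafkaConsumer("order-one-min-data",
                             bootstrap_servers="localhost:9092")
    for record in consumer:
        sio.emit("message", record.value.decode("utf-8"))

sio.start_background_task(consume_and_emit)
eventlet.wsgi.server(eventlet.listen(("", 3001)), app)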
Stage 7
When the socket.io-client in the web browser receives a new "message" event, the data in the event is processed. If the order status in the received data is "shipped", the count is added to the Highcharts series and displayed in the browser.
We have also recorded a video showing how to run all of the above commands and build the real-time analytics dashboard.
We have successfully built a real-time analytics dashboard. This is a basic example of integrating Spark Streaming, Kafka, Node.js, and Socket.IO; with this foundation, we can now use these tools to build more sophisticated real-time systems.