

How to Use Apache Spark to Build a Real-Time Analytics Dashboard


This article introduces how to use Apache Spark to build a real-time analytics dashboard. Many readers have questions about how to do this, so we have organized the material into a simple, easy-to-follow walkthrough. We hope it answers your questions; follow along to study the solution!

Problem description

An e-commerce portal (http://www.aaaa.com) wants to build a real-time analytics dashboard that visualizes the number of orders shipped per minute, in order to optimize its logistics efficiency.

Solution

Before we solve the problem, let's take a quick look at the tools we will use:

Apache Spark - a fast, general-purpose engine for large-scale data processing. Spark runs batch workloads up to 10 times faster than Hadoop MapReduce, and up to 100 times faster for in-memory analytics.

Python - a widely used high-level, general-purpose, interpreted, dynamic programming language.

Kafka - a high-throughput, distributed publish-subscribe messaging system.

Node.js - an event-driven I/O server-side JavaScript environment that runs on the V8 engine.

Socket.io - a JavaScript library for building real-time web applications. It supports real-time, two-way communication between web clients and servers.

Highcharts - interactive JavaScript charts for web pages.

CloudxLab - provides a real, cloud-based environment for practicing and learning these tools.

How to build the data pipeline?

The following is the high-level architecture of the data pipeline.

[Figure: Data pipeline architecture]

[Figure: Real-time analytics dashboard]

Let's go through each stage of the data pipeline and build the complete solution.

Stage 1

When a customer purchases an item, or an order's status changes in the order management system, the corresponding order ID, order status, and time are pushed to a Kafka topic.

Data set

Since there is no real online e-commerce portal, we will simulate one using a dataset of CSV files. Let's look at the dataset:

The dataset contains three columns: "DateTime", "OrderId", and "Status". Each row represents the status of an order at a specific time. Order IDs are masked as "xxxxx-xxx": we are only interested in the number of orders shipped per minute, so the actual order IDs are not needed.
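For illustration, two rows of such a file might look like this (the values are invented; only the three-column layout reflects the real files):

2016-07-21 10:01:06,xxxxx-xxx,pending
2016-07-21 10:01:45,xxxxx-xxx,shipped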

The source code and datasets for the complete solution can be cloned from the CloudxLab GitHub repository.

The dataset is located in the spark-streaming/data/order_data folder of the project.

Push the dataset to Kafka

The shell script takes each line of these CSV files and pushes it to Kafka. After pushing one CSV file, it waits one minute before pushing the next, which simulates a real e-commerce portal where order statuses are updated at irregular intervals. In the real world, when an order's status changes, the corresponding order details would be pushed to Kafka.

Run our shell script to push the data into the Kafka topic: log in to the CloudxLab Web console and run the script from the cloned repository.
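The exact command is not reproduced here, but the script's behavior can be sketched in Python. This is a minimal sketch, assuming the kafka-python client, a broker on localhost:9092, and the order_data folder mentioned above; only the "order-data" topic name comes from the article:

import glob
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Push the CSV files one by one, waiting a minute between files to
# simulate order statuses arriving over time.
for path in sorted(glob.glob("spark-streaming/data/order_data/*.csv")):
    with open(path) as f:
        for line in f:
            producer.send("order-data", line.strip().encode("utf-8"))
    producer.flush()
    time.sleep(60)  # wait 1 minute before pushing the next file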

Stage 2

After Stage 1, each message in the Kafka "order-data" topic will look something like this:
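The original example is not preserved here; given the dataset's columns, each message would be a single CSV line, for instance (illustrative values):

2016-07-21 10:01:45,xxxxx-xxx,shipped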

Stage 3

The Spark Streaming code reads the data from the "order-data" Kafka topic in a 60-second window and processes it so that the orders in each status can be counted within that window. After processing, the total count for each order status is pushed to the "order-one-min-data" Kafka topic.

Run this Spark Streaming code in the Web console.
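The article's code itself is not reproduced here; the following is a minimal PySpark sketch of what this stage could look like, assuming Spark 2.x with the spark-streaming-kafka-0-8 package, the kafka-python client, a broker on localhost:9092, and comma-separated messages as above. Only the topic names and the 60-second window come from the article:

import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from kafka import KafkaProducer

sc = SparkContext(appName="OrderStatusCounts")
ssc = StreamingContext(sc, 60)  # 60-second batches, i.e. one window per minute

stream = KafkaUtils.createDirectStream(
    ssc, ["order-data"], {"metadata.broker.list": "localhost:9092"})

# Each message value is "DateTime,OrderId,Status"; count orders per status.
counts = (stream.map(lambda kv: kv[1])
                .map(lambda line: (line.split(",")[2], 1))
                .reduceByKey(lambda a, b: a + b))

def push_to_kafka(rdd):
    # For simplicity the totals are collected on the driver and published
    # as JSON to the output topic.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for status, total in rdd.collect():
        producer.send("order-one-min-data",
                      json.dumps({"status": status, "count": total}).encode("utf-8"))
    producer.flush()

counts.foreachRDD(push_to_kafka)
ssc.start()
ssc.awaitTermination()

Setting the batch interval to 60 seconds makes every micro-batch a one-minute window, so the per-status totals are exactly the "orders per minute" the dashboard needs.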

Stage 4

At this stage, each message in the Kafka topic "order-one-min-data" will be a JSON string similar to the following:
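The original example is not preserved here; with processing along the lines sketched above, a message would look roughly like this (field names and values are illustrative):

{"status": "shipped", "count": 657}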

Stage 5

Run Node.js server

Now we will run a Node.js server that consumes the messages from the "order-one-min-data" Kafka topic and pushes them to the web browser, so that the number of orders shipped per minute can be displayed there.

Run the following command in the Web console to start the Node.js server:
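The exact command is not preserved here; since the note below refers to editing index.js, it is presumably something like the following, run from the Node.js app's directory in the cloned repository (hypothetical invocation):

node index.js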

The Node server will now run on port 3001. If an "EADDRINUSE" error occurs when starting it, edit the index.js file and change the port to 3002, 3003, 3004, and so on; use any available port in the range 3001-3010.

Access it with a browser

After starting the Node server, go to http://YOUR_WEB_CONSOLE:PORT_NUMBER to access the real-time analytics dashboard. For example, if your Web console is f.cloudxlab.com and the Node server is running on port 3002, go to http://f.cloudxlab.com:3002.

When we access the URL above, the socket.io-client library is loaded into the browser, which opens a two-way communication channel between the server and the browser.

Stage 6

Once a new message arrives in Kafka's "order-one-min-data" topic, the Node process consumes it and sends it to the web browser via socket.io.
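The article implements this step in Node.js, whose code is not reproduced here. Purely to illustrate the same consume-and-forward idea in Python (a substitute technique, not the article's server), a minimal sketch with kafka-python and flask-socketio might look like this:

from flask import Flask
from flask_socketio import SocketIO      # pip install flask-socketio
from kafka import KafkaConsumer          # pip install kafka-python

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

def forward_kafka_messages():
    # Consume the per-minute aggregates and broadcast each one to all
    # connected browsers as a "message" event.
    consumer = KafkaConsumer("order-one-min-data",
                             bootstrap_servers="localhost:9092")
    for record in consumer:
        socketio.emit("message", record.value.decode("utf-8"))

if __name__ == "__main__":
    socketio.start_background_task(forward_kafka_messages)
    socketio.run(app, port=3001)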

Stage 7

Once the socket.io client in the web browser receives a new "message" event, the data in the event is processed. If the order status in the received data is "shipped", the count is added to the Highcharts series and displayed in the browser.

Screenshot

[Screenshot of the finished real-time analytics dashboard]

We also recorded a video showing how to run all of the above commands and build the real-time analytics dashboard.

We have successfully built a real-time analytics dashboard. This is a basic example of how to integrate Spark Streaming, Kafka, Node.js, and socket.io. With this foundation, we can use these tools to build more complex systems.

At this point, our study of how to use Apache Spark to build a real-time analytics dashboard is complete. We hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it yourself!
