The perfect combination of Redis Streams and Spark 07/12 Update SLTechnology News&Howtos

Source: Redislabs

Author: Roshan Kumar

Translation: Kevi × ×)

Recently, I had the privilege of giving a talk titled "Redis + Structured Streaming: A Perfect Combination to Scale Out Your Continuous Applications" at the Spark + AI Summit.

My interest in this topic was sparked by new features that Apache Spark and Redis introduced over the past few months. From my earlier experience with Apache Spark, I appreciate how elegantly it runs batch jobs, and Structured Streaming, introduced in version 2.0, is a further step in this direction.

Meanwhile, Redis recently announced a new data structure for managing streaming data, called "Streams". Redis Streams provides asynchronous communication between producers and consumers, along with persistence, the ability to query past data, and scale-out options similar to those of Apache Kafka. In essence, Streams gives data engineers a lightweight, fast, and easy-to-manage streaming database inside Redis.
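
To make "persistence" and "querying past data" concrete: a Redis stream is an append-only log whose entry IDs have the form <milliseconds>-<sequence> and only ever increase, so earlier entries can be re-read by ID range (the XRANGE command). The toy Python model below mimics that ordering in memory. ToyStream, xadd, xrange and parse_id are hypothetical names for illustration, not the redis-py API, and the caller passes the millisecond timestamp explicitly (a real XADD with "*" takes it from the server clock).

```python
from typing import Dict, List, Tuple

StreamId = Tuple[int, int]  # (milliseconds, sequence); Redis writes it as "<ms>-<seq>"


def parse_id(entry_id: str) -> StreamId:
    """Turn "1000-1" into (1000, 1) so IDs compare numerically, not as strings."""
    ms, seq = entry_id.split("-")
    return int(ms), int(seq)


class ToyStream:
    """Hypothetical append-only log with strictly increasing IDs (not a Redis client)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[StreamId, Dict[str, str]]] = []

    def xadd(self, ms: int, fields: Dict[str, str]) -> str:
        # If the clock has not moved past the last entry's millisecond, reuse that
        # millisecond and bump the sequence number so IDs keep increasing.
        if self.entries and ms <= self.entries[-1][0][0]:
            last_ms, last_seq = self.entries[-1][0]
            new_id = (last_ms, last_seq + 1)
        else:
            new_id = (ms, 0)
        self.entries.append((new_id, dict(fields)))
        return f"{new_id[0]}-{new_id[1]}"

    def xrange(self, start: str, end: str) -> List[Tuple[str, Dict[str, str]]]:
        # Inclusive ID range. Reading removes nothing, so older entries stay queryable.
        lo, hi = parse_id(start), parse_id(end)
        return [(f"{i[0]}-{i[1]}", f) for i, f in self.entries if lo <= i <= hi]


votes = ToyStream()
ids = [votes.xadd(ms, {"dog": dog}) for ms, dog in [(1000, "Rex"), (1000, "Bella"), (1005, "Rex")]]
print(ids)                               # ['1000-0', '1000-1', '1005-0']
print(votes.xrange("1000-1", "1005-0"))  # [('1000-1', {'dog': 'Bella'}), ('1005-0', {'dog': 'Rex'})]
```

Two votes in the same millisecond get sequence numbers 0 and 1, which is why ordering never depends on clock resolution alone.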

Separately, the Spark-Redis library was developed to make Redis usable as a resilient distributed dataset (RDD). Now that we have both Structured Streaming and Redis Streams, we decided to extend the Spark-Redis library so that Redis Streams can serve as a data source for Apache Spark Structured Streaming.

In my presentation last month, I demonstrated how to collect user activity data in Redis Streams and consume it in Apache Spark for real-time data analysis. I developed a small, mobile-friendly Node.js app in which people could vote for their favorite dog with a click, as a fun contest.

It was a hard-fought contest, and several audience members even got creative and attacked my app. They used the browser's "Inspect" option to change the names on the HTML buttons in an attempt to mess up the app's display. In the end they failed, because Redis Streams, Apache Spark, the Spark-Redis library, and my code were robust enough to handle these attacks.
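
The real-time analysis in the voting demo boils down to a running count of votes per dog. The sketch below is plain Python, not Spark code: run_micro_batches is a hypothetical helper and the dog names are made up. It mimics how such an aggregation behaves under Structured Streaming's "update" output mode, where each trigger emits only the result rows that changed, compared with "complete" mode, where each trigger emits the entire result table.

```python
from collections import Counter
from typing import Dict, Iterable, List


def run_micro_batches(batches: Iterable[List[str]], mode: str = "update") -> List[Dict[str, int]]:
    """Hypothetical helper: count votes per dog trigger by trigger.

    Returns what each trigger would emit to a sink:
    mode="update"   -> only the dogs whose running count changed in that trigger
    mode="complete" -> the whole result table after that trigger
    """
    if mode not in ("update", "complete"):
        raise ValueError(f"unsupported output mode: {mode}")
    totals: Counter = Counter()          # running state kept across triggers
    emitted: List[Dict[str, int]] = []
    for batch in batches:
        arrived = Counter(batch)         # votes that arrived since the last trigger
        totals.update(arrived)           # fold them into the running counts
        if mode == "update":
            emitted.append({dog: totals[dog] for dog in arrived})
        else:
            emitted.append(dict(totals))
    return emitted


batches = [["Rex", "Bella", "Rex"], ["Bella"], ["Max"]]
print(run_micro_batches(batches, "update"))
# [{'Rex': 2, 'Bella': 1}, {'Bella': 2}, {'Max': 1}]
print(run_micro_batches(batches, "complete"))
# [{'Rex': 2, 'Bella': 1}, {'Rex': 2, 'Bella': 2}, {'Rex': 2, 'Bella': 2, 'Max': 1}]
```

In update mode a sink receives only the counts that changed in each trigger instead of a full rewrite of the table.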

During and after my talk, audience members also asked some interesting questions, such as:

1. How do I scale out if my data processing is slower than the rate at which Redis Streams receives data?

My answer: configure a consumer group on the Redis stream and make each Spark job a consumer in that group, so that each job receives its own, non-overlapping set of entries. It is also important to set the output mode to "update", so that no job overwrites the results written by the other jobs.

2. What happens to the data in Redis Streams if I restart the Spark job?

My answer: Redis Streams persists data, so your Spark job will not miss any. If you restart the Spark job, it resumes reading from where it previously stopped.

3. Can I use Python to develop my Spark application? (The code in my presentation was written in Scala.)

My answer: yes, you can. Please refer to the Spark-Redis documentation on GitHub.

4. Can I deploy Redis Streams on the cloud?

My answer: yes. Streams is just another data structure in Redis, built in since version 5.0. The quickest way to get started is to sign up at https://redislabs.com/get-started.
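
The answers to questions 1 and 2 both come down to how a Redis consumer group tracks delivery. The toy Python model below covers only the Redis side and is not the redis-py or Spark-Redis API; ToyConsumerGroup and its methods are hypothetical names. read_new mimics XREADGROUP with the special ID ">" (entries not yet delivered to any consumer in the group), read_pending mimics XREADGROUP with ID "0" (the calling consumer's delivered but unacknowledged entries), and ack mimics XACK. Redis stores this group state with the stream itself, so a restarted consumer can continue where it stopped.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (entry ID, payload), e.g. ("1-0", "Rex")


class ToyConsumerGroup:
    """Hypothetical in-memory model of one consumer group (not a real client API)."""

    def __init__(self, entries: List[Entry]) -> None:
        self.entries = entries                    # the stream, oldest first
        self.last_delivered = 0                   # index of the next never-delivered entry
        self.pending: Dict[str, List[str]] = {}   # consumer name -> delivered, unacked IDs

    def read_new(self, consumer: str, count: int) -> List[Entry]:
        # Like XREADGROUP ... STREAMS key ">": each entry goes to exactly one consumer.
        batch = self.entries[self.last_delivered:self.last_delivered + count]
        self.last_delivered += len(batch)
        self.pending.setdefault(consumer, []).extend(entry_id for entry_id, _ in batch)
        return batch

    def read_pending(self, consumer: str) -> List[Entry]:
        # Like XREADGROUP ... STREAMS key "0": re-read this consumer's unacknowledged entries.
        unacked = set(self.pending.get(consumer, []))
        return [entry for entry in self.entries if entry[0] in unacked]

    def ack(self, consumer: str, entry_ids: List[str]) -> None:
        # Like XACK: processed entries leave the consumer's pending list.
        done = set(entry_ids)
        self.pending[consumer] = [i for i in self.pending.get(consumer, []) if i not in done]


# Question 1: two jobs in the same group split the stream without overlap.
stream = [(f"{n}-0", dog) for n, dog in enumerate(["Rex", "Bella", "Rex", "Max", "Bella", "Rex"], start=1)]
group = ToyConsumerGroup(stream)
job_a = group.read_new("job-a", 2) + group.read_new("job-a", 1)  # 1-0, 2-0, 3-0
job_b = group.read_new("job-b", 3)                                # 4-0, 5-0, 6-0

# Question 2: job-a acknowledges only its first entry, then "restarts".
group.ack("job-a", [job_a[0][0]])
recovered = group.read_pending("job-a")   # 2-0 and 3-0, which it never acknowledged
stream.append(("7-0", "Max"))             # a new vote arrives after the restart
resumed = group.read_new("job-a", 10)     # only 7-0; nothing already delivered comes back
```

The pending list is what turns "the data is persisted" into "no vote is skipped": anything delivered but not acknowledged before a crash is still attributable to that consumer after it comes back.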

My main takeaway from the summit was that interest in continuous processing and streaming data is growing. In response, we have published a more detailed article on this topic on InfoQ, which explains how to set up Redis Streams and Apache Spark and connect them using the Spark-Redis library. You can also watch the full video of my presentation at any time.
