How to realize the Integration of Spark Streaming and Kafka

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article shows how to integrate Spark Streaming with Kafka and walks through the dependency problems that come up along the way.

I recently finished integrating Spark Streaming with Kafka. It did not take long, but I hit quite a few pitfalls, so I am recording them here to help others avoid them.

Let's talk about the environment first:

Spark 2.0.0, kafka_2.11-0.10.0.0 (Kafka 0.10.0.0 built for Scala 2.11)

In my previous projects the pom already contained the Spark Streaming dependency, so this time I only needed to add the Spark Streaming Kafka integration. That is where the trouble started. First, the Spark Streaming dependency I already had:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>

Then the Spark Streaming Kafka dependency I found:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.11</artifactId>
        <version>1.6.2</version>
    </dependency>

Note the two version numbers: they do not match (2.0.0 versus 1.6.2). Sure enough, when I followed the example I got all kinds of class-not-found errors. The version mismatch was almost certainly the cause.

But what do you do if you cannot find a newer version of the dependency on http://mvnrepository.com?

After thinking it over, I saw only one option: download the Spark source code and build the jar myself.

I found the Spark project on GitHub and cloned it. Being lazy, I did not read the instructions carefully and went straight to clean, compile and so on. As a result, all kinds of errors were reported again. So I went back and read properly. The GitHub page links to http://spark.apache.org/docs/latest/building-spark.html; follow it and the build works without problems.

Then delete the streaming Kafka dependency from the project's pom and add the jar we just built:

spark-streaming-kafka-0-10_2.11-2.1.0-SNAPSHOT.jar
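The jar name shows that the module built from source is spark-streaming-kafka-0-10. It appears that release builds of this artifact are also published to Maven Central under the same version numbers as Spark itself, which may make the source build unnecessary. The following is a minimal sketch written as an sbt build definition (sbt files are Scala). It rests on several assumptions: the project in this article uses a Maven pom rather than sbt, the Scala patch version 2.11.8 is a guess, and it has not been checked here that the 2.0.0 release exposes exactly the same API as the 2.1.0-SNAPSHOT jar used in the code below.

```scala
// build.sbt -- illustrative only; the original project uses Maven.
// Assumption: Scala 2.11.x, to match the _2.11 artifacts named in the article.
scalaVersion := "2.11.8"

// Keep every Spark artifact on one version to avoid class-not-found errors.
val sparkVersion = "2.0.0"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, giving spark-streaming_2.11
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  // resolves to spark-streaming-kafka-0-10_2.11
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion
)
```

Declaring the version once in `sparkVersion` is the point of the sketch: the two artifacts cannot drift apart the way 2.0.0 and 1.6.2 did above.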

Here is the code:

    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Ship the application jar and the Kafka jars to the cluster
    val conf = new SparkConf().setAppName("kafkastream").setMaster("spark://master:7077").
      set("spark.driver.host", "192.168.1.142").
      setJars(List("/src/git/msgstream/out/artifacts/msgstream_jar/msgstream.jar",
        "/src/git/msgstream/lib/kafka-clients-0.10.0.0.jar",
        "/src/git/msgstream/lib/kafka_2.11-0.10.0.0.jar",
        "/src/git/msgstream/lib/spark-streaming-kafka-0-10_2.11-2.1.0-SNAPSHOT.jar"))
    // Batch interval of 2 seconds
    val ssc = new StreamingContext(conf, Seconds(2))

    val topics = List("woozoom")
    val kafkaParams = Map(("bootstrap.servers", "master:9092,slave01:9092,slave02:9092"),
      ("group.id", "sparkstreaming"), ("key.deserializer", classOf[StringDeserializer]),
      ("value.deserializer", classOf[StringDeserializer]))
    val preferredHosts = LocationStrategies.PreferConsistent
    // Start reading partition 0 of the topic at offset 2
    val offsets = Map(new TopicPartition("woozoom", 0) -> 2L)

    val lines = KafkaUtils.createDirectStream[String, String](
      ssc,
      preferredHosts,
      ConsumerStrategies.Subscribe[String, String](topics, kafkaParams, offsets))

    // Print every record of every batch
    lines.foreachRDD(rdd => {
      rdd.foreach(x => {
        println(x)
      })
    })

    ssc.start()
    ssc.awaitTermination()
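The `println(x)` call above runs inside `rdd.foreach`, so it executes on the executors and prints each whole record there. As a minimal sketch, assuming the stream elements are kafka-clients `ConsumerRecord[String, String]` objects (which is what the kafka010 `createDirectStream[String, String]` call is expected to yield), the key and value can be pulled out into plain pairs. `RecordValues`, `toPair` and `pairs` are hypothetical names introduced only for this illustration.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper object, not part of Spark or of the original post.
object RecordValues {
  // Extract the key and value from a single Kafka record.
  def toPair(record: ConsumerRecord[String, String]): (String, String) =
    (record.key, record.value)

  // Apply toPair to every record of the stream.
  def pairs(stream: DStream[ConsumerRecord[String, String]]): DStream[(String, String)] =
    stream.map(toPair)
}
```

With the `lines` stream from the code above, `RecordValues.pairs(lines).print()` would show the first few pairs of each batch on the driver, which is often easier to inspect than executor output.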

Some parts of the code above deserve attention (they were highlighted in red in the original post); I would not have known how to write them on my own. I eventually found them in the test code in the Spark source:

/src/git/spark/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala

Tested, and it works!
