
How to Resolve Dependency Problems When Compiling a Spark App with sbt


This article is about how to resolve dependency problems when compiling a Spark App with sbt. I think it is quite practical, so I am sharing it here; I hope you get something out of it.

Background introduction

A Spark App (written against the Spark APIs) has to be submitted to a Spark cluster to run. For code written in Scala, use sbt or Maven to package the following before submission:

source code

dependent jar packages

All of this is packaged into one large jar file, so that the code does not fail in the cluster for lack of dependencies (a common way to do this is sketched below).
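The original post does not name the packaging tool, but with sbt the usual way to build such a fat jar is the sbt-assembly plugin. A minimal sketch, assuming sbt-assembly (the plugin version is illustrative, from the sbt 0.13.x generation contemporary with Spark 1.6/2.0):

// project/plugins.sbt: register the assembly plugin (assumed setup)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Running sbt assembly then produces a single jar under target/ containing the project classes plus every dependency not marked "provided", which can be handed to spark-submit.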

Problem

We use Scala to write a Spark Streaming application that reads data from Kafka, processes it, and stores the results in a Cassandra cluster. This requires the package spark-streaming-kafka. With the previous version, Spark 1.6.0, the dependency configuration in sbt is as follows:

libraryDependencies ++= Seq(
  // Spark dependency
  "com.eaio.uuid" % "uuid" % "3.2",
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M2",
  // Third-party libraries
  "com.github.scopt" %% "scopt" % "3.4.0"
)

After upgrading to Spark 2.0.0, the package versions need to be updated, so the dependency part of the sbt build configuration changes to:

libraryDependencies ++= Seq(
  // Spark dependency
  "com.eaio.uuid" % "uuid" % "3.2",
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "2.0.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-M2",
  // Third-party libraries
  "com.github.scopt" %% "scopt" % "3.4.0"
)

I thought rebuilding after this modification would be no problem, but I was naive: the build failed with the following error:

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.spark#spark-streaming-kafka_2.10;2.0.0: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn]  Note: Unresolved dependencies path:
[warn]      org.apache.spark:spark-streaming-kafka_2.10:2.0.0 (/home/linker/workspace/linkerwp/linkerStreaming/build.sbt#L12-23)
[warn]        +- Linker Networks Inc.:linker-streaming_2.10:0.0.1
sbt.ResolveException: unresolved dependency: org.apache.spark#spark-streaming-kafka_2.10;2.0.0: not found

The message shows that the package cannot be found, so the next step is to check the public Maven repositories. Common ones are:

search.maven.org: used to search for your dependency packages.

https://mvnrepository.com/: a Maven repository search site.

Open the site and search for spark-streaming-kafka. The first several results do not support Spark 2.0.0, which at first led me to believe that Spark 2.0.0 simply does not support Kafka; that idea was misdirection from the search results. In fact, starting with Spark 2.0.0 the Kafka artifact name carries the supported Kafka version number: spark-streaming-kafka became spark-streaming-kafka-0-8. (This artifact actually did appear in the Maven search results; I just had not looked far enough down the list.)
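Concretely, the fix is a one-line change to build.sbt, swapping the old artifact name for the versioned one. A sketch, keeping the %% style used in the rest of the file:

// Before: this artifact is not published for Spark 2.0.0
// "org.apache.spark" %% "spark-streaming-kafka" % "2.0.0",
// After: the artifact name now encodes the Kafka API version (0.8.x)
"org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0",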

I used to be particularly resistant to compiling Java/Scala, because the configuration files of Maven and sbt are cumbersome and nowhere near as concise as Python's: a 20-line dependency file in Python becomes at least 200 lines in Maven/sbt, and a single mistake anywhere prevents a correct build.

What I have now learned is that, to compile correctly (assuming the source code itself is fine), you need to specify the dependency packages with the correct names, versions, and format. This means searching a Maven repository, confirming the coordinates, and then adding them to the configuration file (see the consolidated sketch below).
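Putting it all together, a dependency section that should resolve under Spark 2.0.0 looks roughly like this (a sketch; the non-Spark library versions are carried over unchanged from the original file):

libraryDependencies ++= Seq(
  // Spark dependencies, provided by the cluster at runtime
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  // Kafka integration: renamed artifact for Spark 2.0.0
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-M2",
  // Third-party libraries
  "com.eaio.uuid" % "uuid" % "3.2",
  "com.github.scopt" %% "scopt" % "3.4.0"
)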

Learn to think divergently. When you see an sbt compilation failure, start from the error message and form a hypothesis about the cause, here "the dependency package's version or name is incorrect", and then track down and specify the correct coordinates.

The above is how to resolve dependency problems when compiling a Spark App with sbt. These are points you may well run into in daily work; I hope you can learn something from this article.
