Here is an example of a word count program written in Scala:
① Create a Maven project:
② Enter the Maven GAV (groupId, artifactId, version):
③ Fill in the project name:
④ After the Maven project has been created, click Enable Auto-Import.
⑤ Configure the pom.xml file:
<properties>
    <project.build.sourceEncoding>UTF8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.3.1</spark.version>
    <hadoop.version>2.7.6</hadoop.version>
    <scala.compat.version>2.11</scala.compat.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-graphx_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
Write the code:
package com.zy.scala

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // get the cluster entry point
    val conf: SparkConf = new SparkConf()
    conf.setAppName("WordCount")
    val sc = new SparkContext(conf)
    // read the file from HDFS
    val lineRDD: RDD[String] = sc.textFile("hdfs://zzy/data/input/words.txt")
    // process the data
    val wordRDD: RDD[String] = lineRDD.flatMap(line => line.split("\\s+"))
    val wordAndCountRDD: RDD[(String, Int)] = wordRDD.map(word => (word, 1))
    // write the result to HDFS
    wordAndCountRDD.reduceByKey(_ + _).saveAsTextFile("hdfs://zzy/data/output")
    // close the entry point
    sc.stop()
  }
}
Package the jar:
Add the appropriate plugins to pom.xml:
<build>
    <plugins>
        <plugin><artifactId>maven-clean-plugin</artifactId><version>3.1.0</version></plugin>
        <plugin><artifactId>maven-resources-plugin</artifactId><version>3.0.2</version></plugin>
        <plugin><artifactId>maven-compiler-plugin</artifactId><version>3.8.0</version></plugin>
        <plugin><artifactId>maven-surefire-plugin</artifactId><version>2.22.1</version></plugin>
        <plugin><artifactId>maven-jar-plugin</artifactId><version>3.0.2</version></plugin>
        <plugin><artifactId>maven-install-plugin</artifactId><version>2.5.2</version></plugin>
        <plugin><artifactId>maven-deploy-plugin</artifactId><version>2.8.2</version></plugin>
        <plugin><artifactId>maven-site-plugin</artifactId><version>3.7.1</version></plugin>
        <plugin><artifactId>maven-project-info-reports-plugin</artifactId><version>3.0.0</version></plugin>
    </plugins>
</build>
Then build the jar (for example, by running the Maven package phase).
Upload the jar to the cluster and run it:
spark-submit \
--class com.zy.scala.WordCount \
--master yarn \
--deploy-mode cluster \
--driver-memory 200m \
--executor-memory 200m \
--total-executor-cores 1 \
hdfs://zzy/data/jar/ScalaTest-1.0-SNAPSHOT.jar
At this point, you can check the progress of the program in the YARN web UI.
However, the program kept ending abnormally:
So I used
yarn logs -applicationId application_1522668922644_40211
to look at the error message.
Result: class not found: scala.WordCount.
Then I wondered whether something was wrong with the jar itself, so I opened it up. In the jar I had uploaded, I could not find my compiled classes at all; there was only a META-INF directory. I couldn't make sense of it at the time, and no matter how many times I tried again I still couldn't solve it. Later, coming back from dinner, it suddenly occurred to me that Maven might not be packaging the Scala sources into the jar at all. A quick search (via Baidu) confirmed it:
By default, Maven compiles only Java sources, not Scala sources; however, there is a Maven plugin that can compile Scala.
A helpful blogger covers this in "Scala jar packaging issues in IDEA": https://blog.csdn.net/freecrystal_alex/article/details/78296851
Then I modified the pom.xml file:
http://down.51cto.com/data/2457588
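The core of the change is registering a Scala compiler plugin in the build section of pom.xml. A minimal sketch using the scala-maven-plugin looks roughly like the following; the version number and execution goals here are only an example, so check the linked post for the exact configuration:

<!-- Example only: compile Scala sources with the scala-maven-plugin so they end up in the jar. -->
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <!-- run scalac during the normal compile and test-compile phases -->
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

With a plugin like this bound to the compile phase, the Maven package phase compiles the sources under src/main/scala, and the WordCount class actually ends up in the jar.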
Following the steps above, I resubmitted the task to the cluster; unfortunately, it went wrong again:
But this time the error was different from the last one (which showed that the previous problem had been solved):
It turned out that the memory allocated to the Driver process was too small; it needs at least around 450 MB. So I changed the flags to --driver-memory 512m --executor-memory 512m and resubmitted the task. This time it ran successfully!
Note:
The job is submitted through YARN here, not Spark's built-in standalone scheduler, so you need to add the parameters:
--master yarn
--deploy-mode cluster
Here --deploy-mode is set to cluster (cluster mode); client would be client mode.
The difference between the two is that in client mode the Driver starts on the node where the job is submitted, while in cluster mode the Driver is launched on a node inside the cluster.