This article explains how to compile Spark from source. It is quite practical, so it is shared here for everyone's reference. Let's take a look.
1. Compilation environment
CentOS 6.6, JDK 1.7.0_80, Maven 3.2.5
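Before compiling, it is worth double-checking that the installed versions match (a quick sanity check, assuming java and mvn are already on the PATH):
[yyl@vmnode ~]$ java -version
[yyl@vmnode ~]$ mvn -version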
2. Download the Spark source code and extract it
[yyl@vmnode ~]$ pwd
/home/yyl
[yyl@vmnode make]$ pwd
/home/yyl/make
[yyl@vmnode make]$ wget http://mirrors.cnnic.cn/apache/spark/spark-1.5.0/spark-1.5.0.tgz
[yyl@vmnode make]$ tar -zxf spark-1.5.0.tgz
3. Compilation
There is a pom.xml file in the root directory of the extracted source package; this file defines how Spark is built with Maven, and you can inspect the build profiles it provides as shown below.
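To list the profiles defined in pom.xml before choosing build options (a quick sketch; help:all-profiles comes from Maven's standard maven-help-plugin):
[yyl@vmnode spark-1.5.0]$ mvn help:all-profiles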
OK, now start compiling:
[yyl@vmnode spark-1.5.0]$ pwd
/home/yyl/make/spark-1.5.0
[yyl@vmnode spark-1.5.0]$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
[yyl@vmnode spark-1.5.0]$ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Error reported during compilation:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.4:enforce (enforce-versions) on project spark-parent_2.10: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
There are two solutions to this error: one is to add the -Denforcer.skip=true parameter at compile time (see the command below); the other is to change the property values defined in the pom.xml file to the Maven and Java versions actually installed in the environment.
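For the first solution, the flag is simply appended to the original build command (a sketch based on the command above):
[yyl@vmnode spark-1.5.0]$ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Denforcer.skip=true -DskipTests clean package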
For the second solution, edit the properties in pom.xml:
[yyl@vmnode spark-1.5.0]$ vim pom.xml
<java.version>1.7</java.version>
<maven.version>3.2.5</maven.version>
After resolving the above error, recompile. The build fails again:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.619 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 11.669 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 11.537 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.245 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 17.217 s]
[INFO] Spark Project Core ................................. SUCCESS [04:15 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 22.739 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:09 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:04 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:43 min]
[INFO] Spark Project SQL .................................. SKIPPED
......
---------------------------------------------------
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in WebUI.class refers to term servlet
in value org.jetty which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling WebUI.class.
    at scala.reflect.internal.pickling.UnPickler$Scan.toTypeError(UnPickler.scala:847)
    at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef.complete(UnPickler.scala:854)
    at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef.load(UnPickler.scala:863)
    at scala.reflect.internal.Symbols$Symbol.typeParams(Symbols.scala:1489)
......
The official documentation says: "Building Spark using Maven requires Maven 3.3 or newer and Java 7+. The Spark build can supply a suitable Maven binary; see below." So upgrade Maven to 3.3.3 and compile again: this time it compiles successfully!
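Alternatively, instead of upgrading the system Maven, you can let the build supply a suitable Maven itself via the bundled build/mvn wrapper (a sketch of the same build command):
[yyl@vmnode spark-1.5.0]$ ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package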
If you want to compile a Spark that is compatible with Scala 2.11.x (Scala 2.10.x is the default), use the following commands:
[yyl@vmnode spark-1.5.0]$ ./dev/change-scala-version.sh 2.11
[yyl@vmnode spark-1.5.0]$ mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
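The change-scala-version.sh script rewrites the Scala version across the pom.xml files, so you can switch back to the default at any time (a usage note, same script):
[yyl@vmnode spark-1.5.0]$ ./dev/change-scala-version.sh 2.10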
To compile Spark with Hive and JDBC support:
[yyl@vmnode spark-1.5.0]$ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
4. Generate the deployment package
There is a make-distribution.sh script in the root directory of the source package; it packages a binary distribution of Spark. Under the hood, make-distribution.sh does the compilation by calling Maven. Run it as follows:
[yyl@vmnode spark-1.5.0]$ ./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver
Syntax of make-distribution.sh: ./make-distribution.sh [--name NAME] [--tgz] [--mvn <mvn>] [--with-tachyon]
--tgz: generates spark-$VERSION-bin.tgz in the root directory; without this parameter no tgz file is generated, only the dist/ directory.
--name NAME: combined with --tgz, generates a spark-$VERSION-bin-$NAME.tgz deployment package; without this parameter, NAME defaults to the Hadoop version number (see the example after this list).
--with-tachyon: whether the in-memory file system Tachyon is supported; without this parameter, Tachyon is not supported.
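For example, to build a deployment package tagged with a custom name (the name custom-2.4.0 here is just an illustration; the remaining flags match the build above):
[yyl@vmnode spark-1.5.0]$ ./make-distribution.sh --name custom-2.4.0 --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver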
PS: how to specify the Hadoop version at compile time
For example, if Spark needs to read files on Hadoop 2.5.2, how should it be compiled with Maven? The answer: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.2 -Phive -Phive-thriftserver -DskipTests clean package
Please see the official website for details.
In addition, if you run into a new Hadoop version that the shipped build files do not yet support, the only option is to modify the pom.xml file and add a profile for the new version, such as:
<profile>
  <id>hadoop-2.7</id>
  <properties>
    <hadoop.version>2.7.1</hadoop.version>
    <jets3t.version>0.9.3</jets3t.version>
    <zookeeper.version>3.4.6</zookeeper.version>
    <curator.version>2.6.0</curator.version>
  </properties>
</profile>
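With that profile in place, the new version can be selected like any built-in profile (a sketch, assuming the hadoop-2.7 profile added above):
[yyl@vmnode spark-1.5.0]$ mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.1 -DskipTests clean package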
Thank you for reading! That is all for "how to compile Spark". I hope the above content is of some help; if you found the article useful, feel free to share it so more people can see it.