2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report --
Doesn't 51CTO have a table-of-contents feature? That's painful.
====================
If you have any questions, feel free to add me on QQ (the "Penguin") to discuss ^-^ 1176738641
====================
Pre-preparation

Folder creation
# create five folders under the user's home directory
app       # stores installed applications
software  # stores application packages
data      # stores test data
lib       # stores jar packages
source    # stores source code

Software and versions needed for the build:
apache-maven-3.6.1-bin.tar.gz
hadoop-2.6.0-cdh6.14.0.tar.gz
jdk-8u131-linux-x64.tar.gz
scala-2.11.8.tgz

Install JDK 8
Uninstall any existing JDK:
rpm -qa | grep java
# if the installed version is lower than 1.7, uninstall it
rpm -e package1 package2

Extract the JDK to ~/app:
tar -zxf jdk-8u131-linux-x64.tar.gz -C ~/app/

Test JDK 8 in the ~/app directory:
~/app/jdk1.8.0_131/bin/java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
The version information is printed normally, which means that the installation is successful.
Configure environment variables
Remember: >> appends, while a single > overwrites! Be careful not to type > when you mean >>.
echo "#JAVA_HOME####" >> ~/.bash_profile
echo "export JAVA_HOME=/home/max/app/jdk1.8.0_131" >> ~/.bash_profile
# single quotes keep $JAVA_HOME from expanding before .bash_profile is sourced
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bash_profile
# refresh the environment variables
source ~/.bash_profile
At this point, java -version works in any directory.
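The >>-versus-> warning above is easy to verify for yourself. This throwaway sketch (my addition, not part of the original tutorial) demonstrates both behaviors on a temporary file instead of ~/.bash_profile:

```shell
# Demonstrates the earlier warning: ">>" appends, a single ">" truncates.
# Uses a temporary file so nothing real (like ~/.bash_profile) is touched.
f=$(mktemp)
echo "first line"  >  "$f"    # ">" overwrites: the file now holds one line
echo "second line" >> "$f"    # ">>" appends: the file now holds two lines
echo "oops"        >  "$f"    # ">" again: both previous lines are gone
cat "$f"                      # prints only: oops
rm -f "$f"
```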
Install Maven
Extract to ~/app:
tar -zxvf apache-maven-3.6.1-bin.tar.gz -C ~/app

Test that Maven was installed successfully:
~/app/apache-maven-3.6.1/bin/mvn -v
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-04T15:00:29-04:00)
Maven home: /home/max/app/apache-maven-3.6.1
Java version: 1.8.0_131, vendor: Oracle Corporation, runtime: /home/max/app/jdk1.8.0_131/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"
The version information is displayed, indicating that the installation was successful
Add environment variables
Remember: >> appends, while a single > overwrites! Be careful not to type > when you mean >>.
echo "#MAVEN_HOME####" >> ~/.bash_profile
echo "export MAVEN_HOME=/home/max/app/apache-maven-3.6.1/" >> ~/.bash_profile
# single quotes keep $MAVEN_HOME from expanding before .bash_profile is sourced
echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bash_profile
# refresh the environment variables
source ~/.bash_profile
At this point, mvn -v works in any directory.
Configure the local repository directory and the remote repository address
# create the local repository folder
mkdir ~/maven_repo
# modify the settings.xml file
vim $MAVEN_HOME/conf/settings.xml
Pay attention to the tags! Make sure they don't conflict with tags already present in the file.
<localRepository>/home/max/maven_repo</localRepository>

<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>*,!cloudera</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>

Install Scala
Extract to ~/app:
tar -zxf scala-2.11.8.tgz -C ~/app/

Test that Scala was installed successfully:
[max@hadoop000 scala-2.11.8]$ ~/app/scala-2.11.8/bin/scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
scala>
Installation succeeded
Add the environment variables:
echo "#SCALA_HOME####" >> ~/.bash_profile
echo "export SCALA_HOME=/home/max/app/scala-2.11.8" >> ~/.bash_profile
echo 'export PATH=$SCALA_HOME/bin:$PATH' >> ~/.bash_profile
# refresh the environment variables
source ~/.bash_profile
At this point, scala works in any directory.
Install Git
This assumes the default CentOS user.
sudo yum install git
# the installation is automatic; you will need to press y several times
# when it finishes, output like the following is displayed
Installed:
  git.x86_64 0:1.7.1-9.el6_9
Dependency Installed:
  perl-Error.noarch 1:0.17015-4.el6   perl-Git.noarch 0:1.7.1-9.el6_9
Dependency Updated:
  openssl.x86_64 0:1.0.1e-57.el6
Complete!
# installed successfully
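Before moving on, it can be worth confirming that every tool installed so far is actually reachable. This small sanity-check script is my own addition (not from the original article):

```shell
#!/bin/sh
# Sanity check (an addition, not from the original article): report whether
# each command required by the build is reachable on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: OK"
    else
        echo "$1: MISSING"
    fi
}
for c in java mvn scala git; do
    check_cmd "$c"
done
```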
The preliminary work is finally over!
========== the tired-as-a-dog dividing line ==========
In fact, the long road to compilation has only just begun.
Download and compile the Spark source code
Time to bring out the big gun ==> refer to the official website.
Download & extract the Spark 2.4.2 source code
cd ~/source
wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
# the download is sometimes slow
[max@hadoop000 source]$ ll
total 15788
-rw-rw-r--. 1 max max 16165557 Apr 28 12:27 spark-2.4.2.tgz
[max@hadoop000 source]$ tar -zxf spark-2.4.2.tgz
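Apache publishes checksum files alongside each release on archive.apache.org, so it is worth verifying the download before extracting it. The sketch below (my addition) illustrates the mechanism on a throwaway file; the file contents are placeholders, not the real release artifact:

```shell
# Illustration of checksum verification on a throwaway file. For the real
# release you would check against the .sha512 file published next to the
# download on archive.apache.org.
f=$(mktemp)
echo "pretend this is spark-2.4.2.tgz" > "$f"
sha512sum "$f" > "$f.sha512"    # record the checksum
sha512sum -c "$f.sha512"        # verify; prints "<file>: OK" on success
rm -f "$f" "$f.sha512"
```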
Instead of running the mvn command directly, we use the make-distribution.sh script, but it needs a few modifications first.
vim ./dev/make-distribution.sh

# under the spark-2.4.2 folder, comment out these lines to reduce compilation time
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | fgrep --count "hive";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing
#    # because we use "set -o pipefail"
#    echo -n)

# then set the values directly; the version numbers should match the pom.xml of your production environment
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=hadoop-2.6.0-cdh6.14.0
SPARK_HIVE=1
Maven's default repositories only provide the Apache releases of Hadoop, but since our Hadoop version is hadoop-2.6.0-cdh6.14.0, we need to add the CDH repository to the pom file.
# under the spark-2.4.2 folder
vim pom.xml

<repository>
  <id>central</id>
  <url>http://maven.aliyun.com/nexus/content/groups/public//</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
    <updatePolicy>always</updatePolicy>
    <checksumPolicy>fail</checksumPolicy>
  </snapshots>
</repository>
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

Start the compilation under the spark-2.4.2 folder:
./dev/make-distribution.sh \
  --name hadoop-2.6.0-cdh6.14.0 \
  --tgz \
  -Phadoop-2.6 \
  -Dhadoop.version=2.6.0-cdh6.14.0 \
  -Phive -Phive-thriftserver \
  -Pyarn \
  -Pkubernetes
The first compilation takes about an hour (I am using the Aliyun mirror).
Compiling again afterwards takes roughly 10 minutes.
Note: if an error is reported, you must learn to read the error log!
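One practical way to "read the log" without scrolling through thousands of lines is to grep for the first [ERROR] marker. The log contents below are synthetic, created only to show the technique:

```shell
# Jump straight to the first [ERROR] line of a long Maven log.
# The log here is synthetic, created only to demonstrate the grep.
log=$(mktemp)
printf '%s\n' \
    "[INFO] Compiling module spark-core" \
    "[ERROR] Failed to execute goal on project spark-core" \
    "[INFO] BUILD FAILURE" > "$log"
grep -n -m1 '\[ERROR\]' "$log"   # prints: 2:[ERROR] Failed to execute goal on project spark-core
rm -f "$log"
```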
# compilation completed
# log
+ mkdir /home/max/source/spark-2.4.2/dist/conf
+ cp /home/max/source/spark-2.4.2/conf/docker.properties.template /home/max/source/spark-2.4.2/conf/fairscheduler.xml.template /home/max/source/spark-2.4.2/conf/log4j.properties.template /home/max/source/spark-2.4.2/conf/metrics.properties.template /home/max/source/spark-2.4.2/conf/slaves.template /home/max/source/spark-2.4.2/conf/spark-defaults.conf.template /home/max/source/spark-2.4.2/conf/spark-env.sh.template /home/max/source/spark-2.4.2/dist/conf
+ cp /home/max/source/spark-2.4.2/README.md /home/max/source/spark-2.4.2/dist
+ cp -r /home/max/source/spark-2.4.2/bin /home/max/source/spark-2.4.2/dist
+ cp -r /home/max/source/spark-2.4.2/python /home/max/source/spark-2.4.2/dist
+ '[' false == true ']'
+ cp -r /home/max/source/spark-2.4.2/sbin /home/max/source/spark-2.4.2/dist
+ '[' -d /home/max/source/spark-2.4.2/R/lib/SparkR ']'
+ '[' true == true ']'
+ TARDIR_NAME=spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ TARDIR=/home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ rm -rf /home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ cp -r /home/max/source/spark-2.4.2/dist /home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ tar czf spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0.tgz -C /home/max/source/spark-2.4.2 spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ rm -rf /home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
From this we can see that the compiled package is placed inside the Spark source directory.
Extract it to ~/app:
[max@hadoop000 spark-2.4.2]$ tar -zxf spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0.tgz -C ~/app/
[max@hadoop000 spark-2.4.2]$ cd ~/app/
[max@hadoop000 app]$ ll
total 16
drwxrwxr-x.  6 max max 4096 Apr 28 17:02 apache-maven-3.6.1
drwxr-xr-x.  8 max max 4096 Mar 15  2017 jdk1.8.0_131
drwxrwxr-x.  6 max max 4096 Mar  4  2016 scala-2.11.8
drwxrwxr-x. 11 max max 4096 Apr 28 21:20 spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
We're done!
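A natural follow-up, which the article itself does not cover, is registering the freshly built Spark in .bash_profile the same way as the tools above. This sketch is an assumption on my part; it writes to a temporary file by default so it is safe to try, and you would point PROFILE at ~/.bash_profile to apply it for real:

```shell
# Sketch (my addition): register SPARK_HOME like JAVA_HOME/MAVEN_HOME/SCALA_HOME.
# PROFILE defaults to a temp file so running this does not touch ~/.bash_profile.
PROFILE="${PROFILE:-$(mktemp)}"
echo "#SPARK_HOME####" >> "$PROFILE"
echo "export SPARK_HOME=$HOME/app/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0" >> "$PROFILE"
echo 'export PATH=$SPARK_HOME/bin:$PATH' >> "$PROFILE"
cat "$PROFILE"
```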