
Spark on Yarn installation configuration

2025-04-02 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

1. Description

This article builds on the deployment described in xxx and requires hadoop's configuration and dependencies to already be in place. In Spark on Yarn mode, Spark is installed and configured identically on all nodes of the Yarn cluster; no Spark services are started, so there is no master/slave distinction. Spark submits tasks to Yarn, and the ResourceManager handles task scheduling.

2. Installation

yum -y install spark-core spark-netlib spark-python

3. Configuration

vim /etc/spark/conf/spark-defaults.conf

spark.eventLog.enabled false
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled
spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M
spark.master yarn    # specify the run mode of Spark

PS: As for spark-env.sh, because my hadoop cluster was installed via yum, the default configuration is enough for Spark to find hadoop's configuration and dependencies. If your hadoop cluster was installed from a binary package, you need to adjust the corresponding paths.
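The spark-defaults.conf entries above are plain whitespace-separated key/value lines. As an illustration only (this helper is not part of Spark, and treating inline `#` comments as ignorable is an assumption of this sketch), a minimal parser for that format:

```python
def parse_spark_defaults(text):
    """Parse the spark-defaults.conf format: one property per line,
    key and value separated by whitespace; '#' starts a comment."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf

sample = """
spark.eventLog.enabled false
spark.master yarn   # run mode
"""
print(parse_spark_defaults(sample))
# {'spark.eventLog.enabled': 'false', 'spark.master': 'yarn'}
```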

4. Test

A. Test via spark-shell

[root@ip-10-10-103-144 conf]# cat test.txt
11
22
33
44
55
[root@ip-10-10-103-144 conf]# hadoop fs -put test.txt /tmp/
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
[root@ip-10-10-103-246 conf]# spark-shell
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to Spark version 1.6.0
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1494472050574_0009).
SQL context available as sqlContext.

scala> val file = sc.textFile("hdfs://mycluster:8020/tmp/test.txt")
file: org.apache.spark.rdd.RDD[String] = hdfs://mycluster:8020/tmp/test.txt MapPartitionsRDD[1] at textFile at <console>:27

scala> val count = file.flatMap(line => line.split(" ")).map(test => (test, 1)).reduceByKey(_ + _)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:29

scala> count.collect()
res0: Array[(String, Int)] = Array((33,1), (55,1), (22,1), (44,1), (11,1))

scala>
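For readers unfamiliar with the RDD chain, here is a pure-Python sketch of what flatMap / map / reduceByKey compute over the test file (plain Python only, no Spark required):

```python
from collections import defaultdict

# The lines of test.txt as read from HDFS
lines = ["11", "22", "33", "44", "55"]

words = [w for line in lines for w in line.split(" ")]  # flatMap(line => line.split(" "))
pairs = [(w, 1) for w in words]                         # map(test => (test, 1))

counts = defaultdict(int)
for word, n in pairs:                                   # reduceByKey(_ + _)
    counts[word] += n

print(dict(counts))  # {'11': 1, '22': 1, '33': 1, '44': 1, '55': 1}
```

Each word appears once, matching the `Array((33,1), (55,1), ...)` result in the shell output.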

B. Test via run-example

[root@ip-10-10-103-246 conf]# /usr/lib/spark/bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
Pi is roughly 3.1432557162785812
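SparkPi estimates pi by Monte Carlo sampling: draw random points in the unit square and count the fraction that fall inside the quarter circle. A standalone Python sketch of the same method (the function name and seed are illustrative, not from SparkPi itself):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo pi estimate: the fraction of random points in the
    unit square with x^2 + y^2 <= 1 approximates pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # close to 3.14, like the run-example output
```

The estimate tightens as the sample count grows, which is why SparkPi distributes the sampling across the cluster.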

5. Problems encountered

Running the word-count in spark-shell produced the following error:

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
17/05/11 21:06:28 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
        at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        ...
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)

Solution:

Add the following line to spark-env.sh:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native/

This simply lets Spark find the native lzo library.
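The fix works by appending the native-library directory to the colon-separated LD_LIBRARY_PATH, which is where the JVM looks for libgplcompression. A small Python sketch of that append-if-missing logic (a hypothetical helper, not part of Spark or Hadoop):

```python
def add_native_lib_dir(env, directory):
    """Append a directory to LD_LIBRARY_PATH in an environment dict,
    skipping it if already present (mirrors the spark-env.sh fix)."""
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if directory not in parts:
        parts.append(directory)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env

env = {"LD_LIBRARY_PATH": "/usr/lib64"}
add_native_lib_dir(env, "/usr/lib/hadoop/lib/native")
print(env["LD_LIBRARY_PATH"])  # /usr/lib64:/usr/lib/hadoop/lib/native
```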
