
How to compile hive on spark


This article explains how to compile Hive on Spark. It is shared here as a practical reference; the details follow below.

Background

Hive on Spark is Hive running on Spark: it uses Spark as the execution engine instead of MapReduce, just like Hive on Tez uses Tez.

Since Hive version 1.1, Hive on Spark has been part of the Hive codebase and lives on the spark branch.

Source code download

git clone https://github.com/apache/hive.git hive_on_spark

Compile

cd hive_on_spark/
git branch -r
  origin/HEAD -> origin/master
  origin/HIVE-4115
  origin/HIVE-8065
  origin/beeline-cli
  origin/branch-0.10
  origin/branch-0.11
  origin/branch-0.12
  origin/branch-0.13
  origin/branch-0.14
  origin/branch-0.2
  origin/branch-0.3
  origin/branch-0.4
  origin/branch-0.5
  origin/branch-0.6
  origin/branch-0.7
  origin/branch-0.8
  origin/branch-0.8-r2
  origin/branch-0.9
  origin/branch-1
  origin/branch-1.0
  origin/branch-1.0.1
  origin/branch-1.1
  origin/branch-1.1.1
  origin/branch-1.2
  origin/cbo
  origin/hbase-metastore
  origin/llap
  origin/master
  origin/maven
  origin/next
  origin/parquet
  origin/ptf-windowing
  origin/release-1.1
  origin/spark
  origin/spark-new
  origin/spark2
  origin/tez
  origin/vectorization
git checkout origin/spark
git branch
* (detached from origin/spark)
  master
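Note that checking out origin/spark directly, as above, leaves the working tree in a detached-HEAD state. A minimal alternative sketch (the local branch name spark is just an illustrative choice) creates a tracking branch instead:

git checkout -b spark origin/spark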

Modify $HIVE_ON_SPARK/pom.xml

Change the Spark version (the spark.version property in pom.xml) to 1.4.1:

1.4.1

Change the Hadoop version to 2.3.0-cdh5.1.0:

2.3.0-cdh5.1.0

Compile command

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn clean package -Phadoop-2 -DskipTests
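If you also want a ready-to-install binary tarball rather than just the compiled modules, Hive's build has a dist profile; a sketch under that assumption (the exact tarball name depends on the Hive version being built):

mvn clean package -Phadoop-2,dist -DskipTests
# the binary tarball should then appear under packaging/target/, e.g. apache-hive-<version>-bin.tar.gz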

Methods of adding the Spark dependency to Hive

Spark home: /home/cluster/apps/spark/spark-1.4.1

Hive home: /home/cluster/apps/hive_on_spark

1. Set the property 'spark.home' to point to the Spark installation:

hive> set spark.home=/home/cluster/apps/spark/spark-1.4.1;

2. Define the SPARK_HOME environment variable before starting Hive CLI/HiveServer2:

export SPARK_HOME=/home/cluster/apps/spark/spark-1.4.1

3. Set the spark-assembly jar on the Hive auxpath:

hive --auxpath /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-*.jar

4. Add the spark-assembly jar for the current user session:

hive> add jar /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-*.jar;

5. Link the spark-assembly jar into $HIVE_HOME/lib (see the sketch below).
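A minimal sketch of option 5, assuming the assembly jar name produced by the Spark 1.4.1 build for Hadoop 2.3 (check the actual file name under $SPARK_HOME/lib before linking):

ln -s /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-1.4.1-hadoop2.3.0.jar $HIVE_HOME/lib/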

A possible error when starting Hive:

[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    at jline.TerminalFactory.create(TerminalFactory.java:101)
    at jline.TerminalFactory.get(TerminalFactory.java:158)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

Solution: export HADOOP_USER_CLASSPATH_FIRST=true

For error resolution in other scenarios, see: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

You also need to set the spark.eventLog.dir parameter, for example:

set spark.eventLog.dir=hdfs://master:8020/directory;

Otherwise queries will keep failing with an error saying that a directory such as /tmp/spark-events does not exist.
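Whatever directory you point spark.eventLog.dir at must already exist; a sketch of creating the HDFS path used above up front:

hdfs dfs -mkdir -p hdfs://master:8020/directory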

After starting Hive, set the execution engine to Spark:

hive> set hive.execution.engine=spark;

Then set the Spark run mode, for example standalone mode:

hive> set spark.master=spark://master:7077;

Or YARN mode: spark.master=yarn.
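Equivalently, both properties can be passed per invocation instead of per session; a sketch using the standalone master URL from above (adjust the URL, or use the YARN value, for your cluster):

hive --hiveconf hive.execution.engine=spark --hiveconf spark.master=spark://master:7077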

Configure Spark application configs for Hive

These can be configured in spark-defaults.conf or in hive-site.xml:

spark.master=<Spark Master URL>
spark.eventLog.enabled=true
spark.executor.memory=512m
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.memory=...             # Amount of memory to use per executor process.
spark.executor.cores=...              # Number of cores per executor.
spark.yarn.executor.memoryOverhead=...
spark.executor.instances=...          # The number of executors assigned to each application.
spark.driver.memory=...               # The amount of memory assigned to the Remote Spark Context (RSC). We recommend 4GB.
spark.yarn.driver.memoryOverhead=...  # We recommend 400 (MB).
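As a concrete illustration, here is a sketch that appends such settings to spark-defaults.conf from the shell; the Spark home path is the one used earlier in this article, and the memory/core values are placeholders rather than tuning recommendations:

cat >> /home/cluster/apps/spark/spark-1.4.1/conf/spark-defaults.conf <<'EOF'
spark.master                        spark://master:7077
spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs://master:8020/directory
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.executor.memory               512m
spark.executor.cores                2
spark.executor.instances            4
spark.driver.memory                 4g
spark.yarn.executor.memoryOverhead  512
spark.yarn.driver.memoryOverhead    400
EOF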

For more information on parameter configuration, please see the documentation: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.

After executing a SQL statement, you can view job and stage information on the monitoring page. For example:

hive (default)> select city_id, count(*) c from city_info group by city_id order by c desc limit 5;
Query ID = spark_20150309173838_444cb5b1-b72e-4fc3-87db-4162e364cb1e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
state = SENT
state = STARTED
Query Hive on Spark job [0] stages: 0, 1, 2
Status: Running (Hive on Spark job [0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2015-03-09 17:38:...  Stage-0_0: 0(+1)/1  Stage-1_0: 0/1  Stage-2_0: 0/1
...
state = SUCCEEDED
2015-03-09 17:38:...  Stage-0_0: 1 Finished  Stage-1_0: 1 Finished  Stage-2_0: 1 Finished
Status: Finished successfully in 10.07 seconds
OK
city_id  c
1000     22826
-10      17294
-20      10608
-1       6186
...      4158
Time taken: 18.417 seconds, Fetched: 5 row(s)

Thank you for reading! This is the end of this article on "how to compile hive on spark". I hope the above content has been of some help to you and that you learned something from it. If you found the article useful, please share it so that more people can see it.
