This article shows how to execute Spark HQL (Hive queries through Spark SQL) from the spark-shell command line on a cluster managed by Cloudera Manager. The steps are meant to be easy to follow; I hope they help clear up any doubts about "how cloudera executes spark hql on the spark-shell command line".
Compiling a Spark assembly that supports Hive
The stock Spark assembly jar is built without Hive support. If you want to use Spark HQL, you must package the Hive-related dependencies into the assembly jar yourself. The packaging steps below assume Maven is already installed.
1. Set the following environment variable. If these JVM settings are too small, the build may fail with an OOM error, so increase them:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
2. Copy scalastyle-config.xml from the root of the Spark source tree into the assembly directory.
3. cd into the Spark source directory and run:
mvn -Pyarn -Dhadoop.version=2.5.0-cdh5.3.0 -Dscala-2.10.4 -Phive -Phive-thriftserver -DskipTests clean package
(If you build against the CDH version, mvn -Pyarn -Phive -DskipTests clean package is enough.)
Note: set hadoop.version and the Scala version to the versions that match your environment.
After a long build (mine took about two and a half hours) it finally succeeded. A spark-assembly-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar file appears in the assembly/target/scala-2.10 directory. Open it with an archive tool and check whether the Hive JDBC classes are included; if they are, the build succeeded.
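As a quick sanity check, a jar is just a zip archive, so you can also list its contents from the command line instead of opening it in an archive tool. A minimal sketch, assuming the jar name produced above and that unzip is available on the build machine:

cd assembly/target/scala-2.10
# list the jar entries and look for Hive JDBC classes; any matches mean Hive was built in
unzip -l spark-assembly-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar | grep -i "hive.*jdbc"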
The previous section showed how to compile a spark-assembly jar that contains Hive.
With the Spark installed by Cloudera Manager, if you run spark-shell directly and type the following statement at the prompt:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
you will find that it does not work, because the stock Spark installed by CM does not support Spark HQL. Some manual adjustments are needed:
Step 1: upload the compiled jar that contains Hive to the default Spark share lib directory configured on HDFS: /user/spark/share/lib. A sketch of the upload is shown below.
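For reference, a rough sketch of the upload, assuming the jar name from the compilation step and that it is renamed to spark-assembly-with-hive-maven.jar (the name used in the steps that follow); adjust the paths to your own cluster:

# create the share lib directory on HDFS if it does not exist yet
hadoop fs -mkdir -p /user/spark/share/lib
# upload the Hive-enabled assembly under the name used in the later steps
hadoop fs -put assembly/target/scala-2.10/spark-assembly-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar /user/spark/share/lib/spark-assembly-with-hive-maven.jar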
Step 2: on the node where you want to run the spark-shell script, go to the /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/lib/ directory and download the jar into it: hadoop fs -get hdfs://n1:8020/user/spark/share/lib/spark-assembly-with-hive-maven.jar (replace the path with your own). In this directory there is a symbolic link spark-assembly.jar pointing to spark-assembly-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar. Delete this link and recreate one with the same name pointing to the jar we just downloaded: ln -s spark-assembly-with-hive-maven.jar spark-assembly.jar. This jar is loaded onto the driver program's classpath when the spark-shell script starts, and the SparkContext is created in the driver, so the original spark-assembly.jar must be replaced with our compiled jar; that way, when spark-shell starts, the Hive-enabled spark-assembly is on the classpath. The full sequence of commands is sketched after this paragraph.
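Put together, the commands for this step look roughly like the following; this is a sketch assuming the CDH 5.3.0 parcel path and the NameNode address n1:8020 used above:

cd /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/lib/
# pull the Hive-enabled assembly down from HDFS
hadoop fs -get hdfs://n1:8020/user/spark/share/lib/spark-assembly-with-hive-maven.jar
# replace the old symlink so spark-shell picks up the new assembly
rm spark-assembly.jar
ln -s spark-assembly-with-hive-maven.jar spark-assembly.jar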
Step 3: create a hive-site.xml under the /opt/cloudera/parcels/CDH/lib/spark/conf/ directory. This directory is Spark's default configuration directory (its location can be changed). The content of hive-site.xml is as follows:
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://n1:9083</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
As you probably know, Spark needs to be told where the Hive metastore is; that is what the configuration above does.
Step 4: modify /opt/cloudera/parcels/CDH/lib/spark/conf/spark-defaults.conf and add one property: spark.yarn.jar=hdfs://n1:8020/user/spark/share/lib/spark-assembly-with-hive-maven.jar. This lets each executor download the jar and load it onto its own classpath; it matters mainly for yarn-cluster mode. In local mode it does not matter, because the driver and the executor run in the same process.
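A one-line sketch of appending that property, assuming you edit the file in place on the node and that the conf directory is the CDH default used above:

# append the spark.yarn.jar property to spark-defaults.conf
echo "spark.yarn.jar=hdfs://n1:8020/user/spark/share/lib/spark-assembly-with-hive-maven.jar" >> /opt/cloudera/parcels/CDH/lib/spark/conf/spark-defaults.conf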
With all of the above done, start spark-shell and type:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
This should now work. Execute one more statement to verify that it is connected to the Hive metastore we configured:
hiveContext.sql("show tables").take(10) // look at the first ten tables
Finally, I want to emphasize steps 2, 3, and 4. In yarn-cluster mode you must replace spark-assembly.jar on every node in the cluster, add hive-site.xml to the Spark conf directory on every node, and add spark.yarn.jar=hdfs://n1:8020/user/spark/share/lib/spark-assembly-with-hive-maven.jar to spark-defaults.conf on every node. You can write a shell script to do the replacement (a sketch follows below); otherwise it is tiring to fix one node at a time.
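A minimal sketch of such a script, assuming passwordless SSH/SCP from the node where the files were already prepared, a hypothetical nodes.txt listing the other cluster hostnames, and the CDH 5.3.0 parcel path used above:

#!/bin/bash
# distribute the Hive-enabled assembly, hive-site.xml and the spark-defaults.conf change to every node
SPARK_LIB=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/lib
SPARK_CONF=/opt/cloudera/parcels/CDH/lib/spark/conf
JAR=spark-assembly-with-hive-maven.jar

while read node; do
  # copy the assembly jar and the hive-site.xml to the remote node
  scp "$SPARK_LIB/$JAR" "$node:$SPARK_LIB/"
  scp "$SPARK_CONF/hive-site.xml" "$node:$SPARK_CONF/"
  # recreate the spark-assembly.jar symlink so it points at the Hive-enabled jar
  ssh "$node" "cd $SPARK_LIB && rm -f spark-assembly.jar && ln -s $JAR spark-assembly.jar"
  # add the spark.yarn.jar property if it is not already present
  ssh "$node" "grep -q '^spark.yarn.jar=' $SPARK_CONF/spark-defaults.conf || echo 'spark.yarn.jar=hdfs://n1:8020/user/spark/share/lib/$JAR' >> $SPARK_CONF/spark-defaults.conf"
done < nodes.txt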
That is all of "how cloudera executes spark hql on the spark-shell command line". Thank you for reading, and I hope the content was helpful.