This article explains how to use Java code to submit Spark Hive SQL tasks. The walkthrough is quite detailed and should serve as a useful reference; interested readers are encouraged to read to the end.
My environment: Hadoop 2.7.1, Spark 1.6.0, Hive 2.0, Java 1.7.
Goal: launch a Spark application with java -jar xxx.jar and have it execute Hive SQL queries.
Question 1: first of all, note that launching with a plain java -jar reports java.lang.OutOfMemoryError: PermGen space, so the JVM needs to be started with the following parameters:
java -Xms1024m -Xmx1024m -XX:MaxNewSize=256m -XX:MaxPermSize=256m -jar spark.jar
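The PermGen pressure comes from the very large number of classes loaded out of the spark-assembly jar. If you want to confirm that before raising -XX:MaxPermSize, here is a small diagnostic of my own (not part of the original setup) that prints permanent-generation usage on a Java 7 JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PermGenCheck {
    public static void main(String[] args) {
        // On HotSpot the pool is named "PS Perm Gen" or "CMS Perm Gen",
        // depending on the garbage collector in use
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Perm Gen")) {
                long usedMb = pool.getUsage().getUsed() / (1024 * 1024);
                long maxMb = pool.getUsage().getMax() / (1024 * 1024);
                System.out.println(pool.getName() + ": used=" + usedMb + "m, max=" + maxMb + "m");
            }
        }
    }
}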
Question 2: if the three datanucleus jar packages are not on the classpath, the following error is reported (see http://zengzhaozheng.blog.51cto.com/8219051/1597902?utm_source=tuicool&utm_medium=referral for background):
javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
NestedThrowablesStackTrace:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
    at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
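The fix is to put datanucleus-api-jdo, datanucleus-core, and datanucleus-rdbms on the classpath; the Class-Path entries in the manifest later in this article take care of that. As a quick sanity check, a throwaway class of my own like the following confirms the JDO implementation is actually visible before Hive's metastore tries to load it reflectively:

public class JdoCheck {
    public static void main(String[] args) {
        try {
            // This is the class the Hive metastore loads reflectively at startup
            Class.forName("org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
            System.out.println("datanucleus-api-jdo is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("Missing: add the three datanucleus jars to the Class-Path");
        }
    }
}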
Question 3: the master set on SparkConf in the Java code determines which Spark mode you are choosing. I use yarn-client mode here; writing yarn-cluster reports an error. See http://stackoverflow.com/questions/31327275/pyspark-on-yarn-cluster-mode, which can be summarized as:
1. If you want to embed Spark code directly in your web app, you need to use yarn-client.
2. If you want your Spark code loosely coupled enough to actually run in yarn-cluster mode, you can spawn a child process (the linked answer uses Python) that calls spark-submit in yarn-cluster mode, as in the sketch below.
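Here is a minimal sketch of the second option in Java, assuming spark-submit is on the PATH and reusing the class and jar names from this article; it is an illustration, not code from the original post:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SubmitLauncher {
    public static void main(String[] args) throws Exception {
        // Launch spark-submit as a child process in yarn-cluster mode
        ProcessBuilder pb = new ProcessBuilder(
                "spark-submit",
                "--master", "yarn-cluster",
                "--class", "cn.centaur.test.spark.SimpleDemo",
                "/data/houxm/spark/spark.jar");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line); // stream the submission log
        }
        System.out.println("spark-submit exited with code " + p.waitFor());
    }
}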
Question 4: three additional configuration files are needed on the classpath: core-site.xml, hdfs-site.xml, and hive-site.xml. Otherwise the java -jar command fails immediately at startup.
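To verify the three files really made it onto the classpath, a small check of my own (a hypothetical helper, not from the original article) can be run before creating the SparkContext:

import java.net.URL;

public class ConfigCheck {
    public static void main(String[] args) {
        String[] configs = {"core-site.xml", "hdfs-site.xml", "hive-site.xml"};
        for (String name : configs) {
            // Hadoop and Hive pick these files up from the classpath root
            URL url = ConfigCheck.class.getClassLoader().getResource(name);
            System.out.println(name + " -> " + (url == null ? "MISSING from classpath" : url));
        }
    }
}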
With those four issues resolved, the correct Java code for calling Spark to execute Hive SQL is as follows.
Create a Java project and add spark-assembly-1.6.0-hadoop2.6.0.jar to it. The package sits in the lib directory of the Spark installation and weighs in at 178 MB, which is really big.
The Java code is as follows. It will be packaged as spark.jar and stored at /data/houxm/spark/spark.jar:
package cn.centaur.test.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class SimpleDemo {
    public static void main(String[] args) {
        // The application jar itself must be shipped to the YARN executors
        String[] jars = new String[]{"/data/houxm/spark/spark.jar"};
        SparkConf conf = new SparkConf()
                .setAppName("simpledemo")
                .setMaster("yarn-client")
                .set("executor-memory", "2g")
                .setJars(jars)
                .set("driver-class-path", "/data/spark/lib/mysql-connector-java-5.1.21.jar");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveCtx = new HiveContext(sc);
        testHive(hiveCtx);
        sc.stop();
        sc.close();
    }

    // Test a Spark SQL query against a Hive table
    public static void testHive(HiveContext hiveCtx) {
        hiveCtx.sql("create table temp_spark_java as select mobile,num from default.mobile_id_num02 limit 10");
    }
}
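The testHive above writes its result into a new table. If instead you want the rows back on the driver, a hypothetical variant (my addition, same table assumed) can be dropped into SimpleDemo alongside testHive:

// Variant of testHive: pull the query result back to the driver
public static void testHiveSelect(HiveContext hiveCtx) {
    org.apache.spark.sql.DataFrame df =
            hiveCtx.sql("select mobile, num from default.mobile_id_num02 limit 10");
    for (org.apache.spark.sql.Row row : df.collect()) { // collect() materializes rows on the driver
        System.out.println(row.getString(0) + "\t" + row.get(1));
    }
}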
Create a new MANIFEST.MF file in the root directory of the Java project with the following contents (note that each continuation line of a manifest value must begin with a space; the second space keeps the jar paths separated when the lines are joined):
Manifest-Version: 1.0
Class-Path: /data/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
  /data/spark/lib/mysql-connector-java-5.1.21.jar
  /data/spark/lib/datanucleus-api-jdo-3.2.6.jar
  /data/spark/lib/datanucleus-core-3.2.10.jar
  /data/spark/lib/datanucleus-rdbms-3.2.9.jar
Main-Class: cn.centaur.test.spark.SimpleDemo
In the resources directory (mine is a Maven project; in an ordinary Java project the files can go under src), add the three configuration files core-site.xml, hdfs-site.xml, and hive-site.xml.
Using Eclipse, package the Java code with this manifest file into a jar, upload it to the server, and run it.
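On the server, the jar is then launched with the java command from Question 1, PermGen flags included; the manifest's Class-Path and Main-Class entries let a plain java -jar spark.jar pick up everything else.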
The above is the full content of "how to submit Spark hive sql tasks using java code". Thank you for reading! I hope the content shared here helps you; for more related knowledge, welcome to follow the industry information channel.