How to install and deploy Spark 2.0.1 and use JDBC to connect to Hive-based Spark SQL

How do you install and deploy Spark 2.0.1, and how do you use JDBC to connect to Hive-based Spark SQL? Many inexperienced readers are unsure where to start, so this article walks through the whole procedure step by step. I hope it helps you solve the problem.

1. Installation

The configuration below sets up the Spark history service in addition to Spark itself.

# First go to http://spark.apache.org/ and choose the package built for your environment, then copy the download link
cd /opt
mkdir spark
cd spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.1-bin-hadoop2.6.tgz
tar -xvzf spark-2.0.1-bin-hadoop2.6.tgz
cd spark-2.0.1-bin-hadoop2.6/conf

Make a copy of spark-env.sh.template, rename it spark-env.sh, and then edit spark-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_111
export SPARK_MASTER_HOST=hadoop-n

Make a copy of spark-defaults.conf.template, rename it spark-defaults.conf, and then edit spark-defaults.conf:

# Specify the master address so the --master flag is not needed at startup
spark.master                                  spark://hadoop-n:7077
# Compile each SQL query to bytecode; for small-data queries it is recommended to turn this off
spark.sql.codegen                             true
# Enable speculative execution: when a slow task appears, a copy of it is tried on another node,
# which helps reduce the impact of individual slow tasks in a large cluster
spark.speculation                             true
# The default serialization is slow; this serializer is the officially recommended one
spark.serializer                              org.apache.spark.serializer.KryoSerializer
# Automatically compress the in-memory columnar storage
spark.sql.inMemoryColumnarStorage.compressed  true
# Whether to enable the event log
spark.eventLog.enabled                        true
# Event log directory; it must be a globally visible directory, so create the folder in HDFS first
spark.eventLog.dir                            hdfs://hadoop-n:9000/spark_history_log/spark-events
# Whether to compress the event log
spark.eventLog.compress                       true
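
Since spark.eventLog.dir points at HDFS, that directory must exist before Spark writes any event logs. A minimal sketch of creating it, assuming HDFS is already running at hadoop-n:9000 as configured above:

# create the event log directory in HDFS (path taken from spark.eventLog.dir)
hdfs dfs -mkdir -p hdfs://hadoop-n:9000/spark_history_log/spark-events
# verify it exists
hdfs dfs -ls hdfs://hadoop-n:9000/spark_history_log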

Make a copy of slaves.template, rename it slaves, and then edit slaves:

hadoop-d1
hadoop-d2

Copy hive-site.xml from $HIVE_HOME/conf to the current directory.
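
For example, assuming $HIVE_HOME is set and you are still in $SPARK_HOME/conf:

cp $HIVE_HOME/conf/hive-site.xml .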

Edit /etc/profile and append the following at the end:

export SPARK_HOME=/opt/spark/spark-2.0.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hadoop-n:9000/spark_history_log/spark-events"

To make absolutely sure the settings take effect, add the same lines to /etc/bashrc, and then reload both files:

source /etc/profile
source /etc/bashrc

2. Startup

A) Start Hadoop first

cd $HADOOP_HOME/sbin
./start-dfs.sh

Visit http://ip:50070 to check whether startup succeeded (50070 is the NameNode web UI port).
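
If you prefer the command line, a quick sanity check is possible too; a sketch, assuming the Hadoop binaries are on the PATH:

# jps should list the NameNode and DataNode processes
jps
# dfsadmin reports the live datanodes
hdfs dfsadmin -report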

B) Then start Hive

cd $HIVE_HOME/bin
./hive --service metastore

Run the beeline or hive command to check whether startup succeeded. By default the Hive log is written to /tmp/${username}/hive.log.
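
Another way to confirm the metastore is up is to check its listening port; a sketch, assuming the default metastore port 9083:

# the Hive metastore listens on port 9083 by default
netstat -nltp | grep 9083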

C) Finally start Spark

cd $SPARK_HOME/sbin
./start-all.sh

Spark UI: http://hadoop-n:8080

Spark client

cd $SPARK_HOME/bin
./spark-shell

Spark SQL client

cd $SPARK_HOME/bin
./spark-sql

Note the web UI port number printed after the command runs; the corresponding monitoring information can be viewed through that web UI.
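
For a quick smoke test you can also run a single statement and exit; a sketch, assuming the Hive metastore from step B) is reachable:

cd $SPARK_HOME/bin
./spark-sql -e "show databases;"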

Start thriftserver

cd $SPARK_HOME/sbin
./start-thriftserver.sh

Spark thriftserver UI: http://hadoop-n:4040

Start historyserver

cd $SPARK_HOME/sbin
./start-history-server.sh

Spark history UI: http://hadoop-n:18080

3. Use JDBC to connect to Hive-based Spark SQL

A) If Hive's HiveServer2 is running, stop it first; it listens on the same port (10000) that the Spark Thrift server will use.
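
There is no dedicated stop script for HiveServer2 in this setup, so a common approach is to find the process and kill it; a sketch:

# find and stop the HiveServer2 process
pgrep -f HiveServer2
kill $(pgrep -f HiveServer2)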

B) Execute the following command to start the service

cd $SPARK_HOME/sbin
./start-thriftserver.sh

Execute the following command to test whether startup succeeded:

cd $SPARK_HOME/bin
./beeline -u jdbc:hive2://ip:10000

# Actual output:
[root@hadoop-n bin]# ./beeline -u jdbc:hive2://hadoop-n:10000
Connecting to jdbc:hive2://hadoop-n:10000
16/11/08 21:03:05 INFO jdbc.Utils: Supplied authorities: hadoop-n:10000
16/11/08 21:03:05 INFO jdbc.Utils: Resolved authority: hadoop-n:10000
16/11/08 21:03:05 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://hadoop-n:10000
Connected to: Spark SQL (version 2.0.1)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://hadoop-n:10000> show databases;
+---------------+
| databaseName  |
+---------------+
| default       |
| test          |
+---------------+
2 rows selected (0.829 seconds)
0: jdbc:hive2://hadoop-n:10000>
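
You can also issue a one-off query non-interactively through beeline's -e option; a sketch:

cd $SPARK_HOME/bin
./beeline -u jdbc:hive2://hadoop-n:10000 -e "show databases;"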

Write code to connect to Spark SQL

Add dependencies according to your own environment

<dependency>
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <version>1.6</version>
    <scope>system</scope>
    <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>

Then write the class

/**
 * @Title: HiveJdbcTest.java
 * @Package com.scc.hive
 * @author scc
 * @date 2016-11-09 10:16:32
 */
package com.scc.hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * @ClassName: HiveJdbcTest
 * @Description: connects to the Spark SQL Thrift server over JDBC and runs a few queries
 * @author scc
 * @date 2016-11-09 10:16:32
 */
public class HiveJdbcTest {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the Hive JDBC driver
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        Connection con = DriverManager.getConnection("jdbc:hive2://10.5.3.100:10000", "", "");
        Statement stmt = con.createStatement();
        String tableName = "l_access";
        String sql = "";
        ResultSet res = null;

        // Print the table schema: column name and type
        sql = "describe " + tableName;
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // Print the id column of the first 10 rows
        sql = "select * from " + tableName + " limit 10";
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getObject("id"));
        }

        // Print the total row count
        sql = "select count(1) from " + tableName;
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println("count:" + res.getString(1));
        }
    }
}
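
One way to compile and run the class, assuming a standard Maven project layout and that the exec-maven-plugin is available (the mainClass value matches the package and class above):

mvn compile exec:java -Dexec.mainClass=com.scc.hive.HiveJdbcTest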

Here is the console output

log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
id	int
req_name	string
req_version	string
req_param	string
req_no	string
req_status	string
req_desc	string
ret	string
excute_time	int
req_time	date
create_time	date
212
213
214
215
216
217
218
219
220
221
count:932

4. Precautions

The cluster needs passwordless SSH login configured between its nodes.
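
A minimal sketch of setting that up from the master node, using the slave hostnames from the slaves file above:

# generate a key pair on the master (accept the defaults)
ssh-keygen -t rsa
# copy the public key to each slave
ssh-copy-id hadoop-d1
ssh-copy-id hadoop-d2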

Don't forget to copy Hive's configuration file (hive-site.xml) into Spark's conf directory; otherwise Spark will create its own local metastore database files instead of using Hive's.
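
If you suspect this has happened, look for Derby files in the directory where Spark was launched; metastore_db and derby.log are the names Derby uses by default (an assumption about your setup):

# a stray local metastore shows up as these entries in the launch directory
ls -d metastore_db derby.log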

When Hive starts, it prints "ls: cannot access /opt/spark/spark-2.0.1-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory"; this does not affect the running of the program.

After reading the above, have you learned how to install and deploy Spark 2.0.1 and how to use JDBC to connect to Hive-based Spark SQL? If you want to learn more, you are welcome to follow our industry information channel. Thank you for reading!
