
Combination of Hive and HBase

2025-04-12 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

I. The combination of Hive and HBase

Hive is often used together with HBase, with HBase serving as the storage layer for Hive, so integrating HBase into Hive is particularly important. By reading HBase data through Hive, you can use HQL statements to query and insert into HBase tables, and even run complex queries such as Join and Union. This feature was introduced in Hive 0.6.0. The integration of Hive and HBase is implemented through their respective external API interfaces, and depends mainly on the classes in the hive-hbase-handler-*.jar tool. Using Hive to manipulate HBase tables is mainly a convenience: the HiveQL engine runs on MapReduce, so performance is not satisfactory.

Steps:

1. Copy the HBase-related jar packages to hive/lib, as follows:

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-protocol-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-server-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-client-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5-tests.jar /opt/hive/lib/
[hadoop@bus-stable hive]$

2. Reference hbase in the hive-site.xml file and add the following:

[hadoop@bus-stable hive]$ vim /opt/hive/conf/hive-site.xml

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/lib/hive-hbase-handler-2.3.3.jar,file:///opt/hive/lib/hbase-protocol-1.4.5.jar,file:///opt/hive/lib/hbase-server-1.4.5.jar,file:///opt/hive/lib/hbase-client-1.4.5.jar,file:///opt/hive/lib/hbase-common-1.4.5.jar,file:///opt/hive/lib/hbase-common-1.4.5-tests.jar,file:///opt/hive/lib/zookeeper-3.4.6.jar,file:///opt/hive/lib/guava-14.0.1.jar</value>
  <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>open-stable,permission-stable,sp-stable</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

3. Start hive:

[hadoop@bus-stable hive]$ hive --hiveconf hbase.master=oversea-stable:60000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in ... Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Spark, tez) or using Hive 1.X releases.
hive> create table htest (key int, value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ('hbase.columns.mapping'=':key,f:value') tblproperties ('hbase.table.name'='htest');
OK
Time taken: 9.376 seconds
hive> show databases;
OK
default
inspiry
Time taken: 0.121 seconds, Fetched: 2 row(s)
hive> show tables;
OK
htest
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select * from htest;
OK
Time taken: 1.967 seconds
hive>

4. Verify the data in HBase:

[hadoop@oversea-stable opt]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-1.4.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.5, rca99a9466415dc4cfc095df33efb45cb82fe5480, Wed Jun 13 15:13:00 EDT 2018

hbase(main):001:0> list
TABLE
htest
1 row(s) in 0.2970 seconds

=> ["htest"]
hbase(main):002:0> scan "htest"
ROW                  COLUMN+CELL
0 row(s) in 0.1410 seconds

hbase(main):003:0>

II. Import external data

(1) The data file is as follows:

[hadoop@bus-stable ~]$ cat score.csv
hive,85
hbase,90
hadoop,92
flume,89
kafka,95
spark,80
storm,70
[hadoop@bus-stable ~]$ hadoop fs -put score.csv /data/score.csv
[hadoop@bus-stable ~]$ hadoop fs -ls /data/
Found 2 items
-rw-r--r--   3 hadoop supergroup      88822 2018-06-15 10:32 /data/notepad.txt
-rw-r--r--   3 hadoop supergroup         70 2018-06-26 15:59 /data/score.csv
[hadoop@bus-stable ~]$

(2) Create an external table

Create a Hive external table using the existing data on HDFS:

hive> create external table if not exists course.testcourse (cname string, score int) row format delimited fields terminated by ',' stored as textfile location '/data';
OK
Time taken: 0.282 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.013 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testcourse
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select * from testcourse;
OK
hive	85
hbase	90
hadoop	92
flume	89
kafka	95
spark	80
storm	70
Time taken: 2.272 seconds, Fetched: 7 row(s)
hive>

III. Use HQL statements to create an HBase table

Use an HQL statement to create a Hive table that points to HBase; the syntax is as follows:

CREATE TABLE tbl_name (key int, value string)   -- tbl_name is the table name in Hive
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'   -- specifies the storage handler
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")   -- declares the column family and column names
TBLPROPERTIES ("hbase.table.name" = "tbl_name", "hbase.mapred.output.outputtable" = "iteblog");

hbase.table.name declares the HBase table name; it is optional and defaults to the same name as the Hive table. hbase.mapred.output.outputtable specifies the table that data is written into when inserting; if you need to insert data into the table later, you need to specify this value.
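To make the mapping rule concrete, here is a small illustrative Java sketch (not part of the hive-hbase-handler API; the class and method names are made up) that splits an hbase.columns.mapping value into its entries, assuming the simple comma-separated form shown above. The ":key" entry binds a Hive column to the HBase row key; every other entry is a "family:qualifier" pair, matched to Hive columns by position:

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnsMappingDemo {
    // Split an hbase.columns.mapping value such as ":key,cf1:val" into entries.
    // This only illustrates the positional mapping rule; it is not the real
    // parser inside hive-hbase-handler.
    static List<String> parseMapping(String mapping) {
        List<String> entries = new ArrayList<>();
        for (String part : mapping.split(",")) {
            entries.add(part.trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> entries = parseMapping(":key,cf1:val");
        // Entry 0 (":key") maps the first Hive column to the row key;
        // entry 1 ("cf1:val") maps the second Hive column to family cf1, qualifier val.
        System.out.println(entries.get(0)); // prints :key
        System.out.println(entries.get(1)); // prints cf1:val
    }
}
```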

(1) The create statement is as follows:

hive> create table course.hbase_testcourse (cname string, score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES ("hbase.table.name" = "hbase_testcourse", "hbase.mapred.output.outputtable" = "hbase_testcourse");
OK
Time taken: 3.745 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.019 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hbase_testcourse
testcourse
Time taken: 0.025 seconds, Fetched: 2 row(s)
hive> select * from hbase_testcourse;
OK
Time taken: 1.883 seconds
hive>

(2) After creating the internal table, you can import data from another table into HBase through the insert overwrite statement supported by Hive.

hive> insert overwrite table course.hbase_testcourse select cname, score from course.testcourse;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180626170540_c7eecb8d-2925-4ad2-be7f-237d9815d1cb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1529932626564_0002, Tracking URL = http://oversea-stable:8088/proxy/application_1529932626564_0002/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1529932626564_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-06-26 17:06:02,793 Stage-3 map = 0%, reduce = 0%
2018-06-26 17:06 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.12 sec
MapReduce Total cumulative CPU time: 6 seconds 120 msec
Ended Job = job_1529932626564_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 6.12 sec   HDFS Read: 4224 HDFS Write: 0
Total MapReduce CPU Time Spent: 6 seconds 120 msec
OK
Time taken: 41.489 seconds
hive> select * from hbase_testcourse;
OK
flume	89
hadoop	92
hbase	90
hive	85
kafka	95
spark	80
storm	70
Time taken: 0.201 seconds, Fetched: 7 row(s)
hive>

(3) Verify the data in HBase

hbase(main):011:0> list
TABLE
hbase_testcourse
htest
2 row(s) in 0.0110 seconds

=> ["hbase_testcourse", "htest"]
hbase(main):012:0> scan "hbase_testcourse"
ROW                  COLUMN+CELL
 flume               column=cf:score, timestamp=1530003973026, value=89
 hadoop              column=cf:score, timestamp=1530003973026, value=92
 hbase               column=cf:score, timestamp=1530003973026, value=90
 hive                column=cf:score, timestamp=1530003973026, value=85
 kafka               column=cf:score, timestamp=1530003973026, value=95
 spark               column=cf:score, timestamp=1530003973026, value=80
 storm               column=cf:score, timestamp=1530003973026, value=70
7 row(s) in 0.0760 seconds

hbase(main):013:0>

IV. Use Hive to map tables that already exist in HBase

(1) Create an HBase table: enter the HBase shell client and execute the create command

hbase(main):036:0> create 'hbase_test', {NAME => 'cf'}
0 row(s) in 2.2830 seconds

=> Hbase::Table - hbase_test

(2) insert data

hbase(main):037:0> put 'hbase_test','hadoop','cf:score','95'
0 row(s) in 0.1110 seconds

hbase(main):038:0> put 'hbase_test','storm','cf:score','96'
0 row(s) in 0.0120 seconds

hbase(main):039:0> put 'hbase_test','spark','cf:score','97'
0 row(s) in 0.0110 seconds

(3) View data

hbase(main):041:0> scan "hbase_test"
ROW                  COLUMN+CELL
 hadoop              column=cf:score, timestamp=1530004351399, value=95
 spark               column=cf:score, timestamp=1530004365368, value=97
 storm               column=cf:score, timestamp=1530004359169, value=96
3 row(s) in 0.0220 seconds

hbase(main):042:0>

(4) Enter the Hive shell client and create an external table course.hbase_test. The create command is as follows:

hive> create external table course.hbase_test (cname string, score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES ("hbase.table.name" = "hbase_test", "hbase.mapred.output.outputtable" = "hbase_test");
OK
Time taken: 0.221 seconds
hive> show tables;
OK
hbase_test
hbase_testcourse
testcourse
Time taken: 0.024 seconds, Fetched: 3 row(s)

Note: the commands for creating external and internal tables are basically the same; the only difference is that internal tables use create table while external tables use create external table.

View the data in Hive:

hive> select * from hbase_test;
OK
hadoop	95
spark	97
storm	96
Time taken: 0.22 seconds, Fetched: 3 row(s)
hive>

The Hive table here is an external table, so deleting it does not delete the data in the HBase table. A few points to note:

a) If no key is specified when creating or mapping the table, the first column defaults to the row key.

b) The Hive table corresponding to HBase has no concept of timestamps; the latest version of each value is returned by default.

c) Since HBase carries no data type information, values are converted to the String type when data is stored.
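As a small illustration of the last note above (values stored as String), a numeric score written through this mapping ends up as its string form in the HBase cell, and reading it back into a typed Hive column means parsing the string again. The sketch below is plain Java with no HBase dependency; the two helpers are made-up stand-ins for illustration, not hive-hbase-handler APIs:

```java
public class StringStorageDemo {
    // HBase cells carry no type information: writing a typed value means
    // serializing it to a string (ultimately bytes), and reading it back
    // means parsing. Illustrative stand-ins only.
    static String store(int score) {
        return String.valueOf(score); // what ends up in the cf:score cell
    }

    static int load(String cell) {
        return Integer.parseInt(cell); // back to the typed Hive column
    }

    public static void main(String[] args) {
        String cell = store(95);
        System.out.println(cell);       // the stored form is the string "95"
        System.out.println(load(cell)); // parsed back to the int 95
    }
}
```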

V. Use Java to connect to Hive and operate HBase

pom.xml

<modelVersion>4.0.0</modelVersion>
<groupId>cn.itcast.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-metastore</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>

Hive_Hbase.java

package cn.itcast.bigdata.hbase;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Hive_Hbase {
    public static void main(String[] args) {
        try {
            // Load the HiveServer2 JDBC driver and connect to the shizhan02 database.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection connection = DriverManager.getConnection(
                    "jdbc:hive2://hadoop1:10000/shizhan02", "hadoop", "");
            Statement statement = connection.createStatement();
            String sql = "SELECT * FROM hive_hbase_table_kv";
            ResultSet res = statement.executeQuery(sql);
            while (res.next()) {
                // Print the second column of each row.
                System.out.println(res.getString(2));
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }
}
