
Query HBase through Hive

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Our production Zipkin deployment stores its data in HBase 0.94.6. At first, the developers wanted to write MapReduce (MR) jobs directly for offline analysis. Later we found that using Hive makes development more efficient. (Of course, we also looked into SQL interfaces for HBase such as Phoenix and Impala, but they are not mature enough yet, and offline analysis is not ad hoc querying anyway. BTW, when I talked to Intel a while ago, their Hive over HBase skipped MR and was very efficient, but it was a little expensive. =)

In fact, querying HBase with Hive is very simple:

// First create a table in HBase and insert a few rows
hbase(main):003:0> create 'table_inhbase','cf'
0 row(s) in 1.2060 seconds
=> Hbase::Table - table_inhbase
hbase(main):004:0> list
TABLE
table_inhbase
1 row(s) in 0.0350 seconds
hbase(main):005:0> put 'table_inhbase','row1','cf:a','value1'
0 row(s) in 0.0830 seconds
hbase(main):006:0> put 'table_inhbase','row2','cf:a','value2'
0 row(s) in 0.0200 seconds
hbase(main):007:0> put 'table_inhbase','row3','cf:b','value3'
0 row(s) in 0.0180 seconds
hbase(main):008:0> scan 'table_inhbase'
ROW      COLUMN+CELL
 row1    column=cf:a, timestamp=1383736436773, value=value1
 row2    column=cf:a, timestamp=1383736462917, value=value2
 row3    column=cf:b, timestamp=1383736476017, value=value3
3 row(s) in 0.0660 seconds

// Create an external table in Hive. Be sure to add the ZooKeeper (ZK) quorum to hive-site.xml; otherwise Hive hangs and keeps retrying localhost:2181
CREATE EXTERNAL TABLE ext_table_inhbase (key string, avalue string, bvalue string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:a,cf:b")
TBLPROPERTIES ("hbase.table.name" = "table_inhbase");

hive> CREATE EXTERNAL TABLE ext_table_inhbase (key string, avalue string, bvalue string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:a,cf:b")
    > TBLPROPERTIES ("hbase.table.name" = "table_inhbase");
OK

// Note: these two jars need to be added here, or an exception will be thrown: hbase-0.94.6-cdh5.4.0.jar, hive-hbase-handler-0.10.0-cdh5.4.0.jar
hive> select * from ext_table_inhbase;
OK
row1    value1  NULL
row2    value2  NULL
row3    NULL    value3
Time taken: 0.609 seconds
hive> select key,avalue from ext_table_inhbase;
java.io.IOException: Cannot create an instance of InputSplit class = org.apache.hadoop.hive.hbase.HBaseSplit:Class org.apache.hadoop.hive.hbase.HBaseSplit not found
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:146)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:388)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
hive> select key,avalue from ext_table_inhbase;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
Stage-1 map = 0%, reduce = 0%
Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.73 sec
Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.73 sec
MapReduce Total cumulative CPU time: 2 seconds 730 msec
Ended Job = job_201311061424_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.73 sec   HDFS Read: 255 HDFS Write: 39 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 730 msec
OK
row1    value1
row2    value2

// Try querying through beeline
beeline> !connect jdbc:hive2://test-2:10000 hdfs hdfs org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://test-2:10000
Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.10.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://test-2:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.483 seconds)
0: jdbc:hive2://test-2:10000> show tables;
+--------------------+
| tab_name           |
+--------------------+
| ext_table_inhbase  |
| test               |
+--------------------+
2 rows selected (0.657 seconds)
0: jdbc:hive2://test-2:10000> select count(*) from ext_table_inhbase;
+------+
| _c0  |
+------+
| 3    |
+------+
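
The session above mentions adding two jars and a ZooKeeper setting, but does not show the statements themselves. The following is a minimal HiveQL sketch of how that setup might look, not the author's exact commands. The /path/to/ prefix, the ZooKeeper host names in the comment, and the table name ext_table_inhbase_explicit are all hypothetical. The property name hbase.zookeeper.quorum is the one the Hive HBase integration documentation uses, and the explicit ":key" entry in the column mapping follows that documentation's recommendation; the original mapping "cf:a,cf:b" relies on the first Hive column being treated as the row key implicitly.

```sql
-- Assumption: hive-site.xml already contains the ZooKeeper quorum, e.g. the property
-- hbase.zookeeper.quorum set to zk1,zk2,zk3 (hypothetical host names).

-- Assumption: /path/to/ is a placeholder for wherever the two jars live locally.
-- Adding them ships the classes (including HBaseSplit) to the MapReduce tasks.
ADD JAR /path/to/hbase-0.94.6-cdh5.4.0.jar;
ADD JAR /path/to/hive-hbase-handler-0.10.0-cdh5.4.0.jar;

-- Same HBase table as above, exposed under a hypothetical second name,
-- with the row key mapped explicitly through :key.
CREATE EXTERNAL TABLE ext_table_inhbase_explicit (key string, avalue string, bvalue string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:a,cf:b")
TBLPROPERTIES ("hbase.table.name" = "table_inhbase");

-- A projection like this launches a MapReduce job, so it needs the jars added above.
SELECT key, avalue FROM ext_table_inhbase_explicit;
```

Because the table is EXTERNAL, dropping it in Hive should remove only the Hive metadata and leave table_inhbase in HBase untouched.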

