Detailed explanation of the process of data interaction and integration between HBase and Hive

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Hive and HBase integration theory

1. Why should Hive integrate with HBase

2. Advantages and disadvantages of integration

Advantages:

(1). Hive provides the HiveQL interface, which simplifies the use of MapReduce, while HBase provides low-latency database access. Combining the two lets MapReduce be used for offline computation and analysis of the large amounts of data stored in HBase.

(2). Easy to operate: Hive provides a large number of built-in functions.

Disadvantages:

Performance loss: Hive can operate on HBase through SQL-like statements, but this access path is slow.

3. What preparatory work is needed for integration

4. Goals after integration

(1). Tables created in Hive can be created and stored directly in HBase.

(2). When data is inserted into the table in Hive, the corresponding table in HBase is updated at the same time.

(3). When a value in the mapped HBase column family changes, the change also shows up in the corresponding Hive table.

(4). Support mappings across multiple columns and multiple column families (example: 3 columns in Hive correspond to 2 columns in HBase).

5. How do Hive and HBase communicate after integration?

See the Hive and HBase communication diagram:

Hive communicates with HBase mainly through hive-hbase-handler-1.2.1.jar in Hive's lib directory.

Integration process (case operation)

The data of a table created in Hive is stored directly in HBase.

First, start Hive, enter the interactive shell, and create a table.

Hive version: apache-hive-1.2.1

HBase version: apache-hbase-1.1.2

Hadoop version: hadoop-2.7.3

First: create a table that HBase can recognize.

Table creation statement:

CREATE TABLE IF NOT EXISTS hive_hbase (
  id INT,
  name STRING,
  age INT,
  sex STRING,
  address STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf_info:eName,cf_info:eAge,cf_info:eSex,cf_beizhu:eAddress")
TBLPROPERTIES ("hbase.table.name" = "ns2:hive_hbase01");

Note: the org.apache.hadoop.hive.hbase.HBaseStorageHandler class comes from the hive-hbase-handler jar in Hive's lib directory, and that jar must be the Hive 1.2.1 version. Otherwise, an error is reported indicating that a class or method cannot be found.

Error prompt: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V

The Hive version cannot be too high either. For example, version 2.x reports this error:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Make sure that the mysql-connector jar is present in Hive's lib directory. Otherwise, an error will occur.

After creation, you can check that the table exists from the HBase shell:

list
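The hbase.columns.mapping string in the table creation statement pairs Hive columns with HBase targets by position, one entry per Hive column. The minimal Python sketch below makes that rule concrete. The function name parse_columns_mapping is hypothetical, and the code only imitates the rule; it is not Hive's own parser. Rejecting whitespace is a choice made in this sketch to surface typos such as ": key" (assumption: Hive is documented to treat whitespace as part of the name rather than reject it).

```python
def parse_columns_mapping(mapping, hive_columns):
    """Pair each Hive column with its HBase target, by position."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    pairs = []
    for column, entry in zip(hive_columns, entries):
        # Sketch-level strictness: flag stray spaces instead of keeping them.
        if " " in entry:
            raise ValueError(f"whitespace in mapping entry {entry!r}")
        if entry == ":key":
            pairs.append((column, None, None))  # this column is the row key
        else:
            family, _, qualifier = entry.partition(":")
            pairs.append((column, family, qualifier))
    return pairs


# Column names and mapping taken from the hive_hbase table in this article.
hive_columns = ["id", "name", "age", "sex", "address"]
mapping = ":key,cf_info:eName,cf_info:eAge,cf_info:eSex,cf_beizhu:eAddress"

for column, family, qualifier in parse_columns_mapping(mapping, hive_columns):
    target = "row key" if family is None else f"{family}:{qualifier}"
    print(f"{column:8} -> {target}")
```

Read this way, id becomes the HBase row key, three columns land in the cf_info family, and address lands in cf_beizhu.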

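Second: prepare your own test data (omitted here) and a plain Hive table to hold it. The table needs the same five columns as hive_hbase so that a select * from it can be inserted into hive_hbase.

CREATE TABLE test (
  id INT,
  name STRING,
  age INT,
  sex STRING,
  address STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;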

Load data into the table:

LOAD DATA LOCAL INPATH '/usr/local/test01.txt' OVERWRITE INTO TABLE test;
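Since the test data itself is omitted, here is a minimal Python sketch of one way to produce a file in the comma-delimited, newline-terminated layout the load step expects. The rows are copied from the query output shown later in this article. The helper name write_test_file and the temporary output path are assumptions for illustration (the article loads /usr/local/test01.txt).

```python
import os
import tempfile

# Sample rows copied from the query output later in this article.
ROWS = [
    (20170616, "zhangshaoqi", 22, "nan", "jincheng"),
    (20170617, "xuqianya", 29, "nv", "beijing"),
    (20170618, "xiaolin", 29, "nv", "jincheng"),
    (20170619, "xiaopan", 33, "nan", "guizhou"),
    (20170620, "xiaohu", 26, "nan", "shouzhou"),
]


def write_test_file(path, rows):
    """Write one record per line with ',' between fields and no quoting."""
    with open(path, "w", newline="") as handle:
        for row in rows:
            handle.write(",".join(str(field) for field in row) + "\n")


# Hypothetical location chosen so the sketch runs anywhere.
path = os.path.join(tempfile.gettempdir(), "test01.txt")
write_test_file(path, ROWS)

with open(path) as handle:
    print(handle.read(), end="")
```

Plain string joining is used instead of a CSV writer because a delimited text table splits on the delimiter only, so quoted fields would be loaded with their quote characters.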

Insert data into the HBase-backed table from a query result:

INSERT OVERWRITE TABLE hive_hbase SELECT * FROM test;

This runs a MapReduce job. The job output is omitted.

Third: query the inserted data, now stored in HBase, from Hive

SELECT * FROM hive_hbase;

20170616,zhangshaoqi,22,nan,jincheng

20170617,xuqianya,29,nv,beijing

20170618,xiaolin,29,nv,jincheng

20170619,xiaopan,33,nan,guizhou

20170620,xiaohu,26,nan,shouzhou

1 row(s) in 3.19 seconds

Fourth: scan the table in the HBase shell to confirm that the data is there

scan 'ns2:hive_hbase01'
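To show what the scan should contain, the following Python sketch models how one Hive row is laid out in HBase under the mapping from the table creation statement: the id column becomes the row key, and every other column becomes a cell addressed by family:qualifier. The function row_to_cells is hypothetical, and storing every value as a string is an assumption (it reflects the handler's default string storage as far as I know). The printed layout is not the exact HBase shell output format.

```python
# Mapping from the hive_hbase table; None marks the row key column.
MAPPING = [
    ("id", None),
    ("name", "cf_info:eName"),
    ("age", "cf_info:eAge"),
    ("sex", "cf_info:eSex"),
    ("address", "cf_beizhu:eAddress"),
]


def row_to_cells(row):
    """Return (row_key, {"family:qualifier": value}) for one Hive row."""
    row_key = None
    cells = {}
    for (_column, target), value in zip(MAPPING, row):
        if target is None:
            row_key = str(value)  # assumption: values are stored as strings
        else:
            cells[target] = str(value)
    return row_key, cells


row_key, cells = row_to_cells((20170616, "zhangshaoqi", 22, "nan", "jincheng"))
print(row_key)
for name, value in cells.items():
    print(f"  {name} = {value}")
```

Each row in the Hive result should therefore show up in the scan as one row key with four cells spread over the cf_info and cf_beizhu families.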

Fifth: access an existing HBase table from Hive

The Hive table must be created as an EXTERNAL table, otherwise an error is reported. The mapping lists :key explicitly so that it has one entry per Hive column.

hive> CREATE EXTERNAL TABLE hbase_table_3 (key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name")
    > TBLPROPERTIES ("hbase.table.name" = "student");

Time taken: 1.21 seconds

Note: if the data in the mapped column info:name changes in HBase, the query results in Hive change accordingly. If content in other columns or column families of the HBase table, which are not part of the mapping, is updated, those changes do not appear in Hive query results.
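The note above can be illustrated with a small Python model of the projection that the external table performs. Everything here is a sketch: hive_view is a hypothetical helper, the cell values are made-up sample data, and the row key is kept as a string for simplicity even though the Hive column is declared as int.

```python
def hive_view(row_key, cells, mapped):
    """Project one HBase row onto the columns the Hive table maps."""
    return (row_key,) + tuple(cells.get(name) for name in mapped)


mapped = ["info:name"]  # the only non-key column mapped by hbase_table_3
cells = {"info:name": "alice", "info:age": "20"}  # made-up sample row

print(hive_view("1", cells, mapped))  # ('1', 'alice')

cells["info:name"] = "bob"  # mapped column changes: visible in Hive
print(hive_view("1", cells, mapped))  # ('1', 'bob')

cells["info:age"] = "21"  # unmapped column changes: not visible in Hive
print(hive_view("1", cells, mapped))  # ('1', 'bob')
```

To expose another HBase column in Hive, it would have to be added both to the Hive column list and to hbase.columns.mapping.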

That's all. If you have any questions, you are welcome to discuss them.
