What is the method of hive interacting with hbase data

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces the methods of data interaction between Hive and HBase. Many people run into trouble with this in real-world cases, so let me walk you through how to handle these situations. I hope you read carefully and get something out of it!

The Integration principle of HBase and Hive


Hive and HBase have different characteristics: Hive is high-latency, structured, and analysis-oriented, while HBase is low-latency, unstructured, and programming-oriented. As a data warehouse on Hadoop, Hive has high latency, so it integrates HBase in order to use some of HBase's features. The following is the integrated architecture of Hive and HBase:

Figure 1: Hive and HBase architecture diagram

Integrating HBase into Hive lets Hive take advantage of HBase storage features such as row updates and column indexes. Be careful to keep the HBase jar versions consistent during integration. The integration requires establishing a mapping between the Hive table and the HBase table: the columns and column types of the Hive table are associated with the column families and column qualifiers of the HBase table. Every field in the Hive table must exist in HBase, but the Hive table does not need to contain all of the columns in HBase. The HBase RowKey is mapped on the Hive side with the fixed token :key, while the other Hive fields map to HBase column families (cf:) or individual columns (cf:cq). For example, Figure 2 below shows how a Hive table is mapped to an HBase table:

Figure 2: Hive table mapped to an HBase table
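To make the mapping concrete, here is a minimal sketch using the same table and column names that appear in the examples later in this article: the Hive column key is bound to the HBase RowKey via the fixed :key token, and the Hive column value is bound to column family cf1, qualifier val.

```sql
-- Minimal Hive-to-HBase mapping sketch (names match the later examples):
--   :key     -> the HBase RowKey
--   cf1:val  -> column family cf1, qualifier val
create table hbase_table_1 (key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,cf1:val")
tblproperties ("hbase.table.name" = "xyz");
```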

Basic introduction

Hive is a data warehouse tool based on Hadoop that can map structured data files to a database table, provides complete SQL query functionality, and transforms SQL statements into MapReduce jobs to run. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly through SQL-like statements without developing dedicated MapReduce applications, so it is very well suited to statistical analysis in a data warehouse.
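For example (a hypothetical query, shown only to illustrate the point), a SQL-like aggregation such as the following is compiled by Hive into a MapReduce job instead of requiring a hand-written MapReduce program:

```sql
-- Hive compiles this GROUP BY into a map phase (emit bar) and a reduce phase (count).
select bar, count(*) from pokes group by bar;
```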

The integration of Hive and HBase is realized by having the two communicate with each other through their external API interfaces. It depends mainly on the hive-hbase-handler.jar tool class, roughly as shown in the figure:

Software version

Software versions used (search Baidu for any package without a download address):

jdk-6u24-linux-i586.bin

hive-0.9.0.tar.gz http://yun.baidu.com/share/link?shareid=2138315647&uk=1614030671

hbase-0.94.14.tar.gz http://mirror.bit.edu.cn/apache/hbase/hbase-0.94.14/

hadoop-1.1.2.tar.gz http://pan.baidu.com/s/1mgmKfsG

Installation location

Install directory: /usr/local/ (remember to unzip and rename the directories)

The installation path of HBase is /usr/local/hbase

The installation path of Hive is /usr/local/hive

Integration steps

The process of integrating hive with hbase is as follows:

1. Under /usr/local/hbase:

Copy hbase-0.94.14.jar, hbase-0.94.14-tests.jar, and lib/zookeeper-3.4.5.jar to the /usr/local/hive/lib folder.

Note:

If other versions of these files already exist under hive/lib (for example, zookeeper-3.3.1.jar), it is recommended to delete them and use the versions shipped with HBase.

You also need to copy protobuf-java-2.4.0a.jar to /usr/local/hive/lib and /usr/local/hadoop/lib.

2. Modify the hive-site.xml file

In the /usr/local/hive/conf directory, add the following at the bottom of hive-site.xml (to jump to the bottom in vi, press ESC, then type :$ and Enter). Adjust the jar paths to match your installed versions:

<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hive/logs</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.14.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>

Note: create the logs directory yourself if it does not exist, or rename hive-default.xml.template and use it. Copy the modified file to all nodes.

Note: if you skip steps 3 and 4, the following error is likely to occur when running hive:

org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.

5. Start hive (tested successfully)

Single node startup

bin/hive -hiveconf hbase.master=master:60000

Cluster startup (I did not test this):

bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3 (list all zookeeper nodes)

If hive.aux.jars.path is not configured in the hive-site.xml file, you can start hive as follows instead:

hive --auxpath /opt/mapr/hive/hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,/opt/mapr/hive/hive-0.7.1/lib/hbase-0.90.4.jar,/opt/mapr/hive/hive-0.7.1/lib/zookeeper-3.3.2.jar -hiveconf hbase.master=localhost:60000

After testing, modify the hive configuration file hive-site.xml:

<property>
  <name>hive.zookeeper.quorum</name>
  <value>node1,node2,node3</value>
  <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>

With this configured, hive can connect to HBase without passing extra startup parameters.

Testing Hive to HBase

Test after startup (restart the cluster first).

1. Use hive to create a table that HBase can recognize

The statement is as follows:

create table hbase_table_1 (key int, value string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,cf1:val")

tblproperties ("hbase.table.name" = "xyz");

Now you will find an extra table 'xyz' in the hbase shell.

(hbase.table.name defines the table name in hbase.

For multiple columns in one family use data:1,data:2; for multiple column families use data1:1,data2:1.)

hbase.columns.mapping defines the mapping to hbase column families; :key is a fixed token for the RowKey, so make sure the foo field in the pokes table holds unique values.
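The multi-column and multi-column-family cases mentioned above can be sketched as follows (the table and column names here are illustrative, not from the original examples):

```sql
-- Multiple columns in one column family: data:1 and data:2
create table hbase_multi (key int, c1 string, c2 string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,data:1,data:2")
tblproperties ("hbase.table.name" = "multi");

-- One column in each of two column families: data1:1 and data2:1
create table hbase_multi_cf (key int, d1 string, d2 string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,data1:1,data2:1")
tblproperties ("hbase.table.name" = "multi_cf");
```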

Create a partitioned table

create table hbase_table_1 (key int, value string)

partitioned by (day string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,cf1:val")

tblproperties ("hbase.table.name" = "xyz");

Table modification is not supported

You will be prompted that a non-native table cannot be modified:

hive> ALTER TABLE hbase_table_1 ADD PARTITION (day = '2012-09-22');

FAILED: Error in metadata: Cannot use ALTER TABLE on a non-native table FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

2. Import data into the associated hbase table. First, create a new intermediate table in hive:

create table pokes (foo int, bar string)

row format delimited fields terminated by ',';

Batch import data

load data local inpath '/home/1.txt' overwrite into table pokes;

The contents of the 1.txt file are

1,hello

2,pear

3,world

Import into hbase_table_1 using SQL. First:

set hive.hbase.bulk=true;

2. Insert data into the hbase table

insert overwrite table hbase_table_1

select * from pokes;

Importing into a table with partitions:

insert overwrite table hbase_table_1 partition (day='2012-01-01')

select * from pokes;

3. View the hive table associated with hbase

hive> select * from hbase_table_1;

OK

1 hello

2 pear

3 world

(Note: partitioned tables integrated with hbase have a storage issue: select * from table returns no data, while select key,value from table does return data.)
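In other words, for a partitioned HBase-backed table the workaround described in the note above is to name the columns explicitly:

```sql
-- May return no rows for a partitioned HBase-backed table:
select * from hbase_table_1;
-- Naming the columns explicitly returns the data:
select key, value from hbase_table_1;
```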

4. Log in to hbase to view the data in that table

hbase shell

hbase(main):002:0> describe 'xyz'

DESCRIPTION ENABLED

{NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

1 row(s) in 0.0830 seconds

hbase(main):003:0> scan 'xyz'

ROW COLUMN+CELL

1 column=cf1:val, timestamp=1331002501432, value=hello

2 column=cf1:val, timestamp=1331002501432, value=pear

3 column=cf1:val, timestamp=1331002501432, value=world

At this point, you can see the data you just inserted in hive in Hbase.

Testing HBase to Hive

1. Create a table in hbase

create 'test1','a','b','c'

put 'test1','1','a','qqq'

put 'test1','1','b','aaa'

put 'test1','1','c','bbb'

put 'test1','2','a','qqq'

put 'test1','2','c','bbb'

2. Associate the hbase table in hive

For tables that already exist in hbase, use CREATE EXTERNAL TABLE in hive.

For example, the table in hbase is named test1 with column families a:, b:, and c:; the table statement in hive is:

create external table hive_test

(key int, gid map<string,string>, sid map<string,string>, uid map<string,string>)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,a:,b:,c:")

tblproperties ("hbase.table.name" = "test1");

2. Check the data in test1

After the table is established in hive, query the contents of the test1 table in hbase

select * from hive_test;

OK

1 {"":"qqq"} {"":"aaa"} {"":"bbb"}

2 {"":"qqq"} {} {"":"bbb"}

The method to query the value in the gid field is

select gid[''] from hive_test;

Get the query results

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201203052222_0017, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203052222_0017

Kill Command = / opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job-Dmapred.job.tracker=maprfs:///-kill job_201203052222_0017

2012-03-06 14:38:29,141 Stage-1 map = 0%, reduce = 0%

2012-03-06 14:38:33,171 Stage-1 map = 100%, reduce = 100%

Ended Job = job_201203052222_0017

OK

qqq

qqq

If the fields in the hbase table test1 are user:gid, user:sid, info:uid, and info:level, the table statement in hive is:

create external table hive_test

(key int, user map<string,string>, info map<string,string>)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,user:,info:")

tblproperties ("hbase.table.name" = "test1");

The method to query the hbase table is

select user['gid'] from hive_test;

Note: to optimize the hive connection to HBase, add the following to the hbase-site.xml file in HADOOP_HOME/conf:

<property>
  <name>hbase.client.scanner.caching</name>
  <value>10000</value>
</property>

Or execute hive> set hbase.client.scanner.caching=10000; before running the hive statement.

Troubleshooting: Hive errors

1.NoClassDefFoundError

java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable

Add protobuf-***.jar to the jar path in $HIVE_HOME/conf/hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
</property>

HBase error:

java.lang.NoClassDefFoundError: com/google/protobuf/Message

When writing an HBase program, the system reports the error java.lang.NoClassDefFoundError: com/google/protobuf/Message.

After searching for a long time, I found something from this place: http://abloz.com/2012/06/15/hive-execution-hbase-create-the-table-can-not-find-protobuf.html

The contents are as follows:

Hadoop:1.0.3

Hive:0.9.0

Hbase:0.94.0

Protobuf:$HBASE_HOME/lib/protobuf-java-2.4.0a.jar

As you can see, the hbase jar bundled with hive 0.9.0 is version 0.92.

[zhouhh@Hadoop48 ~]$ hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HIVE_HOME/lib/hbase-0.92.0.jar,$HIVE_HOME/lib/zookeeper-3.3.4.jar,$HIVE_HOME/lib/guava-r09.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar

hive> CREATE TABLE hbase_table_1 (key int, value string)

> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ": key,cf1:val")

> TBLPROPERTIES ("hbase.table.name" = "xyz")

java.lang.NoClassDefFoundError: com/google/protobuf/Message

at org.apache.hadoop.hbase.io.HbaseObjectWritable. (HbaseObjectWritable.java

...

Caused by: java.lang.ClassNotFoundException: com.google.protobuf.Message

Solution:

Copy $HBASE_HOME/lib/protobuf-java-2.4.0a.jar to $HIVE_HOME/lib/.

[zhouhh@Hadoop48 ~]$ cp /home/zhouhh/hbase-0.94.0/lib/protobuf-java-2.4.0a.jar $HIVE_HOME/lib/

hive> CREATE TABLE hbase_table_1 (key int, value string)

> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ": key,cf1:val")

> TBLPROPERTIES ("hbase.table.name" = "xyz")

OK

Time taken: 10.492 seconds

hbase(main):002:0> list

TABLE

xyz

1 row(s) in 0.0640 seconds

Alternatively, you can include protobuf-java-2.4.0a.jar directly in the referenced jar packages.

Test script

bin/hive -hiveconf hbase.master=master:60000

hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,/usr/local/hive/lib/hbase-0.94.7-security.jar,/usr/local/hive/lib/zookeeper-3.4.5.jar -hiveconf hbase.master=localhost:60000

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.7-security.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>

1 nana

2 hehe

3 xixi

hadoop dfsadmin -safemode leave

/ home/hadoop

create table hbase_table_1 (key int, value string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,cf1:val")

tblproperties ("hbase.table.name" = "xyz");

drop table pokes;

create table pokes

(id int, name string)

row format delimited fields terminated by ''

stored as textfile;

load data local inpath '/home/hadoop/kv1.txt' overwrite into table pokes;

insert into table hbase_table_1

select * from pokes;

create external table hive_test

(key int, gid map<string,string>, sid map<string,string>, uid map<string,string>)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,a:,b:,c:")

tblproperties ("hbase.table.name" = "test1");

This concludes "what is the method of data interaction between hive and hbase". Thank you for reading!
