This article introduces "what is the method of data interaction between hive and hbase". Many people run into difficulties with this in real-world cases, so let the editor walk you through how to handle these situations. I hope you can read it carefully and get something out of it!
The Integration principle of HBase and Hive
Hive and HBase have different characteristics: Hive is high-latency, structured, and analysis-oriented, while HBase is low-latency, unstructured, and programming-oriented. As a data warehouse on Hadoop, Hive has high latency; by integrating HBase, Hive can take advantage of some of HBase's features. The integrated architecture of Hive and HBase is shown below:
Figure 1 hive and hbase architecture diagram
Integrating HBase lets Hive make effective use of HBase's storage features, such as row updates and column indexes. During integration, keep the HBase jar package versions consistent. The integration requires establishing a mapping between the Hive table and the HBase table: the columns and column types of the Hive table are associated with the column families and column qualifiers of the HBase table. Every field in the Hive table must exist in HBase, but the Hive table does not need to contain all the columns in HBase. The HBase RowKey maps to a chosen Hive field via the fixed token :key, and every other Hive field maps either to a whole column family (cf:) or to a specific column (cf:cq). For example, figure 2 below shows a Hive table mapped to an HBase table:
Figure 2 Hive table mapping HBase table
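As a minimal sketch of what such a mapping looks like in Hive DDL (the table name, column family cf, and qualifier cq here are hypothetical placeholders, not from the original setup):
CREATE TABLE mapped_example (key int, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:cq")
TBLPROPERTIES ("hbase.table.name" = "example");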
Basic introduction
Hive is a data warehouse tool based on Hadoop. It maps structured data files onto database tables, provides full SQL query capability, and translates SQL statements into MapReduce jobs to run. Its advantage is a low learning curve: simple MapReduce statistics can be produced quickly through SQL-like statements, with no need to develop a dedicated MapReduce application, which makes it well suited to statistical analysis in a data warehouse.
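For example, a SQL-like statistic of this kind is compiled into a MapReduce job behind the scenes (a sketch; the page_views table and its columns are hypothetical):
SELECT page, count(*) AS visits
FROM page_views
GROUP BY page;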
The integration of Hive and HBase is implemented by having each side communicate through the other's external API, relying mainly on the hive-hbase-handler.jar tool class, roughly as shown in the figure:
Software version
Software versions used (if a download link is missing, search Baidu for it):
jdk-6u24-linux-i586.bin
hive-0.9.0.tar.gz http://yun.baidu.com/share/link?shareid=2138315647&uk=1614030671
hbase-0.94.14.tar.gz http://mirror.bit.edu.cn/apache/hbase/hbase-0.94.14/
hadoop-1.1.2.tar.gz http://pan.baidu.com/s/1mgmKfsG
Installation location
Install directory: /usr/local/ (remember to unzip and rename)
The installation path of HBase is: /usr/local/hbase
The installation path of Hive is: /usr/local/hive
Integration steps
The process of integrating hive with hbase is as follows:
1. Under /usr/local/hbase:
Copy hbase-0.94.14.jar, hbase-0.94.14-tests.jar, and lib/zookeeper-3.4.5.jar to the /usr/local/hive/lib folder.
Note: if other versions of these files already exist under hive/lib (for example, zookeeper-3.3.1.jar), delete them first and use the versions shipped with HBase.
You also need to copy protobuf-java-2.4.0a.jar to /usr/local/hive/lib and /usr/local/hadoop/lib.
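Sketched as shell commands, assuming the install paths stated above:
cd /usr/local/hbase
cp hbase-0.94.14.jar hbase-0.94.14-tests.jar lib/zookeeper-3.4.5.jar /usr/local/hive/lib/
cp lib/protobuf-java-2.4.0a.jar /usr/local/hive/lib/
cp lib/protobuf-java-2.4.0a.jar /usr/local/hadoop/lib/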
2. Modify the hive-site.xml file
In the /usr/local/hive/conf directory, add the following at the bottom of hive-site.xml (to jump to the bottom of the file in vi: press ESC, then type :$ and press Enter). Adjust the jar names in hive.aux.jars.path to the versions you actually copied:
<property>
<name>hive.querylog.location</name>
<value>/usr/local/hive/logs</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.14.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>
Note: create the file if it doesn't exist, or rename and reuse an existing one, and copy it to all the nodes involved.
Note that the following error is likely to occur when running hive if you skip steps 3 and 4:
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(...)
5. Start hive (tested successfully)
Single-node startup:
bin/hive -hiveconf hbase.master=master:60000
Cluster startup (I didn't test this):
bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3 (list all the zookeeper nodes)
If hive.aux.jars.path is not configured in the hive-site.xml file, you can start it like this instead:
hive --auxpath /opt/mapr/hive/hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,/opt/mapr/hive/hive-0.7.1/lib/hbase-0.90.4.jar,/opt/mapr/hive/hive-0.7.1/lib/zookeeper-3.3.2.jar -hiveconf hbase.master=localhost:60000
After testing, the modified hive configuration file hive-site.xml contains:
<property>
<name>hive.zookeeper.quorum</name>
<value>node1,node2,node3</value>
<description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>
With this in place, Hive can connect to HBase without passing extra parameters at startup.
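As a quick sanity check (a minimal sketch, assuming the configuration above and a running HBase cluster):
bin/hive
hive> show tables;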
Test: Hive to HBase
Test after startup (restart the cluster first).
1. Create a table in Hive that HBase can recognize
The statement is as follows:
CREATE TABLE hbase_table_1 (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
Now you will find a new table 'xyz' in the hbase shell.
(You can skip this for now: hbase.table.name defines the table name in HBase. With multiple columns the mapping looks like data:1,data:2; with multiple column families, data1:1,data2:1.)
hbase.columns.mapping defines the mapping to HBase column families and qualifiers; :key is a fixed token for the row key, so make sure the foo field in the pokes table holds unique values. Both multi-column cases are sketched below.
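A hedged sketch of the multi-column and multi-column-family cases from the note above (table and column names here are hypothetical):
-- multiple columns in one column family
CREATE TABLE hbase_multi_col (key int, c1 string, c2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:1,data:2")
TBLPROPERTIES ("hbase.table.name" = "multi_col");
-- multiple column families
CREATE TABLE hbase_multi_cf (key int, f1 string, f2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data1:1,data2:1")
TBLPROPERTIES ("hbase.table.name" = "multi_cf");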
Create a partitioned table:
CREATE TABLE hbase_table_1 (key int, value string)
PARTITIONED BY (day string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
Table modification is not supported
You will be told that a non-native table cannot be modified:
hive> ALTER TABLE hbase_table_1 ADD PARTITION (day = '2012-09-22');
FAILED: Error in metadata: Cannot use ALTER TABLE on a non-native table
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
2. Import data into the associated HBase table
First, create a new intermediate table in Hive:
CREATE TABLE pokes (foo int, bar string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Batch import data:
LOAD DATA LOCAL INPATH '/home/1.txt' OVERWRITE INTO TABLE pokes;
The contents of the 1.txt file are
1,hello
2,pear
3,world
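A minimal sketch for producing this sample file (assuming you can write to /home):
printf '1,hello\n2,pear\n3,world\n' > /home/1.txt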
Import into hbase_table_1 using SQL:
SET hive.hbase.bulk=true;
Then insert the data into the HBase table:
INSERT OVERWRITE TABLE hbase_table_1
SELECT * FROM pokes;
Import into the table with partitions:
INSERT OVERWRITE TABLE hbase_table_1 PARTITION (day='2012-01-01')
SELECT * FROM pokes;
3. View the table associated with HBase
hive> select * from hbase_table_1;
OK
1 hello
2 pear
3 world
(Note: for partitioned tables integrated with HBase there is a known problem: select * from table returns no data, but select key,value from table does.)
4. Log in to hbase and view the data in the table
hbase shell
hbase(main):002:0> describe 'xyz'
DESCRIPTION ENABLED
{NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0830 seconds
hbase(main):003:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1331002501432, value=hello
2 column=cf1:val, timestamp=1331002501432, value=pear
3 column=cf1:val, timestamp=1331002501432, value=world
At this point you can see in HBase the data just inserted from Hive.
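To check a single row you can also use get (a sketch; the exact output format varies by HBase version):
hbase(main):004:0> get 'xyz', '1'
COLUMN CELL
 cf1:val timestamp=1331002501432, value=hello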
Test: HBase to Hive
1. Create a table in hbase
create 'test1','a','b','c'
put 'test1','1','a','qqq'
put 'test1','1','b','aaa'
put 'test1','1','c','bbb'
put 'test1','2','a','qqq'
put 'test1','2','c','bbb'
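You can verify the puts with a scan before wiring up Hive (a sketch; timestamps and formatting are approximate):
hbase(main):007:0> scan 'test1'
ROW COLUMN+CELL
 1 column=a:, timestamp=..., value=qqq
 1 column=b:, timestamp=..., value=aaa
 1 column=c:, timestamp=..., value=bbb
 2 column=a:, timestamp=..., value=qqq
 2 column=c:, timestamp=..., value=bbb
2 row(s) in ... seconds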
2. Associate the HBase table with Hive
For tables that already exist in HBase, use CREATE EXTERNAL TABLE in Hive.
For example, if the table name in HBase is test1 and the fields are a:, b:, c:, the Hive table statement is:
CREATE EXTERNAL TABLE hive_test
(key int, gid map<string,string>, sid map<string,string>, uid map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "a:,b:,c:")
TBLPROPERTIES ("hbase.table.name" = "test1");
3. Check the data in test1
After the table is created in Hive, query the contents of the test1 table from HBase:
select * from hive_test;
OK
1 {"":"qqq"} {"":"aaa"} {"":"bbb"}
2 {"":"qqq"} {} {"":"bbb"}
The way to query the value in the gid field is:
select gid[''] from hive_test;
Get the query results
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201203052222_0017, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203052222_0017
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201203052222_0017
2012-03-06 14:38:29,141 Stage-1 map = 0%, reduce = 0%
2012-03-06 14:38:33,171 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203052222_0017
OK
qqq
qqq
If the fields in the HBase table test1 are user:gid, user:sid, info:uid, info:level, the Hive table statement is:
CREATE EXTERNAL TABLE hive_test
(key int, user map<string,string>, info map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "user:,info:")
TBLPROPERTIES ("hbase.table.name" = "test1");
The way to query the hbase table is:
select user['gid'] from hive_test;
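Other qualifiers map the same way; a hedged sketch combining the fields named above:
select key, user['gid'], user['sid'], info['uid'], info['level'] from hive_test;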
Note: to optimize the Hive-to-HBase connection, add the following configuration to the hbase-site.xml file in HADOOP_HOME/conf:
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
Or run hive> set hbase.client.scanner.caching=10000; before executing the hive statement.
Error report: Hive errors
1. NoClassDefFoundError
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable
Add protobuf-*.jar to the jars path:
// $HIVE_HOME/conf/hive-site.xml
<property>
<name>hive.aux.jars.path</name>
<value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
</property>
HBase error:
java.lang.NoClassDefFoundError: com/google/protobuf/Message
When writing an HBase program, the system reports the error java.lang.NoClassDefFoundError: com/google/protobuf/Message.
After searching for a long time, I found the answer here: http://abloz.com/2012/06/15/hive-execution-hbase-create-the-table-can-not-find-protobuf.html
The content is as follows:
Hadoop:1.0.3
Hive:0.9.0
Hbase:0.94.0
Protobuf:$HBASE_HOME/lib/protobuf-java-2.4.0a.jar
As you can see, the HBase jar bundled with Hive 0.9.0 is version 0.92.
[zhouhh@Hadoop48 ~]$ hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HIVE_HOME/lib/hbase-0.92.0.jar,$HIVE_HOME/lib/zookeeper-3.3.4.jar,$HIVE_HOME/lib/guava-r09.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar
hive> CREATE TABLE hbase_table_1 (key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz");
java.lang.NoClassDefFoundError: com/google/protobuf/Message
at org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java)
...
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.Message
Solution:
Copy $HBASE_HOME/lib/protobuf-java-2.4.0a.jar to $HIVE_HOME/lib/:
[zhouhh@Hadoop48 ~]$ cp /home/zhouhh/hbase-0.94.0/lib/protobuf-java-2.4.0a.jar $HIVE_HOME/lib/
hive> CREATE TABLE hbase_table_1 (key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 10.492 seconds
hbase(main):002:0> list 'xyz'
TABLE
xyz
1 row(s) in 0.0640 seconds
Alternatively, you can include protobuf-java-2.4.0a.jar in the list of referenced jars (in hive.aux.jars.path or via --auxpath).
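For example, using the paths from this article's setup (a sketch):
hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,/usr/local/hive/lib/hbase-0.94.7-security.jar,/usr/local/hive/lib/zookeeper-3.4.5.jar,/usr/local/hive/lib/protobuf-java-2.4.0a.jar -hiveconf hbase.master=localhost:60000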
Test script
bin/hive -hiveconf hbase.master=master:60000
hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,/usr/local/hive/lib/hbase-0.94.7-security.jar,/usr/local/hive/lib/zookeeper-3.4.5.jar -hiveconf hbase.master=localhost:60000
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.7-security.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>
1 nana
2 hehe
3 xixi
hadoop dfsadmin -safemode leave
/home/hadoop
CREATE TABLE hbase_table_1 (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
DROP TABLE pokes;
CREATE TABLE pokes
(id int, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/hadoop/kv1.txt' OVERWRITE INTO TABLE pokes;
INSERT INTO TABLE hbase_table_1
SELECT * FROM pokes;
CREATE EXTERNAL TABLE hive_test
(key int, gid map<string,string>, sid map<string,string>, uid map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "a:,b:,c:")
TBLPROPERTIES ("hbase.table.name" = "test1");
This is the end of "what is the method of data interaction between hive and hbase". Thank you for reading. If you want to learn more about the industry, you can follow the site, where the editor will keep publishing high-quality practical articles for you!