Updated: 2025-01-17. Source: SLTechnology News&Howtos (Shulou)
Hive and HBase integration theory
1. Why should Hive integrate with HBase?
2. Advantages and disadvantages of integration
Advantages:
(1) Hive provides the HiveQL interface, which simplifies the use of MapReduce, while HBase provides low-latency database access. Combining the two makes it possible to use MapReduce to compute over and analyze, offline, the large volume of data stored in HBase.
(2) Ease of use: Hive provides a large number of built-in functions.
Disadvantages:
Performance loss: Hive can operate on HBase through SQL-like statements, but it is slow.
3. What preparation is needed for integration
4. Goals after integration
(1) Tables created in Hive are created and stored directly in HBase.
(2) Data inserted into the Hive table is synchronized to the corresponding table in HBase.
(3) When values in the mapped HBase column families change, the corresponding table in Hive reflects the change.
(4) Mapping across multiple columns and multiple column families (example: 3 columns in Hive correspond to 2 columns in HBase).
5. How do Hive and HBase communicate after integration?
[Figure: Hive and HBase communication diagram]
Hive communicates with HBase mainly through hive-hbase-handler-1.2.1.jar in Hive's lib directory.
Integration process (worked example)
The data of a table created in Hive is stored directly in HBase.
First, start Hive, enter the interactive shell, and create a table.
Hive version: apache-hive-1.2.1
HBase version: apache-hbase-1.1.2
Hadoop version: hadoop-2.7.3
First: create a table that HBase can recognize.
Table creation statement:
CREATE TABLE IF NOT EXISTS hive_hbase (
  id int,
  name string,
  age int,
  sex string,
  address string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf_info:eName,cf_info:eAge,cf_info:eSex,cf_beizhu:eAddress")
TBLPROPERTIES ("hbase.table.name" = "ns2:hive_hbase01");
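The column mapping in the statement above is positional: each Hive column, in order, is paired with one entry of "hbase.columns.mapping". The following Python sketch illustrates that pairing only. parse_mapping is a hypothetical helper, not part of Hive or HBase, and its insistence on an explicit :key entry is a simplification made for this sketch.

```python
# Sketch: pair Hive columns with the entries of "hbase.columns.mapping".
# parse_mapping is a hypothetical helper, not part of Hive or HBase.

def parse_mapping(hive_columns, mapping):
    """Return (hive_column, family, qualifier) triples; the row key has family None."""
    # Drop stray spaces so that ": key" is read the same as ":key".
    entries = [entry.replace(" ", "") for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("this sketch expects one mapping entry per Hive column")
    if entries.count(":key") != 1:
        raise ValueError("this sketch expects exactly one :key entry")
    pairs = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            pairs.append((column, None, "key"))
        else:
            family, qualifier = entry.split(":", 1)
            pairs.append((column, family, qualifier))
    return pairs


hive_columns = ["id", "name", "age", "sex", "address"]
mapping = ":key,cf_info:eName,cf_info:eAge,cf_info:eSex,cf_beizhu:eAddress"
pairs = parse_mapping(hive_columns, mapping)
for column, family, qualifier in pairs:
    target = "row key" if family is None else f"{family}:{qualifier}"
    print(f"{column:8} -> {target}")
```

Read this way, id becomes the HBase row key, name, age and sex land in column family cf_info, and address lands in cf_beizhu.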
Note: the org.apache.hadoop.hive.hbase.HBaseStorageHandler class comes from the hive-hbase-handler jar in Hive's lib directory, and that jar must be the hive-1.2.1 version. Otherwise an error is reported indicating that the class cannot be found.
Error message: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
The Hive version must not be too high either. For example, version 2.x reports an error.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Make sure that the mysql-connector jar is present in Hive's lib directory. Otherwise an error occurs.
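The jar requirements above can be checked before starting Hive. The Python sketch below takes a list of file names from Hive's lib directory and reports what is missing. check_lib is a hypothetical helper, and the file-name patterns are assumptions based on the jar names mentioned in this article.

```python
import re

# Sketch: check a listing of Hive's lib directory for the jars this article relies on.
# check_lib is a hypothetical helper; the name patterns are assumptions.

def check_lib(file_names, hive_version):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    if f"hive-hbase-handler-{hive_version}.jar" not in file_names:
        problems.append(f"hive-hbase-handler jar for Hive {hive_version} not found")
    if not any(re.match(r"mysql-connector.*\.jar$", name) for name in file_names):
        problems.append("mysql-connector jar not found")
    return problems


print(check_lib(["hive-hbase-handler-1.2.1.jar", "hive-exec-1.2.1.jar"], "1.2.1"))
```

In practice the file names would come from listing the lib directory of the Hive installation, for example with os.listdir.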
After creation, check that the table exists in HBase with the list command in the HBase shell.
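Second:
Prepare your own test data (omitted here) and create a staging table for it. The table needs the same five columns as hive_hbase so that the later insert with select * matches.
CREATE TABLE test (
  id int,
  name string,
  age int,
  sex string,
  address string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;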
Load data into the table:
LOAD DATA LOCAL INPATH '/usr/local/test01.txt' OVERWRITE INTO TABLE test;
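The article omits the test data file itself. As a minimal sketch, the Python below writes a comma-delimited file in the layout the test table expects, using the five rows that appear in the query output further down. write_test_file is a hypothetical helper, and a temporary directory stands in for /usr/local.

```python
import os
import tempfile

# Rows copied from the query output shown in this article:
# id, name, age, sex, address.
ROWS = [
    (20170616, "zhangshaoqi", 22, "nan", "jincheng"),
    (20170617, "xuqianya", 29, "nv", "beijing"),
    (20170618, "xiaolin", 29, "nv", "jincheng"),
    (20170619, "xiaopan", 33, "nan", "guizhou"),
    (20170620, "xiaohu", 26, "nan", "shouzhou"),
]


def write_test_file(path, rows):
    """Write one line per row, fields joined by ',' and lines ended by a newline."""
    with open(path, "w", newline="\n") as handle:
        for row in rows:
            handle.write(",".join(str(field) for field in row) + "\n")


# A temporary directory is used here; the article loads /usr/local/test01.txt.
path = os.path.join(tempfile.mkdtemp(), "test01.txt")
write_test_file(path, ROWS)
with open(path) as handle:
    content = handle.read()
print(content, end="")
```

Fields are joined with plain commas and no quoting, so the values themselves must not contain commas.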
Insert the data into the HBase-backed table from a query result (LOAD DATA cannot target a table that uses a storage handler):
INSERT OVERWRITE TABLE hive_hbase SELECT * FROM test;
This runs a MapReduce job. The job output is omitted.
Third: query the data that was inserted into HBase
SELECT * FROM hive_hbase;
20170616,zhangshaoqi,22,nan,jincheng
20170617,xuqianya,29,nv,beijing
20170618,xiaolin,29,nv,jincheng
20170619,xiaopan,33,nan,guizhou
20170620,xiaohu,26,nan,shouzhou
1 row (s) in 3.19 seconds
Fourth: scan the table in HBase to check that the data is there
scan 'ns2:hive_hbase01'
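What the scan shows differs in shape from the Hive query: each Hive row becomes one HBase row whose cells are addressed by row key and family:qualifier. The Python sketch below expands one sample row into such cells. row_to_cells is a hypothetical helper, and storing every value as text is an assumption about the handler's default storage type.

```python
# Sketch: the cells one Hive row turns into under this article's column mapping.
# COLUMNS follows "hbase.columns.mapping" without the :key entry.
COLUMNS = ["cf_info:eName", "cf_info:eAge", "cf_info:eSex", "cf_beizhu:eAddress"]


def row_to_cells(row):
    """Return (row_key, column, value) triples; the first field becomes the row key."""
    # Assumption: values are stored as text, so every field is converted to str.
    row_key, *values = [str(field) for field in row]
    return [(row_key, column, value) for column, value in zip(COLUMNS, values)]


cells = row_to_cells((20170616, "zhangshaoqi", 22, "nan", "jincheng"))
for row_key, column, value in cells:
    print(row_key, column, value)
```

The real scan output also carries a timestamp per cell, which this sketch leaves out.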
Fifth: access an existing HBase table from Hive
The Hive table must be created as an external table (EXTERNAL), otherwise an error is reported.
CREATE EXTERNAL TABLE hbase_table_3 (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")
TBLPROPERTIES ("hbase.table.name" = "student");
hive> CREATE EXTERNAL TABLE hbase_table_3 (key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")
    > TBLPROPERTIES ("hbase.table.name" = "student");
OK
Time taken: 1.21 seconds
Note: if the data in the mapped column info:name changes in HBase, the query results in Hive change accordingly. If other columns or column families in HBase that are not mapped are updated, those changes do not appear in Hive query results.
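The note above can be illustrated with a small in-memory model. In the Python sketch below, a dict stands in for the HBase table student and hive_view returns the rows that a two-column Hive table mapped to info:name would expose. The row key, the cell values and the info:age column are made up for illustration, and hive_view is a hypothetical helper.

```python
# Sketch: an external Hive table only exposes the HBase columns named in its mapping.
# The dict stands in for the HBase table "student"; its contents are made up.
student = {
    "1001": {"info:name": "xiaolin", "info:age": "29"},
}
MAPPED_COLUMN = "info:name"  # from "hbase.columns.mapping" = "info:name"


def hive_view(table):
    """Return (key, value) rows as the two-column Hive table would expose them."""
    return [(int(key), cells.get(MAPPED_COLUMN)) for key, cells in sorted(table.items())]


before = hive_view(student)
student["1001"]["info:name"] = "xiaopan"  # mapped column: the change is visible
student["1001"]["info:age"] = "30"        # unmapped column: the change is not visible
after = hive_view(student)
print(before)
print(after)
```

Only the change to info:name shows up in the second result; the update to info:age has no effect on what the Hive table returns.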
That's all. If you have any questions, feel free to discuss them.