I. The Combination of Hive and HBase
Hive is often used together with HBase, with HBase serving as the storage layer for Hive tables, so integrating the two is particularly important. Through Hive you can read data stored in HBase: HQL statements can query and insert into HBase tables, and even run complex queries such as joins and unions. This feature has been available since Hive 0.6.0. The integration is implemented through the external API interfaces of both systems and depends mainly on the classes in the hive-hbase-handler-*.jar tool. Note that operating on HBase tables through Hive is only a convenience: the HiveQL engine runs MapReduce jobs, so performance is not impressive.
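Once an HBase table has been mapped into Hive (as the following sections show), it can be combined with native Hive tables using ordinary HQL. A minimal sketch of such a join, assuming a native table courses and an HBase-backed table course_scores, both hypothetical and not part of this walkthrough:

-- courses is a native Hive table, course_scores an HBase-backed one (both hypothetical)
SELECT c.cname, c.teacher, s.score
FROM courses c
JOIN course_scores s ON (c.cname = s.cname);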
Steps:
1. Copy the HBase-related jar packages to hive/lib, as follows:
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-protocol-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-server-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-client-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5.jar /opt/hive/lib/
[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5-tests.jar /opt/hive/lib/
[hadoop@bus-stable hive]$
2. Reference HBase in the hive-site.xml file by adding the following:
[hadoop@bus-stable hive]$ vim /opt/hive/conf/hive-site.xml
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/lib/hive-hbase-handler-2.3.3.jar,file:///opt/hive/lib/hbase-protocol-1.4.5.jar,file:///opt/hive/lib/hbase-server-1.4.5.jar,file:///opt/hive/lib/hbase-client-1.4.5.jar,file:///opt/hive/lib/hbase-common-1.4.5.jar,file:///opt/hive/lib/hbase-common-1.4.5-tests.jar,file:///opt/hive/lib/zookeeper-3.4.6.jar,file:///opt/hive/lib/guava-14.0.1.jar</value>
  <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>open-stable,permission-stable,sp-stable</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
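Besides hive.aux.jars.path, the handler jars can also be registered per session from the Hive CLI with ADD JAR; a sketch using the paths copied in step 1 (session-scoped, so it must be repeated in each new session):

ADD JAR /opt/hive/lib/hive-hbase-handler-2.3.3.jar;
ADD JAR /opt/hive/lib/hbase-client-1.4.5.jar;
ADD JAR /opt/hive/lib/hbase-server-1.4.5.jar;
ADD JAR /opt/hive/lib/hbase-common-1.4.5.jar;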
3. Start Hive:
[hadoop@bus-stable hive]$ hive --hiveconf hbase.master=oversea-stable:60000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Spark, tez) or using Hive 1.x releases.
hive> create table htest (key int, value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ('hbase.columns.mapping' = ':key,f:value') tblproperties ('hbase.table.name' = 'htest');
OK
Time taken: 9.376 seconds
hive> show databases;
OK
default
inspiry
Time taken: 0.121 seconds, Fetched: 2 row(s)
hive> show tables;
OK
htest
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select * from htest;
OK
Time taken: 1.967 seconds
hive>
4. Verify the table in HBase:
[hadoop@oversea-stable opt]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-1.4.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.5, rca99a9466415dc4cfc095df33efb45cb82fe5480, Wed Jun 13 15:13:00 EDT 2018

hbase(main):001:0> list
TABLE
htest
1 row(s) in 0.2970 seconds
=> ["htest"]
hbase(main):002:0> scan "htest"
ROW    COLUMN+CELL
0 row(s) in 0.1410 seconds
hbase(main):003:0>
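The scan shows the mapping works, but the table is still empty. As a quick end-to-end check, not part of the original session, you could insert a row from the Hive side (a sketch; INSERT ... VALUES needs Hive 0.14 or later) and scan again in the HBase shell, where the row should appear with row key 1 and column f:value per the declared mapping:

hive> insert into table htest values (1, 'hello');
hbase(main):003:0> scan "htest"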
II. Import External Data
(1) The data file is as follows:
[hadoop@bus-stable ~]$ cat score.csv
hive,85
hbase,90
hadoop,92
flume,89
kafka,95
spark,80
storm,70
[hadoop@bus-stable ~]$ hadoop fs -put score.csv /data/score.csv
[hadoop@bus-stable ~]$ hadoop fs -ls /data/
Found 2 items
-rw-r--r--   3 hadoop supergroup      88822 2018-06-15 10:32 /data/notepad.txt
-rw-r--r--   3 hadoop supergroup         70 2018-06-26 15:59 /data/score.csv
[hadoop@bus-stable ~]$
(2) Create an external table
Create a Hive external table over the existing data on HDFS:
hive> create external table if not exists course.testcourse (cname string, score int) row format delimited fields terminated by ',' stored as textfile location '/data';
OK
Time taken: 0.282 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.013 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testcourse
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select * from testcourse;
OK
hive    85
hbase   90
hadoop  92
flume   89
kafka   95
spark   80
storm   70
Time taken: 2.272 seconds, Fetched: 7 row(s)
hive>
III. Use an HQL Statement to Create an HBase Table
Use an HQL statement to create a Hive table that points to an HBase table. The syntax is as follows:
CREATE TABLE tbl_name (key int, value string)                    -- tbl_name is the table name in Hive
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'     -- specifies the storage handler
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")  -- declares the column family and column name
TBLPROPERTIES ("hbase.table.name" = "tbl_name",
               "hbase.mapred.output.outputtable" = "iteblog");

hbase.table.name declares the table name on the HBase side; it is optional and defaults to the same name as the Hive table. hbase.mapred.output.outputtable specifies the table written to when data is inserted; if you plan to insert data into the table later, this value must be set.
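As a concrete sketch of this syntax with hypothetical names, a table whose columns map into two different column families could be declared like this:

CREATE TABLE hbase_students (key int, name string, score int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,grades:score")
TBLPROPERTIES ("hbase.table.name" = "hbase_students",
               "hbase.mapred.output.outputtable" = "hbase_students");

The entries in hbase.columns.mapping are positional: :key binds the first Hive column to the HBase row key, and each subsequent family:qualifier pair binds the next Hive column in order.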
(1) The create statement used here is as follows:
hive> create table course.hbase_testcourse (cname string, score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES ("hbase.table.name" = "hbase_testcourse", "hbase.mapred.output.outputtable" = "hbase_testcourse");
OK
Time taken: 3.745 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.019 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hbase_testcourse
testcourse
Time taken: 0.025 seconds, Fetched: 2 row(s)
hive> select * from hbase_testcourse;
OK
Time taken: 1.883 seconds
hive>
(2) After the internal table has been created, you can import data from another table into HBase through Hive's insert overwrite:
hive> insert overwrite table course.hbase_testcourse select cname, score from course.testcourse;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180626170540_c7eecb8d-2925-4ad2-be7f-237d9815d1cb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1529932626564_0002, Tracking URL = http://oversea-stable:8088/proxy/application_1529932626564_0002/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1529932626564_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-06-26 17:06:02,793 Stage-3 map = 0%, reduce = 0%
2018-06-26 17:06:.. Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.12 sec
MapReduce Total cumulative CPU time: 6 seconds 120 msec
Ended Job = job_1529932626564_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 6.12 sec   HDFS Read: 4224 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 120 msec
OK
Time taken: 41.489 seconds
hive> select * from hbase_testcourse;
OK
flume   89
hadoop  92
hbase   90
hive    85
kafka   95
spark   80
storm   70
Time taken: 0.201 seconds, Fetched: 7 row(s)
hive>
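With the rows in place, ordinary HQL aggregates also run against the HBase-backed table, executed as MapReduce jobs just like the insert above. A small sketch, not part of the original session:

hive> select avg(score) from course.hbase_testcourse;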
(3) Verify in HBase:
hbase(main):011:0> list
TABLE
hbase_testcourse
htest
2 row(s) in 0.0110 seconds
=> ["hbase_testcourse", "htest"]
hbase(main):012:0> scan "hbase_testcourse"
ROW        COLUMN+CELL
 flume     column=cf:score, timestamp=1530003973026, value=89
 hadoop    column=cf:score, timestamp=1530003973026, value=92
 hbase     column=cf:score, timestamp=1530003973026, value=90
 hive      column=cf:score, timestamp=1530003973026, value=85
 kafka     column=cf:score, timestamp=1530003973026, value=95
 spark     column=cf:score, timestamp=1530003973026, value=80
 storm     column=cf:score, timestamp=1530003973026, value=70
7 row(s) in 0.0760 seconds
hbase(main):013:0>
IV. Use Hive to Map an Existing HBase Table
(1) Create a table in HBase. Enter the HBase shell client and execute the create command:
hbase(main):036:0> create 'hbase_test', {NAME => 'cf'}
0 row(s) in 2.2830 seconds
=> Hbase::Table - hbase_test
(2) Insert data:
hbase(main):037:0> put 'hbase_test', 'hadoop', 'cf:score', '95'
0 row(s) in 0.1110 seconds
hbase(main):038:0> put 'hbase_test', 'storm', 'cf:score', '96'
0 row(s) in 0.0120 seconds
hbase(main):039:0> put 'hbase_test', 'spark', 'cf:score', '97'
0 row(s) in 0.0110 seconds
(3) View the data:
hbase(main):041:0> scan "hbase_test"
ROW        COLUMN+CELL
 hadoop    column=cf:score, timestamp=1530004351399, value=95
 spark     column=cf:score, timestamp=1530004365368, value=97
 storm     column=cf:score, timestamp=1530004359169, value=96
3 row(s) in 0.0220 seconds
hbase(main):042:0>
(4) Enter the Hive shell client and create an external table course.hbase_test. The create command is as follows:
hive> create external table course.hbase_test (cname string, score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES ("hbase.table.name" = "hbase_test", "hbase.mapred.output.outputtable" = "hbase_test");
OK
Time taken: 0.221 seconds
hive> show tables;
OK
hbase_test
hbase_testcourse
testcourse
Time taken: 0.024 seconds, Fetched: 3 row(s)

Note: the commands for creating external and internal tables are almost identical; the difference is that an internal table is created with create table, while an external table is created with create external table.

View the data in Hive:

hive> select * from hbase_test;
OK
hadoop  95
spark   97
storm   96
Time taken: 0.22 seconds, Fetched: 3 row(s)
hive>
Because this Hive table is external, dropping it does not delete the data in the underlying HBase table (see the sketch after this list). A few points to note:
a) If a row key is not specified when creating or mapping the table, the first column is used as the row key by default.
b) The Hive table mapped onto HBase has no notion of timestamps; the latest version of each value is returned by default.
c) Since HBase carries no data type information, values are converted to strings when they are stored.
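A minimal sketch of the external-table behavior from the note above, using the tables built in this section: dropping the external table removes only the Hive metadata, and a subsequent scan in the HBase shell still returns the three rows inserted earlier.

hive> drop table course.hbase_test;
hbase(main):043:0> scan "hbase_test"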
V. Use Java to Connect to Hive and Operate HBase
pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>cn.itcast.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.4</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.4.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-server</artifactId>
      <version>1.4.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>1.2.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>1.2.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.2.1</version>
    </dependency>
  </dependencies>
</project>
Hive_Hbase.java:

package cn.itcast.bigdata.hbase;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Hive_Hbase {
    public static void main(String[] args) {
        try {
            // Register the HiveServer2 JDBC driver
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Connect to HiveServer2 on hadoop1:10000, database shizhan02, as user hadoop
            Connection connection = DriverManager.getConnection(
                    "jdbc:hive2://hadoop1:10000/shizhan02", "hadoop", "");
            Statement statement = connection.createStatement();
            // Query the Hive table that is mapped onto HBase
            String sql = "SELECT * FROM hive_hbase_table_kv";
            ResultSet res = statement.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(2)); // print the second column (the value)
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }
}
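A note on the example above: the JDBC URL assumes a HiveServer2 instance listening on hadoop1:10000 and an existing database shizhan02 containing the table hive_hbase_table_kv; these names come from the original environment and must be adapted to your own. If HiveServer2 is not already running, it can typically be started on the Hive host with:

[hadoop@bus-stable hive]$ hive --service hiveserver2 &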