This article introduces the methods of importing Hive data into HBase. It goes into a fair amount of detail and should be a useful reference; if you are interested, read it through to the end.
There are basically two ways to import Hive data into HBase:
1. Create a table in HBase and an external table in Hive mapped to it, so that data written through Hive is stored in HBase at the same time.
2. Use MapReduce to read the Hive data and write it to HBase (through the API or via Bulkload).
1. Hive external table
Create an HBase table
(1) Create a table classes with one column family, user
create 'classes','user'
(2) View the table's schema
hbase(main):005:0> describe 'classes'
DESCRIPTION                                                                      ENABLED
 'classes', {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', true
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
 TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
(3) Add two rows of data
put 'classes','001','user:name','jack'
put 'classes','001','user:age','20'
put 'classes','002','user:name','liza'
put 'classes','002','user:age','18'
(4) View the data in classes
hbase(main):016:0> scan 'classes'
ROW                COLUMN+CELL
 001               column=user:age, timestamp=1404980824151, value=20
 001               column=user:name, timestamp=1404980772073, value=jack
 002               column=user:age, timestamp=1404980963764, value=18
 002               column=user:name, timestamp=1404980953897, value=liza
(5) Create an external Hive table, then query to verify
create external table classes (id int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age")
TBLPROPERTIES ("hbase.table.name" = "classes");

select * from classes;
OK
1    jack    20
2    liza    18
(6) Add data to HBase
put 'classes','003','user:age','1820183291839132'

hbase(main):025:0> scan 'classes'
ROW                COLUMN+CELL
 001               column=user:age, timestamp=1404980824151, value=20
 001               column=user:name, timestamp=1404980772073, value=jack
 002               column=user:age, timestamp=1404980963764, value=18
 002               column=user:name, timestamp=1404980953897, value=liza
 003               column=user:age, timestamp=1404981476497, value=1820183291839132
(7) Query in Hive to see the new data
select * from classes;
OK
1    jack    20
2    liza    18
3    NULL    NULL
Row 3 returns NULL for name because row key 003 has no user:name cell, and NULL for age because the value exceeds the range of Hive's int type.
(8) Further verification
put 'classes','004','user:name','test'
put 'classes','004','user:age','1820183291839112312'    # exceeds the int range

hbase(main):030:0> scan 'classes'
ROW                COLUMN+CELL
 001               column=user:age, timestamp=1404980824151, value=20
 001               column=user:name, timestamp=1404980772073, value=jack
 002               column=user:age, timestamp=1404980963764, value=18
 002               column=user:name, timestamp=1404980953897, value=liza
 003               column=user:age, timestamp=1404981476497, value=1820183291839132
 004               column=user:age, timestamp=1404981558125, value=1820183291839112312
 004               column=user:name, timestamp=1404981551508, value=test

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL

put 'classes','005','user:age','1231342'

hbase(main):034:0> scan 'classes'
ROW                COLUMN+CELL
 001               column=user:age, timestamp=1404980824151, value=20
 001               column=user:name, timestamp=1404980772073, value=jack
 002               column=user:age, timestamp=1404980963764, value=18
 002               column=user:name, timestamp=1404980953897, value=liza
 003               column=user:age, timestamp=1404981476497, value=1820183291839132
 004               column=user:age, timestamp=1404981558125, value=1820183291839112312
 004               column=user:name, timestamp=1404981551508, value=test
 005               column=user:age, timestamp=1404981720600, value=1231342

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL
5    NULL    1231342
Note:
1. An empty cell in HBase shows up as NULL in Hive.
2. A field whose value cannot be converted to the mapped Hive column type also shows up as NULL.
3. For binary (Bytes-encoded) data, append #b to the column mapping when creating the Hive table; see:
http://stackoverflow.com/questions/12909118/number-type-value-in-hbase-not-recognized-by-hive
http://www.aboutyun.com/thread-8023-1-1.html
4. Mapping an entire HBase column family to a Hive map type; see:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
2. MapReduce writes to HBase
There are two common ways for MapReduce to write to HBase: (1) call the HBase API directly, using Table and Put to write row by row; (2) have the MR job generate HFiles and then Bulkload them into HBase, which is the recommended approach for large data volumes. A sketch of approach (1) follows.
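A minimal sketch of approach (1), assuming \001-delimited Hive text files as input and the classes table with the user column family from the earlier example; the class names, field layout, and row-key choice are illustrative, not taken from the original article.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class Hive2HBaseApiWriter {

    // Each input line is one Hive row; fields are separated by \001
    public static class PutMapper extends Mapper<LongWritable, Text, NullWritable, Put> {
        private static final byte[] CF = Bytes.toBytes("user");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\001");
            if (fields.length < 3) {
                return;                                        // skip malformed rows
            }
            Put put = new Put(Bytes.toBytes(fields[0]));       // first field as the row key
            put.addColumn(CF, Bytes.toBytes("name"), Bytes.toBytes(fields[1]));
            put.addColumn(CF, Bytes.toBytes("age"), Bytes.toBytes(fields[2]));
            context.write(NullWritable.get(), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, "classes");   // target HBase table
        Job job = Job.getInstance(conf, "hive2hbase-api");
        job.setJarByClass(Hive2HBaseApiWriter.class);
        job.setMapperClass(PutMapper.class);
        job.setNumReduceTasks(0);                              // map-only job
        job.setOutputFormatClass(TableOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));  // Hive table directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Writes here go through the RegionServers one Put at a time, which is why the Bulkload path is preferred when the volume is large.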
Note:
1. What to do if you need to read values (such as partition fields) from the Hive file path
private String reg = "stat_date=(.*?)\\/softid=([\\d]+)/";
private String stat_date;
private String softid;

// In the map function:
String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
// e.g. /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0

// parse stat_date and softid from the path
Pattern pattern = Pattern.compile(reg);
Matcher matcher = pattern.matcher(filePathString);
while (matcher.find()) {
    stat_date = matcher.group(1);
    softid = matcher.group(2);
}
2. How to handle map and list types in Hive
Hive mainly uses eight delimiters, \001 through \008.
The defaults are ^A (\001) for fields, ^B (\002) for collection items, and ^C (\003) for map keys and values.
A list stored in Hive is therefore laid out on disk like jerrick,liza,tom,jerry (with \002 between elements), and a map like jerrick:23,liza:18,tom:0 (with \002 between entries and \003 between key and value).
So a little processing is needed when MR reads the data. For a map field, for example: "{" + mapkey.replace("\002", ",").replace("\003", ":") + "}", which can then be parsed into JSON and written to HBase after toString(). A small standalone illustration follows.
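As a quick illustration of that replacement (the field value below is made up, and org.json is used for parsing, as in the full example later):

import org.json.JSONObject;

public class HiveMapFieldDemo {
    public static void main(String[] args) {
        // A Hive map field as it appears in the raw file:
        // entries separated by \002, key and value separated by \003
        String mapField = "jerrick\00323\002liza\00218\002tom\0030";
        String jsonText = "{" + mapField.replace("\002", ",").replace("\003", ":") + "}";
        JSONObject json = new JSONObject(jsonText);   // org.json tolerates the unquoted keys
        System.out.println(json.toString());          // e.g. {"tom":0,"jerrick":23,"liza":18}
    }
}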
3. A simple example; a lot of code has been removed, so treat it as a reference only.
public void map(LongWritable key, Text value, Mapper.Context context) {
    String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
    // e.g. /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0

    // parse stat_date and softid from the path
    Pattern pattern = Pattern.compile(reg);
    Matcher matcher = pattern.matcher(filePathString);
    while (matcher.find()) {
        stat_date = matcher.group(1);
        softid = matcher.group(2);
    }
    rowMap.put("stat_date", stat_date);
    rowMap.put("softid", softid);

    String[] vals = value.toString().split("\001");
    try {
        Configuration conf = context.getConfiguration();
        String cf = conf.get("hbase.table.cf", HBASE_TABLE_COLUME_FAMILY);

        String arow = rowkey;
        for (int index = 10; index < vals.length; index++) {
            byte[] row = Bytes.toBytes(arow);
            ImmutableBytesWritable k = new ImmutableBytesWritable(row);
            KeyValue kv = new KeyValue();
            if (index == vals.length - 1) {
                // the last field is a dict (Hive map); convert it to JSON
                logger.info("d is:" + vals[index]);
                logger.info("d is:" + "{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                JSONObject json = new JSONObject("{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(json.toString()));
            } else {
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(vals[index]));
            }
            context.write(k, kv);
        }
    } catch (Exception e1) {
        context.getCounter("offile2HBase", "Map ERROR").increment(1);
        logger.info("map error:" + e1.toString());
    }
    context.getCounter("offile2HBase", "Map TOTAL").increment(1);
}
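The mapper above emits ImmutableBytesWritable / KeyValue pairs, which is the shape HFileOutputFormat2 expects. The original article does not show the driver, so the following is only a rough sketch of how such a mapper might be wired up for HFile generation (the class names, table name, and paths are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class File2HBaseDriver {                               // hypothetical driver class
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hive2hbase-bulkload");
        job.setJarByClass(File2HBaseDriver.class);
        job.setMapperClass(File2HBaseMapper.class);           // the mapper sketched above (hypothetical name)
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);

        FileInputFormat.addInputPath(job, new Path(args[0])); // Hive table directory
        Path outputDir = new Path(args[1]);                   // staging directory for the generated HFiles
        FileOutputFormat.setOutputPath(job, outputDir);

        // Configure total ordering, compression, etc. to match the target table's regions
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            Table table = connection.getTable(TableName.valueOf("stat_all_info"));
            RegionLocator regionLocator = connection.getRegionLocator(TableName.valueOf("stat_all_info"));
            HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator);
        }

        // After the job succeeds, the HFiles are loaded with LoadIncrementalHFiles,
        // as in the bulkload snippet below.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}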
4. Bulkload
int jobResult = job.waitForCompletion(true) ? 0 : 1;
logger.info("jobResult=" + jobResult);
Boolean bulkloadHfileToHbase = Boolean.valueOf(conf.getBoolean("hbase.table.hfile.bulkload", false));
if ((jobResult == 0) && (bulkloadHfileToHbase.booleanValue())) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(outputDir, hTable);
}

These are all the ways covered in this article for importing Hive data into HBase. Thank you for reading, and I hope it helps.