Here the editor introduces two ways to import data into HBase: one based on Hive, and the other by generating HFiles directly from flat files.
1. Importing data via hive-hbase-handler
This approach requires a supporting jar package.
Download address: https://down.51cto.com/data/2464129
Put it into $HBASE_HOME/lib in place of the original jar package.
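As a rough sketch of that step (the jar file name and the backup path are assumptions, adjust them to what you actually downloaded):
$ cp $HBASE_HOME/lib/hive-hbase-handler.jar $HBASE_HOME/lib/hive-hbase-handler.jar.bak   # back up the existing jar if one is present
$ cp ./hive-hbase-handler.jar $HBASE_HOME/lib/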
Secondly, modify the hive-site.xml:
# Add:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///applications/hive-2.3.2/lib/hive-hbase-handler.jar,file:///applications/hive-2.3.2/lib/guava-14.0.1.jar,file:///applications/hbase-2.0.5/lib/hbase-common-2.0.5.jar,file:///applications/hbase-2.0.5/lib/hbase-client-2.0.5.jar,file:///applications/hive-2.3.2/lib/zookeeper-3.4.6.jar</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop01:2181</value>
</property>
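If editing hive-site.xml is not convenient, Hive can usually pick up the same jars through the HIVE_AUX_JARS_PATH environment variable set in hive-env.sh; this is only a sketch under that assumption, so check it against your Hive version (the paths repeat the ones above):
export HIVE_AUX_JARS_PATH=/applications/hive-2.3.2/lib/hive-hbase-handler.jar,/applications/hive-2.3.2/lib/guava-14.0.1.jar,/applications/hbase-2.0.5/lib/hbase-common-2.0.5.jar,/applications/hbase-2.0.5/lib/hbase-client-2.0.5.jar,/applications/hive-2.3.2/lib/zookeeper-3.4.6.jar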
Import Hive data into HBase:
① Create the Hive table:
create table hive_hbase_test (id int, name string, age int);
② Insert data into the Hive table:
insert into hive_hbase_test (id,name,age) values (1, "xiaozhang", 18);
insert into hive_hbase_test (id,name,age) values (2, "xiaowang", 19);
Inserting data this way is fine in a test environment; in a real production environment it is better to load the source data through an external table.
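As a rough illustration of the external-table approach for the source data (the table name, delimiter, and HDFS location below are hypothetical):
create external table hive_hbase_test_ext (id int, name string, age int)
row format delimited fields terminated by ','
location '/tmp/hive_hbase_test_data';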
③ Create the Hive table mapped to HBase:
create table hive_hbase_pro (row_key string, id bigint, name string, age int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:name,info:age") TBLPROPERTIES ("hbase.table.name" = "hive_hbase_pro");
At the same time, a table called hive_hbase_pro is created in HBase.
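The mapping can be confirmed from the HBase side with the HBase shell (an optional check, not required by the import):
$ hbase shell
hbase> describe 'hive_hbase_pro'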
④ Insert data into the mapped HBase table
# Configure the following parameters in Hive first:
set hive.hbase.wal.enabled=false;
set hive.hbase.bulk=true;
set hbase.client.scanner.caching=1000000;
⑤ Import data:
insert overwrite table hive_hbase_pro select id as row_key, id, name, age from hive_hbase_test;
At this point, the mapped table contains the data from the Hive table.
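One way to spot-check the result from the HBase side, inside the HBase shell:
hbase> scan 'hive_hbase_pro', {LIMIT => 5}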
Note: if the table already exists in HBase, only an external table can be created in Hive:
create external table hive_hbase_xiaoxu (row_key string, id bigint, name string, age int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:name,info:age") TBLPROPERTIES ("hbase.table.name" = "hive_hbase_pro");
The external table created this way can read the data already in the HBase table.
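A minimal Hive-side check of that external table (nothing here is specific to the dataset):
select * from hive_hbase_xiaoxu limit 10;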
Summary: with this approach the data is inserted row by row, so it is relatively slow. It is suitable when the data volume is on the order of millions to tens of millions of rows; the observed execution speed is roughly 20,000-30,000 rows per second.
There are also import paths based on Phoenix and Pig, which feel more or less the same as the Hive approach, so they are not covered here.
2. Importing data with BulkLoad
Importing data this way is very fast because it skips the WAL and produces the underlying HFile files directly.
Advantages:
BulkLoad does not write to the WAL, nor does it trigger flushes or splits. If we instead call the PUT interface heavily to insert data, it can cause a large amount of GC activity; and if the HBase table is not pre-split, all writes land on a single node, causing hotspot issues and, in serious cases, even affecting the stability of the HBase nodes. With BulkLoad there are no such concerns.
Nor does the process involve a large number of per-row interface calls that eat into performance.
Steps:
① Upload the data file to HDFS:
Download address: https://down.51cto.com/data/2464129
The fields in this file are separated by commas.
$ hadoop fs -put sp_address.txt /tmp/sp_addr_bulktable
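For reference, a line of sp_address.txt should look roughly like the example below, matching the row key plus the five mapped columns used in the next step; the values are made up purely to show the comma-separated layout:
1,1,1,110000,beijing,0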
② Use the ImportTsv tool to generate the HFiles
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/tmpbulkdata/sp_addr_data -Dimporttsv.columns=HBASE_ROW_KEY,sp_address:ID,sp_address:PLACE_TYPE,sp_address:PLACE_CODE,sp_address:PLACE_NAME,sp_address:UP_PLACE_CODE sp_address_bulkload /tmp/sp_addr_bulktable
Parameter description:
-Dimporttsv.separator: the field delimiter of the source file
-Dimporttsv.bulk.output: the output directory for the generated HFiles (this directory must not already exist)
-Dimporttsv.columns: the column mapping to the HBase table
sp_address_bulkload: the HBase table name (be sure to create the HBase table before generating the HFiles)
/tmp/sp_addr_bulktable: the source data directory
Table creation statement: create 'sp_address_bulkload', 'sp_address'
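Because an unsplit table funnels the whole load into a single region (the hotspot issue mentioned under Advantages), a pre-split variant of this create statement can spread the HFiles across region servers; the split points below are hypothetical and should be derived from the real row key distribution:
create 'sp_address_bulkload', 'sp_address', SPLITS => ['2', '4', '6', '8']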
③ Import the HFiles into HBase
$ hadoop jar /applications/hbase-2.0.5/lib/hbase-mapreduce-2.0.5.jar completebulkload /tmpbulkdata/sp_addr_data sp_address_bulkload
There is a pitfall here: articles on the Internet say to use hbase-server-VERSION-hadoop2.jar, but in the version the editor uses (HBase 2.0.5) the completebulkload main class lives in hbase-mapreduce-2.0.5.jar.
Benefit: running this command is essentially an HDFS move (mv) operation and does not start a MapReduce job.
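Depending on the build, the same load step can also be run by calling the loader class directly; the class name below is the one shipped in HBase 2.x (org.apache.hadoop.hbase.tool.LoadIncrementalHFiles), but verify it against your distribution before relying on it:
$ hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /tmpbulkdata/sp_addr_data sp_address_bulkload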
④ View the HBase table
hbase> scan 'sp_address_bulkload'
At this point, the data has been loaded into HBase.
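For a rough sanity check of the volume after a large load, the HBase shell's count command can be used (it scans the whole table, so it may take a while):
hbase> count 'sp_address_bulkload', CACHE => 10000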
Of course, the Java API can also be used, but the learning cost is higher; if the scenario is not particularly complex, the shell is usually enough to get the job done.
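For example, a one-off row can be written and read back straight from the HBase shell without touching the API (the row key and value here are hypothetical):
hbase> put 'sp_address_bulkload', 'demo_key', 'sp_address:PLACE_NAME', 'demo'
hbase> get 'sp_address_bulkload', 'demo_key'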
Summary: this method is the fastest because it works directly at the HFile level and handles many rows at once, so it is the recommended approach. It is very fast in a real environment: in our tests, more than 400 million rows were imported in about 20 minutes. If that number is hard to judge on its own, here is a real point of comparison from the editor: on a machine with 256 GB of memory, it took Sqoop 27 minutes to import 50 million rows.