Example Analysis of bulk-load loading hdfs data to hbase

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article walks through an example of using bulk-load to load HDFS data into HBase. The editor finds it quite practical and shares it here as a reference; follow along to take a look.

Bulk-load loads files on HDFS into HBase via MapReduce, which is very useful for importing massive amounts of data into HBase.

HBase provides a ready-made program, importtsv, to import files on HDFS into HBase in bulk-load mode. It consists of two steps (which can also be done in one):

1. Package the file as HFiles (or write directly): hadoop jar /path/to/hbase.jar importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

For example:

hadoop dfs -cat test/1

1 2
3 4
5 6
7 8

Execute:

hadoop jar ~/hbase/hbase-0.90.2.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,f1 t8 test

This starts a MapReduce job that generates the table t8 on HDFS, with rowkeys 1, 3, 5, 7 and corresponding values 2, 4, 6, 8.

Note that the source file is delimited by "\t" by default. If you need another separator, add -Dimporttsv.separator="," and the input will then be split on ",".
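As a sketch of the separator option (file names are hypothetical, and the hadoop invocation is shown as a comment because it needs a running cluster; the jar path follows the examples above):

```shell
# Create a hypothetical comma-delimited input file locally.
printf '1,2\n3,4\n5,6\n7,8\n' > sample.csv

# After copying it to HDFS (hadoop dfs -put sample.csv test2/), the
# comma-separated load would look like this:
#   hadoop jar ~/hbase/hbase-0.90.2.jar importtsv \
#     '-Dimporttsv.separator=,' \
#     -Dimporttsv.columns=HBASE_ROW_KEY,f1 t8 test2
```

The separator value is quoted so the shell does not mangle it; with this input the rowkeys would again be 1, 3, 5, 7.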

2. In the previous step, you can instead set an output directory, for example:

hadoop jar ~/hbase/hbase-0.90.2.jar importtsv -Dimporttsv.bulk.output=tmp -Dimporttsv.columns=HBASE_ROW_KEY,f1 t8 test

Then the t8 table will not be populated for the time being; the HFiles are just written to the tmp folder. We can take a look at tmp:

hadoop dfs -du tmp

Found 3 items
0      hdfs://namenode:9000/user/test/tmp/_SUCCESS
65254  hdfs://namenode:9000/user/test/tmp/_logs
462    hdfs://namenode:9000/user/test/tmp/f1

Then execute hadoop jar hbase-VERSION.jar completebulkload /user/todd/myoutput mytable to move the HFiles in that output directory into the corresponding regions. This is quite fast because it is only a move (mv). For example:

hadoop jar ~/hbase/hbase-0.90.2.jar completebulkload tmp t8

And then:

hadoop dfs -du /hbase/t8/c408963c084d328490cc2f809ade9428

Found 4 items
124  hdfs://namenode:9000/hbase/t8/c408963c084d328490cc2f809ade9428/.oldlogs
692  hdfs://namenode:9000/hbase/t8/c408963c084d328490cc2f809ade9428/.regioninfo
0    hdfs://namenode:9000/hbase/t8/c408963c084d328490cc2f809ade9428/.tmp
462  hdfs://namenode:9000/hbase/t8/c408963c084d328490cc2f809ade9428/f1

Table t8 has been generated at this point.

Note that if the data is very large and the table already has regions, the HFiles will be split so that each piece falls within the boundaries of an existing region before being loaded.

Notes on using these programs:

1. Because the job is launched through hadoop, the HBase config path is not picked up automatically, so the HBase environment variables cannot be found. You therefore need to add hbase-site.xml to hadoop's conf path.

2. You also need to put the jar packages in hbase/lib on the classpath.

3. When performing step 2 above (completebulkload), you need to write the ZooKeeper configuration into core-site.xml, because hbase-site.xml is not read at that step; otherwise ZooKeeper cannot be connected.
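A minimal sketch of the environment setup these notes describe. The install path and ZooKeeper host are assumptions matching the examples above (~/hbase and the namenode host); hbase.zookeeper.quorum is the standard HBase property, and the snippet is written to a temporary file rather than a real core-site.xml:

```shell
# Hypothetical install location, matching the ~/hbase path used above.
export HBASE_HOME="$HOME/hbase"

# Notes 1 and 2: put hbase-site.xml (via the conf dir) and the hbase jars
# on the classpath that the hadoop command will use.
export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/hbase-0.90.2.jar:$HBASE_HOME/lib/*"

# Note 3: the ZooKeeper quorum property that would go into core-site.xml
# (written to a scratch file here; merge it into your real core-site.xml).
cat > /tmp/core-site-snippet.xml <<'EOF'
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>namenode</value>
</property>
EOF
```

With this in place, the hadoop command can load the importtsv classes from the HBase jar and completebulkload can reach ZooKeeper.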

Thank you for reading! This concludes "Example Analysis of bulk-load loading hdfs data to hbase". I hope the content above is of some help and lets you learn a bit more; if you found the article good, feel free to share it for more people to see!
