2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how to import HBase data into HDFS. Interested readers may find it a useful reference, and I hope you gain a lot from it.
Practice: Import HBase data into HDFS
What if the data you want to use in a MapReduce job lives in HBase and needs to be combined with data in HDFS? You can write a MapReduce job that takes the HDFS dataset as input and pulls data directly from HBase in your map or reduce code. In some cases, however, it is more useful to dump the HBase data to HDFS first, especially if the data will be used by multiple MapReduce jobs and the HBase data is immutable or changes infrequently.
Problem
Import HBase data into HDFS
Solution
HBase ships with an Export class that can be used to export HBase data to HDFS in SequenceFile format. This technique also covers code that can read the exported data back.
Discussion
Before you can start using this technique, you need to get HBase up and running.
To export data from HBase, you first need to load some data into it. The loader creates an HBase table named stocks_example containing a single column family, details. The HBase data is stored as Avro binary serialized records, so that part of the code is not shown here.
Run the loader and use it to load sample data into HBase:
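The original loader command was not preserved in this article. As a sketch, the invocation might look like the following, using the same `hip` launcher that appears later in this article; the loader class name and sample-data path are assumptions, so substitute the names used in your own project:

```shell
# Hypothetical invocation: HBaseWriter and the stocks sample path are
# assumptions, not names confirmed by this article.
$ hip hip.ch6.hbase.HBaseWriter --input test-data/stocks.txt
```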
Load results can be viewed using the HBase shell. The list command (without any arguments) displays all the tables in HBase, while the scan command with a single argument dumps all the contents of the table:
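A minimal HBase shell session exercising both commands might look like this (the exact output format depends on your HBase version):

```shell
$ hbase shell
hbase(main):001:0> list
# => lists every table in HBase; you should see stocks_example
hbase(main):002:0> scan 'stocks_example'
# => dumps every row, column, and cell value in the table
```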
With this data in place, we can export it to HDFS. HBase comes with an org.apache.hadoop.hbase.mapreduce.Export class that dumps HBase tables. Using this class, you can export the entire HBase table:
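The Export tool takes a table name and an HDFS output directory. A minimal sketch, assuming the output directory is named output:

```shell
# Export the whole stocks_example table to the HDFS directory "output".
$ hbase org.apache.hadoop.hbase.mapreduce.Export stocks_example output
```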
The Export class also supports exporting only a single column family and compressing the output:
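Restricting the export to one column family and compressing the output are both controlled with -D properties placed before the table name. A sketch, assuming the details column family and Snappy compression (any installed codec works):

```shell
$ hbase org.apache.hadoop.hbase.mapreduce.Export \
    -D hbase.mapreduce.scan.column.family=details \
    -D mapreduce.output.fileoutputformat.compress=true \
    -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
    -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
    stocks_example output
```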
The Export class writes the HBase output in SequenceFile format: the HBase row key is stored in the SequenceFile record key as an org.apache.hadoop.hbase.io.ImmutableBytesWritable, and the HBase value is stored in the SequenceFile record value as an org.apache.hadoop.hbase.client.Result.
What if you want to process the exported data in HDFS? The following listing shows how to read the HBase SequenceFile and extract the Avro records.
Code 5.3 Read HBase SequenceFile to extract Avro records
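The original listing was not preserved in this article. The sketch below follows the key/value types described above (ImmutableBytesWritable and Result); the details column family and stockAvro qualifier are assumptions, and the Avro decode step is left as a stub because the stock schema is not shown here:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;

public class HBaseExportedStockReader {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path inputPath = new Path(args[0]); // e.g. a part file under "output"

    // On HBase 0.96+, Result is no longer a Writable; register the custom
    // serialization so SequenceFile.Reader can deserialize the values:
    // conf.setStrings("io.serializations", conf.get("io.serializations"),
    //     org.apache.hadoop.hbase.mapreduce.ResultSerialization.class.getName());

    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(inputPath))) {
      ImmutableBytesWritable key = new ImmutableBytesWritable();
      Result value = new Result();
      while (reader.next(key, value)) {
        // The SequenceFile record key holds the HBase row key.
        System.out.println("row key: " + Bytes.toString(key.get()));
        // The record value is an HBase Result holding the row's cells.
        // details:stockAvro is an assumed family:qualifier pair.
        byte[] avroBytes =
            value.getValue(Bytes.toBytes("details"), Bytes.toBytes("stockAvro"));
        if (avroBytes != null) {
          // Decode avroBytes with an Avro DatumReader here (omitted).
          System.out.println("  avro payload: " + avroBytes.length + " bytes");
        }
      }
    }
  }
}
```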
You can run the code against the HDFS directory for export and see the results:
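The exact run command was not preserved; assuming the `hip` launcher and package naming used elsewhere in this article, it might look like:

```shell
# Hypothetical: the package prefix is an assumption based on the
# ImportMapReduce command shown later in this article.
$ hip hip.ch6.hbase.HBaseExportedStockReader output/part-m-00000
```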
The HBaseExportedStockReader class reads and dumps the contents of the SequenceFiles produced by HBase's Export class.
The built-in HBase Export class makes it easier to export data from HBase to HDFS. But what if instead of writing HBase data to HDFS, you want to process it directly in a MapReduce job? Let's see how HBase can be used as a data source for MapReduce jobs.
Practice: Using HBase as a Data Source for MapReduce
The built-in HBase exporter writes HBase data as SequenceFiles, a format that is not supported by programming languages other than Java and that offers no schema evolution. It also supports only Hadoop file systems as data sinks. If you want more control over HBase data extraction, you may need tools beyond the built-in exporter.
Problem
You want to operate on HBase directly in MapReduce jobs without the intermediate step of copying data to HDFS.
Solution
HBase has a TableInputFormat class that can be used in MapReduce jobs to extract data directly from HBase.
Discussion
HBase provides an InputFormat class called TableInputFormat that can be used as a data source in MapReduce. The following code shows a MapReduce job that reads data from HBase using this input format (wired in via TableMapReduceUtil.initTableMapperJob).
Code 5.4 Import HBase Data into HDFS Using MapReduce
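The original listing was not preserved in this article. The sketch below shows the standard TableInputFormat pattern: TableMapReduceUtil.initTableMapperJob configures the job to read directly from HBase, and a map-only job writes the results to HDFS. The details column family, stockAvro qualifier, and output handling are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ImportMapReduce {

  // TableMapper fixes the map input types to the HBase key/value pair:
  // ImmutableBytesWritable (row key) and Result (the row's cells).
  static class Exporter extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result columns, Context context)
        throws IOException, InterruptedException {
      // details:stockAvro is an assumed family:qualifier pair.
      byte[] value =
          columns.getValue(Bytes.toBytes("details"), Bytes.toBytes("stockAvro"));
      if (value != null) {
        context.write(new Text(Bytes.toString(row.get())),
                      new Text(Bytes.toStringBinary(value)));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "Import stocks_example from HBase");
    job.setJarByClass(ImportMapReduce.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("details"));

    // Wires TableInputFormat into the job so map tasks read directly
    // from the HBase region servers, with no intermediate HDFS copy.
    TableMapReduceUtil.initTableMapperJob(
        "stocks_example", scan, Exporter.class, Text.class, Text.class, job);

    job.setNumReduceTasks(0); // map-only: write straight to HDFS
    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```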
You can run this MapReduce job as follows:
$ hip hip.ch6.hbase.ImportMapReduce --output output
A quick look at HDFS tells us whether the MapReduce job worked as expected:
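A sketch of the check, assuming the output directory name used in the command above; a successful map-only job leaves one part file per map task plus a _SUCCESS marker:

```shell
$ hadoop fs -ls output
# => expect _SUCCESS plus part-m-00000 (one per map task)
$ hadoop fs -cat output/part-m-00000 | head
# => row keys and values extracted from the HBase table
```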
This output confirms that the MapReduce job works as expected.
Thank you for reading. I hope this article on how to import HBase data into HDFS has been helpful.
© 2024 shulou.com SLNews company. All rights reserved.