2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how to import HBase data into HDFS. Interested readers may find it a useful reference, and I hope you gain a lot from it.
Practice: Import HBase data into HDFS
What if the data you want to use in a MapReduce job lives in HBase and needs to be combined with data in HDFS? You can write a MapReduce job that takes the HDFS dataset as input and pulls data directly from HBase in your map or reduce code. In some cases, however, it is more useful to dump the HBase data to HDFS first, especially if the data will be used by multiple MapReduce jobs and the HBase data is immutable or changes infrequently.
Problem
Import HBase data into HDFS
Solution
HBase ships with an Export class that can be used to export HBase data to HDFS in SequenceFile format. This technique also covers code that can read the exported data back.
Discussion
Before you can start using this technique, you need to get HBase up and running.
To export data from HBase, you first need to load some data into it. The loader creates an HBase table named stocks_example containing a single column family, details. The HBase data is stored as Avro binary serialized records, so that part of the code is not shown here.
Run the loader and use it to load sample data into HBase:
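The original loader command was not preserved in this article. As a sketch, the invocation might look like the following, using the same `hip` launcher that appears later in this article; the loader class name and sample-data path are assumptions, so substitute the names used in your own project:

```shell
# Hypothetical invocation: HBaseWriter and the stocks sample path are
# assumptions, not names confirmed by this article.
$ hip hip.ch6.hbase.HBaseWriter --input test-data/stocks.txt
```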
Load results can be viewed using the HBase shell. The list command (without any arguments) displays all the tables in HBase, while the scan command with a single argument dumps all the contents of the table:
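A minimal HBase shell session exercising both commands might look like this (the exact output format depends on your HBase version):

```shell
$ hbase shell
hbase(main):001:0> list
# => lists every table in HBase; you should see stocks_example
hbase(main):002:0> scan 'stocks_example'
# => dumps every row, column, and cell value in the table
```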
With this data in place, we can export it to HDFS. HBase comes with an org.apache.hadoop.hbase.mapreduce.Export class that dumps HBase tables. Using this class, you can export the entire HBase table:
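The Export tool takes a table name and an HDFS output directory. A minimal sketch, assuming the output directory is named output:

```shell
# Export the whole stocks_example table to the HDFS directory "output".
$ hbase org.apache.hadoop.hbase.mapreduce.Export stocks_example output
```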
The Export class also supports exporting only a single column family and compressing the output:
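Restricting the export to one column family and compressing the output are both controlled with -D properties placed before the table name. A sketch, assuming the details column family and Snappy compression (any installed codec works):

```shell
$ hbase org.apache.hadoop.hbase.mapreduce.Export \
    -D hbase.mapreduce.scan.column.family=details \
    -D mapreduce.output.fileoutputformat.compress=true \
    -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
    -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
    stocks_example output
```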
The Export class writes the HBase output in SequenceFile format: the HBase row key is stored in the SequenceFile record key as an org.apache.hadoop.hbase.io.ImmutableBytesWritable, and the HBase value is stored in the SequenceFile record value as an org.apache.hadoop.hbase.client.Result.
What if you want to process the exported data in HDFS? The following listing shows how to read the HBase SequenceFile and extract the Avro records.
Code 5.3 Read HBase SequenceFile to extract Avro records
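The original listing was not preserved in this article. The sketch below follows the key/value types described above (ImmutableBytesWritable and Result); the details column family and stockAvro qualifier are assumptions, and the Avro decode step is left as a stub because the stock schema is not shown here:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;

public class HBaseExportedStockReader {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path inputPath = new Path(args[0]); // e.g. a part file under "output"

    // On HBase 0.96+, Result is no longer a Writable; register the custom
    // serialization so SequenceFile.Reader can deserialize the values:
    // conf.setStrings("io.serializations", conf.get("io.serializations"),
    //     org.apache.hadoop.hbase.mapreduce.ResultSerialization.class.getName());

    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(inputPath))) {
      ImmutableBytesWritable key = new ImmutableBytesWritable();
      Result value = new Result();
      while (reader.next(key, value)) {
        // The SequenceFile record key holds the HBase row key.
        System.out.println("row key: " + Bytes.toString(key.get()));
        // The record value is an HBase Result holding the row's cells.
        // details:stockAvro is an assumed family:qualifier pair.
        byte[] avroBytes =
            value.getValue(Bytes.toBytes("details"), Bytes.toBytes("stockAvro"));
        if (avroBytes != null) {
          // Decode avroBytes with an Avro DatumReader here (omitted).
          System.out.println("  avro payload: " + avroBytes.length + " bytes");
        }
      }
    }
  }
}
```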
You can run the code against the HDFS directory for export and see the results:
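The exact run command was not preserved; assuming the `hip` launcher and package naming used elsewhere in this article, it might look like:

```shell
# Hypothetical: the package prefix is an assumption based on the
# ImportMapReduce command shown later in this article.
$ hip hip.ch6.hbase.HBaseExportedStockReader output/part-m-00000
```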
The HBaseExportedStockReader class reads and dumps the contents of the SequenceFiles produced by HBase's Export class.
The built-in HBase Export class makes it easier to export data from HBase to HDFS. But what if instead of writing HBase data to HDFS, you want to process it directly in a MapReduce job? Let's see how HBase can be used as a data source for MapReduce jobs.
Practice: Using HBase as a Data Source for MapReduce
The built-in HBase exporter writes HBase data as SequenceFiles, a format that is not supported by programming languages other than Java and that offers no schema evolution. It also supports only Hadoop file systems as data sinks. If you want more control over HBase data extraction, you may need tools beyond the built-in exporter.
Problem
You want to operate on HBase directly in MapReduce jobs without the intermediate step of copying data to HDFS.
Solution
HBase has a TableInputFormat class that can be used in MapReduce jobs to extract data directly from HBase.
Discussion
HBase provides an InputFormat class called TableInputFormat that can be used as a data source in MapReduce. The following code shows a MapReduce job that reads data from HBase using this input format (wired in via TableMapReduceUtil.initTableMapperJob).
Code 5.4 Import HBase Data into HDFS Using MapReduce
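The original listing was not preserved in this article. The sketch below shows the standard TableInputFormat pattern: TableMapReduceUtil.initTableMapperJob configures the job to read directly from HBase, and a map-only job writes the results to HDFS. The details column family, stockAvro qualifier, and output handling are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ImportMapReduce {

  // TableMapper fixes the map input types to the HBase key/value pair:
  // ImmutableBytesWritable (row key) and Result (the row's cells).
  static class Exporter extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result columns, Context context)
        throws IOException, InterruptedException {
      // details:stockAvro is an assumed family:qualifier pair.
      byte[] value =
          columns.getValue(Bytes.toBytes("details"), Bytes.toBytes("stockAvro"));
      if (value != null) {
        context.write(new Text(Bytes.toString(row.get())),
                      new Text(Bytes.toStringBinary(value)));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "Import stocks_example from HBase");
    job.setJarByClass(ImportMapReduce.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("details"));

    // Wires TableInputFormat into the job so map tasks read directly
    // from the HBase region servers, with no intermediate HDFS copy.
    TableMapReduceUtil.initTableMapperJob(
        "stocks_example", scan, Exporter.class, Text.class, Text.class, job);

    job.setNumReduceTasks(0); // map-only: write straight to HDFS
    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```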
You can run this MapReduce job as follows:
$ hip hip.ch6.hbase.ImportMapReduce --output output
A quick look at HDFS tells us whether the MapReduce job worked as expected:
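A sketch of the check, assuming the output directory name used in the command above; a successful map-only job leaves one part file per map task plus a _SUCCESS marker:

```shell
$ hadoop fs -ls output
# => expect _SUCCESS plus part-m-00000 (one per map task)
$ hadoop fs -cat output/part-m-00000 | head
# => row keys and values extracted from the HBase table
```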
This output confirms that the MapReduce job works as expected.
Thank you for reading. I hope this article on how to import HBase data into HDFS has been helpful.
© 2024 shulou.com SLNews company. All rights reserved.