What are the methods of data import and export in HBase


This article introduces the main ways to import and export data in HBase. Many people run into these situations in real-world work, so let's walk through how to handle them. I hope you read it carefully and take something away from it!

I. Snapshot (Snapshots) mode

HBase snapshots let you take a snapshot of a table (that is, an available copy of it) without a significant impact on the Region Servers, and the snapshot, clone, and restore operations do not involve copying data. Exporting a snapshot to another cluster also has no impact on the Region Servers. Data migration with snapshots works as follows:

1. Enable snapshot support. Snapshots are enabled by default in 0.95+ and disabled by default in 0.94.6+, where they must be turned on in hbase-site.xml:

<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>

2. Take a snapshot of the table. It does not matter whether the table is enabled or disabled; this operation does not copy any data.

$ ./bin/hbase shell

hbase> snapshot 'myTable', 'myTableSnapshot-122112'

3. List the snapshots that already exist

$ ./bin/hbase shell

hbase> list_snapshots

4. Delete a snapshot

$ ./bin/hbase shell

hbase> delete_snapshot 'myTableSnapshot-122112'

5. Generate a new table by cloning the snapshot

$ ./bin/hbase shell

hbase> clone_snapshot 'myTableSnapshot-122112', 'myNewTestTable'

6. To restore data with a snapshot, you need to disable the table before restoring it.

$ ./bin/hbase shell

hbase> disable 'myTable'

hbase> restore_snapshot 'myTableSnapshot-122112'
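
After the restore completes, the table still has to be re-enabled before it can serve requests again (a standard follow-up step, added here for completeness):

hbase> enable 'myTable'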

Tip: because replication works at the log (WAL) level while snapshots work at the file-system level, the replicas will be in a different state from the master after a snapshot restore. If you need to use restore, stop replication first and redo the bootstrap.

If data has been lost through incorrect client behavior, and a full table restore is undesirable because it requires disabling the table, you can clone the snapshot into a new table and then copy the rows you need from the new table back into the main table with a MapReduce job, as sketched below.
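
One way to do that copy, shown here as a sketch with the table names used above (the exact options depend on your HBase version), is the built-in CopyTable MapReduce job, which reads from a source table and writes into an existing target table:

$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=myTable myNewTestTable

CopyTable also accepts --starttime and --endtime to limit the copy to a time range, which helps when only recently lost rows need to be pulled back.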

7. Copy to other clusters

This operation should be performed as the hbase user, and the hbase user needs a temporary directory in HDFS (controlled by the hbase.tmp.dir parameter).

Use 16 mappers to copy a snapshot named MySnapshot to a cluster called srv2

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8020/hbase -mappers 16

Limit bandwidth consumption

When exporting a snapshot, you can limit bandwidth consumption by specifying the -bandwidth parameter, which takes an integer representing megabytes per second. The following example limits the export above to 200 MB/s.

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200

II. HBase built-in tools (Export/Import)

HBase provides a built-in Export utility that makes it easy to dump data from an HBase table into SequenceFiles in an HDFS directory. The tool creates a MapReduce job that calls the cluster through the HBase API, reads each row of the specified table, and writes it to the specified HDFS directory. Because it uses MapReduce and the HBase client API, the tool is performance-intensive for the cluster. However, it is rich in functionality: it supports version and date ranges as well as data filtering, which makes incremental backups possible.

The following is a sample process for HBASE import and export:

1. Change to the hbase bin directory, export a copy of the HBase table to HDFS, and execute:

hbase org.apache.hadoop.hbase.mapreduce.Export test_table /data/test_table

In the above, test_table is the name of the table to export from HBase, and /data/test_table is the destination path in the Hadoop file system (HDFS).

Optional parameters for Export:

versions (optional): the number of versions to export

starttime (optional): the start time of the exported data (note that this refers to the data's timestamp; for example, passing yesterday's date exports data written after yesterday)

endtime (optional): the end time of the exported data (also a data timestamp)

By default, the Export utility exports only the latest version of a given cell, regardless of how many versions are stored. To export multiple versions, pass the required number of versions.

Note: scanner caching for the input scan is configured in the job configuration through hbase.client.scanner.caching.
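
As a sketch (the output path and epoch-millisecond timestamps below are illustrative, and the -D option assumes the tool accepts Hadoop-style generic options), an export of up to 3 versions of each cell written inside a time window, with scanner caching raised to 500, could look like:

hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.client.scanner.caching=500 test_table /data/test_table_incr 3 1609430400000 1609516800000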

2. Change to the hadoop bin directory and copy the files in hadoop HDFS to the local linux path:

hadoop fs -get /data/test_table ~/

In the above, /data/test_table is the path in the Hadoop HDFS file system and ~/ is the local Linux path.

3. Change to the hadoop bin directory and copy the local files in linux to Hadoop HDFS:

hadoop fs -put ~/test_table /data/

Where ~/test_table is the local Linux file and /data/ is the Hadoop HDFS path.

4. After copying the data, you need to create the table that will receive the imported data. Enter the hbase shell environment:

create 'test_table', 'test_family'

Note: when creating a table, you need to specify at least one column family.

5. Import the files in Hadoop HDFS into the table of the specified HBase:

hbase org.apache.hadoop.hbase.mapreduce.Import test_table /data/test_table

III. ImportTsv import

ImportTsv is a command-line tool provided by HBase. With a single command it can import data files with a custom delimiter (tab, \t, by default) stored on HDFS into an HBase table. It is very useful for importing large amounts of data, and it supports two ways of loading data into the HBase table:

The first is to write the data directly into HBase through TableOutputFormat (ordinary Puts).

The second is to convert the file into HFile format and then run CompleteBulkLoad, which moves the files into the HBase table's directory and makes them available to client queries.

For example, suppose we load data into a table named "test" with a column family named "d" and two columns, "c1" and "c2".

$ bin/hbase shell

hbase> create 'test', 'd'

Suppose the content of an input file is:

row1,c1,c2

row2,c1,c2

row3,c1,c2

row4,c1,c2

row5,c1,c2

row6,c1,c2

row7,c1,c2

row8,c1,c2

row9,c1,c2

row10,c1,c2

Upload the file to the /tmp directory of HDFS and name it data.txt.
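
Assuming the file is saved locally as data.txt, one way to upload it is:

hadoop fs -put data.txt /tmp/data.txt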

For ImportTsv to use this input file, the command line must look like this:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2 test /tmp/data.txt

Other options that can be specified with -D include:

-Dimporttsv.skip.bad.lines=false - fail if invalid rows are encountered

-Dimporttsv.separator=| - use the given delimiter instead of tabs in the file

-Dimporttsv.timestamp=currentTimeAsLong - import using the specified timestamp

-Dimporttsv.mapper.class=my.Mapper - use a user-specified Mapper class instead of the default org.apache.hadoop.hbase.mapreduce.TsvImporterMapper

In this example, the first column is the rowkey, which is why HBASE_ROW_KEY is used; the second and third columns map to "d:c1" and "d:c2". If you are preparing a large amount of data for bulk loading, make sure the target HBase table is pre-split, as sketched below.
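
A minimal pre-splitting sketch, assuming row keys shaped like the rowN values above (the split points are illustrative and would normally be chosen from the real key distribution):

hbase> create 'test', 'd', SPLITS => ['row4', 'row7']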

If data exists in the table, it is imported as an append.

Import data using bulkload

Use ImportTsv to generate HFiles:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.bulk.output=/tmp/zhangrenhua/hbase -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2 test /tmp/data.txt

Import HFile into Hbase:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/zhangrenhua/hbase test
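
As an optional sanity check (not part of the original procedure), the loaded rows can be counted or sampled from the HBase shell:

hbase> count 'test'

hbase> scan 'test', {LIMIT => 5}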

Note:

When using ImportTsv, pay close attention to the importtsv.bulk.output parameter. Generally speaking, using bulk output is friendlier to the RegionServers: loading data this way consumes almost no RegionServer compute resources, because it simply moves the HFiles on HDFS and then tells the HMaster to bring the corresponding regions of the RegionServers online.

This is the end of "What are the methods of data import and export in HBase". Thank you for reading. If you want to learn more about the industry, follow the site for more practical articles.
