Importing Data into HBase with ImportTsv


The ImportTsv tool runs as a MapReduce job, so YARN must be started first. The tool depends on the HBase jar packages, so take care to configure the classpath. By default, ImportTsv inserts data through the HBase API.
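If YARN is not already running, it can be started with the standard Hadoop scripts (a minimal sketch, assuming the HADOOP_HOME paths configured in the profile below):

$ start-yarn.sh
$ yarn node -list    # verify that at least one NodeManager is registered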

[hadoop-user@rhel work]$ cat /home/hadoop-user/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/bin

export PATH

JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
HADOOP_HOME=/home/hadoop-user/hadoop-2.8.0
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
HBASE_HOME=/home/hadoop-user/hbase-2.0.0
PATH=$PATH:$HBASE_HOME/bin
CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
ZOOKEEPER_HOME=/home/hadoop-user/zookeeper-3.4.12
PATH=$PATH:$ZOOKEEPER_HOME/bin
PHOENIX_HOME=/home/hadoop-user/apache-phoenix-5.0.0-alpha-HBase-2.0-bin
PATH=$PATH:$PHOENIX_HOME/bin
export PATH

Create a table

hbase(main):033:0> create 'test','cf'
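To confirm the table and its column family exist (an optional check, not part of the original session; the prompt number is illustrative):

hbase(main):034:0> describe 'test'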

Create a file to import

[hadoop-user@rhel work]$ cat /home/hadoop-user/work/sample1.csv
row10,"mjj10"
row11,"mjj11"
row12,"mjj12"
row14,"mjj13"

Put the file into HDFS

[hadoop-user@rhel work]$ hdfs dfs -put /home/hadoop-user/work/sample1.csv /sample1.csv
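An optional check that the upload succeeded:

[hadoop-user@rhel work]$ hdfs dfs -ls /sample1.csv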

ImportTsv Import Command

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv

Note: HBASE_ROW_KEY marks which field in the file supplies the row key; the entries that follow define the columns. Here the imported column family is cf and the column qualifier is a. The file to import is /sample1.csv in HDFS.
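After the MapReduce job completes, scanning the table is a quick way to verify the import (an optional step; the prompt number is illustrative). Note that ImportTsv splits lines on the separator without any CSV quote handling, so the stored values keep their double quotes:

hbase(main):035:0> scan 'test'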

Explanation from the tool's help text

Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

Imports the given input directory of TSV data into the specified table.

The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used
as the row key for each imported record. You must specify exactly one column
to be the row key, and you must specify a column name for every column that exists in the
input data. Another special column HBASE_TS_KEY designates that this column should be
used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.
You must specify at most one column as timestamp key for each imported record.
Records with invalid timestamps (blank, non-numeric) will be treated as bad records.
Note: if you use this option, then the 'importtsv.timestamp' option will be ignored.
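For example, if the input file carried a third, numeric timestamp field (a hypothetical sample_ts.csv, not the sample1.csv used above), the column mapping could name it HBASE_TS_KEY:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:a,HBASE_TS_KEY test /sample_ts.csv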

Note: content imported by ImportTsv is not visible to Phoenix. In fact, tables created directly in HBase are not visible to Phoenix at all, while tables created by Phoenix can be seen from HBase, although their contents are encoded.

By default, the importtsv tool imports data with the HBase Put API. When the -Dimporttsv.bulk.output option is used, it instead writes files in HBase's internal HFile format.

The importtsv tool, by default, uses the HBase Put API to insert data into the HBase table using TableOutputFormat in its map phase. But when the -Dimporttsv.bulk.output option is specified, it instead generates HBase internal format (HFile) files on HDFS by using HFileOutputFormat. Therefore, we can then use the completebulkload tool to load the generated files into a running cluster. The following steps use the bulk output and load tools:

Command to generate files in HFile format

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/hfiles_tsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv

Note: this generates files in HFile format and stores them in the /hfiles_tsv directory in HDFS. The directory is created by the command itself.

[hadoop-user@rhel work]$ hdfs dfs -ls /hfiles_tsv/cf

18/06/28 10:49:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

-rw-r--r--   1 hadoop-user supergroup       5125 2018-06-28 10:40 /hfiles_tsv/cf/0e466616d42a4a128fb60caa7dbe075a

Note: the file name 0e466616d42a4a128fb60caa7dbe075a follows the same format as the region names shown in the HBase web UI.

Running the load through

hadoop jar hbase-server-2.0.0.jar completebulkload /hfiles_tsv 'test'

raised an exception: Exception in thread "main" java.lang.ClassNotFoundException: completebulkload

According to the HBase documentation, there are two ways to invoke the bulk load utility:

There are two ways to invoke this utility, with explicit classname and via the driver:

Explicit Classname

$ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles
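Applied to this walkthrough (a sketch assuming the HFiles are still under /hfiles_tsv and the target table is test):

$ hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /hfiles_tsv test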

Driver

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar completebulkload
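The driver form also points at the likely cause of the ClassNotFoundException above: without HBase on the classpath, hadoop jar cannot resolve the completebulkload driver name. Applied to this walkthrough (a sketch using the versions from the profile above; in the 2.0.0 binary distribution the server jar sits under ${HBASE_HOME}/lib):

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/lib/hbase-server-2.0.0.jar completebulkload /hfiles_tsv test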
