How to establish the table storage format of hive

Updated 2025-02-24. From: SLTechnology News&Howtos

This article explains how to specify the storage format of a table in Hive. The method is simple, fast, and practical.

When creating a table in Hive, you can specify the storage file format with the 'STORED AS FILE_FORMAT' clause.

For example:

> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)

> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

> STORED AS TEXTFILE

> LOCATION '/data/test';

This statement specifies TEXTFILE as the file storage format.

Hive file storage formats include the following categories:

TEXTFILE

SEQUENCEFILE

RCFILE

custom format

TEXTFILE

This is the default format. Data is not compressed, so both disk overhead and data parsing overhead are high.

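TEXTFILE can be combined with Gzip or Bzip2 compression (Hive detects the compression automatically and decompresses the file when a query runs). The drawback is that Hive cannot split a Gzip-compressed file, so it cannot process the data in that file in parallel. Whether Bzip2 files can be split depends on the Hadoop version in use.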
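The reason a Gzip-compressed text file cannot be processed in parallel can be illustrated outside Hive. The following Python sketch is only an analogy, not Hive's or Hadoop's actual split logic; the sample records and the helper name can_read_from_middle are made up for illustration. It shows that a reader dropped into the middle of plain text can skip to the next line break and continue, while a reader dropped into the middle of a Gzip stream cannot start decompressing.

```python
import gzip
import zlib

# Hypothetical sample data: 1,000 tab-separated records, one per line.
lines = [f"{i}\tname-{i}" for i in range(1000)]
raw = ("\n".join(lines) + "\n").encode("utf-8")
compressed = gzip.compress(raw)


def can_read_from_middle(blob: bytes) -> bool:
    """Return True if gzip data can be decompressed starting at its midpoint."""
    try:
        gzip.decompress(blob[len(blob) // 2:])
        return True
    except (OSError, EOFError, zlib.error):
        # There is no gzip header at this offset, so decompression cannot start.
        return False


# Plain text: a second worker starts at the midpoint, skips the partial
# line it landed in, and reads complete records from there on.
mid = len(raw) // 2
start = raw.index(b"\n", mid) + 1
second_half = raw[start:].decode("utf-8").splitlines()

print("plain text, records readable from the middle:", len(second_half))
print("gzip, readable from the middle:", can_read_from_middle(compressed))
```

In other words, a single Gzip file has to be read from its first byte by one task, which matches the limitation described above.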

Examples:

> create table test1(str STRING)

> STORED AS TEXTFILE;

OK

Time taken: 0.786 seconds

# Generate a file of random strings with a script, then load the file:

> LOAD DATA LOCAL INPATH '/home/work/data/test.txt' INTO TABLE test1;

Copying data from file:/home/work/data/test.txt

Copying file: file:/home/work/data/test.txt

Loading data to table default.test1

OK

Time taken: 0.243 seconds

SEQUENCEFILE

SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.

SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD compression yields a low compression ratio, so BLOCK compression is generally recommended.
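To see why compressing record by record gives a poorer ratio than compressing a block of records, here is a small Python sketch. It is an analogy using zlib, not the SequenceFile implementation; the sample records are invented for illustration. Compressing each record on its own pays the codec overhead every time and cannot exploit repetition across records, while compressing many records together can.

```python
import zlib

# Hypothetical sample data: 2,000 short, repetitive records.
records = [f"user_{i % 50},2025-02-24,click".encode("utf-8") for i in range(2000)]

raw_size = sum(len(r) for r in records)

# RECORD-style: every record is compressed separately.
record_size = sum(len(zlib.compress(r)) for r in records)

# BLOCK-style: many records are buffered and compressed together.
block_size = len(zlib.compress(b"\n".join(records)))

print("raw bytes:            ", raw_size)
print("per-record compressed:", record_size)
print("block compressed:     ", block_size)
```

With short records like these, the per-record total is far larger than the block-compressed size, which is the effect behind the recommendation to use BLOCK.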

Examples:

> create table test2(str STRING)

> STORED AS SEQUENCEFILE;

OK

Time taken: 5.526 seconds

hive> SET hive.exec.compress.output=true;

hive> SET io.seqfile.compression.type=BLOCK;

hive> INSERT OVERWRITE TABLE test2 SELECT * FROM test1;

RCFILE

RCFILE is a storage format that combines row and column storage. First, it partitions the data horizontally into blocks of rows, which guarantees that all fields of a record sit in the same block, so reading one record never requires reading multiple blocks. Second, within each block the data is stored column by column, which helps data compression and allows fast access to individual columns.
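The two steps described above can be sketched in a few lines of Python. This is a conceptual illustration only, not the RCFile on-disk format; the function names to_row_groups and read_record, the sample rows, and the group size are all hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str, str]


def to_row_groups(rows: List[Row], group_size: int) -> List[List[list]]:
    """Split rows into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # zip(*chunk) transposes the rows of this group into columns.
        groups.append([list(col) for col in zip(*chunk)])
    return groups


def read_record(groups: List[List[list]], index: int, group_size: int) -> tuple:
    """Rebuild one record; only the group that holds it is touched."""
    group = groups[index // group_size]
    return tuple(col[index % group_size] for col in group)


rows = [
    (1, "hello", "hive"),
    (2, "hello", "world"),
    (3, "hello", "hadoop"),
    (4, "hello", "rcfile"),
]
groups = to_row_groups(rows, group_size=2)

# Reading a single column only touches that column's list in each group.
third_column = [value for group in groups for value in group[2]]

print(groups[0])
print(read_record(groups, 2, group_size=2))
print(third_column)
```

Values of the same column end up next to each other inside a group, which is what makes them compress well, while a whole record can still be rebuilt from a single group.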

Examples:

> create table test3(str STRING)

> STORED AS RCFILE;

OK

Time taken: 0.184 seconds

> INSERT OVERWRITE TABLE test3 SELECT * FROM test1;

custom format

When Hive does not recognize the format of a user's data files, a custom file format can be defined.

Users can customize the input and output formats by implementing InputFormat and OutputFormat. For a reference implementation, see the code in:

.\hive-0.8.1\src\contrib\src\java\org\apache\hadoop\hive\contrib\fileformat\base64

Example:

Create the table:

> create table test4(str STRING)

> stored as

> inputformat 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'

> outputformat 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextOutputFormat';

$ cat test1.txt

aGVsbG8saGl2ZQ==

aGVsbG8sd29ybGQ=

aGVsbG8saGFkb29w

The file test1.txt contains base64-encoded content. The decoded data is:

hello,hive

hello,world

hello,hadoop
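The decoding can be checked independently of Hive. The short Python sketch below decodes the three lines of test1.txt shown above using the standard base64 module; it only verifies the encoding and does not reproduce what Base64TextInputFormat does internally.

```python
import base64

# The three lines of test1.txt from the listing above.
encoded_lines = [
    "aGVsbG8saGl2ZQ==",
    "aGVsbG8sd29ybGQ=",
    "aGVsbG8saGFkb29w",
]

# Decode each line from base64 back to UTF-8 text.
decoded_lines = [base64.b64decode(line).decode("utf-8") for line in encoded_lines]

for line in decoded_lines:
    print(line)
```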

Load data and query:

hive> LOAD DATA LOCAL INPATH '/home/work/test1.txt' INTO TABLE test4;

Copying data from file:/home/work/test1.txt

Copying file: file:/home/work/test1.txt

Loading data to table default.test4

OK

Time taken: 4.742 seconds

hive> select * from test4;

OK

hello,hive

hello,world

hello,hadoop

Time taken: 1.953 seconds

Summary:

Compared with TEXTFILE and SEQUENCEFILE, RCFILE offers a better compression ratio and faster query response thanks to its columnar storage, at the cost of more work when loading data. A data warehouse is typically written once and read many times, so overall RCFILE has a clear advantage over the other two formats.
