
What are the storage formats of Hive files?

2025-02-23 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article shows you what the Hive file storage formats are. The content is easy to understand and clearly organized, and I hope it helps resolve your doubts. Let's study "what are the Hive file storage formats" together.

Test comparison of Hive file storage formats

1. TEXTFILE

TEXTFILE is Hive's default storage format.

Storage method: row storage

Disk overhead is large, and parsing the data is expensive.

Hive cannot split or merge compressed text files (for example, gzip-compressed ones), so each such file is read by a single task.
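Uncompressed text is splittable because a reader can start at any byte offset and resynchronize at the next newline; a gzip stream offers no such resync point, so it has to be decompressed from its first byte. The Python sketch below imitates the convention Hadoop's line reader follows (a split that does not start at offset 0 discards its partial first line, and every split reads one line past its end). `read_split` is a hypothetical helper for illustration, not a Hadoop or Hive API.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines owned by the byte range [start, end) of a text file.

    Convention (as in Hadoop's line reader): a split that does not begin at
    offset 0 discards everything up to and including the first newline at or
    after `start`, and a split keeps reading while a line starts at or before `end`.
    """
    pos = start
    if start != 0:
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


if __name__ == "__main__":
    # Hive's default TEXTFILE field delimiter is \x01 (Ctrl-A); rows end with \n.
    rows = [b"\x01".join([b"%d" % i, b"user%d" % i]) for i in range(20)]
    data = b"\n".join(rows) + b"\n"
    split_size = 17  # deliberately not aligned to line boundaries
    recovered = []
    for s in range(0, len(data), split_size):
        recovered.extend(read_split(data, s, min(s + split_size, len(data))))
    assert recovered == rows  # every row is read exactly once
    print(len(recovered), "rows recovered")
```

The same trick is impossible on a single gzip stream: a reader dropped at byte 17 cannot decode anything until it has decompressed bytes 0 to 16.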

2. SEQUENCEFILE

Binary file; data is serialized into the file as <key, value> pairs.

Storage method: row storage

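Splittable, and supports compression.

SequenceFile supports NONE, RECORD, and BLOCK compression; BLOCK compression is generally the one to choose.

Its advantage is that the files are compatible with SequenceFile and MapFile in the Hadoop API.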
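SequenceFile stays splittable even when compressed because the writer places a sync marker in the file (in block-compressed mode, before each block); a reader that starts mid-file scans forward to the next marker. The Python sketch below is a simplified, hypothetical container that models only this idea, not the real SequenceFile byte layout: records are grouped into blocks (the idea behind BLOCK compression), each block is compressed with zlib, and each block is prefixed with a 16-byte marker and a length.

```python
import hashlib
import struct
import zlib

# Hypothetical fixed 16-byte sync marker (fixed here for reproducibility; a real
# SequenceFile picks a random marker per file). Assumed never to occur inside a
# compressed block.
SYNC = hashlib.md5(b"demo-sync-marker").digest()


def write_blocks(records: list[bytes], per_block: int) -> bytes:
    """Compress records in groups of `per_block`; each group is SYNC + length + payload.

    Records must not contain newline characters, which this sketch uses as separators.
    """
    out = bytearray()
    for i in range(0, len(records), per_block):
        payload = zlib.compress(b"\n".join(records[i:i + per_block]))
        out += SYNC + struct.pack(">I", len(payload)) + payload
    return bytes(out)


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the records of every block whose sync marker begins in [start, end)."""
    records = []
    pos = data.find(SYNC, start)  # resynchronize: jump to the first marker at/after start
    while pos != -1 and pos < end:
        (length,) = struct.unpack_from(">I", data, pos + len(SYNC))
        body = pos + len(SYNC) + 4
        records.extend(zlib.decompress(data[body:body + length]).split(b"\n"))
        pos = body + length  # the next block (if any) starts right here
    return records
```

Putting more records in each block usually compresses better, because zlib sees more repetition at once; compare len(write_blocks(recs, 1)) with len(write_blocks(recs, 8)) on your own data.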

3. RCFILE

Storage method: data is divided into blocks by row (row groups), and each block is stored column by column.

Compression is fast, and individual columns can be accessed quickly.

Reading a record touches as few blocks as possible, because all the columns of a row group are kept in the same HDFS block.

To read only the required columns, a reader only needs to read the header of each row group.

When reading all of the data, RCFile may show no obvious performance advantage over SequenceFile.
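The row-group layout can be modeled in a few lines of Python. This is an in-memory sketch under stated assumptions (each column chunk is JSON-encoded and zlib-compressed, and the per-group "header" simply records each chunk's size), not RCFile's on-disk format. It shows why reading one column touches far fewer bytes, while a full scan still has to read every chunk.

```python
import json
import zlib


def write_row_groups(rows: list[tuple], columns: list[str], group_size: int) -> list[dict]:
    """Partition rows into row groups, then store each column of a group as its own compressed chunk."""
    groups = []
    for i in range(0, len(rows), group_size):
        chunk = rows[i:i + group_size]
        blobs = {
            name: zlib.compress(json.dumps([row[c] for row in chunk]).encode())
            for c, name in enumerate(columns)
        }
        # Simplified row-group header: where each column lives (here, just its size).
        header = {name: len(blob) for name, blob in blobs.items()}
        groups.append({"header": header, "columns": blobs})
    return groups


def read_columns(groups: list[dict], wanted: list[str]) -> tuple[dict, int]:
    """Decompress only the wanted column chunks; also return how many bytes were read."""
    result = {name: [] for name in wanted}
    bytes_read = 0
    for group in groups:
        for name in wanted:
            bytes_read += group["header"][name]  # chunk size is known from the header
            result[name].extend(json.loads(zlib.decompress(group["columns"][name])))
    return result, bytes_read
```

For example, with rows = [(i, "user%d" % i, i * 1.5) for i in range(100)] and group_size=16, read_columns(groups, ["name"]) decompresses only the "name" chunks of the seven row groups.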

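4. ORC

Storage method: as with RCFile, data is divided into blocks by row (called stripes in ORC), and each block is stored column by column.

Compression is fast, and individual columns can be accessed quickly.

ORC (Optimized Row Columnar) is more efficient than RCFile; it is an improved version of RCFile.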
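One reason ORC usually outperforms RCFile is that it keeps lightweight column statistics (such as minimum and maximum values) for the file, for each stripe, and for smaller groups of rows, so a reader can skip data that cannot satisfy a filter. The Python sketch below models only the stripe-level check for a single integer column; the Stripe class and count_greater_than function are hypothetical names, not part of any ORC library.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """One stripe of a single integer column plus its min/max statistics."""
    values: list[int]
    min_value: int
    max_value: int


def write_stripes(values: list[int], rows_per_stripe: int) -> list[Stripe]:
    """Cut the column into stripes and record min/max for each one."""
    stripes = []
    for i in range(0, len(values), rows_per_stripe):
        chunk = values[i:i + rows_per_stripe]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def count_greater_than(stripes: list[Stripe], threshold: int) -> tuple[int, int]:
    """Count values > threshold; return (matches, stripes actually scanned)."""
    matches = scanned = 0
    for stripe in stripes:
        if stripe.max_value <= threshold:
            continue  # statistics prove no row here can match: skip without reading values
        scanned += 1
        matches += sum(1 for v in stripe.values if v > threshold)
    return matches, scanned
```

With sorted or clustered data most stripes can be skipped; with randomly ordered values the min/max ranges overlap and little can be skipped.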

5. Custom format

Users can define their own input and output formats by implementing InputFormat and OutputFormat, and naming those classes in the table's STORED AS INPUTFORMAT / OUTPUTFORMAT clause.
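In Hive the two halves are Java classes that implement Hadoop's InputFormat and OutputFormat interfaces. As an illustration only, the Python sketch below mirrors that division of labor with a hypothetical RecordFormat base class: the output side serializes rows, and the input side decides how to split a file and how to read one split. The fixed-width layout is an assumption chosen because it makes splitting trivial (boundaries can be aligned to whole records); it is not a format Hive ships.

```python
from abc import ABC, abstractmethod


class RecordFormat(ABC):
    """Hypothetical Python analogue of a Hadoop InputFormat/OutputFormat pair."""

    @abstractmethod
    def write(self, rows: list[list[str]]) -> bytes:
        """Output side: serialize rows into the file's byte layout."""

    @abstractmethod
    def get_splits(self, data: bytes, num_splits: int) -> list[tuple[int, int]]:
        """Input side, part 1: cut the file into byte ranges readable in parallel."""

    @abstractmethod
    def read_split(self, data: bytes, split: tuple[int, int]) -> list[list[str]]:
        """Input side, part 2: turn one byte range back into rows."""


class FixedWidthFormat(RecordFormat):
    """Every field is space-padded to `width` bytes, so all records have the same size.

    Limitation of this sketch: trailing spaces inside a value are lost on read.
    """

    def __init__(self, num_fields: int, width: int):
        self.num_fields = num_fields
        self.width = width
        self.record_size = num_fields * width

    def write(self, rows):
        out = bytearray()
        for row in rows:
            if len(row) != self.num_fields:
                raise ValueError(f"expected {self.num_fields} fields, got {len(row)}")
            for field in row:
                raw = field.encode()
                if len(raw) > self.width:
                    raise ValueError(f"field {field!r} is wider than {self.width} bytes")
                out += raw.ljust(self.width, b" ")
        return bytes(out)

    def get_splits(self, data, num_splits):
        # Align split boundaries to whole records, so no record straddles two splits.
        n_records = len(data) // self.record_size
        if n_records == 0:
            return []
        per_split = -(-n_records // max(1, num_splits))  # ceiling division
        return [
            (first * self.record_size, min(first + per_split, n_records) * self.record_size)
            for first in range(0, n_records, per_split)
        ]

    def read_split(self, data, split):
        start, end = split
        rows = []
        for pos in range(start, end, self.record_size):
            record = data[pos:pos + self.record_size]
            rows.append([record[i:i + self.width].decode().rstrip(" ")
                         for i in range(0, self.record_size, self.width)])
        return rows
```

A line-oriented custom format would instead need a resynchronization rule at split boundaries; that choice is exactly what a real InputFormat has to make.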

Summary:

TEXTFILE consumes a lot of storage space, compressed text cannot be split or merged, and query efficiency is the lowest. On the other hand, data can be loaded directly, so loading is the fastest.

SequenceFile consumes the most storage space; compressed files can be split and merged, and query efficiency is high. However, data must be loaded by converting it from a text table.

RCFile uses the least storage space and has the highest query efficiency. It also has to be loaded by converting from a text table, and its loading speed is the lowest.

Personal advice: avoid TEXTFILE and SequenceFile where you can; ORC is usually the best choice.

That is all for "What are the storage formats of Hive files?". Thank you for reading! I hope you now have a clearer picture and that the content helps you.


© 2024 shulou.com SLNews company. All rights reserved.
