2025-01-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains what the Hive storage formats are: the default text format first, then a comparison of the common alternatives.
The default storage format for Hive is the text file format, which can also be specified explicitly with the optional clause STORED AS TEXTFILE. Users can also specify the delimiters when creating a table. Here again is the ods.ods_login table discussed earlier:
CREATE TABLE ods.ods_login (
  `uuid` string,
  `event` string,
  `time` string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
TEXTFILE means that all fields are encoded as text (letters, digits, and other characters, including international character sets), although Hive's default delimiters, such as '\001', are non-printing characters. Using TEXTFILE also means that each line is treated as a separate record.
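To illustrate overriding the default delimiters, here is a minimal sketch of a comma-delimited text table. The table name ods.ods_login_csv is hypothetical and does not appear in the original article.

```sql
-- Hypothetical table: same columns as ods.ods_login, stored as comma-delimited text.
CREATE TABLE ods.ods_login_csv (
  `uuid`  string,
  `event` string,
  `time`  string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

With this definition, each record is one plain line of text with its three fields separated by commas, so the files can be inspected with ordinary text tools.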
Users can replace TEXTFILE with other built-in file formats supported by Hive, such as ORC and Parquet, which use binary encoding and optional compression to optimize disk space usage and I/O bandwidth.
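One way to try a different format is to copy an existing text table with CREATE TABLE ... AS SELECT. The sketch below assumes the ods.ods_login table already holds data; the target name ods.ods_login_orc is hypothetical.

```sql
-- Hypothetical target table; CTAS copies the rows and rewrites them as ORC files.
CREATE TABLE ods.ods_login_orc
STORED AS ORC
AS
SELECT `uuid`, `event`, `time`
FROM ods.ods_login;
```

Swapping ORC for PARQUET in the STORED AS clause would produce a Parquet copy in the same way.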
Compare the common Hive storage formats:
In TextFile, each line is a record, terminated by a newline character (\n). The data is not compressed by default, so disk usage is high and parsing the data is expensive. It can be combined with Gzip or Bzip2 compression (Hive detects the codec and decompresses automatically at query time), but a Gzip file cannot be split, so Hive cannot process a single such file in parallel. Bzip2 files, by contrast, can be split.
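As a rough sketch of the compressed-text case, the statements below load a Gzip file straight into the text table. The path /tmp/ods_login.txt.gz is hypothetical, and the file is assumed to use the same delimiters as the table definition.

```sql
-- Hypothetical local path; the file contents are assumed to match the table's delimiters.
LOAD DATA LOCAL INPATH '/tmp/ods_login.txt.gz' INTO TABLE ods.ods_login;

-- The codec is inferred from the .gz extension and the data is decompressed while reading.
SELECT `event`, count(*) AS cnt
FROM ods.ods_login
GROUP BY `event`;
```

Because a single .gz file is read by a single task, very large inputs are usually better split into several files or converted to a splittable format.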
SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible. Three compression options are supported: NONE, RECORD, and BLOCK. RECORD gives a low compression ratio, so BLOCK compression is generally recommended. Its storage consumption is the largest among these formats. Compressed files can still be split and merged, query efficiency is high, and the data has to be loaded by converting it from a text file.
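The conversion from text and the BLOCK setting might look like the sketch below. The table name ods.ods_login_seq is hypothetical, and the property names are the Hadoop 2 style ones; older releases use different names (for example mapred.output.compression.type), so check them against your version.

```sql
-- Hypothetical target table stored as a SequenceFile.
CREATE TABLE ods.ods_login_seq (
  `uuid`  string,
  `event` string,
  `time`  string
)
STORED AS SEQUENCEFILE;

-- Compress query output and use BLOCK rather than RECORD compression.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;

-- Populate the table by converting rows from the existing text table.
INSERT OVERWRITE TABLE ods.ods_login_seq
SELECT `uuid`, `event`, `time`
FROM ods.ods_login;
```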
RCFile combines row and column storage. First, it partitions the data horizontally into row groups, which guarantees that all fields of a record sit in the same block and avoids reading multiple blocks to fetch one record. Second, within each block the data is stored column by column, which benefits compression and fast column access.
Avro is an open source project that provides data serialization and data exchange services for Hadoop. It lets you exchange data between the Hadoop ecosystem and programs written in other programming languages. Avro is one of the popular file formats in Hadoop-based big data applications.
ORC stands for Optimized Row Columnar. The ORC file format provides an efficient way to store data in Hive tables, and it was designed to overcome the limitations of the other Hive file formats. Using ORC files can improve performance when Hive reads, writes, and processes data from large tables. It offers fast compression and fast column access, is more efficient than RCFile, and is in effect an improved version of RCFile.
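A minimal sketch of an ORC table with an explicit compression codec follows. The table name ods.ods_login_orc_snappy is hypothetical; orc.compress accepts values such as NONE, ZLIB, and SNAPPY.

```sql
-- Hypothetical ORC table with Snappy compression chosen through a table property.
CREATE TABLE ods.ods_login_orc_snappy (
  `uuid`  string,
  `event` string,
  `time`  string
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");

-- Fill it from the existing text table.
INSERT OVERWRITE TABLE ods.ods_login_orc_snappy
SELECT `uuid`, `event`, `time`
FROM ods.ods_login;
```

Leaving out the table property falls back to the cluster's default ORC codec, which may be a reasonable starting point before tuning.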
Parquet is a column-oriented binary file format. It is efficient for large queries, and particularly useful for queries that scan specific columns of a table. Parquet tables are compressed with Snappy or gzip. Compared with ORC, Parquet has a lower compression ratio and lower query efficiency, and Hive's transactional (ACID) update operations are not supported on Parquet tables. However, Parquet is supported by the Impala query engine.
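The column-scan case can be sketched as follows. The table name ods.ods_login_parquet is hypothetical, and the parquet.compression table property is used here to pick the codec.

```sql
-- Hypothetical Parquet table with Snappy compression.
CREATE TABLE ods.ods_login_parquet (
  `uuid`  string,
  `event` string,
  `time`  string
)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression" = "SNAPPY");

-- A query that touches a single column only needs to read that column's data.
SELECT `event`, count(*) AS cnt
FROM ods.ods_login_parquet
GROUP BY `event`;
```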
That is all for this overview of the Hive storage formats. Thank you for reading!