Hive data types and storage formats

2025-04-02 Update From: SLTechnology News&Howtos

Https://www.cnblogs.com/qingyunzong/category/1191578.html

I. Data types

1. Basic data types

Hive supports most of the basic data types found in relational databases. Each entry below gives the type, its description, and an example literal.

boolean: true/false. Example: TRUE

tinyint: 1-byte signed integer, -128 to 127. Example: 1Y

smallint: 2-byte signed integer, -32768 to 32767. Example: 1S

int: 4-byte signed integer. Example: 1

bigint: 8-byte signed integer. Example: 1L

float: 4-byte single-precision floating-point number. Example: 1.0

double: 8-byte double-precision floating-point number. Example: 1.0

decimal: signed decimal with arbitrary precision. Example: 1.0

string: variable-length string. Example: "a", 'b'

varchar: variable-length string. Example: "a", 'b'

char: fixed-length string. Example: "a", 'b'

binary: byte array. Has no literal representation.

timestamp: timestamp with nanosecond precision. Example: 122327493795

date: date. Example: '2018-04-07'

As in other SQL dialects, these type names are reserved words. Note that all of these data types are implemented on top of the corresponding Java types, so their detailed behavior matches that of their Java counterparts. For example, string is implemented by Java's String, float by Java's float, and so on.
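Because the integer types are fixed-width signed values, their ranges follow directly from the byte widths listed above. The following is a minimal Python sketch of that arithmetic; the names INTEGER_WIDTHS, signed_range, and fits are hypothetical helpers for illustration and are not part of Hive.

```python
# Byte widths of Hive's signed integer types, as listed in the text.
INTEGER_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer of this width."""
    bits = num_bytes * 8
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(type_name, value):
    """Check whether value lies inside the range of the named integer type."""
    low, high = signed_range(INTEGER_WIDTHS[type_name])
    return low <= value <= high


for name, width in INTEGER_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{name}: {low} to {high}")
```

A check such as fits("tinyint", 128) returns False, which is one reason to pick the narrowest type only when the value range is known in advance.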

2. Complex types

array: ordered collection of elements of the same type. Example: array(1)

map: key-value pairs; the key must be a primitive type, and the value can be of any type.

struct: collection of fields, which can be of different types. Example: struct(1), named_struct('col1','1','col2',1,'col3',1.0)

II. Storage format

Hive creates a directory on HDFS for each database; the database's tables are stored as subdirectories, and each table's data is stored as files under the table directory. The default database is the exception: it has no directory of its own, and its tables are stored directly under /user/hive/warehouse by default.
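The directory layout described above can be sketched as a small path computation. This is only an illustration: table_location is a hypothetical helper, and the "<name>.db" directory suffix for non-default databases is an assumption based on Hive's usual default, which an explicit LOCATION clause or a different warehouse setting would override.

```python
import posixpath

# Default warehouse root named in the text.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def table_location(database, table, warehouse_root=WAREHOUSE_ROOT):
    """Return the default HDFS directory for a table (hypothetical helper).

    Assumption: a non-default database gets a '<name>.db' directory under
    the warehouse root; the default database has no directory of its own.
    """
    if database == "default":
        return posixpath.join(warehouse_root, table)
    return posixpath.join(warehouse_root, database + ".db", table)


print(table_location("default", "logs"))
print(table_location("sales", "orders"))
```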

(1) TextFile

TextFile is the default format and stores data row by row. The data is not compressed, so disk usage is high and parsing the data is expensive.

(2) SequenceFile

SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.

SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD compression achieves a low compression ratio, so BLOCK compression is generally recommended.

(3) RCFile

RCFile is a storage format that combines row and column storage.

(4) ORCFile

Data is divided into blocks by rows, each block is stored column by column, and each block carries an index. ORC is a newer format introduced by Hive as an upgraded version of RCFile: performance is greatly improved, and data can be stored compressed while still allowing fast column access.
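The idea of row blocks stored by column with a per-block index can be illustrated with a toy model. The sketch below is not the real ORC file layout; to_stripes and scan_column are hypothetical names, and the index here is assumed to be a simple (min, max) pair per column.

```python
def to_stripes(rows, rows_per_stripe):
    """Split rows into blocks and store each block column by column."""
    stripes = []
    for start in range(0, len(rows), rows_per_stripe):
        block = rows[start:start + rows_per_stripe]
        # Transpose the block so that each column's values sit together.
        columns = [list(col) for col in zip(*block)]
        # Toy index: (min, max) of every column in this block.
        index = [(min(col), max(col)) for col in columns]
        stripes.append({"columns": columns, "index": index})
    return stripes


def scan_column(stripes, col, wanted):
    """Find values equal to wanted, skipping blocks the index rules out."""
    hits = []
    for stripe in stripes:
        low, high = stripe["index"][col]
        if low <= wanted <= high:
            hits.extend(v for v in stripe["columns"][col] if v == wanted)
    return hits


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
stripes = to_stripes(rows, 2)
print(stripes[0]["columns"])
print(scan_column(stripes, 0, 3))
```

Reading a single column touches only that column's list in each block, and the index lets a scan skip blocks that cannot contain the wanted value.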

(5) Parquet

Parquet is also a columnar storage format. It compresses well and can significantly reduce table scan and deserialization time.

III. Data format

When data is stored in a text file, rows and columns must be separated according to a fixed format, and the delimiters must be declared to Hive. By default, Hive uses a few rarely seen characters that generally do not appear as content in records.

The default row and column delimiters for Hive are shown in the following table.

Separator: Description

\n: For text files, each line is a record, so \n separates records.

^A (Ctrl+A): Separates fields (columns); can also be written as \001.

^B (Ctrl+B): Separates the elements of an array or struct, or the key-value pairs of a map; can also be written as \002.

^C (Ctrl+C): Separates a key from its value within a map entry; can also be written as \003.
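To make the default delimiters concrete, here is a minimal Python sketch that parses one text record using \001 between fields, \002 between collection elements (including map entries), and \003 between a map key and its value. The record layout (a string name, an array of tags, and a map of scores) and the function names are assumptions chosen for illustration.

```python
FIELD_SEP = "\x01"       # ^A, written \001 in Hive
COLLECTION_SEP = "\x02"  # ^B, written \002 in Hive
MAP_KEY_SEP = "\x03"     # ^C, written \003 in Hive


def parse_array(text):
    """Split an array field into its elements."""
    return text.split(COLLECTION_SEP) if text else []


def parse_map(text):
    """Split a map field into entries, then each entry into key and value."""
    if not text:
        return {}
    return dict(item.split(MAP_KEY_SEP, 1) for item in text.split(COLLECTION_SEP))


def parse_record(line):
    """Parse one record with the assumed layout: name, tags array, scores map."""
    name, tags, scores = line.rstrip("\n").split(FIELD_SEP)
    return {"name": name, "tags": parse_array(tags), "scores": parse_map(scores)}


line = "alice\x01a\x02b\x01math\x0390\x02art\x0385\n"
print(parse_record(line))
```

All values come back as strings; converting them to typed values is left to whatever schema the table declares.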
