2025-03-29 Update From: SLTechnology News & Howtos > Development
Shulou (Shulou.com) 05/31 Report --
Most people are not familiar with the knowledge points in "What are Hive data types?", so this article summarizes them for you. The content is detailed, the steps are clear, and it has real reference value. I hope everyone gains something from reading it. Let's take a look at "What are Hive data types?".
I. Introduction to Hive
Hive is a statistical tool open-sourced by Facebook to analyze massive structured logs.
Hive is a Hadoop-based data warehouse tool: it maps structured data files to tables and provides SQL-like (HiveQL) query capabilities.
Hive's advantages and disadvantages
Pros:
HiveQL is similar to SQL, so it is easy to learn
It avoids hand-writing MapReduce jobs, reducing developers' learning costs
Hive's execution latency is relatively high, so it is typically used for offline data analysis where real-time results are not required
Hive is good at processing big data; it has no advantage for small data sets, again because of its high execution latency
Hive supports user-defined functions, so users can implement functions tailored to their own needs
Disadvantages:
HQL has limited expressive power
Hive's execution efficiency is relatively low
Hive queries are essentially MapReduce jobs
Hive Architecture
Hive user interfaces:
Hive CLI (the Hive command line)
HWI (the Hive Web Interface)
The Thrift service, also known as HiveServer, through which external clients connect to Hive
Three storage modes of Hive metadata
Single-user mode: by default Hive stores metadata in an embedded Derby database, so Hive cannot be used by multiple sessions concurrently
Multi-user mode: a MySQL server stores the metadata, allowing concurrent access
Remote server mode: a standalone MetaStoreServer is started, and clients access metadata through it over Thrift
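As a sketch of the multi-user setup, a hive-site.xml fragment along these lines points the metastore at a MySQL server (the host names, database name, user, and password here are placeholders, not values from this article):

```xml
<configuration>
  <!-- JDBC connection to the metastore database (placeholder host/db) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
  <!-- Remote server mode instead points clients at a standalone metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```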
Hive Data Storage
Hive data can be divided into table data and metadata. Table data is, as the name suggests, the data stored in tables; metadata records table names, columns, partitions, attributes, and so on.
Hive builds on Hadoop's distributed file storage, so its data lives in HDFS. Here are some common ways of importing data into Hive:
Import data from the local file system into Hive
Import data from HDFS into a Hive table
Query records from other tables and insert them into a Hive table
Create a table and populate it with records queried from other tables
#1. Load a local file into Hive
#1.1 Create the table
create table student(id string, name string) row format delimited fields terminated by '\t';
#1.2 Load the local file (default.student is database.table; the bare table name also works)
load data local inpath '/root/student.txt' into table default.student;
#2. Load an HDFS file into Hive
#2.1 Upload the file to the HDFS root directory
dfs -put /root/student.txt /;
#2.2 Load the data from HDFS
load data inpath '/student.txt' into table test.student;
#3. Load data, overwriting the table's existing data
#3.1 Upload the file to HDFS (loading a file into a table moves it, like "cut" in Windows)
dfs -put /root/student.txt /;
#3.2 Load the data, overwriting what is in the table
load data inpath '/student.txt' overwrite into table test.student;
#4. Query the table
select * from student;
#5. Insert data into a table with an insert statement
#5.1 Create the table
create table student_par(id int, name string) row format delimited fields terminated by '\t';
#5.2 Insert data via insert
insert into table student_par values (1,'zhangsan'),(2,'lisi');
Architecture principles
User interface
CLI (command-line interface), JDBC/ODBC (programmatic access to Hive), Web UI (browser access to Hive)
metadata
Metadata includes: table name, database to which the table belongs (default), owner of the table, column/partition field, type of table (whether it is an external table), directory where the data of the table is located, etc.
Hadoop
HDFS for storage and MapReduce for computation.
Driver
(1) SQL Parser: converts the SQL string into an abstract syntax tree (AST), usually with a third-party tool library such as ANTLR, then analyzes the AST: whether the tables exist, whether the fields exist, and whether the SQL is semantically valid.
(2) Compiler (Physical Plan): compiles the AST into a logical execution plan.
(3) Query Optimizer: optimizes the logical execution plan.
(4) Execution: converts the logical execution plan into a runnable physical plan; for Hive, that means MapReduce or Spark jobs.
Hive file format
TextFile
This is the default file format. Data is not compressed, so both disk overhead and data-parsing overhead are large.
SequenceFile
This is a binary file format provided by the Hadoop API; records are serialized into the file in binary form.
RCFile
This format uses a hybrid row/columnar storage structure: rows are horizontally partitioned into row groups, and within each group the data is stored by column.
ORC
The Optimized Row Columnar ORC file format is a columnar storage format in the Hadoop ecosystem.
Advantages of ORC:
Columnar storage with multiple file-compression options
Files are splittable
Multiple kinds of indexes are provided
Complex data structures such as MAP are supported
ORC file format is stored in binary mode, so it is not directly readable.
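As an illustrative sketch (the table and column names here are made up, not from this article), a table can be stored as ORC with a compression codec chosen at creation time:

```sql
-- Create a table stored in the ORC format with ZLIB compression
CREATE TABLE logs_orc (
  id  BIGINT,
  msg STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Copy existing data into it; Hive writes the binary ORC files
INSERT INTO TABLE logs_orc SELECT id, msg FROM logs_text;
```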
Hive essence
Convert HQL to MapReduce.
Hive processes data stored on HDFS
The underlying implementation of Hive analytics data is MapReduce.
Executor runs on Yarn
How Hive works
Hive is simply a query engine. When Hive receives a SQL statement, it does the following:
Lexical and syntax analysis. Parse the SQL statement into an abstract syntax tree using ANTLR
Semantic analysis. Fetch metadata from the MetaStore and resolve the table names, column names, and data types in the SQL statement
Logical plan generation. Generate a logical plan in the form of an operator tree
Logical plan optimization. Optimize the operator tree
Physical plan generation. Turn the logical plan into a physical plan: a DAG of MapReduce tasks
Physical plan execution. Submit the DAG to the Hadoop cluster for execution
Return the query results.
Hive delegates this MapReduce work to the following components:
Metastore: stores information about Hive tables: tables, table partitions, schemas, columns and their types, table-to-data mappings, etc.
Driver: the component that controls the lifecycle of a HiveQL statement
Query compiler
Execution engine
HiveServer
Client components: the Hive CLI, the Web UI, the JDBC driver, etc.
Hive data types
Hive supports two categories of data types: primitive (atomic) types and complex types.
Basic data types:
Type      Description                          Example
TINYINT   1-byte signed integer                1
SMALLINT  2-byte signed integer                1
INT       4-byte signed integer                1
BIGINT    8-byte signed integer                1
FLOAT     4-byte single-precision float        1.0
DOUBLE    8-byte double-precision float        1.0
BOOLEAN   true/false                           true
STRING    character string                     "hive", 'hive'
The STRING data type in Hive is similar to VARCHAR in MySQL: it is a variable-length string type.
Hive supports conversion between data types. Because Hive is written in Java, the conversion rules follow Java's:
Implicit conversion --> small type to large type (widening)
Explicit cast --> large type to small type (narrowing)
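A minimal sketch of both rules (the literal values are made up for illustration):

```sql
-- Implicit (widening) conversion: the INT 1 is promoted to DOUBLE before the addition
SELECT 1 + 2.0;

-- Explicit (narrowing) conversion must use CAST; the fraction is truncated
SELECT CAST(3.7 AS INT);

-- CAST also converts strings; a string that is not a valid number becomes NULL
SELECT CAST('100' AS INT);
```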
Complex data types:
Type    Description                                                                    Example
ARRAY   An ordered collection of fields; all elements must be the same type           array(1, 2)
MAP     Unordered key-value pairs; keys must be a primitive type, values may be any   map('a', 1, 'b', 2)
STRUCT  A named set of fields; field types may differ                                 struct('a', 1, 1.0)
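A sketch of the three complex types in a table definition (the table name, columns, and delimiters are assumptions for illustration):

```sql
-- A table using all three complex types
CREATE TABLE person (
  name    STRING,
  friends ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '_'
  MAP KEYS TERMINATED BY ':';

-- Element access: array index, map key, struct field
SELECT friends[0], scores['math'], address.city FROM person;
```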
The above is the content of "What are Hive data types?". I believe you now have some understanding of it, and I hope what was shared here is helpful. For more related knowledge, please follow the industry information channel.