In this article, the editor shares the basic operations of Hive. Most people don't know much about them yet, so this article is offered for your reference. I hope you learn a lot from reading it; let's get started!
Preface
Hive supports common SQL statements, but it also has its own unique parts. The common statements are consistent with the syntax of relational databases such as MySQL, so I won't elaborate on them here; let's focus on the parts that are specific to Hive.
All Hive operations are documented on the official Hive website; see the LanguageManual.
Hive is actually not difficult, and it is even easier for people who have written SQL statements before. When learning Hive, it is recommended to type along as you learn and practice as much as possible.
Preparatory work
Since we want to use Hive, we first need a Hive environment. Following the earlier big data ecosystem blog posts, you can install Hive and its related environment (Hadoop, YARN).
Here I operate Hive in remote mode, with MySQL serving as the metastore.
Operation flow
Create the first table
After connecting and entering the Hive shell, we first create a new database, and then practice most of the operations inside it.
Create it:
CREATE DATABASE hive_test;
Use the database:
USE hive_test;
Then create the first table, gfstbl. Instead of using the database, you can also qualify the table name with the database name (for example, hive_test.gfstbl).
CREATE TABLE gfstbl(
  id INT,
  name STRING,
  age INT,
  gfs ARRAY<STRING>,
  address MAP<STRING,STRING>,
  info STRUCT<country:STRING,province:STRING,city:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';
Explanation:
In the CREATE statement, column definitions are separated by commas, and a column name and its type are separated by a space. The column types here are integers, strings, an array, a map (key-value pairs), and a struct.
Data types supported by Hive:
TINYINT: 1-byte signed integer, range -128 to 127
SMALLINT: 2-byte signed integer, range -32768 to 32767
INT: 4-byte signed integer, range -2^31 to 2^31-1
BIGINT: 8-byte signed integer, range -2^63 to 2^63-1
FLOAT: single-precision floating-point number
DOUBLE: double-precision floating-point number
DECIMAL: exact decimal number, up to 38 digits of precision
DATE: date
TIMESTAMP: timestamp
CHAR: fixed-length string
STRING: string
VARCHAR: variable-length string
ARRAY: array type
MAP: key-value pair type
STRUCT: structured type
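To make a few of these types concrete, here is a minimal sketch; the table name type_demo and its columns are invented for illustration:
CREATE TABLE type_demo (
  price   DECIMAL(10,2),  -- exact decimal: 10 digits in total, 2 after the point
  code    VARCHAR(20),    -- variable-length string of at most 20 characters
  born    DATE,           -- calendar date such as '1999-01-01'
  created TIMESTAMP       -- date and time
);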
ROW FORMAT DELIMITED: the row format is delimited
FIELDS TERMINATED BY ' ': fields are separated by spaces
COLLECTION ITEMS TERMINATED BY ',': collection items (the array elements here) are separated by commas
MAP KEYS TERMINATED BY ':': keys and values in a map are separated by colons
LINES TERMINATED BY '\n': records are separated by newline characters
Each SQL statement needs to end with a semicolon.
In addition, if you append LOCATION "/test" before the semicolon, you can set the location of the source data in HDFS; if it is not set, the data is stored under Hive's default warehouse directory.
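As a sketch of the LOCATION clause, the statement below stores the table's data under /test instead of the default warehouse directory; the table name gfstbl_loc is invented, and the /test directory is assumed to be writable on HDFS:
CREATE TABLE gfstbl_loc (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION '/test';  -- data files for this table now live under /test on HDFS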
The general formula of the table-creation statement is as follows; we will practice it separately according to the types of Hive tables.
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)    -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]    -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]    -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];    -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type    -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION    -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY    -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP    -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL    -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)    -- (Note: Available in Hive 0.13.0 and later)
  | DATE    -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR    -- (Note: Available in Hive 0.12.0 and later)
  | CHAR    -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ... >

union_type
  : UNIONTYPE < data_type, data_type, ... >    -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]    -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE    -- (Note: Available in Hive 0.6.0 and later)
  | ORC    -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET    -- (Note: Available in Hive 0.13.0 and later)
  | AVRO    -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE]
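To see several of these optional clauses working together, here is a hedged example; the table logs, its columns, and the dt partition column are invented for illustration:
CREATE TABLE IF NOT EXISTS hive_test.logs (
  uid    INT COMMENT 'user id',   -- column-level comment
  action STRING                   -- what the user did
)
COMMENT 'demo table combining optional clauses'
PARTITIONED BY (dt STRING)        -- one HDFS subdirectory per dt value
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE                -- explicit file format
TBLPROPERTIES ('creator'='hive_test');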
View the structure of the first table
The formula for viewing a table's structure:
DESCRIBE [EXTENDED|FORMATTED] table_name;
EXTENDED: shows the extended details in a compact, unformatted form
FORMATTED: shows the details in a formatted, human-readable layout
To view only the table's fields and types:
DESC gfstbl;
or
DESCRIBE gfstbl;
To view the table structure in the compact extended form:
DESCRIBE EXTENDED gfstbl;
To view the table in formatted form:
DESCRIBE FORMATTED gfstbl;
Load data into the table
Data to be loaded
The contents of the data file are as follows:
001 zhangsan 19 Girl1,Girl2,Girl3 Girl1:qingdao,Girl2:jinan,Girl3:nanjing zhongguo,shandong,weifang
002 lisi 20 GirlA,GirlB GirlA:American,GirlB:Changsha zhongguo,shandong,qingdao
Copy these two lines of data into the gfs.txt file:
vim /root/gfs.txt
Then paste the data in.
Import the data into the Hive table:
LOAD DATA LOCAL INPATH '/root/gfs.txt' INTO TABLE hive_test.gfstbl;
Note that loading data this way does not start a MapReduce job; the essence of LOAD is simply to upload the file into the table's directory on HDFS.
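As a related sketch: adding OVERWRITE replaces the table's existing data instead of appending to it, and dropping LOCAL reads the file from HDFS rather than the local filesystem:
LOAD DATA LOCAL INPATH '/root/gfs.txt' OVERWRITE INTO TABLE hive_test.gfstbl;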
Three ways to view data
1. Use a SELECT statement:
SELECT * FROM gfstbl;
2. Use HDFS commands on the cluster:
hdfs dfs -ls /user/hive_meta/warehouse/hive_test.db/gfstbl
hdfs dfs -cat /user/hive_meta/warehouse/hive_test.db/gfstbl/*
3. Through the web UI page:
Open the web UI page and navigate to the file's location.
Viewing a value from an array, a map, or a struct
For arrays, use column_name[index] (the index starts at 0).
For maps, use column_name["key"].
For structs, use column_name.attribute.
SELECT gfs[0], address['Girl2'], info.city FROM gfstbl;
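Hive also ships built-in functions for collection columns; as a small sketch, size() returns an array's length and map_keys() returns a map's keys (both are standard Hive built-ins):
SELECT name, size(gfs), map_keys(address), info.country FROM gfstbl;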
How to delete a table
DROP TABLE gfstbl2;
Note that if you delete the table's directory through HDFS commands, the table will still exist in Hive. The reason is that the table's metadata is also stored in MySQL's hive_meta database.
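As a sketch of what is stored there, you can query the metastore from the MySQL client (not the Hive shell); TBLS is the standard metastore table that lists Hive tables, and the database name hive_meta follows the setup above:
USE hive_meta;
SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS;  -- TBL_TYPE distinguishes managed and external tables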
Knowledge accumulation
Three ways to create tables in Hive
1. Write a complete CREATE statement, as we did when creating gfstbl above:
CREATE TABLE gfstbl(
  id INT,
  name STRING,
  age INT,
  gfs ARRAY<STRING>,
  address MAP<STRING,STRING>,
  info STRUCT<country:STRING,province:STRING,city:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';
2. Use the CREATE and LIKE keywords to create a new table with exactly the same structure as the original table but no data:
CREATE TABLE gfstbl1 LIKE gfstbl;
3. Use the CREATE, AS, and SELECT keywords to create a new table with both the same structure and the same data:
CREATE TABLE gfstbl2 AS SELECT id, name, gfs, address, info FROM gfstbl;
Note that because this method fills the table through an insert, it starts a MapReduce job.
Four ways of inserting data into a Hive table
1. Insert new rows directly with an INSERT statement; this is the simplest, most basic, and least used way:
INSERT INTO table_name (col1, ...) VALUES (val1, ...);
2. LOAD, the most commonly used and most efficient way:
The LOAD DATA command comes in two forms, LOAD DATA LOCAL INPATH and LOAD DATA INPATH. The difference is that with LOCAL it imports a local file, while without LOCAL it imports a file from HDFS.
LOAD DATA INPATH '/a.txt' INTO TABLE user;
3. Query (SELECT) data from another table and insert it into the new table:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
or
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1;
Conventionally the second syntax is used: stating FROM up front lets several INSERT clauses share one source table, reducing redundancy in the SQL code.
FROM user
INSERT INTO user_count
  SELECT count(*)
INSERT INTO user_copy
  SELECT *;
4. Use hdfs commands directly to place correctly formatted files into the table's directory on HDFS.
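For example, a sketch using the warehouse path from earlier; it assumes gfs.txt already matches the table's delimiters:
hdfs dfs -put /root/gfs.txt /user/hive_meta/warehouse/hive_test.db/gfstbl/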
Types of tables in Hive
Internal table
An internal table is conceptually similar to a table in a database. Each internal table has a corresponding directory in Hive for storing its data, and the data of all tables (except external tables) is stored under this directory, such as the gfstbl table created above.
When the table is deleted, both the metadata and the data are deleted.
Temporary table
The life cycle of a temporary table is one session; when the session ends, the temporary table is automatically deleted.
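A minimal sketch; gfstbl_tmp is an invented name, and the table vanishes when the session ends:
CREATE TEMPORARY TABLE gfstbl_tmp LIKE gfstbl;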
External table
When deletes a table, the metadata is deleted, but the data (stored in HDFS) is not deleted.
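A minimal sketch, assuming delimited data files already sit in the HDFS directory /test/ext_data; gfstbl_ext is an invented name:
CREATE EXTERNAL TABLE gfstbl_ext (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION '/test/ext_data';  -- dropping the table leaves these files in place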
Partition table
A partition table stores a batch of data in multiple directories to avoid brute-force full-table scans. Partition tables are divided into static partition tables and dynamic partition tables.
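A minimal sketch of a static partition; the table gfstbl_part, the dt partition column, and the file /root/part.txt are all invented for illustration. Each dt value gets its own subdirectory under the table:
CREATE TABLE gfstbl_part (
  id INT,
  name STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INPATH '/root/part.txt' INTO TABLE gfstbl_part PARTITION (dt='2020-01-01');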
Bucket table
A bucket table hashes column values and stores different rows in different files: the hash of the column value, modulo the number of buckets, determines which bucket each row falls into. Every table and partition in Hive can be further divided into buckets.
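A minimal sketch; gfstbl_bucket is an invented name, and each row goes to bucket hash(id) % 4:
CREATE TABLE gfstbl_bucket (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS;
-- on older Hive versions, run set hive.enforce.bucketing=true; before inserting,
-- so that inserts actually produce 4 bucket files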
The above is all the content of this article on the basic operations of Hive. Thank you for reading! I believe you now have a certain understanding of them, and I hope the shared content has helped you.