In this article, the editor shares the basic operations of Hive. Most people don't know much about them yet, so this article is offered for your reference. I hope you learn a lot from reading it; let's get started!
Preface
Hive supports common SQL statements, but it also has its own unique parts. The common statements are consistent with the syntax of relational databases such as MySQL, so I won't elaborate on them here; let's focus on the parts that are specific to Hive.
All Hive operations are documented on the official Hive website; see the LanguageManual.
Hive is actually not difficult, and it is even easier for people who have written SQL statements before. When learning Hive, it is recommended to type along as you learn and practice as much as possible.
Preparatory work
Since we want to use Hive, we first need a Hive environment. Following the earlier big data ecosystem blog posts, you can install Hive and its related environment (Hadoop, YARN).
Here I operate Hive in remote mode, with MySQL serving as the metastore.
Operation flow
Create the first table
After connecting and entering the Hive shell, we first create a new database, and then practice most of the operations inside it.
Create it:
CREATE DATABASE hive_test;
Use the database:
USE hive_test;
Then create the first table, gfstbl. Instead of using the database, you can also qualify the table name with the database name (for example, hive_test.gfstbl).
CREATE TABLE gfstbl(
  id INT,
  name STRING,
  age INT,
  gfs ARRAY<STRING>,
  address MAP<STRING,STRING>,
  info STRUCT<country:STRING,province:STRING,city:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';
Explanation:
In the CREATE statement, column definitions are separated by commas, and a column name and its type are separated by a space. The column types here are integers, strings, an array, a map (key-value pairs), and a struct.
Data types supported by Hive:
TINYINT: 1-byte signed integer, range -128 to 127
SMALLINT: 2-byte signed integer, range -32768 to 32767
INT: 4-byte signed integer, range -2^31 to 2^31-1
BIGINT: 8-byte signed integer, range -2^63 to 2^63-1
FLOAT: single-precision floating-point number
DOUBLE: double-precision floating-point number
DECIMAL: exact decimal number, up to 38 digits of precision
DATE: date
TIMESTAMP: timestamp
CHAR: fixed-length string
STRING: string
VARCHAR: variable-length string
ARRAY: array type
MAP: key-value pair type
STRUCT: structured type
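To make a few of these types concrete, here is a minimal sketch; the table name type_demo and its columns are invented for illustration:
CREATE TABLE type_demo (
  price   DECIMAL(10,2),  -- exact decimal: 10 digits in total, 2 after the point
  code    VARCHAR(20),    -- variable-length string of at most 20 characters
  born    DATE,           -- calendar date such as '1999-01-01'
  created TIMESTAMP       -- date and time
);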
ROW FORMAT DELIMITED: the row format is delimited
FIELDS TERMINATED BY ' ': fields are separated by spaces
COLLECTION ITEMS TERMINATED BY ',': collection items (the array elements here) are separated by commas
MAP KEYS TERMINATED BY ':': keys and values in a map are separated by colons
LINES TERMINATED BY '\n': records are separated by newline characters
Each SQL statement needs to end with a semicolon.
In addition, if you append LOCATION "/test" before the semicolon, you can set the location of the source data in HDFS; if it is not set, the data is stored under Hive's default warehouse directory.
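As a sketch of the LOCATION clause, the statement below stores the table's data under /test instead of the default warehouse directory; the table name gfstbl_loc is invented, and the /test directory is assumed to be writable on HDFS:
CREATE TABLE gfstbl_loc (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION '/test';  -- data files for this table now live under /test on HDFS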
The general formula of the table-creation statement is as follows; we will practice it separately according to the types of Hive tables.
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)    -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]    -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]    -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];    -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type    -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION    -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY    -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP    -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL    -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)    -- (Note: Available in Hive 0.13.0 and later)
  | DATE    -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR    -- (Note: Available in Hive 0.12.0 and later)
  | CHAR    -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ... >

union_type
  : UNIONTYPE < data_type, data_type, ... >    -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]    -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE    -- (Note: Available in Hive 0.6.0 and later)
  | ORC    -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET    -- (Note: Available in Hive 0.13.0 and later)
  | AVRO    -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE]
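To see several of these optional clauses working together, here is a hedged example; the table logs, its columns, and the dt partition column are invented for illustration:
CREATE TABLE IF NOT EXISTS hive_test.logs (
  uid    INT COMMENT 'user id',   -- column-level comment
  action STRING                   -- what the user did
)
COMMENT 'demo table combining optional clauses'
PARTITIONED BY (dt STRING)        -- one HDFS subdirectory per dt value
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE                -- explicit file format
TBLPROPERTIES ('creator'='hive_test');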
View the structure of the first table
The formula for viewing a table's structure:
DESCRIBE [EXTENDED|FORMATTED] table_name;
EXTENDED: shows the extended details in a compact, unformatted form
FORMATTED: shows the details in a formatted, human-readable layout
To view only the table's fields and types:
DESC gfstbl;
or
DESCRIBE gfstbl;
To view the table structure in the compact extended form:
DESCRIBE EXTENDED gfstbl;
To view the table in formatted form:
DESCRIBE FORMATTED gfstbl;
Load data into the table
Data to be loaded
The contents of the data file are as follows:
001 zhangsan 19 Girl1,Girl2,Girl3 Girl1:qingdao,Girl2:jinan,Girl3:nanjing zhongguo,shandong,weifang
002 lisi 20 GirlA,GirlB GirlA:American,GirlB:Changsha zhongguo,shandong,qingdao
Copy these two lines of data into the gfs.txt file:
vim /root/gfs.txt
Then paste the data in.
Import the data into the Hive table:
LOAD DATA LOCAL INPATH '/root/gfs.txt' INTO TABLE hive_test.gfstbl;
Note that loading data this way does not start a MapReduce job; the essence of LOAD is simply to upload the file into the table's directory on HDFS.
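As a related sketch: adding OVERWRITE replaces the table's existing data instead of appending to it, and dropping LOCAL reads the file from HDFS rather than the local filesystem:
LOAD DATA LOCAL INPATH '/root/gfs.txt' OVERWRITE INTO TABLE hive_test.gfstbl;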
Three ways to view data
1. Use a SELECT statement:
SELECT * FROM gfstbl;
2. Use HDFS commands on the cluster:
hdfs dfs -ls /user/hive_meta/warehouse/hive_test.db/gfstbl
hdfs dfs -cat /user/hive_meta/warehouse/hive_test.db/gfstbl/*
3. Through the web UI page:
Open the web UI page and navigate to the file's location.
Viewing a value from an array, a map, or a struct
For arrays, use column_name[index] (the index starts at 0).
For maps, use column_name["key"].
For structs, use column_name.attribute.
SELECT gfs[0], address['Girl2'], info.city FROM gfstbl;
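Hive also ships built-in functions for collection columns; as a small sketch, size() returns an array's length and map_keys() returns a map's keys (both are standard Hive built-ins):
SELECT name, size(gfs), map_keys(address), info.country FROM gfstbl;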
How to delete a table
DROP TABLE gfstbl2;
Note that if you delete the table's directory through HDFS commands, the table will still exist in Hive. The reason is that the table's metadata is also stored in MySQL's hive_meta database.
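As a sketch of what is stored there, you can query the metastore from the MySQL client (not the Hive shell); TBLS is the standard metastore table that lists Hive tables, and the database name hive_meta follows the setup above:
USE hive_meta;
SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS;  -- TBL_TYPE distinguishes managed and external tables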
Knowledge accumulation
Three ways to create tables in Hive
1. Write a complete CREATE statement, as we did when creating gfstbl above:
CREATE TABLE gfstbl(
  id INT,
  name STRING,
  age INT,
  gfs ARRAY<STRING>,
  address MAP<STRING,STRING>,
  info STRUCT<country:STRING,province:STRING,city:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';
2. Use the CREATE and LIKE keywords to create a new table with exactly the same structure as the original table but no data:
CREATE TABLE gfstbl1 LIKE gfstbl;
3. Use the CREATE, AS, and SELECT keywords to create a new table with both the same structure and the same data:
CREATE TABLE gfstbl2 AS SELECT id, name, gfs, address, info FROM gfstbl;
Note that because this method fills the table through an insert, it starts a MapReduce job.
Four ways of inserting data into a Hive table
1. Insert new rows directly with an INSERT statement; this is the simplest, most basic, and least used way:
INSERT INTO table_name (col1, ...) VALUES (val1, ...);
2. LOAD, the most commonly used and most efficient way:
The LOAD DATA command comes in two forms, LOAD DATA LOCAL INPATH and LOAD DATA INPATH. The difference is that with LOCAL it imports a local file, while without LOCAL it imports a file from HDFS.
LOAD DATA INPATH '/a.txt' INTO TABLE user;
3. Query (SELECT) data from another table and insert it into the new table:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
or
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1;
Conventionally the second syntax is used: stating FROM up front lets several INSERT clauses share one source table, reducing redundancy in the SQL code.
FROM user
INSERT INTO user_count
  SELECT count(*)
INSERT INTO user_copy
  SELECT *;
4. Use hdfs commands directly to place correctly formatted files into the table's directory on HDFS.
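For example, a sketch using the warehouse path from earlier; it assumes gfs.txt already matches the table's delimiters:
hdfs dfs -put /root/gfs.txt /user/hive_meta/warehouse/hive_test.db/gfstbl/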
Types of tables in Hive
Internal table
An internal table is conceptually similar to a table in a database. Each internal table has a corresponding directory in Hive for storing its data, and the data of all tables (except external tables) is stored under this directory, such as the gfstbl table created above.
When the table is deleted, both the metadata and the data are deleted.
Temporary table
The life cycle of a temporary table is one session; when the session ends, the temporary table is automatically deleted.
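A minimal sketch; gfstbl_tmp is an invented name, and the table vanishes when the session ends:
CREATE TEMPORARY TABLE gfstbl_tmp LIKE gfstbl;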
External table
When deletes a table, the metadata is deleted, but the data (stored in HDFS) is not deleted.
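A minimal sketch, assuming delimited data files already sit in the HDFS directory /test/ext_data; gfstbl_ext is an invented name:
CREATE EXTERNAL TABLE gfstbl_ext (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION '/test/ext_data';  -- dropping the table leaves these files in place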
Partition table
A partition table stores a batch of data in multiple directories to avoid brute-force full-table scans. Partition tables are divided into static partition tables and dynamic partition tables.
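A minimal sketch of a static partition; the table gfstbl_part, the dt partition column, and the file /root/part.txt are all invented for illustration. Each dt value gets its own subdirectory under the table:
CREATE TABLE gfstbl_part (
  id INT,
  name STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INPATH '/root/part.txt' INTO TABLE gfstbl_part PARTITION (dt='2020-01-01');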
Bucket table
A bucket table hashes column values and stores different rows in different files: the hash of the column value, modulo the number of buckets, determines which bucket each row falls into. Every table and partition in Hive can be further divided into buckets.
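A minimal sketch; gfstbl_bucket is an invented name, and each row goes to bucket hash(id) % 4:
CREATE TABLE gfstbl_bucket (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS;
-- on older Hive versions, run set hive.enforce.bucketing=true; before inserting,
-- so that inserts actually produce 4 bucket files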
The above is all the content of this article on the basic operations of Hive. Thank you for reading! I believe you now have a certain understanding of them, and I hope the shared content has helped you.