What are the common SQL statements in Hive


This article mainly introduces the common SQL statements in Hive. The content is simple and clear, and I hope it helps you resolve your doubts. Let the editor lead you through studying and learning "What are the common SQL statements in Hive?"

Database

SHOW DATABASES

CREATE DATABASE IF NOT EXISTS test

DROP DATABASE test

USE test

Create a table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

[(col_name data_type [COMMENT col_comment],...)]

[COMMENT table_comment]

[PARTITIONED BY (col_name data_type [COMMENT col_comment],...)]

[CLUSTERED BY (col_name, col_name,...)

[SORTED BY (col_name [ASC | DESC],...)] INTO num_buckets BUCKETS]

[ROW FORMAT row_format]

[STORED AS file_format]

[LOCATION hdfs_path]

CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the user can use the IF NOT EXISTS option to ignore the exception.

The EXTERNAL keyword allows the user to create an external table and specify a path to the actual data (LOCATION) while creating the table.

LIKE allows users to copy existing table structures, but not data

COMMENT can add descriptions to tables and fields

ROW FORMAT

DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]

[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]

| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value,...)]

Users can use a custom SerDe or the built-in SerDe when creating a table. If neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When creating a table, the user also specifies the table's columns; a custom SerDe can be specified along with the columns, and Hive uses the SerDe to determine how the data maps to the table's columns.
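
As an illustration, here is a minimal sketch of a table that uses an explicit SerDe instead of the delimited format; OpenCSVSerde is a built-in CSV SerDe in newer Hive versions, and the csv_pokes table name is made up for this example:

CREATE TABLE IF NOT EXISTS csv_pokes (foo STRING, bar STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE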

STORED AS

SEQUENCEFILE

| TEXTFILE

| RCFILE

| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

If the file data is plain text, you can use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
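
For example, a sketch of a table stored as a SequenceFile (the pokes_seq name is made up for illustration):

CREATE TABLE IF NOT EXISTS pokes_seq (foo STRING, bar STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE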

Field types supported by Hive

TINYINT

SMALLINT

INT

BIGINT

BOOLEAN

FLOAT

DOUBLE

STRING
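
As a quick illustration, a sketch of a table definition that combines several of these types (the table and column names are invented for this example):

CREATE TABLE IF NOT EXISTS type_demo (
  id BIGINT,
  age TINYINT,
  score FLOAT,
  price DOUBLE,
  active BOOLEAN,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE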

Create a simple table

CREATE TABLE IF NOT EXISTS pokes (foo STRING, bar STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

STORED AS TEXTFILE

Create an external table

CREATE EXTERNAL TABLE pokes (foo STRING, bar STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

STORED AS TEXTFILE

LOCATION '/test/pokes'

Create a partitioned table

CREATE TABLE IF NOT EXISTS invites (foo STRING, bar STRING)

PARTITIONED BY (d STRING,s STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

STORED AS TEXTFILE

Create a bucketed table

CREATE TABLE IF NOT EXISTS buckets (foo STRING, bar STRING)

CLUSTERED BY (foo) INTO 4 BUCKETS

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

STORED AS TEXTFILE

Copy an empty table

CREATE TABLE invites_copy LIKE invites

Create a table and import data from another table (runs MapReduce)

CREATE TABLE parts AS SELECT * FROM invites

HBase table

CREATE EXTERNAL TABLE workStatisticsNone (

id string,

num int

) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ": key,f:c")

TBLPROPERTIES ("hbase.table.name" = "workStatisticsNone", "hbase.mapred.output.outputtable" = "workStatisticsNone")

Delete a table

DROP TABLE pokes

DROP TABLE invites

Modify table structure

Add / replace / modify columns

ALTER TABLE table_name ADD | REPLACE COLUMNS (col_name data_type [COMMENT col_comment],...)

ALTER TABLE pokes ADD COLUMNS (d STRING COMMENT 'd comment')

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST | (AFTER column_name)]

ALTER TABLE pokes CHANGE d s STRING COMMENT 'change column name' FIRST

Change the table name:

ALTER TABLE pokes RENAME TO poke

Repair the table partition:

MSCK REPAIR TABLE invites

ALTER TABLE invites RECOVER PARTITIONS

Create / delete views

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment],...)] [COMMENT view_comment] [TBLPROPERTIES (property_name = property_value,...)] AS SELECT

CREATE VIEW v_invites (foo, bar) AS SELECT foo, bar FROM invites

DROP VIEW v_invites

Show commands

SHOW TABLES

SHOW TABLES '.*s'; (regular expression)

DESC pokes

SHOW FUNCTIONS

DESCRIBE FUNCTION function_name

DESCRIBE FUNCTION EXTENDED function_name

Load data

Load data to the specified table

LOAD DATA LOCAL INPATH 'kv.txt' OVERWRITE INTO TABLE pokes

LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE pokes

LOAD DATA INPATH '/test/kv.txt' INTO TABLE pokes

The OVERWRITE keyword means the existing data in the table is overwritten; without it, the new data is appended.

The LOCAL keyword means the file is loaded from the local file system; without it, the path is read from HDFS.

Load into a partition of the specified table

LOAD DATA LOCAL INPATH 'kv.txt' OVERWRITE INTO TABLE invites PARTITION (d='1')

LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE invites PARTITION (d='1')

LOAD DATA LOCAL INPATH 'kv.txt' OVERWRITE INTO TABLE invites PARTITION (d='1', s='2')

Insert query results into a Hive table

INSERT OVERWRITE TABLE pokes SELECT foo, bar FROM invites; (overwrites the files in the table's directory)

INSERT INTO TABLE pokes SELECT foo,bar FROM invites

INSERT INTO TABLE invites_copy PARTITION (d='1') SELECT * FROM invites

Dynamic partition insertion, off by default

set hive.exec.dynamic.partition.mode=nonstrict

INSERT INTO TABLE invites_copy PARTITION (d, s) SELECT * FROM invites

Multi-insert mode

FROM from_statement

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2...)] select_statement1

[INSERT OVERWRITE TABLE tablename2 [PARTITION...] select_statement2]...
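
A sketch of the multi-insert form using the tables from this article (the WHERE conditions and partition values are made up for illustration):

FROM invites
INSERT OVERWRITE TABLE pokes SELECT foo, bar WHERE d = '1'
INSERT OVERWRITE TABLE invites_copy PARTITION (d='2', s='2') SELECT foo, bar WHERE d = '2'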

Query results are written to the file system

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1

INSERT OVERWRITE LOCAL DIRECTORY 'test.txt' SELECT * FROM invites_copy

Data query

SELECT [ALL | DISTINCT] select_expr, select_expr,...

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list [HAVING condition]]

[CLUSTER BY col_list

| [DISTRIBUTE BY col_list] [SORT BY | ORDER BY col_list]

]

[LIMIT number]

SELECT * FROM invites LIMIT 2, 5

The difference between ORDER BY and SORT BY

ORDER BY performs a global sort, and only one reduce task is used.

SORT BY only sorts within each reducer (locally).

Hive distributes data across reducers according to the columns in DISTRIBUTE BY and the number of reducers; a hash algorithm is used by default.

CLUSTER BY combines the functions of DISTRIBUTE BY and SORT BY on the same columns, but the sort order is ascending only and cannot be specified.
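
A small sketch contrasting the two forms on the invites table (the choice of columns here is arbitrary):

SELECT * FROM invites DISTRIBUTE BY foo SORT BY foo ASC, bar DESC

SELECT * FROM invites CLUSTER BY foo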

SELECT * FROM invites WHERE foo = 1 OR bar = 2

The WHERE condition supports AND, OR, BETWEEN, IN, NOT IN, EXISTS, and NOT EXISTS.
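
For example (the values are arbitrary):

SELECT * FROM invites WHERE foo BETWEEN 1 AND 10 AND bar IN (2, 3)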

JOIN

Hive supports only equality joins, outer joins, and left semi joins. Hive does not support non-equality joins because they are very difficult to convert into map/reduce tasks.

JOIN ... ON is the common join.

It is the most common join strategy, is not limited by the amount of data, and can also be called a reduce-side join.
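
A minimal sketch of a common (reduce-side) join between the two tables used in this article:

SELECT a.foo, a.bar, b.bar AS bar_copy
FROM invites a JOIN invites_copy b ON (a.foo = b.foo)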

Left semi joins

LEFT SEMI JOIN is a variant of map join (broadcast join): only the join key of the right table is passed to the map phase, and a map join is executed if the keys are small enough, otherwise a common join. It is used in place of an IN condition.

SELECT a.* FROM invites a LEFT SEMI JOIN invites_copy b ON (a.bar = b.bar)

Map Join

SELECT /*+ MAPJOIN(smalltable) */ smalltable.key, value

FROM smalltable JOIN bigtable ON smalltable.key = bigtable.key

Since Hive 0.7, the /*+ MAPJOIN(smalltable) */ hint is no longer needed; this optimization is automatic, and Hive determines on its own which table is small and which is large.

set hive.auto.convert.join=true; -- whether to automatically convert to mapjoin

set hive.mapjoin.smalltable.filesize=300000000; -- maximum file size of the small table; the default is 25000000, i.e. 25 MB

set hive.auto.convert.join.noconditionaltask=true; -- whether to merge multiple mapjoins into one

set hive.auto.convert.join.noconditionaltask.size=300000000; -- the maximum total file size of all small tables when multiple mapjoins are merged into one; for example, a large table joined in sequence with three small tables a (10 MB), b (8 MB), c (12 MB)

FULL [OUTER] JOIN does not use MapJoin optimization

Bucket Map Join

Applies when the join key of the two joined tables is their bucketing column.

set hive.optimize.bucketmapjoin=true;
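
A sketch of how this might be used (the buckets_other table is made up for this example and is assumed to be bucketed on foo into the same number of buckets as the buckets table created earlier):

set hive.optimize.bucketmapjoin=true;

SELECT a.foo, a.bar, b.bar AS bar_other
FROM buckets a JOIN buckets_other b ON (a.foo = b.foo)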

The above is all the content of the article "What are the common SQL statements in Hive?" Thank you for reading! I believe everyone now has a certain understanding, and I hope the shared content is helpful to you. If you want to learn more, welcome to follow the industry information channel!
