2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report--
This article surveys the SQL statements most commonly used in Hive. The content is kept simple and concrete; I hope it helps resolve your doubts about common Hive SQL.
Databases
SHOW DATABASES
CREATE DATABASE IF NOT EXISTS test
DROP DATABASE test
USE test
Create a table
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment],...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment],...)]
[CLUSTERED BY (col_name, col_name,...)
[SORTED BY (col_name [ASC | DESC],...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
The EXTERNAL keyword allows the user to create an external table and specify a path to the actual data (LOCATION) while creating the table.
LIKE allows users to copy an existing table's structure, but not its data
COMMENT can add descriptions to tables and fields
ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value,...)]
When creating a table, users can specify a custom SerDe or rely on the built-in one. If no ROW FORMAT is given, or ROW FORMAT DELIMITED is given, the built-in SerDe is used. Besides defining the table's columns, specifying a custom SerDe tells Hive how to interpret each column's data.
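As a hedged illustration of the SERDE form (the OpenCSVSerde class ships with standard Hive distributions; the table name and columns here are invented for the example):

```sql
-- Hypothetical table using Hive's bundled CSV SerDe instead of DELIMITED.
CREATE TABLE IF NOT EXISTS csv_pokes (foo STRING, bar STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
```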
STORED AS
SEQUENCEFILE
| TEXTFILE
| RCFILE
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
If the file data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
Field types supported by Hive
TINYINT
SMALLINT
INT
BIGINT
BOOLEAN
FLOAT
DOUBLE
STRING
Create a simple table
CREATE TABLE IF NOT EXISTS pokes (foo STRING, bar STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
Create an external table
CREATE EXTERNAL TABLE pokes (foo STRING, bar STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/test/pokes'
Create a partitioned table
CREATE TABLE IF NOT EXISTS invites (foo STRING, bar STRING)
PARTITIONED BY (d STRING,s STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
Create a bucketed table
CREATE TABLE IF NOT EXISTS buckets (foo STRING, bar STRING)
CLUSTERED BY (foo) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
Copy an empty table
CREATE TABLE invites_copy LIKE invites
Create a table from the results of a query on another table (this runs a MapReduce job)
CREATE TABLE parts AS SELECT * FROM invites
HBase table
CREATE EXTERNAL TABLE workStatisticsNone (
id string,
num int
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:c")
TBLPROPERTIES ("hbase.table.name" = "workStatisticsNone", "hbase.mapred.output.outputtable" = "workStatisticsNone")
Delete tables
DROP TABLE pokes
DROP TABLE invites
Modify table structure
Add / replace / modify columns
ALTER TABLE table_name ADD | REPLACE COLUMNS (col_name data_type [COMMENT col_comment],...)
ALTER TABLE pokes ADD COLUMNS (d STRING COMMENT 'd comment')
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST | AFTER column_name]
ALTER TABLE pokes CHANGE d s STRING COMMENT 'change column name' FIRST
Change the table name:
ALTER TABLE pokes RENAME TO poke
Repair the table partition:
MSCK REPAIR TABLE invites
ALTER TABLE invites RECOVER PARTITIONS
Create / delete views
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment],...)] [COMMENT view_comment] [TBLPROPERTIES (property_name = property_value,...)] AS SELECT
CREATE VIEW v_invites (foo, bar) AS SELECT foo, bar FROM invites
DROP VIEW v_invites
Show command
SHOW TABLES
SHOW TABLES '.*s'; (regular expression)
DESC pokes
SHOW FUNCTIONS
DESCRIBE FUNCTION function_name
DESCRIBE FUNCTION EXTENDED function_name
Load data
Load data to the specified table
LOAD DATA LOCAL INPATH 'kv.txt' OVERWRITE INTO TABLE pokes
LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE pokes
LOAD DATA INPATH '/test/kv.txt' INTO TABLE pokes
The [OVERWRITE] keyword replaces the existing data in the table; without it, the new data is appended.
The [LOCAL] keyword loads the file from the local filesystem; without it, the path is read from HDFS.
Load to the partition of the specified table
LOAD DATA LOCAL INPATH 'kv.txt' OVERWRITE INTO TABLE invites PARTITION (d=1)
LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE invites PARTITION (d=1)
LOAD DATA LOCAL INPATH 'kv.txt' OVERWRITE INTO TABLE invites PARTITION (d=1, s='2')
Query results are imported into Hive
INSERT OVERWRITE TABLE pokes SELECT foo, bar FROM invites; overwrites the files in the corresponding directory
INSERT INTO TABLE pokes SELECT foo,bar FROM invites
INSERT INTO TABLE invites_copy PARTITION (d=1) SELECT * FROM invites
Dynamic partition insert (off by default)
SET hive.exec.dynamic.partition.mode=nonstrict
INSERT INTO TABLE invites_copy PARTITION (d, s) SELECT * FROM invites
Multi-insert mode
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2...)] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION...] select_statement2]...
Query results are written to the file system
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
INSERT OVERWRITE LOCAL DIRECTORY 'test.txt' SELECT * FROM invites_copy
Data query
SELECT [ALL | DISTINCT] select_expr, select_expr,...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list | ORDER BY col_list]
]
[LIMIT number]
SELECT * FROM invites LIMIT 2, 5
The difference between ORDER BY and SORT BY
ORDER BY performs a global sort, so it runs with only one reduce task
SORT BY sorts only within each reducer (a local sort)
DISTRIBUTE BY distributes rows to reducers according to the listed columns and the number of reducers; a hash function is used by default
CLUSTER BY combines the functions of DISTRIBUTE BY and SORT BY on the same columns, but it can only sort in ascending order
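To make the distinction concrete, here is a hedged sketch reusing the invites table from above (the column choices are illustrative):

```sql
-- Global sort: a single reducer produces totally ordered output.
SELECT * FROM invites ORDER BY foo;

-- Rows with the same bar value go to the same reducer (hash-distributed),
-- and each reducer's output is sorted locally by foo.
SELECT * FROM invites DISTRIBUTE BY bar SORT BY foo;

-- Shorthand for DISTRIBUTE BY foo SORT BY foo (ascending only).
SELECT * FROM invites CLUSTER BY foo;
```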
SELECT * FROM invites WHERE foo=1 OR bar=2
The WHERE clause supports AND, OR, BETWEEN, IN, NOT IN, EXISTS, and NOT EXISTS
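A hedged sketch of these predicates against the invites and invites_copy tables defined earlier (the literal values are invented; correlated EXISTS subqueries require Hive 0.13 or later):

```sql
SELECT * FROM invites WHERE foo BETWEEN 1 AND 10;
SELECT * FROM invites WHERE bar IN (1, 2, 3);
SELECT * FROM invites WHERE bar NOT IN (4, 5);

-- Correlated EXISTS subquery.
SELECT * FROM invites a
WHERE EXISTS (SELECT 1 FROM invites_copy b WHERE b.bar = a.bar);
```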
JOIN
Hive supports only equality joins (equi-joins), outer joins, and left semi joins. Hive does not support non-equality joins, because they are very difficult to translate into map/reduce tasks.
JOIN ... ON is the common join
It is the most general join strategy; it works regardless of data size and is also called a reduce-side join
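As a hedged illustration of a common (reduce-side) join, reusing the invites and invites_copy tables from earlier:

```sql
-- Equality join; the shuffle routes matching keys to the same reducer.
SELECT a.foo, b.bar
FROM invites a
JOIN invites_copy b ON a.bar = b.bar;
```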
Left semi joins
LEFT SEMI JOIN replaces an IN/EXISTS subquery. It is a variant of the map join (broadcast join): only the join key of the right-hand table is passed to the map phase; if the key set is small enough a map join is executed, otherwise a common join.
Select a.* from invites a left semi join invites_copy b on (a.bar=b.bar)
Map Join
SELECT /*+ MAPJOIN(smalltable) */ smalltable.key, value
FROM smalltable JOIN bigtable ON smalltable.key = bigtable.key
Since Hive 0.7, the /*+ MAPJOIN(smalltable) */ hint is no longer needed; the conversion is automated, and Hive determines by itself which table is small and which is large.
SET hive.auto.convert.join=true; # whether to automatically convert a join to a map join
SET hive.mapjoin.smalltable.filesize=300000000; # maximum file size of the small table; the default is 25000000, i.e. 25 MB
SET hive.auto.convert.join.noconditionaltask=true; # whether to merge multiple map joins into one
SET hive.auto.convert.join.noconditionaltask.size=300000000;
# maximum total file size of all small tables when several map joins are merged into one; for example, a large table joined in sequence with three small tables a (10 MB), b (8 MB), c (12 MB)
FULL [OUTER] JOIN cannot use the MapJoin optimization
Bucket Map Join
Applicable when the join key of the two joined tables is the bucketing column
SET hive.optimize.bucketmapjoin=true
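A hedged end-to-end sketch (the table names and bucket counts are invented; for the optimization to apply, both tables must be bucketed on the join key, and the bucket counts must be equal or multiples of each other):

```sql
-- Both tables bucketed on the join key k; 8 is a multiple of 4.
CREATE TABLE b1 (k STRING, v STRING) CLUSTERED BY (k) INTO 4 BUCKETS;
CREATE TABLE b2 (k STRING, v STRING) CLUSTERED BY (k) INTO 8 BUCKETS;

SET hive.optimize.bucketmapjoin=true;

-- Each mapper loads only the matching buckets of the small table.
SELECT /*+ MAPJOIN(b2) */ b1.k, b1.v, b2.v
FROM b1 JOIN b2 ON b1.k = b2.k;
```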
The above is the full content of "What are the common SQL statements in Hive?". Thank you for reading! I believe it has given you a basic understanding; I hope the shared content helps you, and if you want to learn more, welcome to follow the industry information channel.