In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly shows you "what is the grammar of Hive", which is easy to understand and clear. I hope it can help you solve your doubts. Let me lead you to study and learn this article "what is the grammar of Hive?"
Hive grammar learning
1. Hive tables are divided into external tables, partition tables and buckets.
External tables: external tables in Hive are similar to tables, but their data is not placed in the directory to which their own tables belong
Instead, it is stored somewhere else, and the advantage is that if you want to delete this external table, the data that the external table points to is
It will not be deleted, it will only delete the metadata corresponding to the external table; and if you want to delete the table, the corresponding
Any data including metadata will be deleted.
B, partition: in Hive, each partition of a table corresponds to the corresponding directory under the table, and the data of all partitions are stored
In the corresponding directory. For example, if the wyp table has two partitions, dt and city, then the corresponding dt=20131218,city=BJ table
The directory is / user/hive/warehouse/dt=20131218/city=BJ, and all data belonging to this partition is stored
Put it in this directory.
C, bucket: calculates its hash for the specified column and splits the data according to the hash value, in order to be parallel, each bucket pair
Should be a file (note the difference between partition and partition). For example, if you divide the id column of the wyp table into 16 buckets, first of all, the id column
Value calculates hash, corresponding to the HDFS directory of data stores with hash values of 0 and 16
HDFS entries for data stores with: / user/hive/warehouse/wyp/part-00000; and a hash value of 2
Recorded as: / user/hive/warehouse/wyp/part-00002.
2. SORT BY only sorts locally, ORDER BY sorts globally, and there is only one Reduce task
3. Hive supports select query
A) Select * from user; does not perform mapreduce tasks
B) Select count (*) from user; performs mapreduce tasks
C) Select * from user limit 2; do not perform mapreduce tasks
D) Select * from user sort by user_id desc limit 2; execute mapreduce tasks
E) regular column, unsuccessful
F) Select * from user where user_id=100636 executes conditional query and reports an error
Problem tracking, found to be hive-1.1.1 's problem, https://issues.apache.org/jira/browse/HIVE-12872
Solution: change the environment variable to hive-1.2.0, and the following problem occurs again
The table name user is special in hive-1.2.0. After re-establishing the table user_info, the query is successful.
Build a table sentence:
Create table user_info (user_id int,cid string,ckid string,username string)
Row format delimited fields terminated by't 'lines terminated by'\ n'
Load data:
Load data local inpath'/ opt/data.txt' into table user_info
G) output the query results to the local directory
Perform mapreduce tasks
H) grouping queries (performing mapreduce tasks)
J) enter the query results into another table (the number of mapreduce tasks is the same as the number of fields)
K) failed to insert, the number of columns of the query is different from that of the table
L) query based on Partition
i. In general, a SELECT query scans the entire table and uses the PARTITIONED BY clause to build the table.
You can take advantage of the characteristics of input pruning.
ii. The current implementation of Hive is that only partition assertions appear on the WHERE closest to the FROM clause
Clause, partition pruning is enabled
M) load partition data
I. load data local inpath'/ home/hadoop/input/hive/partitions/file1' into table logs
Partition (dt='2001-01-01-01)
N) Hive does not support update and delete statements, see the following figure
I. Update
ii. Delete
O) insert in Hive can edit files to import data, and update and delete can rewrite data.
P) Union query in Hive (execute a mapreduce task)
Q) when querying Join, there are several key points to pay attention to
i. Only equivalent join is supported
1. SELECT a.* FROM a JOIN b ON (a.id = b.id)
2. SELECT a.* FROM a JOIN b
ON (a.id = b.id AND a.department = b.department)
ii. You can join more than 2 tables, for example
SELECT a.val, b.val, c.val FROM a JOIN b
ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
iii. If the join key of multiple tables in join is the same, the join is converted to a single map/reduce task
iv. LEFT,RIGHT and FULL OUTER
Examples
SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)
v. If you want to limit the output of join, you should write the filter condition in the WHERE clause-or in the join clause
Write in a sentence
The confusing problem is the situation of table partitioning.
SELECT c.val, d.val FROM c LEFT OUTER JOIN d ON (c.key=d.key)
WHERE a.ds='2010-07-07 'AND b.ds='2010-07-07'
vi. If no record for the corresponding c table is found in the d table, all columns of the d table list NULL, including the ds column.
That is, join will filter all records in the d table that cannot be found matching the c table join key. In that case,
LEFT OUTER makes the query result irrelevant to the WHERE clause.
Solution.
SELECT c.val, d.val FROM c LEFT OUTER JOIN d
ON (c.key=d.key AND d.ds='2009-07-07 'AND c.ds='2009-07-07')
vii. LEFT SEMI JOIN
1. The restriction of LEFT SEMI JOIN is that the table on the right in the JOIN clause can only set filtering in the ON clause
Condition, which cannot be filtered in WHERE clause, SELECT clause or other place
R) Hive does not support equivalent connections (test hive-1.2.1 support)
I. in SQL, the inlining of two tables can be written as:
Select * from dual a dint dual b where a.key = b.key
ii. In Hive should be
Select * from dual a join dual b on a.key = b.key
Instead of the traditional format:
SELECT t1.a1 as c1, t2.b1 as c2FROM t1, t2
WHERE t1.a2 = t2.b2
S) semicolon character, hql ends with a semicolon, but the understanding of the semicolon is not very wise
Solution: replace the semicolon ASCII code
T) Hive does not support inserting data into existing tables or partitions, (tested)
Only overwrite and rewrite the entire table is supported
Insert into overwrite table a select * from b
U) hive does not support INSERT INTO table Values (), UPDATE, DELETE operations (tested)
V) hive supports embedding mapreduce programs to handle complex logic
W) hive supports writing converted data directly to different tables, as well as to partitions, hdfs, and local directories
X) the method of loading sql text by Hive
I.. / hive-f'/ opt/test'
. / hive enters hive shell, then source'/ opt/test'
The above is all the content of this article "what is the Grammar of Hive?" Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.