1. Basic Hive syntax:
create database mydb;   -- create a database
show databases;   -- list all databases
use mydb;   -- switch to a database
create table t_user (id int, name string, age int);   -- create a table
create table t_user (id int, name string, age int) row format delimited fields terminated by 'delimiter';   -- create a table with a specified column delimiter
insert into table t_user values (value1, value2, value3);   -- insert a row
select * from t_name;   -- query a table
load data inpath 'HDFS path' into table t_name;   -- import data from HDFS
load data local inpath 'linux path' into table t_name;   -- import local Linux data into Hive
2. Hive DDL operations:
(1) Operations on Hive databases:
Create a database:
create database if not exists myhive;   -- create the database only if it does not already exist
create database if not exists myhive2 location 'hdfs path';   -- specify where the database is stored
View databases:
show databases;   -- list all databases
desc database dbname;   -- show database details
select current_database();   -- show the database currently in use
show create database db_name;   -- show the statement that created the database
Drop a database:
drop database db_name restrict;
drop database if exists dbname;
-- Note: by default, Hive will not drop a database that still contains tables. There are two ways around this:
-- 1. Drop all of the tables manually, then drop the database.
-- 2. Use the cascade keyword: drop database myhive cascade;
(2) Operations on Hive tables:
Create a table:
Syntax analysis:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

[EXTERNAL]   -- whether the table is internal or external
[IF NOT EXISTS] table_name   -- prevents an error if the table already exists
[(col_name data_type [COMMENT col_comment], ...)]   -- table columns
[COMMENT table_comment]   -- table description
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]   -- makes the table a partitioned table
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]   -- bucketing columns, sort order, and number of buckets
[ROW FORMAT row_format]   -- delimiters: fields terminated by '' sets the column separator, lines terminated by '' sets the row separator
[STORED AS file_format]   -- storage format of the data
[LOCATION hdfs_path]   -- data storage directory (used when creating external tables)
Table creation examples:
-- Internal table
create table if not exists student (id int, name string) row format delimited fields terminated by ',';
-- External table
create external table if not exists student (id int, name string) row format delimited fields terminated by ',' location '/hive/data/';
-- Partitioned table (the partition column must not be one of the table's own columns)
create table if not exists student (id int, name string) partitioned by (age int comment 'partition comment') row format delimited fields terminated by ',';
-- Bucketed table (the bucketing column must be one of the table's columns)
create table if not exists student (id int, name string, age int) clustered by (age) sorted by (age desc) into 10 buckets row format delimited fields terminated by ',';
-- LIKE: copy a table structure; partitioned and bucketed tables can also be copied
-- (only the structure as of table creation is copied; partitions added later are not)
create table student like t_student;
-- CTAS: create the table and fill it with the query result
create table student as select * from t_student;
Modify table properties:
alter table old_name rename to new_name;   -- rename a table
alter table t_name set tblproperties ('property_name'='property_val');   -- modify a table property
alter table t_name set serdeproperties ('field.delim'='-');   -- change the column delimiter
alter table t_name add columns (f_name type);   -- add a column
-- Dropping a single column is not supported by Hive itself.
alter table t_name replace columns (id int, name string);   -- replace all columns
alter table t_name change old_field_name new_field_name type [first | after field];   -- change a column's name, type, and position
-- Operations on partitioned tables:
alter table t_name add partition (partition_field='value');   -- add a partition
alter table t_name add partition (partition_field='value') partition (partition_field='value');   -- add multiple partitions
alter table t_name drop partition (partition_field='value');   -- drop a partition
alter table t_name partition (partition_field='value') set location 'hdfs path';   -- change a partition's path
alter table t_name partition (partition_field='value') enable no_drop;   -- prevent the partition from being dropped
alter table t_name partition (partition_field='value') enable offline;   -- prevent the partition from being queried
Drop a table:
drop table if exists t_name;   -- drop the table
-- Note:
-- 1. Dropping an internal table deletes both the metadata and the data.
-- 2. Dropping an external table deletes only the metadata; the data is kept.
-- 3. Dropping a partitioned (internal) table drops every partition, deleting both metadata and data.
-- 4. Dropping a bucketed table is no different from dropping an ordinary table.
truncate table t_name;   -- remove all rows but keep the table
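To make notes 1 and 2 concrete, here is a small illustrative sketch; the table name t_ext_demo and the HDFS path /hive/demo/ are made up for this example:

create external table t_ext_demo (id int) location '/hive/demo/';
drop table t_ext_demo;   -- only the metadata disappears; the files under /hive/demo/ stay on HDFS
create external table t_ext_demo (id int) location '/hive/demo/';   -- recreate over the same location
select * from t_ext_demo;   -- the old data is visible again
-- An internal table at the same point would have lost its data on drop.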
Common actions on tables:
show tables;   -- list all tables in the current database
show partitions t_name;   -- list a table's partitions
show partitions t_name partition (partition_field='value');   -- view one partition
desc t_name;   -- view table details
desc extended t_name;   -- view extended table details
desc formatted t_name;   -- view formatted table details
3. Hive DML operations:
(1) Loading data:
load data local inpath 'linux path' into table t_name;   -- import local data
load data inpath 'hdfs path' into table t_name;   -- import from HDFS
-- Note: when importing HDFS data into an internal table, the source files are moved into the table's directory.
load data local inpath 'linux path' overwrite into table t_name;   -- overwrite import
(2) Inserting data:
Note:
insert into   -- appends data
insert overwrite   -- overwrites the existing data
insert into table t_name (fields1, fields2, fields3) values (value1, value2, value3);   -- insert a single row
insert into table t_name select * from tt_name;   -- insert the result of a query
-- Multi-insert:
insert into table student_ptn partition (department='SC') select id, name, age, sex from student where dept='SC';
insert into table student_ptn partition (department='AC') select id, name, age, sex from student where dept='AC';
insert into table student_ptn partition (department='ES') select id, name, age, sex from student where dept='ES';
-- Each statement above queries the table separately, so every one of them scans all of the data in student: far too inefficient!
-- Rewritten so the table is scanned only once (one input with multiple outputs in a single MR job):
from student
insert into table student_ptn partition (department='SC') select id, name, age, sex where dept='SC'
insert into table student_ptn partition (department='AC') select id, name, age, sex where dept='AC'
insert into table student_ptn partition (department='ES') select id, name, age, sex where dept='ES';
-- If a specified partition does not exist, it is created automatically when the statement runs.
-- Bucketed tables can only be loaded with insert ... select, used the same way as an ordinary insert.
-- Bucketing principle: hashcode of the bucket column % number of buckets; rows with the same result land in the same bucket.
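As a worked illustration of the bucketing rule, here is a minimal sketch; stu_bucket is a made-up table name, and the insert assumes the student table from the examples above. For an integer column Hive's hash is simply the value itself, so a row with age = 23 lands in bucket 23 % 10 = 3:

set hive.enforce.bucketing=true;   -- needed on older Hive versions so the insert honours the bucket count
create table if not exists stu_bucket (id int, name string, age int)
clustered by (age) into 10 buckets
row format delimited fields terminated by ',';
insert into table stu_bucket select id, name, age from student;
-- All rows whose age % 10 is equal end up in the same one of the 10 bucket files.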
Dynamic partition insertion and static partition insertion:
Static partition insertion: the partition that the data goes into is specified manually before the insert.
Dynamic partition insertion: addresses the shortcomings of static partition insertion. Hive inspects the value of the partition column and, each time it meets a new value, automatically creates the corresponding partition.
For example:
-- Static partition insert:
load data local inpath 'path' into table t_name partition (dpt='');
insert into table student_ptn partition (department='SC') select id, name, age, sex from student where dept='SC';
-- Dynamic partition insert:
insert into table t_name partition (field_name) select * from tt_name;
-- The last column of the query must be the partition column.
-- Multi-column dynamic partition insert:
insert into table stu_ptn01 partition (sex, department) select id, name, age, sex, department from student_manager;
-- The trailing columns of the query must be the partition columns, and their order cannot be changed.
Note: to use dynamic partition insertion, a few parameters must be enabled in Hive:
set hive.exec.dynamic.partition=true;   -- turns on the dynamic partition switch
set hive.exec.dynamic.partition.mode=nonstrict;   -- lifts the strict-mode restriction on dynamic partition inserts
The difference between static partition insertion and dynamic partition insertion:
-Static partition inserts must name the partition when inserting the data; dynamic partition inserts do not (a full dynamic-partition sketch follows this list).
-With static partitions, a partition may end up with no data, leaving its directory empty. With dynamic partitions, partitions are created from the actual data, so every partition holds at least one row.
-With dynamic partitioning, each partition's output corresponds to the number of reduce tasks set in the configuration, e.g.:
set mapreduce.job.reduces=3;
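Putting the settings and the insert together, a minimal end-to-end dynamic-partition sketch might look like the following; stu_ptn01 and student_manager follow the earlier example, and their column lists are assumed:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table if not exists stu_ptn01 (id int, name string, age int)
partitioned by (sex string, department string)
row format delimited fields terminated by ',';
-- The last two select columns feed the partition columns, in order:
insert into table stu_ptn01 partition (sex, department)
select id, name, age, sex, department from student_manager;
show partitions stu_ptn01;   -- one partition per distinct (sex, department) pair in the data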
(3) Exporting data:
-- Single export:
insert overwrite local directory 'linux path' select * from t_name;
-- Multiple exports:
from t_name
insert overwrite local directory 'linux path' select * where ...
insert overwrite local directory 'linux path' select * where ...;
4. Hive DQL operations:
The order in which a query is written: select fields ... from [join] where ... group by ... having ... order by ... limit
The order in which a query is executed: from - join - where - group by - having - select - order by - limit
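As a concrete illustration of the two orders, the hypothetical query below (t_order, t_dept, and their columns are made up) is written in the first order but evaluated in the second; the numbered comments give the execution step:

select d.dname, count(*) cnt   -- 6. select: project and output the aggregates
from t_order o                 -- 1. from: read the base table
join t_dept d                  -- 2. join: match the two tables
  on o.dept_id = d.id
where o.amount > 0             -- 3. where: filter individual rows
group by d.dname               -- 4. group by: form the groups
having count(*) > 10           -- 5. having: filter whole groups
order by cnt desc              -- 7. order by: sort the final result
limit 5;                       -- 8. limit: keep the first 5 rows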
(1) join in hive
Features:
-Joins in Hive support only equi-joins; non-equi-joins are not supported.
-In Hive join conditions, and is supported but or is not.
-Hive supports multi-table joins, but Cartesian products should be avoided when joining.
-Hive supports in and exists, but they are very inefficient.
For example:
-- Inner join (intersection):
select a.id aid, a.name name, b.id bid, b.score score from test_a a inner join test_b b on a.id=b.id;
-- Left outer join: takes the left table as the base; rows with a match on the right are joined, the rest are padded with null:
select a.id aid, a.name name, b.id bid, b.score score from test_a a left join test_b b on a.id=b.id;
-- Right outer join: takes the right table as the base:
select a.id aid, a.name name, b.id bid, b.score score from test_a a right join test_b b on a.id=b.id;
-- Full outer join: the union of the two tables:
select a.id aid, a.name name, b.id bid, b.score score from test_a a full join test_b b on a.id=b.id;
-- Left semi join: behaves like an inner join but returns only the left table's columns, for rows whose join key appears in the right table:
select * from test_a a left semi join test_b b on a.id=b.id;
About the characteristics of left semi join:
-left semi join is Hive's more efficient implementation of the exists/in semantics (see the rewrite sketch after this list).
-The limitation of left semi join is that filter conditions on the right-hand table may only appear in the ON clause, not in the WHERE clause, the SELECT clause, or anywhere else.
-left semi join passes only the right table's join key to the map phase, so the final select can only reference columns from the left table.
-Because left semi join works like in (keySet), the scan of the left table skips ahead as soon as it finds a match among duplicate rows in the right table, while a regular join keeps traversing them. So when the right table contains duplicates, left semi join produces a single row where join produces several, which also makes left semi join faster.
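For example, an in-subquery over the test_a/test_b tables from the join examples can be rewritten as the equivalent, and typically much faster, left semi join:

-- Inefficient form using in:
select * from test_a a where a.id in (select b.id from test_b b);
-- Equivalent left semi join: only test_a's columns come back, and duplicates in test_b yield a single row:
select * from test_a a left semi join test_b b on a.id = b.id;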
(2) sorting in hive
Order by
Features: global sort
Example: select * from t_name order by field desc;   (sorts in descending order)
Sort by
Features: sort by is a local sort; each reduceTask sorts only its own data. When the number of reduceTasks is 1, the result is the same as a global sort.
Principle: with sort by alone, rows are distributed to the reduceTasks at random; each reduceTask then sorts the data it receives.
Example: select * from t_name sort by field;
Distribute by
Features: distributes rows into buckets by the specified column; combined with sort by, each bucket is sorted.
Example 1: select * from t_name distribute by field;   (bucket = hash of the field % number of reduceTasks)
Example 2: select * from t_name distribute by bucket_field sort by sort_field;   -- sorts each bucket by the specified column
Cluster by
Features: buckets and sorts by the same column
Example: select * from t_name cluster by field;
Note: when the bucket column and the sort column are the same, distribute by + sort by = cluster by; otherwise distribute by + sort by is the more powerful combination, as the sketch below shows!
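A short sketch of the note above, reusing the hypothetical t_name with columns age and id; the two forms are interchangeable only while the column and the ascending order match:

-- Equivalent while the column matches and the order is ascending:
select * from t_name cluster by age;
select * from t_name distribute by age sort by age;
-- Not expressible with cluster by: bucket by age but sort each bucket by id, descending:
select * from t_name distribute by age sort by id desc;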
(3) the difference between union and union all
union and union all: both concatenate query results (the two result sets must have the same structure)
select * from xxx union select * from xxx;
select * from xxx union all select * from xxx;
union: removes duplicates from the combined result
union all: keeps duplicates (see the worked example below)
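A small worked example of the difference, assuming a Hive version recent enough to allow select without from and a top-level union: if both sides return the same row, union keeps one copy while union all keeps both:

select 1 as id union select 1 as id;       -- deduplicated: returns a single row, 1
select 1 as id union all select 1 as id;   -- duplicates kept: returns two rows, 1 and 1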