Create a database
Create database if not exists sopdm
Comment 'this is test database'
With dbproperties ('creator'='gxw','date'='2014-11-12')
-- Database key-value pair attribute information
Location '/my/preferred/directory'
Show all tables
Show tables
Display the description information of the table
Desc [extended|formatted] tablename
Show table statement
Show create table tablename
Delete a table
Drop table tablename
Create a table from an existing table; this copies only the table structure, not the data
Create table test3 like test2
Create a table from a query on another table
Create table test4 as select name,addr from test5
Stored as textfile
You can view it directly
Stored as sequencefile
Must be viewed with hadoop fs -text
Stored as rcfile
View with hive --service rcfilecat <path>
Stored as inputformat 'class' outputformat 'class' (custom format classes)
Load the jar package containing the custom classes
In the shell window: add jar <path> (scoped to this shell session)
Loaded into a distributed cache for use by each node
Or copy it directly to the lib directory under the hive installation directory
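A minimal sketch of the three built-in formats and of registering a custom format (the table names, the jar path, and the com.example class are hypothetical):
CREATE TABLE log_text (line STRING) STORED AS TEXTFILE;        -- readable directly with hadoop fs -cat
CREATE TABLE log_seq (line STRING) STORED AS SEQUENCEFILE;     -- view with hadoop fs -text
CREATE TABLE log_rc (line STRING) STORED AS RCFILE;            -- view with hive --service rcfilecat
ADD JAR /path/to/custom-format.jar;                            -- visible only to this shell session
CREATE TABLE log_custom (line STRING)
  STORED AS INPUTFORMAT 'com.example.MyInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';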
SerDe (hive uses SerDe to read and write rows of tables)
Read and write order:
HDFS file --> InputFileFormat --> <key, value> --> Deserializer --> Row object (used by hive)
Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS file
Hive ships with RegexSerDe, which uses a regular expression to match each row of data, for example:
Create table apachelog (
  Host STRING,
  Identity STRING,
  User STRING,
  Time STRING,
  Request STRING,
  Status STRING,
  Size STRING,
  Refer STRING,
  Agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([0-9]*) ([0-9]*) ([^ ]*) ([^ ]*)"
)
STORED AS TEXTFILE
Partitioned table (each partition corresponds to a subdirectory under the table's directory)
Create table tablename (name string) partitioned by (key type,...)
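For example, a minimal sketch (the logs table and dt column are hypothetical) of a table partitioned by date, where each dt value becomes a subdirectory under the table's directory:
CREATE TABLE logs (name STRING) PARTITIONED BY (dt STRING);
-- data for dt='20140905' is stored under .../logs/dt=20140905/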
Create external table employees (
  Name string,
  Salary float,
  Subordinates array<string>,
  Deductions map<string,float>,
  Address struct<street:string,city:string,state:string,zip:int>)
Location '/home/data/result'
On the hive command line, enter list jar
This shows which jar packages are in the current distributed cache (i.e. the jars loaded by the add jar command)
Execute an hql file from within the hive command line (similar to hive -f); this is often used to set initialization parameters
Source /home/data/hql/select_hql
Configuration variable
Set val=''
Hql uses the hive variable
${hiveconf:val}
Select * from testtext where name = '${hiveconf:val}'
Run env to view the linux environment variables, e.g.
HOME=/root
Hql uses linux environment variables
Select '${env:HOME}' from testtext
1. Internal (managed) table data loading (overwrite and into are mutually exclusive; use only one)
(1) load when creating a table
Create table newtable as select col1,col2 from oldtable
(2) specify the data location when creating the table (the table takes ownership of the data under that location, and the data is deleted when the internal table is dropped)
Create table tablename (...) location '...'
(3) Local data loading
Load data local inpath 'localpath' [overwrite] into table tablename
(4) load hdfs data (the data is moved: the original files are moved to the table's location)
Load data inpath 'hdfspath' [overwrite] into table tablename
The hive command line can execute linux shell commands when they are prefixed with !
!ls /home/data
(5) load data through query statements
Insert into table test_m select name,addr from testtext where name = 'wer'
Or
From testtext insert into table test_m select name,addr where name = 'wer'
Or
Select name,addr from testtext where name = 'wer' insert into table test_m
2. External table data loading
(1) specify the data location when creating the table
Create external table tablename (...) location '...'
(2) insert via a query, same as for an internal table
(3) use hadoop commands to copy data to the specified location (the shell command can be run from within hive or from the linux shell), as sketched below
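A sketch of (3), with hypothetical paths: copy the file into the external table's location with a hadoop command, run either from the linux shell or from within the hive command line using the ! prefix:
hadoop fs -put /home/data/result.txt /user/hive/external/tablename/
or, inside the hive CLI:
!hadoop fs -put /home/data/result.txt /user/hive/external/tablename/;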
3. Partition table data loading
(1) Loading data into an internal partitioned table is similar to loading an internal table.
(2) Loading data into an external partitioned table is similar to loading an external table.
Note: the directory hierarchy of the stored data must match the table's partitions, and the corresponding partition must be added to the table before the data can be found (see the sketch after the load example below).
Load data local inpath 'localpath' [overwrite] into table tablename partition (dt='20140905')
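For an external partitioned table whose data is already on hdfs, a minimal sketch (table name and path are hypothetical) of registering the partition so queries can find the data:
ALTER TABLE ext_log ADD PARTITION (dt='20140905') LOCATION '/data/ext_log/dt=20140905';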
4. Data type correspondence problem
Load data: when field types cannot be converted, the query returns NULL
Insert via a select query: when field types cannot be converted, the data is inserted as NULL (stored in the file as \N)
Insert via a select query: field names need not match; data loading is not checked, the check happens at query time
Data export
Export to a local directory, default delimiter ^A
Insert overwrite local directory '/home/data3'
Row format delimited fields terminated by '\t'
Select name,addr from testtext
Export to hdfs: row format delimited fields terminated by '\t' is not supported; the default delimiter is ^A
Insert overwrite directory '/home/data3'
Select name,addr from testtext
Table attribute operation
1. Modify table name
Alter table table_name rename to new_table_name
2. Modify a column name
Alter table tablename change column c1 c2 int comment 'xxx' after severity
c1 is the old column name, c2 the new column name, int the new column's data type
'after severity' places the column after the specified column; use 'first' to put it in the first position
3. Add columns (by default, columns are added at the end)
Alter table tablename add columns (c1 string comment 'xxxx', c2 string comment 'yyyy')
4. Modify tblproperties
Alter table test set tblproperties ('comment'='xxxx')
5. Modify the delimiter (partitioned tables are a special case)
Method 1
Alter table city set serdeproperties ('field.delim'='\t'); (does not affect the existing data of a partitioned table; takes effect for new partitions)
Method 2: the new delimiter also applies to the existing data of the partitioned table
Alter table city partition (dt='20140908') set serdeproperties ('field.delim'='\t')
6. Modify location
Alter table city [partition (...)] set location 'hdfs://master:9000/location'
7. Conversion of internal and external tables
Alter table test set tblproperties ('EXTERNAL'='TRUE'); internal table to external table
Alter table test set tblproperties ('EXTERNAL'='FALSE'); external table to internal table
Aggregation operation
1. count
count(*): adds 1 for every row whose fields are not all null; a row where every field is null is not counted
count(1): adds 1 for every row, regardless of its content
count(col): adds 1 when the column value is not null
2. sum
sum(value that can be converted to a number) returns bigint
sum(col) + cast(1 as bigint)
3. avg
avg(value that can be converted to a number) returns double
The where condition is executed on the map side
Group by performs grouping on the reduce side, using the combination of grouped columns as the key
The having clause is evaluated after the aggregation, also on the reduce side
Group by data skew optimization:
set hive.groupby.skewindata=true; (the query is split into more than one job)
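A minimal sketch putting these together on the testtext table used above (the amount column is hypothetical): the where filter runs on the map side, group by and having on the reduce side:
SELECT addr, COUNT(*) AS cnt, SUM(CAST(amount AS BIGINT)) AS total
FROM testtext
WHERE name IS NOT NULL        -- map side
GROUP BY addr                 -- reduce side; addr is the key
HAVING COUNT(*) > 10;         -- evaluated after aggregation, on the reduce side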
Join operation (an ordinary join does not support non-equi join conditions)
Skew optimization parameter: set hive.optimize.skewjoin=true
Example:
Select m.col as col, m.col2 as col2, n.col3 as col3
From
  (select col, col2
   from test
   where ...          -- executed on the map side
  ) m
[left outer | right outer | left semi] join
  n                   -- the right table
on m.col = n.col
where ...              -- executed on the reduce side
LEFT SEMI JOIN is a more efficient implementation of IN/EXISTS subquery
Hive currently does not implement IN/EXISTS subqueries, so you can rewrite such subqueries with LEFT SEMI JOIN. The limitation of LEFT SEMI JOIN is that the right-hand table may only appear in filter conditions in the ON clause, not in the WHERE clause, SELECT clause, or anywhere else.
SELECT a.key, a.value
FROM a
WHERE a.key in
(SELECT b.key
FROM b)
Can be rewritten as:
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key)
Order by: global sort, with only one reducer
Distribute by col: distributes the data to different reducers according to col
Combined with sort by, it ensures that each reducer's output is sorted (see the sketch below)
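A minimal sketch on the testtext table used above: rows are routed to reducers by addr, and each reducer's output is sorted:
SELECT name, addr
FROM testtext
DISTRIBUTE BY addr
SORT BY addr, name;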
Union all (does not remove duplicates) and union (removes duplicates)
Hive only supports union all, and the subqueries inside it cannot have aliases.
Select col from
(select a as col from t1
Union all
Select b as col from t2
) tmp
Requirements:
1. The field names are the same
2. The field types are the same
3. The number of fields is the same
4. The subqueries cannot have aliases
5. If you need to query data from the merged result, the merged result must have an alias (tmp above)
Hive creates an index table
hive> create index user_index on table user (id)
    > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > in table user_index_table;
hive> alter index user_index on user rebuild;
hive> select * from user_index_table limit 5;
If you use hadoop commands directly to copy or delete data stored by hive, you need to run add partition or alter statements to synchronize the metadata; otherwise, operations such as drop table query the metastore, and an error is reported if the metastore information is inconsistent with the data on hdfs. This can be regarded as a hive bug that has not yet been fixed, but it can also be understood as hive not intending for hdfs data to be manipulated directly.
Currently, you can use the command:
MSCK REPAIR TABLE table_name
This command adds the corresponding partitions for data directories that have not yet been registered, synchronizing the metadata.
Recover Partitions
Hive stores a list of partitions for each table in the metastore; if new partition directories are added directly on HDFS, the metastore will not notice them unless you run, for each partition:
ALTER TABLE table_name ADD PARTITION
Alternatively, you can use:
MSCK REPAIR TABLE table_name
which is similar to the following command in the EMR version:
ALTER TABLE table_name RECOVER PARTITIONS