2025-02-22 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article walks through examples of managed tables, external tables, and partitioned tables in Hive.
* Managed tables
The tables we have created so far are all managed tables, also known as internal tables.
-- Hive controls the lifecycle of a managed table's data. By default, Hive stores the table data in a subdirectory of /user/hive/warehouse.
-- When you drop a managed table, Hive also deletes the data in that table.
-- Managed tables are not convenient for sharing data with other tools.
Eg: Suppose a dataset is created by Pig or another tool and is used primarily by that tool, but we also want to query it with Hive. We can create an external table that points to the data without taking ownership of it.
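The managed-table lifecycle described above can be sketched as follows. This is only an illustration: the table name demo_managed and its columns are hypothetical and do not come from the article.

```sql
-- Hypothetical managed table. With no LOCATION clause, Hive stores its data
-- under the warehouse directory.
CREATE TABLE IF NOT EXISTS demo_managed (
    id   string,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Shows the table type and the storage location.
DESCRIBE FORMATTED demo_managed;

-- For a managed table this removes both the metadata and the data files
-- (the files may be moved to the HDFS trash if trash is enabled).
DROP TABLE IF EXISTS demo_managed;
```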
* External tables
-- The files are located in /data/test on the distributed file system.
Eg: CREATE EXTERNAL TABLE IF NOT EXISTS app (
    hour string,
    name string,
    pv string,
    uv string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/test';
Analysis of the statement above:
1. The EXTERNAL keyword tells Hive that the table is external, and the LOCATION clause tells Hive which path holds the data.
2. Because the table is external, Hive does not assume that it fully owns the data. Dropping the table does not delete the data, only the metadata that describes the table.
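A short sketch of the second point, using the app table and the /data/test path from the example above. The dfs command is an assumption about the environment: it is available in the Hive CLI, but other clients may not allow it.

```sql
-- Query the files under /data/test through the external table.
SELECT name, pv, uv
FROM app
LIMIT 10;

-- Removes only the table definition from the metastore.
DROP TABLE IF EXISTS app;

-- Assumption: run from the Hive CLI, where the dfs command is available.
-- The data files should still be listed after the DROP.
dfs -ls /data/test;
```

Re-running the CREATE EXTERNAL TABLE statement with the same LOCATION makes the data queryable again.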
* Managed tables vs. external tables
The output of the DESCRIBE EXTENDED tablename statement shows whether a table is managed or external.
For a managed table, the output includes:
... tableType:MANAGED_TABLE)
For an external table, the output includes:
... tableType:EXTERNAL_TABLE)
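As a small usage sketch, the check can be run against the app table from the earlier example. DESCRIBE FORMATTED is shown as an alternative because it prints the same information in a more readable layout, including a "Table Type" row.

```sql
-- Compact output; the table type appears in the detailed table information.
DESCRIBE EXTENDED app;

-- Same information, one property per row; look for the "Table Type" row.
DESCRIBE FORMATTED app;
```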
Note: the following rules apply when a new table is created from the schema of an existing source table (for example, with CREATE TABLE ... LIKE):
-- If the statement omits the EXTERNAL keyword and the source table is an external table, the new table is also an external table.
-- If the statement omits the EXTERNAL keyword and the source table is a managed table, the new table is also a managed table.
-- If the statement includes the EXTERNAL keyword and the source table is a managed table, the new table is an external table.
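A sketch of copying a table schema, assuming the source table is the app table from the earlier example. The new table name app_copy and the path /data/test_copy are hypothetical.

```sql
-- Copies the schema of app (not its data) into a new external table.
-- app_copy and /data/test_copy are hypothetical names used for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS app_copy
LIKE app
LOCATION '/data/test_copy';

-- Confirm the resulting table type.
DESCRIBE FORMATTED app_copy;
```

Writing EXTERNAL explicitly states the intent in the statement itself instead of relying on what the new table inherits from the source table.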
* Partitioned managed tables
-- Both managed tables and external tables can be partitioned.
Eg: CREATE TABLE IF NOT EXISTS tmp.table1 (
    userid string COMMENT 'user ID',
    name string COMMENT 'user name',
    createtime string COMMENT 'creation time'
)
PARTITIONED BY (country string, state string);
A partitioned table changes the way Hive organizes data storage. If we create this table in the tmp database, there is still only one table1 directory for the table:
/user/hive/warehouse/tmp/table1
However, Hive creates subdirectories under the table directory that reflect the partition structure.
Eg: /table1/country=CA/state=AB
/table1/country=CA/state=BC
...
/table1/country=US/state=AL
/table1/country=US/state=AK
...
These are the actual directory names. Each state directory contains zero or more files that store the user records for that state.
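One way such a partition directory comes into existence is by loading a file into a specific partition. The sketch below assumes the tmp.table1 table from the example above; the local file path is hypothetical.

```sql
-- Loads a local file into one partition of tmp.table1.
-- Hive creates the country=US/state=AL subdirectory if it does not exist.
-- '/tmp/users_us_al.txt' is a hypothetical path used for illustration.
LOAD DATA LOCAL INPATH '/tmp/users_us_al.txt'
INTO TABLE tmp.table1
PARTITION (country = 'US', state = 'AL');
```

LOAD DATA copies the file as it is, so its field delimiter has to match the table's row format.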
Once created, a partition column behaves just like an ordinary column. You do not need to care whether a column is a partition column unless you are optimizing query performance.
For example, to look up the users in one country, Hive only needs to scan the directories for that country.
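A minimal sketch of queries that benefit from the partition layout, assuming the tmp.table1 table defined above.

```sql
-- Filters on one partition column: only the directories under
-- country=US need to be read.
SELECT userid, name, createtime
FROM tmp.table1
WHERE country = 'US';

-- Filters on both partition columns: only the country=US/state=AL
-- directory needs to be read.
SELECT userid, name, createtime
FROM tmp.table1
WHERE country = 'US'
  AND state = 'AL';
```

The partition columns appear in WHERE clauses exactly like ordinary columns, even though their values come from the directory names and not from the data files.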
-- View all partitions that exist in the table
Eg: hive> SHOW PARTITIONS table1;
country=CA/state=AB
country=CA/state=BC
...
country=US/state=AL
country=US/state=AK
...
-- View only the partitions that match a specific partition value
Eg: hive> SHOW PARTITIONS table1 PARTITION (country='US');
country=US/state=AL
country=US/state=AK
...
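To inspect a single partition in more detail, DESCRIBE also accepts a partition specification. A short sketch, assuming table1 lives in the tmp database as in the earlier example:

```sql
-- Switch to the database that holds table1.
USE tmp;

-- Shows the metadata of one partition, including its storage location.
DESCRIBE FORMATTED table1 PARTITION (country = 'US', state = 'AK');
```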
* External partitioned tables
-- Create an external partitioned table
Eg: CREATE EXTERNAL TABLE IF NOT EXISTS app (
    hour string,
    name string,
    pv string,
    uv string)
PARTITIONED BY (timetype string, clct_day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-- Add a specific partition and point it at its data directory
Eg: ALTER TABLE app ADD PARTITION (timetype='hour', clct_day='2018-07-26')
LOCATION '/data/test/table1/hour/2018-07-26';
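A sketch of working with the partition after it has been added, using the app table and the partition values from the example above.

```sql
-- The new partition should appear as timetype=hour/clct_day=2018-07-26.
SHOW PARTITIONS app;

-- Reads only the files under the location registered for this partition.
SELECT name, pv, uv
FROM app
WHERE timetype = 'hour'
  AND clct_day = '2018-07-26';

-- Because app is an external table, dropping the partition removes only
-- its metadata; the files under the partition location are left in place.
ALTER TABLE app DROP IF EXISTS PARTITION (timetype = 'hour', clct_day = '2018-07-26');
```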
* Customizing the storage format of a table
The default storage format in Hive is plain text. It can also be stated explicitly with STORED AS TEXTFILE, and the various delimiters are specified when the table is created.
Using TEXTFILE means that each line is treated as a separate record.
Data can also be stored in other file formats supported by Hive, including SEQUENCEFILE and RCFILE. Both use binary encoding and compression to reduce disk space usage and I/O bandwidth.
Eg: CREATE TABLE IF NOT EXISTS tmp.table1 (
    userid string COMMENT 'user ID',
    name string COMMENT 'user name',
    createtime string COMMENT 'creation time')
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
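For comparison, a sketch of the same schema stored as a sequence file. The table name tmp.table1_seq is hypothetical, and the INSERT assumes tmp.table1 already exists and holds data.

```sql
-- Same columns as tmp.table1, stored in SEQUENCEFILE format.
-- tmp.table1_seq is a hypothetical name used for illustration.
CREATE TABLE IF NOT EXISTS tmp.table1_seq (
    userid     string COMMENT 'user ID',
    name       string COMMENT 'user name',
    createtime string COMMENT 'creation time')
STORED AS SEQUENCEFILE;

-- Hive rewrites the rows in the target table's format during the insert.
INSERT OVERWRITE TABLE tmp.table1_seq
SELECT userid, name, createtime
FROM tmp.table1;
```

Populating the table through a query lets Hive handle the format conversion, so the text files never have to be converted by hand.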
That is all for "Example Analysis of Managed Tables, External Tables, and Partitioned Tables in Hive". Thank you for reading! We hope it helps.