Example Analysis of Managed Tables, External Tables, and Partitioned Tables in Hive

Updated 2025-02-22, from SLTechnology News&Howtos (shulou)


This article walks through examples of managed tables, external tables, and partitioned tables in Hive, and shows how to choose a table's storage format.

* Managed tables

The tables we have created so far are all managed tables, also known as internal tables.

-- Hive controls the lifecycle of a managed table's data. By default, Hive stores table data in a subdirectory of /user/hive/warehouse (the directory set by hive.metastore.warehouse.dir).

When you drop a managed table, Hive also deletes the data in that table.

-- Managed tables are inconvenient for sharing data with other tools.

Eg: suppose some data is created by Pig or another tool and is used primarily by that tool, but we also want to query it with Hive. We can create an external table that points to the data without taking ownership of it.

* External tables

-- The files are located in /data/test on the distributed file system

Eg: CREATE EXTERNAL TABLE IF NOT EXISTS app (
    Hour string,
    Name string,
    Pv string,
    Uv string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/test';

Analysis of the statement above:

1. The EXTERNAL keyword tells Hive that the table is external, and the LOCATION clause tells Hive which path the data is stored under.

2. Because the table is external, Hive does not assume that it fully owns the data. Dropping the table does not delete the data; it deletes only the metadata that describes the table.
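The difference in drop behavior can be modeled with a small Python sketch. This is a toy model, not how Hive is implemented: `ToyCatalog`, its methods, and `demo` are hypothetical names invented for illustration, and local temporary directories stand in for paths on the distributed file system.

```python
import shutil
import tempfile
from pathlib import Path


class ToyCatalog:
    """Toy stand-in for table metadata: name -> (data directory, is_external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external=False):
        # Make sure the data directory exists, then record the metadata.
        Path(location).mkdir(parents=True, exist_ok=True)
        self.tables[name] = (Path(location), external)

    def drop_table(self, name):
        # The metadata entry is removed for both kinds of table.
        location, external = self.tables.pop(name)
        if not external:
            # Managed table: the data is deleted together with the metadata.
            shutil.rmtree(location)
        return location


def demo():
    """Drop one managed and one external table; report which data survives."""
    root = Path(tempfile.mkdtemp())
    catalog = ToyCatalog()
    catalog.create_table("managed_t", root / "warehouse" / "managed_t")
    catalog.create_table("app", root / "data" / "test", external=True)
    managed_dir = catalog.drop_table("managed_t")
    external_dir = catalog.drop_table("app")
    result = (managed_dir.exists(), external_dir.exists())
    shutil.rmtree(root)  # clean up the temporary directories
    return result
```

In this model, `demo()` reports that the managed table's directory is gone while the external table's directory is still present, which mirrors point 2 above.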

* Managed tables vs. external tables

You can use the output of the DESCRIBE EXTENDED tablename statement to see whether a table is managed or external.

For a managed table, the output ends with:

... tableType:MANAGED_TABLE)

For an external table, the output ends with:

... tableType:EXTERNAL_TABLE)

Note: the following rules apply when a new table is created from an existing source table (for example, by copying its schema with CREATE TABLE ... LIKE).

If the statement omits the EXTERNAL keyword and the source table is an external table, the new table is also an external table.

If the statement omits the EXTERNAL keyword and the source table is a managed table, the new table is also a managed table.

If the statement includes the EXTERNAL keyword and the source table is a managed table, the new table is an external table.
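If the DESCRIBE EXTENDED output has been captured as text by a script, the table type can be pulled out with a regular expression. The sketch below assumes the output contains a `tableType:` field as shown above; the function name `table_type` is hypothetical, and the match is case-insensitive so that it does not depend on the exact capitalization of the field name.

```python
import re


def table_type(describe_output):
    """Return the table type found in DESCRIBE EXTENDED text, or None."""
    match = re.search(r"tableType:\s*(\w+)", describe_output, flags=re.IGNORECASE)
    return match.group(1) if match else None


# Abbreviated sample strings; the leading "..." stands for the elided output.
managed = table_type("... tableType:MANAGED_TABLE)")
external = table_type("... tableType:EXTERNAL_TABLE)")
```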

* Partitioned managed tables

-- Both managed tables and external tables can be partitioned

Eg: CREATE TABLE IF NOT EXISTS tmp.table1 (
    UserId string COMMENT 'user ID',
    Name string COMMENT 'user name',
    Createtime string COMMENT 'creation time'
)
PARTITIONED BY (country string, state string);

Partitioning changes the way Hive organizes data storage. If we create this table in the tmp database, there is still only one table1 directory for the table:

/user/hive/warehouse/tmp.db/table1

However, Hive creates subdirectories under the table directory that reflect the partition structure:

Eg: /table1/country=CA/state=AB
    /table1/country=CA/state=BC
    ...
    /table1/country=US/state=AL
    /table1/country=US/state=AK
    ...

These are the actual directory names. Each state directory contains zero or more files that store the user records for that state.

Once created, a partition column behaves just like an ordinary column. You do not need to care whether a column is a partition column unless you are optimizing query performance.

For example, to query the users in one country, Hive only needs to scan that country's directories.
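The directory layout and the idea of scanning only the matching directories can be sketched in Python. This only illustrates the naming scheme and the filtering idea, not Hive's query planner; `partition_dir`, `prune`, and the constants are hypothetical names, and the table directory is abbreviated to /table1 as in the listing above.

```python
def partition_dir(table_dir, spec):
    """Build one partition directory, e.g. /table1/country=US/state=AL."""
    parts = [f"{key}={value}" for key, value in spec]
    return "/".join([table_dir.rstrip("/")] + parts)


def prune(partitions, **wanted):
    """Keep only the partitions whose columns match every key=value filter."""
    return [spec for spec in partitions
            if all(dict(spec).get(key) == value for key, value in wanted.items())]


TABLE_DIR = "/table1"
# Each partition is an ordered tuple of (column, value) pairs.
PARTITIONS = [
    (("country", "CA"), ("state", "AB")),
    (("country", "CA"), ("state", "BC")),
    (("country", "US"), ("state", "AL")),
    (("country", "US"), ("state", "AK")),
]

# A filter on country='US' leaves only the US directories to scan.
us_dirs = [partition_dir(TABLE_DIR, spec) for spec in prune(PARTITIONS, country="US")]
```

Filtering on a partition column removes whole directories from the scan, which is why partition columns matter mainly for query performance.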

-- View all partitions that exist in a table

Eg: hive> SHOW PARTITIONS table1;
    country=CA/state=AB
    country=CA/state=BC
    ...
    country=US/state=AL
    country=US/state=AK
    ...

-- View only the partitions that match a partition specification

Eg: hive> SHOW PARTITIONS table1 PARTITION (country='US');
    country=US/state=AL
    country=US/state=AK
    ...

* Partitioned external tables

-- Create a partitioned external table

Eg: CREATE EXTERNAL TABLE IF NOT EXISTS app (
    Hour string,
    Name string,
    Pv string,
    Uv string)
PARTITIONED BY (timetype string, clct_day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Add a specific partition and point it at its data directory

Eg: ALTER TABLE app ADD PARTITION (timetype='hour', clct_day='2018-07-26')
LOCATION '/data/test/table1/hour/2018-07-26';

* Customizing a table's storage format

Hive's default storage format is plain text. It can also be requested explicitly with STORED AS TEXTFILE, and the various delimiters are specified when the table is created.

With TEXTFILE, each line is treated as a separate record.

Data can also be stored in other file formats that Hive supports, including SEQUENCEFILE and RCFILE, both of which use binary encoding and compression to reduce disk space usage and I/O bandwidth.

Eg: CREATE TABLE IF NOT EXISTS tmp.table1 (
    UserId string COMMENT 'user ID',
    Name string COMMENT 'user name',
    Createtime string COMMENT 'creation time')
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
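To make the "one line, one record" rule concrete, the following Python sketch splits tab-delimited text into records the way the table definition above describes. It is a simplified illustration, not Hive's reader: `parse_textfile` is a hypothetical helper, the sample rows are made up, and details such as NULL handling and escaping are ignored.

```python
COLUMNS = ("UserId", "Name", "Createtime")


def parse_textfile(text, field_delimiter="\t"):
    """Treat each line as one record and split its fields on the delimiter."""
    return [dict(zip(COLUMNS, line.split(field_delimiter)))
            for line in text.splitlines()]


# Made-up sample data: two lines, so two records.
sample = "u001\tAlice\t2018-07-26\nu002\tBob\t2018-07-27\n"
records = parse_textfile(sample)
```

Binary formats such as SEQUENCEFILE and RCFILE cannot be read this way, which is one reason plain text is convenient when other tools need to share the files.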
