How to understand hive partition partition 07/16 Update SLTechnology News&Howtos

How to understand hive partition partition

2025-07-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/01 Report--

This article is to share with you about how to understand the hive partition partition, the editor thinks it is very practical, so I share it with you to learn. I hope you can get something after reading this article.

I. background

1. In a Hive Select query, the contents of the entire table are generally scanned, which consumes a lot of time to do unnecessary work. Sometimes only a portion of the data you care about in the table needs to be scanned, so the concept of partition is introduced when creating the table.

2. The partition table refers to the partition space of the partition specified when the table is created.

3. If you need to create a partitioned table, you need to call the optional parameter partitioned by in the create table. For more information, please see the syntax structure created by the table.

Second, technical details

1. A table can have one or more partitions, each in the form of a folder under the directory of the table folder.

2. Table and column names are not case-sensitive.

3. The partition exists in the table structure in the form of a field. You can see the existence of the field through the describe table command, but the field does not store the actual data content, only the representation of the partition.

4. Syntax for creating a table (for more information on partition, please see PARTITIONED BY parameter):

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment],...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment],...)] [CLUSTERED BY (col_name, col_name,...) [SORTED BY (col_name [ASC | DESC],...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path]

5. There are two types of partition tables, one is a single partition, that is to say, there is only a first-level folder directory under the table folder directory. The other is multi-partition, where there is a multi-folder nesting pattern under the table folder.

A, single partition table statement: create table day_table (id int, content string) partitioned by (dt string); single partition table, partitioned by day, there are three id,content,dt columns in the table structure.

B, double-partition table statement: create table day_hour_table (id int, content string) partitioned by (dt string, hour string); double-partition table, partitioned by day and hour, adding dt and hour columns to the table structure.

Schematic diagram of table folder directory (multi-partition table):

6. Add partition table syntax (table has been created, add partition on this basis):

ALTER TABLE table_name ADD partition_spec [LOCATION 'location1'] partition_spec [LOCATION' location2']... Partition_spec:: PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value,...)

Users can use ALTER TABLE ADD PARTITION to add partitions to a table. Put quotation marks when the partition name is a string. Example:

ALTER TABLE day_table ADD PARTITION (dt='2008-08-08, hour='08') location'/ path/pv1.txt' PARTITION (dt='2008-08-08, hour='09') location'/ path/pv2.txt'

7. Delete partition syntax:

ALTER TABLE table_name DROP partition_spec, partition_spec,...

Users can use ALTER TABLE DROP PARTITION to delete partitions. The metadata and data of the partition will be deleted together. Example:

ALTER TABLE day_hour_table DROP PARTITION (dt='2008-08-08, hour='09')

8. The data is loaded into the partition table syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2...)]

Example:

LOAD DATA INPATH'/ user/pv.txt' INTO TABLE day_hour_table PARTITION (dt='2008-08-08, hour='08'); LOAD DATA local INPATH'/ user/hua/*' INTO TABLE day_hour partition (dt='2010-07-07')

When the data is loaded into the table, no conversion is made to the data. The Load operation simply copies the data to the location corresponding to the Hive table. When the data is loaded, a directory is automatically created under the table, and the files are stored under the partition.

9. Statements for partition-based queries:

SELECT day_table.* FROM day_table WHERE day_table.dt > = '2008-08-08'

10. View the partition statement:

Hive > show partitions day_hour_table; OK dt=2008-08-08/hour=08 dt=2008-08-08/hour=09 dt=2008-08-09/hour=09

III. Summary

1. In Hive, a Partition in a table corresponds to a directory under the table, and all Partition data is stored in the directory of the most word set.

2. Generally speaking, partition is an auxiliary query, narrowing the scope of query, speeding up the speed of data retrieval and managing the data according to certain specifications and conditions.

Operation of partition in hive: hive > create table mp (a string) partitioned by (b string, c string)

Time taken: 0.044 seconds

Hive > alter table mp add partition (baked goods 1', cymbal 1')

Time taken: 0.079 seconds

Hive > alter table mp add partition (baked 1, cased 2')

Time taken: 0.052 seconds

Hive > alter table mp add partition (baked goods 2', caterpillar 2')

Time taken: 0.056 seconds

Hive > show partitions mp

B=1/c=1

B=1/c=2

B=2/c=2

Time taken: 0.046 seconds

Hive > explain extended alter table mp drop partition (baked goods 1')

ABSTRACT SYNTAX TREE:

(TOK_ALTERTABLE_DROPPARTS mp (TOK_PARTSPEC (TOK_PARTVAL b'1')

STAGE DEPENDENCIES:

Stage-0 is a root stage

STAGE PLANS:

Stage: Stage-0

Drop Table Operator:

Drop Table

Table: mp

Time taken: 0.048 seconds

Hive > alter table mp drop partition (baked goods 1')

FAILED: Error in metadata: table is partitioned but partition spec is not specified or tab: {bread1}

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Hive > show partitions mp

B=1/c=1

B=1/c=2

B=2/c=2

Time taken: 0.044 seconds

Hive > alter table mp add partition (baked goods 1, cased'3') partition (baked 1, cased 4')

Time taken: 0.168 seconds

Hive > show partitions mp

B=1/c=1

B=1/c=2

B=1/c=3

B=1/c=4

B=2/c=2

B=2/c=3

Time taken: 0.066 seconds

Hive > insert overwrite table mp partition (baked 1s, cased 1') select cnt from tmp_et3

Hive > alter table mp add columns (newcol string)

Location specifies the directory structure

Hive > alter table alter2 add partition (insertdate='2008-01-01') location '2008 Universe 01 Universe 01'

Hive > alter table alter2 add partition (insertdate='2008-01-02') location '2008Universe 02'

The above is how to understand the hive partition partition, the editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.