How to create a Hive Partition

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how to create partitions in Hive. The method introduced here is simple, fast and practical.

The concept of Hive partitioning is different from traditional relational database partitioning.

Traditional database partitioning: in Oracle, for example, each partition is stored in its own segment, which holds the actual data, and rows are assigned to partitions automatically when they are inserted.

Hive partitioning: because a Hive table is an abstraction over files stored on HDFS, a partition in Hive corresponds to a directory and a sub-partition to a subdirectory; the partition value is not a field stored in the data files.

In other words, when we specify a partition while inserting data, we are really creating a new directory or subdirectory, or adding data files to an existing directory.
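The mapping from a partition to a directory can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Hive's own code: the function name partition_path is made up for illustration, the warehouse root is the default path that appears in the listings later in this article, and the escaping that Hive applies to special characters in partition values is ignored.

```python
import posixpath

# Assumed warehouse root, copied from the directory listings in this article.
WAREHOUSE = "/user/hive/warehouse"


def partition_path(table, spec):
    """Build the directory that would hold a partition's data files.

    `spec` is an ordered list of (column, value) pairs, in the same order
    as the columns of the table's partitioned by clause.
    Hypothetical helper: real Hive also escapes special characters.
    """
    parts = ["{}={}".format(col, val) for col, val in spec]
    return posixpath.join(WAREHOUSE, table, *parts)


# One partition column gives one directory level below the table directory.
print(partition_path("par_tab", [("sex", "man")]))
# Two partition columns give a directory and a subdirectory.
print(partition_path("par_tab", [("sex", "man"), ("dt", "2017-03-29")]))
```

Each extra partition column adds one more directory level, which is why the order of the columns matters later on.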

Creation of Hive Partition

Hive partitions are defined with the partitioned by clause when the table is created. Note that the columns named in the partitioned by clause are real columns of the table, but they are not stored in the data files, because their values are carried by the directory names.

Static partition

Create a statically partitioned table, par_tab, with a single partition column

hive> create table par_tab (name string, nation string)
    > partitioned by (sex string)
    > row format delimited fields terminated by ',';

At this point, the structure of the table as shown by desc is as follows

hive> desc par_tab;
OK
name                    string
nation                  string
sex                     string

# Partition Information
# col_name              data_type               comment

sex                     string
Time taken: 0.038 seconds, Fetched: 8 row(s)

Prepare a local data file, par_tab.txt, whose rows contain name and nationality; gender (sex) will be supplied as the partition value

jan,china
mary,america
lilei,china
heyong,china
yiku,japan
emoji,japan

Load the data into the table (a load operation simply places the file in the table's directory on HDFS; with the local keyword, the file is copied there from the local file system)

hive> load data local inpath '/home/hadoop/files/par_tab.txt' into table par_tab partition (sex='man');

Querying par_tab in Hive now returns three columns. Note that the third column, sex, comes from the partition and not from the data file.

hive> select * from par_tab;
OK
jan       china     man
mary      america   man
lilei     china     man
heyong    china     man
yiku      japan     man
emoji     japan     man
Time taken: 0.076 seconds, Fetched: 6 row(s)
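The way a two-column file turns into a three-column result can be imitated with a short Python sketch. It is a simplified model, not how Hive is implemented: the function name read_partition is hypothetical, and the partition value is passed in by hand where Hive would take it from the directory name.

```python
def read_partition(lines, partition_values, delimiter=","):
    """Return table rows for one partition.

    The fields come from the data file; the partition values are appended
    to every row, because they are not stored in the file itself.
    Hypothetical helper for illustration only.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(tuple(line.split(delimiter)) + tuple(partition_values))
    return rows


# The first lines of par_tab.txt, loaded into the partition sex='man'.
file_lines = ["jan,china", "mary,america", "lilei,china"]
for row in read_partition(file_lines, ["man"]):
    print("\t".join(row))
```

Every row read from the same partition directory receives the same sex value, which matches the query output above.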

View the par_tab directory structure

As you can see, for a new partitioned table the system creates a directory named after the table under the Hive warehouse default path /user/hive/warehouse/, then a subdirectory of it named after the partition (sex=man), and finally stores the actual data file inside the partition directory.

Now load a second data file, for example one with the following contents

lily,china
nancy,china
hanmeimei,america

Load the data

hive> load data local inpath '/home/hadoop/files/par_tab_wm.txt' into table par_tab partition (sex='woman');

View the par_tab table directory structure

[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man/par_tab.txt
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:35 /user/hive/warehouse/par_tab/sex=woman
-rwxr-xr-x   1 hadoop supergroup         41 2017-03-29 08:35 /user/hive/warehouse/par_tab/sex=woman/par_tab_wm.txt

Finally, query the table to check the results of the two loads, which now include both man and woman

hive> select * from par_tab;
OK
jan         china     man
mary        america   man
lilei       china     man
heyong      china     man
yiku        japan     man
emoji       japan     man
lily        china     woman
nancy       china     woman
hanmeimei   america   woman
Time taken: 0.136 seconds, Fetched: 9 row(s)

Because partition columns are real columns of the table, they can be used in a where clause to query the data of a single partition

hive> select * from par_tab where sex='woman';
OK
lily        china     woman
nancy       china     woman
hanmeimei   america   woman
Time taken: 0.515 seconds, Fetched: 3 row(s)

Create a statically partitioned table, par_tab_muilt, with multiple partition columns (gender + date)

hive> create table par_tab_muilt (name string, nation string)
    > partitioned by (sex string, dt string)
    > row format delimited fields terminated by ',';
hive> load data local inpath '/home/hadoop/files/par_tab.txt' into table par_tab_muilt partition (sex='man', dt='2017-03-29');

[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab_muilt
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29/par_tab.txt

It can be seen that the order of the partition columns in the create table statement determines the directory hierarchy (which directory is the parent and which is the child). Because of this hierarchy, a query for all man rows reads the data under every date directory below sex=man. If a query filters only on the date, and both sex=man and sex=woman contain a directory for that date, Hive prunes the input paths to the matching date directories under every gender directory; the gender partitions are not filtered, so the result includes all genders.
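The pruning behaviour described above can be modelled in Python as a filter over partition directory names. This is a sketch under assumptions, not Hive's planner: the helper names parse_partition and prune are invented, and the partition list is hypothetical (the article only loads sex=man/dt=2017-03-29 into par_tab_muilt).

```python
def parse_partition(path):
    """Turn 'sex=man/dt=2017-03-29' into {'sex': 'man', 'dt': '2017-03-29'}."""
    return dict(segment.split("=", 1) for segment in path.split("/"))


def prune(partitions, **wanted):
    """Keep only the partition directories whose values match the filter.

    Columns that are not mentioned in the filter are left unconstrained.
    """
    kept = []
    for path in partitions:
        values = parse_partition(path)
        if all(values.get(col) == val for col, val in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical partitions, assumed only for this illustration.
partitions = [
    "sex=man/dt=2017-03-28",
    "sex=man/dt=2017-03-29",
    "sex=woman/dt=2017-03-29",
]

# Filtering on the parent column keeps every date below sex=man.
print(prune(partitions, sex="man"))
# Filtering on the date alone keeps that date under every gender.
print(prune(partitions, dt="2017-03-29"))
```

Only the directories that survive the filter would need to be read, which is the benefit of filtering on partition columns.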

Dynamic partition

With the static partitioning shown above, you must know the partition value in advance for every insert, and writing a separate load data statement for each partition is tedious. Dynamic partitioning solves this problem: each row is assigned to a partition according to the values returned by the query. The essential difference from static partitioning is that you do not name the partition directories yourself; the system chooses them from the data.

First, enable dynamic partitioning

hive> set hive.exec.dynamic.partition=true;

Suppose a table par_tab already exists whose first two columns are name and nation (nationality) and whose last two columns are the partition columns sex (gender) and dt (date). The data is as follows

hive> select * from par_tab;
OK
lily        china     man   2013-03-28
nancy       china     man   2013-03-28
hanmeimei   america   man   2013-03-28
jan         china     man   2013-03-29
mary        america   man   2013-03-29
lilei       china     man   2013-03-29
heyong      china     man   2013-03-29
yiku        japan     man   2013-03-29
emoji       japan     man   2013-03-29
Time taken: 1.141 seconds, Fetched: 9 row(s)

Now insert the contents of this table into another table, par_dnm, with sex as a static partition and dt as a dynamic partition (the date is not specified; the system determines it from the data)

hive> insert overwrite table par_dnm partition (sex='man', dt)
    > select name, nation, dt from par_tab;

Take a look at the directory structure after insertion

drwxr-xr-x   - hadoop supergroup          0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-28
-rwxr-xr-x   1 hadoop supergroup         41 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-28/000000_0
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-29
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-29

Then list the partitions

hive> show partitions par_dnm;
OK
sex=man/dt=2013-03-28
sex=man/dt=2013-03-29
Time taken: 0.065 seconds, Fetched: 2 row(s)

This confirms that the dynamic partition insert succeeded.
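What the insert did can be pictured with a small Python model that groups rows by the value of the trailing dt column. It is an illustrative sketch only: route_rows is a made-up name, the rows are a subset of the sample data, and real Hive writes files such as 000000_0 where this model uses in-memory lists.

```python
def route_rows(rows, static_spec):
    """Group rows by their target partition directory.

    Each row is (name, nation, dt). The trailing dt value selects the
    dynamic partition, while `static_spec` (for example 'sex=man')
    supplies the fixed leading part of the path. The dt value itself is
    not kept in the stored row, because the directory name carries it.
    """
    out = {}
    for name, nation, dt in rows:
        directory = "{}/dt={}".format(static_spec, dt)
        out.setdefault(directory, []).append((name, nation))
    return out


# A few of the rows returned by: select name, nation, dt from par_tab
rows = [
    ("lily", "china", "2013-03-28"),
    ("nancy", "china", "2013-03-28"),
    ("jan", "china", "2013-03-29"),
]
for directory, data in route_rows(rows, "sex=man").items():
    print(directory, data)
```

The keys of the result correspond to the entries that show partitions lists for par_dnm.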

Note that a dynamic partition column may not come before a static one: the primary partition cannot be dynamic while a secondary partition is static, because every primary partition would then have to create the sub-partition named by the static value.

Dynamic partitioning does allow every partition column to be dynamic, but a parameter, hive.exec.dynamic.partition.mode, must be set first. Check its current value:

hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict

Its default value is strict, which means that not every partition column may be dynamic: at least one must be static. This guards against a user who intends to use dynamic partitioning only for the sub-partition but forgets to give the primary partition column a value; a single DML statement could then create a large number of new partitions (and a corresponding number of new directories) in a short time, which would affect system performance.
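The two rules described here (static columns must come before dynamic ones, and strict mode requires at least one static column) can be written down as a small Python check. It is a sketch of the rules as this article states them, not Hive's actual validation code; check_partition_spec is a hypothetical name, and a value of None is an assumed convention for marking a dynamic column.

```python
def check_partition_spec(spec, mode="strict"):
    """Validate an ordered partition spec against the rules in this article.

    `spec` is a list of (column, value) pairs; a value of None marks a
    dynamic partition column. Raises ValueError if the spec is rejected.
    """
    seen_dynamic = False
    for col, val in spec:
        if val is None:
            seen_dynamic = True
        elif seen_dynamic:
            # A static column may not follow a dynamic one.
            raise ValueError("static column %r follows a dynamic column" % col)
    if mode == "strict" and spec and all(val is None for _, val in spec):
        # Strict mode refuses a spec in which every column is dynamic.
        raise ValueError("strict mode needs at least one static partition column")


# partition (sex='man', dt) is accepted in strict mode.
check_partition_spec([("sex", "man"), ("dt", None)])

# partition (sex, dt) is rejected in strict mode.
try:
    check_partition_spec([("sex", None), ("dt", None)])
except ValueError as err:
    print("rejected:", err)
```

Passing mode="nonstrict" to the same function accepts the fully dynamic spec, while a static column after a dynamic one is rejected in both modes.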

So, to allow every partition column to be dynamic, set:

hive> set hive.exec.dynamic.partition.mode=nonstrict;
