2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces how to use Hive dynamic partition tables. Many people run into questions about dynamic partitioning in day-to-day work, so the following walkthrough collects the key steps into a simple, easy-to-follow procedure. We hope it answers your questions about Hive dynamic partition tables.
Purpose
If we insert partition data day by day, we can name the static partition explicitly and insert data into it. But when we cannot determine the partition names in advance, we use dynamic partitioning to populate the partitioned table.
Example
Prepare the following customer data. The fields are id, name, and orderdate.
1,jack,2016/11/11
2,michael,2016/11/12
3,summer,2016/11/13
4,spring,2016/11/14
5,nero,2016/11/15
6,book,2016/12/21
7,node,2016/12/22
8,tony,2016/12/23
9,green,2016/12/24
10,andy,2016/12/25
11,kaith,2016/12/26
12,spring,2016/12/27
13,andy,2016/12/28
14,tony,2016/12/29
15,green,2016/12/30
16,andy,2016/12/31
17,kaith,2017/1/1
18,xiaoming,2017/1/2
We load the data into a table called t_temp:
create table t_temp(id int,name string,orderdate string)
row format delimited
fields terminated by ',';
load data local inpath '/home/spark/jar/testdata/Customer.txt' into table t_temp;
Then create the partition table t_part:
create table if not exists t_part
(id int ,name string ,orderdate string)
partitioned by (year string,month string)
row format delimited
fields terminated by ',';
Using static partitioning we might insert data by executing the following statements:
insert into t_part partition(year = '2016',month = '12')
select id,name,orderdate from t_temp
where substring(orderdate,1,7) = '2016/12'
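To see which rows that static insert selects, here is a quick Python sketch (not Hive itself) that mimics the WHERE clause on the sample data. Note that Hive's 1-based substring(orderdate,1,7) corresponds to Python's orderdate[0:7]:

```python
# Sketch: mimic the static-partition filter from the Hive statement above.
rows = [
    (1, "jack", "2016/11/11"), (2, "michael", "2016/11/12"),
    (3, "summer", "2016/11/13"), (4, "spring", "2016/11/14"),
    (5, "nero", "2016/11/15"), (6, "book", "2016/12/21"),
    (7, "node", "2016/12/22"), (8, "tony", "2016/12/23"),
    (9, "green", "2016/12/24"), (10, "andy", "2016/12/25"),
    (11, "kaith", "2016/12/26"), (12, "spring", "2016/12/27"),
    (13, "andy", "2016/12/28"), (14, "tony", "2016/12/29"),
    (15, "green", "2016/12/30"), (16, "andy", "2016/12/31"),
    (17, "kaith", "2017/1/1"), (18, "xiaoming", "2017/1/2"),
]

# Hive: where substring(orderdate,1,7) = '2016/12'
december = [r for r in rows if r[2][0:7] == "2016/12"]
print(len(december))  # 11 rows go into partition (year='2016', month='12')
```

All 11 December rows land in the single static partition named in the insert statement.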
When the number of partitions is small, we can insert data this way. When there are too many partitions, or the partition names are not known in advance, we need dynamic partitioning.
Hive parameter configuration
Before using dynamic partitioning, we need to configure some parameters.
hive.exec.dynamic.partition
Default: false
Whether to enable the dynamic partition feature; it is disabled (false) by default.
This parameter must be set to true before dynamic partitioning can be used.
hive.exec.dynamic.partition.mode
Default: strict
Dynamic partition mode, strict by default. In strict mode, at least one partition column must be given a static value; in nonstrict mode, all partition columns may be dynamic.
It generally needs to be set to nonstrict.
hive.exec.max.dynamic.partitions.pernode
Default value: 100
The maximum number of dynamic partitions that can be created on each node performing MR.
This parameter needs to be set according to actual data.
For example, if the source data contains data for one year, that is, the day field has 365 values, then the parameter needs to be set to greater than 365. If the default value of 100 is used, an error will be reported.
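As a back-of-the-envelope check of that claim (plain Python, not Hive), counting the distinct day values in one year of data shows why the default of 100 is too small for day-level partitioning:

```python
from datetime import date, timedelta

# Sketch: if we partitioned one year of data by day, how many dynamic
# partitions would a single insert need to create?
days = {(date(2016, 1, 1) + timedelta(days=n)).isoformat() for n in range(366)}
needed = len(days)  # 366 distinct day values in 2016 (a leap year)
print(needed)       # 366
print(needed > 100) # True: exceeds the pernode default of 100
```

With a day partition column, hive.exec.max.dynamic.partitions.pernode would therefore have to be raised above 366 for such a load to succeed.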
hive.exec.max.dynamic.partitions
Default value: 1000
The maximum number of dynamic partitions that can be created on all nodes executing MR.
Interpreted the same way as hive.exec.max.dynamic.partitions.pernode, but as a limit across all nodes.
hive.exec.max.created.files
Default value: 100000
The maximum number of HDFS files that can be created in an MR Job.
Generally the default value is sufficient; adjust it only if your data volume is so large that a job needs to create more than 100000 files.
hive.error.on.empty.partition
Default: false
Whether to throw an exception when an empty partition is generated.
It generally does not need to be changed.
After setting these parameters, we can execute the following insert statement to use dynamic partitioning:
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;
insert overwrite table t_part partition(year,month)
select id,name,orderdate,substring(orderdate,1,4),substring(orderdate,6,2) from t_temp;
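To cross-check the partition values that SELECT computes, here is a Python sketch (not Hive) of the same logic. Hive's 1-based substring(orderdate,1,4) and substring(orderdate,6,2) correspond to Python's orderdate[0:4] and orderdate[5:7]:

```python
from collections import Counter

# Sketch: the (year, month) partition values derived from the sample data.
orderdates = [
    "2016/11/11", "2016/11/12", "2016/11/13", "2016/11/14", "2016/11/15",
    "2016/12/21", "2016/12/22", "2016/12/23", "2016/12/24", "2016/12/25",
    "2016/12/26", "2016/12/27", "2016/12/28", "2016/12/29", "2016/12/30",
    "2016/12/31",
]

counts = Counter((d[0:4], d[5:7]) for d in orderdates)
print(counts[("2016", "11")])  # 5 rows  -> partition year=2016/month=11
print(counts[("2016", "12")])  # 11 rows -> partition year=2016/month=12
# Note: a single-digit date such as "2017/1/1" would yield month "1/",
# so zero-padded month values are assumed here.
```

The per-partition row counts match the numRows values reported in the job output below.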
The results are as follows:
Loading data to table test_neil.t_part partition (year=null, month=null)
Time taken for load dynamic partitions : 651
Loading partition {year=2016, month=12}
Loading partition {year=2017, month=01}
Loading partition {year=2016, month=11}
Time taken for adding to write entity : 1
Partition test_neil.t_part{year=2016, month=11} stats: [numFiles=1, numRows=5, totalSize=97, rawDataSize=92]
Partition test_neil.t_part{year=2016, month=12} stats: [numFiles=1, numRows=11, totalSize=210, rawDataSize=199]
Partition test_neil.t_part{year=2017, month=01} stats: [numFiles=1, numRows=2, totalSize=43, rawDataSize=41]
We can look at the partitions of this table:
show partitions t_part;
The output is as follows:
partition
year=2016/month=11
year=2016/month=12
year=2017/month=01
year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__
Rows whose computed partition value is null or empty (for example, from a blank trailing line in the source file) are routed to the special __HIVE_DEFAULT_PARTITION__ partition.
This concludes the walkthrough of how to use Hive dynamic partition tables. Theory works best when paired with practice, so go ahead and try it yourself. To keep learning related topics, keep following this site.