2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces how to use Hive dynamic partition tables. Many people run into questions about dynamic partitioning in day-to-day work, so the following walkthrough collects the key steps into a simple, easy-to-follow procedure. We hope it answers your questions about Hive dynamic partition tables.
Purpose
If we insert partition data day by day, we can name the static partition explicitly and insert data into it. But when we cannot determine the partition names in advance, we use dynamic partitioning to populate the partitioned table.
Example
Prepare the following customer data. The fields are id, name, and orderdate.
1,jack,2016/11/11
2,michael,2016/11/12
3,summer,2016/11/13
4,spring,2016/11/14
5,nero,2016/11/15
6,book,2016/12/21
7,node,2016/12/22
8,tony,2016/12/23
9,green,2016/12/24
10,andy,2016/12/25
11,kaith,2016/12/26
12,spring,2016/12/27
13,andy,2016/12/28
14,tony,2016/12/29
15,green,2016/12/30
16,andy,2016/12/31
17,kaith,2017/1/1
18,xiaoming,2017/1/2
We load the data into a table called t_temp:
create table t_temp(id int,name string,orderdate string)
row format delimited
fields terminated by ',';
load data local inpath '/home/spark/jar/testdata/Customer.txt' into table t_temp;
Then create the partition table t_part:
create table if not exists t_part
(id int ,name string ,orderdate string)
partitioned by (year string,month string)
row format delimited
fields terminated by ',';
Using static partitioning we might insert data by executing the following statements:
insert into t_part partition(year = '2016',month = '12')
select id,name,orderdate from t_temp
where substring(orderdate,1,7) = '2016/12'
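To see which rows that static insert selects, here is a quick Python sketch (not Hive itself) that mimics the WHERE clause on the sample data. Note that Hive's 1-based substring(orderdate,1,7) corresponds to Python's orderdate[0:7]:

```python
# Sketch: mimic the static-partition filter from the Hive statement above.
rows = [
    (1, "jack", "2016/11/11"), (2, "michael", "2016/11/12"),
    (3, "summer", "2016/11/13"), (4, "spring", "2016/11/14"),
    (5, "nero", "2016/11/15"), (6, "book", "2016/12/21"),
    (7, "node", "2016/12/22"), (8, "tony", "2016/12/23"),
    (9, "green", "2016/12/24"), (10, "andy", "2016/12/25"),
    (11, "kaith", "2016/12/26"), (12, "spring", "2016/12/27"),
    (13, "andy", "2016/12/28"), (14, "tony", "2016/12/29"),
    (15, "green", "2016/12/30"), (16, "andy", "2016/12/31"),
    (17, "kaith", "2017/1/1"), (18, "xiaoming", "2017/1/2"),
]

# Hive: where substring(orderdate,1,7) = '2016/12'
december = [r for r in rows if r[2][0:7] == "2016/12"]
print(len(december))  # 11 rows go into partition (year='2016', month='12')
```

All 11 December rows land in the single static partition named in the insert statement.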
When the number of partitions is small, we can insert data this way. When there are too many partitions, or the partition names are not known in advance, we need dynamic partitioning.
Hive parameter configuration
Before using dynamic partitioning, we need to configure some parameters.
hive.exec.dynamic.partition
Default: false
Whether to enable the dynamic partition feature; it is disabled (false) by default.
This parameter must be set to true before dynamic partitioning can be used.
hive.exec.dynamic.partition.mode
Default: strict
Dynamic partition mode, strict by default. In strict mode, at least one partition column must be given a static value; in nonstrict mode, all partition columns may be dynamic.
It generally needs to be set to nonstrict.
hive.exec.max.dynamic.partitions.pernode
Default value: 100
The maximum number of dynamic partitions that can be created on each node performing MR.
This parameter needs to be set according to actual data.
For example, if the source data contains data for one year, that is, the day field has 365 values, then the parameter needs to be set to greater than 365. If the default value of 100 is used, an error will be reported.
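As a back-of-the-envelope check of that claim (plain Python, not Hive), counting the distinct day values in one year of data shows why the default of 100 is too small for day-level partitioning:

```python
from datetime import date, timedelta

# Sketch: if we partitioned one year of data by day, how many dynamic
# partitions would a single insert need to create?
days = {(date(2016, 1, 1) + timedelta(days=n)).isoformat() for n in range(366)}
needed = len(days)  # 366 distinct day values in 2016 (a leap year)
print(needed)       # 366
print(needed > 100) # True: exceeds the pernode default of 100
```

With a day partition column, hive.exec.max.dynamic.partitions.pernode would therefore have to be raised above 366 for such a load to succeed.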
hive.exec.max.dynamic.partitions
Default value: 1000
The maximum number of dynamic partitions that can be created on all nodes executing MR.
Interpreted the same way as hive.exec.max.dynamic.partitions.pernode, but as a limit across all nodes.
hive.exec.max.created.files
Default value: 100000
The maximum number of HDFS files that can be created in an MR Job.
Generally the default value is sufficient; adjust it only if your data volume is so large that a job needs to create more than 100000 files.
hive.error.on.empty.partition
Default: false
Whether to throw an exception when an empty partition is generated.
It generally does not need to be changed.
After setting these parameters, we can execute the following insert statement to use dynamic partitioning:
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;
insert overwrite table t_part partition(year,month)
select id,name,orderdate,substring(orderdate,1,4),substring(orderdate,6,2) from t_temp;
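To cross-check the partition values that SELECT computes, here is a Python sketch (not Hive) of the same logic. Hive's 1-based substring(orderdate,1,4) and substring(orderdate,6,2) correspond to Python's orderdate[0:4] and orderdate[5:7]:

```python
from collections import Counter

# Sketch: the (year, month) partition values derived from the sample data.
orderdates = [
    "2016/11/11", "2016/11/12", "2016/11/13", "2016/11/14", "2016/11/15",
    "2016/12/21", "2016/12/22", "2016/12/23", "2016/12/24", "2016/12/25",
    "2016/12/26", "2016/12/27", "2016/12/28", "2016/12/29", "2016/12/30",
    "2016/12/31",
]

counts = Counter((d[0:4], d[5:7]) for d in orderdates)
print(counts[("2016", "11")])  # 5 rows  -> partition year=2016/month=11
print(counts[("2016", "12")])  # 11 rows -> partition year=2016/month=12
# Note: a single-digit date such as "2017/1/1" would yield month "1/",
# so zero-padded month values are assumed here.
```

The per-partition row counts match the numRows values reported in the job output below.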
The results are as follows:
Loading data to table test_neil.t_part partition (year=null, month=null)
Time taken for load dynamic partitions : 651
Loading partition {year=2016, month=12}
Loading partition {year=2017, month=01}
Loading partition {year=2016, month=11}
Time taken for adding to write entity : 1
Partition test_neil.t_part{year=2016, month=11} stats: [numFiles=1, numRows=5, totalSize=97, rawDataSize=92]
Partition test_neil.t_part{year=2016, month=12} stats: [numFiles=1, numRows=11, totalSize=210, rawDataSize=199]
Partition test_neil.t_part{year=2017, month=01} stats: [numFiles=1, numRows=2, totalSize=43, rawDataSize=41]
We can look at the partitions of this table:
show partitions t_part;
The output is as follows:
partition
year=2016/month=11
year=2016/month=12
year=2017/month=01
year=__HIVE_DEFAULT_PARTITION__/month=__HIVE_DEFAULT_PARTITION__
Rows whose computed partition value is null or empty (for example, from a blank trailing line in the source file) are routed to the special __HIVE_DEFAULT_PARTITION__ partition.
This concludes the walkthrough of how to use Hive dynamic partition tables. Theory works best when paired with practice, so go ahead and try it yourself. To keep learning related topics, keep following this site.