In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article shows you how to achieve dynamic partitioning in Hive, the content is concise and easy to understand, it will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.
Dynamic partition adjustment
Dynamic partitioning attribute: set to true to enable dynamic partitioning (default is false) hive.exec.dynamic.partition=true
Dynamic partition property: set to nonstrict, which means that all partitions are allowed to be dynamic (default is strict) set to strict, indicating that at least one partition must be guaranteed to be static hive.exec.dynamic.partition.mode=strict
Dynamic Partition attribute: the maximum number of dynamic partitions that can be created per mapper or reducer hive.exec.max.dynamic.partitions.pernode=100
Dynamic partition attribute: the maximum number of dynamic partitions that can be created by a dynamic partition creation statement hive.exec.max.dynamic.partitions=1000
Dynamic Partition attribute: the maximum number of files that can be created globally hive.exec.max.created.files=100000
Control the number of files that DataNode can open at a time. This parameter must be set in the $HADOOP_HOME/conf/hdfs-site.xml file of DataNode.
Dfs.datanode.max.xcievers 8192
Be careful
In Hive, dynamic partitioning can cause too many small files to be generated in the process of inserting data
Dynamic partition insertion
If you need to create a lot of partitions, the user needs to write a lot of conditional queries sql to insert data into the corresponding partition. Fortunately, Hive provides dynamic partitioning, which can automatically create partitions based on the value of the partition field. The dynamic partition hive.exec.dynamic.partition listed above is enabled, and the hive.exec.dynamic.partition.mode needs to be in non-strict mode. Usually, if there are many partitions, the hive.exec.max.dynamic.partitions.pernode needs to be set to a large number, otherwise there will be an error alert.
There is now a sql:
Insert overwrite table employees partitions (country, state) select..., se.cnty, se.st from staged_employees se
As you can see, Hive determines the values of the partition fields country and state based on the last two columns in the select statement, and different names are deliberately used here to emphasize that the relationship between the source table field and the output partition value is matched by location rather than naming.
Combination of dynamic and static zones
You can also use a mix of dynamic and static partitions. In the above example, we can specify that the partition value five of the country is the static value US, while the partition field state is the dynamic value:
Insert overwrite table employees partitions (country = 'US', state) select..., se.cnty, se.st from staged_employees se where se.cnty =' US'
Note: static partitions need to appear before the dynamic partition field.
Dynamic partitioning is not turned on by default and is executed in strict mode, which requires at least one column of partition fields to be static. The advantage of this is that it can prevent a large number of partitions due to design or other incorrect queries, such as sql boy accidentally using a timestamp as a partition field, which would be a disaster. In the daily import of a day's data, usually the specified date is static partition, hour is dynamic partition, mixed mode import.
Examples
Build a table
Create table if not exists test.test (id string, name string) partitioned by (dt string,hour string) row format delimited fields terminated by'\ t'; create table if not exists test.test2 (id string, name string) partitioned by (dt string,hour string) row format delimited fields terminated by'\ t 'stored as orc
Import data into the test.test table
Load data local inpath'/ home/hadoop/data/test.txt' into table test.test partition (dt = '2019-09-10, hour =' 02'); test.txt 001 keguang 002 kg 003 kk 004 ikeguang
Using dynamic Partition insertion
Insert overwrite table test.test2 partition (dt,hour) select `(dt | hour)? +. +`, dt,hour from test.test; the above is how to implement dynamic partitioning in Hive. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.