Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to realize dynamic Partition in Hive

2025-01-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article shows you how to achieve dynamic partitioning in Hive, the content is concise and easy to understand, it will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

Dynamic partition adjustment

Dynamic partitioning attribute: set to true to enable dynamic partitioning (default is false) hive.exec.dynamic.partition=true

Dynamic partition property: set to nonstrict, which means that all partitions are allowed to be dynamic (default is strict) set to strict, indicating that at least one partition must be guaranteed to be static hive.exec.dynamic.partition.mode=strict

Dynamic Partition attribute: the maximum number of dynamic partitions that can be created per mapper or reducer hive.exec.max.dynamic.partitions.pernode=100

Dynamic partition attribute: the maximum number of dynamic partitions that can be created by a dynamic partition creation statement hive.exec.max.dynamic.partitions=1000

Dynamic Partition attribute: the maximum number of files that can be created globally hive.exec.max.created.files=100000

Control the number of files that DataNode can open at a time. This parameter must be set in the $HADOOP_HOME/conf/hdfs-site.xml file of DataNode.

Dfs.datanode.max.xcievers 8192

Be careful

In Hive, dynamic partitioning can cause too many small files to be generated in the process of inserting data

Dynamic partition insertion

If you need to create a lot of partitions, the user needs to write a lot of conditional queries sql to insert data into the corresponding partition. Fortunately, Hive provides dynamic partitioning, which can automatically create partitions based on the value of the partition field. The dynamic partition hive.exec.dynamic.partition listed above is enabled, and the hive.exec.dynamic.partition.mode needs to be in non-strict mode. Usually, if there are many partitions, the hive.exec.max.dynamic.partitions.pernode needs to be set to a large number, otherwise there will be an error alert.

There is now a sql:

Insert overwrite table employees partitions (country, state) select..., se.cnty, se.st from staged_employees se

As you can see, Hive determines the values of the partition fields country and state based on the last two columns in the select statement, and different names are deliberately used here to emphasize that the relationship between the source table field and the output partition value is matched by location rather than naming.

Combination of dynamic and static zones

You can also use a mix of dynamic and static partitions. In the above example, we can specify that the partition value five of the country is the static value US, while the partition field state is the dynamic value:

Insert overwrite table employees partitions (country = 'US', state) select..., se.cnty, se.st from staged_employees se where se.cnty =' US'

Note: static partitions need to appear before the dynamic partition field.

Dynamic partitioning is not turned on by default and is executed in strict mode, which requires at least one column of partition fields to be static. The advantage of this is that it can prevent a large number of partitions due to design or other incorrect queries, such as sql boy accidentally using a timestamp as a partition field, which would be a disaster. In the daily import of a day's data, usually the specified date is static partition, hour is dynamic partition, mixed mode import.

Examples

Build a table

Create table if not exists test.test (id string, name string) partitioned by (dt string,hour string) row format delimited fields terminated by'\ t'; create table if not exists test.test2 (id string, name string) partitioned by (dt string,hour string) row format delimited fields terminated by'\ t 'stored as orc

Import data into the test.test table

Load data local inpath'/ home/hadoop/data/test.txt' into table test.test partition (dt = '2019-09-10, hour =' 02'); test.txt 001 keguang 002 kg 003 kk 004 ikeguang

Using dynamic Partition insertion

Insert overwrite table test.test2 partition (dt,hour) select `(dt | hour)? +. +`, dt,hour from test.test; the above is how to implement dynamic partitioning in Hive. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report