Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Introduction to Hive Partition Table

2025-03-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/03 Report--

Brief introduction:

If there is a lot of data in a table, our query is very slow and takes a lot of time, what if we want to query some of the data, then we introduce the concept of partitioning.

There are two types of partition tables in Hive: static partitions and dynamic partitions.

1. Static partitions: partition tables can be created according to PARTITIONED BY. A table can have one or more partitions, and each partition is stored as a folder under the directory of the table folder. Partitions exist in the table structure in the form of fields, which can be seen through the describe table command, but the field does not hold the actual data content, only the representation of the partition. There are two types of partition tables, one is a single partition, that is to say, there is only a first-level folder directory under the table folder directory. The other is multi-partition, where there is a multi-folder nesting pattern under the table folder.

Single-level partition table demonstration:

# create a single partition table hive > create table order_partition (> ordernumber string, > eventtime string >) > partitioned by (event_month string) > row format delimited fields terminated by'\ tpartition Oktime taken: 0.82 seconds# load the data from the order.txt file into the order_ partitioning table hive > load data local inpath'/ home/hadoop/order.txt' overwrite into table order_partition partition (event_month='2014-05') Loading data to table default.order_partition partition (event_month=2014-05) Partition default.order_partition {event_month=2014-05} stats: [numFiles=1, numRows=0, totalSize=208, rawDataSize=0] OKTime taken: 1.749 seconds# View order_partition partition data hive > select * from order_partition where event_month='2014-05' OK10703007267488 2014-05-01 06 seconds 01seconds 12.3341461 2014-051010435096 2014-05-01 071409747 2014-05 0103043509747 2014-05-01 0714061 2014-0510104043514061 2014-05 092014-05104043514061 2014-05 0914-05Time taken: 0.208 seconds, Fetched: 5 row (s) # View mysql > select * from partitions in metadata MySQL +-+ | PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID | +- -+-+ | 1 | 1530498328 | 0 | event_month=2014-05 | 32 | 31 | +-+- -+ 1 row in set (0.00 sec) mysql > select * from partition_key_vals +-+ | PART_ID | PART_KEY_VAL | INTEGER_IDX | +-+ | 1 | 2014-05 | 0 | +- View the directory [hadoop@hadoop000 ~] $hadoop fs-ls / user/hive/warehouse/order_partition/ Found 1 itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 10:29 / user/hive/warehouse/order_partition/event_month=2014-05 in +-+-+ 1 row in set (sec) # HDFS

Note: data can also be loaded using hadoop shell, which is demonstrated below:

Create a partition, that is, there will be a partition directory under the HDFS folder directory, so can we directly create a directory on the HDFS and load the data into it?

# create a directory and upload the file [hadoop@hadoop000 ~] $hadoop fs-mkdir-p / user/hive/warehouse/order_partition/event_month=2014-06 [hadoop@hadoop000 ~] $hadoop fs-put / home/hadoop/order.txt / user/hive/warehouse/order_partition/event_month=2014-06 [hadoop@hadoop000 ~] $hadoop fs-ls / user/hive/warehouse/order_partition/ Found 2 Itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 10:29 / user/hive/warehouse/order_partition/event_month=2014-05drwxr-xr-x-hadoop supergroup 0 2018-07-02 10:54 / user/hive/warehouse/order_partition/event_month=2014-01found no data hive > select * from order_partition where event_month='2014-06' in the partition table OKTime taken: 0.21 seconds# because we uploaded the file to hdfs,hdfs, but not in the metadata in hive. Execute the following command to update hive > msck repair table order_partition;OKPartitions not in metastore: order_partition:event_month=2014-06Repair: Added partition to metastore order_partition:event_month=2014-06Time taken: 0.178 seconds, Fetched: 2 row (s) # View partition data hive > select * from order_partition where event_month='2014-06' again OK10703007267488 2014-05-01 06 seconds 01 seconds, Fetched: 5 row (s) # View Table Partition hive OKevent_month=2014-05event_month=2014-06Time taken: 0.164 seconds, Fetched: 2 row (s)

Note: after the msck repair table command is executed, Hive detects partition meta-information that exists in the HDFS directory but does not exist in the metastore of the table, and updates it to metastore. If you have a table that has been stored for several years, you can't respond for half a day with this command, so this order is too violent and is not recommended in production. You can use Add partition to add partitions.

[hadoop@hadoop000 ~] $hadoop fs-mkdir-p / user/hive/warehouse/order_partition/event_month=2014-07 [hadoop@hadoop000 ~] $hadoop fs-put / home/hadoop/order.txt / user/hive/warehouse/order_partition/event_month=2014-07 [hadoop@hadoop000 ~] $hadoop fs-ls / user/hive/warehouse/order_partition/Found 3 itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 10:29 / user/hive/warehouse/order_partition / event_month=2014-05drwxr-xr-x-hadoop supergroup 0 2018-07-02 10:54 / user/hive/warehouse/order_partition/event_month=2014-06drwxr-xr-x-hadoop supergroup 0 2018-07-02 11:09 / user/hive/warehouse/order_partition/event_month=2014-01View the new partition hive > select * partition-07' OKTime taken: 0.188 seconds# add partition hive > ALTER TABLE order_partition ADD IF NOT EXISTS PARTITION (event_month='2014-07'); OKTime taken: 0.22seconds# check hive > select * from order_partition where event_month='2014-07' again OK10703007267488 2014-05-01 06 seconds 01 seconds, Fetched: 5 row (s) hive 1575 2014-05 09 0975 2014-05 09 0975 2014-05 09 1461 2014-05 01 2014-07104043514061 2014-05 09 2014-05 09 2014-01 09 2014-05 0914-061 2014-05 09 2014-07Time taken: 0.206 seconds, Fetched: 5 row (s) OKevent_month=2014-05event_month=2014-06event_month=2014-07Time taken: 0.151 seconds, Fetched: 3 row (s)

Multi-level partition table demonstration:

# create a multi-level partition table hive > create table order_mulit_partition (> ordernumber string, > eventtime string >) > partitioned by (event_month string,event_day string) > row format delimited fields terminated by'\ tpartitioning OkTime taken: 0.133 seconds# load data hive > load data local inpath'/ home/hadoop/order.txt' overwrite into table order_mulit_partition partition (event_month='2014-05) # View partition hive > select * from order_mulit_partition where event_month='2014-05' and event_day='01' OK10703007267488 2014-05-01 06 show partitions order_mulit_partition 01Partition 12.33414001 2014-05 011010435096 2014-05-01 0714061 2014-05 010103043509747 2014-05-01 0714012014-05 01043501575 2014-05 0914061 2014-05 0110104043514061 2014-05 0914061 2014-05 0914 12.32401 2014-05 01hive > show partitions order_mulit_partition OKevent_month=2014-05/event_day=01Time taken: 0.158 seconds, Fetched: 1 row (s) # HDFS directory structure of multi-level partitions [hadoop@hadoop000 ~] $hadoop fs-ls / user/hive/warehouse/order_mulit_partition/event_month=2014-05Found 1 itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 11:17 / user/hive/warehouse/order_mulit_partition/event_month=2014-05/event_day=01

Summary: single-level partitions have a single-level file directory on HDFS; multiple partitions have multi-level file directories on HDFS.

two。 Dynamic partitioning:

Reference: official documentation

Let's first take a look at what the official explanation for us is dynamic partitioning:

Static Partition (SP) columns static Partition

Dynamic Partition (DP) columns dynamic partitioning. DP columns are specified the same way as it is for SP columns-in the partition clause. The only difference is that DP columns do not have values, while SP columns do. In the partition clause, we need to specify all partitioning columns, even if all of them are DP columns.In INSERT... SELECT... Queries, the dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION () clause.

Briefly summarize the differences:

The 1.DP column is specified in the same way as the SP column-in the partition clause (after the Partition keyword), except that the DP column has no value, while the SP column has a value (only key without value after the Partition keyword)

two。 In INSERT... SELECT... In the query, the dynamic partitioning column must be specified last in the column in the SELECT statement and arranged in the order in which it appears in the PARTITION () clause

3. All DP columns-only allowed in non-strict mode. In strict mode, we should throw a mistake

4. If dynamic partitions and static partitions are used together, the fields of dynamic partitions must come first and the fields of static partitions last.

Here are a few examples to demonstrate:

Note: in order to demonstrate the difference between dynamic and static partitions and compare the complexity of static partitions, we first operate on static partitions and then demonstrate dynamic partitions.

# create employee static partition table hive > CREATE TABLE emp_static_partition (> empno int, > ename string, > job string, > mgr int, > hiredate string, > salary double, > comm double >) > PARTITIONED BY (deptno int) > ROW FORMAT DELIMITED FIELDS TERMINATED BY'\ tpartition Oktime taken: 0.198 seconds# insert the data in emp table into static partition hive > select * from emp OK7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 207499 ALLEN SALESMAN 7698 1981-2-20 1600.0 1600.0 307521 WARD SALESMAN 7698 1980-2-22 1250.0 500.0 307566 JONES MANAGER 78391981-4-2 2975.0 NULL 207654 MARTIN SALESMAN 76981981-9-28 1250.0 1400.0 307698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 307782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 107788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 207839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 107844 TURNER SALESMAN 76981981-9-8 1500.0 0.0307876 ADAMS CLERK 77881987-5-23 1100.0 NULL 207900 JAMES CLERK 7698 1981-12-3 950.0 NULL 307902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 207934 MILLER CLERK 77821982-1-23 1300.0 NULL 10Time taken: 0.164 seconds Fetched: 14 row (s) # each partition should write an insert statement hive > insert into table emp_static_partition partition (deptno=10) > select empno,ename, job, mgr, hiredate, salary, comm from emp where deptno=10 Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2dTotal jobs = 3Launching Job 1 out of 3...OKTime taken: 15.265 secondshive > insert into table emp_static_partition partition (deptno=20) > select empno,ename, job, mgr, hiredate, salary, comm from emp where deptno=20 Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2dTotal jobs = 3Launching Job 1 out of 3...OKTime taken: 18.527 secondshive > insert into table emp_static_partition partition (deptno=30) > select empno,ename, job, mgr, hiredate, salary, comm from emp where deptno=30;Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2dTotal jobs = 3Launching Job 1 out of 3...OKTime taken: 14.062 seconds# View each partition hive > select * from emp_static_partition where deptno='10' OK7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 107839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 107934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10Time taken: 0.219 seconds, Fetched: 3 row (s) hive > select * from emp_static_partition where deptno='20' OK7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 207566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 207788 SCOTT ANALYST 7566 4-19 3000.0 NULL 207876 ADAMS CLERK 77881987-5-23 1100.0 NULL 207902 FORD ANALYST 7566 3000.0 NULL 20Time taken: 0.197 seconds, Fetched: 5 row (s) hive > select * from emp_static_partition where deptno='30' OK7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 307521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 307654 MARTIN SALESMAN 7698 9-28 1250.0 1400.0 307698 BLAKE MANAGER 78391981-5-1 2850.0 NULL 307844 TURNER SALESMAN 76981981-9-8 1500.0 0.0307900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30Time taken: 0.181 seconds Fetched: 6 row (s)

Static partition tables have a fatal disadvantage that each partition insert has to write a separate insert statement.

The following is a demonstration using dynamic partitioning

Set up before demonstration: hive defaults to static partitions. If you want to use dynamic partitions, you need to set the following parameters. You can use temporary settings, or you can write them in the configuration file (hive-site.xml) and take effect permanently. The temporary configuration is as follows

Set hive.exec.dynamic.partition=true;-enable dynamic partition defaults to false, but does not enable set hive.exec.dynamic.partition.mode=nonstrict -- specify dynamic partition mode. Default is strict, that is, at least one partition must be specified as static partition. Nonstrict mode allows all partition fields to use dynamic partition # to create employee dynamic partition table. Partition fields are deptnohive > CREATE TABLE emp_dynamic_partition (> empno int, > ename string, > job string, > mgr int, > hiredate string, > salary double, > comm double >) > PARTITIONED BY (deptno int) > ROW FORMAT DELIMITED FIELDS TERMINATED BY'\ t'. OKTime taken: 0.165 seconds# insert one sentence completes hive > insert into table emp_dynamic_partition partition (deptno) > select empno,ename, job, mgr, hiredate, salary, comm, deptno from emp;Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2dTotal jobs = 3Launching Job 1 out of 3...OKTime taken: 17.982 seconds# View each partition hive > show partitions emp_dynamic_partition;OKdeptno=10deptno=20deptno=30Time taken: 0.176 seconds, Fetched: 3 row (s) hive > select * from emp_dynamic_partition where deptno='10' OK7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 107839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 107934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10Time taken: 2.662 seconds, Fetched: 3 row (s) hive > select * from emp_dynamic_partition where deptno='20' OK7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 207566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 207788 SCOTT ANALYST 7566 4-19 3000.0 NULL 207876 ADAMS CLERK 77881987-5-23 1100.0 NULL 207902 FORD ANALYST 7566 3000.0 NULL 20Time taken: 0.178 seconds, Fetched: 5 row (s) hive > select * from emp_dynamic_partition where deptno='30' OK7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 307521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 307654 MARTIN SALESMAN 7698 9-28 1250.0 1400.0 307698 BLAKE MANAGER 78391981-5-1 2850.0 NULL 307844 TURNER SALESMAN 76981981-9-8 1500.0 0.0307900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30Time taken: 0.146 seconds Fetched: 6 row (s) View the file directory structure on HDFS [hadoop@hadoop000 ~] $hadoop fs-ls / user/hive/warehouseFound 6 itemsdrwxr-xr-x-hadoop supergroup 0 2018-06-24 15:38 / user/hive/warehouse/empdrwxr-xr-x-hadoop supergroup 0 2018-07-02 13:55 / user/hive/warehouse/emp_dynamic_partitiondrwxr-xr-x-hadoop supergroup 0 2018-07-02 13:50 / User/hive/warehouse/emp_static_partitiondrwxr-xr-x-hadoop supergroup 0 2018-07-02 11:17 / user/hive/warehouse/order_mulit_partitiondrwxr-xr-x-hadoop supergroup 0 2018-07-02 11:09 / user/hive/warehouse/order_partitiondrwxr-xr-x-hadoop supergroup 0 2018-06-24 15:35 / user/hive/warehouse/stu [hadoop@hadoop000 ~] $hadoop fs-ls / user/hive/ Warehouse/emp_static_partitionFound 3 itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 13:47 / user/hive/warehouse/emp_static_partition/deptno=10drwxr-xr-x-hadoop supergroup 0 2018-07-02 13:50 / user/hive/warehouse/emp_static_partition/deptno=20drwxr-xr-x-hadoop supergroup 0 2018-07-02 13:51 / user/hive/warehouse/emp_static_partition/deptno= 30 [Hadoop @ hadoop000 ~] $hadoop fs-ls / user/hive/warehouse/emp_dynamic_partitionFound 3 itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 13:55 / user/hive/warehouse/emp_dynamic_partition/deptno=10drwxr-xr-x-hadoop supergroup 0 2018-07-02 13:55 / user/hive/warehouse/emp_dynamic_partition/deptno=20drwxr-xr-x-hadoop supergroup 0 2018-07-02 13:55 / user/hive/warehouse/emp_dynamic_partition/deptno=30

Add: the two partitions can also be mixed with the following for a brief understanding:

Mixed SP & DP columns (mixed use of dynamic and static partitions) hive > create table student (> id int, > name string, > tel string, > age int >) > row format delimited fields terminated by'\ tactile OKTime taken: 0.125 secondshive > insert into student values. Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2dTotal jobs = 3Launching Job 1 out of 3...OKTime taken: 15.375 secondshive > select * from student OK1 zhangsan 183111111202 lisi 1822222222222222303 wangwu 15733333333 40Time taken: 0.106 seconds, Fetched: 3 row (s) # create mixed partition table hive > create table stu_mixed_partition (> id int, > name string, > tel string >) > partitioned by (ds string,age int) > row format delimited fields terminated by'\ t' OKTime taken: 0.171 seconds# insert data hive > insert into stu_mixed_partition partition (ds='2010-03-03 recording recording age) > select id,name,tel,age from student;Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2dTotal jobs = 3Launching Job 1 out of 3...OKTime taken: 18.887 seconds# View Partition hive > show partitions stu_mixed_partition OKds=2010-03-03/age=20ds=2010-03-03/age=30ds=2010-03-03/age=40hive > select * from stu_mixed_partition where ds='2010-03-03 'and age=20;OK1 zhangsan 183111111 2010-03-03 20Time taken: 0.184 seconds, Fetched: 1 row (s) hive > select * from stu_mixed_partition where ds='2010-03-03' and age=30 OK2 lisi 18222222222 2010-03-03 30Time taken: 0.188 seconds, Fetched: 1 row (s) hive > select * from stu_mixed_partition where ds='2010-03-03 'and age=40 OK3 wangwu 15733333333 2010-03-03 40Time taken: 0.186 seconds Fetched: 1 row (s) # View the HDFS directory [hadoop@oradb3 ~] $hadoop fs-ls / user/hive/warehouse/stu_mixed_partition/ds=2010-03-03Found 3 itemsdrwxr-xr-x-hadoop supergroup 0 2018-07-02 14:10 / user/hive/warehouse/stu_mixed_partition/ds=2010-03-03/age=20drwxr-xr-x-hadoop supergroup 0 2018-07-02 14:10 / user/hive/warehouse/stu_mixed_partition/ Ds=2010-03-03/age=30drwxr-xr-x-hadoop supergroup 0 2018-07-02 14:10 / user/hive/warehouse/stu_mixed_partition/ds=2010-03-03/age=40

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report