This article explains how to perform partition operations on Hive partition tables. The methods introduced here are simple, fast, and practical, so interested readers may wish to follow along.
Hive does not support row-level inserts, updates, or deletions. Data is loaded into a table in bulk, either with load data or with insert ... select.
Dynamic partitions and static partitions
Hive provides a dynamic partition feature, which infers the partition value from the position of the corresponding column in the query result and creates the partition automatically.
Note: when data is imported into a table with insert ... select, the number of columns in the query must exactly match the number of target columns, neither more nor fewer, otherwise an error is reported. If a column type does not match, however, the value is filled with NULL and no error is raised. When data is loaded with load data, no such check is performed: extra fields are discarded, missing fields are filled with NULL, and, likewise, fields whose types do not match are filled with NULL.
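As a minimal sketch of the insert ... select rule above (the table and column names here are hypothetical, not from the original article):
-- target table t has two regular columns (a, b) plus a partition column p,
-- so the select list must supply exactly two non-partition columns
insert into table t partition (p='x') select col1, col2 from src; -- OK: column counts match
-- insert into table t partition (p='x') select col1 from src;    -- error: too few columns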
When a table has multiple partition columns, semi-automatic partitioning can be used: some columns are static partitions and the rest are dynamic. Note that the static partition columns must come before the dynamic ones (an example appears in section 5 below).
To manage tables properly and improve query efficiency, Hive can organize a table into partitions. A partition is simply a directory under the table's directory; a table can be partitioned along multiple dimensions, and the partitions form a directory tree.
1. Create a partition table
Partitions are specified with the PARTITIONED BY clause; the order of the partition columns determines which level becomes the parent directory and which becomes the subdirectory.
Create a partition table with one partition column:
CREATE TABLE IF NOT EXISTS part_test (c1 string, c2 string, c3 string, c4 string) PARTITIONED BY (day_id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
Create a partition table with two partition columns:
CREATE TABLE IF NOT EXISTS part_test_1 (c1 string, c2 string, c3 string, c4 string) PARTITIONED BY (month_id string, day_id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
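To see what the PARTITIONED BY order means on disk, assuming the default Hive warehouse location (an assumption, not stated in the article), a partition of part_test_1 maps to nested directories such as:
/user/hive/warehouse/part_test_1/month_id=201805/day_id=20180509
month_id, listed first, becomes the parent directory, and day_id becomes its subdirectory.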
2. External partition table
An external table can also be built as a partition table. Suppose that under the HDFS directory /user/tuoming/part there are two directories, 201805 and 201806; 201805 contains a subdirectory 20180509, and 201806 contains the subdirectories 20180609 and 20180610.
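Listed recursively, that layout corresponds to the following directories (a sketch based on the paths just described):
/user/tuoming/part/201805/20180509
/user/tuoming/part/201806/20180609
/user/tuoming/part/201806/20180610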
Create an external partition table that maps to the /user/tuoming/part directory:
CREATE EXTERNAL TABLE IF NOT EXISTS part_test_2 (c1 string, c2 string, c3 string, c4 string) PARTITIONED BY (month_id string, day_id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/tuoming/part';
Add partitions to part_test_2:
alter table part_test_2 add partition (month_id='201805', day_id='20180509') location '/user/tuoming/part/201805/20180509';
alter table part_test_2 add partition (month_id='201806', day_id='20180609') location '/user/tuoming/part/201806/20180609';
alter table part_test_2 add partition (month_id='201806', day_id='20180610') location '/user/tuoming/part/201806/20180610';
Use the show partitions statement to see which partitions part_test_2 has:
show partitions part_test_2;
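The output should look roughly like the following, since Hive names partition directories using the column=value convention:
month_id=201805/day_id=20180509
month_id=201806/day_id=20180609
month_id=201806/day_id=20180610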
3. Internal partition table
Create an internal partition table with primary partition month_id and subpartition day_id:
CREATE TABLE IF NOT EXISTS part_test_3 (c1 string, c2 string, c3 string, c4 string) PARTITIONED BY (month_id string, day_id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
Load data for internal partition tables
(1) Use a load data inpath ... overwrite into table ... partition statement to load data from an HDFS directory:
load data inpath '/user/tuoming/test/test' overwrite into table part_test_3 partition (month_id='201805', day_id='20180509');
(2) Use an insert overwrite table ... partition or insert into ... partition statement to load data from a query result:
Overwrite insert:
insert overwrite table part_test_3 partition (month_id='201805', day_id='20180509') select * from part_test_temp;
Append insert:
insert into part_test_3 partition (month_id='201805', day_id='20180509') select * from part_test_temp;
Note: when data is loaded into an internal partition table with either of the two methods above, the partitions do not need to be created in advance; the corresponding partitions are created automatically when the data is loaded. If you do want to create a partition of an internal table in advance, use the hadoop fs -mkdir command to create the corresponding partition directory under the table directory, and then use an alter table ... add partition statement to add the partition, as sketched below:
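A minimal sketch of pre-creating a partition this way; the warehouse path and the new partition values are assumptions for illustration, not taken from the original article:
# create the partition directory under the table directory (assumes the default warehouse location)
hadoop fs -mkdir -p /user/hive/warehouse/part_test_3/month_id=201806/day_id=20180601
# register the directory as a partition of the table
alter table part_test_3 add partition (month_id='201806', day_id='20180601');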
4. Delete the partition
Use an alter table ... drop partition statement to delete the corresponding partition:
alter table part_test_3 drop partition (day_id='20180509');
Note: for an external partition table, alter table ... drop partition deletes only the metadata; the corresponding directories and data files are not removed. For an internal table, the same statement deletes both the metadata and the corresponding directory and data files.
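So, if the underlying data of an external table partition should also be removed, the directory has to be deleted separately. A sketch using the external table and paths from above:
alter table part_test_2 drop partition (month_id='201805', day_id='20180509');
# the drop only removed metadata, so delete the directory by hand
hadoop fs -rm -r /user/tuoming/part/201805/20180509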
5. Dynamic Partition
The insert overwrite table ... partition ... statements above load data into a partition from a query result, but a specific partition must be specified, and each partition requires its own insert statement. When data for multiple partitions needs to be inserted at once, dynamic partitions can be used to route rows to partitions according to the values returned by the query. The difference between a dynamic partition and a static partition is that no fixed partition value is specified; Hive chooses the partition to insert into based on the actual data.
# enable dynamic partitioning
set hive.exec.dynamic.partition=true;
# allow all partition columns to be dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
# month_id is a static partition and day_id is a dynamic partition:
insert overwrite table dynamic_test partition (month_id='201710', day_id) select c1, c2, c3, c4, c5, c6, c7, day_id from kafka_offset where substr(day_id,1,6) = '201710';
# month_id and day_id are both dynamic partitions:
insert overwrite table dynamic_test partition (month_id, day_id) select c1, c2, c3, c4, c5, c6, c7, substr(day_id,1,6) as month_id, day_id from kafka_offset;
To place rows with the same partition-column values in the same reduce task as far as possible, so that each reduce task creates as few new partition directories and files as possible, you can use distribute by to group rows with the same partition-column values together:
insert overwrite table dynamic_test partition (month_id, day_id) select c1, c2, c3, c4, c5, c6, c7, substr(day_id,1,6) as month_id, day_id from kafka_offset distribute by month_id, day_id;
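When a single statement writes many partitions, Hive also enforces limits on the number of dynamic partitions. The settings below are not part of the original article, but they may need to be raised in that case (the values shown are Hive's usual defaults):
set hive.exec.max.dynamic.partitions=1000;        -- total dynamic partitions allowed per statement
set hive.exec.max.dynamic.partitions.pernode=100; -- dynamic partitions allowed per mapper/reducer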
Hive offers several different ways to export data. Depending on where the data is exported to, they can be divided into three types:
Export to local file system
Export to HDFS
Export to another table in Hive.
Rather than describing them in abstract terms, I will explain each one step by step with actual commands.
Contents
1 Export to the local file system
2 Export to HDFS
3 Export to another table in Hive
1 Export to the local file system
hive> insert overwrite local directory '/home/wyp/wyp' select * from wyp;
Executing this HQL requires running a MapReduce job. After the statement completes, a file is generated in the /home/wyp/wyp directory of the local file system; this file is the output of the reduce stage (its name here is 000000_0). Let's take a look at its contents:
[wyp@master ~/wyp]$ vim 000000_0
5^Awyp1^A23^A131212121212
6^Awyp2^A24^A134535353535
7^Awyp3^A25^A132453535353
8^Awyp4^A26^A154243434355
1^Awyp^A25^A13188888888888
2^Atest^A30^A13888888888888
3^Azs^A34^A899314121
As you can see, this is all the data in the wyp table, and the delimiter between columns is ^A (ASCII code \00001). Note that, unlike when importing data into Hive, you cannot use insert into to export data.
2 Export to HDFS
Exporting to HDFS is as simple as exporting to the local file system; it can be done with the following statement:
hive> insert overwrite directory '/home/wyp/hdfs' select * from wyp;
The exported data will be saved in the /home/wyp/hdfs directory on HDFS. Note that, compared with the HQL that exports to the local file system, this statement has one fewer keyword: local. The path where the data ends up is also different (HDFS instead of the local file system).
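To check the result, the exported files can be listed and read directly from HDFS; the file name 000000_0 below is an assumption, based on the naming seen in the local export above:
hadoop fs -ls /home/wyp/hdfs
hadoop fs -cat /home/wyp/hdfs/000000_0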
3 Export to another table in Hive
In fact, this approach was already used in the article "Several data import methods of Hive"; it is also one of Hive's data import methods. It works as follows:
hive> insert into table test
    > partition (age='25')
    > select id, name, tel
    > from wyp;
# A bunch of MapReduce task information is output here and omitted #
Total MapReduce CPU Time Spent: 1 seconds 310 msec
OK
Time taken: 19.125 seconds
hive> select * from test;
OK
5       wyp1    131212121212    25
6       wyp2    134535353535    25
7       wyp3    132453535353    25
8       wyp4    154243434355    25
1       wyp     13188888888888  25
2       test    13888888888888  25
3       zs      899314121       25
Time taken: 0.126 seconds, Fetched: 7 row(s)
Careful readers may ask: when the data was exported to a file, why were the columns not separated by the column delimiter configured for the wyp table? In fact, before Hive 0.11.0 you could not specify the delimiter between columns when exporting data; only the default delimiter was used, namely the ^A shown above, so the exported data was not very intuitive and was inconvenient to read. If you are using Hive 0.11.0 or later, you can specify the delimiter between columns when exporting data (see this blog's "Hive 0.11 query results saved to a file with a specified column delimiter"), as follows:
hive> insert overwrite local directory '/home/iteblog/local'
    > row format delimited
    > fields terminated by '\t'
    > select * from wyp;
[wyp@master ~/local]$ vim 000000_0
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
This is good, isn't it?
In fact, we can also use hive's -e and -f parameters to export data. With -e, the SQL statement follows directly in double quotes; with -f, a file name follows, and the content of that file is a SQL statement. For example:
[wyp@master ~/local]$ hive -e "select * from wyp" >> local/wyp.txt
[wyp@master ~/local]$ cat wyp.txt
5       wyp1    23      131212121212
6       wyp2    24      134535353535
7       wyp3    25      132453535353
8       wyp4    26      154243434355
1       wyp     25      13188888888888
2       test    30      13888888888888
3       zs      34      899314121
[wyp@master ~/local]$ cat wyp.sql
select * from wyp
[wyp@master ~/local]$ hive -f wyp.sql >> local/wyp2.txt
The results of both the -e and -f approaches are likewise separated by \t.
At this point, you should have a deeper understanding of how to perform partition operations on Hive partition tables; the best way to consolidate it is to try the commands out in practice.