Updated 2025-04-10. From: SLTechnology News&Howtos (shulou), Development
Shulou(Shulou.com)06/03 Report--
This article explains how to create Hive partitions. The method is simple, fast, and practical, and the examples below walk through both static and dynamic partitioning.
The concept of Hive partitioning is different from traditional relational database partitioning.
Traditional database partitioning: in Oracle, for example, each partition is stored in its own segment, which holds the actual data, and rows are assigned to partitions automatically when they are inserted.
Hive partitioning: because a Hive table is an abstraction over files stored on HDFS, a partition corresponds to a directory and a sub-partition corresponds to a subdirectory; the partition value is not a field stored in the data files.
In other words, specifying a partition when inserting data either creates a new directory or subdirectory, or adds data files to an existing one.
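The mapping from a partition specification to a directory can be sketched in a few lines of Python. This is only an illustrative model, not Hive's implementation: the function name partition_path is hypothetical, the warehouse root is taken from the default path mentioned in this article, and the escaping Hive applies to special characters in partition values is ignored.

```python
def partition_path(warehouse, table, spec):
    """Return the directory for a partition.

    spec is a list of (column, value) pairs in the order the partition
    columns were declared; each pair becomes one 'column=value' directory level.
    """
    levels = [f"{column}={value}" for column, value in spec]
    return "/".join([warehouse.rstrip("/"), table] + levels)


# One partition column gives one directory level.
print(partition_path("/user/hive/warehouse", "par_tab", [("sex", "man")]))
# /user/hive/warehouse/par_tab/sex=man

# Two partition columns give a directory and a subdirectory.
print(partition_path("/user/hive/warehouse", "par_tab_muilt",
                     [("sex", "man"), ("dt", "2017-03-29")]))
# /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29
```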
Creation of Hive Partition
Partitions are declared with the PARTITIONED BY clause when the table is created. Note that the columns declared in PARTITIONED BY are real columns of the table, but their values are not stored in the data files, because they are encoded in the directory names.
Static partition
Create a statically partitioned table, par_tab, with a single partition column:
hive> create table par_tab (name string, nation string) partitioned by (sex string) row format delimited fields terminated by ',';
At this point, the structure of the table viewed through desc is as follows
hive> desc par_tab;
OK
name                    string
nation                  string
sex                     string

# Partition Information
# col_name              data_type               comment

sex                     string
Time taken: 0.038 seconds, Fetched: 8 row(s)
Prepare the local data file par_tab.txt. Each line has the form "name,nation"; gender (sex) will be supplied as the partition value and is not part of the file:
jan,china
mary,america
lilei,china
heyong,china
yiku,japan
emoji,japan
Load the data into the table (a load operation essentially just places the file in the table's partition directory on HDFS):
hive> load data local inpath '/home/hadoop/files/par_tab.txt' into table par_tab partition (sex='man');
Querying par_tab in Hive now returns three columns; note that the partition value has been appended to every row:
hive> select * from par_tab;
OK
jan       china     man
mary      america   man
lilei     china     man
heyong    china     man
yiku      japan     man
emoji     japan     man
Time taken: 0.076 seconds, Fetched: 6 row(s)
View par_tab directory structure
[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man/par_tab.txt
As you can see, for a partitioned table Hive creates a directory named after the table under the default warehouse path /user/hive/warehouse/, creates a subdirectory sex=man (the partition name) inside it, and stores the actual data file inside that partition directory.
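To see why the query returns three columns although the file holds only two, here is a minimal Python sketch that rebuilds rows from a data file path and its lines. It is a simplified model under stated assumptions, not how Hive reads data internally; partition_values and read_rows are hypothetical helper names.

```python
def partition_values(path):
    """Extract (column, value) pairs from the 'column=value' segments of a path."""
    return [tuple(segment.split("=", 1))
            for segment in path.split("/") if "=" in segment]


def read_rows(file_path, lines, delimiter=","):
    """Split each data line and append the partition values taken from the path."""
    extra = [value for _, value in partition_values(file_path)]
    return [line.split(delimiter) + extra for line in lines if line]


rows = read_rows(
    "/user/hive/warehouse/par_tab/sex=man/par_tab.txt",
    ["jan,china", "mary,america"],
)
for row in rows:
    print(row)
# ['jan', 'china', 'man']
# ['mary', 'america', 'man']
```

The third field never appears in par_tab.txt; it comes entirely from the sex=man directory name.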
Now load a second data file, par_tab_wm.txt, with the following content:
lily,china
nancy,china
hanmeimei,america
Load the data:
hive> load data local inpath '/home/hadoop/files/par_tab_wm.txt' into table par_tab partition (sex='woman');
View the par_tab table directory structure
[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man/par_tab.txt
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:35 /user/hive/warehouse/par_tab/sex=woman
-rwxr-xr-x   1 hadoop supergroup         41 2017-03-29 08:35 /user/hive/warehouse/par_tab/sex=woman/par_tab_wm.txt
Finally, check the result of the two loads; it contains both the man and woman partitions:
hive> select * from par_tab;
OK
jan        china     man
mary       america   man
lilei      china     man
heyong     china     man
yiku       japan     man
emoji      japan     man
lily       china     woman
nancy      china     woman
hanmeimei  america   woman
Time taken: 0.136 seconds, Fetched: 9 row(s)
Because the partition column is a regular column of the table definition, it can be used in a WHERE clause to query a single partition:
hive> select * from par_tab where sex='woman';
OK
lily       china     woman
nancy      china     woman
hanmeimei  america   woman
Time taken: 0.515 seconds, Fetched: 3 row(s)
Create a statically partitioned table, par_tab_muilt, with multiple partition columns (gender + date):
hive> create table par_tab_muilt (name string, nation string) partitioned by (sex string, dt string) row format delimited fields terminated by ',';
hive> load data local inpath '/home/hadoop/files/par_tab.txt' into table par_tab_muilt partition (sex='man', dt='2017-03-29');
[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab_muilt
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29/par_tab.txt
The order in which the partition columns are declared at table creation determines the directory hierarchy (which column becomes the parent directory and which the subdirectory). Because of this hierarchy, a query for all rows with sex='man' reads the data under every date directory below sex=man. If a query filters only on the date, and both sex=man and sex=woman contain that date, Hive prunes the input paths to the matching date directories under each gender directory; no filter is applied on gender, so the result includes all genders.
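The pruning behaviour just described can be modelled with a short Python sketch. It is a toy model of path selection only, not Hive's planner; the function prune is hypothetical, and every partition in the sample list other than sex=man/dt=2017-03-29 is invented for illustration.

```python
def prune(partitions, **filters):
    """Keep the partition directories whose column=value levels match every filter."""
    selected = []
    for path in partitions:
        spec = dict(level.split("=", 1) for level in path.split("/") if "=" in level)
        if all(spec.get(column) == value for column, value in filters.items()):
            selected.append(path)
    return selected


# Hypothetical partition list for par_tab_muilt.
partitions = [
    "sex=man/dt=2017-03-29",
    "sex=man/dt=2017-03-30",
    "sex=woman/dt=2017-03-29",
]

print(prune(partitions, sex="man"))
# ['sex=man/dt=2017-03-29', 'sex=man/dt=2017-03-30']

print(prune(partitions, dt="2017-03-29"))
# ['sex=man/dt=2017-03-29', 'sex=woman/dt=2017-03-29']
```

Filtering on sex alone selects every date under that gender, while filtering on dt alone selects that date under every gender.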
Dynamic partition
With static partitioning you must know the partition value in advance for every insert, and writing a separate load data statement for each partition is tedious. Dynamic partitioning solves this: rows are assigned to partitions according to the values returned by the query. The difference from static partitioning is that you do not specify the partition values yourself; Hive derives them from the data.
First, enable dynamic partitioning:
hive> set hive.exec.dynamic.partition=true;
Suppose the table par_tab has four columns: the first two are name and nation, and the last two, sex (gender) and dt (date), are partition columns. The data is as follows:
hive> select * from par_tab;
OK
lily       china     man   2013-03-28
nancy      china     man   2013-03-28
hanmeimei  america   man   2013-03-28
jan        china     man   2013-03-29
mary       america   man   2013-03-29
lilei      china     man   2013-03-29
heyong     china     man   2013-03-29
yiku       japan     man   2013-03-29
emoji      japan     man   2013-03-29
Time taken: 1.141 seconds, Fetched: 9 row(s)
Now insert the contents of this table into another table, par_dnm, with sex as a static partition and dt as a dynamic partition (the date is not specified; Hive determines it from the data):
hive> insert overwrite table par_dnm partition (sex='man', dt)
    > select name, nation, dt from par_tab;
Take a look at the directory structure after insertion
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-28
-rwxr-xr-x   1 hadoop supergroup         41 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-28/000000_0
drwxr-xr-x   - hadoop supergroup          0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-29
-rwxr-xr-x   1 hadoop supergroup         71 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-29
Then list the partitions:
hive> show partitions par_dnm;
OK
sex=man/dt=2013-03-28
sex=man/dt=2013-03-29
Time taken: 0.065 seconds, Fetched: 2 row(s)
This confirms that the dynamic partitioning worked.
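The routing performed by the insert can be approximated in Python as follows. This is a sketch under stated assumptions, not Hive's execution logic: dynamic_insert is a hypothetical name, and the model relies on the fact that Hive takes dynamic partition values from the trailing columns of the SELECT, by position.

```python
from collections import defaultdict


def dynamic_insert(rows, static_spec, dynamic_cols):
    """Group rows by target partition.

    static_spec holds (column, value) pairs fixed in the statement;
    dynamic_cols names the partition columns whose values are read from
    the last len(dynamic_cols) fields of each row.
    """
    if not dynamic_cols:
        raise ValueError("at least one dynamic partition column is required")
    count = len(dynamic_cols)
    targets = defaultdict(list)
    for row in rows:
        data, dynamic_values = row[:-count], row[-count:]
        levels = [f"{c}={v}" for c, v in static_spec]
        levels += [f"{c}={v}" for c, v in zip(dynamic_cols, dynamic_values)]
        targets["/".join(levels)].append(data)
    return dict(targets)


# Rows shaped like "select name, nation, dt from par_tab" (a subset of the data).
rows = [
    ("lily", "china", "2013-03-28"),
    ("jan", "china", "2013-03-29"),
    ("mary", "america", "2013-03-29"),
]
result = dynamic_insert(rows, [("sex", "man")], ["dt"])
print(sorted(result))
# ['sex=man/dt=2013-03-28', 'sex=man/dt=2013-03-29']
```

One statement produces one partition per distinct dt value, which matches the two partitions listed by show partitions.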
Note that a dynamic partition column cannot be the parent of a static partition column: static partition columns must come before dynamic ones in the partition specification. Otherwise every parent partition would have to create the sub-partition named by the static value.
Dynamic partitioning does allow all partition columns to be dynamic, but this requires changing the parameter hive.exec.dynamic.partition.mode first. Check its current value:
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict
The default value shown here is strict, which means that not all partition columns may be dynamic: at least one must be static. This guards against a user who intends to make only the sub-partition dynamic but forgets to specify a value for the parent partition column, which would cause a single DML statement to create a large number of new partitions (and therefore directories) in a short time and hurt system performance.
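The two rules (static columns before dynamic ones, and at least one static column in strict mode) can be expressed as a small validation sketch in Python. It only mirrors the rules as described in this article; check_partition_spec is a hypothetical function, and the error messages are not Hive's.

```python
def check_partition_spec(spec, mode="strict"):
    """Validate a partition specification.

    spec is a list of (column, value) pairs in declared order, where a value
    of None marks a dynamic partition column.
    """
    seen_dynamic = False
    for column, value in spec:
        if value is None:
            seen_dynamic = True
        elif seen_dynamic:
            raise ValueError(f"static column {column!r} follows a dynamic column")
    if mode == "strict" and spec and all(value is None for _, value in spec):
        raise ValueError("strict mode requires at least one static partition column")
    return True


# partition (sex='man', dt): accepted in strict mode.
check_partition_spec([("sex", "man"), ("dt", None)])

# partition (sex, dt): accepted only in nonstrict mode.
check_partition_spec([("sex", None), ("dt", None)], mode="nonstrict")
```

Calling check_partition_spec([("sex", None), ("dt", None)]) with the default mode raises ValueError, as does a specification that places a static column after a dynamic one.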
To allow all partition columns to be dynamic, set the mode to nonstrict:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
That covers how to create Hive partitions; trying the examples yourself is the best way to reinforce them.