2025-04-05 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces Hive partitioning, bucketing, and custom functions. It walks through loading data, static and dynamic partitions, bucketed tables and sampling, and writing and registering user-defined functions (UDFs).
Import data:
1. Import local data into Hive:
load data local inpath '/root/tes.txt' into table test.usr;
2. Import data from the HDFS cluster:
load data inpath 'hdfs://node01:9000/user/tes.txt' into table test.te;
The LOAD DATA command comes in two forms: LOAD DATA LOCAL INPATH and LOAD DATA INPATH. The difference is that LOCAL copies a local file into the table, while the form without LOCAL imports an HDFS file, which amounts to moving the file into the table's directory.
3. INSERT INTO: works for internal and external tables; on its own it is not suited to partitioned tables.
4. Insert from an existing table:
from table1 insert into (or insert overwrite) table table2 select id, name;
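The FROM-first form can also feed several tables from one scan of the source. A minimal sketch, assuming hypothetical tables src, dst1, and dst2 with compatible columns:

```sql
-- One scan of src, two writes (hypothetical table names)
FROM src
INSERT OVERWRITE TABLE dst1 SELECT id, name
INSERT INTO TABLE dst2 SELECT id, name WHERE id > 100;
```

INSERT OVERWRITE replaces the destination's existing contents, while INSERT INTO appends to them.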
Partition table
Hive partitions (data is split into different file directories for storage)
1. Static partition:
The partition field must be specified when the table is defined, and it must not duplicate any column already in the table.
A. Single partition table creation statement:
create table day_table (id int, content string) partitioned by (dt int);
Upload data: load data local inpath '/root/tes.txt' into table day_table partition (dt=10);
This single-partition table is partitioned by day; the table structure exposes three columns: id, content, and dt.
The dt value is used as a folder name to separate the partitions.
To keep each partition small and improve query efficiency, the partition granularity (year, month, day, hour, minute, second) should be chosen in advance according to business requirements.
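As a sketch of that advice, a finer-grained static layout might declare the date parts as separate partition columns (the table name and columns below are hypothetical):

```sql
-- Hypothetical table partitioned down to the day
CREATE TABLE logs (id INT, msg STRING)
PARTITIONED BY (year INT, month INT, day INT);
-- Each load fills one directory, e.g. .../year=2020/month=6/day=2
LOAD DATA LOCAL INPATH '/root/logs.txt'
INTO TABLE logs PARTITION (year=2020, month=6, day=2);
```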
B. Double partition table building statement:
Create table hour (id int, content string) partitioned by (dt int, hour int)
Double partitioned tables, partitioned by day and hour, add dt and hour columns to the table structure.
dt is used as the top-level folder, with an hour subfolder beneath it.
Adding a partition
alter table hour add partition (dt=10, hour=40);
alter table tablename add partition (dt=20, hour=40);
That is, on a multi-level partitioned table you cannot add a lower-level partition on its own; you must include the full set of partition keys, in their defined order.
Deleting a partition
alter table tablename drop partition (sex='boy');
alter table tablename drop partition (dt=20, hour=40);
Note: dropping by only one partition key (e.g. dt=20) removes every partition matching that key, including all of its sub-partitions.
2. Dynamic partitioning:
Ways to change the configuration:
1. In conf/hive-site.xml
2. Use set inside the hive CLI to change the setting for the current session.
3. Pass the setting when hive starts: hive --hiveconf hive.exec.dynamic.partition=true
Settings to enable dynamic partitioning:
1. Enable dynamic partitioning
set hive.exec.dynamic.partition=true; // enable dynamic partitioning
2. Change the default mode
set hive.exec.dynamic.partition.mode=nonstrict; // default is strict, which requires at least one static partition
Create a partition table:
create table psn22 (id int, name string, likes array<string>, address map<string,string>) partitioned by (age int, sex string) row format delimited fields terminated by ',' collection items terminated by ',' map keys terminated by ':' lines terminated by '\n';
Write data
from psn21 -- an existing table with data
insert overwrite table psn22 partition (age, sex) select * distribute by age, sex;
Bucket table:
test data
1,tom,11
Enable bucketing
Set hive.enforce.bucketing=true
Create a bucketed table
create table psnbucket1 (id int, name string, age int) clustered by (age) into 4 buckets row format delimited fields terminated by ',';
Load data
insert into table psnbucket1 select id, name, age from psn31;
Sampling
select * from psnbucket1 tablesample (bucket 1 out of 4 on age);
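With BUCKET x OUT OF y, Hive returns roughly one y-th of the table, starting at bucket x and stepping by y buckets. A sketch against the 4-bucket table above:

```sql
-- 1/4 of the data: bucket 1 only
SELECT * FROM psnbucket1 TABLESAMPLE (BUCKET 1 OUT OF 4 ON age);
-- 1/2 of the data: buckets 1 and 3 (step = 4 / 2)
SELECT * FROM psnbucket1 TABLESAMPLE (BUCKET 1 OUT OF 2 ON age);
```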
Custom function
UDF: one-to-one (one row in, one value out)
1. Extend the UDF class
2. Implement the evaluate method
(evaluate can be overloaded for different parameter types)
import org.apache.hadoop.hive.ql.exec.UDF;

public class AddPrefix extends UDF {
    /* add a custom prefix to any input */
    public String evaluate(String str) {
        return "HIVE UDF Prefix:" + str;
    }
}
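The evaluate logic can be exercised as plain Java without a Hive cluster; the class below is a hypothetical standalone sketch mirroring the example's prefixing behavior (it does not extend Hive's UDF class, so it needs no hive-exec dependency):

```java
// Standalone sketch of the AddPrefix evaluate logic (no Hive dependency)
public class AddPrefixSketch {
    // Mirrors AddPrefix.evaluate: prepend a fixed prefix to the input
    public static String evaluate(String str) {
        return "HIVE UDF Prefix:" + str;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("hello")); // prints "HIVE UDF Prefix:hello"
    }
}
```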
UDAF: many-to-one (aggregation, e.g. sum, count)
UDTF: one-to-many (one row in, multiple rows out, e.g. explode)
1. Create a udf custom function
2. Package it as a jar and upload it to the Linux cluster
3. Register the jar in hive: add jar /opt/software/jars/UDF.jar;
4. Create my own function
create temporary function bws as 'com.hpe.TestUdf.TestUdf';
5. Call the function in a query
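Usage is then the same as any built-in function; a sketch, assuming a hypothetical table tb1 with a string column line:

```sql
-- The custom function applies per row, like a built-in
SELECT bws(line) FROM tb1;
```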
Functions created this way are only temporary, so how do you create a permanent function?
Register a permanent hive custom function
add jar only registers the jar for the current session, so it cannot back a permanent custom function. To add a permanent custom function, the jar must be referenced from the configuration file.
Add to the hive-site.xml file:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/module/hive/lib/app_logs_hive.jar</value>
</property>
Note: the value is the path where your udf jar was uploaded on Linux; if there are multiple jar packages, separate them with commas.
This concludes the introduction to Hive partitions, buckets, and custom functions.
© 2024 shulou.com SLNews company. All rights reserved.