2025-04-05 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces Hive partitioning, bucketing, and custom functions. It walks through loading data, static and dynamic partitions, bucketed tables and sampling, and writing and registering user-defined functions (UDFs).
Import data:
1. Import local data into Hive:
load data local inpath '/root/tes.txt' into table test.usr;
2. Import data from the HDFS cluster:
load data inpath 'hdfs://node01:9000/user/tes.txt' into table test.te;
The LOAD DATA command comes in two forms: LOAD DATA LOCAL INPATH and LOAD DATA INPATH. The difference is that LOCAL copies a local file into the table, while the form without LOCAL imports an HDFS file, which amounts to moving the file into the table's directory.
3. INSERT INTO: works for internal and external tables; on its own it is not suited to partitioned tables.
4. Insert from an existing table:
from table1 insert into (or insert overwrite) table table2 select id, name;
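The FROM-first form can also feed several tables from one scan of the source. A minimal sketch, assuming hypothetical tables src, dst1, and dst2 with compatible columns:

```sql
-- One scan of src, two writes (hypothetical table names)
FROM src
INSERT OVERWRITE TABLE dst1 SELECT id, name
INSERT INTO TABLE dst2 SELECT id, name WHERE id > 100;
```

INSERT OVERWRITE replaces the destination's existing contents, while INSERT INTO appends to them.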
Partition table
Hive partitions (data is split into different file directories for storage)
1. Static partition:
The partition field must be specified when the table is defined, and it must not duplicate any column already in the table.
A. Single partition table creation statement:
create table day_table (id int, content string) partitioned by (dt int);
Upload data: load data local inpath '/root/tes.txt' into table day_table partition (dt=10);
This single-partition table is partitioned by day; the table structure exposes three columns: id, content, and dt.
The dt value is used as a folder name to separate the partitions.
To keep each partition small and improve query efficiency, the partition granularity (year, month, day, hour, minute, second) should be chosen in advance according to business requirements.
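As a sketch of that advice, a finer-grained static layout might declare the date parts as separate partition columns (the table name and columns below are hypothetical):

```sql
-- Hypothetical table partitioned down to the day
CREATE TABLE logs (id INT, msg STRING)
PARTITIONED BY (year INT, month INT, day INT);
-- Each load fills one directory, e.g. .../year=2020/month=6/day=2
LOAD DATA LOCAL INPATH '/root/logs.txt'
INTO TABLE logs PARTITION (year=2020, month=6, day=2);
```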
B. Double partition table building statement:
Create table hour (id int, content string) partitioned by (dt int, hour int)
Double partitioned tables, partitioned by day and hour, add dt and hour columns to the table structure.
dt is used as the top-level folder, with an hour subfolder beneath it.
Adding a partition
alter table hour add partition (dt=10, hour=40);
alter table tablename add partition (dt=20, hour=40);
That is, on a multi-level partitioned table you cannot add a lower-level partition on its own; you must include the full set of partition keys, in their defined order.
Deleting a partition
alter table tablename drop partition (sex='boy');
alter table tablename drop partition (dt=20, hour=40);
Note: dropping by only one partition key (e.g. dt=20) removes every partition matching that key, including all of its sub-partitions.
2. Dynamic partitioning:
Ways to change the configuration:
1. In conf/hive-site.xml
2. Use set inside the hive CLI to change the setting for the current session.
3. Pass the setting when hive starts: hive --hiveconf hive.exec.dynamic.partition=true
Settings to enable dynamic partitioning:
1. Enable dynamic partitioning
set hive.exec.dynamic.partition=true; // enable dynamic partitioning
2. Change the default mode
set hive.exec.dynamic.partition.mode=nonstrict; // default is strict, which requires at least one static partition
Create a partition table:
create table psn22 (id int, name string, likes array<string>, address map<string,string>) partitioned by (age int, sex string) row format delimited fields terminated by ',' collection items terminated by ',' map keys terminated by ':' lines terminated by '\n';
Write data
from psn21 -- an existing table with data
insert overwrite table psn22 partition (age, sex) select * distribute by age, sex;
Bucket table:
test data
1,tom,11
Enable bucketing
Set hive.enforce.bucketing=true
Create a bucketed table
create table psnbucket1 (id int, name string, age int) clustered by (age) into 4 buckets row format delimited fields terminated by ',';
Load data
insert into table psnbucket1 select id, name, age from psn31;
Sampling
select * from psnbucket1 tablesample (bucket 1 out of 4 on age);
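With BUCKET x OUT OF y, Hive returns roughly one y-th of the table, starting at bucket x and stepping by y buckets. A sketch against the 4-bucket table above:

```sql
-- 1/4 of the data: bucket 1 only
SELECT * FROM psnbucket1 TABLESAMPLE (BUCKET 1 OUT OF 4 ON age);
-- 1/2 of the data: buckets 1 and 3 (step = 4 / 2)
SELECT * FROM psnbucket1 TABLESAMPLE (BUCKET 1 OUT OF 2 ON age);
```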
Custom function
UDF: one-to-one (one row in, one value out)
1. Extend the UDF class
2. Implement the evaluate method
(evaluate can be overloaded for different parameter types)
import org.apache.hadoop.hive.ql.exec.UDF;

public class AddPrefix extends UDF {
    /* add a custom prefix to any input */
    public String evaluate(String str) {
        return "HIVE UDF Prefix:" + str;
    }
}
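The evaluate logic can be exercised as plain Java without a Hive cluster; the class below is a hypothetical standalone sketch mirroring the example's prefixing behavior (it does not extend Hive's UDF class, so it needs no hive-exec dependency):

```java
// Standalone sketch of the AddPrefix evaluate logic (no Hive dependency)
public class AddPrefixSketch {
    // Mirrors AddPrefix.evaluate: prepend a fixed prefix to the input
    public static String evaluate(String str) {
        return "HIVE UDF Prefix:" + str;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("hello")); // prints "HIVE UDF Prefix:hello"
    }
}
```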
UDAF: many-to-one (aggregation, e.g. sum, count)
UDTF: one-to-many (one row in, multiple rows out, e.g. explode)
1. Create a udf custom function
2. Package it as a jar and upload it to the Linux cluster
3. Register the jar in hive: add jar /opt/software/jars/UDF.jar;
4. Create my own function
create temporary function bws as 'com.hpe.TestUdf.TestUdf';
5. Call the function in a query
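Usage is then the same as any built-in function; a sketch, assuming a hypothetical table tb1 with a string column line:

```sql
-- The custom function applies per row, like a built-in
SELECT bws(line) FROM tb1;
```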
Functions created this way are only temporary, so how do you create a permanent function?
Register a permanent hive custom function
add jar only registers the jar for the current session, so it cannot back a permanent custom function. To add a permanent custom function, the jar must be referenced from the configuration file.
Add to the hive-site.xml file:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/module/hive/lib/app_logs_hive.jar</value>
</property>
Note: the value is the path where your udf jar was uploaded on Linux; if there are multiple jar packages, separate them with commas.
This concludes the introduction to Hive partitions, buckets, and custom functions.
© 2024 shulou.com SLNews company. All rights reserved.