2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to develop UDF and UDAF functions for Hive. The material is straightforward and easy to follow.
UDF custom functions
Custom functions come in three types: UDF, UDAF, and UDTF.
UDF (User-Defined Function): one row in, one row out.
UDAF (User-Defined Aggregation Function): many rows in, one row out, e.g. count/max/min.
UDTF (User-Defined Table-Generating Function): one row in, many rows out, e.g. lateral view explode().
Mode of use:
In a Hive session, add the jar file containing the custom function, create the function, and then use it in queries.
UDF development
1. A UDF can be applied directly in a SELECT statement to transform the queried values before they are output.
2. When writing a UDF, note the following:
A) The custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
B) It must implement the evaluate method; evaluate supports overloading.
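The overloading rule in B) can be sketched as follows. This is a minimal illustration, not the article's code: the hypothetical Add class mirrors the hive.udf.Add function registered in the steps below, and the Hive superclass (org.apache.hadoop.hive.ql.exec.UDF) is omitted so the snippet compiles without Hive on the classpath.

```java
// Minimal sketch of the evaluate-overloading pattern used by Hive UDFs.
// A real UDF would also declare "extends org.apache.hadoop.hive.ql.exec.UDF";
// that superclass is left out here so the example runs without Hive installed.
public class Add {
    // Picked for integer arguments: SELECT add_example(8, 9) FROM scores
    public int evaluate(int a, int b) {
        return a + b;
    }

    // Overload picked for floating-point arguments
    public double evaluate(double a, double b) {
        return a + b;
    }
}
```

Hive resolves which evaluate overload to call from the argument types in the query, which is why one registered function name can serve several column types.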
3. Steps
A) Package the program into a jar and copy it to the target machine.
B) Enter the Hive client and add the jar package: hive> add jar /run/jar/udf_test.jar;
C) Create a temporary function: hive> CREATE TEMPORARY FUNCTION add_example AS 'hive.udf.Add';
D) Query with HQL statements:
SELECT add_example(8, 9) FROM scores;
SELECT add_example(scores.math, scores.art) FROM scores;
SELECT add_example(6, 7, 8, 6.8) FROM scores;
E) Destroy the temporary function: hive> DROP TEMPORARY FUNCTION add_example;
Note: a UDF can only implement one-in-one-out operations. If you need many rows in and one row out, implement a UDAF instead.
Example: a UDF that extracts an ID from a URL string
package hive;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.hive.ql.exec.UDF;

public class GetCmsID extends UDF {

    // Extract the topicId value from a URL, e.g. ...?topicId=123456 -> "123456"
    public String evaluate(String url) {
        String cmsid = null;
        if (url == null || "".equals(url)) {
            return cmsid;
        }
        Pattern pat = Pattern.compile("topicId=[0-9]+");
        Matcher matcher = pat.matcher(url);
        if (matcher.find()) {
            cmsid = matcher.group().split("topicId=")[1];
        }
        return cmsid;
    }

    // Overload: extract the numeric value that follows an arbitrary parameter name
    public String evaluate(String pattern, String url) {
        String cmsid = null;
        if (url == null || "".equals(url)) {
            return cmsid;
        }
        Pattern pat = Pattern.compile(pattern + "[0-9]+");
        Matcher matcher = pat.matcher(url);
        if (matcher.find()) {
            cmsid = matcher.group().split(pattern)[1];
        }
        return cmsid;
    }

    public static void main(String[] args) {
        String url = "http://www.baidu.com/cms/view.do?topicId=123456";
        GetCmsID getCmsID = new GetCmsID();
        System.out.println(getCmsID.evaluate(url));
        System.out.println(getCmsID.evaluate("topicId=", url));
    }
}
UDAF custom aggregate functions
Many rows in, one row out, like sum() and min(); used with GROUP BY.
1. Must inherit:
- org.apache.hadoop.hive.ql.exec.UDAF (the function class extends UDAF)
- org.apache.hadoop.hive.ql.exec.UDAFEvaluator (the inner Evaluator class implements the UDAFEvaluator interface)
2. The Evaluator must implement the init, iterate, terminatePartial, merge, and terminate functions:
- init(): similar to a constructor; initializes the UDAF's state.
- iterate(): receives the incoming row values, updates the internal state, and returns a boolean.
- terminatePartial(): takes no parameters; called when a round of iterate calls ends, it returns the partial aggregation state, similar to Hadoop's Combiner.
- merge(): receives the result of another terminatePartial call and merges it into the current state; returns a boolean.
- terminate(): returns the final result of the aggregate function.
Developing such a function gives the same behavior as:
- Oracle's wm_concat() function
- MySQL's group_concat()
package hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

public class Wm_concat extends UDAF {

    public static class myUDAFEval implements UDAFEvaluator {

        private PartialResult partial = new PartialResult();

        public static class PartialResult {
            String result = "";
            String delimiter = null;
        }

        @Override
        public void init() {
            partial.result = "";
        }

        // Called once per input row
        public boolean iterate(String value, String deli) {
            if (value == null || "null".equalsIgnoreCase(value)) {
                return true;
            }
            if (partial.delimiter == null) {
                partial.delimiter = deli;
            }
            if (partial.result.length() > 0) {
                partial.result = partial.result.concat(partial.delimiter); // append delimiter
            }
            partial.result = partial.result.concat(value); // append value
            return true;
        }

        public PartialResult terminatePartial() {
            return partial;
        }

        // Merge a partial result produced by another task
        public boolean merge(PartialResult other) {
            if (other == null) {
                return true;
            }
            if (partial.delimiter == null) {
                partial.delimiter = other.delimiter;
                partial.result = other.result;
            } else {
                if (partial.result.length() > 0) {
                    partial.result = partial.result.concat(partial.delimiter);
                }
                partial.result = partial.result.concat(other.result);
            }
            return true;
        }

        public String terminate() {
            if (partial == null || partial.result.length() == 0) {
                return null;
            }
            return partial.result;
        }
    }
}
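The evaluator lifecycle above can be exercised without a Hive cluster. The sketch below is a Hive-free simulation under the assumption that Hive drives the same call sequence (init, iterate per row, terminatePartial per task, merge across tasks, terminate); the ConcatEval class is hypothetical and only mirrors the concatenation logic, without any Hive dependency.

```java
// Hive-free simulation of the UDAF evaluator lifecycle: init -> iterate (per row)
// -> terminatePartial (per task) -> merge (across tasks) -> terminate.
public class ConcatEval {
    private StringBuilder result = new StringBuilder();
    private String delimiter = null;

    public void init() {
        result.setLength(0);
        delimiter = null;
    }

    // One call per input row
    public void iterate(String value, String deli) {
        if (value == null) return;
        if (delimiter == null) delimiter = deli;
        if (result.length() > 0) result.append(delimiter);
        result.append(value);
    }

    // Partial state handed to merge() on another evaluator
    public String terminatePartial() {
        return result.toString();
    }

    public void merge(String otherPartial, String deli) {
        if (otherPartial == null || otherPartial.isEmpty()) return;
        if (delimiter == null) delimiter = deli;
        if (result.length() > 0) result.append(delimiter);
        result.append(otherPartial);
    }

    public String terminate() {
        return result.length() == 0 ? null : result.toString();
    }

    public static void main(String[] args) {
        // Simulate two map tasks each seeing part of one group, then a reducer merge.
        ConcatEval task1 = new ConcatEval();
        task1.init();
        task1.iterate("a", ",");
        task1.iterate("b", ",");

        ConcatEval task2 = new ConcatEval();
        task2.init();
        task2.iterate("c", ",");

        ConcatEval reducer = new ConcatEval();
        reducer.init();
        reducer.merge(task1.terminatePartial(), ",");
        reducer.merge(task2.terminatePartial(), ",");
        System.out.println(reducer.terminate()); // prints "a,b,c"
    }
}
```

Walking through this call sequence by hand is a quick way to check the merge logic before packaging the real evaluator into a jar.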
Test:
create table test (id string, name string) row format delimited fields terminated by '\t';
Insert the following data:
1 a
1 b
2 b
3 c
1 c
2 a
4 b
2 d
1 d
4 c
3 b
Execute the function in Hive (after adding the jar and creating a temporary function named wm_concat, as in the UDF steps above):
select id, wm_concat(name, ',') from test where id is not null group by id;
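For reference, the grouping result for the sample data can be computed in plain Java. Note that Hive gives no ordering guarantee for rows within a group, so the real concatenation order may differ; the hypothetical WmConcatExpected class below simply processes the rows in insertion order as an illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WmConcatExpected {
    // Concatenate names per id with ',' in the order the rows appear.
    public static Map<String, String> group(String[][] rows) {
        Map<String, StringBuilder> acc = new LinkedHashMap<>();
        for (String[] row : rows) {
            StringBuilder sb = acc.computeIfAbsent(row[0], k -> new StringBuilder());
            if (sb.length() > 0) sb.append(',');
            sb.append(row[1]);
        }
        Map<String, String> out = new LinkedHashMap<>();
        acc.forEach((k, v) -> out.put(k, v.toString()));
        return out;
    }

    public static void main(String[] args) {
        // The eleven sample rows inserted into the test table above
        String[][] rows = {
            {"1","a"},{"1","b"},{"2","b"},{"3","c"},{"1","c"},{"2","a"},
            {"4","b"},{"2","d"},{"1","d"},{"4","c"},{"3","b"}
        };
        // Prints one line per group, e.g. "1\ta,b,c,d" (insertion order per group)
        group(rows).forEach((id, names) -> System.out.println(id + "\t" + names));
    }
}
```

Under this insertion-order assumption the groups come out as 1 -> a,b,c,d; 2 -> b,a,d; 3 -> c,b; 4 -> b,c.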
Thank you for reading. The above covers how UDF and UDAF functions are developed for Hive; specific usage should be verified in practice.