2025-04-05 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --
This article shows how to write custom functions in Hive. The material is easy to follow and clearly organized; we hope it resolves your doubts as you study the topic.
Custom functions come in three types: UDF, UDAF, and UDTF.
UDF (User-Defined Function): one row in, one row out.
UDAF (User-Defined Aggregation Function): an aggregate function; many rows in, one row out. Examples: count/max/min.
UDTF (User-Defined Table-Generating Function): one row in, many rows out, used for example with lateral view explode().
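As an illustration of the three shapes, here is how each kind appears in a query, using Hive built-ins rather than custom functions (the employees table and its name, salary, and skills columns are hypothetical):

hive> -- UDF: one row in, one row out
hive> select upper(name) from employees;
hive> -- UDAF: many rows in, one row out
hive> select count(*), max(salary) from employees;
hive> -- UDTF: one row in, many rows out
hive> select name, skill from employees lateral view explode(skills) t as skill;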
How to use one: in a Hive session, add the jar file containing the custom function, create a function from it, and then call that function in queries.
UDF
An ordinary user-defined function. It accepts a single row of input and produces a single row of output.
1. A UDF can be applied directly in a select statement, transforming values before they are output.
2. When writing a UDF, note the following:
a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
b) It must implement an evaluate method; evaluate supports overloading.
Note: a UDF can only implement one-in, one-out operations. If you need many-in, one-out, implement a UDAF instead.
Example: a UDF that extracts a substring (a topicId) from a URL
package hive;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.hive.ql.exec.UDF;

public class GetCmsID extends UDF {
    public String evaluate(String url) {
        String cmsid = null;
        if (url == null || "".equals(url)) {
            return cmsid;
        }
        Pattern pat = Pattern.compile("topicId=[0-9]+");
        Matcher matcher = pat.matcher(url);
        if (matcher.find()) {
            cmsid = matcher.group().split("topicId=")[1];
        }
        return cmsid;
    }

    public String evaluate(String pattern, String url) {
        String cmsid = null;
        if (url == null || "".equals(url)) {
            return cmsid;
        }
        Pattern pat = Pattern.compile(pattern + "[0-9]+");
        Matcher matcher = pat.matcher(url);
        if (matcher.find()) {
            cmsid = matcher.group().split(pattern)[1];
        }
        return cmsid;
    }

    public static void main(String[] args) {
        String url = "http://www.baidu.com/cms/view.do?topicId=123456";
        GetCmsID getCmsID = new GetCmsID();
        System.out.println(getCmsID.evaluate(url));
        System.out.println(getCmsID.evaluate("topicId=", url));
    }
}
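Once compiled and packaged, the UDF could be used from a Hive session roughly as follows. The jar path, the function name getcmsid, and the weblog table are assumptions for illustration; the class name hive.GetCmsID comes from the code above:

hive> add jar /path/to/getcmsid.jar;
hive> create temporary function getcmsid as 'hive.GetCmsID';
hive> select getcmsid(url) from weblog;
hive> select getcmsid('topicId=', url) from weblog;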
UDAF
A user-defined aggregate function. It accepts multiple rows of input and produces a single row of output, like the MAX and COUNT functions.
1. It must inherit/implement:
org.apache.hadoop.hive.ql.exec.UDAF (the function class extends UDAF)
org.apache.hadoop.hive.ql.exec.UDAFEvaluator (an inner evaluator class implements the UDAFEvaluator interface)
2. The evaluator must implement the init, iterate, terminatePartial, merge, and terminate methods.
init(): similar to a constructor; used to initialize the UDAF.
iterate(): receives incoming values and aggregates them. It is called once for each new value and returns a boolean.
terminatePartial(): takes no arguments; called after a partial aggregation finishes, when Hive needs the aggregate result of a subset of the records.
merge(): receives the return value of terminatePartial and merges it into the previously accumulated partial result (the aggregate of one partition of records). Returns a boolean.
terminate(): returns the final aggregation result.
Note: the parameter type of merge must match the return type of terminatePartial.
package com.oserp.hiveudf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;

public class HiveAvg extends UDAF {
    public static class AvgEvaluate implements UDAFEvaluator {
        public static class PartialResult {
            public int count;
            public double total;

            public PartialResult() {
                count = 0;
                total = 0;
            }
        }

        private PartialResult partialResult;

        @Override
        public void init() {
            partialResult = new PartialResult();
        }

        public boolean iterate(IntWritable value) {
            // partialResult must be null-checked here, otherwise an error is thrown:
            // init() is only called once, so partialResult is not re-initialized
            // before every partial aggregation.
            if (partialResult == null) {
                partialResult = new PartialResult();
            }
            if (value != null) {
                partialResult.total = partialResult.total + value.get();
                partialResult.count = partialResult.count + 1;
            }
            return true;
        }

        public PartialResult terminatePartial() {
            return partialResult;
        }

        public boolean merge(PartialResult other) {
            partialResult.total = partialResult.total + other.total;
            partialResult.count = partialResult.count + other.count;
            return true;
        }

        public DoubleWritable terminate() {
            return new DoubleWritable(partialResult.total / partialResult.count);
        }
    }
}
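To make the lifecycle concrete, here is a minimal sketch in plain Java (no Hive dependencies) that simulates how Hive drives an average evaluator: iterate over each partition of rows, hand back a partial result per partition, merge the partials, then terminate. The class and method names mirror the evaluator lifecycle but are illustrative, not Hive's actual driver code.

```java
// Simulates the UDAF lifecycle: init -> iterate -> terminatePartial -> merge -> terminate.
public class UdafLifecycleDemo {
    // Mirrors the (count, total) pair carried in HiveAvg's PartialResult.
    static class Partial {
        int count;
        double total;
    }

    // iterate(): fold one value into a partial result.
    static void iterate(Partial p, double value) {
        p.total += value;
        p.count += 1;
    }

    // merge(): combine two partial results, as Hive does across map tasks.
    static void merge(Partial into, Partial other) {
        into.total += other.total;
        into.count += other.count;
    }

    // terminate(): produce the final aggregate.
    static double terminate(Partial p) {
        return p.total / p.count;
    }

    // Drive the full lifecycle over several "partitions" of rows.
    static double average(double[][] partitions) {
        Partial merged = new Partial();          // init() for the final stage
        for (double[] partition : partitions) {
            Partial p = new Partial();           // init() per partial aggregation
            for (double v : partition) {
                iterate(p, v);
            }
            merge(merged, p);                    // terminatePartial() hands p back; merge() folds it in
        }
        return terminate(merged);
    }

    public static void main(String[] args) {
        // Average of 1,2,3,4 computed across two partitions.
        System.out.println(average(new double[][] {{1, 2}, {3, 4}}));
    }
}
```

Note how the result is the same no matter how the rows are split into partitions, which is exactly the property that lets Hive run the merge step in any order.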
Deployment and operation
1) Package the program and place the jar on the target machine.
2) Enter the Hive client and add the jar:
hive> add jar /home/sfd/udf_test.jar;
3) Create a temporary function:
hive> create temporary function <function_name> as '<full Java class name>';
4) Drop the temporary function:
hive> drop temporary function <function_name>;
That covers everything in "how to customize functions in hive". Thank you for reading! We hope the content has been helpful.