
What are the common custom functions of hive


This article mainly introduces the common custom functions of Hive; interested readers can use it as a reference. I hope you learn a lot from it.

1.1 Why do you need custom functions

Hive's built-in functions cannot meet every business need, so Hive provides several extension points for customization, such as custom functions, SerDes, and custom input/output formats.

1.2 What are the common custom functions

1. UDF: user-defined function. One-to-one input and output (the most commonly used).

2. UDTF: user-defined table-generating function. One-to-many input and output; typically used with lateral view explode.

3. UDAF: user-defined aggregate function. Many-to-one input and output, e.g. count, sum, max. See the built-in examples below.
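To make the three categories concrete, here is a small sketch using only Hive's built-in functions (no custom code yet); the users table and its columns are hypothetical:

-- UDF: one row in, one row out (upper is a built-in UDF)
select upper(name) from users;

-- UDTF: one row in, many rows out (explode is a built-in UDTF, usually combined with lateral view)
select name, hobby from users lateral view explode(hobbies) t as hobby;

-- UDAF: many rows in, one row out (count/sum/max are built-in UDAFs)
select count(*), sum(age), max(age) from users;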

2 Implementing custom functions

2.1 UDF format

First create a pom.xml in the project. For the full set of Maven dependencies used below, see code/pom.xml.
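The essential dependency is hive-exec, which provides the UDF and GenericUDTF base classes used in this article. A minimal sketch; the version shown is an assumption and should match your Hive installation:

<dependency>
    <!-- provides org.apache.hadoop.hive.ql.exec.UDF and org.apache.hadoop.hive.ql.udf.generic.GenericUDTF -->
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <!-- assumed version; use the one matching your cluster -->
    <version>2.3.9</version>
    <scope>provided</scope>
</dependency>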

The following points should be taken into account when defining UDF functions:

1. Inherit org.apache.hadoop.hive.ql.exec.UDF.

2. Override evaluate(). This method is not defined in an interface, because the number and types of the parameters it accepts are not fixed; Hive inspects the UDF to find an evaluate() method that matches the function call.

2.1.1 A first custom function example

package com.qf.hive;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;

public class FirstUDF extends UDF {

    public String evaluate(String str) {
        String upper = null;
        // 1. check the input parameter
        if (StringUtils.isEmpty(str)) {
            // empty input: leave the result as null
        } else {
            upper = str.toUpperCase();
        }
        return upper;
    }

    // debug the custom function locally
    public static void main(String[] args) {
        System.out.println(new FirstUDF().evaluate("jiajingwen"));
    }
}
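Before the jar can be loaded into Hive, the class has to be compiled and packaged. A minimal sketch assuming a Maven build; the jar name under target/ depends on your pom.xml, and /hivedata/udf.jar is the path used in the loading examples below:

# package the project (run in the project root)
mvn clean package
# copy the resulting jar to the path used in the rest of this article
cp target/udf.jar /hivedata/udf.jar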

2.2 Function loading methods

2.2.1 Loading by command

This way of loading is only valid for the current session.

# 1. Upload the UDF jar to the server and add it to Hive's classpath
# Enter the hive client and run:
add jar /hivedata/udf.jar;

# 2. In the same hive session, create a temporary function name:
create temporary function toUP as 'com.qf.hive.FirstUDF';

# 3. Check whether the function was created successfully
show functions;

# 4. Test the function
select toUp('abcdef');

# 5. Drop the function
drop temporary function if exists toUp;

2.2.2 Loading via startup parameter

(also only valid for the current session; the function is temporary)

# 1. Upload the UDF jar to the server

# 2. Create an initialization file containing the following:
vi ./hive-init
add jar /hivedata/udf.jar;
create temporary function toup as 'com.qf.hive.FirstUDF';

# 3. Start hive with the initialization file:
hive -i ./hive-init
select toup('abcdef');

2.2.3 Loading via configuration file

With a configuration file, the function is loaded every time the hive command line is started.

# 1. Upload the UDF jar to the server

# 2. Create a configuration file named .hiverc under the bin directory of the Hive installation directory, containing the following:
vi ./bin/.hiverc
add jar /hivedata/udf.jar;
create temporary function toup as 'com.qf.hive.FirstUDF';

# 3. Start hive
hive

2.3 UDTF format

A UDTF has one-to-many input and output. To implement a UDTF, complete the following steps:

1. Inherit org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.

2. Override initialize(), process(), and close().

The execution process is as follows:

Hive first calls the initialize method, which returns information (the number and types of columns) about the rows the UDTF will return.

After initialization, the process method is called; the real processing happens there. Inside process, each call to forward() produces one row. If you generate multiple columns, put the column values into an array and pass that array to forward().

Finally, the close() method is called to clean up anything that needs cleaning up.

2.3.1 Requirement

Parse a string such as "k1:v1;k2:v2;k3:v3" into multiple rows, with each key-value pair on its own row, output in key:value format.

2.3.2 Source Code

The custom function is as follows:

package com.qf.hive;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ParseMapUDTF extends GenericUDTF {

    @Override
    public void close() throws HiveException {
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("only one parameter can be passed");
        }
        // declare the two output columns, both strings
        ArrayList<String> fieldNameList = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNameList.add("map");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNameList.add("key");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNameList, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        String input = args[0].toString();
        // split the input into key:value pairs and emit one row per pair
        String[] paramString = input.split(";");
        for (int i = 0; i < paramString.length; i++) {
            String[] result = paramString[i].split(":");
            forward(result);
        }
    }
}
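Once packaged and uploaded, the UDTF is loaded in the same way as the UDF above. A minimal usage sketch; the function name parse_map is an assumption, while the class and jar path are the ones used in this article:

add jar /hivedata/udf.jar;
create temporary function parse_map as 'com.qf.hive.ParseMapUDTF';
-- each key:value pair in the input string becomes its own output row
select parse_map('k1:v1;k2:v2;k3:v3');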
