2025-03-31 Update From: SLTechnology News&Howtos
This article introduces the common custom function types in Hive, how to implement them, and how to load them. Interested readers can use it as a reference.
1.1 Why custom functions are needed
Hive's built-in functions cannot cover every business need, so Hive provides several extension points where you can plug in your own code, such as custom functions (UDF/UDTF/UDAF), custom SerDes, and custom input/output formats.
1.2 What the common custom function types are
1. UDF: user-defined function. One-to-one input and output (the most commonly used type).
2. UDTF: user-defined table-generating function. One-to-many input and output; typically used with lateral view explode.
3. UDAF: user-defined aggregate function. Many-to-one input and output, e.g. count, sum, max.
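As a rough analogy, the three input/output shapes can be sketched in plain Java (no Hive involved; the class and method names below are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the three cardinalities; not Hive API code.
public class FunctionShapes {
    // UDF shape: one value in, one value out (like Hive's built-in upper()).
    static String udfUpper(String s) {
        return s.toUpperCase();
    }

    // UDTF shape: one value in, many rows out (like explode() on a split string).
    static List<String> udtfExplode(String csv) {
        return Arrays.asList(csv.split(","));
    }

    // UDAF shape: many values in, one value out (like sum()).
    static int udafSum(List<Integer> rows) {
        int total = 0;
        for (int r : rows) total += r;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(udfUpper("abc"));                  // ABC
        System.out.println(udtfExplode("a,b,c"));             // [a, b, c]
        System.out.println(udafSum(Arrays.asList(1, 2, 3)));  // 6
    }
}
```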
2 Implementing custom functions
2.1 UDF
First create a pom.xml under the project. For the maven dependencies required, see code/pom.xml.
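A minimal sketch of the key dependency is shown below (the version number is an assumption; match it to the Hive version running on your cluster):

```xml
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <!-- version is an assumption; use your cluster's Hive version -->
    <version>2.3.9</version>
</dependency>
```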
The following points should be taken into account when defining a UDF:
1. Inherit org.apache.hadoop.hive.ql.exec.UDF.
2. Override evaluate(). This method is not defined by an interface, because the number and types of parameters it accepts are not fixed; Hive inspects the UDF class to find an evaluate() method that matches the function call.
2.1.1 A first custom function example
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;

public class FirstUDF extends UDF {
    public String evaluate(String str) {
        String upper = null;
        // 1. check the input parameter
        if (!StringUtils.isEmpty(str)) {
            upper = str.toUpperCase();
        }
        return upper;
    }

    // debug the custom function locally
    public static void main(String[] args) {
        System.out.println(new FirstUDF().evaluate("jiajingwen"));
    }
}
2.2 Function loading methods
2.2.1 Loading by command
This load is only valid for the current session.
# 1. Upload the udf jar package to the server, and add it to hive's classpath
# enter the hive client and execute the following commands
add jar /hivedata/udf.jar;
# 2. In the same hive session, create a temporary function name:
create temporary function toUp as 'com.qf.hive.FirstUDF';
# 3. Check whether the function was created successfully
show functions;
# 4. Test the function
select toUp('abcdef');
# 5. Drop the function
drop temporary function if exists toUp;
2.2.2 Loading via startup parameter
(also only valid for the current session; still a temporary function)
# 1. Upload the udf jar package to the server
# 2. Create an initialization file
vi ./hive-init
add jar /hivedata/udf.jar;
create temporary function toup as 'com.qf.hive.FirstUDF';
# 3. Start hive with the initialization file:
hive -i ./hive-init
select toup('abcdef');
2.2.3 Loading via configuration file
With a configuration file, the function is loaded automatically every time hive is started from the command line.
# 1. Upload the udf jar package to the server
# 2. Create a configuration file named .hiverc in the bin directory of the hive installation
vi ./bin/.hiverc
add jar /hivedata/udf.jar;
create temporary function toup as 'com.qf.hive.FirstUDF';
# 3. Start hive
hive
2.3 UDTF
A UDTF is a one-to-many input/output function. Implementing a UDTF requires the following steps:
1. Inherit org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.
2. Override initialize(), process(), and close().
The execution process is as follows:
The UDTF first calls initialize(), which returns information about the rows the UDTF will produce (number and types of columns).
After initialization, process() is called; the real work happens there. Inside process(), each call to forward() emits one row. To emit multiple columns, put the column values in an array and pass the array to forward().
Finally, close() is called to release any resources that need cleaning up.
2.3.1 Requirement
Parse a string such as "k1:v1;k2:v2;k3:v3" into multiple rows, one entry per row, with each row output in key:value format.
2.3.2 Source Code
The custom function is as follows:
package com.qf.hive;

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ParseMapUDTF extends GenericUDTF {

    @Override
    public void close() throws HiveException {
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args)
            throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("only one parameter can be passed");
        }
        // two output columns, both strings: key and value
        ArrayList<String> fieldNameList = new ArrayList<>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldNameList.add("key");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNameList.add("value");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNameList, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        String input = args[0].toString();
        String[] paramString = input.split(";");
        for (int i = 0; i < paramString.length; i++) {
            try {
                String[] result = paramString[i].split(":");
                forward(result);  // each forward() call emits one row
            } catch (Exception e) {
                continue;       // skip malformed entries
            }
        }
    }
}
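The splitting logic inside process() can be checked in plain Java without a Hive runtime (the class below is a standalone simulation written for illustration, not part of the Hive API):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone simulation of the UDTF's row-splitting logic.
public class ParseMapDemo {
    // Splits "k1:v1;k2:v2" into [key, value] rows, like the UDTF's forward() calls.
    static List<String[]> parse(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String pair : input.split(";")) {
            String[] kv = pair.split(":");
            if (kv.length == 2) {
                rows.add(kv); // one emitted row per well-formed entry
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : parse("k1:v1;k2:v2;k3:v3")) {
            System.out.println(row[0] + ":" + row[1]);
        }
        // prints k1:v1, k2:v2, k3:v3 on separate lines
    }
}
```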