Using UDF functions in Hive

2025-02-24 Update From: SLTechnology News&Howtos

Official documentation on UDFs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Several related concepts:

UDF: one-to-one row mapping, for example upper and substr [one row in, one row out]

UDAF: aggregation, many-to-one row mapping, for example sum and min [multiple rows in, one row out]

UDTF: table-generating, one-to-many row mapping, for example lateral view explode() [one row in, multiple rows out]
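
The three mapping shapes above can be illustrated in plain Java, without any Hive dependency. This is only an analogy, assuming made-up method names (oneToOne, manyToOne, oneToMany); it is not how Hive implements these function types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RowMappingShapes {
    // UDF-like: one row in, one row out (compare upper).
    static List<String> oneToOne(List<String> rows) {
        return rows.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // UDAF-like: multiple rows in, one value out (compare sum).
    static int manyToOne(List<Integer> rows) {
        return rows.stream().mapToInt(Integer::intValue).sum();
    }

    // UDTF-like: one row in, multiple rows out (compare explode on a split string).
    static List<String> oneToMany(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(oneToOne(Arrays.asList("a", "b")));  // [A, B]
        System.out.println(manyToOne(Arrays.asList(1, 2, 3)));  // 6
        System.out.println(oneToMany("x,y,z"));                 // [x, y, z]
    }
}
```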

Write the UDF function test code:

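Add the Hive dependency to pom.xml:

    <properties>
        <hive.version>1.1.0-cdh6.7.0</hive.version>
    </properties>

    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>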

HelloUDF.java:

    package com.ruozedata.hadoop.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;

    public class HelloUDF extends UDF {

        public String evaluate(String input) {
            // TODO: the function's actual logic goes here
            return "Hello:" + input;
        }

        // Test code
        public static void main(String[] args) {
            HelloUDF udf = new HelloUDF();
            String output = udf.evaluate("test data");
            System.out.println(output);
        }
    }

Note: every UDF is implemented the same way. The first step is to extend the UDF class, and the second step is to define an evaluate method.
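
As a possible refinement, the evaluate method can guard against NULL input, and a class may define more than one evaluate overload. The sketch below is a plain-Java stand-in so that it runs without hive-exec on the classpath. The class name HelloUDFSketch and the two-argument overload are hypothetical; in a real project the class would extend org.apache.hadoop.hive.ql.exec.UDF as HelloUDF does.

```java
// Plain-Java stand-in: a real UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF, as HelloUDF does.
public class HelloUDFSketch {
    // Null in, null out, so a NULL column value does not become "Hello:null".
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return "Hello:" + input;
    }

    // Hypothetical second overload: the caller supplies the greeting prefix.
    public String evaluate(String prefix, String input) {
        if (prefix == null || input == null) {
            return null;
        }
        return prefix + ":" + input;
    }

    public static void main(String[] args) {
        HelloUDFSketch udf = new HelloUDFSketch();
        System.out.println(udf.evaluate("test data"));    // Hello:test data
        System.out.println(udf.evaluate((String) null));  // null
        System.out.println(udf.evaluate("Hi", "abc"));    // Hi:abc
    }
}
```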

Package the project with Maven in IntelliJ IDEA, then upload the jar to the Hive server. The jar is named g6-hadoop-udf.jar.

Hive offers several ways to create functions:

Method 1: create a temporary function (Temporary Functions)

Official reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateFunction

Drawback: a temporary function is valid only in the current session (window).

Example: run the following in the Hive shell

    ADD JAR /home/hadoop/lib/g6-hadoop-udf.jar;

    CREATE TEMPORARY FUNCTION sayHello AS 'com.ruozedata.hadoop.udf.HelloUDF';

    SHOW FUNCTIONS;    (sayhello now appears in the function list)

    SELECT sayhello('abc') FROM dual;    (output: Hello:abc)

Here dual is a user-created one-row table; Hive does not ship one.

Note: another drawback of this approach is that the jar must be added manually in every session before the class name can be resolved.
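
Because both statements have to be repeated in every new session, one option is to generate them in one place, for example to write them into a script that is run at session start. The helper below is a minimal sketch of that idea. The class and method names (TempFunctionStatements, forSession) are hypothetical, and the code only builds the statement text; it does not talk to Hive.

```java
import java.util.Arrays;
import java.util.List;

public class TempFunctionStatements {
    // Builds the two statements that must be re-run in every new Hive session.
    static List<String> forSession(String jarPath, String functionName, String className) {
        return Arrays.asList(
            "ADD JAR " + jarPath,
            "CREATE TEMPORARY FUNCTION " + functionName + " AS '" + className + "'"
        );
    }

    public static void main(String[] args) {
        List<String> statements = forSession(
            "/home/hadoop/lib/g6-hadoop-udf.jar",
            "sayHello",
            "com.ruozedata.hadoop.udf.HelloUDF");
        for (String stmt : statements) {
            System.out.println(stmt + ";");
        }
    }
}
```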

Method 2: avoid adding the jar manually

Create an auxlib directory under Hive's home directory ($HIVE_HOME) and put the jar in it.

Once the jar is in auxlib, it no longer needs to be loaded manually, whether you create a temporary function or a permanent function.

Method 3: create a permanent function (Permanent Functions) using a jar on HDFS; this is the recommended approach

Starting from Hive 0.13, functions can be registered in the metastore. They are stored in the FUNCS table, which is empty by default.

Put the jar in the /lib directory on HDFS.

Example: run the following command in the Hive shell

    CREATE FUNCTION sayhello2 AS 'com.ruozedata.hadoop.udf.HelloUDF' USING JAR 'hdfs://ruozeclusterg6/lib/g6-hadoop-udf.jar';

Note: the sayhello2 function can now be used in any window (it is not listed by show functions, but it can be seen in the FUNCS table of the metadata).

Querying the FUNCS table of the hive database in MySQL shows that sayhello2 has been registered successfully.
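
Since this method depends on the jar living on HDFS, a small guard can catch a local path before the statement is issued. The sketch below assumes hypothetical names (PermanentFunctionStatement, build) and a self-imposed rule that only hdfs:// locations are accepted; that rule reflects this article's recommendation, not a restriction enforced by Hive. It only builds the statement text.

```java
import java.net.URI;

public class PermanentFunctionStatement {
    // Builds a CREATE FUNCTION ... USING JAR statement, rejecting non-HDFS jar locations.
    static String build(String functionName, String className, String jarUri) {
        URI uri = URI.create(jarUri);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected an hdfs:// jar location, got: " + jarUri);
        }
        return "CREATE FUNCTION " + functionName
            + " AS '" + className + "'"
            + " USING JAR '" + jarUri + "'";
    }

    public static void main(String[] args) {
        System.out.println(build(
            "sayhello2",
            "com.ruozedata.hadoop.udf.HelloUDF",
            "hdfs://ruozeclusterg6/lib/g6-hadoop-udf.jar") + ";");
    }
}
```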
