
0011 - How to use UDFs in Hive & Impala

2025-02-23 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--


1. Purpose of this document

This document describes how to develop Hive user-defined functions (UDFs) and how to use those Hive UDFs in Impala. It covers the following:

1. How to develop a Hive UDF in Java

2. How to create and use a UDF in Hive

3. How to use a Hive UDF in Impala

This document focuses on the use of UDF in Hive and Impala and is based on the following assumptions:

1. The cluster environment is running normally

2. The Hive and Impala services are installed on the cluster

The test environment is listed below; it is not a hard requirement for this guide:

1. Operating system: Redhat 6.5

2. CDH and CM version: 5.11.1

3. Operations are performed as the ec2-user user, which has sudo privileges

2. UDF development

Use IntelliJ to develop and compile a Hive UDF.

1. Use IntelliJ to create a Java project with Maven

2. Add the Hive dependency to the pom.xml file

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.1.0</version>
</dependency>

3. The sample Java code is as follows

package com.peach.date;

import org.apache.hadoop.hive.ql.exec.UDF;

import java.text.ParseException;
import java.text.SimpleDateFormat;

/**
 * SQL UDF date-related utilities
 * Created by peach on 2017-8-24.
 */
public class DateUtils extends UDF {

    /**
     * Format a date string into a standard date format.
     * For example:
     * 2017-8-9 to 2017-08-09
     * 2017-08-09 9:23:3 to 2017-08-09 09:23:03
     * @param sdate   the date string to normalize
     * @param pattern the date pattern used for both parsing and formatting
     * @return the normalized date string, or the input unchanged if it cannot be parsed
     */
    public static String evaluate(String sdate, String pattern) {
        String formatDate = sdate;
        // Use the caller-supplied pattern rather than a hard-coded one,
        // so that the pattern argument actually takes effect.
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        try {
            formatDate = sdf.format(sdf.parse(sdate));
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return formatDate;
    }
}

A simple date conversion function is used here as an example. Note that to implement your own function you need to extend the UDF class and implement an evaluate method.
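The parse-then-format round trip used by evaluate can be tried outside Hive. The following is a minimal sketch, not the article's code: the class name DateFormatSketch and the method name normalize are hypothetical, and the null guard, the fixed UTC time zone, and Locale.ROOT are assumptions added to keep the example safe and deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Standalone sketch of the date normalization done in the UDF's evaluate method.
// It has no Hive dependency, so it runs with a plain JDK.
class DateFormatSketch {

    // Parses sdate with the given pattern and re-formats it with the same pattern.
    // Returns the input unchanged when it cannot be parsed.
    static String normalize(String sdate, String pattern) {
        if (sdate == null || pattern == null) {
            return null; // assumption: pass SQL NULL through instead of throwing
        }
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        // Assumption: a fixed zone keeps the sketch independent of the host settings.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return sdf.format(sdf.parse(sdate));
        } catch (ParseException e) {
            return sdate;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("2017-8-9", "yyyy-MM-dd"));                   // 2017-08-09
        System.out.println(normalize("2017-08-09 9:23:3", "yyyy-MM-dd HH:mm:ss")); // 2017-08-09 09:23:03
        System.out.println(normalize("not a date", "yyyy-MM-dd"));                 // not a date
    }
}
```

The null guard matters in practice because a NULL column value reaches the function as a Java null, and SimpleDateFormat.parse throws a NullPointerException on null input.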

4. Build the jar package

Maven's environment variables must already be configured. On the command line, change to the project directory and run the following command:

mvn clean package

3. Using custom functions (UDFs) in Hive

Upload the sql-udf-utils-1.0-SNAPSHOT.jar built in Section 2 to the cluster server.

3.1 Create a temporary UDF

1. Open the Hive shell and run the following commands to create a temporary function

add jar /home/ec2-user/sql-udf-utils-1.0-SNAPSHOT.jar;
create temporary function parse_date as 'com.peach.date.DateUtils';

2. Test the UDF on the command line

select parse_date(dates, 'yyyy-MM-dd HH:mm:ss') from date_test1;

3.2 Create a permanent UDF

1. Create a directory in HDFS and upload the sql-udf-utils-1.0-SNAPSHOT.jar package to it

[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -mkdir /udfjar
[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put sql-udf-utils-1.0-SNAPSHOT.jar /udfjar

Note: check the permissions on the /udfjar directory and on sql-udf-utils-1.0-SNAPSHOT.jar; the owning user should be hive.

2. Open the Hive shell and run the following command to create a permanent UDF

create function default.parse_date as 'com.peach.date.DateUtils' using jar 'hdfs://ip-172-31-9-186.ap-southeast-1.compute.internal:8020/udfjar/sql-udf-utils-1.0-SNAPSHOT.jar';

Note: a function created with a database name is registered in that database. An unqualified call from another database does not resolve it; from other databases, reference it by its qualified name (for example, default.parse_date).

3. Test the UDF on the command line

select parse_date(dates, 'yyyy-MM-dd HH:mm:ss') from date_test1;

4. Verify that the permanent UDF is in effect

After you reopen the Hive CLI, the UDF you created is still available and works normally.

4. Using Hive UDFs in Impala

1. Run the metadata synchronization command in the Impala shell

[ip-172-31-10-156.ap-southeast-1.compute.internal:21000] > invalidate metadata;

2. Use the UDF

[ip-172-31-10-156.ap-southeast-1.compute.internal:21000] > select parse_date(dates, 'yyyy-MM-dd HH:mm:ss') from date_test1;

5. Common problems

1. An exception occurs when calling the UDF from the Impala CLI

Connected to ip-172-31-10-156.ap-southeast-1.compute.internal:21000
Server version: impalad version 2.7.0-cdh6.10.2 RELEASE (build 38c989c0330ea952133111e41965ff9af96412d3)
[ip-172-31-10-156.ap-southeast-1.compute.internal:21000] > select parse_date(dates) from date_test1;
Query: select parse_date(dates) from date_test1
Query submitted at: 2017-08-24 12:51:44 (Coordinator: http://ip-172-31-10-156.ap-southeast-1.compute.internal:25000)
ERROR: AnalysisException: default.parse_date() unknown

This error occurs when the metadata has not been synchronized. Run the following command to synchronize the metadata:

[ip-172-31-10-156.ap-southeast-1.compute.internal:21000] > invalidate metadata;

2. The following exception occurs when running the query in the Impala CLI

[ip-172-31-10-156.ap-southeast-1.compute.internal:21000] > select parse_date(dates, 'yyyy-MM-dd HH:mm:ss') from date_test1;
Query: select parse_date(dates, 'yyyy-MM-dd HH:mm:ss') from date_test1
Query submitted at: 2017-08-24 13:02:14 (Coordinator: http://ip-172-31-10-156.ap-southeast-1.compute.internal:25000)
ERROR: Failed to copy hdfs://ip-172-31-9-186.ap-southeast-1.compute.internal:8020/udfjar/sql-udf-utils-1.0-SNAPSHOT.jar to /var/lib/impala/udfs/sql-udf-utils-1.0-SNAPSHOT.2386.2.jar: Error(2): No such file or directory

Cause: the /var/lib/impala/udfs directory does not exist on the Impala Daemon server.

Solution:

Create the /var/lib/impala/udfs directory on every Impala Daemon server

[ec2-user@ip-172-31-10-156 lib]$ sudo mkdir -p impala/udfs
[ec2-user@ip-172-31-10-156 lib]$ sudo chown -R impala:impala impala/

Note: the directory must be owned by the impala user and group.
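The condition behind the "No such file or directory" error can be checked before running a query. The following is a minimal sketch under stated assumptions: the class name UdfDirCheck and the method name diagnose are hypothetical, and the check only reflects the permissions of the user who runs it, so it is meaningful only when run as the impala user on each Impala Daemon host.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Preflight check for the local directory that UDF jars are copied into.
class UdfDirCheck {

    // Returns a short diagnosis of the given directory for the current user.
    static String diagnose(Path dir) {
        if (!Files.exists(dir)) {
            return "missing";
        }
        if (!Files.isDirectory(dir)) {
            return "not a directory";
        }
        if (!Files.isWritable(dir)) {
            return "not writable";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Default path taken from the error message above; pass another path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/var/lib/impala/udfs");
        System.out.println(dir + ": " + diagnose(dir));
    }
}
```

A result of "missing" corresponds to the error shown in this section, while "not writable" points to the ownership issue mentioned in the note.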
