[TOC]
# Hive Notes (3)

## Classification of Hive Functions
Functions in Hive are defined much as they are in Java and MySQL; there are three kinds:
- UDF (User Definition Function, user-defined function): one input, one output, e.g. sin(30°) = 1/2
- UDAF (User Definition Aggregation Function, aggregate function): multiple inputs, one output, e.g. max, min, count, sum, avg
- UDTF (User Definition Table Function, table function): one input, multiple outputs, e.g. explode

### Common functions

- `show functions;` — lists the functions available in hive
- `desc function func_name;` — shows the help text of a function
- `case when` — like switch or if/else if; comparable to the ternary operator
- `explode` — turns the elements of an array into multiple rows: for a = [1,2,3,4], explode(a) ==> 1 2 3 4
- `split` — splits a string into an array
- `collect_set` / `collect_list` — aggregate the values of a group into an array (see the sketch after the wordcount example below)
- `concat_ws` — concatenates elements with a given separator string
- `row_number` — grouped sorting / secondary sorting

### Case: wordcount

Analysis. The input lines are "hello you", "hello me", "hello he". With MapReduce the process is:

- step 1: split("\t") --> ["hello","you"] ["hello","me"] ["hello","he"]
- step 2: traverse each array and write out each value as (key, 1)
- step 3: shuffle
- step 4: reduce, i.e. reduceByKey

With HQL:

step 1, split each line into an array:

```sql
hive (mydb1)> select split(line, "\t") from test;
["hello","you"]
["hello","he"]
["hello","me"]
```

step 2, convert each array into multiple rows:

```sql
hive (mydb1)> select explode(split(line, "\t")) from test;
hello
you
hello
he
hello
me
```

step 3, group by on top of step 2:

```sql
select w.word, count(w.word) as count
from (select explode(split(line, "\t")) word from test) w
group by w.word
order by count desc;
```
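A few of the functions listed above are easiest to understand from a concrete query. The following is a minimal sketch: `split`, `collect_set`, `concat_ws`, and `desc function` are all built in, while the table `t_order(uid, product)` is hypothetical and used only for illustration:

```sql
-- split: cut a delimited string into an array
select split('hello you', ' ');                    -- ["hello","you"]

-- collect_set: aggregate a column into a (deduplicated) array per group
-- (t_order(uid, product) is a hypothetical table)
select uid, collect_set(product) from t_order group by uid;

-- concat_ws: glue array elements together with a given separator
select uid, concat_ws(',', collect_set(product)) as products
from t_order group by uid;

-- desc function: print the help text of any function
desc function concat_ws;
```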
### case when

Use case when to display the department name corresponding to each id:
The mapping is: 1 -> student group, 2 -> administrative group, 3 -> sales group, 4 -> R&D group, 5 -> other.

```sql
hive (mydb1)> select * from t1;
1
2
3
4
5

select id,
       case id
         when 1 then "student group"
         when 2 then "administrative group"
         when 3 then "sales group"
         when 4 then "R&D group"
         else "other"
       end
from t1;
```

This classifies the rows as: 1 student group, 2 administrative group, 3 sales group, 4 R&D group, 5 other.

### row_number: grouped and secondary sorting

Three kinds of join:

- cross join: produces a Cartesian product, so it is not used
- inner join (equi-join): outputs only the rows of the left and right tables that match each other
- outer join: left outer join and right outer join

(A short sketch contrasting these joins follows the row_number example at the end of this section.)

Exercise, based on the three tables employee, department and salary:

1. Display the employees of each department grouped together (department name, employee name, employee gender [male|female], employee salary), sorted within each group by salary in descending order:

```sql
select e.name, if(sex == 0, 'female', 'male') as gender, d.name, s.salary,
       row_number() over (partition by e.deptid order by s.salary desc) rank
from t_dept d
     left join t_employee e on d.id = e.deptid
     left join t_salary s on e.id = s.empid
where s.salary is not null;
```

2. Get the employee information for the salary top 2 of each department:

```sql
select tmp.*
from (select e.name, if(sex == 0, 'female', 'male') as gender, d.name, s.salary,
             row_number() over (partition by e.deptid order by s.salary desc) rank
      from t_dept d
           left join t_employee e on d.id = e.deptid
           left join t_salary s on e.id = s.empid
      where s.salary is not null) tmp
where tmp.rank < 3;
```

If the query is against a single table, the subquery is unnecessary; having alone is enough (having rank < 3). The following example shows directly how row_number is used:

```sql
hive (mydb2)> create table T9(
            >   id int,
            >   province string,
            >   salary float
            > );
hive (mydb2)> insert into T9 values (1,'gd',18000.0),(2,'gd',16000.0),(3,'bj',13000.0),
            >                       (4,'gd',15000.0),(5,'bj',17000.0),(6,'bj',19000.0);
hive (mydb2)> select * from T9;
OK
1   gd  18000.0
2   gd  16000.0
3   bj  13000.0
4   gd  15000.0
5   bj  17000.0
6   bj  19000.0
Time taken: 0.097 seconds, Fetched: 6 row(s)
hive (mydb2)> select
            >   id,
            >   province,
            >   salary,
            >   row_number() over (partition by province order by salary desc) as rank
            > from T9;
OK
6   bj  19000.0  1
5   bj  17000.0  2
3   bj  13000.0  3
1   gd  18000.0  1
2   gd  16000.0  2
4   gd  15000.0  3
Time taken: 1.578 seconds, Fetched: 6 row(s)
```
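As promised above, here is a minimal sketch contrasting the three kinds of join. It reuses the t_dept and t_employee tables from the exercise; the sample data in the comments is hypothetical:

```sql
-- hypothetical data:
--   t_dept:     (1, 'sales'), (2, 'R&D')
--   t_employee: (1, 'tom', 1), (2, 'jack', 1), (3, 'lucy', 3)

-- inner join: only rows that match on both sides survive -> tom, jack
select d.name, e.name
from t_dept d inner join t_employee e on d.id = e.deptid;

-- left outer join: every department is kept; 'R&D' is padded with NULL
select d.name, e.name
from t_dept d left outer join t_employee e on d.id = e.deptid;

-- cross join: Cartesian product (2 x 3 = 6 rows), hence rarely wanted
select d.name, e.name
from t_dept d cross join t_employee e;
```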
## Hive custom functions

### Steps to write a custom function

There are six steps to follow for a custom function:
1. Define a Java class that inherits from the UDF class.
2. Override evaluate(); the framework calls it for you.
3. Package the program into a jar and upload it to the server.
4. Load the jar from step 3 into hive's classpath: in the hive terminal, execute `add jar jar_path;`.
5. Give the custom function a temporary name, i.e. create a temporary function: `create temporary function <name> as '<full class name of the evaluator class>';`.
6. After you are done with the function, you can destroy the temporary function manually, or simply ignore it: when the current session ends, the function is destroyed automatically (steps 4-6 are shown together in the sketch below).
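Put together, steps 4 to 6 look as follows in the hive terminal. This is a minimal sketch: the jar path and function name are the ones used in the example later in this note, and drop temporary function is the manual cleanup mentioned in step 6:

```sql
-- step 4: load the jar into hive's classpath
add jar /home/uplooking/jars/hive/udf-zc.jar;

-- step 5: bind a temporary name to the evaluator class
create temporary function zc as 'com.uplooking.bigdata.hive.udf.ZodicaAndConstellationUDF';

-- step 6 (optional): drop the function manually instead of waiting for the session to end
drop temporary function zc;
```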
### UDF case: zodiac sign and constellation from the user's birthday

Compute the corresponding zodiac sign and constellation from the user's birthday. The program code is as follows:
```java
package com.uplooking.bigdata.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.util.Calendar;
import java.util.Date;

@Description(name = "zodic",
    value = "_FUNC_(param1, param2) - returns the zodiac sign or constellation corresponding to a given date",
    extended = "The param1 and param2 parameters can be as follows:\n"
        + "1. param1: a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'\n"
        + "2. param1: a date value\n"
        + "3. param1: a timestamp value\n"
        + "4. param2: 0 or 1; 0 means constellation, 1 means zodiac\n"
        + "Example:\n"
        + "> SELECT _FUNC_('2009-07-30', 0) FROM src LIMIT 1;\n"
        + "Leo")
public class ZodicaAndConstellationUDF extends UDF {

    public Text evaluate(java.sql.Date date, int type) {
        if (type == 0) { // constellation
            return new Text(getConstellation(new Date(date.getTime())));
        } else if (type == 1) { // zodiac
            return new Text(getZodica(new Date(date.getTime())));
        }
        return null;
    }

    public String[] zodiacArr = {"monkey", "chicken", "dog", "pig", "mouse", "cow",
            "tiger", "rabbit", "dragon", "snake", "horse", "sheep"};

    public String[] constellationArr = {"Aquarius", "Pisces", "Aries", "Taurus", "Gemini", "Cancer",
            "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn"};

    // first day of each sign per month; the array was partially garbled in the source,
    // reconstructed here to the standard 12-entry boundary table
    public int[] constellationEdgeDay = {20, 19, 21, 21, 21, 22, 23, 23, 23, 23, 22, 22};

    /**
     * Get the zodiac sign from the date.
     */
    public String getZodica(Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        return zodiacArr[cal.get(Calendar.YEAR) % 12];
    }

    /**
     * Get the constellation from the date.
     */
    public String getConstellation(Date date) {
        if (date == null) {
            return "";
        }
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        int month = cal.get(Calendar.MONTH);
        int day = cal.get(Calendar.DAY_OF_MONTH);
        if (day < constellationEdgeDay[month]) {
            month = month - 1;
        }
        if (month >= 0) {
            return constellationArr[month];
        }
        // default: return Capricorn
        return constellationArr[11];
    }
}
```
Note: the Maven dependencies required by the project are listed at the end of these notes.
After packaging the program into a jar and uploading it to the server, load it into hive's classpath in the hive terminal:
```sql
add jar /home/uplooking/jars/hive/udf-zc.jar;
```
Create the temporary function:
```sql
create temporary function zc as 'com.uplooking.bigdata.hive.udf.ZodicaAndConstellationUDF';
```
Create a temporary table for testing:
```sql
hive (mydb1)> create temporary table tmp(birthday date);
```
Insert data for testing:
```sql
hive (mydb1)> insert into tmp values ('1994-06-21');
```
Use the function in the query:
```
hive (mydb1)> select zc(birthday, 0) from tmp;
OK
c0
Gemini
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive (mydb1)> select zc(birthday, 1) from tmp;
OK
c0
dog
Time taken: 0.044 seconds, Fetched: 1 row(s)
```
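Because the UDF carries a @Description annotation, its help text can be inspected like that of a built-in function. A quick check, assuming zc is still registered in the current session:

```sql
desc function zc;
desc function extended zc;
```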
Below is a simpler UDF that can be used as a reference for testing:
```java
package cn.xpleaf.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * @author Leaf
 * @date 2018-09-18 11:11
 */
@Description(name = "addUDF", value = "_FUNC_(num1, num2) - returns the sum of the given two numbers")
public class AddUDF extends UDF {
    public Text evaluate(int num1, int num2) {
        return new Text(String.valueOf(num1 + num2));
    }
}
```
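To try this simpler UDF, the same register-and-call flow applies; this is a sketch, and the jar path below is hypothetical:

```sql
add jar /home/uplooking/jars/hive/udf-add.jar;
create temporary function addUDF as 'cn.xpleaf.hive.udf.AddUDF';
select addUDF(1, 2);   -- expected result: 3
```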
## JDBC of Hive

In addition to the CLI user interface covered earlier, hive provides a JDBC user interface, but to use it you must first start the hiveserver2 service. Once the service is running, you can keep operating hive in a CLI-like fashion through the beeline tool that ships with hive (note that hive is then being accessed through the JDBC interface), or you can write Java code by hand.
Start the hiveserver2 service:

```
[uplooking@uplooking01 ~]$ hiveserver2
```

Connect to hiveserver2 through beeline to operate hive:

```
[uplooking@uplooking01 hive]$ beeline
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/jdk/bin:/home/uplooking/bin:/home/uplooking/app/zookeeper/bin:/home/uplooking/app/hadoop/bin:/home/uplooking/app/hadoop/sbin:/home/uplooking/app/hive/bin)
ls: cannot access /home/uplooking/app/hive/lib/hive-jdbc-*-standalone.jar: No such file or directory
Beeline version 2.1.0 by Apache Hive
beeline> !connect jdbc:hive2://uplooking01:10000/mydb1
Connecting to jdbc:hive2://uplooking01:10000/mydb1
Enter username for jdbc:hive2://uplooking01:10000/mydb1: uplooking
Enter password for jdbc:hive2://uplooking01:10000/mydb1: *
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/uplooking/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: uplooking is not allowed to impersonate uplooking (state=,code=0)
```
You can see that an error has occurred and the solution is as follows:
When connecting over JDBC, the ThriftServer of the remote hive cannot be accessed and reports: uplooking is not allowed to impersonate uplooking. Because of the security policy introduced with the version upgrade, the proxy user must be configured manually, linking the uplooking user in hadoop with the uplooking user in hive. The configuration goes into $HADOOP_HOME/etc/hadoop/core-site.xml; add the following items:

```xml
<!-- the hosts from which the uplooking user may act as a proxy -->
<property>
  <name>hadoop.proxyuser.uplooking.hosts</name>
  <value>*</value>
</property>
<!-- the groups whose users the uplooking user may proxy -->
<property>
  <name>hadoop.proxyuser.uplooking.groups</name>
  <value>root</value>
</property>
```

After the configuration succeeds, it needs to be synchronized to every node in the cluster, and for the cluster to reload the configuration, at least HDFS needs to be restarted.
After that, you can use beeline to operate hive normally through the jdbc interface provided by hive:
```
beeline> !connect jdbc:hive2://uplooking01:10000/mydb1
Connecting to jdbc:hive2://uplooking01:10000/mydb1
Enter username for jdbc:hive2://uplooking01:10000/mydb1: uplooking
Enter password for jdbc:hive2://uplooking01:10000/mydb1: *
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/uplooking/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 2.1.0)
18/03/23 08:00:15 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://uplooking01:10000/mydb1> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| mydb1          |
+----------------+--+
2 rows selected (2.164 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| T1        |
| T2        |
+-----------+--+
2 rows selected (0.118 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> select * from T1;
...
rows selected (2.143 seconds)
```

### Connecting to hiveserver2 with Java code
The program code is as follows:
```java
package com.uplooking.bigdata.hive.jdbc;

import java.sql.*;

public class HiveJDBC {
    public static void main(String[] args) throws Exception {
        // load the Hive JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://uplooking01:10000/mydb1", "uplooking", "uplooking");
        // wordcount over the lines in table T1
        String sql = "select t.word, count(t.word) as count "
                + "from (select explode(split(line, ' ')) as word from T1) t group by t.word";
        PreparedStatement ps = conn.prepareStatement(sql);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            String word = rs.getString("word");
            int count = rs.getInt("count");
            System.out.println(word + "\t" + count);
        }
        rs.close();
        ps.close();
        conn.close();
    }
}
```
The execution result of the program is as follows:
```
18/03/23 00:48:16 INFO jdbc.Utils: Supplied authorities: uplooking01:10000
18/03/23 00:48:16 INFO jdbc.Utils: Resolved authority: uplooking01:10000
he      1
hello   3
me      1
you     1
```
During this process, observe the output of the hiveserver2 terminal:
```
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = uplooking_20180323084825_63044683-393d-4625-a3c3-b440109c3d70
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1521765850571_0002, Tracking URL = http://uplooking02:8088/proxy/application_1521765850571_0002/
Kill Command = /home/uplooking/app/hadoop/bin/hadoop job -kill job_1521765850571_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-03-23 08:48:33,427 Stage-1 map = 0%, reduce = 0%
2018-03-23 08:48:... Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.54 sec
2018-03-23 08:48:... Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.84 sec
MapReduce Total cumulative CPU time: 6 seconds 840 msec
Ended Job = job_1521765850571_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.84 sec   HDFS Read: 8870 HDFS Write: 159 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 840 msec
OK
```

## Resolving garbled Chinese comments in Hive
If Chinese comments appear garbled, try the following solution:
In the hive metastore database, execute the following script:

```sql
ALTER TABLE COLUMNS_V2 MODIFY COLUMN COMMENT VARCHAR(256) CHARACTER SET utf8;
ALTER TABLE TABLE_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
ALTER TABLE PARTITION_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
ALTER TABLE PARTITION_KEYS MODIFY COLUMN PKEY_COMMENT VARCHAR(4000) CHARACTER SET utf8;
ALTER TABLE INDEX_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
```

At the same time, append &useUnicode=true&characterEncoding=UTF-8 to the connection url:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://uplooking01:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
```

## Hive's Maven dependencies

```xml
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <hive-api.version>2.1.0</hive-api.version>
  <hadoop-api.version>2.6.4</hadoop-api.version>
  <hadoop-core.version>1.2.1</hadoop-core.version>
</properties>

<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>${hadoop-core.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-serde</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-service</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-common</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-cli</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>${hive-api.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libfb303</artifactId>
    <version>0.9.0</version>
  </dependency>
</dependencies>
```