How to use Hive


This article explains how to use Hive. The methods introduced here are simple, fast, and practical; let's walk through them.

1. Operation modes (cluster and local)

1.1. Cluster mode (jobs run on the Hadoop cluster): > SET mapred.job.tracker=<jobtracker_host:port>

1.2. Local mode (jobs run on the local machine): > SET mapred.job.tracker=local
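As a quick illustration, a session can switch a small query to local mode and back (a minimal sketch; the table t1 and the JobTracker address master:9001 are hypothetical):

> SET mapred.job.tracker=local
> select count(*) from t1
> SET mapred.job.tracker=master:9001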

2. Three ways to access Hive

2.1. Terminal access:

# hive or # hive --service cli

2.2. Web access (port 9999):

# hive --service hwi &

2.3. Remote service (port 10000):

# hive --service hiveserver &

3. Data type

3.1. Basic data types:

Data type    Length
tinyint      1 byte (-128 ~ 127)
smallint     2 bytes (-2^15 ~ 2^15 - 1)
int          4 bytes (-2^31 ~ 2^31 - 1)
bigint       8 bytes (-2^63 ~ 2^63 - 1)
float        4 bytes, single precision
double       8 bytes, double precision
string       variable-length character data
boolean      true/false

3.2. Complex data types: ARRAY, MAP, STRUCT, UNION
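A minimal sketch of a table that uses these complex types (the table t_complex and its columns are hypothetical):

create table t_complex (id int, scores array<int>, props map<string, string>, addr struct<city:string, zip:string>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':'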

4. Data storage

4.1. Based on HDFS

4.2. Storage structure: database, table, file, view

4.3. Row and column delimiters are specified so the raw data can be parsed

5. Basic operation

5.1. Create a database: > create database db_name

5.2. Switch to a database: > use db_name

5.3. List tables: > show tables

5.4. Create a table

5.4.1. Internal table (default): create table table_name (param_name1 type1, param_name2 type2, ...) row format delimited fields terminated by 'delimiter'

Example: create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t'

An internal table is similar to a database table and is stored on HDFS (the location is set by the hive.metastore.warehouse.dir parameter; every table except external tables is saved there). When an internal table is dropped, both its data and its metadata are deleted.

Load data: load data local inpath 'path' into table table_name
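After loading, you can check from inside the Hive CLI where the file landed (a sketch that assumes the default hive.metastore.warehouse.dir of /user/hive/warehouse):

hive > dfs -ls /user/hive/warehouse/trade_detail;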

5.4.2. Partition table: create table table_name (param_name1 type1, param_name2 type2, ...) partitioned by (param_name type) row format delimited fields terminated by 'delimiter'

Example: create table td_part (id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t'

The difference from a regular table: the data is divided into different partition files, and each partition of the table corresponds to a subdirectory under the table's directory.

Load data: load data local inpath 'path' into table table_name partition (parti_param1='value', parti_param2='value', ...)

Add a partition: alter table partition_table add partition (daytime='2013-02-04')

Drop a partition: alter table partition_table drop partition (daytime='2013-02-04'); the metadata and data files are deleted, but the directory still exists
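To verify the result, the partitions of a table can be listed (a sketch against the partition_table above):

show partitions partition_table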

5.4.3. External table: create external table td_ext (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location 'hdfs_path'

Load data: load data inpath 'hdfs_path' into table table_name

5.4.4. Bucket table: the data is hashed and stored in different files.

Create table: create table bucket_table (id string) clustered by (id) into 4 buckets

Load data:

set hive.enforce.bucketing = true

The setting above must be enabled before data can be loaded.

insert into table bucket_table select name from stu

insert overwrite table bucket_table select name from stu

When data is loaded into a bucket table, the hash value of the bucketing column is taken modulo the number of buckets, and each row is written to the corresponding file.

Sampling query: select * from bucket_table tablesample (bucket 1 out of 4 on id)
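Here "bucket 1 out of 4" means: hash the rows on id into 4 groups and return the first group, i.e. roughly a quarter of the table above. Changing the denominator changes the sample size (a sketch of the same syntax):

select * from bucket_table tablesample (bucket 1 out of 2 on id)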

6. Create a view: CREATE VIEW v1 AS select * from T1

7. Modify a table: alter table tb_name add columns (param_name type)

8. Delete table: drop table tb_name

9. Data import

9.1. Load data: LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2...)]

When data is loaded into a table, no transformation is applied; the LOAD operation simply copies or moves the data file to the location corresponding to the Hive table.

9.2. Copying data between Hive tables: INSERT OVERWRITE TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement FROM from_statement

Create table as select: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name (col_name data_type, ...) ... AS SELECT * FROM tb_name
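A sketch of both forms, reusing trade_detail (5.4.1) and td_part (5.4.2); the partition value and the table name trade_copy are illustrative:

insert overwrite table td_part partition (logdate='2013-02-04') select id, account, income, expenses, time from trade_detail

create table trade_copy as select * from trade_detail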

10. Query

10.1. Syntax

SELECT [ALL | DISTINCT] select_expr, select_expr,...

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list] | [ORDER BY col_list]]

[LIMIT number]

ALL and DISTINCT: whether duplicate rows are returned; ALL (the default) keeps duplicates, DISTINCT removes them

10.2. Partition query

Partition pruning (input pruning) works like a "partition index": it takes effect only when the partition column appears in the WHERE clause, so only the matching partition directories are read.
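For the td_part table from 5.4.2, a query like the following reads only the 2013-02-04 partition directory (a sketch):

select * from td_part where logdate = '2013-02-04'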

10.3. LIMIT clause

LIMIT caps the number of records returned by a query, and the returned rows are chosen at random. Syntax: SELECT * FROM T1 LIMIT 5

10.4. Top N

SET mapred.reduce.tasks = 1

SELECT * FROM sales SORT BY amount DESC LIMIT 5

With a single reduce task, SORT BY yields a total order, so the five rows returned are the global top 5 by amount.

11. Table join

11.1. Inner join: select b.name, a.* from dim_ac a join acinfo b on (a.ac = b.acip) limit 10

11.2. Left outer join: select b.name, a.* from dim_ac a left outer join acinfo b on a.ac = b.acip limit 10

12. Java client

12.1. Start the remote service: # hive --service hiveserver &

12.2. Related code

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Load the HiveServer1 JDBC driver and open a connection.
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
Connection con = DriverManager.getConnection("jdbc:hive://192.168.1.102:10000/wlan_dw", "", "");
Statement stmt = con.createStatement();
String querySQL = "SELECT * FROM wlan_dw.dim_m order by flux desc limit 10";
ResultSet res = stmt.executeQuery(querySQL);
// Print the first five columns of each row, tab-separated.
while (res.next()) {
    System.out.println(res.getString(1) + "\t" + res.getLong(2) + "\t" + res.getLong(3)
            + "\t" + res.getLong(4) + "\t" + res.getLong(5));
}
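Note: for this client to work, the Hive JDBC driver jar and its Hadoop dependencies must be on the classpath, and the hiveserver process started in 12.1 must be listening on port 10000 of the host in the connection URL.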

13. Custom function (UDF)

13.1. A UDF can be applied directly in a select statement to format the query result before it is output.

13.2. When writing UDF functions, you need to pay attention to the following points:

A) A custom UDF needs to extend org.apache.hadoop.hive.ql.exec.UDF.

B) the evaluate function needs to be implemented, and the evaluate function supports overloading.

13.3. Steps

A) package the program on the target machine

B) enter the hive client and add the jar package: hive > add jar /run/jar/udf_test.jar

C) create a temporary function: hive > CREATE TEMPORARY FUNCTION add_example AS 'hive.udf.Add'

D) query HQL statement:

SELECT add_example (8,9) FROM scores

SELECT add_example (scores.math, scores.art) FROM scores

SELECT add_example (6,7,8,6.8) FROM scores

E) destroy temporary function: hive > DROP TEMPORARY FUNCTION add_example

Note: a UDF implements only one-in, one-out operations. To implement many-in, one-out aggregation, you need to implement a UDAF instead.

13.4. Code

package cn.itheima.bigdata.hive;

import java.util.HashMap;

import org.apache.hadoop.hive.ql.exec.UDF;

public class AreaTranslationUDF extends UDF {

    private static HashMap<String, String> areaMap = new HashMap<String, String>();
    static {
        areaMap.put("138", "beijing");
        areaMap.put("139", "shanghai");
        areaMap.put("137", "guangzhou");
        areaMap.put("136", "niuyue");
    }

    // Translates a phone number into its home area by its prefix. The evaluate
    // method must be public, otherwise Hive cannot call it.
    public String evaluate(String phonenbr) {
        String area = areaMap.get(phonenbr.substring(0, 3));
        return area == null ? "other" : area;
    }

    // Overload: returns the sum of two int fields.
    public int evaluate(int x, int y) {
        return x + y;
    }
}
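Putting 13.3 and 13.4 together, the deployed class might be used like this (the jar path, the function name getarea, and the table flow_log are hypothetical):

hive > add jar /run/jar/udf_area.jar
hive > CREATE TEMPORARY FUNCTION getarea AS 'cn.itheima.bigdata.hive.AreaTranslationUDF'
hive > select getarea(phonenbr), count(*) from flow_log group by getarea(phonenbr)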

At this point, you should have a deeper understanding of how to use Hive; the best way to consolidate it is to try these operations in practice.
