This article introduces the installation and use of Hive, walking step by step through setup, basic CRUD operations, the interactive shell, data import and export, HiveQL queries, views, and partition tables.
Preface
Hive is a programming interface to Hadoop. Because its syntax is SQL-like, Hive lets data analysts get started quickly, hides the complexity of the Java world, and makes Hadoop accessible to people other than programmers.
Starting with Hive, analysts too can play with big data.
Contents

1. Installation of Hive
2. Basic use of Hive: CRUD
3. Hive interactive mode
4. Data import
5. Data export
6. Hive query: HiveQL
7. Hive view
8. Hive partition table
1. Installation of Hive
System environment
With the Hadoop environment already installed, we can install Hive on the namenode machine (c1).
For building the Hadoop environment, please refer to the earlier series: "Let Hadoop Run in the Cloud" and "RHadoop Practice: Building the Hadoop Environment".
Download: hive-0.9.0.tar.gz
Extract to: /home/cos/toolkit/hive-0.9.0
Hive configuration
~ cd /home/cos/toolkit/hive-0.9.0
~ cp conf/hive-default.xml.template conf/hive-site.xml
~ cp conf/hive-log4j.properties.template conf/hive-log4j.properties
Modify hive-site.xml configuration file
Store the metadata of Hive in MySQL
~ vi conf/hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
Modify hive-log4j.properties
# log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
Set environment variables
~ sudo vi /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cos/toolkit/ant184/bin:/home/cos/toolkit/jdk16/bin:/home/cos/toolkit/maven3/bin:/home/cos/toolkit/hadoop-1.0.3/bin:/home/cos/toolkit/hive-0.9.0/bin"
JAVA_HOME=/home/cos/toolkit/jdk16
ANT_HOME=/home/cos/toolkit/ant184
MAVEN_HOME=/home/cos/toolkit/maven3
HADOOP_HOME=/home/cos/toolkit/hadoop-1.0.3
HIVE_HOME=/home/cos/toolkit/hive-0.9.0
CLASSPATH=/home/cos/toolkit/jdk16/lib/dt.jar:/home/cos/toolkit/jdk16/lib/tools.jar
Create the required directories on HDFS
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
Create a database in MySQL
CREATE DATABASE hive_metadata;
GRANT ALL ON hive_metadata.* TO hive@'%' IDENTIFIED BY 'hive';
GRANT ALL ON hive_metadata.* TO hive@localhost IDENTIFIED BY 'hive';
ALTER DATABASE hive_metadata CHARACTER SET latin1;
Manually copy the MySQL JDBC driver JAR into hive/lib
~ ls /home/cos/toolkit/hive-0.9.0/lib
mysql-connector-java-5.1.22-bin.jar
Start Hive

# start the metastore service
~ bin/hive --service metastore &
Starting Hive Metastore Server
# start the hiveserver service
~ bin/hive --service hiveserver &
Starting Hive Thrift Server
# start the hive client
~ bin/hive shell
Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt
hive> show tables;
OK
Query the metadata in the MySQL database
~ mysql -uroot -p
mysql> use hive_metadata;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive_metadata   |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| IDXS                      |
| INDEX_PARAMS              |
| PARTITIONS                |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_PRIVS                |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
+---------------------------+
23 rows in set (0.00 sec)
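These tables are how the metastore tracks databases, tables, columns, and partitions. As a quick illustrative check (not part of the original walkthrough), once a Hive table such as t_hive is created below, it shows up in the TBLS table:

mysql> SELECT TBL_NAME, TBL_TYPE FROM TBLS;
# expect one row per Hive table, e.g. t_hive | MANAGED_TABLE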
Hive is now successfully installed. The following is a guide to using it.
2. Basic use of Hive
Enter the Hive console
~ cd /home/cos/toolkit/hive-0.9.0
~ bin/hive shell
Logging initialized using configuration in file:/home/cos/toolkit/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/cos/hive_job_log_cos_201307160003_95040367.txt
hive>
Create a new table

# create the data file (tab-separated)
~ vi /home/cos/demo/t_hive.txt

16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

# create a new table
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds

# import t_hive.txt into the t_hive table
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.397 seconds
View tables and data
# list tables
hive> show tables;
OK
t_hive
Time taken: 0.099 seconds

# match table names with a regex
hive> show tables '*t*';
OK
t_hive
Time taken: 0.065 seconds

# view table data
hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.264 seconds

# view table structure
hive> desc t_hive;
OK
a       int
b       int
c       int
Time taken: 0.1 seconds
Modify the table
# add a field
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a       int
b       int
c       int
new_col string
Time taken: 0.086 seconds

# rename the table
hive> ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds
Delete a table

hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds
hive> show tables;
OK
Time taken: 0.064 seconds

3. Hive interactive mode

The Hive CLI supports the following commands; a short sample session follows the list.
quit, exit: exit the interactive shell
reset: reset configuration to the default values
set <key>=<value>: modify the value of a specific variable (note: if the variable name is misspelled, no error is reported)
set: print the hive configuration variables overridden by the user
set -v: print all configuration variables of Hadoop and Hive
add FILE[S] *, add JAR[S] *, add ARCHIVE[S] *: add one or more files, jars, or archives to the distributed cache
list FILE[S], list JAR[S], list ARCHIVE[S]: print the resources already added to the distributed cache
list FILE[S] *, list JAR[S] *, list ARCHIVE[S] *: check whether the given resources have been added to the distributed cache
delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] *: delete the specified resources from the distributed cache
! <command>: execute a shell command from the Hive shell
dfs <dfs command>: execute a dfs command from the Hive shell
<query string>: execute a Hive query and print the results to standard output
source FILE <filepath>: execute a Hive script file in the CLI
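For illustration, here is a short sample session exercising a few of these commands (the script path is a made-up example; output omitted):

# override a configuration variable, then print its current value
hive> set hive.cli.print.header=true;
hive> set hive.cli.print.header;
# run a shell command and a dfs command without leaving the CLI
hive> ! date;
hive> dfs -ls /user/hive/warehouse;
# execute a HiveQL script file
hive> source /home/cos/demo/demo.hql;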
4. Data import
We continue to use the t_hive table from above as the example.
# create the table structure
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Load data from the local file system (LOCAL)
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.612 seconds

# find the data just imported on HDFS
~ hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Load data from HDFS
# create the table t_hive2
hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

# load data from HDFS
hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
OK
Time taken: 0.325 seconds

# view the data
hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.287 seconds
Import data from other tables
hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.03 sec   HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds

hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.134 seconds
Create a table and import data from another table
# delete the table
hive> DROP TABLE t_hive;
# create a table and import the data from another table
hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.13 sec   HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds

hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.109 seconds
Copy only the table structure, without the data

hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds
Import data from a MySQL database
We will cover this in detail when we introduce Sqoop; a minimal preview is sketched below.
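As a rough preview (a minimal sketch only; mydb and my_table are placeholder names, not from the original article), Sqoop can pull a MySQL table straight into Hive with its --hive-import mode:

~ sqoop import --connect jdbc:mysql://c1:3306/mydb \
    --username hive --password hive \
    --table my_table --hive-import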
5. Data export
Copy from HDFS to other HDFS locations
~ hadoop fs -cp /user/hive/warehouse/t_hive /
~ hadoop fs -ls /t_hive
Found 1 items
-rw-r--r--   1 cos supergroup         56 2013-07-16 10:41 /t_hive/000000_0
~ hadoop fs -cat /t_hive/000000_0
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Export to the local file system through Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds

# view it on the local file system
hive> ! cat /tmp/t_hive/000000_0;
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

6. Hive query: HiveQL
Note: in the listings below, the map/reduce log output has been removed for readability.
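If you would rather not see that output at all in your own sessions, the Hive CLI can also be started in silent mode:

~ bin/hive -S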
Normal queries: sorting, column aliases, nested subqueries
hive> FROM (
    >   SELECT b, c AS c2 FROM t_hive
    > ) t
    > SELECT t.b, t.c2
    > WHERE b > 2
    > LIMIT 2;
12      13
21      3
Join query: JOIN
hive> SELECT t1.a, t1.b, t2.a, t2.b
    > FROM t_hive t1 JOIN t_hive2 t2 ON t1.a = t2.a
    > WHERE t1.c > 10;
61      12      61      12
41      2       41      2
71      2       71      2
1       12      1       12
11      2       11      2
Aggregate query 1: count, avg

hive> SELECT count(*), avg(a) FROM t_hive;
7       31.142857142857142
Aggregate query 2: count, distinct

hive> SELECT count(DISTINCT b) FROM t_hive;
3
Aggregate query 3: GROUP BY, HAVING

# GROUP BY
hive> SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c;
16.0    2       3
56.0    2       62
11.0    2       34
61.0    12      13
1.0     12      34
17.0    21      3

# HAVING
hive> SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c HAVING sum(c) > 30;
56.0    2       62
11.0    2       34
1.0     12      34

7. Hive view
A Hive view works like a database view. Again we use t_hive as the example.
hive> CREATE VIEW v_hive AS SELECT a, b FROM t_hive WHERE c > 30;
hive> select * from v_hive;
41      2
71      2
1       12
11      2
Delete a view

hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds

8. Hive partition table
Partitioned tables are a basic database concept. In many cases the amount of data is small and partitions are not needed at all, but Hive is OLAP data-warehouse software that typically deals with very large data volumes, so partitioned tables are very important in this scenario.
Let's redefine a data table structure: t_hft
Create data
~ vi /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45

~ vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
Create a data table

DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft (
    SecurityID STRING,
    tradeTime STRING,
    PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Create a partition data table
Partition design according to business: by day and stock ID
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft (
    SecurityID STRING,
    tradeTime STRING,
    PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Import data
# 20130627
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);
Copying data from file:/home/cos/demo/t_hft_20130627.csv
Copying file: file:/home/cos/demo/t_hft_20130627.csv
Loading data to table default.t_hft partition (tradedate=20130627)

# 20130628
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);
Copying data from file:/home/cos/demo/t_hft_20130628.csv
Copying file: file:/home/cos/demo/t_hft_20130628.csv
Loading data to table default.t_hft partition (tradedate=20130628)
View partition table
hive> SHOW PARTITIONS t_hft;
tradedate=20130627
tradedate=20130628
Time taken: 0.082 seconds
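On HDFS, each partition is simply a subdirectory of the table's warehouse directory, which is what lets Hive read only the matching directories when a query filters on the partition column. A quick illustrative check (assuming the default warehouse path configured earlier):

~ hadoop fs -ls /user/hive/warehouse/t_hft
# expect one directory per partition:
#   /user/hive/warehouse/t_hft/tradedate=20130627
#   /user/hive/warehouse/t_hft/tradedate=20130628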
Query data
hive> select * from t_hft where securityid='000001';
000001  092023  9.76    20130627
000001  092008  9.7     20130627
000001  092059  9.45    20130627
000001  092023  9.76    20130628
000001  092008  9.7     20130628
000001  092059  9.45    20130628

hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;
000002  091947  8.99    20130627
000005  091514  2.2     20130627
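Partitions can also be managed directly with ALTER TABLE, without loading any data; a minimal sketch (tradeDate=20130629 is a made-up example):

hive> ALTER TABLE t_hft ADD PARTITION (tradeDate=20130629);
hive> ALTER TABLE t_hft DROP PARTITION (tradeDate=20130629);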