This article introduces the installation and use of Hive, walking step by step through setup, basic CRUD operations, the interactive shell, data import and export, HiveQL queries, views, and partition tables.
Preface
Hive is a programming interface to Hadoop. Because its syntax is SQL-like, Hive lets data analysts get started quickly, hides the complexity of the Java world, and makes Hadoop accessible to people other than programmers.
Starting with Hive, analysts too can play with big data.
Contents

1. Installation of Hive
2. Basic use of Hive: CRUD
3. Hive interactive mode
4. Data import
5. Data export
6. Hive query: HiveQL
7. Hive view
8. Hive partition table
1. Installation of Hive
System environment
With the Hadoop environment already installed, we can install Hive on the namenode machine (c1).
For building the Hadoop environment, please refer to the earlier series: "Let Hadoop Run in the Cloud" and "RHadoop Practice: Building the Hadoop Environment".
Download: hive-0.9.0.tar.gz
Extract to: /home/cos/toolkit/hive-0.9.0
Hive configuration
~ cd /home/cos/toolkit/hive-0.9.0
~ cp conf/hive-default.xml.template conf/hive-site.xml
~ cp conf/hive-log4j.properties.template conf/hive-log4j.properties
Modify hive-site.xml configuration file
Store the metadata of Hive in MySQL
~ vi conf/hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
Modify hive-log4j.properties
# log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
Set environment variables
~ sudo vi /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cos/toolkit/ant184/bin:/home/cos/toolkit/jdk16/bin:/home/cos/toolkit/maven3/bin:/home/cos/toolkit/hadoop-1.0.3/bin:/home/cos/toolkit/hive-0.9.0/bin"
JAVA_HOME=/home/cos/toolkit/jdk16
ANT_HOME=/home/cos/toolkit/ant184
MAVEN_HOME=/home/cos/toolkit/maven3
HADOOP_HOME=/home/cos/toolkit/hadoop-1.0.3
HIVE_HOME=/home/cos/toolkit/hive-0.9.0
CLASSPATH=/home/cos/toolkit/jdk16/lib/dt.jar:/home/cos/toolkit/jdk16/lib/tools.jar
Create the required directories on HDFS
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
Create a database in MySQL
CREATE DATABASE hive_metadata;
GRANT ALL ON hive_metadata.* TO hive@'%' IDENTIFIED BY 'hive';
GRANT ALL ON hive_metadata.* TO hive@localhost IDENTIFIED BY 'hive';
ALTER DATABASE hive_metadata CHARACTER SET latin1;
Manually copy the MySQL JDBC driver JAR into hive/lib
~ ls /home/cos/toolkit/hive-0.9.0/lib
mysql-connector-java-5.1.22-bin.jar
Start Hive

# start the metastore service
~ bin/hive --service metastore &
Starting Hive Metastore Server
# start the hiveserver service
~ bin/hive --service hiveserver &
Starting Hive Thrift Server
# start the hive client
~ bin/hive shell
Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt
hive> show tables;
OK
Query the metadata in the MySQL database
~ mysql -uroot -p
mysql> use hive_metadata;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive_metadata   |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| IDXS                      |
| INDEX_PARAMS              |
| PARTITIONS                |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_PRIVS                |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
+---------------------------+
23 rows in set (0.00 sec)
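These tables are how the metastore tracks databases, tables, columns, and partitions. As a quick illustrative check (not part of the original walkthrough), once a Hive table such as t_hive is created below, it shows up in the TBLS table:

mysql> SELECT TBL_NAME, TBL_TYPE FROM TBLS;
# expect one row per Hive table, e.g. t_hive | MANAGED_TABLE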
Hive is now successfully installed. The following is a guide to using it.
2. Basic use of Hive
Enter the Hive console
~ cd /home/cos/toolkit/hive-0.9.0
~ bin/hive shell
Logging initialized using configuration in file:/home/cos/toolkit/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/cos/hive_job_log_cos_201307160003_95040367.txt
hive>
Create a new table

# create the data file (tab-separated)
~ vi /home/cos/demo/t_hive.txt

16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

# create a new table
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds

# import t_hive.txt into the t_hive table
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.397 seconds
View tables and data
# list tables
hive> show tables;
OK
t_hive
Time taken: 0.099 seconds

# match table names with a regex
hive> show tables '*t*';
OK
t_hive
Time taken: 0.065 seconds

# view table data
hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.264 seconds

# view table structure
hive> desc t_hive;
OK
a       int
b       int
c       int
Time taken: 0.1 seconds
Modify the table
# add a field
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a       int
b       int
c       int
new_col string
Time taken: 0.086 seconds

# rename the table
hive> ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds
Delete a table

hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds
hive> show tables;
OK
Time taken: 0.064 seconds

3. Hive interactive mode

The Hive CLI supports the following commands; a short sample session follows the list.
quit, exit: exit the interactive shell
reset: reset configuration to the default values
set <key>=<value>: modify the value of a specific variable (note: if the variable name is misspelled, no error is reported)
set: print the hive configuration variables overridden by the user
set -v: print all configuration variables of Hadoop and Hive
add FILE[S] *, add JAR[S] *, add ARCHIVE[S] *: add one or more files, jars, or archives to the distributed cache
list FILE[S], list JAR[S], list ARCHIVE[S]: print the resources already added to the distributed cache
list FILE[S] *, list JAR[S] *, list ARCHIVE[S] *: check whether the given resources have been added to the distributed cache
delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] *: delete the specified resources from the distributed cache
! <command>: execute a shell command from the Hive shell
dfs <dfs command>: execute a dfs command from the Hive shell
<query string>: execute a Hive query and print the results to standard output
source FILE <filepath>: execute a Hive script file in the CLI
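For illustration, here is a short sample session exercising a few of these commands (the script path is a made-up example; output omitted):

# override a configuration variable, then print its current value
hive> set hive.cli.print.header=true;
hive> set hive.cli.print.header;
# run a shell command and a dfs command without leaving the CLI
hive> ! date;
hive> dfs -ls /user/hive/warehouse;
# execute a HiveQL script file
hive> source /home/cos/demo/demo.hql;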
4. Data import
We continue to use the t_hive table from above as the example.
# create the table structure
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Load data from the local file system (LOCAL)
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.612 seconds

# find the data just imported on HDFS
~ hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Load data from HDFS
# create the table t_hive2
hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

# load data from HDFS
hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
OK
Time taken: 0.325 seconds

# view the data
hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.287 seconds
Import data from other tables
hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.03 sec   HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds

hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.134 seconds
Create a table and import data from another table
# delete the table
hive> DROP TABLE t_hive;
# create a table and import the data from another table
hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.13 sec   HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds

hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.109 seconds
Copy only the table structure, without the data

hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds
Import data from a MySQL database
We will cover this in detail when we introduce Sqoop; a minimal preview is sketched below.
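As a rough preview (a minimal sketch only; mydb and my_table are placeholder names, not from the original article), Sqoop can pull a MySQL table straight into Hive with its --hive-import mode:

~ sqoop import --connect jdbc:mysql://c1:3306/mydb \
    --username hive --password hive \
    --table my_table --hive-import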
5. Data export
Copy from HDFS to other HDFS locations
~ hadoop fs -cp /user/hive/warehouse/t_hive /
~ hadoop fs -ls /t_hive
Found 1 items
-rw-r--r--   1 cos supergroup         56 2013-07-16 10:41 /t_hive/000000_0
~ hadoop fs -cat /t_hive/000000_0
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Export to the local file system through Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds

# view it on the local file system
hive> ! cat /tmp/t_hive/000000_0;
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

6. Hive query: HiveQL
Note: in the listings below, the map/reduce log output has been removed for readability.
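If you would rather not see that output at all in your own sessions, the Hive CLI can also be started in silent mode:

~ bin/hive -S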
Normal queries: sorting, column aliases, nested subqueries
hive> FROM (
    >   SELECT b, c AS c2 FROM t_hive
    > ) t
    > SELECT t.b, t.c2
    > WHERE b > 2
    > LIMIT 2;
12      13
21      3
Join query: JOIN
hive> SELECT t1.a, t1.b, t2.a, t2.b
    > FROM t_hive t1 JOIN t_hive2 t2 ON t1.a = t2.a
    > WHERE t1.c > 10;
61      12      61      12
41      2       41      2
71      2       71      2
1       12      1       12
11      2       11      2
Aggregate query 1: count, avg

hive> SELECT count(*), avg(a) FROM t_hive;
7       31.142857142857142
Aggregate query 2: count, distinct

hive> SELECT count(DISTINCT b) FROM t_hive;
3
Aggregate query 3: GROUP BY, HAVING

# GROUP BY
hive> SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c;
16.0    2       3
56.0    2       62
11.0    2       34
61.0    12      13
1.0     12      34
17.0    21      3

# HAVING
hive> SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c HAVING sum(c) > 30;
56.0    2       62
11.0    2       34
1.0     12      34

7. Hive view
A Hive view works like a database view. Again we use t_hive as the example.
hive> CREATE VIEW v_hive AS SELECT a, b FROM t_hive WHERE c > 30;
hive> select * from v_hive;
41      2
71      2
1       12
11      2
Delete a view

hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds

8. Hive partition table
Partitioned tables are a basic database concept. In many cases the amount of data is small and partitions are not needed at all, but Hive is OLAP data-warehouse software that typically deals with very large data volumes, so partitioned tables are very important in this scenario.
Let's redefine a data table structure: t_hft
Create data
~ vi /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45

~ vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
Create a data table

DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft (
    SecurityID STRING,
    tradeTime STRING,
    PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Create a partition data table
Partition design according to business: by day and stock ID
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft (
    SecurityID STRING,
    tradeTime STRING,
    PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Import data
# 20130627
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);
Copying data from file:/home/cos/demo/t_hft_20130627.csv
Copying file: file:/home/cos/demo/t_hft_20130627.csv
Loading data to table default.t_hft partition (tradedate=20130627)

# 20130628
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);
Copying data from file:/home/cos/demo/t_hft_20130628.csv
Copying file: file:/home/cos/demo/t_hft_20130628.csv
Loading data to table default.t_hft partition (tradedate=20130628)
View partition table
hive> SHOW PARTITIONS t_hft;
tradedate=20130627
tradedate=20130628
Time taken: 0.082 seconds
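On HDFS, each partition is simply a subdirectory of the table's warehouse directory, which is what lets Hive read only the matching directories when a query filters on the partition column. A quick illustrative check (assuming the default warehouse path configured earlier):

~ hadoop fs -ls /user/hive/warehouse/t_hft
# expect one directory per partition:
#   /user/hive/warehouse/t_hft/tradedate=20130627
#   /user/hive/warehouse/t_hft/tradedate=20130628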
Query data
hive> select * from t_hft where securityid='000001';
000001  092023  9.76    20130627
000001  092008  9.7     20130627
000001  092059  9.45    20130627
000001  092023  9.76    20130628
000001  092008  9.7     20130628
000001  092059  9.45    20130628

hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;
000002  091947  8.99    20130627
000005  091514  2.2     20130627
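Partitions can also be managed directly with ALTER TABLE, without loading any data; a minimal sketch (tradeDate=20130629 is a made-up example):

hive> ALTER TABLE t_hft ADD PARTITION (tradeDate=20130629);
hive> ALTER TABLE t_hft DROP PARTITION (tradeDate=20130629);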