The following builds on the earlier Hadoop 2.6 cluster deployment: http://lizhenliang.blog.51cto.com/7876557/1661354

Next we install Hive, the Hadoop data warehouse. In the previous section we looked at basic HBase usage; HBase sounds similar to Hive, and the two concepts are easy to blur, so let's first understand the difference between them:

HBase is a distributed, column-oriented NoSQL database. Built on HDFS storage, it stores data in tables composed of rows and columns, with the columns grouped into column families. HBase does not provide a SQL-like query language; if you want to query data SQL-style, you can use Phoenix to translate SQL queries into HBase scans and the corresponding operations, or use the Hive warehouse tool to map HBase tables into Hive.

Hive is a data warehouse that runs on Hadoop. It maps structured data files onto database tables, provides a simple SQL-like query language called HQL, and translates those statements into MapReduce jobs. This makes it convenient to query and analyze data with SQL, and it suits data that does not change frequently. Underneath, Hive can read files stored in HDFS or tables stored in HBase.

Both are technologies built on top of Hadoop, and they handle different kinds of business in the enterprise: Hive is used for offline analysis and statistics, while HBase is used for online queries.
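To get a concrete feel for HQL, here is a minimal sketch (the access_log table name is hypothetical, not part of this deployment): a familiar SQL-style aggregation that Hive compiles into a MapReduce job behind the scenes.

hive> select name, count(*) from access_log group by name;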
There are three ways to store metadata in Hive:

1>. Local Derby storage, which allows only one user to connect to Hive; suitable for test environments (a config sketch follows this list)

2>. Local MySQL storage, which supports multiple users connecting to Hive; suitable for production environments

3>. Remote MySQL storage, where a standalone metastore service is backed by MySQL; also supports multiple users and suits production environments
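For reference, Derby mode needs no external database; below is a minimal hive-site.xml sketch using Hive's stock Derby connection URL, shown here only for contrast with the MySQL setup that follows:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>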
Hive installation and configuration (the configuration below stores metadata in remote MySQL)
1. Create the Hive metadata database and connection user in MySQL
mysql> create database hive;
mysql> grant all on *.* to 'hive'@'%' identified by 'hive';
mysql> flush privileges;
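Note that grant all on *.* is broader than the metastore strictly needs; a tighter alternative (a sketch, not from the original setup) scopes the privileges to the hive database alone:

mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';
mysql> flush privileges;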
2. Install and configure Hive (installed on HMaster0)
# tar zxvf apache-hive-1.2.0-bin.tar.gz
# mv apache-hive-1.2.0-bin /opt
# vi hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.18.210:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_pass</value>
  </property>
</configuration>
3. Configure system variables
# vi /etc/profile
HIVE_HOME=/opt/apache-hive-1.2.0-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH
# source /etc/profile
4. Start Hive
# hive --service metastore &    # start in remote mode; otherwise only local logins are possible
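To confirm the metastore service is listening (it defaults to port 9083), a quick check, assuming net-tools is installed:

# netstat -antp | grep 9083    # the metastore's java (RunJar) process should show as LISTEN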
5. Check whether it starts properly.
Check to see if the process starts:
[root@HMaster0 ~]# jps
2615 DFSZKFailoverController
30027 ResourceManager
29656 NameNode
25451 Jps
10270 HMaster
14975 RunJar    # a RunJar process will be started
Run the hive command to enter the interactive shell:
[root@HMaster0 ~]# hive
Logging initialized using configuration in file:/opt/apache-hive-1.2.0-bin/conf/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.986 seconds, Fetched: 1 row(s)
Listing the databases shows the built-in default library; you can now work with the SQL you are familiar with.
6. Connect a client to Hive (a Hadoop environment is required)
# tar zxvf apache-hive-1.2.0-bin.tar.gz
# mv apache-hive-1.2.0-bin /opt
# vi hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.18.215:9083</value>
  </property>
</configuration>
With the connection information configured, connect from the command line:
# /opt/apache-hive-1.2.0-bin/bin/hive
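A quick smoke test: listing the databases from the client should return the same result as on HMaster0, confirming the client is talking to the remote metastore rather than a local Derby instance (at this point only the default library exists):

hive> show databases;
OK
default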
7. Common Hive SQL commands
7.1 First create a test database

hive> create database test;
7.2 Create the tb1 table, specifying tab as the field delimiter (otherwise NULL will be inserted)

hive> create table tb1 (id int, name string) row format delimited fields terminated by '\t';
If you want to create another table with the same structure as tb1, you can do this:

hive> create table tb3 like tb1;

View the new table's structure:

hive> describe tb3;
OK
id                      int
name                    string
Time taken: 0.091 seconds, Fetched: 2 row(s)
7.3 Import data from a local file into the Hive table
First create a data file whose fields are separated by tab characters:

# cat kv.txt
1	zhangsan
2	lisi
3	wangwu
Then import the data:
hive> load data local inpath '/root/kv.txt' overwrite into table tb1;
7.4 Import data from HDFS into Hive table
# hadoop fs -cat /kv.txt    # view the data to be imported from HDFS
1	zhangsan
2	lisi
3	wangwu
hive> load data inpath '/kv.txt' overwrite into table tb1;
7.5 Verify the import succeeded

hive> select * from tb1;
OK
1	zhangsan
2	lisi
3	wangwu
Time taken: 0.209 seconds, Fetched: 3 row(s)
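As the note in 7.2 warned, rows whose fields are not actually tab-separated cannot be split into columns and come back as NULL. A hypothetical illustration (bad.txt is not part of this walkthrough):

# cat bad.txt
1,zhangsan
hive> load data local inpath '/root/bad.txt' into table tb1;
hive> select * from tb1;    # the new row shows up as: NULL    NULL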
The above covers basic table operations. To improve processing performance, Hive introduces a partitioning mechanism, so let's go over the concept of a partitioned table:
1>. A partitioned table specifies its partition space when the table is created

2>. A table can have one or more partitions, meaning the data is divided into blocks

3>. Partitions appear as fields in the table structure, but they do not store the actual data content

Advantages of partitioned tables: data in the table is assigned to different partitions by condition, which narrows the scope of a query and improves retrieval speed and processing performance. A partition-pruned query is sketched below.
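For example, once the single-partition table tb2 below is created and loaded (see 7.6 and 7.7), filtering on the partition column means Hive only reads the matching directory rather than scanning the whole table:

hive> select name from tb2 where dt='2015-06-26';    # reads only .../tb2/dt=2015-06-26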
Single partition table:
7.6 Create a single-partition table tb2 (there is only one level of directory under the table's HDFS directory):

hive> create table tb2 (id int, name string) partitioned by (dt string) row format delimited fields terminated by '\t';
Note: dt can be understood as the partition name.
7.7 Import data from the file into the Hive partition table and define the partition information
hive> load data local inpath '/root/kv.txt' into table tb2 partition (dt='2015-06-26');
hive> load data local inpath '/root/kv.txt' into table tb2 partition (dt='2015-06-27');
7.8 View table data
hive> select * from tb2;
OK
1	zhangsan	2015-06-26
2	lisi	2015-06-26
3	wangwu	2015-06-26
1	zhangsan	2015-06-27
2	lisi	2015-06-27
3	wangwu	2015-06-27
Time taken: 0.223 seconds, Fetched: 6 row(s)
7.9 View the table directory changes in the HDFS warehouse

# hadoop fs -ls -R /user/hive/warehouse/test.db/tb2
drwxr-xr-x   - root supergroup    0 2015-06-26 04:12 /user/hive/warehouse/test.db/tb2/dt=2015-06-26
-rwxr-xr-x   3 root supergroup   27 2015-06-26 04:12 /user/hive/warehouse/test.db/tb2/dt=2015-06-26/kv.txt
drwxr-xr-x   - root supergroup    0 2015-06-26 04:15 /user/hive/warehouse/test.db/tb2/dt=2015-06-27
-rwxr-xr-x   3 root supergroup   27 2015-06-26 04:15 /user/hive/warehouse/test.db/tb2/dt=2015-06-27/kv.txt

You can see that the data imported into tb2 has been divided into different directories by date.
Multi-partition table:
7.10 Create a multi-partition table tb3 (the table's HDFS directory has a first-level directory with subdirectories under it)

hive> create table tb3 (id int, name string) partitioned by (dt string, location string) row format delimited fields terminated by '\t';
7.11 Import data from a file into the Hive partition table and define partition information
hive> load data local inpath '/root/kv.txt' into table tb3 partition (dt='2015-06-26', location='beijing');
hive> load data local inpath '/root/kv.txt' into table tb3 partition (dt='2015-06-27', location='shanghai');
7.12 View table data
hive> select * from tb3;
OK
1	zhangsan	2015-06-26	beijing
2	lisi	2015-06-26	beijing
3	wangwu	2015-06-26	beijing
1	zhangsan	2015-06-27	shanghai
2	lisi	2015-06-27	shanghai
3	wangwu	2015-06-27	shanghai
Time taken: 0.208 seconds, Fetched: 6 row(s)
7.13 View the table directory changes in the HDFS warehouse

# hadoop fs -ls -R /user/hive/warehouse/test.db/tb3
drwxr-xr-x   - root supergroup    0 2015-06-26 04:35 /user/hive/warehouse/test.db/tb3/dt=2015-06-26
drwxr-xr-x   - root supergroup    0 2015-06-26 04:35 /user/hive/warehouse/test.db/tb3/dt=2015-06-26/location=beijing
-rwxr-xr-x   3 root supergroup   27 2015-06-26 04:35 /user/hive/warehouse/test.db/tb3/dt=2015-06-26/location=beijing/kv.txt
drwxr-xr-x   - root supergroup    0 2015-06-26 04:45 /user/hive/warehouse/test.db/tb3/dt=2015-06-27
drwxr-xr-x   - root supergroup    0 2015-06-26 04:45 /user/hive/warehouse/test.db/tb3/dt=2015-06-27/location=shanghai
-rwxr-xr-x   3 root supergroup   27 2015-06-26 04:45 /user/hive/warehouse/test.db/tb3/dt=2015-06-27/location=shanghai/kv.txt

You can see that under each first-level dt partition directory the data is further divided into location partitions.
7.14 View table partition information
hive> show partitions tb2;
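Given the two loads in 7.7, the output would look something like this:

OK
dt=2015-06-26
dt=2015-06-27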
7.15 query data by partition
hive> select name from tb3 where dt='2015-06-27';
7.16 Rename a partition

hive> alter table tb3 partition (dt='2015-06-27', location='shanghai') rename to partition (dt='20150627', location='shanghai');
7.17 Delete a partition
hive> alter table tb3 drop partition (dt='2015-06-26');
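For a managed table like tb3, dropping a partition also removes its directory and data from HDFS; a quick check (assuming the rename in 7.16 and the drop above both succeeded, only the renamed partition should remain):

# hadoop fs -ls -R /user/hive/warehouse/test.db/tb3    # only dt=20150627/location=shanghai remains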
7.18 Fuzzy-match table names

hive> show tables 'tb*';
7.19 Add a new column to the table

hive> alter table tb1 add columns (commnet string);
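You can verify the change with describe; the output would now show the added column (sketched from the tb1 definition above):

hive> describe tb1;
OK
id                      int
name                    string
commnet                 string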
7.20 Rename a table

hive> alter table tb1 rename to new_tb1;
7.21 Delete a table

hive> drop table new_tb1;
8. Errors encountered during startup
Error 1:
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
This happens because Hadoop 2.6 ships the older jline-0.9.94, which shadows the jline 2.12 that Hive 1.2 requires. The solution is to copy the jline package from hive/lib into hadoop's yarn/lib and remove the old one:

# cp /opt/apache-hive-1.2.0-bin/lib/jline-2.12.jar /opt/hadoop-2.6.0/share/hadoop/yarn/lib/
# rm /opt/hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar
Error 2:
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
This means Hive cannot find the MySQL JDBC driver. The solution is to download the mysql-connector-java package and place it under hive/lib:

# cp mysql-connector-java-5.1.10-bin.jar /opt/apache-hive-1.2.0-bin/lib