2018-08-26: Installation and configuration of Hive

Updated 2025-07-09. From: SLTechnology News&Howtos (Shulou.com)

Description:

Hive is a data warehouse analysis system built on Hadoop. It provides rich SQL query capabilities for analyzing data stored in the Hadoop distributed file system (HDFS): it maps structured data files to database tables and converts SQL statements into MapReduce jobs to run them. This SQL dialect is called Hive SQL for short. It lets users who are not familiar with MapReduce query, summarize and analyze data in SQL, while MapReduce developers can plug in their own mappers and reducers to support more complex analysis.

Hive SQL differs slightly from the SQL of relational databases, but it supports the vast majority of statements, such as DDL, DML, common aggregate functions, join queries and conditional queries. Hive is not suitable for online transaction processing and does not provide real-time queries. It is best suited to batch jobs over large amounts of immutable data. Hive features: scalable (devices can be added dynamically to the Hadoop cluster), extensible, fault-tolerant, and loosely coupled to the input format.

1. Upload installation package

Package name: hive-0.12.0-bin.tar.gz

2. Decompress the installation package

[root@hadoop-server01 ~]# ll hive-0.12.0-bin.tar.gz
-rw-r--r--. 1 root root 65662469 Jul 8 18:27 hive-0.12.0-bin.tar.gz

[root@hadoop-server01 ~]# tar -xvf hive-0.12.0-bin.tar.gz -C /usr/local/apps/
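The unpack step can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name unpack_hive is hypothetical, and it assumes only a POSIX shell and tar. It creates the target directory first, because tar -C fails when the directory does not exist.

```shell
#!/bin/sh
# unpack_hive ARCHIVE TARGET_DIR
# Hypothetical helper: extracts a gzip-compressed tarball into TARGET_DIR,
# creating the directory first if it is missing.
unpack_hive() {
    archive=$1
    target=$2
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target"
}

# Usage with the values from this article (run as a user who may write there):
# unpack_hive hive-0.12.0-bin.tar.gz /usr/local/apps
```

The function only defines behavior and does nothing until it is called, so it can be sourced safely from another script.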

3. Modify the configuration file

[root@hadoop-server01 hive-0.12.0-bin]# pwd
/usr/local/apps/hive-0.12.0-bin

[root@hadoop-server01 hive-0.12.0-bin]# ll
total 220
drwxr-xr-x. 3 root root 4096 Oct 9 2013 bin
drwxr-xr-x. 2 root root 4096 Jul 8 18:31 conf
drwxr-xr-x. 4 root root 4096 Oct 9 2013 examples
drwxr-xr-x. 7 root root 4096 Oct 9 2013 hcatalog
drwxr-xr-x. 4 root root 4096 Jul 8 18:31 lib
-rw-rw-r--. 1 root root 23828 Oct 9 2013 LICENSE
-rw-rw-r--. 1 root root 1559 Oct 9 2013 NOTICE
-rw-rw-r--. 1 root root 3832 Oct 9 2013 README.txt
-rw-rw-r--. 1 root root 166087 Oct 9 2013 RELEASE_NOTES.txt
drwxr-xr-x. 3 root root 4096 Oct 9 2013 scripts

Hive can be used directly, without modifying any configuration. In that case the metadata is stored in the embedded Derby database that ships with Hive. This default has a number of disadvantages (for example, the embedded Derby metastore accepts only one session at a time), which will be discussed later.

4. Configure environment variables

Add /usr/local/apps/hive-0.12.0-bin/bin to the PATH environment variable:

[root@hadoop-server01 bin]# vi /etc/profile

export HIVE_HOME=/usr/local/apps/hive-0.12.0-bin
export PATH=$PATH:$ZK_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin

Save and exit, then make the configuration take effect:

[root@hadoop-server01 bin]# source /etc/profile
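Instead of editing the profile by hand, the two export lines can be appended by a script. The sketch below is illustrative only: add_hive_env is a hypothetical name, and unlike the PATH line above it adds only the Hive entry, on the assumption that $ZK_HOME and $HBASE_HOME are specific to the author's machine. It skips the append when a HIVE_HOME export is already present, so running it twice does not duplicate the lines.

```shell
#!/bin/sh
# add_hive_env PROFILE_FILE HIVE_HOME_DIR
# Hypothetical helper: appends HIVE_HOME and PATH exports to PROFILE_FILE
# unless a HIVE_HOME export is already present.
add_hive_env() {
    profile=$1
    hive_home=$2
    if grep -q '^export HIVE_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        printf 'export HIVE_HOME=%s\n' "$hive_home"
        # Single quotes keep $PATH and $HIVE_HOME unexpanded in the file.
        printf '%s\n' 'export PATH=$PATH:$HIVE_HOME/bin'
    } >> "$profile"
}

# Usage with the values from this article:
# add_hive_env /etc/profile /usr/local/apps/hive-0.12.0-bin
# . /etc/profile
```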

5. Run the hive client script to verify the installation

Hive is just a SQL translator: it converts each SQL statement into a MapReduce program, which then needs the MapReduce framework to run. You therefore need to start the Hadoop cluster before running the hive client tool.

(1) Start HDFS and YARN

[root@hadoop-server01 bin]# start-dfs.sh
[root@hadoop-server01 bin]# start-yarn.sh
[root@hadoop-server01 bin]# jps
2854 SecondaryNameNode
3277 NodeManager
2997 ResourceManager
2704 DataNode
2584 NameNode
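Reading the jps listing by eye can be replaced with a small check. This is a sketch under one assumption: that the five daemons shown above are the ones expected on this single-node setup. The function name missing_daemons is hypothetical. It compares the second column of each jps line exactly, so that SecondaryNameNode is not mistaken for NameNode.

```shell
#!/bin/sh
# missing_daemons: reads `jps` output on stdin and prints the name of each
# expected Hadoop daemon that is absent. Prints nothing if all five are up.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Match the daemon name as the whole second column.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit (found ? 0 : 1) }'; then
            echo "$daemon"
        fi
    done
}

# Usage:
# jps | missing_daemons
```

An empty result suggests the cluster processes are running; any printed name is a daemon to investigate before starting hive.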

(2) Start the hive client tool

[root@hadoop-server01 bin]# cd /usr/local/apps/hive-0.12.0-bin/bin
[root@hadoop-server01 bin]# ./hive
hive>

Alternatively, run the hive command directly from any directory, since the environment variable is already configured.
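The client can also be checked without opening the interactive prompt, using the -e option of the Hive CLI to run a single statement. The wrapper below is a hedged sketch: hive_smoke_test is a hypothetical name, and the choice of show tables as the test statement is an assumption, not something the original procedure prescribes.

```shell
#!/bin/sh
# hive_smoke_test: runs one statement non-interactively.
# Returns 127 if the hive command cannot be found on PATH.
hive_smoke_test() {
    if ! command -v hive >/dev/null 2>&1; then
        echo "hive not found on PATH; check HIVE_HOME and /etc/profile" >&2
        return 127
    fi
    hive -e 'show tables;'
}

# Usage:
# hive_smoke_test
```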

(3) Verification test

Assume there is a file named user.txt that contains data of the following form:

userid username orgid logintimes
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29

The fields are separated by tab characters (\t).
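Because tabs are easy to lose when copying text, it may be safer to generate the sample file than to type it. The sketch below writes the five rows shown above with a real tab between fields. The function name make_user_txt is hypothetical, and it assumes the first line (userid username orgid logintimes) only describes the columns and is not part of the file; with a plain delimited table, a header line in the file would be loaded as a data row.

```shell
#!/bin/sh
# make_user_txt FILE
# Hypothetical helper: writes the five sample rows to FILE, tab-separated.
# printf reuses its format for every group of four arguments.
make_user_txt() {
    printf '%s\t%s\t%s\t%s\n' \
        U0001 Zhangsan G0001 10 \
        U0002 Lisi     G0001 12 \
        U0003 Wangwu   G0002 13 \
        U0004 Liuneng  G0002 18 \
        U0005 Zhaosi   G0004 29 > "$1"
}

# Usage:
# make_user_txt user.txt
```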

Now create a Hive table that matches the structure of this file, so that it can store the data.

hive> create table t_user
    > (userid string, username string, orgid string, logintimes int)
    > row format delimited
    > fields terminated by '\t';
Time taken: 0.689 seconds
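The original test stops at creating the table. As a possible next step, the sketch below generates a HiveQL script that loads user.txt and queries it, to be run with the -f option of the Hive CLI. Everything beyond the create table statement is an addition for illustration: the function name write_t_user_hql, the file paths in the usage line, and the two queries are assumptions.

```shell
#!/bin/sh
# write_t_user_hql DATA_FILE HQL_FILE
# Hypothetical helper: writes a HiveQL script that creates t_user if needed,
# loads DATA_FILE from the local filesystem, and runs two queries.
write_t_user_hql() {
    data_file=$1
    hql_file=$2
    # In an unquoted here-document \t stays a literal backslash-t,
    # while $data_file is expanded.
    cat > "$hql_file" <<EOF
create table if not exists t_user
(userid string, username string, orgid string, logintimes int)
row format delimited
fields terminated by '\t';

load data local inpath '$data_file' overwrite into table t_user;

select * from t_user;
select orgid, sum(logintimes) from t_user group by orgid;
EOF
}

# Usage (paths are examples only):
# write_t_user_hql /root/user.txt /tmp/t_user.hql && hive -f /tmp/t_user.hql
```

Given the five sample rows, the grouped query should report 22 for G0001, 31 for G0002 and 29 for G0004. An aggregate of this kind is the sort of statement Hive translates into a MapReduce job, so it also exercises the running cluster.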

Description:

1. The field types in Hive differ considerably from those in relational databases; Hive's data types are basically the same as those in Java.

2. row format delimited fields terminated by '\t' sets the column delimiter that the table expects when data is loaded. Here it is the tab character, \t.
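Hive applies the table definition when the data is read rather than when it is loaded, so a badly delimited line is not rejected at load time; a value that cannot be parsed as the declared type typically comes back as NULL. A pre-load check along the following lines can catch such lines early. This is a sketch only: check_user_file is a hypothetical name, and the rules (four tab-separated fields, integer fourth field) are taken from the t_user definition above.

```shell
#!/bin/sh
# check_user_file FILE
# Hypothetical helper: reports lines that do not have exactly four
# tab-separated fields or whose fourth field is not an integer.
# Exits non-zero if any such line is found.
check_user_file() {
    awk -F '\t' '
        NF != 4 {
            printf "line %d: expected 4 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        $4 !~ /^-?[0-9]+$/ {
            printf "line %d: logintimes is not an integer: %s\n", NR, $4
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage:
# check_user_file user.txt && echo "user.txt matches the t_user layout"
```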

This completes the installation and configuration of Hive and the table test.
