
The Construction and basic usage of Hive Environment

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article covers the construction and basic usage of a Hive environment. The methods introduced are simple and practical; let's walk through them step by step.

I. A Brief Introduction to Hive Basics

1. Basic description

Hive is a data warehouse tool built on Hadoop that is used to extract, transform, and load (ETL) data; it can query, analyze, and manage large-scale datasets stored in Hadoop. Hive maps structured data files to database tables and provides SQL query capability: SQL statements are translated into MapReduce jobs and executed at low cost. SQL-like statements therefore give fast MapReduce-based statistics without writing dedicated MapReduce applications, which makes Hive well suited to the statistical analysis of data warehouses.
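To make the "SQL translated into MapReduce" idea concrete, here is a small illustrative Python sketch (not Hive's actual implementation) of how a SQL-style aggregation such as select name, count(*) from t group by name decomposes into a map phase and a reduce phase:

```python
# Illustrative sketch (not Hive's actual code): a SQL-style "group by"
# count expressed as a map phase plus a shuffle/reduce phase.
from itertools import groupby
from operator import itemgetter

def map_phase(rows):
    # Map: emit a (group_key, 1) pair for every input row.
    return [(row["name"], 1) for row in rows]

def reduce_phase(pairs):
    # Shuffle/sort by key, then sum the counts per key (the reduce step).
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

rows = [{"name": "test-user"}, {"name": "dev-user"}, {"name": "test-user"}]
counts = reduce_phase(map_phase(rows))
print(counts)  # {'dev-user': 1, 'test-user': 2}
```

Hive generates and schedules equivalent map and reduce tasks on the cluster automatically; the user only writes the SQL.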

2. Composition and structure

User interface: the client CLI, JDBC access to Hive, and WebUI browser access to Hive.

Metadata: Hive stores metadata in a database such as MySQL or Derby. Hive metadata includes the table name, the table's columns and partitions and their attributes, table-level attributes (whether it is an external table, etc.), the directory where the table's data resides, and so on.

Driver: consists of an interpreter, compiler, and optimizer; an HQL statement goes through lexical analysis, syntax analysis, compilation, optimization, and query-plan generation.

Execution engine: the ExecutionEngine converts the logical execution plan into a runnable physical plan.

Hadoop underpinnings: storage on HDFS, computation with MapReduce, and scheduling via YARN.

Hive receives an interactive request from the client, takes the operation instruction (SQL), translates it into MapReduce jobs, submits them to Hadoop for execution, and finally returns the execution result to the client.

II. Installation of Hive environment

1. Prepare the installation package

Hive 1.2 depends on the Hadoop cluster environment and is installed on the hop01 node.

2. Decompress and rename

tar -zxvf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin/ hive1.2

3. Modify the configuration file

Create the configuration file

[root@hop01 conf]# pwd
/opt/hive1.2/conf
[root@hop01 conf]# mv hive-env.sh.template hive-env.sh

Add content

[root@hop01 conf]# vim hive-env.sh
export HADOOP_HOME=/opt/hadoop2.7
export HIVE_CONF_DIR=/opt/hive1.2/conf

The configuration specifies the Hadoop installation path and the Hive configuration directory.

4. Hadoop configuration

Start HDFS and YARN first, then create the /tmp and /user/hive/warehouse directories on HDFS and grant group write permissions.

bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir -p /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse

5. Start Hive

[root@hop01 hive1.2]# bin/hive

6. Basic operation

View the database

hive> show databases;

Select a database

hive> use default;

View tables

hive> show tables;

Create and use a database

hive> create database mytestdb;
hive> show databases;
default
mytestdb
hive> use mytestdb;

Create a table

create table hv_user (id int, name string, age int);

View table structure

hive> desc hv_user;
id      int
name    string
age     int

Add table data

insert into hv_user values (1, "test-user", 23);

Query table data

hive> select * from hv_user;

Note: observing the query log here makes the process Hive executes clear (the insert above runs as a MapReduce job).

Delete a table

hive> drop table hv_user;

Exit Hive

hive> quit;

View the Hadoop directory

# hadoop fs -ls /user/hive/warehouse
/user/hive/warehouse/mytestdb.db

Databases and data created through Hive are stored on HDFS.

III. Integrating the MySQL 5.7 Environment

Here MySQL 5.7 is already installed with default settings, the relevant login accounts are configured, and the root user's Host is set to % (allowing remote access).

1. Upload MySQL driver package

Upload the MySQL driver dependency package to the lib directory of the hive installation directory.

[root@hop01 lib]# pwd
/opt/hive1.2/lib
[root@hop01 lib]# ll
mysql-connector-java-5.1.27-bin.jar

2. Create hive-site configuration

[root@hop01 conf]# pwd
/opt/hive1.2/conf
[root@hop01 conf]# touch hive-site.xml
[root@hop01 conf]# vim hive-site.xml

3. Configure MySQL storage

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hop01:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>

After the configuration is completed, restart MySQL, Hadoop, and Hive in turn. Checking the MySQL instance then shows that the metastore database and its related tables have been created.
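As a quick sanity check, the metastore properties in hive-site.xml can be read back with Python's standard library. This is a sketch; the embedded fragment mirrors the configuration used in this article:

```python
# Sketch: parse a hive-site.xml fragment with the standard library to
# verify the metastore connection properties. The values below mirror
# the configuration used in this article.
import xml.etree.ElementTree as ET

HIVE_SITE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hop01:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
"""

def read_properties(xml_text):
    # Build a {name: value} dict from every <property> element.
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}

props = read_properties(HIVE_SITE)
print(props["javax.jdo.option.ConnectionDriverName"])  # com.mysql.jdbc.Driver
```

A misspelled property name (a common cause of Hive silently falling back to the embedded Derby metastore) shows up immediately as a missing key.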

4. Start hiveserver2 in the background

[root@hop01 hive1.2]# bin/hiveserver2 &

5. JDBC connection test

[root@hop01 hive1.2]# bin/beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hop01:10000
Connecting to jdbc:hive2://hop01:10000
Enter username for jdbc:hive2://hop01:10000: root (account, press enter)
Enter password for jdbc:hive2://hop01:10000: ****** (password 123456, press enter)
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
0: jdbc:hive2://hop01:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+

IV. Advanced Query Syntax

1. Basic functions

select count(*) count_user from hv_user;
select sum(age) sum_age from hv_user;
select min(age) min_age, max(age) max_age from hv_user;
+----------+----------+
| min_age  | max_age  |
+----------+----------+
| 23       | 25       |
+----------+----------+

2. Conditional query statement

select * from hv_user where name='test-user' limit 1;
+-------------+---------------+--------------+
| hv_user.id  | hv_user.name  | hv_user.age  |
+-------------+---------------+--------------+
| 1           | test-user     | 23           |
+-------------+---------------+--------------+
select * from hv_user where id > 1 AND name like 'dev%';
+-------------+---------------+--------------+
| hv_user.id  | hv_user.name  | hv_user.age  |
+-------------+---------------+--------------+
| 2           | dev-user      | 25           |
+-------------+---------------+--------------+
select count(*) count_name, name from hv_user group by name;
+-------------+------------+
| count_name  | name       |
+-------------+------------+
| 1           | dev-user   |
| 1           | test-user  |
+-------------+------------+

3. Join query
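Hive executes a join like the one below as a MapReduce job; when neither table is small enough to cache in memory, it uses a reduce-side ("common") join: rows from both tables are tagged with their source, shuffled on the join key, and combined per key in the reducers. A minimal, illustrative Python sketch of that idea (the row data is hypothetical and mirrors the example tables; this is not Hive's actual code):

```python
# Illustrative sketch (not Hive's actual code): a reduce-side ("common")
# join. Rows from both tables are grouped under the join key (the shuffle),
# then every user row is paired with every dept row for that key.
from collections import defaultdict

def reduce_side_join(users, depts):
    # "Map" phase: bucket each tagged row under its join key.
    buckets = defaultdict(lambda: {"user": [], "dept": []})
    for row in users:
        buckets[row["id"]]["user"].append(row)
    for row in depts:
        buckets[row["dp_id"]]["dept"].append(row)
    # "Reduce" phase: per key, cross-pair the rows from the two sides.
    joined = []
    for key in sorted(buckets):
        for u in buckets[key]["user"]:
            for d in buckets[key]["dept"]:
                joined.append({**u, **d})
    return joined

users = [{"id": 1, "name": "test-user", "age": 23}]
depts = [{"dp_id": 1, "dp_name": "Technical Department"}]
print(reduce_side_join(users, depts))
```

For a small table, Hive can instead broadcast it to every mapper (a map-side join), avoiding the shuffle entirely.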

select t1.*, t2.* from hv_user t1 join hv_dept t2 on t1.id = t2.dp_id;
+--------+------------+---------+-----------+-----------------------+
| t1.id  | t1.name    | t1.age  | t2.dp_id  | t2.dp_name            |
+--------+------------+---------+-----------+-----------------------+
| 1      | test-user  | 23      | 1         | Technical Department  |
+--------+------------+---------+-----------+-----------------------+

V. Source Code Address

GitHub address: https://github.com/cicadasmile/big-data-parent
GitEE address: https://gitee.com/cicadasmile/big-data-parent

At this point, I believe you have a deeper understanding of "the construction and basic usage of the Hive environment". Why not try it out in practice? Follow us to keep learning!
