This article describes the Hive architecture. It is intended as a practical reference; read on for an overview of each component.
Part I: concepts
Concept
User interface: the entry point through which users access Hive
Metadata: Hive's metadata, such as information about tables and their properties
Interpreter: the component that parses and translates HQL
Compiler: the component that compiles HQL
Optimizer: the component that optimizes HQL
Part II: Hive architecture and basic composition
Architecture diagram
Basic composition
User interfaces, including the CLI, JDBC/ODBC, and the WebUI
Metadata storage, usually in a relational database such as MySQL or Derby
Interpreter, compiler, optimizer, executor
Hadoop: HDFS for storage and MapReduce for computation
Basic functions of each component
There are three main user interfaces: the CLI, JDBC/ODBC, and the WebUI
CLI: the shell command line
JDBC/ODBC: Hive's Java client interface, used much like JDBC against a traditional database
WebUI: access to Hive through a browser
Hive stores its metadata in a database. Currently only MySQL and Derby are supported, with more databases expected in future releases. The metadata in Hive includes the table name, the table's columns and partitions and their attributes, table properties (whether it is an external table, etc.), the directory where the table's data is located, and so on.
The interpreter, compiler, and optimizer take an HQL statement through lexical analysis, syntax analysis, compilation, optimization, and query plan generation. The generated query plan is stored in HDFS and subsequently executed by MapReduce.
Hive's data is stored in HDFS, and most queries are executed as MapReduce jobs (simple queries such as select * from table do not generate MapReduce tasks).
Metastore
The Metastore is the system catalog used to hold metadata about the tables stored in Hive.
The Metastore is a feature that distinguishes Hive from similar systems built on traditional databases such as Oracle and DB2
The Metastore contains the following sections:
Database: the namespace for tables. The default database is named 'default'
Table: a table's metadata includes its columns and their types, owner, storage, and SerDe information
Partition: each partition has its own columns, SerDe, and storage. This design is used to support schema evolution in Hive
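To see what the Metastore has recorded for a particular table, you can ask Hive to print it. A minimal sketch (the table name test is illustrative):
-- Print the columns and types, owner, storage location, SerDe and partition information
DESCRIBE FORMATTED test;
-- List the namespaces and tables the Metastore tracks
SHOW DATABASES;
SHOW TABLES;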
Compiler
The Driver invokes the compiler to process a HiveQL string, which may be a DDL, DML, or query statement
The compiler converts the string into a plan
For DDL statements the plan consists only of metadata operations, and for LOAD statements it consists only of HDFS operations
For inserts and queries, the plan consists of a directed acyclic graph (DAG) of map-reduce tasks
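To inspect the plan the compiler produces for a given statement, Hive lets you prefix it with EXPLAIN. A minimal sketch (the table name is illustrative):
-- Print the plan, including the map-reduce stages that make up the DAG
EXPLAIN
SELECT key, count(*) FROM test GROUP BY key;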
Part III: Hive operation mode
Hive operation mode
Hive's running mode is the execution environment in which its tasks run.
It can be either local mode or cluster mode.
The mode can be specified through mapred.job.tracker
Setting the mode
hive> SET mapred.job.tracker=local
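A rough sketch of switching modes from the CLI; the hive.exec.mode.local.auto property in the last line is a standard Hive setting that lets Hive pick local mode automatically for small jobs, not something specific to this article:
-- Show the current setting
SET mapred.job.tracker;
-- Run tasks locally
SET mapred.job.tracker=local;
-- Optionally let Hive choose local mode automatically for small jobs
SET hive.exec.mode.local.auto=true;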
Part IV: data types
Primitive data types
Integers
TINYINT-1 byte
SMALLINT-2 byte
INT-4 byte
BIGINT-8 byte
Boolean type
BOOLEAN-TRUE/FALSE
Floating point numbers
FLOAT-single precision
DOUBLE-double precision
String type
STRING-sequence of characters in a specified character set
Complex data types
Structs: example {c INT; d INT}
Maps (key-value tuples): for example, M['group'] retrieves the value (e.g. a gid) stored under the key 'group'
Arrays (indexable lists): example ['1','2','3']
TIMESTAMP: a new type added in version 0.8
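A minimal sketch of a table definition that uses these types (table and column names are illustrative):
CREATE TABLE type_demo (
  id      INT,
  score   DOUBLE,
  name    STRING,
  created TIMESTAMP,
  address STRUCT<city:STRING, zip:INT>,
  groups  MAP<STRING, INT>,
  tags    ARRAY<STRING>
);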
Part V: metadata storage of Hive
Storage methods and modes
Hive stores metadata in a database
There are three modes for connecting to the metadata database
Single user mode
Multi-user mode
Remote server mode
Single user mode
This mode connects to an embedded in-memory Derby database and is generally used for unit tests
Multi-user mode
Connecting to a database over a network is the most frequently used mode
Remote server mode
Used when non-Java clients need to access the metadata database: a MetaStoreServer is started on the server side, and clients access the metadata database through the MetaStoreServer using the Thrift protocol
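Which of the three modes is in effect is governed by a few standard hive-site.xml properties; as a rough sketch, their values can be inspected from the Hive CLI (the example values in the comments are placeholders):
-- JDBC URL of the metadata database; a jdbc:mysql://... URL indicates multi-user mode
SET javax.jdo.option.ConnectionURL;
SET javax.jdo.option.ConnectionDriverName;
-- A thrift://host:9083 URI here indicates remote server mode via MetaStoreServer
SET hive.metastore.uris;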
Part VI: data storage of Hive
Basic concepts of Hive data storage
Hive's data storage is based on Hadoop HDFS.
Hive does not have a special data storage format
The storage structure mainly includes: database, file, table and view.
By default Hive can load plain text files directly, and it also supports SequenceFile and RCFile.
When creating a table, you simply tell Hive the column and row delimiters of the data, and Hive can then parse it (see the sketch below).
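A minimal sketch of declaring a table over delimited text (names and delimiters are illustrative):
CREATE TABLE access_log (
  host   STRING,
  status INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;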
Data Model of Hive-Database
A database is similar to a database in a traditional RDBMS.
Under the hood it is recorded as a table in the third-party metastore database.
Simple example
Command line: hive> create database test_database
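A rough sketch of creating and switching to a database (the name follows the example above):
CREATE DATABASE IF NOT EXISTS test_database;
-- Tables created from now on live in this namespace
USE test_database;
SHOW DATABASES;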
Data Model of Hive-Table
Table: internal table
Partition: partitioned table
External Table: external table
Bucket Table: bucketed table
Internal table
Conceptually similar to a table in a traditional database
In Hive, each Table has a corresponding directory in which its data is stored
For example, a table test has the HDFS path /warehouse/test
where warehouse is the data warehouse directory specified by ${hive.metastore.warehouse.dir} in hive-site.xml
All Table data (excluding External Table) is stored in this directory.
When you delete a table, both metadata and data are deleted
Simple example of internal table
Create a data file test_inner_table.txt
Create a table
Create table test_inner_table (key string)
Load data
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE test_inner_table
View data
Select * from test_inner_table
Select count (*) from test_inner_table
Delete the table: drop table test_inner_table
Partition table
A Partition corresponds to a dense index on the partition columns in a traditional database
In Hive, a Partition of a table corresponds to a subdirectory under the table's directory, and all of that partition's data is stored in that subdirectory
For example, if the test table has two partition columns, date and position, then the HDFS subdirectory for date=20120801 and position=zh is /warehouse/test/date=20120801/position=zh
and the HDFS subdirectory for date=20120801 and position=US is /warehouse/test/date=20120801/position=US (see the sketch below)
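A rough sketch of such a partitioned table (the names mirror the example above; the partition columns are backquoted defensively because date in particular can clash with a keyword in newer Hive versions):
CREATE TABLE test (key STRING)
PARTITIONED BY (`date` STRING, `position` STRING);
-- Data loaded into the partition (date='20120801', position='zh') ends up
-- under /warehouse/test/date=20120801/position=zh in HDFS
SHOW PARTITIONS test;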
Simple example of a partition table
Create a data file test_partition_table.txt
Create a table
Create table test_partition_table (key string) partitioned by (dt string)
Load data
LOAD DATA INPATH 'filepath' INTO TABLE test_partition_table partition (dt='2006')
View data
Select * from test_partition_table
Select count (*) from test_partition_table
Delete the table: drop table test_partition_table
External table
An external table points to data that already exists in HDFS, and it can also have partitions
It is organized the same way as an internal table in terms of metadata, but the actual data is stored quite differently.
For an internal table, table creation and data loading can be done in the same statement; during loading the actual data is moved into the data warehouse directory, and subsequent access to the data is served directly from that directory. When the table is deleted, its data and metadata are deleted together
An external table involves only one step: creating the table and loading the data happen at the same time. The actual data is not moved into the data warehouse directory; Hive simply records a link to the external data. When an external table is deleted, only that link is removed, not the data
Simple example of an external table
Create a data file test_external_table.txt
Create a table
Create external table test_external_table (key string)
Load data
LOAD DATA INPATH 'filepath' INTO TABLE test_external_table
View data
Select * from test_external_table
Select count (*) from test_external_table
Delete the table: drop table test_external_table
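Because an external table normally points at data that already exists in HDFS, it is common to give the location explicitly when creating it. A hedged sketch (the path is illustrative):
CREATE EXTERNAL TABLE test_external_table (key STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/test_external';
-- Dropping the table removes only the metadata (the link); the files under
-- /data/test_external remain in HDFS
DROP TABLE test_external_table;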
Bucket Table
A table's data can be further split into different files by hashing on a column.
For example, to split on the age column into 20 files, age is first hashed: rows whose hash value is 0 are written to /warehouse/test/date=20120801/position=zh/part-00000, rows whose hash value is 1 are written to /warehouse/test/date=20120801/position=zh/part-00001, and so on
This is a good choice when you want to run many map tasks in parallel.
Simple example of Bucket Table
Create a data file test_bucket_table.txt
Create a table
Create table test_bucket_table (key string)
Clustered by (key) into 20 buckets
Load data
LOAD DATA INPATH 'filepath' INTO TABLE test_bucket_table
View data
Select * from test_bucket_table
Set hive.enforce.bucketing = true (this must be set before data is written for bucketing to take effect; see the sketch below)
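Note that LOAD DATA copies files as-is and does not redistribute rows into buckets. A rough sketch of populating the bucketed table so that bucketing is actually enforced (the staging table is a hypothetical helper, not part of the original example):
SET hive.enforce.bucketing = true;
-- Load the raw file into a plain staging table first
CREATE TABLE test_staging (key STRING);
LOAD DATA INPATH 'filepath' INTO TABLE test_staging;
-- INSERT ... SELECT lets Hive hash the rows into the 20 bucket files
INSERT OVERWRITE TABLE test_bucket_table
SELECT key FROM test_staging;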
Data Model of Hive-View
Views are similar to views in a traditional database
A view is read-only
If the base table a view is built on changes, newly added columns do not affect the view; if the base table is deleted, the view becomes invalid
If you do not specify the view's columns, they are derived from the select statement
Example
Create view test_view as select * from test
Part VII: introduction to HiveUI
Start UI
Configuration
Add the following to hive-site.xml:
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.8.1.war</value>
</property>
Start the Hive web UI: sh $HIVE_HOME/bin/hive --service hwi
Thank you for reading! That concludes this overview of the Hive architecture; I hope it serves as a useful reference.