This article describes the Hive architecture. It is intended as a practical reference; read on for an overview of each component.
Part I: concepts
Concept
User interface: the entry point through which users access Hive
Metadata: Hive's metadata, such as information about tables and their properties
Interpreter: the component that parses and translates HQL
Compiler: the component that compiles HQL
Optimizer: the component that optimizes HQL
Part II: Hive architecture and basic composition
Architecture diagram
Basic composition
User interfaces, including the CLI, JDBC/ODBC, and the WebUI
Metadata storage, usually in a relational database such as MySQL or Derby
Interpreter, compiler, optimizer, executor
Hadoop: HDFS for storage and MapReduce for computation
Basic functions of each component
There are three main user interfaces: the CLI, JDBC/ODBC, and the WebUI
CLI: the shell command line
JDBC/ODBC: Hive's Java client interface, used much like JDBC against a traditional database
WebUI: access to Hive through a browser
Hive stores its metadata in a database. Currently only MySQL and Derby are supported, with more databases expected in future releases. The metadata in Hive includes the table name, the table's columns and partitions and their attributes, table properties (whether it is an external table, etc.), the directory where the table's data is located, and so on.
The interpreter, compiler, and optimizer take an HQL statement through lexical analysis, syntax analysis, compilation, optimization, and query plan generation. The generated query plan is stored in HDFS and subsequently executed by MapReduce.
Hive's data is stored in HDFS, and most queries are executed as MapReduce jobs (simple queries such as select * from table do not generate MapReduce tasks).
Metastore
The Metastore is the system catalog used to hold metadata about the tables stored in Hive.
The Metastore is a feature that distinguishes Hive from similar systems built on traditional databases such as Oracle and DB2
The Metastore contains the following sections:
Database: the namespace for tables. The default database is named 'default'
Table: a table's metadata includes its columns and their types, owner, storage, and SerDe information
Partition: each partition has its own columns, SerDe, and storage. This design is used to support schema evolution in Hive
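To see what the Metastore has recorded for a particular table, you can ask Hive to print it. A minimal sketch (the table name test is illustrative):
-- Print the columns and types, owner, storage location, SerDe and partition information
DESCRIBE FORMATTED test;
-- List the namespaces and tables the Metastore tracks
SHOW DATABASES;
SHOW TABLES;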
Compiler
The Driver invokes the compiler to process a HiveQL string, which may be a DDL, DML, or query statement
The compiler converts the string into a plan
For DDL statements the plan consists only of metadata operations, and for LOAD statements it consists only of HDFS operations
For inserts and queries, the plan consists of a directed acyclic graph (DAG) of map-reduce tasks
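To inspect the plan the compiler produces for a given statement, Hive lets you prefix it with EXPLAIN. A minimal sketch (the table name is illustrative):
-- Print the plan, including the map-reduce stages that make up the DAG
EXPLAIN
SELECT key, count(*) FROM test GROUP BY key;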
Part III: Hive operation mode
Hive operation mode
Hive's running mode is the execution environment in which its tasks run.
It can be either local mode or cluster mode.
The mode can be specified through mapred.job.tracker
Setting the mode
hive> SET mapred.job.tracker=local
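A rough sketch of switching modes from the CLI; the hive.exec.mode.local.auto property in the last line is a standard Hive setting that lets Hive pick local mode automatically for small jobs, not something specific to this article:
-- Show the current setting
SET mapred.job.tracker;
-- Run tasks locally
SET mapred.job.tracker=local;
-- Optionally let Hive choose local mode automatically for small jobs
SET hive.exec.mode.local.auto=true;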
Part IV: data types
Primitive data types
Integers
TINYINT-1 byte
SMALLINT-2 byte
INT-4 byte
BIGINT-8 byte
Boolean type
BOOLEAN-TRUE/FALSE
Floating point numbers
FLOAT-single precision
DOUBLE-double precision
String type
STRING-sequence of characters in a specified character set
Complex data types
Structs: example {c INT; d INT}
Maps (key-value tuples): for example, M['group'] retrieves the value (e.g. a gid) stored under the key 'group'
Arrays (indexable lists): example ['1','2','3']
TIMESTAMP: a new type added in version 0.8
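A minimal sketch of a table definition that uses these types (table and column names are illustrative):
CREATE TABLE type_demo (
  id      INT,
  score   DOUBLE,
  name    STRING,
  created TIMESTAMP,
  address STRUCT<city:STRING, zip:INT>,
  groups  MAP<STRING, INT>,
  tags    ARRAY<STRING>
);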
Part V: metadata storage of Hive
Storage methods and modes
Hive stores metadata in a database
There are three modes for connecting to the metadata database
Single user mode
Multi-user mode
Remote server mode
Single user mode
This mode connects to an embedded in-memory Derby database and is generally used for unit tests
Multi-user mode
Connecting to a database over a network is the most frequently used mode
Remote server mode
Used when non-Java clients need to access the metadata database: a MetaStoreServer is started on the server side, and clients access the metadata database through the MetaStoreServer using the Thrift protocol
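Which of the three modes is in effect is governed by a few standard hive-site.xml properties; as a rough sketch, their values can be inspected from the Hive CLI (the example values in the comments are placeholders):
-- JDBC URL of the metadata database; a jdbc:mysql://... URL indicates multi-user mode
SET javax.jdo.option.ConnectionURL;
SET javax.jdo.option.ConnectionDriverName;
-- A thrift://host:9083 URI here indicates remote server mode via MetaStoreServer
SET hive.metastore.uris;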
Part VI: data storage of Hive
Basic concepts of Hive data storage
Hive's data storage is based on Hadoop HDFS.
Hive does not have a special data storage format
The storage structure mainly includes: database, file, table and view.
By default Hive can load plain text files directly, and it also supports SequenceFile and RCFile.
When creating a table, you simply tell Hive the column and row delimiters of the data, and Hive can then parse it (see the sketch below).
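A minimal sketch of declaring a table over delimited text (names and delimiters are illustrative):
CREATE TABLE access_log (
  host   STRING,
  status INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;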
Data Model of Hive-Database
A database is similar to a database in a traditional RDBMS.
Under the hood it is recorded as a table in the third-party metastore database.
Simple example
Command line: hive> create database test_database
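A rough sketch of creating and switching to a database (the name follows the example above):
CREATE DATABASE IF NOT EXISTS test_database;
-- Tables created from now on live in this namespace
USE test_database;
SHOW DATABASES;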
Data Model of Hive-Table
Table: internal table
Partition: partitioned table
External Table: external table
Bucket Table: bucketed table
Internal table
Conceptually similar to a table in a traditional database
In Hive, each Table has a corresponding directory in which its data is stored
For example, a table test has the HDFS path /warehouse/test
where warehouse is the data warehouse directory specified by ${hive.metastore.warehouse.dir} in hive-site.xml
All Table data (excluding External Table) is stored in this directory.
When you delete a table, both metadata and data are deleted
Simple example of internal table
Create a data file test_inner_table.txt
Create a table
Create table test_inner_table (key string)
Load data
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE test_inner_table
View data
Select * from test_inner_table
Select count (*) from test_inner_table
Delete the table: drop table test_inner_table
Partition table
A Partition corresponds to a dense index on the partition columns in a traditional database
In Hive, a Partition of a table corresponds to a subdirectory under the table's directory, and all of that partition's data is stored in that subdirectory
For example, if the test table has two partition columns, date and position, then the HDFS subdirectory for date=20120801 and position=zh is /warehouse/test/date=20120801/position=zh
and the HDFS subdirectory for date=20120801 and position=US is /warehouse/test/date=20120801/position=US (see the sketch below)
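A rough sketch of such a partitioned table (the names mirror the example above; the partition columns are backquoted defensively because date in particular can clash with a keyword in newer Hive versions):
CREATE TABLE test (key STRING)
PARTITIONED BY (`date` STRING, `position` STRING);
-- Data loaded into the partition (date='20120801', position='zh') ends up
-- under /warehouse/test/date=20120801/position=zh in HDFS
SHOW PARTITIONS test;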
Simple example of a partition table
Create a data file test_partition_table.txt
Create a table
Create table test_partition_table (key string) partitioned by (dt string)
Load data
LOAD DATA INPATH 'filepath' INTO TABLE test_partition_table partition (dt='2006')
View data
Select * from test_partition_table
Select count (*) from test_partition_table
Delete the table: drop table test_partition_table
External table
An external table points to data that already exists in HDFS, and it can also have partitions
It is organized the same way as an internal table in terms of metadata, but the actual data is stored quite differently.
For an internal table, table creation and data loading can be done in the same statement; during loading the actual data is moved into the data warehouse directory, and subsequent access to the data is served directly from that directory. When the table is deleted, its data and metadata are deleted together
An external table involves only one step: creating the table and loading the data happen at the same time. The actual data is not moved into the data warehouse directory; Hive simply records a link to the external data. When an external table is deleted, only that link is removed, not the data
Simple example of an external table
Create a data file test_external_table.txt
Create a table
Create external table test_external_table (key string)
Load data
LOAD DATA INPATH 'filepath' INTO TABLE test_external_table
View data
Select * from test_external_table
Select count (*) from test_external_table
Delete the table: drop table test_external_table
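Because an external table normally points at data that already exists in HDFS, it is common to give the location explicitly when creating it. A hedged sketch (the path is illustrative):
CREATE EXTERNAL TABLE test_external_table (key STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/test_external';
-- Dropping the table removes only the metadata (the link); the files under
-- /data/test_external remain in HDFS
DROP TABLE test_external_table;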
Bucket Table
A table's data can be further split into different files by hashing on a column.
For example, to split on the age column into 20 files, age is first hashed: rows whose hash value is 0 are written to /warehouse/test/date=20120801/position=zh/part-00000, rows whose hash value is 1 are written to /warehouse/test/date=20120801/position=zh/part-00001, and so on
This is a good choice when you want to run many map tasks in parallel.
Simple example of Bucket Table
Create a data file test_bucket_table.txt
Create a table
Create table test_bucket_table (key string)
Clustered by (key) into 20 buckets
Load data
LOAD DATA INPATH 'filepath' INTO TABLE test_bucket_table
View data
Select * from test_bucket_table
Set hive.enforce.bucketing = true (this must be set before data is written for bucketing to take effect; see the sketch below)
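Note that LOAD DATA copies files as-is and does not redistribute rows into buckets. A rough sketch of populating the bucketed table so that bucketing is actually enforced (the staging table is a hypothetical helper, not part of the original example):
SET hive.enforce.bucketing = true;
-- Load the raw file into a plain staging table first
CREATE TABLE test_staging (key STRING);
LOAD DATA INPATH 'filepath' INTO TABLE test_staging;
-- INSERT ... SELECT lets Hive hash the rows into the 20 bucket files
INSERT OVERWRITE TABLE test_bucket_table
SELECT key FROM test_staging;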
Data Model of Hive-View
Views are similar to views in a traditional database
A view is read-only
If the base table a view is built on changes, newly added columns do not affect the view; if the base table is deleted, the view becomes invalid
If you do not specify the view's columns, they are derived from the select statement
Example
Create view test_view as select * from test
Part VII: introduction to HiveUI
Start UI
Configuration
Add the following to hive-site.xml:
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.8.1.war</value>
</property>
Start the Hive web UI: sh $HIVE_HOME/bin/hive --service hwi
Thank you for reading! That concludes this overview of the Hive architecture; I hope it serves as a useful reference.