2025-03-29 Update From: SLTechnology News & Howtos > Development
Shulou (Shulou.com) 05/31 Report --
Most people are not familiar with the knowledge points in "What are Hive data types?", so this article summarizes them for you. The content is detailed, the steps are clear, and it has real reference value. I hope everyone gains something from reading it. Let's take a look at "What are Hive data types?".
I. Introduction to Hive
Hive is a statistical tool open-sourced by Facebook to analyze massive structured logs.
Hive is a Hadoop-based data warehouse tool: it maps structured data files to tables and provides SQL-like (HiveQL) query capabilities.
Hive's advantages and disadvantages
Pros:
HiveQL is similar to SQL, so it is easy to learn
It avoids hand-writing MapReduce jobs, reducing developers' learning costs
Hive's execution latency is relatively high, so it is typically used for offline data analysis where real-time results are not required
Hive is good at processing big data; it has no advantage for small data sets, again because of its high execution latency
Hive supports user-defined functions, so users can implement functions tailored to their own needs
Disadvantages:
HQL has limited expressive power
Hive's execution efficiency is relatively low
Hive queries are essentially MapReduce jobs
Hive Architecture
Hive user interfaces:
Hive CLI (the Hive command line)
HWI (the Hive Web Interface)
The Thrift service, also known as HiveServer, through which external clients connect to Hive
Three storage modes of Hive metadata
Single-user mode: by default Hive stores metadata in an embedded Derby database, so Hive cannot be used by multiple sessions concurrently
Multi-user mode: a MySQL server stores the metadata, allowing concurrent access
Remote server mode: a standalone MetaStoreServer is started, and clients access metadata through it over Thrift
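As a sketch of the multi-user setup, a hive-site.xml fragment along these lines points the metastore at a MySQL server (the host names, database name, user, and password here are placeholders, not values from this article):

```xml
<configuration>
  <!-- JDBC connection to the metastore database (placeholder host/db) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
  <!-- Remote server mode instead points clients at a standalone metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```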
Hive Data Storage
Hive data can be divided into table data and metadata. Table data is, as the name suggests, the data stored in tables; metadata records table names, columns, partitions, attributes, and so on.
Hive builds on Hadoop's distributed file storage, so its data lives in HDFS. Here are some common ways of importing data into Hive:
Import data from the local file system into Hive
Import data from HDFS into a Hive table
Query records from other tables and insert them into a Hive table
Create a table and populate it with records queried from other tables
#1. Load a local file into Hive
#1.1 Create the table
create table student(id string, name string) row format delimited fields terminated by '\t';
#1.2 Load the local file (default.student is database.table; the bare table name also works)
load data local inpath '/root/student.txt' into table default.student;
#2. Load an HDFS file into Hive
#2.1 Upload the file to the HDFS root directory
dfs -put /root/student.txt /;
#2.2 Load the data from HDFS
load data inpath '/student.txt' into table test.student;
#3. Load data, overwriting the table's existing data
#3.1 Upload the file to HDFS (loading a file into a table moves it, like "cut" in Windows)
dfs -put /root/student.txt /;
#3.2 Load the data, overwriting what is in the table
load data inpath '/student.txt' overwrite into table test.student;
#4. Query the table
select * from student;
#5. Insert data into a table with an insert statement
#5.1 Create the table
create table student_par(id int, name string) row format delimited fields terminated by '\t';
#5.2 Insert data via insert
insert into table student_par values (1,'zhangsan'),(2,'lisi');
Architecture principles
User interface
CLI (command-line interface), JDBC/ODBC (programmatic access to Hive), Web UI (browser access to Hive)
metadata
Metadata includes: table name, database to which the table belongs (default), owner of the table, column/partition field, type of table (whether it is an external table), directory where the data of the table is located, etc.
Hadoop
HDFS for storage and MapReduce for computation.
Driver
(1) SQL Parser: converts the SQL string into an abstract syntax tree (AST), usually with a third-party tool library such as ANTLR, then analyzes the AST: whether the tables exist, whether the fields exist, and whether the SQL is semantically valid.
(2) Compiler (Physical Plan): compiles the AST into a logical execution plan.
(3) Query Optimizer: optimizes the logical execution plan.
(4) Execution: converts the logical execution plan into a runnable physical plan; for Hive, that means MapReduce or Spark jobs.
Hive file format
TextFile
This is the default file format. Data is not compressed, so both disk overhead and data-parsing overhead are large.
SequenceFile
This is a binary file format provided by the Hadoop API; records are serialized into the file in binary form.
RCFile
This format uses a hybrid row/columnar storage structure: rows are horizontally partitioned into row groups, and within each group the data is stored by column.
ORC
The Optimized Row Columnar ORC file format is a columnar storage format in the Hadoop ecosystem.
Advantages of ORC:
Columnar storage with multiple file-compression options
Files are splittable
Multiple kinds of indexes are provided
Complex data structures such as MAP are supported
ORC file format is stored in binary mode, so it is not directly readable.
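As an illustrative sketch (the table and column names here are made up, not from this article), a table can be stored as ORC with a compression codec chosen at creation time:

```sql
-- Create a table stored in the ORC format with ZLIB compression
CREATE TABLE logs_orc (
  id  BIGINT,
  msg STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Copy existing data into it; Hive writes the binary ORC files
INSERT INTO TABLE logs_orc SELECT id, msg FROM logs_text;
```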
Hive essence
Convert HQL to MapReduce.
Hive processes data stored on HDFS
The underlying implementation of Hive analytics data is MapReduce.
Executor runs on Yarn
How Hive works
Hive is simply a query engine. When Hive receives a SQL statement, it does the following:
Lexical and syntax analysis. Parse the SQL statement into an abstract syntax tree using ANTLR
Semantic analysis. Fetch metadata from the MetaStore and resolve the table names, column names, and data types in the SQL statement
Logical plan generation. Generate a logical plan in the form of an operator tree
Logical plan optimization. Optimize the operator tree
Physical plan generation. Turn the logical plan into a physical plan: a DAG of MapReduce tasks
Physical plan execution. Submit the DAG to the Hadoop cluster for execution
Return the query results.
Hive delegates this MapReduce work to the following components:
Metastore: stores information about Hive tables: tables, table partitions, schemas, columns and their types, table-to-data mappings, etc.
Driver: the component that controls the lifecycle of a HiveQL statement
Query compiler
Execution engine
HiveServer
Client components: the Hive CLI, the Web UI, the JDBC driver, etc.
Hive data types
Hive supports two categories of data types: primitive (atomic) types and complex types.
Basic data types:
Type      Description                          Example
TINYINT   1-byte signed integer                1
SMALLINT  2-byte signed integer                1
INT       4-byte signed integer                1
BIGINT    8-byte signed integer                1
FLOAT     4-byte single-precision float        1.0
DOUBLE    8-byte double-precision float        1.0
BOOLEAN   true/false                           true
STRING    character string                     "hive", 'hive'
The STRING data type in Hive is similar to VARCHAR in MySQL: it is a variable-length string type.
Hive supports conversion between data types. Because Hive is written in Java, the conversion rules follow Java's:
Implicit conversion --> small type to large type (widening)
Explicit cast --> large type to small type (narrowing)
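A minimal sketch of both rules (the literal values are made up for illustration):

```sql
-- Implicit (widening) conversion: the INT 1 is promoted to DOUBLE before the addition
SELECT 1 + 2.0;

-- Explicit (narrowing) conversion must use CAST; the fraction is truncated
SELECT CAST(3.7 AS INT);

-- CAST also converts strings; a string that is not a valid number becomes NULL
SELECT CAST('100' AS INT);
```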
Complex data types:
Type    Description                                                                    Example
ARRAY   An ordered collection of fields; all elements must be the same type           array(1, 2)
MAP     Unordered key-value pairs; keys must be a primitive type, values may be any   map('a', 1, 'b', 2)
STRUCT  A named set of fields; field types may differ                                 struct('a', 1, 1.0)
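A sketch of the three complex types in a table definition (the table name, columns, and delimiters are assumptions for illustration):

```sql
-- A table using all three complex types
CREATE TABLE person (
  name    STRING,
  friends ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '_'
  MAP KEYS TERMINATED BY ':';

-- Element access: array index, map key, struct field
SELECT friends[0], scores['math'], address.city FROM person;
```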
The above is the content of "What are Hive data types?". I believe you now have some understanding of it, and I hope what was shared here is helpful. For more related knowledge, please follow the industry information channel.