Introduction and Application of Hive in Hadoop

2025-01-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article uses Hive, the data warehouse tool of the Hadoop ecosystem, to explain Hive's core concepts, its architecture, and how it is used. After reading it, you should have a working understanding of Hive in Hadoop.

1. Hive core concepts and architecture principles

1.1 Hive concepts

Hive was developed by Facebook to compute statistics over massive volumes of structured logs.

Hive is a Hadoop-based data warehouse tool that maps structured data files to tables and provides SQL-like query functionality.

Essence: Hive translates HQL (Hive SQL) statements into MapReduce programs.
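To make the translation concrete, here is a minimal Python sketch of how a grouped count such as SELECT word, COUNT(*) FROM words GROUP BY word could be expressed as map, shuffle, and reduce phases. It is an illustration only: the table, the function names, and the in-memory shuffle are assumptions made for this sketch, not the code Hive actually generates.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (key, 1) pair per input row, as a mapper would for COUNT(*)."""
    for word in rows:
        yield word, 1


def shuffle_phase(pairs):
    """Group values by key; stands in for the framework's sort-and-shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key, as a reducer would for COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def grouped_count(rows):
    """Chain the three phases for a GROUP BY ... COUNT(*) style query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(grouped_count(["hive", "hadoop", "hive"]))
```

The point of the sketch is the division of labor: the GROUP BY column becomes the shuffle key, and the aggregate function becomes the reducer.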

1.2 Differences between Hive and a traditional database

| Item | Hive | Database |
|---|---|---|
| Query language | HQL | SQL |
| Data storage | HDFS | Raw device or local FS |
| Executor | MapReduce | Executor |
| Data insertion | Batch import / single insert | Single or batch import |
| Data operations | Overwrite, append | Row-level update and delete |
| Data size | Large | Small |
| Execution latency | High | Low |
| Partitions | Supported | Supported |
| Indexes | Index support added after version 0.8 | Complex indexes supported |
| Scalability | High | Limited |
| Data loading mode | Read-time mode (fast) | Write-time mode (slow) |
| Application scenario | Massive data queries | Real-time queries |

Read-time mode (schema on read): Hive does not validate data when loading it into a table; the schema is applied only when the data is queried.

Write-time mode (schema on write): a database such as MySQL validates data when it is inserted into a table.
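The difference between the two modes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Hive or MySQL code: the two-column SCHEMA, the comma delimiter, and both class names are hypothetical, and None stands in for NULL.

```python
SCHEMA = [("id", int), ("name", str)]  # hypothetical two-column table


def coerce_or_null(cast, raw):
    """Apply a column type to one raw field; None stands in for NULL."""
    if raw is None:
        return None
    try:
        return cast(raw)
    except ValueError:
        return None


class SchemaOnWriteTable:
    """Validates each row at insert time, so bad rows are rejected up front."""

    def __init__(self):
        self.rows = []

    def insert(self, line):
        fields = line.split(",")
        if len(fields) != len(SCHEMA):
            raise ValueError("wrong number of fields")
        # int("x") raises ValueError, so a bad value aborts the insert.
        self.rows.append(tuple(cast(f) for (_, cast), f in zip(SCHEMA, fields)))

    def scan(self):
        return list(self.rows)


class SchemaOnReadTable:
    """Stores raw lines untouched; the schema is applied only when scanning."""

    def __init__(self):
        self.lines = []

    def load(self, line):
        self.lines.append(line)  # no validation, which is why loading is fast

    def scan(self):
        rows = []
        for line in self.lines:
            fields = line.split(",")
            # Pad short rows with None so missing columns read as NULL.
            padded = fields + [None] * (len(SCHEMA) - len(fields))
            rows.append(tuple(coerce_or_null(cast, raw)
                              for (_, cast), raw in zip(SCHEMA, padded)))
        return rows
```

In this model, loading "x,bob" into the read-time table succeeds and the bad id only surfaces as a NULL at query time, whereas the write-time table rejects the same line on insert.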

Summary: Hive is suited only to offline statistical analysis of massive data sets, that is, to data warehousing.

1.3 Advantages and disadvantages of Hive

Advantages: the interface uses a SQL-like syntax, which enables rapid development and avoids writing MapReduce programs by hand; Hive also supports user-defined functions, so users can implement functions for their own needs.

Disadvantages: Hive has traditionally not supported record-level insert, update, and delete operations or transactions (later releases, from 0.14 on, add them only for tables explicitly configured as transactional); Hive query latency is high.

1.4 Hive architecture

(1) User interface: CLI (hive shell); JDBC (Java access to Hive); Web UI (browser access to Hive)

(2) metadata: MetaStore

The metadata includes the table name, the database the table belongs to (default is the database named default), the table owner, the column and partition fields, the table type (whether the table is an external table), and the directory where the table's data resides. By default the metadata is stored in the Derby database that ships with Hive; using MySQL to store the MetaStore is recommended.

(3) Hadoop cluster:

HDFS is used to store data and MapReduce is used to calculate.

(4) Driver

Parser (SQL Parser): converts the SQL string into an abstract syntax tree (AST) and then analyzes the AST, for example checking whether the table exists, whether the fields exist, and whether the SQL is semantically valid.

Compiler: compiles the AST into a logical execution plan.

Query Optimizer: optimizes the logical execution plan.

Executor (Execution): converts the logical execution plan into a physical plan that can be run. For Hive, the default is a MapReduce job.

[Figure: process of data analysis through Hive]

2. Hive interaction mode

Start the Hadoop cluster and the MySQL service first.

2.1 Interactive Hive shell

cd /opt/bigdata2.7/hive (the Hive installation path; change it to match your environment)

bin/hive

At the Hive prompt you can run an HQL statement such as show databases; to verify that Hive is working.

2.2 JDBC interaction

Running hiveserver2 starts the Hive server. To check that the hiveserver2 process is up, run netstat -nlp and look for a listener on port 10000.

Because hiveserver2 is a server listening on port 10000, a client is needed to communicate with it: open another terminal window and run the beeline command.

Beeline connection method: !connect jdbc:hive2://node1:10000

Do not omit the leading exclamation mark (!).

Of course, the hiveserver2 server can run in the background:

nohup hiveserver2 &

3. Hive data types

3.1 Basic data types

| Type | Description | Example |
|---|---|---|
| boolean | true/false | true |
| tinyint | 1-byte signed integer | 1 |
| smallint | 2-byte signed integer | 1 |
| int | 4-byte signed integer | 1 |
| bigint | 8-byte signed integer | 1 |
| float | 4-byte single-precision floating-point number | 1.0 |
| double | 8-byte double-precision floating-point number | 1.0 |
| string | String (no declared length) | "adcadfaf" |
| varchar | String (length 1 to 65535) | "adfafdafaf" |
| timestamp | Timestamp | 123454566 |
| date | Date | 20160202 |

3.2 Composite data types

| Type | Description | Example |
|---|---|---|
| array | An ordered collection of fields; all fields must have the same type. Syntax: array(element1, element2) | array(1,2) |
| map | An unordered set of key-value pairs. Syntax: map(k1,v1,k2,v2) | map('a',1,'b',2) |
| struct | A group of named fields; field types may differ. Syntax: struct(element1, element2) | struct('a',1,2,0) |

(1) Accessing the elements of an array field: use a subscript; subscripts start at 0.

For example, to get the first element: array[0]

(2) Accessing a map field: look up the value by its key.

For example, to get the value for the key 'a': map['a']

(3) Accessing the elements of a struct field: use dot notation.

Define a field c of type struct {a int; b string}.

Get the values of a and b: c.a and c.b

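A table that uses all three composite types can be declared as follows (the element types shown are illustrative):

create table complex (col1 array<int>, col2 map<string,int>, col3 struct<a:string,b:int,c:double>);

4. Hive data type conversion

4.1 Implicit type conversion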

The system performs these conversions automatically, without user intervention.

For example, tinyint can be converted to int, and int can be converted to bigint.

All integer types, float, and string can be implicitly converted to double.

tinyint, smallint, and int can all be converted to float.

Boolean cannot be converted to any other type
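The rules listed above can be encoded as a small lookup function. The Python sketch below is a model of this section only, not Hive's full conversion matrix; the function name is hypothetical, and it assumes that the tinyint-to-int example generalizes to any integer type widening to a larger integer type and that nothing converts implicitly to boolean.

```python
# Storage size in bytes of each integer type, used to decide widening.
INTEGER_WIDTH = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def can_convert_implicitly(source, target):
    """Return True if this section's rules allow source -> target implicitly."""
    if source == target:
        return True
    if "boolean" in (source, target):
        # boolean converts to nothing; the reverse direction is an assumption.
        return False
    if source in INTEGER_WIDTH and target in INTEGER_WIDTH:
        # Assumed generalization: an integer widens only to a larger integer.
        return INTEGER_WIDTH[source] < INTEGER_WIDTH[target]
    if target == "double":
        return source in INTEGER_WIDTH or source in ("float", "string")
    if target == "float":
        # The section lists only these three types as convertible to float.
        return source in ("tinyint", "smallint", "int")
    return False
```

For instance, can_convert_implicitly("string", "double") is allowed by the rules, while narrowing conversions such as bigint to int are not and need an explicit cast.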

4.2 Manual type conversion

Use the cast function to convert data types explicitly.

For example, cast('1' as int) converts the string '1' to the integer 1.

If the conversion fails, as in cast('x' as int), the expression returns NULL.
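The "NULL instead of an error" contract can be modeled in Python as follows. This is a hedged sketch of the behavior described above, not Hive's implementation: the function name is hypothetical, None stands in for NULL, and the parsing rules are those of Python's int(), which may differ from Hive's for inputs other than plain integer strings.

```python
def cast_to_int_or_null(value):
    """Return value as an int, or None (standing in for NULL) if it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # A failed conversion yields NULL rather than aborting the query.
        return None


if __name__ == "__main__":
    print(cast_to_int_or_null("1"))
    print(cast_to_int_or_null("x"))
```

Because a failed cast is silent, queries that rely on cast should usually check the result for NULL before using it.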

