

What is the basic concept of Hive

2025-02-24 Update From: SLTechnology News&Howtos


This article introduces the basic concepts of Hive: what it is, how it works, its advantages and disadvantages, its architecture, and how it compares with a traditional database.

1. Hive basic concepts

1.1. What is Hive?

Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like query capabilities. It was originally developed at Facebook.

1.2. The essence of Hive

In essence, Hive translates HQL into MapReduce programs.

Workflow: Hive wraps commonly used SQL statements in corresponding MapReduce templates -> the client submits a task to Hive as an SQL statement -> Hive selects the MapReduce program that matches the statement -> the MapReduce program is submitted to YARN to run -> the result is returned to the client.
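To make the HQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` could be broken into map, shuffle, and reduce steps. It is an illustration only, not Hive's actual generated code; the `employees` rows and all function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical input rows (name, dept), standing in for lines of a file on HDFS.
ROWS = [
    ("alice", "sales"),
    ("bob", "engineering"),
    ("carol", "sales"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_count(rows):
    """Chain the three phases the way a single MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_group_by_count(ROWS))
```

The point of the sketch is that the user only writes the SQL-like statement; the map and reduce functions are what Hive supplies on the user's behalf.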

Three points are worth stressing:

The data Hive processes is stored on HDFS. (Be clear about this, or you may get the false impression that Hive is itself a database.)

Under the hood, an HQL statement is executed as MapReduce.

Hive tasks are submitted to YARN to run.
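The first point can be illustrated with a small sketch: the "table" is nothing more than a schema applied to a plain delimited file at read time. The Python below is an assumption-laden toy, not Hive code; the file contents, the schema, and the `read_table` helper are all hypothetical.

```python
# Hypothetical contents of a tab-delimited file in a table's HDFS directory.
RAW_FILE = "1\talice\t30\n2\tbob\t25\n"

# Hypothetical table schema (column name, type), as a metastore might record it.
SCHEMA = [("id", int), ("name", str), ("age", int)]


def read_table(raw_text, schema, delimiter="\t"):
    """Apply the schema to each line at read time and return rows as dicts."""
    rows = []
    for line in raw_text.splitlines():
        fields = line.split(delimiter)
        # Pair each raw field with its column name and cast it to the column type.
        row = {name: cast(value) for (name, cast), value in zip(schema, fields)}
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_table(RAW_FILE, SCHEMA):
        print(row)
```

The file itself never changes; only the interpretation layered on top of it does, which is why Hive should not be mistaken for a database that owns its storage.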

1.3. Advantages and disadvantages of Hive

Advantages:

(1) Hive queries data with an SQL-like language, which lowers the barrier to development.

(2) Hive spares developers from writing lengthy MapReduce programs, which improves development efficiency.

(3) Hive can analyze and compute over massive data sets.

(4) Hive supports user-defined functions, which users can write to suit their own needs.

Disadvantages:

(1) HQL has limited expressive power; the automatically generated MapReduce programs are not very intelligent, and some better algorithms cannot be expressed in it.

(2) Hive does not support row-level updates: data can only be added or appended, not modified or deleted. (Later versions add UPDATE and DELETE for ACID transactional tables, but ordinary tables still behave this way.)

(3) Hive is relatively inefficient: execution latency is high and tuning is coarse-grained.

1.4. How the Hive framework works

(1) Client: the user interfaces, such as the command-line interface (CLI) and the JDBC interface.

(2) Metastore: metadata such as database names, table names, columns, column types, storage directories, and user-defined functions. By default it is stored in the embedded Derby database.

(3) Hadoop: HDFS stores the data, and MapReduce performs the computation.

(4) Driver: parser (SQL -> AST), compiler (AST -> execution plan), optimizer (optimizes the execution plan), executor (execution plan -> MapReduce).
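The four Driver stages can be sketched as a toy pipeline. The Python below is a deliberately tiny illustration under stated assumptions: it understands only `SELECT cols FROM table [LIMIT n]`, its "plan" is a list of tuples, and its executor runs in memory instead of producing MapReduce jobs. None of the function names come from Hive.

```python
import re


def parse(sql):
    """Parser: turn a tiny SELECT statement into an AST (here, a dict)."""
    match = re.fullmatch(
        r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+LIMIT\s+(\d+))?\s*;?\s*",
        sql,
        flags=re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    columns = [col.strip() for col in match.group(1).split(",")]
    limit = int(match.group(3)) if match.group(3) else None
    return {"columns": columns, "table": match.group(2), "limit": limit}


def compile_plan(ast):
    """Compiler: turn the AST into an ordered list of plan steps."""
    plan = [("scan", ast["table"]), ("project", ast["columns"])]
    if ast["limit"] is not None:
        plan.append(("limit", ast["limit"]))
    return plan


def optimize(plan):
    """Optimizer: drop a projection of '*', since it keeps every column anyway."""
    return [step for step in plan if step != ("project", ["*"])]


def execute(plan, tables):
    """Executor: run the plan in memory (Hive would emit MapReduce jobs here)."""
    rows = []
    for op, arg in plan:
        if op == "scan":
            rows = list(tables[arg])
        elif op == "project":
            rows = [{col: row[col] for col in arg} for row in rows]
        elif op == "limit":
            rows = rows[:arg]
    return rows


def run(sql, tables):
    """Drive a statement through all four stages in order."""
    return execute(optimize(compile_plan(parse(sql))), tables)


if __name__ == "__main__":
    # Hypothetical in-memory table standing in for data on HDFS.
    tables = {"people": [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]}
    print(run("SELECT name FROM people LIMIT 1", tables))
```

Keeping each stage as a separate function mirrors the separation of concerns in the list above: each stage consumes only the output of the previous one.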

1.5. Comparison between Hive and a database

(1) Query language: Hive provides an SQL-like query language, HQL, but Hive itself does not store the data it computes on; the data lives on HDFS.

(2) Data updates: Hive does not support deleting or modifying existing data.

(3) Execution latency: Hive's execution latency is high.

(4) Data scale: Hive can analyze and compute over massive data sets.
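Point (2) is easier to picture with a small model. The sketch below treats a table as a directory of immutable files: adding data creates a new file, while "changing" one row means rewriting everything, in the spirit of HiveQL's `INSERT OVERWRITE`. The `AppendOnlyTable` class and its methods are hypothetical names for illustration, not part of Hive.

```python
class AppendOnlyTable:
    """Hypothetical model of a table directory holding immutable data files."""

    def __init__(self):
        # Each entry is a tuple of rows, standing in for one file on HDFS.
        self.files = []

    def append(self, rows):
        """Add new data as a new file; existing files are left untouched."""
        self.files.append(tuple(rows))

    def overwrite(self, transform):
        """Rewrite the whole table: read every row, write one fresh file."""
        new_rows = tuple(transform(row) for file in self.files for row in file)
        self.files = [new_rows]

    def scan(self):
        """Read all rows across all files, in file order."""
        return [row for file in self.files for row in file]


if __name__ == "__main__":
    table = AppendOnlyTable()
    table.append([("alice", 30)])
    table.append([("bob", 25)])
    # Changing bob's age touches every row, not just bob's.
    table.overwrite(lambda row: (row[0], row[1] + 1) if row[0] == "bob" else row)
    print(table.scan())
```

This full-rewrite cost is one reason a single-row change that is cheap in a traditional database is expensive, or simply disallowed, in this model.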

That covers the basic concepts of Hive. Thank you for reading!


© 2024 shulou.com SLNews company. All rights reserved.
