What is the difference between Impala and Hive

2025-01-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains the difference between Impala and Hive. It introduces Impala, describes its architecture and query execution process, and then compares it with Hive.

Introduction to Impala

Impala is a query system developed by Cloudera. It provides SQL semantics and can query petabyte-scale data stored in Hadoop's HDFS and HBase. It is claimed to perform up to 30 times better than Hive, although the actual gain depends on the workload.

Impala depends on Hive's metadata to operate. Its design draws on Google's Dremel system.

Impala uses a distributed query engine similar to those of commercial parallel relational databases, and it queries data in HDFS and HBase directly.

Impala and Hive use the same SQL syntax, ODBC driver, and user interface.

Figure: relationship between Impala and other components

Impala system architecture

Figure: Impala system architecture diagram

Impala is deployed on the same Hadoop platform as Hive, HDFS, HBase, and other tools. It consists mainly of three components: Impalad, State Store, and CLI.

(1) Impalad

Impalad coordinates the execution of queries submitted by clients and comprises three modules: Query Planner, Query Coordinator, and Query Exec Engine. It runs on the same node as an HDFS data node (HDFS DN). When acting as coordinator, it assigns tasks to other Impalad processes and collects their execution results for aggregation. An Impalad also executes tasks assigned to it by other Impalad processes, mainly operating on the data held locally in HDFS and HBase.

(2) State Store

The State Store runs as a statestored process. It collects resource information from each Impalad process in the cluster, and this information is used for query scheduling.

(3) CLI

The CLI gives users a command-line tool for running queries. Impala also provides interfaces for Hue, JDBC, and ODBC.

Description: Impala's metadata is stored directly in the Hive metastore. Impala uses the same metadata, SQL syntax, ODBC driver, and user interface as Hive, so Hive and Impala can be deployed together on one Hadoop platform, which then supports both batch processing and real-time queries.
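The division of labor described above (Impalad processes register with the State Store, and one Impalad acts as coordinator, hands work to the others, and merges their partial results) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class names `StateStore` and `Impalad`, the methods `register`, `execute_fragment`, and `coordinate`, and the sample data are all hypothetical and are not Impala's real API.

```python
from concurrent.futures import ThreadPoolExecutor


class StateStore:
    """Hypothetical stand-in for the State Store: tracks registered Impalad processes."""

    def __init__(self):
        self.members = {}

    def register(self, impalad):
        self.members[impalad.name] = impalad


class Impalad:
    """Hypothetical stand-in for one Impalad process and its node-local data."""

    def __init__(self, name, local_rows, state_store):
        self.name = name
        self.local_rows = local_rows      # pretend these rows live on the local data node
        self.state_store = state_store
        state_store.register(self)        # register at startup, before any query arrives

    def execute_fragment(self, predicate):
        # Work only on local data and return a partial aggregate (count, sum).
        matching = [row for row in self.local_rows if predicate(row)]
        return len(matching), sum(matching)

    def coordinate(self, predicate):
        # Any Impalad can act as coordinator: fan the fragment out to every
        # registered process (itself included), then merge the partial results.
        workers = list(self.state_store.members.values())
        with ThreadPoolExecutor(max_workers=len(workers)) as pool:
            partials = list(pool.map(lambda w: w.execute_fragment(predicate), workers))
        return {
            "count": sum(count for count, _ in partials),
            "sum": sum(total for _, total in partials),
        }


store = StateStore()
nodes = [
    Impalad("impalad-1", [1, 5, 9], store),
    Impalad("impalad-2", [2, 6, 10], store),
    Impalad("impalad-3", [3, 7, 11], store),
]

# Roughly "SELECT COUNT(*), SUM(v) FROM t WHERE v > 4", submitted to impalad-1.
result = nodes[0].coordinate(lambda v: v > 4)
print(result)
```

The sketch leaves out query planning, metadata lookup, and failure handling. It only shows why every node runs the same process: the coordinator role is taken by whichever Impalad receives the query.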

Impala query execution process

Figure: Impala query execution process diagram

The specific process for Impala to execute the query:

Step 0: Before the user submits a query, Impala starts an Impalad process, which is responsible for coordinating queries submitted by clients. The Impalad process sends its registration and subscription information to the Impala State Store. The State Store runs a statestored process, which handles this registration and subscription information using multiple threads.

Step 1: The user submits a query to an Impalad process through the CLI client. The Query Planner of that Impalad parses the SQL statement into a parse tree, then converts the parse tree into several PlanFragments and sends them to the Query Coordinator.

Step 2: The Coordinator obtains the metadata from the MySQL metadata database and the data addresses from the HDFS name node, which together identify all the data nodes that store data relevant to the query.

Step 3: The Coordinator initializes task execution on the corresponding Impalad processes, that is, it assigns the query tasks to all the data nodes that store data relevant to the query.

Step 4: The Query Executor streams its intermediate output, and the Query Coordinator aggregates the results from each Impalad.

Step 5: The Coordinator returns the aggregated results to the CLI client.

Comparison between Impala and Hive

Figure: comparison of Impala and Hive

The differences between Hive and Impala are summarized as follows:

(1) Hive is suited to long-running batch query analysis, while Impala is suited to real-time, interactive SQL queries.

(2) Hive relies on the MapReduce computing framework. Impala represents the execution plan as a complete execution plan tree and distributes it directly to each Impalad for execution.

(3) If memory cannot hold all the data during execution, Hive falls back on external storage so that the query can still run to completion. Impala, as described here, does not use external storage when the data does not fit in memory, so it is subject to certain restrictions on the queries it can process.
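The third difference can be illustrated with a sort that must respect a memory budget. The sketch below is a toy model, not how either engine is implemented: `sort_with_spill` writes sorted runs to temporary files and merges them (the "use external storage" behavior), while `sort_in_memory_only` gives up once the budget is exceeded. All function names, the `MemoryLimitExceeded` exception, and the `max_rows_in_memory` parameter are hypothetical.

```python
import heapq
import pickle
import tempfile


class MemoryLimitExceeded(Exception):
    """Hypothetical error raised when a query does not fit in memory."""


def sort_in_memory_only(rows, max_rows_in_memory):
    # No fallback to disk: the query fails once the buffer is full.
    buffered = []
    for row in rows:
        if len(buffered) >= max_rows_in_memory:
            raise MemoryLimitExceeded(
                f"more than {max_rows_in_memory} rows do not fit in memory"
            )
        buffered.append(row)
    return sorted(buffered)


def _read_run(run_file):
    # Stream one sorted run back from disk, one row at a time.
    run_file.seek(0)
    while True:
        try:
            yield pickle.load(run_file)
        except EOFError:
            return


def sort_with_spill(rows, max_rows_in_memory):
    # External merge sort: spill sorted runs to temporary files, then merge them.
    runs = []
    buffered = []

    def spill():
        run_file = tempfile.TemporaryFile()
        for row in sorted(buffered):
            pickle.dump(row, run_file)
        runs.append(run_file)
        buffered.clear()

    for row in rows:
        buffered.append(row)
        if len(buffered) >= max_rows_in_memory:
            spill()
    if buffered:
        spill()

    try:
        return list(heapq.merge(*(_read_run(run_file) for run_file in runs)))
    finally:
        for run_file in runs:
            run_file.close()


data = [7, 3, 9, 1, 8, 2]
print(sort_with_spill(data, max_rows_in_memory=2))

try:
    sort_in_memory_only(data, max_rows_in_memory=2)
except MemoryLimitExceeded as exc:
    print("query failed:", exc)
```

The spilling version is slower because every row makes a round trip through disk, but it completes for any input size. The in-memory version avoids that cost and fails on inputs that exceed the budget, which is the trade-off the comparison describes.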

The similarities between Hive and Impala are summarized as follows:

(1) Hive and Impala use the same storage pool, and both support storing data in HDFS and HBase.

(2) Hive and Impala use the same metadata.

(3) Hive and Impala interpret SQL in a similar way: both generate the execution plan through lexical analysis.

Summary:

Impala is not intended to replace existing MapReduce tools. Hive and Impala work best in combination: first use Hive for data transformation, then use Impala for fast analysis of the dataset that Hive produces.
