What are the differences between HIVE and HBASE

2025-01-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the differences between Hive and HBase and is intended as a practical reference.

1. What is the difference between the two?

Apache Hive is a data warehouse built on top of Hadoop infrastructure. With Hive, you can use HQL to query data stored on HDFS. HQL is a SQL-like language whose queries are ultimately translated into MapReduce jobs. Although Hive provides SQL query capabilities, it cannot serve interactive queries, because it can only run batch jobs on Hadoop.
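To make the translation into MapReduce more concrete, here is a minimal Python sketch of how a query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` could be split into map, shuffle, and reduce steps. The table name, the sample rows, and all function names are assumptions made for illustration; this is not what Hive actually generates.

```python
from collections import defaultdict

# Hypothetical log rows standing in for files stored on HDFS.
ROWS = [
    {"page": "/home", "user": "a"},
    {"page": "/about", "user": "b"},
    {"page": "/home", "user": "c"},
]

def map_phase(rows):
    """Emit one (page, 1) pair per row, like a mapper for COUNT(*) ... GROUP BY page."""
    for row in rows:
        yield row["page"], 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to produce the final counts."""
    return {key: sum(values) for key, values in grouped.items()}

def count_by_page(rows):
    """Run the three steps in order, as a single batch job would."""
    return reduce_phase(shuffle(map_phase(rows)))

print(count_by_page(ROWS))
```

Every row passes through the map step before any result is available, which is one way to see why this style of execution is batch-oriented rather than interactive.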

Apache HBase is a key/value store that runs on top of HDFS. Unlike Hive, HBase serves operations in real time against its own database instead of running MapReduce jobs. HBase is partitioned into tables, and tables are further divided into column families. Column families must be declared in the schema, and each one groups columns of a certain kind (the columns themselves do not require schema definitions). For example, the "message" column family might contain the columns "to", "from", "date", "subject", and "body". Each key/value pair in HBase is defined as a cell, and each key consists of the row key, column family, column, and timestamp. A row in HBase is a collection of key/value mappings that are uniquely identified by the row key. HBase leverages Hadoop's infrastructure and can scale horizontally on commodity hardware.
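The cell model described above can be sketched in Python as a dictionary keyed by (row key, column family, column, timestamp). The names `CellKey` and `get_row`, and the sample data, are assumptions for illustration only; they are not part of any HBase client API.

```python
from collections import namedtuple

# One cell key: row key, column family, column (qualifier), and timestamp.
CellKey = namedtuple("CellKey", ["row_key", "family", "qualifier", "timestamp"])

# Hypothetical cells in the "message" column family.
cells = {
    CellKey("msg-001", "message", "to", 1): "alice@example.com",
    CellKey("msg-001", "message", "from", 1): "bob@example.com",
    CellKey("msg-001", "message", "subject", 1): "Hi",
    CellKey("msg-001", "message", "subject", 2): "Hello",  # newer version
    CellKey("msg-002", "message", "to", 1): "carol@example.com",
}

def get_row(cells, row_key):
    """Return the newest value of every column that belongs to one row."""
    latest = {}
    # Visit cells from oldest to newest so later versions overwrite earlier ones.
    for key in sorted(cells, key=lambda k: k.timestamp):
        if key.row_key == row_key:
            latest[f"{key.family}:{key.qualifier}"] = cells[key]
    return latest

print(get_row(cells, "msg-001"))
```

Note that only the family name ("message") would have to be declared up front; the column names inside it are just part of each key.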

2. Characteristics of both

Hive helps people who are familiar with SQL run MapReduce jobs. Because it is JDBC compatible, it can also be integrated with existing SQL tools. A Hive query takes a long time to run because, by default, it traverses all the data in the table. The amount of data traversed can, however, be limited with Hive's partitioning mechanism. Partitions let you run filter queries on datasets stored in separate folders, so that a query reads only the data in the specified folders (partitions). This mechanism can be used, for example, to process only files within a certain time range, as long as the date is included in the file names.
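The effect of partitioning can be illustrated with a small Python sketch in which each partition is a folder and a date filter decides which folders are read at all. The folder naming scheme `dt=YYYY-MM-DD`, the sample rows, and the function `scan_partitions` are assumptions chosen for the example, not Hive internals.

```python
# Hypothetical layout: one folder per date, e.g. /logs/dt=2024-01-01/
PARTITIONS = {
    "dt=2024-01-01": [{"page": "/home"}, {"page": "/about"}],
    "dt=2024-01-02": [{"page": "/home"}],
    "dt=2024-01-03": [{"page": "/pricing"}],
}

def scan_partitions(partitions, start, end):
    """Read only folders whose date lies in [start, end].

    Returns the matching rows and the list of folders that were touched.
    """
    rows, touched = [], []
    for folder, data in partitions.items():
        date = folder.split("=", 1)[1]
        if start <= date <= end:  # ISO dates compare correctly as strings
            touched.append(folder)
            rows.extend(data)
    return rows, touched

rows, touched = scan_partitions(PARTITIONS, "2024-01-02", "2024-01-03")
print(touched)
```

The folder for 2024-01-01 is never opened, which is the saving that partitioning is meant to provide.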

HBase works by storing key/value pairs. It supports four main operations: put (add or update rows), scan (retrieve a range of cells), get (return the cells of a specified row), and delete (remove specified rows, columns, or column versions). Versioning makes it possible to retrieve historical data (the history of each row can be deleted, and the space is then reclaimed by HBase compactions). Although HBase has tables, a schema is required only for tables and column families, not for columns. HBase tables also provide increment/counter functionality.
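The four operations and the role of versions can be sketched with a toy in-memory table in Python. The class `ToyTable` and its method signatures are hypothetical and only mimic the behavior described above; they are not the real HBase client API.

```python
import itertools

class ToyTable:
    """In-memory stand-in for an HBase table (illustration only)."""

    def __init__(self):
        self._rows = {}  # row_key -> {column -> [(timestamp, value), ...]}
        self._clock = itertools.count(1)  # logical timestamps

    def put(self, row_key, column, value):
        """Add or update a cell; older values are kept as versions."""
        history = self._rows.setdefault(row_key, {}).setdefault(column, [])
        history.append((next(self._clock), value))

    def get(self, row_key, column, versions=1):
        """Return up to `versions` values of one cell, newest first."""
        history = self._rows.get(row_key, {}).get(column, [])
        return [value for _, value in reversed(history)][:versions]

    def scan(self, start_key, stop_key):
        """Yield (row_key, latest values) for start_key <= row_key < stop_key."""
        for row_key in sorted(self._rows):
            if start_key <= row_key < stop_key:
                columns = self._rows[row_key]
                yield row_key, {c: h[-1][1] for c, h in columns.items()}

    def delete(self, row_key, column=None):
        """Delete a whole row, or a single column when `column` is given."""
        if column is None:
            self._rows.pop(row_key, None)
            return
        columns = self._rows.get(row_key, {})
        columns.pop(column, None)
        if not columns:  # drop rows that no longer hold any cells
            self._rows.pop(row_key, None)

table = ToyTable()
table.put("r1", "message:subject", "Hi")
table.put("r1", "message:subject", "Hello")
table.put("r2", "message:to", "bob")
print(table.get("r1", "message:subject", versions=2))
```

Keeping rows sorted by key is what makes range scans cheap in this model, which is why row-key design matters so much in practice.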

3. Limitations

Hive has traditionally not supported update operations (row-level UPDATE and DELETE on transactional tables were added in Hive 0.14). In addition, because Hive runs batch jobs on Hadoop, a query takes a long time to return results, usually minutes to hours. Hive also requires a predefined schema to map files and directories to columns, and it has historically not been ACID compliant.

HBase queries are written in a custom language that has to be learned. SQL-like functionality can be added through Apache Phoenix, but at the cost of having to provide a schema. In addition, HBase is not fully ACID compliant, although it supports some ACID properties. Last but not least, running HBase requires ZooKeeper, a service for distributed coordination that provides configuration management, metadata maintenance, and naming.

4. Application scenarios

Hive is suitable for analytical queries over data collected over a period of time, for example to calculate trends or analyze website logs. Hive should not be used for real-time queries, because it takes a long time to return results.

HBase is well suited to real-time queries over big data. Facebook uses HBase for messaging and real-time analytics. It can also be used to count connections on Facebook.

5. Summary

Hive and HBase are two different technologies built on Hadoop: Hive is a SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database on top of Hadoop. The two tools can of course be used together. Just as you might search with Google and socialize on Facebook, Hive can be used for analytical queries and HBase for real-time queries. Data can be written from Hive to HBase, and then written back from HBase to Hive.

Thank you for reading! This concludes the article on the differences between Hive and HBase. I hope it has been helpful.


