
Summary of HBase and Hive

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03

1. What's the difference?

Apache Hive is a data warehouse built on top of the Hadoop infrastructure. Hive lets you query data stored on HDFS using HQL, a SQL-like language whose queries are ultimately translated into MapReduce jobs. Although Hive provides SQL query functionality, it cannot serve interactive queries, because it only runs batch jobs on Hadoop.
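To make the translation from a SQL-like query into MapReduce more concrete, here is a minimal sketch in plain Python of how a query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` could be broken into map, shuffle, and reduce steps. The `LOGS` data and all function names are hypothetical, and this is only a conceptual illustration, not how Hive's planner is actually implemented.

```python
from collections import defaultdict

# Hypothetical rows of a "logs" table: (user, page)
LOGS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
    ("carol", "/home"),
]


def map_phase(rows):
    """Emit (page, 1) for every row, like the map side of COUNT(*) ... GROUP BY page."""
    for _user, page in rows:
        yield page, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key to produce the final counts."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Run the three phases in order over the whole input (a batch job)."""
    return reduce_phase(shuffle(map_phase(rows)))
```

Calling `count_by_page(LOGS)` walks every row before any result is available, which is why this style of execution suits batch work rather than interactive queries.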

Apache HBase is a key/value store that runs on top of HDFS. Unlike Hive, HBase operations run in real time against its database rather than as MapReduce jobs. HBase is partitioned into tables, and tables are further divided into column families. Column families must be declared in the schema; a column family groups a set of related columns (the columns themselves do not require a schema definition). For example, the "message" column family might contain the columns "to", "from", "date", "subject", and "body". Each key/value pair in HBase is defined as a cell, and each key consists of a row key, column family, column, and timestamp. A row in HBase is a collection of key/value mappings uniquely identified by its row key. HBase leverages Hadoop's infrastructure to scale horizontally on commodity hardware.
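The cell structure described above can be sketched as a small in-memory model in Python. This is only a toy illustration of the key layout (row key, column family, column, timestamp); the names `CellKey`, `CELLS`, `latest`, and `row` and the sample data are hypothetical and are not part of any HBase API.

```python
from typing import NamedTuple, Optional


class CellKey(NamedTuple):
    """The four parts that together identify one cell."""
    row_key: str
    family: str
    column: str
    timestamp: int


# Hypothetical cells in a table with a single "message" column family.
CELLS = {
    CellKey("msg-001", "message", "to", 100): "bob@example.com",
    CellKey("msg-001", "message", "subject", 100): "Hello",
    CellKey("msg-001", "message", "subject", 200): "Hello again",
    CellKey("msg-002", "message", "to", 150): "carol@example.com",
}


def latest(cells, row_key, family, column) -> Optional[str]:
    """Return the value with the highest timestamp for one column, if any."""
    matches = [
        (key.timestamp, value)
        for key, value in cells.items()
        if (key.row_key, key.family, key.column) == (row_key, family, column)
    ]
    return max(matches, key=lambda m: m[0])[1] if matches else None


def row(cells, row_key):
    """Collect the newest value of every column that belongs to one row key."""
    result = {}
    # Visit cells oldest first so that newer versions overwrite older ones.
    for key in sorted(cells, key=lambda k: k.timestamp):
        if key.row_key == row_key:
            result[f"{key.family}:{key.column}"] = cells[key]
    return result
```

Here `row(CELLS, "msg-001")` gathers all mappings that share the row key, while the older "subject" value remains stored under its own timestamp.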

2. Characteristics of both

Hive helps people familiar with SQL run MapReduce jobs. Because it is JDBC compliant, it can also be integrated with existing SQL tools. Hive queries can take a long time to run because, by default, they traverse all the data in a table. The amount of data traversed can nevertheless be limited with Hive's partitioning mechanism. Partitions let a filter query over a dataset stored in separate folders read only the data in the specified folders (partitions). This can be used, for example, to process only the files within a certain time range, as long as the names include the date.
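The effect of partitioning can be illustrated with a short Python sketch that selects only the folders whose date falls inside a requested range. The `dt=YYYY-MM-DD` folder naming, the `PARTITIONS` data, and the `prune` function are assumptions made for this example; it mimics the idea of skipping folders and does not call Hive.

```python
# Hypothetical table layout: one folder (partition) per day, each holding files.
PARTITIONS = {
    "dt=2024-01-01": ["a.log", "b.log"],
    "dt=2024-01-02": ["c.log"],
    "dt=2024-01-03": ["d.log", "e.log"],
}


def prune(partitions, start, end):
    """Return only the files in partitions whose date lies in [start, end].

    Dates are ISO strings (YYYY-MM-DD), so plain string comparison
    orders them chronologically.
    """
    selected = []
    for folder, files in sorted(partitions.items()):
        day = folder.split("=", 1)[1]  # "dt=2024-01-02" -> "2024-01-02"
        if start <= day <= end:
            selected.extend(f"{folder}/{name}" for name in files)
    return selected
```

A call such as `prune(PARTITIONS, "2024-01-02", "2024-01-03")` never touches the files under `dt=2024-01-01`, which is the saving that partitioning is meant to provide.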

HBase stores data as key/value pairs. It supports four main operations: adding or updating rows, scanning a range of cells, retrieving specified rows, and deleting specified rows, columns, or column versions. Version information is used to retrieve historical data (the historical data of each row can be deleted, and its space is then reclaimed through HBase compactions). Although HBase has tables, a schema is required only for tables and column families; columns do not require a schema. HBase tables also support increment/counter functionality.
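The four operations and the counter feature can be sketched with a toy in-memory table in Python. The class `ToyTable` and its methods are hypothetical and only mirror the behaviour described above; they are not the HBase client API. One simplification to note: this sketch removes deleted data immediately, whereas the text says HBase frees the space later through compactions.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: only column families are declared up front, columns are free-form."""

    def __init__(self, families):
        self.families = set(families)
        # row_key -> "family:column" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))
        self.clock = 0  # logical timestamp, increased on every write

    def put(self, row_key, family, column, value):
        """Add or update a cell; older versions stay available."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.clock += 1
        self.rows[row_key][f"{family}:{column}"][self.clock] = value

    def get(self, row_key, versions=1):
        """Retrieve one row, with up to `versions` newest values per column."""
        if row_key not in self.rows:
            return {}
        out = {}
        for qualifier, history in self.rows[row_key].items():
            newest = sorted(history.items(), reverse=True)[:versions]
            out[qualifier] = [value for _ts, value in newest]
        return out

    def scan(self, start, stop):
        """Yield (row_key, latest values) for start <= row_key < stop, in key order."""
        for row_key in sorted(self.rows):
            if start <= row_key < stop:
                latest = {q: vals[0] for q, vals in self.get(row_key).items()}
                yield row_key, latest

    def delete(self, row_key, qualifier=None):
        """Delete a whole row, or a single "family:column" when one is given."""
        if row_key not in self.rows:
            return
        if qualifier is None:
            del self.rows[row_key]
            return
        self.rows[row_key].pop(qualifier, None)
        if not self.rows[row_key]:
            del self.rows[row_key]

    def increment(self, row_key, family, column, amount=1):
        """Counter: read the latest value (0 if absent), add, and write it back."""
        current = self.get(row_key).get(f"{family}:{column}", [0])[0]
        self.put(row_key, family, column, current + amount)
        return current + amount
```

For example, after two `put` calls on the same column, `get(row_key, versions=2)` returns both values newest first, which corresponds to retrieving historical data through version information.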

3. Limitations

At the time this summary was written, Hive did not support update operations (later releases added limited UPDATE and DELETE support through transactional tables). Also, because Hive runs batch operations on Hadoop, queries take a long time to return results, usually minutes to hours. Hive requires predefined schemas that map files and directories to columns, and Hive is not ACID compliant.

HBase queries are written in a custom language that has to be learned. SQL-like functionality can be achieved via Apache Phoenix, but at the cost of having to provide a schema. HBase is also not fully ACID compliant, although it supports some ACID properties. Last but not least, running HBase requires ZooKeeper, a service for distributed coordination that provides configuration services, metadata maintenance, and namespace services.

4. Application scenarios

Hive is well suited to analytical queries over data collected over a period of time, for example, to calculate trends or analyze website logs. Hive should not be used for real-time queries, because it takes a long time to return results.

HBase is well suited to real-time queries over big data. Facebook uses HBase for messaging and real-time analytics. It can also be used, for example, to count Facebook connections.

5. Summary

Hive and HBase are two different technologies based on Hadoop: Hive is a SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database on top of Hadoop. The two tools can of course be used together. Just as you might use Google to search and Facebook to socialize, Hive can be used for analytical queries and HBase for real-time queries; data can also be written from Hive to HBase and from HBase back to Hive.

