
How to distinguish HDFS, HBase, and Hive application scenarios

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to distinguish the application scenarios of HDFS, HBase, and Hive, and how they compare with Pig.

Hive

People who do not want to write MapReduce jobs in a programming language, such as DBAs and others who are familiar with SQL, can use Hive for offline data processing and analysis.

Note that Hive is suited to offline data processing. It is not suitable for real-time online queries or operations in a production environment, for one reason: it is slow.

Hive originated at Facebook and plays the role of a data warehouse in Hadoop. It is built on top of a Hadoop cluster and provides a SQL-like interface to the data stored on that cluster. You can use HiveQL for SELECT, JOIN, and so on.

If you have data warehouse requirements and you are good at writing SQL and do not want to write MapReduce jobs, you can use Hive instead.
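To make concrete what Hive saves you from writing, here is a minimal Python sketch of the map, shuffle, and reduce steps behind an aggregation that HiveQL would express in a single statement, such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. The table name, column names, and sample rows are hypothetical, and the code only illustrates the programming model in memory. It is not how Hive itself executes queries.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept).
employees = [
    ("alice", "sales"),
    ("bob", "engineering"),
    ("carol", "sales"),
    ("dave", "engineering"),
    ("erin", "sales"),
]

def map_phase(rows):
    """Emit one (dept, 1) pair per input row."""
    for _name, dept in rows:
        yield dept, 1

def shuffle_phase(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values of each key to get the per-department row count."""
    return {key: sum(values) for key, values in groups.items()}

dept_counts = reduce_phase(shuffle_phase(map_phase(employees)))
print(dept_counts)
```

Even for this simple count, the hand-written version needs three functions, which is the kind of effort a one-line SQL-like query avoids.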

HBase

HBase is a column-oriented database that runs on top of HDFS. HDFS lacks random, real-time read and write operations, which is why HBase exists. HBase is modeled on Google's Bigtable and stores data as key-value pairs. The goal of the project is to quickly locate and access the required data among billions of rows hosted on a cluster.

HBase is a NoSQL database that, like other databases, provides real-time read and write access. Hadoop alone cannot meet real-time needs, but HBase can. If you need to access some data in real time, store it in HBase.

You can use Hadoop as a static data warehouse and HBase as the data store for data that changes as a result of your operations.

Both HBase and Hive are built on top of Hadoop, and both use Hadoop as the underlying storage. HBase serves as a distributed database, while Hive serves as a distributed data warehouse. Hive also relies on Hadoop's MapReduce to execute some of its commands.
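The key-value access pattern described above can be sketched with a toy in-memory table. This is only an illustration under stated assumptions: `ToyTable`, its methods, and the sample row keys are hypothetical names invented here, not the HBase client API. The sketch keeps row keys sorted, which is what makes both single-row reads and key-range scans cheap.

```python
import bisect

class ToyTable:
    """In-memory stand-in for a sorted key-value table (not the HBase API)."""

    def __init__(self):
        self._row_keys = []  # row keys kept in sorted order
        self._rows = {}      # row key -> {column: value}

    def put(self, row_key, column, value):
        """Write one cell; a new row key is inserted at its sorted position."""
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Random read: fetch a single row directly by its key."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Range read: yield rows with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._row_keys, start_key)
        hi = bisect.bisect_left(self._row_keys, stop_key)
        for key in self._row_keys[lo:hi]:
            yield key, dict(self._rows[key])

table = ToyTable()
table.put("user#0002", "info:name", "bob")
table.put("user#0001", "info:name", "alice")
table.put("user#0001", "info:city", "Paris")
table.put("user#0003", "info:name", "carol")

print(table.get("user#0001"))
print(list(table.scan("user#0001", "user#0003")))
```

Rows can be written in any order and updated in place, and a read touches only the requested key or key range rather than the whole data set.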

In what scenarios should I use HBase?

The data analysis subject is mature: the query patterns are established and unlikely to change.

A traditional relational database can no longer bear the load of high-speed inserts and a large volume of reads.

The data is massive but the operations are simple (for example, key-value access).

Official explanation:

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Pig VS Hive

Hive is better suited to data warehouse tasks. It is mainly used for static structures and tasks that require frequent analysis. Hive's similarity to SQL makes it an ideal point of integration between Hadoop and other BI tools.

Pig gives developers more flexibility when working with large data sets and allows them to write concise scripts that transform data flows and can be embedded in larger applications.

Pig is relatively lightweight compared to Hive, and its main advantage is that it can significantly reduce the amount of code compared to using Hadoop Java APIs directly. Because of this, Pig still attracts a large number of software developers.

Both Hive and Pig can be used in combination with HBase. Both also provide high-level language support for HBase, which makes it easy to compute statistics over data stored in HBase.

Hive VS HBase

Hive is a batch processing system based on Hadoop that reduces the work of writing MapReduce jobs, while HBase is a project that makes up for Hadoop's shortcomings in real-time operations.

Imagine you are operating an RDBMS. If the work is a full table scan, use Hive + Hadoop; if it is indexed access, use HBase + Hadoop.

A Hive query runs as MapReduce jobs and can take anywhere from 5 minutes to several hours. For real-time, key-based access, HBase is much more efficient than Hive.
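The difference between the two access patterns can be illustrated with a small Python sketch that counts how many rows each one has to read. The `orders` data set and both function names are hypothetical, and a Python dictionary stands in for the storage layer, so this shows only the shape of the work, not real Hive or HBase performance.

```python
# Hypothetical data set: 100,000 orders keyed by order id.
orders = {f"order#{i:06d}": {"amount": i % 100} for i in range(100_000)}

def total_amount_full_scan(table):
    """Analytical query: must read every row (the Hive + Hadoop style of work)."""
    rows_read = 0
    total = 0
    for row in table.values():
        rows_read += 1
        total += row["amount"]
    return total, rows_read

def amount_by_key(table, key):
    """Point lookup: reads exactly one row by its key (the HBase style of work)."""
    return table[key]["amount"], 1

total, scanned = total_amount_full_scan(orders)
amount, looked_up = amount_by_key(orders, "order#000042")
print(scanned, looked_up)
```

The aggregate has to touch all rows no matter how the data is stored, so it fits batch processing, while the keyed read touches a single row and fits a store built for random access.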
