What's the difference between Hive and HBase?

2025-04-21 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the difference between Hive and HBase. The content is simple and clearly organized, and I hope it helps resolve your doubts. Let's work through it together.

What they have in common:

1. Both HBase and Hive are built on top of Hadoop, which provides the underlying storage.

The differences between the two:

2. Hive is a batch processing system built on top of Hadoop that reduces the work of writing MapReduce jobs, while HBase is a project that makes up for Hadoop's shortcomings in real-time operations.

3. Think in terms of a relational database (RDBMS): Hive+Hadoop suits full table scans, and HBase+Hadoop suits indexed access.

4. A Hive query runs as MapReduce jobs and can take anywhere from 5 minutes to several hours. HBase is far more efficient for the keyed reads and writes it is designed for.

5. Hive itself neither stores nor computes data. It relies entirely on HDFS and MapReduce, and the tables in Hive are purely logical.

6. Hive uses Hadoop's MapReduce to execute some of its commands.

7. An HBase table is a physical table, not a logical one. HBase acts like a very large hash table keyed by row key, which search engines use to store indexes and speed up queries.

8. HBase is column-oriented storage and supports adding, modifying, and deleting data. Hive is row-oriented and traditionally only supports adding data.

9. HDFS is the underlying storage: HDFS is the system that stores the files, and HBase is responsible for organizing them.

10. Hive requires HDFS to store files and the MapReduce framework for computation.
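The contrast between full table scans and indexed access can be illustrated with a small toy model. This is only a sketch in plain Python, not Hive or HBase code: the list `rows`, the dictionary `by_key`, and the functions `full_scan` and `keyed_lookup` are hypothetical names invented for the illustration.

```python
# Toy model only: plain Python containers stand in for Hive and HBase.
rows = [{"id": i, "value": f"v{i}"} for i in range(1000)]


def full_scan(rows, wanted_id):
    """Scan-style access: examine every row, as a batch job would."""
    examined = 0
    match = None
    for row in rows:
        examined += 1
        if row["id"] == wanted_id:
            match = row
    return match, examined


# Index-style access: rows are addressed directly by their key.
by_key = {row["id"]: row for row in rows}


def keyed_lookup(by_key, wanted_id):
    """One direct lookup by key, with no scan over the other rows."""
    return by_key.get(wanted_id)


scan_match, scan_examined = full_scan(rows, 742)
key_match = keyed_lookup(by_key, 742)
print("rows examined by the scan:", scan_examined)
print("same row found by both:", scan_match == key_match)
```

Both approaches find the same row, but the scan touches every row while the keyed lookup goes straight to one. That is the trade-off the list describes: scans suit whole-table analysis, keyed access suits single-record reads.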

Chapter 2: Hive Operations

Basic operations:

Join queries:

Inner join:

hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

See how many MapReduce jobs Hive uses for a query:

hive> EXPLAIN SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

Outer joins:

hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);

hive> SELECT sales.*, things.* FROM sales RIGHT OUTER JOIN things ON (sales.id = things.id);

hive> SELECT sales.*, things.* FROM sales FULL OUTER JOIN things ON (sales.id = things.id);

IN subqueries: older versions of Hive do not support them, but LEFT SEMI JOIN gives the same result:

hive> SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);
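To make the semantics of LEFT SEMI JOIN concrete, here is a minimal Python sketch (not Hive code). The sample rows and the helper functions `left_semi_join` and `inner_join` are assumptions made up for this illustration. It shows that a semi join returns each left-hand row at most once and only the left-hand columns, which is what an IN subquery would return.

```python
# Hypothetical sample data standing in for the things and sales tables.
things = [
    {"id": 1, "name": "tie"},
    {"id": 2, "name": "coat"},
    {"id": 3, "name": "hat"},
]
sales = [
    {"id": 2, "buyer": "Joe"},
    {"id": 2, "buyer": "Ann"},
    {"id": 3, "buyer": "Bob"},
]


def left_semi_join(left, right, key):
    """Keep each left row whose key appears in right; never duplicate it."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def inner_join(left, right, key):
    """Emit one combined row per matching (left, right) pair."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


semi = left_semi_join(things, sales, "id")
inner = inner_join(things, sales, "id")
print([row["name"] for row in semi])
print(len(inner))
```

In this sample, the coat was sold twice, so the inner join produces two coat rows while the semi join produces one. That is why a semi join, rather than an inner join, is the substitute for an IN subquery.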

CREATE TABLE ... AS SELECT: the new table must not already exist.

hive> CREATE TABLE target AS SELECT col1, col2 FROM source;

Views:

Create a view:

hive> CREATE VIEW valid_records AS SELECT * FROM records2 WHERE temperature != 9999;

View the details of a view:

hive> DESCRIBE EXTENDED valid_records;

Differences between internal, external, and partitioned tables, and how to operate on them:

1. Internal tables:

When an internal table tt is created with an explicit location, a storage directory for tt's data is created on HDFS. In the author's case, it is hdfs://master/input/table_data.

Load HDFS data into the table:

load data inpath '/input/data' into table tt;

This moves the data from the /input/data directory on HDFS to the /input/table_data directory.

After the tt table is dropped, all of its data and metadata are deleted. In the end there is no data under /input/table_data and, because of the load in the previous step, none under /input/data either!

If you create an internal table without specifying a location, the table directory is created under /user/hive/warehouse/, and everything else works the same way.

Note: load data moves the data!

2. External tables:

create external table et (name string , age string);

At this point, a table directory et is created under /user/hive/warehouse/.

load data inpath '/input/edata' into table et;

This moves the data under /input/edata/ on HDFS to /user/hive/warehouse/et. After this external table is dropped, the data under /user/hive/warehouse/et is not deleted, but the data under /input/edata/ has been gone ever since the load in the previous step. The location of the data has changed! In essence, loading data that is already on HDFS moves it.
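The behaviour described for internal and external tables can be sketched with a toy model on the local file system. This is not Hive or HDFS code: the class `ToyWarehouse`, its methods, the function `demo`, and the file name `part-0.txt` are all hypothetical, and a temporary directory stands in for HDFS. The sketch only mirrors the two rules stated above: loading moves the files, and dropping deletes the data only for an internal table.

```python
import os
import shutil
import tempfile


class ToyWarehouse:
    """Toy stand-in for Hive table metadata plus a warehouse directory."""

    def __init__(self, root):
        self.root = root
        self.tables = {}  # table name -> {"dir": path, "external": bool}

    def create_table(self, name, external=False):
        table_dir = os.path.join(self.root, "warehouse", name)
        os.makedirs(table_dir, exist_ok=True)
        self.tables[name] = {"dir": table_dir, "external": external}
        return table_dir

    def load_data_inpath(self, src_dir, name):
        # Loading moves the files, so the source directory ends up empty.
        table_dir = self.tables[name]["dir"]
        for fname in os.listdir(src_dir):
            shutil.move(os.path.join(src_dir, fname),
                        os.path.join(table_dir, fname))

    def drop_table(self, name):
        # Metadata always goes; data goes only for internal tables.
        meta = self.tables.pop(name)
        if not meta["external"]:
            shutil.rmtree(meta["dir"])


def demo(external):
    """Return (source_emptied, data_survives_drop) for one table kind."""
    root = tempfile.mkdtemp()
    try:
        src = os.path.join(root, "input")
        os.makedirs(src)
        with open(os.path.join(src, "part-0.txt"), "w") as f:
            f.write("alice\t30\n")
        warehouse = ToyWarehouse(root)
        table_dir = warehouse.create_table("t", external=external)
        warehouse.load_data_inpath(src, "t")
        source_emptied = os.listdir(src) == []
        warehouse.drop_table("t")
        data_survives = os.path.exists(os.path.join(table_dir, "part-0.txt"))
        return source_emptied, data_survives
    finally:
        shutil.rmtree(root)


internal_result = demo(external=False)
external_result = demo(external=True)
print("internal:", internal_result)
print("external:", external_result)
```

In both cases the source directory is emptied by the load. Only the external table's files are still present after the drop.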

3. Partitioned tables:

When you import data into a partition of an internal table, Hive creates the partition directory and copies the data into it:

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

Add data to a partition of an external table:

ALTER TABLE log_messages ADD IF NOT EXISTS PARTITION(year = 2012, month = 1, day = 2)
LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';

Note: Hive does not check whether the partition directory exists or whether it contains any data. If it does not, queries on that partition simply return no results.

Load data from the local file system:

LOAD DATA LOCAL INPATH "/opt/data/1.txt" INTO TABLE table1;

Load data from HDFS (note that there is no LOCAL keyword):

LOAD DATA INPATH "/data/datawash/1.txt" INTO TABLE table1;

This moves /data/datawash/1.txt on HDFS into the directory where table1 is located.

Loading with OVERWRITE works like this:

LOAD DATA LOCAL INPATH "/opt/data/1.txt" OVERWRITE INTO TABLE table1;

Adding OVERWRITE replaces any data that already exists in the table. If you are sure the table holds no data yet, OVERWRITE makes no difference.

Chapter 3: HBase Operations

Syntax:

ZooKeeper must be installed before installing HBase; see http://xxx

Operation: command expression

Create a table: create 'table_name', 'family1', 'family2', 'familyN'

Add (update) a record: put 'table_name', 'rowkey', 'family:column', 'value'

View a record: get 'table_name', 'rowkey'

View the total number of records in a table: count 'table_name'

Delete a record: delete 'table_name', 'rowkey', 'family:column' (one cell) or deleteall 'table_name', 'rowkey' (the whole row)

Delete a table: first disable 'table_name', then drop 'table_name'

View all records: scan 'table_name'. This is very dangerous on a large table, so it is best to add LIMIT: scan 'table_name', {LIMIT => 10}

View all data in a column of a table: scan 'table_name', {COLUMNS => ['family1:', 'family2'], VERSIONS => 2}. VERSIONS is optional.
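As a mental model for these commands, an HBase table can be pictured as a map from row key to a map of 'family:column' cells. The Python sketch below is a toy model only, not the HBase client API: the class `ToyHBaseTable` and its methods are hypothetical names chosen to mirror the shell commands, and it deliberately ignores versions, timestamps, and persistence.

```python
class ToyHBaseTable:
    """Toy model: row key -> {'family:column': value}. No versions."""

    def __init__(self, *families):
        self.families = set(families)
        self.rows = {}

    def put(self, rowkey, column, value):
        # Column families are fixed at creation; columns are created freely.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(rowkey, {})[column] = value

    def get(self, rowkey, column=None):
        row = self.rows.get(rowkey, {})
        if column is None:
            return dict(row)  # the whole row
        if ":" in column:
            return {c: v for c, v in row.items() if c == column}  # one cell
        return {c: v for c, v in row.items()
                if c.startswith(column + ":")}  # one column family

    def delete(self, rowkey, column):
        row = self.rows.get(rowkey, {})
        row.pop(column, None)
        if not row:  # a row with no cells left no longer exists
            self.rows.pop(rowkey, None)

    def deleteall(self, rowkey):
        self.rows.pop(rowkey, None)

    def count(self):
        return len(self.rows)

    def scan(self, limit=None):
        keys = sorted(self.rows)  # rows come back ordered by row key
        if limit is not None:
            keys = keys[:limit]
        return [(key, dict(self.rows[key])) for key in keys]


table = ToyHBaseTable("course", "device")
table.put("Li Lei", "course:Math", "90")
table.put("Han Meimei", "course:English", "92")
table.put("Li Lei", "device:laptop", "Asus")
table.put("Li Lei", "course:Math", "95")  # put on an existing cell updates it
rows_before = table.count()
math_cells = table.get("Li Lei", "course")
table.delete("Li Lei", "device:laptop")
table.deleteall("Han Meimei")
first_row = table.scan(limit=1)
print(rows_before, math_cells, first_row)
```

The model makes two points from the table above visible: put serves as both insert and update, and count reports rows rather than cells.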

Exercise:

status # view the server status

version # query the HBase version

create 'test', 'course', 'device' # create a table

list # list all tables

exists 'test' # check whether the table exists

put 'test', 'Li Lei', 'course:Math', '90'

put 'test', 'Han Meimei', 'course:English', '92' # insert records

get 'test', 'Li Lei' # get all the data for a row key

get 'test', 'Li Lei', 'device' # get all the data in one column family for a row key

get 'test', 'Li Lei', 'device:laptop' # get one column of a column family for a row key

Update a record:

put 'test', 'Li Yang', 'device:laptop', 'Asus'

count 'test' # see how many rows the table has

delete 'test', 'Li Yang', 'device:laptop' # delete the 'device:laptop' cell of the row with key 'Li Yang'

deleteall 'test', 'Li Yang' # delete the entire row

Delete table:

disable 'test'

drop 'test'

Exit:

exit

That is everything in "What is the difference between Hive and HBase". Thank you for reading! I hope it has given you a working understanding of both systems.
