How to understand the structured Storage Database HBase in big data era 07/15 Update SLTechnology News&Howtos

2025-07-15 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 report

This article explains how to understand HBase, the structured storage database of the big data era, in a concise and easy-to-follow way. I hope the detailed introduction below is useful to you.

HBase is a database well suited to storing unstructured data. It was started by Chad Walters and Jim Kellerman of Powerset at the end of 2006, became a subproject of Apache Hadoop in 2008, and is now used in production at many enterprises.

Distributed database HBase

License: Apache License 2.0

Development language: Java

Operating system: cross-platform

Project address: https://github.com/apache/hbase

HBase project introduction

HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large-scale structured storage clusters can be built on inexpensive commodity PC servers.

HBase is an open source implementation of Google Bigtable. Just as Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS. Just as Google runs MapReduce to process the massive data stored in Bigtable, HBase uses Hadoop MapReduce to process the massive data stored in HBase. And where Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper.

HBase characteristics

Large tables: billions of rows × millions of columns × thousands of versions, adding up to terabytes or petabytes of storage.

Column-oriented: storage and access control are organized by column (family), and each column family can be retrieved independently.

Sparse: null (empty) columns take up no storage space, so tables can be designed to be very sparse.

Multiple versions: each cell can hold multiple versions of its data. By default the version number is assigned automatically and is the timestamp at which the cell was inserted.

Single data type: all data in HBase is stored as uninterpreted byte arrays; HBase itself attaches no data type to it.
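To make the sparse, multi-version characteristics above concrete, here is a minimal Java sketch that models a table as nested sorted maps (row key → column → timestamp → bytes). It is an in-memory illustration under our own assumptions, not the HBase client API: the class SparseVersionedTable, its methods, and the maxVersions retention limit are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's logical data model, NOT the HBase client API:
// row key -> column ("family:qualifier") -> timestamp -> value bytes.
class SparseVersionedTable {
    private final int maxVersions; // assumed per-cell limit on retained versions
    private final TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> rows = new TreeMap<>();

    SparseVersionedTable(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String str(byte[] b) {
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }

    // Only the written cell gets an entry; columns that were never written take no space.
    void put(String row, String column, long timestamp, byte[] value) {
        TreeMap<Long, byte[]> versions = rows
                .computeIfAbsent(row, r -> new TreeMap<>())
                .computeIfAbsent(column, c -> new TreeMap<>(Collections.reverseOrder()));
        versions.put(timestamp, value);
        // Versions are kept newest first, so the last entry is the oldest one to drop.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Newest version of a cell, or null if the cell does not exist.
    byte[] get(String row, String column) {
        TreeMap<Long, byte[]> versions = versionsOf(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // A specific version, or null if it was never written or has been dropped.
    byte[] get(String row, String column, long timestamp) {
        TreeMap<Long, byte[]> versions = versionsOf(row, column);
        return versions == null ? null : versions.get(timestamp);
    }

    // Total number of stored cell versions across the whole table.
    int storedCells() {
        int n = 0;
        for (TreeMap<String, TreeMap<Long, byte[]>> columns : rows.values()) {
            for (TreeMap<Long, byte[]> versions : columns.values()) {
                n += versions.size();
            }
        }
        return n;
    }

    private TreeMap<Long, byte[]> versionsOf(String row, String column) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SparseVersionedTable t = new SparseVersionedTable(3);
        t.put("alarm001", "i:level", 100L, bytes("low"));
        t.put("alarm001", "i:level", 200L, bytes("high"));
        t.put("alarm002", "i:source", 150L, bytes("sensor-7"));
        System.out.println(str(t.get("alarm001", "i:level")));       // high
        System.out.println(str(t.get("alarm001", "i:level", 100L))); // low
        System.out.println(str(t.get("alarm002", "i:level")));       // null
        System.out.println(t.storedCells());                         // 3
    }
}
```

Keeping versions in descending timestamp order means the newest value is always the first entry, which is why a plain read returns the latest version.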

HBase system architecture

The components of HBase include Client, ZooKeeper, HMaster, HRegionServer, HRegion, Store, MemStore, StoreFile, HFile, HLog, and others. Each table in HBase is divided by row-key range into multiple sub-tables (HRegions). When an HRegion grows beyond a size threshold, it is split in two. The 256 MB figure was the default in early versions; the threshold is set by hbase.hregion.max.filesize, and recent versions default to 10 GB. Splitting is handled by the HRegionServer, while the assignment of HRegions to servers is handled by the HMaster.
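The split behaviour can be sketched as follows. This is a simplified, hypothetical model: the class RegionSplitSketch, the tiny byte threshold, and the split-at-the-middle-row-key rule are assumptions for illustration, and real HBase split policies differ in detail. A table starts as one region covering all row keys, and a region whose total size exceeds the threshold is cut into two daughter regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of size-triggered region splitting, not HBase's real split policy.
// String.compareTo stands in for byte-wise key comparison (equivalent for ASCII keys).
class RegionSplitSketch {
    static final class Region {
        final String startKey; // inclusive; "" means the start of the table
        final String endKey;   // exclusive; "" means the end of the table
        final TreeMap<String, Integer> rows = new TreeMap<>(); // row key -> row size in bytes

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String rowKey) {
            return (startKey.isEmpty() || rowKey.compareTo(startKey) >= 0)
                    && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }

        long size() {
            long total = 0;
            for (int s : rows.values()) total += s;
            return total;
        }
    }

    final long splitThreshold; // assumed threshold in bytes (256 MB in the text; tiny here)
    final List<Region> regions = new ArrayList<>(); // kept in row-key order

    RegionSplitSketch(long splitThreshold) {
        this.splitThreshold = splitThreshold;
        regions.add(new Region("", "")); // every table starts with one region covering all keys
    }

    Region locate(String rowKey) {
        for (Region r : regions) {
            if (r.contains(rowKey)) return r;
        }
        throw new IllegalStateException("no region for " + rowKey);
    }

    void put(String rowKey, int sizeBytes) {
        Region r = locate(rowKey);
        r.rows.put(rowKey, sizeBytes);
        if (r.size() > splitThreshold && r.rows.size() > 1) {
            split(r);
        }
    }

    // Split the parent at its middle row key into two daughter regions.
    private void split(Region parent) {
        List<String> keys = new ArrayList<>(parent.rows.keySet());
        String mid = keys.get(keys.size() / 2);
        Region left = new Region(parent.startKey, mid);
        Region right = new Region(mid, parent.endKey);
        left.rows.putAll(parent.rows.headMap(mid, false));
        right.rows.putAll(parent.rows.tailMap(mid, true));
        int i = regions.indexOf(parent);
        regions.set(i, left);
        regions.add(i + 1, right);
    }

    public static void main(String[] args) {
        RegionSplitSketch table = new RegionSplitSketch(100);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(key, 30);
        }
        for (Region r : table.regions) {
            System.out.println("[" + r.startKey + ", " + r.endKey + ") " + r.rows.keySet());
        }
    }
}
```

With a 100-byte threshold and 30-byte rows, the fourth insert pushes the single region past the limit, so it splits at row key "c" into [, c) and [c, ); the fifth row then lands in the right-hand daughter.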

Key terms

RowKey: a byte array that acts as the "primary key" of each record in the table and enables fast lookups, so RowKey design is very important. Rows in a table are sorted by row key, and data is stored in the lexicographic (dictionary) order of the RowKey.

Column Family: a named group of one or more related columns. Column families must be predefined as part of the table schema; for example, create 'alarmInfo', 'i' in the HBase shell creates the table alarmInfo with a column family named i.

Column: belongs to a column family and is written as familyName:columnName; new columns can be added dynamically for each record.

Version Number: of type long; defaults to the system timestamp and can be set by the user.

Value (Cell): the unit uniquely identified by {row key, column (= column family + column name), version}. The data in a cell is untyped and is stored entirely as bytes.
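The cell coordinates above also determine the order in which cells are kept. The following Java sketch uses a hypothetical CellKey class (strings stand in for byte arrays, which gives the same order for ASCII keys) and sorts cells by row, column family and column name ascending, then by version with the newest first, which is how HBase orders the versions of a cell.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class for the coordinates that identify one cell:
// {row key, column (= family + qualifier), version}. Strings stand in for byte arrays.
final class CellKey {
    // Cells sort by row, family and qualifier ascending, then by version newest first,
    // so a read for the latest version finds it first.
    static final Comparator<CellKey> ORDER = Comparator
            .comparing(CellKey::row)
            .thenComparing(CellKey::family)
            .thenComparing(CellKey::qualifier)
            .thenComparing(Comparator.comparingLong(CellKey::timestamp).reversed());

    private final String row;
    private final String family;
    private final String qualifier;
    private final long timestamp;

    CellKey(String row, String family, String qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    String row() { return row; }
    String family() { return family; }
    String qualifier() { return qualifier; }
    long timestamp() { return timestamp; }

    String column() { return family + ":" + qualifier; } // familyName:columnName

    @Override
    public String toString() {
        return "{" + row + ", " + column() + ", " + timestamp + "}";
    }

    public static void main(String[] args) {
        List<CellKey> cells = new ArrayList<>(List.of(
                new CellKey("alarm002", "i", "level", 5L),
                new CellKey("alarm001", "i", "source", 7L),
                new CellKey("alarm001", "i", "level", 1L),
                new CellKey("alarm001", "i", "level", 9L)));
        cells.sort(ORDER);
        cells.forEach(System.out::println);
    }
}
```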

HBase logical model

HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into several column families.

HBase physical model

All rows in a table are sorted in the lexicographic order of their row keys. In the row direction, a table is divided by size into multiple regions. Each table starts with a single region; as data is added the region grows, and when it reaches a threshold it splits into two new regions, so the number of regions keeps increasing. The region is the smallest unit of distributed storage and load balancing in HBase, and different regions are distributed to different RegionServers.
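Because row keys are compared byte by byte as unsigned values, numeric-looking keys do not sort numerically. The sketch below is a hand-written comparator for illustration (the class RowKeyOrder is hypothetical; HBase ships its own byte utilities) and shows why keys such as row2 and row10 are often zero-padded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lexicographic comparison of row keys as unsigned bytes,
// the order in which rows (and therefore regions) are laid out.
class RowKeyOrder {
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat each byte as unsigned (0..255)
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length; // a key that is a prefix of another sorts first
    }

    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((p, q) -> compare(p.getBytes(StandardCharsets.UTF_8), q.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // "row10" sorts before "row2" because byte '1' < byte '2'
        System.out.println(sortKeys(List.of("row2", "row10", "row1")));  // [row1, row10, row2]
        // zero-padding makes byte order match numeric order
        System.out.println(sortKeys(List.of("row02", "row10", "row01"))); // [row01, row02, row10]
    }
}
```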

Although the region is the smallest unit of distributed storage, it is not the smallest unit of storage. A region consists of one or more Stores, and each Store holds one column family. Each Store consists of one MemStore and zero or more StoreFiles, and each StoreFile wraps an HFile. The MemStore lives in memory, while StoreFiles are stored on HDFS.
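The relationship between a Store, its MemStore and its StoreFiles can be sketched as below. This is a hypothetical, in-memory simplification: StoreSketch, the entry-count flush threshold, and the rule that newer files win are assumptions for illustration. Writes go to a sorted MemStore, which is flushed into a new immutable StoreFile when full, and reads check the MemStore before the StoreFiles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one Store (one column family inside a region), not HBase code.
class StoreSketch {
    private final int memStoreLimit; // assumed flush threshold, counted in entries for simplicity
    private TreeMap<String, String> memStore = new TreeMap<>(); // sorted, in memory
    private final List<NavigableMap<String, String>> storeFiles = new ArrayList<>(); // oldest first

    StoreSketch(int memStoreLimit) {
        this.memStoreLimit = memStoreLimit;
    }

    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= memStoreLimit) {
            flush();
        }
    }

    // Persist the MemStore as a new immutable, sorted StoreFile and start a fresh MemStore.
    void flush() {
        if (memStore.isEmpty()) return;
        storeFiles.add(Collections.unmodifiableNavigableMap(memStore));
        memStore = new TreeMap<>();
    }

    // Read path: MemStore first, then StoreFiles from newest to oldest.
    // Simplification: newer data wins by file order rather than by cell timestamp.
    String get(String key) {
        String value = memStore.get(key);
        for (int i = storeFiles.size() - 1; value == null && i >= 0; i--) {
            value = storeFiles.get(i).get(key);
        }
        return value;
    }

    int storeFileCount() { return storeFiles.size(); }

    int memStoreSize() { return memStore.size(); }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch(2);
        store.put("row1", "a");
        store.put("row2", "b"); // MemStore reaches the limit and is flushed to StoreFile #1
        store.put("row1", "c"); // the newer value stays in the MemStore for now
        System.out.println(store.get("row1"));      // c
        System.out.println(store.get("row2"));      // b
        System.out.println(store.storeFileCount()); // 1
    }
}
```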

HBase request process

HBase is a distributed database, so the data of one table may be spread across different nodes. Note again that the region is the smallest unit of distributed storage in HBase, but not the smallest unit of storage. A table is divided into several regions by row-key range, and the regions are placed on different region servers, where each one is managed and maintained by that server's HRegionServer process.

So when the client sends a lookup (or insert, or delete) request, it first uses the row key in the request to determine which region the row belongs to and which region server hosts that region. This lookup goes through the catalog tables: in older versions the -ROOT- and .META. tables, and since HBase 0.96 only the hbase:meta table, whose location the client obtains from ZooKeeper. Once it has located the region server, the client sends the operation to it, and the server either executes the request directly or places it in a task queue to wait.
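The client-side lookup described above can be sketched in Java as follows. Everything here is hypothetical: RegionLocatorSketch, the in-memory map that stands in for the catalog table, and the server names rs1 to rs3. A row belongs to the region with the greatest start key not exceeding the row key, and locations are cached so repeated requests to the same region skip the catalog lookup. What happens when a cached location goes stale after a region moves is not modeled.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how a client can map a row key to a region server.
// "meta" stands in for the catalog table: region start key -> hosting server.
class RegionLocatorSketch {
    static final class Location {
        final String startKey; // inclusive
        final String endKey;   // exclusive; null means the end of the table
        final String server;

        Location(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0 && (endKey == null || rowKey.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, String> meta;
    private final TreeMap<String, Location> cache = new TreeMap<>(); // client-side location cache
    int metaLookups = 0; // how many times the simulated catalog was consulted

    RegionLocatorSketch(Map<String, String> metaRows) {
        this.meta = new TreeMap<>(metaRows);
        if (!meta.containsKey("")) {
            throw new IllegalArgumentException("the first region must start at the empty key");
        }
    }

    String locate(String rowKey) {
        // 1. Try the cache: the cached region with the greatest start key <= rowKey.
        Map.Entry<String, Location> hit = cache.floorEntry(rowKey);
        if (hit != null && hit.getValue().contains(rowKey)) {
            return hit.getValue().server;
        }
        // 2. Cache miss: consult the catalog. The owning region has the greatest
        //    start key <= rowKey and ends where the next region starts.
        metaLookups++;
        Map.Entry<String, String> region = meta.floorEntry(rowKey); // never null: "" is a start key
        Location loc = new Location(region.getKey(), meta.higherKey(region.getKey()), region.getValue());
        cache.put(loc.startKey, loc);
        return loc.server;
    }

    public static void main(String[] args) {
        RegionLocatorSketch client = new RegionLocatorSketch(Map.of("", "rs1", "g", "rs2", "p", "rs3"));
        System.out.println(client.locate("apple"));  // rs1 (catalog lookup)
        System.out.println(client.locate("banana")); // rs1 (served from cache)
        System.out.println(client.locate("kiwi"));   // rs2 (catalog lookup)
        System.out.println(client.metaLookups);      // 2
    }
}
```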

Comparison between HBase and RDBMS

Why use HBase?

Unlike a typical relational database, HBase is suited to storing unstructured data. Here this means that HBase organizes data by column rather than by row, which makes it practical to read and write very large data sets.

HBase sits between a map entry (key and value) and a database row. It is a little like the popular Memcache, but a key does not map to just one simple value: you typically store a data structure with several attributes, yet without the many relationships found between tables in a traditional database. This is what is meant by loosely structured data.

Put simply, a table you create in HBase can be seen as one large table whose columns can be added dynamically as your requirements change. HBase has no queries that join tables. You only need to tell HBase which column family your data is stored in; you do not need to declare a type such as char, varchar, int, tinyint, or text. Note, however, that HBase does not provide features such as multi-row transactions; atomicity is guaranteed only for operations within a single row.
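The points about dynamic columns and untyped values can be illustrated with the following sketch. It is a hypothetical model, not the HBase API (DynamicColumnsSketch, encodeInt and decodeInt are names invented here): column families are fixed when the table is created, any column name can be written inside an existing family, and typed values must be encoded to bytes by the client.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not the HBase API: families are fixed at table creation
// (compare create 'alarmInfo', 'i' in the HBase shell), qualifiers are free-form,
// and values are untyped byte arrays that the client encodes and decodes itself.
class DynamicColumnsSketch {
    private final Set<String> families;
    private final TreeMap<String, TreeMap<String, byte[]>> rows = new TreeMap<>();

    DynamicColumnsSketch(String... families) {
        this.families = Set.of(families);
    }

    void put(String row, String family, String qualifier, byte[] value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // Any new qualifier is accepted without a schema change.
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    byte[] get(String row, String family, String qualifier) {
        TreeMap<String, byte[]> columns = rows.get(row);
        return columns == null ? null : columns.get(family + ":" + qualifier);
    }

    // Columns actually present in one row; different rows may have different columns.
    Set<String> columns(String row) {
        TreeMap<String, byte[]> columns = rows.get(row);
        if (columns == null) return Set.of();
        return columns.keySet();
    }

    // The client chooses the encoding; here a 4-byte big-endian int (ByteBuffer's default order).
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static void main(String[] args) {
        DynamicColumnsSketch alarmInfo = new DynamicColumnsSketch("i");
        alarmInfo.put("alarm001", "i", "level", encodeInt(3));
        alarmInfo.put("alarm002", "i", "source", "sensor-7".getBytes(StandardCharsets.UTF_8));
        System.out.println(alarmInfo.columns("alarm001"));                     // [i:level]
        System.out.println(alarmInfo.columns("alarm002"));                     // [i:source]
        System.out.println(decodeInt(alarmInfo.get("alarm001", "i", "level"))); // 3
    }
}
```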
