
What is the concept of the HBase data model

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the concepts behind the HBase data model. The material is simple and practical, so interested readers may wish to follow along.

Introduction:

HBase (Hadoop Database): A highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large-scale structured storage clusters can be built on inexpensive PC servers.

HBase is an open-source system modeled on Google's BigTable, and it stands on the shoulders of the giant Hadoop!

HBase is a non-relational database (NoSQL database)

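NoSQL compared with relational databases:

Data model: NoSQL suits structured, unstructured, and semi-structured data; a relational database suits structured data.

Scalability: NoSQL is easy to scale out, for example by adding nodes; a relational database is difficult to scale.

Operations: NoSQL interfaces differ from product to product; a relational database uses standard SQL or an SQL-like language.

Schema: a NoSQL schema can change at any time; in a relational database, fields must be predefined, which makes extension inconvenient.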

HBase stores data in table form

Features of the table:

Large: A table can have hundreds of millions of rows and millions of columns.

Column-oriented: Storage and access control are organized by column (family), and each column (family) can be retrieved independently.

Sparse: Null columns take up no storage space, so tables can be designed to be very sparse.

HBase's position in the Hadoop ecosystem:

HBase is located in the structured storage layer;

Hadoop HDFS provides high-reliability underlying storage support for HBase

Hadoop MapReduce provides high-performance computing power for HBase

ZooKeeper provides a stable coordination service and a failover mechanism for HBase

Pig and Hive also provide high-level language support for HBase, making statistical processing of data on HBase very easy. Sqoop provides convenient RDBMS data import function for HBase, which makes it very convenient to migrate traditional databases to HBase.

Key concepts of the data model:

RowKey: The primary key used to retrieve records. There are only three ways to access rows in HTable:

Access by a single rowkey

Access by a rowkey range

Full table scan
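The three access paths above can be sketched with a small in-memory stand-in for a table sorted by rowkey. This is only an illustration in plain Python, not the HBase client API; the `SortedTable` class and the sample rowkeys are hypothetical.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by rowkey."""

    def __init__(self, rows):
        self._keys = sorted(rows)   # rowkeys in lexicographic order
        self._rows = dict(rows)

    def get(self, rowkey):
        """Access by a single rowkey."""
        return self._rows.get(rowkey)

    def scan(self, start=None, stop=None):
        """Scan the half-open range [start, stop); no bounds means a full scan."""
        lo = 0 if start is None else bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


# Hypothetical sample data.
table = SortedTable({"user#003": "carol", "user#001": "alice", "user#002": "bob"})
single = table.get("user#002")                  # single rowkey
in_range = table.scan("user#001", "user#003")   # rowkey range
everything = table.scan()                       # full table scan
```

Because rows are sorted by rowkey, the range scan only needs two binary searches to find its bounds, which is why rowkey design matters so much for read patterns.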

Qualifier: A dynamically extensible column that does not need to be declared in the table schema beforehand; it is similar to a column in a relational database.

Family: A column family groups multiple qualifiers. TTL, Versions, Compression, and similar settings are all configured at this level.

Version/Timestamp: The value under family:qualifier for a single rowkey can have multiple versions, distinguished by millisecond timestamps. A read can also specify a time range to fetch the data of any version within it.

Cell: The smallest unit of storage. A value is uniquely identified by rowkey + family:qualifier + timestamp.

Namespace: The namespace a table belongs to; if none is specified, the table is placed in the "default" namespace.
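To make the cell coordinates concrete, here is a minimal sketch of a versioned cell store in plain Python. It is a toy model, not HBase code: the `VersionedTable` class, its `max_versions` parameter, and the half-open `[lo, hi)` time range are assumptions chosen for illustration.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model: (rowkey, 'family:qualifier') -> {timestamp_ms: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._cells = defaultdict(dict)   # absent columns simply have no entry

    def put(self, rowkey, column, value, ts):
        versions = self._cells[(rowkey, column)]
        versions[ts] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, rowkey, column, ts=None, time_range=None):
        versions = self._cells.get((rowkey, column), {})
        if ts is not None:
            return versions.get(ts)       # exact version
        stamps = sorted(versions, reverse=True)
        if time_range is not None:
            lo, hi = time_range
            stamps = [t for t in stamps if lo <= t < hi]
        return versions[stamps[0]] if stamps else None   # newest match


t = VersionedTable(max_versions=2)
t.put("row1", "info:name", "a", ts=100)
t.put("row1", "info:name", "b", ts=200)
t.put("row1", "info:name", "c", ts=300)

latest = t.get("row1", "info:name")
exact = t.get("row1", "info:name", ts=200)
evicted = t.get("row1", "info:name", ts=100)
ranged = t.get("row1", "info:name", time_range=(0, 300))
missing = t.get("row1", "info:email")
```

Note that the qualifier `info:email` was never declared anywhere; reading it just finds nothing, which mirrors both the dynamic qualifiers and the sparse storage described above.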

HBase system tables:

hbase:namespace: stores the namespace information for HBase tables

hbase:meta: stores the rowkey range, location, and other information for each region of every table

hbase:acl: records table operation permissions; ACLs are supported down to the qualifier level, and the permission types are READ ('R'), WRITE ('W'), EXEC ('X'), CREATE ('C'), and ADMIN ('A')
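The single-letter permission codes lend themselves to a bit-flag representation. The sketch below is one hypothetical way to model them in Python; the `Permission` enum, `parse`, and `allowed` are illustrative names, not part of HBase.

```python
from enum import Flag


class Permission(Flag):
    READ = 1
    WRITE = 2
    EXEC = 4
    CREATE = 8
    ADMIN = 16


# Letter codes as listed in the text.
CODES = {
    "R": Permission.READ,
    "W": Permission.WRITE,
    "X": Permission.EXEC,
    "C": Permission.CREATE,
    "A": Permission.ADMIN,
}


def parse(codes):
    """Turn a string such as 'RW' into a combined Permission value."""
    granted = Permission(0)
    for ch in codes:
        granted |= CODES[ch]
    return granted


def allowed(granted, needed):
    """True if every bit of `needed` is present in `granted`."""
    return needed in granted


grants = parse("RW")
can_read = allowed(grants, Permission.READ)
can_admin = allowed(grants, Permission.ADMIN)
```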

HBase Storage Framework:

Zookeeper: HBase cluster depends on this component

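The ZooKeeper quorum stores the address of the -ROOT- table (in HBase 0.96 and later, which removed -ROOT-, the location of hbase:meta instead) and the address of the HMaster.

Each HRegionServer registers itself in ZooKeeper as an ephemeral node, so HMaster can detect the health of every HRegionServer at any time.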

HMaster: Mainly responsible for managing Table and Region

Manages users' create, delete, query, and modify operations on tables

Manages load balancing across HRegionServers and adjusts the Region distribution

Assigns the new Regions after a Region split

Migrates the Regions of a failed HRegionServer after it goes down

HRegionServer: The core module, mainly responsible for responding to user I/O requests and for reading and writing data in the HDFS file system

HRegionServer manages a list of HRegion objects

Each HRegion corresponds to a Region in Table, and an HRegion consists of multiple HStores.

Each HStore corresponds to a Column Family store in Table

Column Family is a centralized storage unit, so it is more efficient to put Columns with the same IO characteristics in a Column Family.

Region: As a Table grows with the number of records, it gradually splits into multiple parts, each of which becomes a Region. A Region is represented by the half-open rowkey interval [startkey, endkey). The Master assigns different Regions to RegionServers for management.
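The [startkey, endkey) convention means every rowkey belongs to exactly one Region, and finding it is a binary search over the sorted start keys. The following is a rough sketch under that assumption; `RegionMap` and the server names `rs1`, `rs2`, `rs3` are hypothetical, and a real split is far more involved.

```python
from bisect import bisect_right


class RegionMap:
    """Toy routing table: sorted region start keys and their servers."""

    def __init__(self, first_server):
        self.start_keys = [""]          # "" is the open lower bound of the table
        self.servers = [first_server]

    def split(self, split_key, new_server):
        """Split the region containing split_key into two at split_key."""
        if split_key in self.start_keys:
            raise ValueError("a region already starts at this key")
        i = bisect_right(self.start_keys, split_key)
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)

    def locate(self, rowkey):
        """Return (startkey, endkey, server); endkey None means unbounded."""
        i = bisect_right(self.start_keys, rowkey) - 1
        start = self.start_keys[i]
        end = self.start_keys[i + 1] if i + 1 < len(self.start_keys) else None
        return start, end, self.servers[i]


regions = RegionMap("rs1")
regions.split("m", "rs2")
regions.split("t", "rs3")

first = regions.locate("apple")
boundary = regions.locate("m")     # startkey is inclusive
last = regions.locate("zebra")
```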

HStore: The core of HBase storage. It consists of a MemStore and StoreFiles. The MemStore is a sorted memory buffer.

HLog: In a distributed environment, system errors and downtime cannot be avoided. If an HRegionServer exits unexpectedly, the in-memory data in its MemStore is lost. HLog was introduced to prevent this data loss.

Every time a user write goes to the MemStore, a copy of the data is also written to the HLog file. The HLog file periodically rolls over to a new file, and old files (whose data has already been persisted to StoreFiles) are deleted. When an HRegionServer terminates unexpectedly, HMaster detects it through ZooKeeper. HMaster first processes the leftover HLog files, splitting the log data by region into the corresponding region directories, and then reassigns the regions that went offline. While loading these regions, the HRegionServers that receive them find historical HLog files to process, so they replay the HLog data into the MemStore and then flush it to StoreFiles to complete data recovery.
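The write, crash, split, and replay sequence can be walked through with a deliberately simplified model. In this sketch a Python list stands in for the HLog file and a shared dict stands in for the StoreFiles on HDFS; `RegionServer`, `split_log`, and `replay` are hypothetical names, and clearing the log on flush is a simplification of log rolling.

```python
class RegionServer:
    def __init__(self, hlog, storefiles):
        self.hlog = hlog              # durable: stands in for the HLog file
        self.storefiles = storefiles  # durable: stands in for StoreFiles on HDFS
        self.memstore = {}            # volatile: lost when the server crashes

    def put(self, region, key, value):
        self.hlog.append((region, key, value))   # record the edit in the log
        self.memstore[(region, key)] = value     # then buffer it in memory

    def flush(self):
        self.storefiles.update(self.memstore)    # persist buffered edits
        self.memstore.clear()
        self.hlog.clear()                        # logged edits are now redundant

    def replay(self, region, edits):
        for key, value in edits:
            self.memstore[(region, key)] = value
        self.flush()


def split_log(hlog):
    """Group leftover log entries by region, as the master does on failure."""
    by_region = {}
    for region, key, value in hlog:
        by_region.setdefault(region, []).append((key, value))
    return by_region


storefiles = {}
old = RegionServer(hlog=[], storefiles=storefiles)
old.put("r1", "k1", "v1")
old.flush()                      # k1 is safely persisted
old.put("r1", "k2", "v2")        # k2 and k9 exist only in MemStore + HLog
old.put("r2", "k9", "v9")

old.memstore.clear()             # simulate the crash: memory is gone
per_region = split_log(old.hlog)

new = RegionServer(hlog=[], storefiles=storefiles)
for region, edits in per_region.items():
    new.replay(region, edits)
```

After the replay, `storefiles` holds all three values even though two of them had never been flushed before the crash.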

HDFS: HDFS is a distributed file system. It implements distributed storage by dividing a large file into fixed-size blocks. The default Block size is 128 MB. Each Block has multiple replicas placed on different data nodes to keep the data safe. Currently, all of HBase's underlying data is stored as files in HDFS; HBase itself does not persist the data.
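As a quick worked example of the block arithmetic, the sketch below computes how a file would be divided. The 128 MB block size comes from the text; the replication factor of 3 is an assumption (the text only says "multiple"), and `block_layout` is a hypothetical helper, not an HDFS API.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default cited in the text


def block_layout(file_size, block_size=BLOCK_SIZE, replication=3):
    """Describe how a file of file_size bytes splits into blocks.

    replication=3 is an assumed value used for illustration.
    """
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full + ([rest] if rest else [])
    return {
        "blocks": len(sizes),
        "last_block_bytes": sizes[-1] if sizes else 0,
        "replicas_stored": len(sizes) * replication,
        "raw_bytes": file_size * replication,
    }


layout = block_layout(300 * 1024 * 1024)   # a 300 MB file
```

Under these assumptions a 300 MB file occupies two full blocks plus one 44 MB block; the last block only takes the space it needs.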
