
What is the principle and architecture of HBase?

2025-02-24 Update From: SLTechnology News&Howtos


This article explains the principle and architecture of HBase. The content is simple and clear, and it should help answer common questions about how HBase works.

I. Logical storage model

HBase stores data as a table, which consists of rows and columns. Columns are grouped into column families.

RowKey: HBase uses the RowKey to uniquely identify a row of data, such as "rk001" in the figure.

Column family: HBase partitions data storage by column family, and a column family can contain as many columns as you like, which gives flexible data access. More column families is not better, though: the official recommendation is to use no more than 3 column families, and in practice we usually use just 1. In the figure, for example, the "CF1" column family has two columns under it: "Name" and "Alias".

Timestamp: The timestamp is critical to HBase because it is the key to implementing multiple versions. HBase uses different timestamps to identify the different versions of the data stored under the same RowKey.

Cell: In HBase, the storage unit determined by a RowKey and a column is called a cell. Each cell holds multiple versions of the same data, and the versions are indexed by timestamp.
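The logical model above can be pictured as a map from (RowKey, column) to a list of timestamped versions. The following is a minimal Python sketch of that idea only; it is not the HBase client API. The class `ToyTable`, the `max_versions` limit, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of HBase's logical view (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical version limit
        # (rowkey, "family:qualifier") -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, rowkey, column, value, timestamp):
        versions = self.cells[(rowkey, column)]
        # A write with an existing timestamp replaces that version.
        versions[:] = [tv for tv in versions if tv[0] != timestamp]
        versions.append((timestamp, value))
        # Keep versions sorted newest first and drop the oldest beyond the limit.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, rowkey, column, timestamp=None):
        """Return the newest value, or the newest value at or before `timestamp`."""
        for ts, value in self.cells.get((rowkey, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = ToyTable()
table.put("rk001", "CF1:Name", "Alice", timestamp=100)
table.put("rk001", "CF1:Name", "Alicia", timestamp=200)
print(table.get("rk001", "CF1:Name"))                 # newest version: Alicia
print(table.get("rk001", "CF1:Name", timestamp=150))  # version visible at t=150: Alice
```

Reading without a timestamp returns the latest version, which matches the usual default when a cell holds several versions.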

II. Physical storage model

In HBase, data is stored as a table made up of many rows, each consisting of a row key and one or more column values. When a table holds a large number of rows, it is split into parts according to some rule (for example, every 500 rows). Each part is called an HRegion, and HMaster assigns each HRegion as a whole to a RegionServer. You can think of HMaster as a manager handing out HRegions to servers, so a single table is divided into multiple HRegions that may be assigned to different RegionServers. Because an HRegion is assigned as a whole, it is never spread across servers: it is served by exactly one RegionServer.
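To make the splitting idea concrete, here is a small Python sketch that finds which region, and therefore which RegionServer, is responsible for a row key. It assumes each region covers a contiguous, sorted range of row keys. The split points, the server names, and the functions `locate_region` and `locate_server` are hypothetical and are not part of HBase.

```python
import bisect

# Hypothetical split points: region i covers [start_keys[i], start_keys[i + 1]).
# Keys compare as strings, so they are zero-padded to sort in the intended order.
region_start_keys = ["", "rk0500", "rk1000"]  # "" marks the start of the table

# Hypothetical assignment made by the master: region index -> server name.
region_to_server = {0: "regionserver-a", 1: "regionserver-b", 2: "regionserver-a"}


def locate_region(rowkey):
    """Return the index of the region whose key range contains `rowkey`."""
    return bisect.bisect_right(region_start_keys, rowkey) - 1


def locate_server(rowkey):
    """Each region lives on exactly one server, so the lookup is unambiguous."""
    return region_to_server[locate_region(rowkey)]


print(locate_region("rk0001"), locate_server("rk0001"))  # 0 regionserver-a
print(locate_region("rk0750"), locate_server("rk0750"))  # 1 regionserver-b
print(locate_region("rk1000"), locate_server("rk1000"))  # 2 regionserver-a
```

Two regions of the same table can sit on the same server, as regions 0 and 2 do here, but one region is never divided between servers.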

III. Overall structure

The overall structure of HBase includes HMaster, HRegionServer, HRegion, HLog, Store, MemStore, StoreFile, HFile and so on.

HBase is built on top of HDFS, and HDFS operations are performed through the DFS Client.

HMaster is responsible for assigning HRegions to HRegionServers. Each HRegionServer can host multiple HRegions, which share one HLog; the HLog is used for disaster recovery.

Each HRegion consists of one or more Stores, and each Store corresponds to one column family of the table. A Store contains one MemStore and one or more StoreFiles (a StoreFile is a lightweight wrapper around the actual data file, the HFile). The MemStore lives in memory and holds recently modified data; when the data in the MemStore is written out to a file, that file is a StoreFile.

3.1 HMaster

The main functions of HMaster are:

① Assigns HRegions to RegionServers.

② When a RegionServer goes down, migrates the Regions on that machine to an active RegionServer.

③ Balances the load across HRegionServers.

④ Collects junk files (invalid logs, etc.) through HDFS's DFS client interface.

Note: HMaster is not a single point of failure. Multiple HMasters can be started in HBase, and ZooKeeper's master election mechanism ensures that there is always one Master running.
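Functions ① to ③ can be pictured with a toy model of region assignment. This is only a sketch of the idea, not HBase's actual balancing logic. The round-robin placement, the "least-loaded server" rule, and the names `assign_regions` and `handle_server_failure` are assumptions chosen to keep the example short.

```python
def assign_regions(regions, servers):
    """Spread regions over servers round-robin (a stand-in for initial assignment)."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def handle_server_failure(assignment, dead_server):
    """Move each region of a dead server to the currently least-loaded live server."""
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda server: len(assignment[server]))
        assignment[target].append(region)
    return assignment


assignment = assign_regions(
    ["r1", "r2", "r3", "r4", "r5", "r6"], ["rs-a", "rs-b", "rs-c"]
)
assignment = handle_server_failure(assignment, "rs-b")
print(assignment)  # {'rs-a': ['r1', 'r4', 'r2'], 'rs-c': ['r3', 'r6', 'r5']}
```

Picking the least-loaded target for every orphaned region keeps the surviving servers roughly balanced, which combines functions ② and ③ in one step.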

3.2 HRegionServer

① Maintains the HRegions assigned to it by HMaster and handles I/O requests to them; that is, clients talk directly to the HRegionServer (this can also be seen in the architecture figure).

② Splits HRegions that grow too large during operation.

3.3 HRegion

Let's look at the structure of HRegion:

Each HRegion consists of multiple Stores, and each Store holds one column family, so a table with several column families has that many Stores. Each Store consists of one MemStore and multiple StoreFiles: the MemStore is the part of the Store held in memory, and a StoreFile is the part that has been written to a file. Underneath, StoreFiles are saved in HFile format.
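The relationship between MemStore and StoreFile can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not HBase code: the class `ToyStore` is hypothetical, and it flushes after a fixed number of entries, whereas a real Store flushes based on memory size.

```python
class ToyStore:
    """Toy model of one Store (one column family of one region)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: flush after N entries
        self.memstore = {}     # in-memory buffer of recent writes
        self.store_files = []  # immutable, sorted snapshots (stand-ins for HFiles)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted, immutable StoreFile."""
        if self.memstore:
            self.store_files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Check memory first, then StoreFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.put("rk001", "v1")
store.put("rk002", "v2")  # reaches the threshold and triggers a flush
store.put("rk001", "v3")  # the newer value stays in the MemStore
print(len(store.store_files), store.get("rk001"), store.get("rk002"))  # 1 v3 v2
```

Because reads consult the MemStore before any StoreFile, the most recent write wins even though an older value for the same key is still on disk.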

3.4 HLog

HLog (WAL): WAL stands for write-ahead log, which is used for disaster recovery. The HLog records changes to data, including sequence numbers and the actual data, so if a RegionServer goes down, data that had not yet been persisted can be recovered by replaying the log.
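The recovery idea can be illustrated with a toy write-ahead log in Python. This is a sketch under stated assumptions rather than HBase's implementation: `DurableState` stands in for whatever survives a crash (the log and flushed files, which HBase keeps on HDFS), and `ToyRegionServer` with its methods is hypothetical.

```python
class DurableState:
    """What survives a crash: the log and the already-flushed data."""

    def __init__(self):
        self.wal = []          # list of (sequence_number, key, value)
        self.flushed = {}      # data already written out to files
        self.flushed_seq = 0   # highest sequence number contained in `flushed`


class ToyRegionServer:
    def __init__(self, durable):
        self.durable = durable
        self.memstore = {}     # lost when the process dies
        self.replay()

    def replay(self):
        """Re-apply logged edits that were never flushed."""
        for seq, key, value in self.durable.wal:
            if seq > self.durable.flushed_seq:
                self.memstore[key] = value

    def put(self, key, value):
        seq = len(self.durable.wal) + 1
        self.durable.wal.append((seq, key, value))  # 1. write to the log first
        self.memstore[key] = value                  # 2. then apply in memory

    def flush(self):
        self.durable.flushed.update(self.memstore)
        if self.durable.wal:
            self.durable.flushed_seq = self.durable.wal[-1][0]
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        return self.durable.flushed.get(key)


durable = DurableState()
server = ToyRegionServer(durable)
server.put("rk001", "a")
server.flush()
server.put("rk002", "b")  # logged but not yet flushed
del server                # simulate a crash: the MemStore is gone

recovered = ToyRegionServer(durable)  # replays the log on startup
print(recovered.get("rk001"), recovered.get("rk002"))  # a b
```

The sequence number is what lets recovery skip edits that were already flushed and replay only the ones that existed solely in memory.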

3.5 HFile

HBase data is ultimately stored in HDFS as HFiles, and HFile is HBase's own file format.
