
What is the underlying principle of HBase?


What is the underlying principle of HBase? Many readers who are new to HBase are unsure how to answer this. This article walks through HBase's architecture, storage model, and key mechanisms; hopefully it answers the question.

Introduction to HBase

HBase is a distributed, column-oriented, open-source database built on top of HDFS. The name HBase comes from "Hadoop database". The computing and storage capacity of HBase depends on the underlying Hadoop cluster.

It sits between NoSQL and RDBMS: data can only be retrieved by primary key (row key) or by row-key range, and only single-row transactions are supported (complex operations such as multi-table joins can be implemented with the help of Hive).
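As a hedged illustration of this access model (not from the original article), the sketch below uses the HBase Java client to do a point lookup by row key and a scan over a row-key range; the table name "user", the column family "info", and the row keys are made-up examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyAccess {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user"))) {

            // Point lookup: retrieve one row by its row key
            Result row = table.get(new Get(Bytes.toBytes("row-0001")));
            System.out.println(Bytes.toString(
                    row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // Range read: scan all rows whose keys fall in [row-0001, row-0100)
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("row-0001"))
                    .withStopRow(Bytes.toBytes("row-0100"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}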

Characteristics of tables in HBase:

Large: a single table can have billions of rows and millions of columns.

Column-oriented: storage and access control are organized by column (family), and column families are retrieved independently.

Sparse: columns that are null take up no storage space, so tables can be designed to be very sparse.

HBase underlying principles

System architecture

(Figure: HBase system architecture)

The components shown in this diagram are explained below.

Client

The client provides the interfaces for accessing HBase and maintains some caches to speed up access, such as the location information of regions it has already looked up.

Zookeeper

HBase can use its built-in Zookeeper or an external one; in production environments, an external Zookeeper ensemble is generally used for consistency with other services.
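For example (a minimal sketch reusing the imports from the earlier sketch; the hostnames are placeholders), a Java client only needs to be pointed at the Zookeeper ensemble; region locations and the active HMaster are then discovered through Zookeeper:

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");            // external Zookeeper ensemble (placeholder hosts)
conf.set("hbase.zookeeper.property.clientPort", "2181");      // Zookeeper client port
Connection conn = ConnectionFactory.createConnection(conf);   // no HMaster address is configured anywhere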

The role of Zookeeper in HBase:

Ensure that there is only one master in the cluster at any time

Store the addressing entry for all Region

Monitor the status of each Region Server in real time and notify HMaster when Region Servers come online or go offline.

HMaster

Assign regions to Region Servers

Responsible for load balancing across Region Servers

Discover failed Region Servers and reassign the regions that were on them

Handle garbage file collection on HDFS

Process schema update requests

HRegion Server

An HRegion Server maintains the regions assigned to it by HMaster and handles the IO requests for those regions.

An HRegion Server is also responsible for splitting regions that become too large during operation.

As the figure shows, the client does not need HMaster to take part in data access on HBase (addressing goes through Zookeeper and the HRegion Servers; data reads and writes go directly to the HRegion Servers).

HMaster only maintains the metadata of tables and HRegions, so its load is very low.

Table data Model of HBase

1. The overall structure of HBase

All rows in a table are sorted in lexicographic order of their Row Key.

A table is split into multiple HRegions along the row direction.

HRegions are split by size (the default threshold is 10 GB). Each table starts with a single HRegion; as data is inserted, the HRegion grows, and when it reaches the threshold it splits into two new HRegions. As the number of rows in the table keeps growing, there are more and more HRegions.
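The cluster-wide split threshold is the hbase.hregion.max.filesize property (10 GB by default). As a hedged sketch (HBase 2.x Java API, made-up table and family names, assuming the Connection conn from the earlier sketches), the threshold can also be overridden per table when the table is created:

// Admin, TableDescriptor(Builder), ColumnFamilyDescriptor(Builder) are in org.apache.hadoop.hbase.client
try (Admin admin = conn.getAdmin()) {
    TableDescriptor td = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("user"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
            .setMaxFileSize(10L * 1024 * 1024 * 1024)   // split a region once it reaches ~10 GB
            .build();
    admin.createTable(td);
}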

An HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can live on different HRegion Servers, but a single HRegion is never split across multiple servers.

Although HRegion is the smallest unit of load balancing, it is not the smallest unit of physical storage.

In fact, an HRegion consists of one or more Stores, and each Store holds the data of one column family.

Each Store in turn consists of one MemStore and zero or more StoreFiles, as pictured above.

2. StoreFile and HFile structure

StoreFile is saved on HDFS in HFile format.

(Figure: HFile format)

(Figure: detailed structure of an HFile Key/Value)

A Key/Value begins with two fixed-length values giving the lengths of the Key and of the Value. The Key follows: a fixed-length value for the RowKey length, then the RowKey itself, a fixed-length value for the Family length, then the Family, then the Qualifier, and finally two fixed-length values for the Time Stamp and the Key Type (Put/Delete). The Value part has no such structure; it is pure binary data.
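The same layout is visible through the client API's KeyValue class. The fragment below (a hedged sketch with made-up values, reusing the imports from the earlier sketches) builds one KeyValue and prints the lengths that make up its Key:

// KeyValue is org.apache.hadoop.hbase.KeyValue
KeyValue kv = new KeyValue(
        Bytes.toBytes("row-0001"),      // RowKey
        Bytes.toBytes("info"),          // Family
        Bytes.toBytes("name"),          // Qualifier
        System.currentTimeMillis(),     // Time Stamp
        KeyValue.Type.Put,              // Key Type (Put/Delete)
        Bytes.toBytes("Alice"));        // Value: plain binary data

System.out.println("key length   = " + kv.getKeyLength());   // RowKey + Family + Qualifier + Time Stamp + Type
System.out.println("row length   = " + kv.getRowLength());
System.out.println("value length = " + kv.getValueLength());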

HFile is divided into six parts:

Data Block section: stores the table data; it can be compressed.

Meta Block section (optional): stores user-defined key-value pairs; it can be compressed.

File Info section: the meta-information of the HFile; it is not compressed, and users can also add their own meta-information here.

Data Block Index section: the index of the Data Blocks; the key of each index entry is the key of the first record in the indexed block.

Meta Block Index section (optional): the index of the Meta Blocks.

Trailer section: fixed length, storing the offset of every other section. When an HFile is read, the Trailer is read first; it records the starting position of each section (each section's Magic Number is used as an integrity check). The Data Block Index is then loaded into memory, so that looking up a key does not require scanning the whole HFile: the block containing the key is located in memory, that single block is read into memory with one disk IO, and the key is then found inside it. The Data Block Index is evicted using an LRU mechanism.

The Data Blocks and Meta Blocks of an HFile are usually stored compressed. Compression greatly reduces network IO and disk IO; the cost, of course, is CPU spent on compression and decompression.

Currently, HFile compression supports two algorithms: Gzip and LZO.
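Compression is configured per column family. A hedged sketch (HBase 2.x Java API, made-up table and family names, assuming the Connection conn from the earlier sketches) that switches an existing family to Gzip:

// Compression.Algorithm is org.apache.hadoop.hbase.io.compress.Compression.Algorithm
ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
        .newBuilder(Bytes.toBytes("info"))
        .setCompressionType(Compression.Algorithm.GZ)   // or Compression.Algorithm.LZO
        .build();
try (Admin admin = conn.getAdmin()) {
    admin.modifyColumnFamily(TableName.valueOf("user"), cf);   // HFile data blocks are now written compressed
}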

3. Memstore and StoreFile

An HRegion consists of multiple Stores; each Store contains all the data of one column family, comprising a MemStore in memory and StoreFiles on disk.

A write first goes to the MemStore. When the amount of data in the MemStore reaches a threshold, the HRegionServer starts a flush that writes it out as a StoreFile; each flush produces a separate StoreFile.

When the StoreFile size exceeds a certain threshold, the current HRegion splits in two, and HMaster assigns the new HRegions to the appropriate HRegion Servers to achieve load balancing.

When a client reads data, it looks in the MemStore first and only searches the StoreFiles if the data is not found there.

4. HLog (WAL log)

WAL means Write-Ahead Log, similar to the binlog in MySQL; it is used for disaster recovery. The HLog records all changes to the data; if data is lost from memory, it can be recovered from the log.

Each Region Server maintains a single HLog, rather than one per region. Logs for different regions (from different tables) are therefore mixed together. The benefit is that continuously appending to a single file reduces the number of disk seeks compared with writing many files at once, which improves write performance. The drawback is that if a Region Server goes offline, the log on it must first be split and distributed to other Region Servers before its regions can be recovered.

The HLog file is a normal Hadoop Sequence File:

The Key of the HLog SequenceFile is an HLogKey object, which records the provenance of the written data: besides the table and region names, it contains a sequence number and a timestamp. The timestamp is the write time; the sequence number starts at 0, or at the last sequence number stored in the file system.

The Value of the HLog SequenceFile is an HBase KeyValue object, i.e. the same KeyValue that appears in the HFile, as described above.
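On the client side, WAL behaviour can be tuned per mutation through the Durability setting. A hedged sketch (made-up names, assuming the Table table from the earlier sketch):

// Durability is org.apache.hadoop.hbase.client.Durability
Put p = new Put(Bytes.toBytes("row-0001"));
p.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
p.setDurability(Durability.SYNC_WAL);    // append to the HLog and sync it before acknowledging (the usual behaviour)
// p.setDurability(Durability.SKIP_WAL); // faster, but the write is lost if the region server crashes
table.put(p);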

Reading and writing process

1. Read request process:

HRegionServers store the meta table as well as table data. To access table data, the client first contacts Zookeeper to obtain the location of the meta table, that is, to find out which HRegionServer the meta table is hosted on.

The client then contacts that HRegionServer at the address just obtained, reads the meta table, and retrieves the metadata stored in it.

Using the information in that metadata, the client contacts the appropriate HRegionServer, which scans its MemStore and StoreFiles to find the requested data.

Finally, the HRegionServer returns the queried data to the client.

View meta information

hbase(main):011:0> scan 'hbase:meta'

2. Write request process:

For a write, the client likewise contacts Zookeeper first, finds the meta table, and obtains the meta table's metadata.

From that metadata it determines the HRegion that the data to be written belongs to, and the HRegionServer hosting it.

The client sends a write request to that HRegionServer, which receives and processes it.

The data is first written to the HLog, to prevent data loss.

The data is then written to the MemStore.

If both the HLog and the MemStore writes succeed, the write as a whole succeeds.
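From the client's point of view, the write path above is a single put call; the meta lookup, the HLog append, and the MemStore write all happen behind it. A hedged sketch (made-up names, assuming the Connection conn from the earlier sketches):

try (Table table = conn.getTable(TableName.valueOf("user"))) {
    Put put = new Put(Bytes.toBytes("row-0002"));   // row key
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Beijing"));
    table.put(put);   // routed to the right HRegionServer; written to the HLog, then the MemStore
}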

When the MemStore reaches its threshold, its data is flushed to a StoreFile.

As StoreFiles accumulate, a Compact (merge) operation is triggered, merging many StoreFiles into one large StoreFile.

As StoreFiles grow, the region grows as well; when it reaches its threshold, a Split operation is triggered, splitting the region in two.

Detailed description:

HBase uses MemStore and StoreFile to store updates to the table.

When data is updated, it is first written to the log (WAL) and to memory (MemStore); the data in the MemStore is kept sorted. When the MemStore grows to a certain threshold, a new MemStore is created and the old one is added to a flush queue, from which a separate thread flushes it to disk as a StoreFile. At the same time, the system records a redo point in Zookeeper, indicating that changes made before this point have been persisted.

If the system fails and data in memory (MemStore) is lost, the log (WAL) is used to recover the data written after the last checkpoint.

A StoreFile is read-only; once created it can never be modified, so an HBase update is really a continuous append. When the number of StoreFiles in a Store reaches a threshold, a compaction (minor_compact or major_compact) merges the changes for the same key into one large StoreFile. When the size of a StoreFile reaches a threshold, it is split in two.

Because table updates are continuously appended, a compaction needs to read all the StoreFiles and the MemStore of a Store and merge them by row key. Since StoreFiles and MemStores are both sorted, and StoreFiles carry in-memory indexes, the merge is relatively fast.

HRegion management

HRegion allocation

An HRegion can be assigned to only one HRegion Server at any given time. HMaster keeps track of which HRegion Servers are currently available, which HRegion is assigned to which HRegion Server, and which HRegions are still unassigned. When a new HRegion needs to be allocated and an HRegion Server has capacity available, HMaster sends that HRegion Server a load request to assign the HRegion to it. Once the request is received, the HRegion Server begins serving the HRegion.

HRegion Server online

HMaster uses Zookeeper to track HRegion Server status. When an HRegion Server starts, it first creates a znode representing itself under the server directory in Zookeeper. Because HMaster subscribes to change notifications on this directory, it is notified in real time whenever a znode is added or deleted there, and therefore learns of a new HRegion Server as soon as it comes online.

HRegion Server offline

When an HRegion Server goes offline, its session with Zookeeper is broken and Zookeeper automatically releases the exclusive lock on the znode representing that server. HMaster can then determine that either:

The network between HRegion Server and zookeeper is disconnected.

HRegion Server is dead.

In either case, that HRegion Server can no longer serve its regions, so HMaster deletes the znode representing it from the server directory and assigns the regions of this HRegion Server to other, live nodes.

Working mechanism of HMaster

Master online

On startup, the master performs the following steps:

Acquire from Zookeeper the unique lock that represents the active master, to prevent any other HMaster from becoming the master.

Scan the server parent node on Zookeeper to obtain the list of currently available HRegion Servers.

Communicate with each HRegion Server to obtain the current mapping between HRegions and HRegion Servers.

Scan the collection of .META. regions, compute the HRegions that are currently unassigned, and put them on the list of HRegions to be assigned.

Master offline

Since HMaster only maintains the metadata of tables and regions and does not take part in table data IO, an offline HMaster only freezes all metadata modifications (tables cannot be created or deleted, table schemas cannot be modified, HRegions cannot be load-balanced, HRegion online/offline transitions cannot be handled, and HRegions cannot be merged; the one exception is that HRegion splits can still proceed normally, because only the HRegion Server is involved). Table data reads and writes continue to work normally, so a short HMaster outage has no impact on the HBase cluster.

As the startup process shows, the information HMaster holds is entirely redundant: all of it can be collected from, or computed out of, other parts of the system.

Therefore, a typical HBase cluster always has one HMaster providing service, plus one or more standby HMasters waiting for the chance to take over.

Three important mechanisms of HBase

1. Flush mechanism

1. hbase.regionserver.global.memstore.size (default: 40% of the heap size)

The total size of all memstores on a RegionServer. Exceeding this size triggers flushes to disk. The default is 40% of the heap, and a regionserver-level flush blocks client reads and writes.

2. hbase.hregion.memstore.flush.size (default: 128 MB)

When the memstore of a single region grows beyond this size, the whole HRegion is flushed.

3. hbase.regionserver.optionalcacheflushinterval (default: 1 h)

The longest time data can sit in memory before it is automatically flushed.

4. hbase.regionserver.global.memstore.size.lower.limit (default: heap size * 0.4 * 0.95)

Sometimes the write load on the cluster is very high and writes consistently outpace flushes, so we want the memstores not to exceed a certain safety limit. In that case writes are blocked until the memstores shrink back to a "manageable" size. The default is heap size * 0.4 * 0.95: when a regionserver-level flush is triggered, client writes are blocked until the total regionserver-level memstore usage drops back to heap size * 0.4 * 0.95.

5. hbase.hregion.preclose.flush.size (default: 5 MB)

When the memstore of a region is larger than this value and the region is being closed, a "pre-flush" is run first to empty the memstore before the region is taken offline. Once a region is offline no more writes are possible, and if its memstore were very large the final flush would take a long time; the "pre-flush" empties most of the memstore ahead of time so that the final close completes quickly.

6. hbase.hstore.compactionThreshold (default: 3)

The number of HFiles allowed in a single Store before a compaction is triggered. Each flush of the memstore of a column family in a region writes a new HFile; by default, once a Store holds more than 3 HFiles they are merged and rewritten into a new file. The larger this number, the less often a compaction is triggered, but the longer each compaction takes.

2. Compact mechanism

Merge small StoreFiles into larger HFiles.

Clean up expired data, including deleted cells.

Keep only the configured number of versions of the data (for example, one version).
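A hedged sketch of the related knobs (HBase 2.x Java API, made-up names, assuming the Connection conn from the earlier sketches): limiting a family to one version so that major compaction drops older cells, and triggering a flush and a major compaction by hand through the Admin API:

try (Admin admin = conn.getAdmin()) {
    ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
            .newBuilder(Bytes.toBytes("info"))
            .setMaxVersions(1)                               // keep a single version per cell
            .build();
    admin.modifyColumnFamily(TableName.valueOf("user"), cf);

    admin.flush(TableName.valueOf("user"));          // force MemStore -> StoreFile
    admin.majorCompact(TableName.valueOf("user"));   // merge StoreFiles, purge deleted/expired cells
}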

After reading the above, have you mastered the underlying principles of HBase? Thank you for reading!
