How the HBase architecture works

This article explains how the HBase architecture works. The content is straightforward and easy to follow; work through it step by step and study how HBase is put together.

Continuing the series on Hadoop components, today's topic is the HBase architecture. I hope you will share it and follow along; I will keep writing quality content for you.

Physically, HBase is a master-slave architecture consisting of three types of servers:

Region Server: handles read and write requests for data; clients interact directly with Region Servers when requesting data.

HBase Master: responsible for assigning regions and for DDL operations (creating and deleting tables), among other duties.

ZooKeeper: a coordination service (part of the Hadoop ecosystem, though not part of HDFS) responsible for maintaining live cluster state.

The underlying storage, of course, is Hadoop HDFS:

Hadoop DataNodes store the data managed by Region Servers. All HBase data is stored in HDFS files. Region Servers and HDFS DataNodes are usually collocated so that Region Servers can achieve data locality (that is, keep data as close as possible to where it is needed). HBase data is local when it is written, but after a region is moved, the data may no longer be local until a compaction completes.

The Hadoop NameNode maintains metadata about all physical HDFS data blocks.

Regions

An HBase table (Table) is split horizontally into several regions by rowkey range. Each region contains all the rows between its start key and end key. Regions are assigned to nodes in the cluster called Region Servers, which handle read and write requests for the data. A single Region Server can manage roughly 1,000 regions.
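To make the region/rowkey-range relationship concrete, here is a minimal sketch using the HBase 2.x Java client that lists each region's key range and the Region Server hosting it; the table name is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class ListRegionsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("demo_table"))) {
            for (HRegionLocation loc : locator.getAllRegionLocations()) {
                // Each region covers the half-open rowkey range [startKey, endKey)
                // and is hosted by exactly one Region Server.
                System.out.printf("[%s, %s) -> %s%n",
                        Bytes.toStringBinary(loc.getRegion().getStartKey()),
                        Bytes.toStringBinary(loc.getRegion().getEndKey()),
                        loc.getServerName());
            }
        }
    }
}
```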

HBase Master

Also known as HMaster, it is responsible for region assignment, DDL operations (creating and deleting tables), and so on:

Coordinates all Region Servers:

Assigns regions at startup, and reassigns regions during failure recovery and load balancing.

Monitors all Region Server instances in the cluster (receiving notifications from ZooKeeper).

Admin functions:

Provides interfaces for creating, deleting, and updating HBase tables (a DDL sketch follows below).
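A minimal sketch of the DDL side using the HBase 2.x Java client Admin API (these requests are served by the HMaster); the table and column family names are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class DdlSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {                       // DDL requests go to the HMaster
            TableName name = TableName.valueOf("demo_table");       // hypothetical table name
            admin.createTable(TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
                    .build());
            // ... use the table ...
            admin.disableTable(name);                               // a table must be disabled before deletion
            admin.deleteTable(name);
        }
    }
}
```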

Zookeeper

HBase uses ZooKeeper as a distributed coordination service to maintain the state of all services in the cluster. ZooKeeper keeps track of which servers are healthy and available and sends notifications when a server fails. ZooKeeper uses a consensus protocol to keep this distributed state consistent; note that this requires an ensemble of three or five machines running the consensus protocol.
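Clients do not contact the HMaster to find data; they only need the ZooKeeper ensemble to bootstrap. A minimal sketch, assuming hypothetical ZooKeeper hostnames and the standard client port:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ZkQuorumSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical hostnames; an odd-sized ensemble (3 or 5 nodes) is the usual choice.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connected via the ZooKeeper ensemble");
        }
    }
}
```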

How do these components work together

ZooKeeper is used to coordinate shared cluster state information in a distributed system. Region Servers and the active HMaster each maintain a session with ZooKeeper, and ZooKeeper keeps their ephemeral nodes alive through heartbeat detection.

Each Region Server creates an ephemeral node. The HMaster monitors these nodes to discover available Region Servers, and it also watches them for failures.

HMasters compete to create an ephemeral node; ZooKeeper decides which one succeeds first and becomes the active HMaster, ensuring that only one HMaster is active. The active HMaster sends heartbeats to ZooKeeper, while the inactive (standby) HMasters listen for the active HMaster's failure, ready to take over.

If a Region Server or the active HMaster fails or stops sending heartbeats for any reason, its session with ZooKeeper expires, the corresponding ephemeral node is deleted, and the listeners are notified. The active HMaster listens for Region Server failures and then recovers the failed Region Server and the region data it was responsible for. The inactive HMasters listen for the failure of the active HMaster and then compete to become the new active HMaster.
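To illustrate the ephemeral-node election pattern described above, here is a generic sketch against the raw ZooKeeper Java API; this is not HBase's internal implementation, and the znode path, host, and identity string are made up:

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralMasterSketch {
    public static void main(String[] args) throws Exception {
        // Connect with a session; if heartbeats stop, the session expires and every
        // ephemeral node created by this session is deleted automatically.
        ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30_000, event -> { });
        byte[] myId = "master-host-1".getBytes(StandardCharsets.UTF_8);   // hypothetical identity
        try {
            // Only one contender can create this ephemeral node; the winner is "active".
            zk.create("/demo-master", myId, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("I am the active master");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else won; watch the node and re-run the election when it disappears.
            zk.exists("/demo-master", event -> System.out.println("Active master gone, retry election"));
            System.out.println("Standing by as an inactive master");
        }
        // A real daemon would keep the session open; exiting here would delete the node.
    }
}
```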

Comments: this section is very important and touches on core concepts of distributed system design, including cluster state and consistency. Notice that ZooKeeper is the bridge connecting everything: every participant keeps a heartbeat session with ZooKeeper and obtains the cluster state it needs from ZooKeeper in order to manage other nodes or change roles. This reflects an important idea in distributed system design: use a dedicated service to maintain distributed cluster state.

The first read or write operation

There is a special HBase catalog table called the Meta table (which is itself just a special HBase table) that contains the locations of all regions in the cluster. ZooKeeper stores the location of the Meta table.

When the first HBase read or write operation arrives:

The client asks ZooKeeper which Region Server hosts the Meta table.

The client queries that Region Server, reads the Meta table, and learns which Region Server is responsible for the rowkey this request needs. The client caches this information, along with the location of the Meta table itself.

The client then contacts that Region Server to read or write the data.

For subsequent requests, the client can obtain the location of the Meta table and of previously accessed rowkeys (i.e., which Region Server holds them) directly from its cache, unless the cache becomes stale because a region has been moved. In that case, the client repeats the steps above to look up the location again and updates its cache.

Comments: the client really takes two steps to read or write data. First, it locates the Region Server that manages the requested rowkey via the Meta table; second, it reads or writes the data on that Region Server. Two Region Servers may be involved here, so it is worth understanding their respective roles. The Meta table is described in more detail below.
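From the application's point of view, the Meta lookup and caching described above are handled entirely by the client library. A minimal read sketch, assuming a hypothetical table, column family, and rowkey:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FirstReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo_table"))) {   // hypothetical table
            // Under the hood, the library asks ZooKeeper for the Meta table location,
            // looks up which Region Server holds this rowkey, caches both answers,
            // and then sends the Get to that Region Server.
            Result r = table.get(new Get(Bytes.toBytes("row-0001")));
            System.out.println(Bytes.toString(
                    r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        }
    }
}
```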

HBase Meta Table

The Meta table is a special HBase table that holds the list of all regions in the system. The table is structured like a b-tree, roughly as follows (a sketch of scanning it follows the list):

Key: table, region start key, region id

Value: region server
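Since the Meta table (hbase:meta) is itself a readable HBase table, the key structure can be observed by scanning it. A hedged sketch (cluster permissions permitting):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table meta = conn.getTable(TableName.META_TABLE_NAME);   // the hbase:meta catalog table
             ResultScanner scanner = meta.getScanner(new Scan())) {
            for (Result row : scanner) {
                // Each rowkey encodes: table name, region start key, region id.
                System.out.println(Bytes.toStringBinary(row.getRow()));
            }
        }
    }
}
```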

Region Server components

A Region Server runs on an HDFS DataNode and consists of the following components:

WAL: the Write-Ahead Log is a file on the distributed file system that stores new data that has not yet been persisted; it is used for failure recovery.

BlockCache: the read cache. It keeps the most frequently accessed data in memory and is an LRU (Least Recently Used) cache.

MemStore: the write cache. It holds new data in memory that has not yet been persisted to disk; the data is sorted before being written out. Note that each region has one MemStore per Column Family.

HFile: stores HBase data on disk (HDFS) as sorted KeyValues.

Comments: this part is the heart of the matter. Understanding the components of a Region Server is essential to understanding the structure of HBase. Make sure you understand the role of each component; their behavior is unfolded one by one in the sections that follow.

HBase write steps

When a client issues a write request (a Put operation), the first step is to write the data to the WAL:

The new data is appended to the end of the WAL file.

The WAL is used to recover data that has not yet been persisted if a failure occurs.

After the data is written to the WAL, it is added to the MemStore write cache. The server can then return an ack to the client indicating that the write is complete (see the sketch below).
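A minimal write sketch; the rowkey, column family, and value are hypothetical, and SYNC_WAL is the default durability setting, shown explicitly to mirror the WAL-before-MemStore ordering described above:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathSketch {
    // 'table' would be obtained from a Connection, as in the earlier read sketch.
    static void writeOne(Table table) throws IOException {
        Put put = new Put(Bytes.toBytes("row-0001"));                              // hypothetical rowkey
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        // SYNC_WAL (the default) asks the Region Server to append and sync the edit
        // to the WAL before it is added to the MemStore and the ack is returned.
        put.setDurability(Durability.SYNC_WAL);
        table.put(put);
    }
}
```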

Comments: pay attention to the order in which the WAL and the MemStore are updated; it cannot be reversed. The WAL must be written first, then the MemStore. If the MemStore were updated first and the Region Server crashed before the data was persisted to the WAL, the in-memory update would be lost and unrecoverable. In principle the WAL mirrors the data in the MemStore, and the two should stay consistent unless the system crashes. Also note that WAL updates are appended to the end of the file; this disk operation performs well and does not noticeably affect the request's overall response time.

HBase MemStore

The MemStore caches HBase data updates in memory as sorted KeyValues, the same representation used in HFiles. There is one MemStore per Column Family, so all updates are sorted per Column Family.

HBase Region Flush

Once enough data has accumulated in a MemStore, the entire sorted dataset is written to a new HFile on HDFS. HBase creates one HFile per Column Family, holding the actual cells, i.e. the KeyValue data. Over time more and more HFiles are created, as KeyValues keep being flushed from the MemStores to disk.
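Flushes are triggered automatically once a MemStore reaches a configurable size, and they can also be requested explicitly. A hedged sketch (the property name and default are per the HBase documentation; the table name is hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class FlushSketch {
    // Per-MemStore flush threshold; the default is 128 MB.
    static void configureFlushSize(Configuration conf) {
        conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024);
    }

    // 'admin' would come from Connection.getAdmin(); the table name is hypothetical.
    static void flushNow(Admin admin) throws IOException {
        admin.flush(TableName.valueOf("demo_table"));
    }
}
```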

Note that this flush behavior is one of the reasons HBase limits the number of Column Families: every Column Family has its own MemStore, and when one MemStore fills up, all the MemStores of the region are flushed to disk. The flush also records the maximum sequence number of the data written so far, so the system knows how much has been persisted up to that point.

The maximum sequence number is stored as metadata in each HFile, indicating where persistence ended and where it should resume. When a region starts, these sequence numbers are read, the largest is taken as the base, and subsequent updates are assigned sequence numbers incremented from that base.

Comments: here we meet the concept of a sequence number. Every HBase data update is tagged with a new, monotonically increasing sequence number. Each HFile stores the maximum sequence number of the data it contains, and this metadata is important: it is effectively a commit point, telling us that all data up to this sequence number has been persisted to disk. It is used not only when a region starts, but also during failure recovery, where it tells us from which point in the WAL historical updates need to be replayed.

HBase HFile

Data is stored in HFiles as Key/Value pairs. When a MemStore accumulates enough data, the entire sorted dataset is written out as a new HFile in HDFS. This is a sequential write and therefore very fast, since it avoids moving the disk head. (Note that HDFS does not support random file modification, but it does support appends.)

HBase HFile structure

An HFile uses a multi-level index, similar to a B+ tree, so data can be found without reading the entire file:

KeyValues are stored in sorted order.

The rowkey points into an index, and the index points to a specific 64 KB data block.

Each block has its own leaf index.

The last key of each block is stored in a mid-level index.

The root index node points to the mid-level indexes.

The trailer points to the meta blocks and is written at the end of the HFile when the data is persisted. The trailer also contains information such as Bloom filters and the time range. Bloom filters are used to skip files that cannot contain the requested rowkey, and the time range information is used to skip files outside the requested time range (a configuration sketch follows below).
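Bloom filters are configured per column family. A minimal sketch of building a column family descriptor with row-level Bloom filters (the family name is hypothetical; ROW is the usual default):

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class BloomFilterSketch {
    // Build a column family descriptor with row-level Bloom filters; pass it to
    // TableDescriptorBuilder.setColumnFamily(...) when creating or altering a table.
    static ColumnFamilyDescriptor rowBloomFamily() {
        return ColumnFamilyDescriptorBuilder
                .newBuilder(Bytes.toBytes("cf"))        // hypothetical family name
                .setBloomFilterType(BloomType.ROW)      // ROWCOL also filters on the column qualifier
                .build();
    }
}
```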

HFile index

The indexes just discussed are loaded into memory when the HFile is opened, so a lookup can be satisfied with a single disk seek.

HBase read merge

We have seen that the KeyValue cells of a single row may live in different places: cells already flushed are in HFiles, recently updated cells are in the MemStore, and recently read cells are in the BlockCache. So when you read a row, how does the system return the corresponding cells? A read operation merges cells from the BlockCache, the MemStore, and the HFiles:

The scanner first reads cells from the BlockCache, where recently read KeyValues are cached; it is an LRU cache.

The scanner then reads the MemStore, the write cache, which holds the most recently written data.

If the scanner does not find all the required cells in the BlockCache and MemStore, HBase uses the BlockCache indexes and Bloom filters to load the relevant HFiles into memory and find the requested row's cells.

As discussed earlier, there may be many HFiles per MemStore (that is, per Column Family), so a read may need to examine multiple files, which can hurt performance. This is called read amplification.

Comments: seen along the timeline, the HFiles are also ordered; in essence they preserve the history of updates for each region and each column family. For a given cell of a given rowkey, multiple versions of its data may therefore be spread across different HFiles, so reading it may require reading several HFiles, which is expensive, especially when data locality is not satisfied; in that case read amplification becomes even worse. This is also why compaction is necessary, which is discussed next.

HBase Minor Compaction

HBase automatically merges some smaller HFiles and rewrites them as a smaller number of larger HFiles. This process is called minor compaction. It uses a merge-sort algorithm to combine small files into larger ones, effectively reducing the number of HFiles.

HBase Major Compaction

A major compaction merges and rewrites all HFiles of each Column Family into a single large HFile. In the process, deleted and expired cells are physically removed, which improves read performance. But because a major compaction rewrites all HFiles, it causes a great deal of disk I/O and network traffic. This is called write amplification.

Major compactions can be scheduled to run automatically. Because of write amplification, they are usually scheduled for weekends or the middle of the night. (The MapR database improves on this and does not need compaction.) A major compaction also makes data that became remote, due to a server crash or load balancing, local to the Region Server again, restoring data locality.
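Compactions can also be requested explicitly through the Admin API, and the automatic major-compaction interval is configurable. A hedged sketch (the table name is hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class CompactionSketch {
    // 'admin' would come from Connection.getAdmin(); the table name is hypothetical.
    static void requestCompactions(Admin admin) throws IOException {
        admin.compact(TableName.valueOf("demo_table"));       // queue a minor compaction
        admin.majorCompact(TableName.valueOf("demo_table"));  // queue a major compaction
    }

    // Automatic major-compaction period in milliseconds (7 days shown here).
    static void scheduleWeeklyMajorCompaction(Configuration conf) {
        conf.setLong("hbase.hregion.majorcompaction", 7L * 24 * 60 * 60 * 1000);
    }
}
```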

HDFS data replication

All reads and writes go through the primary DataNode. HDFS automatically replicates the WAL and HFile blocks; HBase relies on HDFS to keep its data safe and intact. When data is written to HDFS, one copy is written to the local node and two more replicas are written to other nodes.

Both the WAL and the HFiles are persisted to disk and replicated. So how does HBase recover data in the MemStore that has not yet been persisted to an HFile? That is discussed in the next section.

HBase fault recovery

When a Region Server crashes, the regions it manages become inaccessible until the crash is detected and failure recovery completes. ZooKeeper detects the node failure through missing heartbeats and then notifies the HMaster that the Region Server has failed.

When the HMaster learns that a Region Server has failed, it reassigns the regions managed by that server to other, healthy Region Servers. To recover the data in the failed server's MemStores that had not been persisted to HFiles, the HMaster splits the WAL into several files and stores them on the new Region Servers. Each Region Server then replays the edits in its WAL fragment to rebuild the MemStores for the regions newly assigned to it.

The WAL contains a sequence of edits, each representing a single put or delete operation. The edits are written chronologically; when persisted, they are appended in order to the end of the WAL file.

What about data that was still in the MemStore and had not been persisted to an HFile? The WAL is replayed: the WAL files are read, the edits are sorted and applied to the MemStore, and the MemStore is eventually flushed to HFiles.

Comments: failure recovery is an important part of HBase's reliability guarantees, and the WAL plays the key role here. When the WAL is split, its edits are distributed, per region, to the appropriate new Region Servers, and each Region Server is responsible for replaying its portion into the MemStore.

Thank you for reading. That concludes this overview of how the HBase architecture works; after studying this article you should have a deeper understanding of it, and the specifics are best consolidated through your own practice and verification.
