

In-Depth: Understanding the HBase System Architecture


The composition of HBase

Physically, HBase is made up of three types of servers in a master-slave architecture: the Region server, the HBase HMaster, and ZooKeeper.

The Region server is responsible for data reads and writes; clients access data by communicating directly with Region servers.

The HBase HMaster is responsible for region assignment and for table creation and deletion (DDL) operations.

ZooKeeper, which is part of the Hadoop ecosystem, is responsible for maintaining the state of the cluster (whether a server is alive and reachable, coordination between servers, Master election, and so on).

In addition, Hadoop DataNodes store the data that the Region servers manage; all HBase data is stored as HDFS files. To keep the data a Region server manages as local as possible, Region servers are co-located with DataNodes. HBase data is local when it is first written, but when a region is moved or reassigned its data may no longer be local; this is only resolved after a so-called compaction.

The NameNode maintains the metadata for all of the physical data blocks that make up the files.

The HBase structure is shown in the following figure:

Regions

A table in HBase is split horizontally, based on row key values, into so-called regions. A region contains all rows of the table whose row keys lie between the region's start key and end key. The nodes in the cluster responsible for managing regions are called Region servers; they handle data reads and writes. Each Region server can manage approximately 1,000 regions. The structure of a region is shown in the following figure:
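To make the row-key-to-region mapping concrete, here is a minimal sketch using the HBase Java client (2.x API) that asks the cluster which region, and which Region server, currently serves a given row key. The table name "user_table" and the row key "row-42" are placeholders for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class LocateRegion {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // "user_table" and "row-42" are placeholder names for illustration
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("user_table"))) {
            HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row-42"));
            // The region's start/end keys bound the row keys it serves,
            // and the location names the Region server currently hosting it.
            System.out.println("Region:   " + loc.getRegion().getRegionNameAsString());
            System.out.println("StartKey: " + Bytes.toStringBinary(loc.getRegion().getStartKey()));
            System.out.println("EndKey:   " + Bytes.toStringBinary(loc.getRegion().getEndKey()));
            System.out.println("Server:   " + loc.getServerName());
        }
    }
}
```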

HMaster of HBase

The HMaster is responsible for assigning regions and for creating and deleting tables (DDL operations).

Specifically, the responsibilities of HMaster include:

Coordinating the Region servers

Assigning regions when the cluster starts, and reassigning regions for recovery or load balancing.

Monitoring the state of all Region servers in the cluster (by listening for ZooKeeper notifications about their ephemeral nodes).

Managing tables

Providing an interface for creating, deleting, or updating tables, as sketched below.

The work of HMaster is shown in the following figure:
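As a rough illustration of the table-management interface that goes through the HMaster, the sketch below uses the HBase 2.x Java Admin API to create and then delete a table. The table name "user_table" and column family "cf" are placeholders; error handling is omitted.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class TableAdminExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("user_table"); // placeholder name
            // Create a table with one column family; the HMaster assigns its regions.
            admin.createTable(TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
                    .build());
            // Deleting a table requires disabling it first.
            admin.disableTable(name);
            admin.deleteTable(name);
        }
    }
}
```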

ZooKeeper

HBase uses ZooKeeper to maintain the state of the servers in the cluster and to coordinate the distributed system. ZooKeeper tracks which servers are alive and reachable and provides notification when a server fails or goes offline. ZooKeeper also uses a consensus algorithm to keep its own servers consistent, and it is responsible for Master election. Note that to guarantee consistency and smooth Master elections, the number of ZooKeeper servers in the ensemble should be odd, for example three or five.


The work of ZooKeeper is shown in the following figure:
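From the client's point of view, ZooKeeper is also the entry point into the cluster: a client only needs the address of the ZooKeeper ensemble in order to connect. A minimal sketch, assuming hypothetical ZooKeeper hostnames, might look like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ZkQuorumExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The client only needs the ZooKeeper quorum; the hostnames here are placeholders.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connected via the ZooKeeper quorum");
        }
    }
}
```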

Cooperation among the components of HBase

ZooKeeper is used to coordinate the state information shared among members of a distributed system. Region Server and HMaster are also connected to ZooKeeper. ZooKeeper maintains the corresponding ephemeral node for active connections through heartbeat information. As shown in the following figure:

Each Region server creates a corresponding ephemeral node in ZooKeeper. The HMaster monitors these ephemeral nodes to discover which Region servers are working and which have failed. HMasters compete with each other to create an ephemeral node for Master election; ZooKeeper treats the first HMaster that successfully creates the node as the only active one. The active HMaster sends heartbeat messages to ZooKeeper to indicate that it is online, while the inactive HMasters monitor the active HMaster's status and re-run the election if it fails, thus providing high availability for HBase.

If a Region server or the HMaster fails to send heartbeats to ZooKeeper, its connection to ZooKeeper times out and the corresponding ephemeral node is deleted. Other nodes monitoring ZooKeeper learn that the node no longer exists and react accordingly: the active HMaster listens for Region server failures and reassigns their regions to restore service, and the inactive HMasters listen for failure of the active HMaster and elect a new active HMaster to take over.

The first read and write of HBase

There is a special table in HBase that acts as a directory, called the META table. The META table stores the locations of all regions in the cluster, and the location of the META table itself is stored in ZooKeeper.

When the user first wants to read or write in HBase, the following steps are performed:

1. The client gets, from ZooKeeper, the address of the Region server that hosts the META table.

2. The client asks that Region server which Region server manages the region containing the row key it wants to access. The client caches this information along with the location of the META table.

3. The client communicates with the Region server responsible for that row to perform the read or write.

In subsequent reads and writes, the client looks up the appropriate Region server address in its cache, unless that Region server is no longer reachable. In that case the client re-reads the META table and updates its cache. This process is shown in the following figure:

META table of HBase

Information about all regions in HBase is stored in the META table.

The META table is structured like a B-tree.

The structure of the META table is as follows:

Key: region start key, region id.

Value: Region server

As shown in the following figure:
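Because the META table is itself an ordinary HBase table (exposed to clients as hbase:meta), its contents can be inspected with a normal scan. The sketch below, using the HBase 2.x Java client, prints each region's row key (which encodes the table name, region start key, and region id) together with the Region server recorded in the info:server column.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanMeta {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table meta = conn.getTable(TableName.META_TABLE_NAME);
             ResultScanner scanner = meta.getScanner(new Scan())) {
            for (Result r : scanner) {
                // Row key encodes: table name, region start key, region id.
                // The "info:server" column holds the hosting Region server.
                byte[] server = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("server"));
                System.out.println(Bytes.toStringBinary(r.getRow()) + " -> "
                        + (server == null ? "(unassigned)" : Bytes.toString(server)));
            }
        }
    }
}
```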

The composition of Region Server

A Region server runs on an HDFS DataNode and consists of the following components:

WAL: the Write Ahead Log, a file in the HDFS distributed file system. The WAL stores new data that has not yet been written to persistent storage, and it is used to recover data if the server fails.

Block Cache: the read cache. The Block cache keeps frequently read data in memory to speed up reads; when it is full, the least recently used data is evicted.

MemStore: the write cache. It holds new data that has not yet been written to the hard disk. Data in the MemStore is sorted before being written to disk, and each column family in each region has its own MemStore.

HFiles: HFiles live on the hard disk and store the rows as sorted key-value pairs.

The structure of Region server is shown in the following figure:

The write steps of HBase

Step one

When a user of HBase issues a PUT request (that is, a write request to HBase), the first thing HBase does is write the data to HBase's write-ahead log (WAL).

The WAL file is written sequentially, that is, all the newly added data is added to the end of the WAL file. The WAL file is stored on the hard drive.

If something goes wrong with the server, the WAL can be used to recover data that has not yet been persisted, because the WAL is saved on the hard disk.

As shown in the following figure:

Step two

When the data is successfully written to WAL, HBase stores the data in MemStore. At this point, HBase will notify the user that the PUT operation has been successful.

The process is shown in the following figure:
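The two steps above correspond to a single client Put. The following minimal sketch, using the HBase 2.x Java client with placeholder table, row, and column names, issues such a write; the SYNC_WAL durability setting makes explicit that the edit must reach the WAL before it is placed in the MemStore and acknowledged.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user_table"))) { // placeholder table
            Put put = new Put(Bytes.toBytes("row-42"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            // SYNC_WAL (the usual default) ensures the edit is written to the WAL
            // before it is placed in the MemStore and the write is acknowledged.
            put.setDurability(Durability.SYNC_WAL);
            table.put(put);
        }
    }
}
```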

MemStore of HBase

The MemStore lives in memory and holds the data waiting to be written to the hard disk, ordered by key; the data is later written to the HFile in the same key order. Each column family in each region has its own MemStore, so updates are also tracked per column family.

As shown in the following figure:

HBase Region Flush

When enough data has accumulated in a MemStore, its entire contents are written out at once to a new HFile in HDFS. As a result, one column family may correspond to multiple HFiles in HDFS, each containing the corresponding cells, that is, instances of key-value pairs. These files are created as the operations accumulated in the MemStore are flushed to the hard disk.

Note that the MemStore lives in memory, which is one reason the number of column families in HBase is limited: each column family has its own MemStore, and when a MemStore is full its accumulated data is flushed to the hard disk in one pass. So that HBase knows how much has been persisted, the sequence number of the last written operation is also saved.

The largest sequence number in each HFile is stored as a meta field, marking where persistence ended and where it should resume. When a region starts, it reads the sequence number from each HFile to determine the latest (largest) operation sequence number for that region.

As shown in the following figure:
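A flush normally happens automatically once a MemStore reaches the size configured by hbase.hregion.memstore.flush.size, but it can also be requested through the Admin API. A small sketch with a placeholder table name:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Forces the MemStores of the table's regions to be written out as new HFiles.
            admin.flush(TableName.valueOf("user_table")); // placeholder table name
        }
    }
}
```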

HFile

Key-value pairs in HBase are stored in HFiles. As mentioned above, when enough data has accumulated in the MemStore, its entire contents are written to a new HFile in HDFS. Because the data in the MemStore is already sorted by key, this is a sequential write, and it is very efficient because it avoids a large number of disk seeks.

As shown in the following figure:

The structure of HFile

An HFile contains a multi-level index. This multi-level index allows HBase to find data without reading the entire file, and it is similar to a B+ tree.

Key-value pairs are stored in ascending key order.

The index points to 64 KB data blocks.

Each data block has its corresponding leaf index (leaf index).

The last key of each data block is placed in the intermediate index (intermediate index).

The root index points to the intermediate indexes.

The end of the file points to the meta blocks, which are written once the data itself has been flushed to the hard disk. The end of the file also contains other information, such as the bloom filter and time range information. The bloom filter helps HBase speed up queries by letting it skip files that cannot contain the key being queried, and the time range information lets HBase skip files outside the time range a read expects.

As shown in the following figure:
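The bloom filter and block size mentioned above are per-column-family settings. The following sketch, using the HBase 2.x Java Admin API with a placeholder table and column family, creates a table whose column family uses row-level bloom filters and 64 KB blocks; in recent HBase versions ROW bloom filters are already the default, so this simply makes the choice explicit.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class BloomFilterExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // ROW bloom filters let reads skip HFiles that cannot contain the queried row key.
            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("user_table")) // placeholder name
                    .setColumnFamily(ColumnFamilyDescriptorBuilder
                            .newBuilder(Bytes.toBytes("cf"))
                            .setBloomFilterType(BloomType.ROW)
                            .setBlocksize(64 * 1024) // 64 KB blocks, matching the index granularity above
                            .build())
                    .build());
        }
    }
}
```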

Index of HFile

The index of an HFile is read into memory when the HFile is opened. This means that looking up a piece of data requires only a single disk seek.

As shown in the following figure:

Read merge (Read Merge) and read amplification (Read Amplification) of HBase

From the discussion above, we already know that the cells for a given row may live in several different files or storage locations: rows already persisted are in HFiles on the hard disk, newly added or updated data is in the MemStore in memory, and recently read data is in the Block cache in memory. So to return a complete row, HBase has to perform a so-called read merge across the Block cache, the MemStore, and the HFiles on the hard disk.

1. HBase first looks for the data in the Block cache (HBase's read cache).

2. Next, HBase looks in the MemStore; as HBase's write cache, the MemStore contains the most recent version of the data.

3. If HBase has not found all of the row's cells in the Block cache and MemStore, it then reads the target row's cells from the relevant HFiles, using the indexes and bloom filters.

As shown in the following figure:
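From the client's perspective the read merge is invisible: a single Get returns the merged row. A minimal sketch with placeholder table, row, and column names:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user_table"))) { // placeholder table
            // A single Get; the Region server merges Block cache, MemStore,
            // and HFile versions of the row before returning the Result.
            Get get = new Get(Bytes.toBytes("row-42"));
            get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));
        }
    }
}
```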

One thing to note here is the so-called read amplification effect. As described above, the data from one MemStore may end up in multiple different HFiles (because of multiple flushes), so a single read may need to examine several HFiles to find the desired data. This can hurt HBase's performance.

As shown in the following figure:

Compaction of HBase

Minor Compaction

HBase automatically selects some of the smaller HFiles, merges them, and writes the result to fewer, larger HFiles. This process is called minor compaction. Minor compaction uses a merge sort to combine smaller files into larger ones, reducing the number of stored HFiles and improving HBase's performance.

This process is shown in the following figure:

Major Compaction

Major compaction means that HBase rewrites and merges all the HFiles belonging to a column family into a single HFile, dropping deleted and expired cells and merging the remaining versions of each cell in the process. This greatly improves read efficiency. However, because major compaction rewrites all of the HFiles into one, it involves heavy disk I/O and network traffic; this is known as write amplification (Write amplification). While a major compaction is running, the region is essentially unavailable.

Major compaction can be configured to run automatically at a specified time. To avoid affecting the business, Major compaction is usually arranged at night or on weekends.

Note that major compaction also pulls any data served by the current region but stored remotely back onto the local Region server. Data can end up remote because of server failure or load balancing.

This process is shown in the following figure:
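Both kinds of compaction can be requested explicitly through the Admin API (HBase also schedules them itself; the periodic major-compaction interval is governed by the hbase.hregion.majorcompaction setting). A small sketch with a placeholder table name:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("user_table"); // placeholder table
            admin.compact(name);      // request a minor compaction
            admin.majorCompact(name); // request a major compaction (off-peak hours recommended)
        }
    }
}
```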

Region split (Region Split)

First, let's quickly review regions:

Tables in HBase can be split horizontally, by row key, into one or more regions. Each region contains a contiguous range of row keys between a start key and an end key.

The default size of each region is 1 GB.

The corresponding Region server serves client access to the data in each region.

Each Region server can manage about 1,000 regions (which may come from the same table or from different tables).

As shown in the following figure:

Each table initially corresponds to a single region. As the amount of data in a region grows, the region is split into two child regions, each holding half of the original data. When this happens, the Region server notifies the HMaster of the split. For load-balancing reasons, the HMaster may assign one of the new regions to a different Region server, which leads to a Region server serving data that is stored remotely.

As shown in the following figure:
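Splits are normally triggered automatically when a region reaches its configured maximum size, but the Admin API can also request one at an explicit split point. A minimal sketch, with a placeholder table name and split key:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Requests a split of the region containing "row-500" at that key.
            admin.split(TableName.valueOf("user_table"), Bytes.toBytes("row-500")); // placeholders
        }
    }
}
```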

Load balancing for read operations (Read Load Balancing)

A region split initially happens locally on the Region server. However, for load balancing, the HMaster may assign a newly created region to another Region server to manage, which leaves that Region server managing a region whose data is stored on a remote server. This situation lasts until the next major compaction, which, as described above, rewrites any non-local data back to local storage.

That is, the data in HBase is always stored locally when it is written. But with the redistribution of the region (due to load balancing or data recovery), the data is no longer necessarily local to the Region server. This situation will be resolved after Major compaction.

As shown in the following figure:

Data replication of HDFS (Data Replication)

All HDFS data reads and writes go through the primary node. HDFS automatically replicates the WAL files and HFiles, providing HBase with reliable and safe data storage. When data is written to the local HDFS node, two additional replicas are stored on two other servers.

As shown in the following figure:

Crash recovery of HBase (Crash Recovery)

WAL files and HFiles are stored on disk and are replicated, so recovering them is straightforward. But how does HBase recover the MemStore, which exists only in memory?

When a Region server goes down, the regions it manages are inaccessible until the failure is detected and recovered. ZooKeeper monitors the servers' health through their heartbeats; when a server goes offline, ZooKeeper sends a notification that the server has failed. On receiving this notification, the HMaster begins the recovery process.

The HMaster first reassigns the regions managed by the failed Region server to other active Region servers. It then splits the failed server's WAL and hands the pieces to the corresponding newly assigned Region servers. Each of those Region servers reads its portion of the WAL and replays the operations in order, thereby rebuilding the corresponding MemStore.

As shown in the following figure:

Data recovery (Data Recovery)

A series of data operations are stored in the WAL file. Each operation corresponds to a row in the WAL. The new operations are written sequentially at the end of the WAL file.

So how is data that was sitting in the MemStore recovered when it is lost? This is where the WAL comes in: the corresponding Region server reads the WAL sequentially and replays the operations in it. The data is written into the current MemStore in memory and kept sorted; eventually, when the MemStore fills up, the data is flushed to the hard disk.


As shown in the following figure:

Advantages and disadvantages of Apache HBase

Advantages

Strong consistency model

When a write operation is confirmed, all users will read the same value.

Reliable automatic scaling

Regions split automatically when they hold too much data.

Use HDFS to distribute storage and back up data.

Built-in recovery function

Use WAL for data recovery.

Good integration with Hadoop

Running MapReduce over HBase is straightforward and intuitive.

Disadvantages

WAL replay is slow.

Crash recovery is slow and complex.

Major compaction consumes a lot of resources and causes heavy I/O.

Source: https://yq.aliyun.com/articles/601358?spm=a2c4e.11153987.0.0.1d67190akASwj0
