HBase principle-- sequenceId to be understood

2025-04-10 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Why is sequenceId needed?

When HBase writes data, it is first appended to the HLog and then written to the MemStore, which means the same piece of data exists in two different forms in two different places. Do these two copies need a mechanism to relate them to each other? For readers who wonder why such a relation is needed, the author raises three related questions:

After the data in a MemStore is flushed to an HDFS file, can the corresponding log data in the HLog be deleted? It must be deletable, otherwise the HLog would grow indefinitely. So the question is: how do we map the data flushed from the MemStore to the corresponding log records in the HLog?

A single HLog file in HBase has a fixed maximum size, and the total number of HLog files is also capped (the maximum stated here is 8 by default). Once the count exceeds this limit, the oldest HLog must be deleted. So the question is: how do we know that all the data covered by the HLog about to be deleted has already been flushed? (If we know which data has not been flushed, we can force a flush on it and then delete the HLog.)

Data in the MemStore is inevitably lost when a RegionServer goes down, and as everyone knows it can be recovered from the HLog. So the question is: which data in the HLog needs to be recovered, and which does not?

These three questions are essentially one problem. All of them need a medium that indicates up to which point the data in the MemStore has been flushed, relative to positions in the HLog. That medium is the focus of this article: sequenceId.

Core structure of HLog Log

To understand sequenceId, you first need a rough picture of the structure of an HLog file in HBase, shown in the following figure. Two points deserve attention:

Each RegionServer has one or more HLogs (one by default; since version 1.x the MultiWAL feature allows multiple HLogs). Each HLog is shared by multiple regions; as shown in the figure, Region A, Region B, and Region C share one HLog file.

The log unit in an HLog, the WALEntry, represents the minimum append unit of one row-level update (the small boxes in the figure). It consists of two parts: an HLogKey and a WALEdit. The HLogKey contains several attributes, including the table name, region name, and sequenceId. The WALEdit represents the set of updates in one transaction; a row-level transaction can atomically modify multiple columns of the same row, so the WALEdit in the figure contains multiple KeyValues.
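To make the layout concrete, here is a minimal Python sketch of the WALEntry structure described above. The class and field names are illustrative only, not the actual HBase classes:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HLogKey:
    table_name: str
    region_name: str
    sequence_id: int          # region-level, monotonically increasing

@dataclass
class KeyValue:
    row: bytes
    family: bytes             # column family (store)
    qualifier: bytes
    value: bytes

@dataclass
class WALEdit:
    # one row-level transaction: may touch several column families/columns
    cells: List[KeyValue] = field(default_factory=list)

@dataclass
class WALEntry:
    key: HLogKey
    edit: WALEdit
```

One WALEntry thus pairs a single sequenceId (in the key) with every cell the transaction touched (in the edit), which is exactly why the Per-CF discussion later has to split the entry apart.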

What is sequenceid?

SequenceId is the self-incrementing sequence number of a row-level transaction, maintained at the region level. This is the author's own working definition, and three points in it deserve attention:

SequenceId is a self-incrementing sequence number: it keeps increasing over time and never decreases.

SequenceId is the sequence number of a row-level transaction. What is a row-level transaction? Simply put, it is an update to multiple column families and columns within a single row. Row-level transactions guarantee the atomicity, consistency, isolation, and durability of the update, and HBase assigns each row-level transaction a self-incrementing sequence number.

SequenceId is maintained at the region level. Each region maintains its own sequenceId, and the sequenceIds of different regions are independent of each other.
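The three properties above can be illustrated with a toy counter. This is only a conceptual model of "per-region, monotonically increasing", not HBase's actual implementation:

```python
class RegionSequenceIds:
    """Toy model: each region hands out its own monotonically
    increasing sequenceId, independent of other regions."""

    def __init__(self):
        self._next = {}

    def next_seq_id(self, region):
        seq = self._next.get(region, 1)
        self._next[region] = seq + 1
        return seq
```

Two regions calling `next_seq_id` interleaved will each see their own 1, 2, 3, ... sequence, which matches the figure discussed next.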

Under this definition, an HLog looks like the following figure:

The HLog contains log records from two regions; the numbers in the boxes are sequenceIds, and each region's sequenceId increases independently over time.

Question 1: when can an HLog be expired and reclaimed?

The part to the right of the dotted line in the following figure is an HLog file that was rolled once it exceeded the single-HLog size threshold. The question is when this file can be reclaimed and deleted by the system. In theory, it can be deleted only after all the data in it, for every region, has been flushed. For example, in the figure below, if the data up to RegionA's maximum sequenceId in this file (5) has been flushed, and the data up to RegionB's maximum sequenceId (5) has also been flushed, then this HLog can be deleted. So how is that determined?

The RegionServer maintains a variable oldestUnflushedSequenceId for each region (actually for each store; for ease of explanation we treat it as per-region here, which does not affect the principle). It denotes the earliest unflushed seqId of the region: all data with a smaller seqId has already been flushed. Next, let's look at how this value is maintained during flush, and how it is used to decide whether an HLog has expired.

The following figure shows how the oldestUnflushedSequenceId variable changes during a flush. Initially it is null. Suppose that at stage 2 RegionA (red box) executes a flush, and the data with sequenceIds 1 through 4 in the middle HLog is written to disk. Before executing the flush, HBase appends an empty Entry to the HLog, purely to obtain the next sequenceId (5), and assigns that sequenceId to oldestUnflushedSequenceId-RegionA. As the third stage in the figure shows, oldestUnflushedSequenceId-RegionA then points to the Entry with sequenceId 5.

As you can see, this variable moves forward some distance after each flush. It is very important, and it is the key to solving the three problems raised at the beginning of the article. With this understanding, let's check whether the rightmost HLog can be deleted in the following two scenarios:

Obviously, in scenario 1 the rightmost HLog still contains unflushed data (sequenceId 5 has not yet been flushed), so it cannot be deleted; in scenario 2, all the data in the rightmost HLog has been flushed, so that HLog can in theory be deleted and reclaimed.
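The expiry check in these two scenarios can be sketched as follows. This is a simplified model (not HBase source), assuming we already know each region's largest sequenceId within the file:

```python
def can_delete_hlog(max_seq_in_file, oldest_unflushed):
    """max_seq_in_file: {region: largest sequenceId that region
    has in this HLog file}.
    oldest_unflushed: {region: oldestUnflushedSequenceId}; everything
    below this value is already on disk.
    The file is deletable only if every region's data in it has been
    flushed, i.e. oldestUnflushed has moved past the file's maximum."""
    return all(oldest_unflushed.get(region, 0) > max_seq
               for region, max_seq in max_seq_in_file.items())
```

With the numbers from the figure (both regions' maximum in the file being 5), the file stays as long as either region's oldestUnflushedSequenceId is still at or below 5.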

Question 2: when the number of HLogs exceeds the threshold (maxlogs) and the oldest HLog must be deleted, which regions should be force-flushed?

Suppose the system sets the maximum number of HLogs to 32, that is, hbase.regionserver.maxlogs=32, and the leftmost HLog in the figure above is the 33rd. At this point the system takes the oldest log (the rightmost HLog) and checks whether all the data corresponding to its entries has been flushed. As the figure shows, RegionC still has some unflushed data, so in order to delete this HLog safely, a flush must be forced on that region to persist all of its data.
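The selection of regions to force-flush can be sketched in the same simplified model as before (names are illustrative, not HBase source):

```python
def regions_to_force_flush(oldest_file_max_seq, oldest_unflushed):
    """oldest_file_max_seq: {region: largest sequenceId that region
    has in the oldest HLog file}.
    Returns the regions that still have unflushed data in that file;
    flushing them makes the file safe to delete."""
    return sorted(r for r, max_seq in oldest_file_max_seq.items()
                  if oldest_unflushed.get(r, 0) <= max_seq)
```

In the figure's situation, only RegionC would be selected: the other regions' oldestUnflushedSequenceId has already moved past their data in the oldest file.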

Question 3: which WALEntries need to be replayed, and which can be skipped, when a RegionServer recovers from a crash and replays the log?

In theory, only the WALEntries whose data has not yet been flushed from the MemStore need to be replayed; the WALEntries whose data has already been flushed can be skipped. The problem is that the RegionServer has crashed, so this in-memory information is gone. What to do? Persist it. The oldestUnflushedSequenceId variable analyzed above is produced at flush time, and it can be written into the HFile as metadata during the flush (see the code in the figure below):

In this way, after the region is reopened following a crash migration, it can recover this core variable, oldestUnflushedSequenceId, by loading the HFile metadata (all HFiles generated by one flush store the same sequenceId). After recovery, this sequenceId is used during WALEntry replay to filter which entries need to be replayed and which can be skipped.
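The replay filter can be sketched as follows. This is a simplified model; here we assume an entry is replayed when its sequenceId is greater than or equal to the recovered value (the boundary entry itself is the empty flush marker, so the exact boundary treatment does not change which data entries are replayed):

```python
def entries_to_replay(entries, oldest_unflushed):
    """entries: list of (region, sequenceId) pairs read from the HLog.
    oldest_unflushed: {region: value recovered from HFile metadata}.
    Entries below the recovered value are already in HFiles and are
    skipped; the rest must be replayed into the MemStore."""
    return [(r, s) for r, s in entries
            if s >= oldest_unflushed.get(r, 0)]
```

A region with no recovered metadata (a brand-new region) defaults to 0 here, i.e. everything is replayed, which is the safe direction to err in.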

A question for the reader: is it possible for the sequenceIds stored in the HFiles generated by one flush to be inconsistent? For example, all stores (store1, store2) in a region execute a flush; store1 flushes successfully and the oldestUnflushedSequenceId variable is appended to its HFile, but before store2 finishes its flush the RegionServer goes down, so the oldestUnflushedSequenceId for store2 is still the one from its previous file. Would replay be affected in this case? If so, why? If not, what mechanism guarantees correctness?

So far, all of the above analysis has assumed that flush in HBase is a region-level operation: every flush requires all stores in the region to flush. As extended reading, those interested in Per-CF Flush can continue. Per-CF Flush allows the system to flush one column family, or a subset of them, independently. The implementation principle is basically the same as above; the difference is that oldestUnflushedSequenceId, which above corresponded one-to-one with regions, must be refined to the store level under Per-CF Flush, corresponding one-to-one with stores.

Extended reading: Per-CF Flush

Region-level flush does have real problems. With multiple column families, once one store exceeds the size threshold (128 MB), all the other stores are forced to flush no matter how small they are; very small column families (a few megabytes) then produce many tiny files on disk, which is not good for HBase reads.

Per-CF flush allows a single store to execute a flush. It already exists in version 1.0.0 and became the default policy in version 1.2.0. Two pieces of work were needed to implement it. First, a new flush policy that can select one or more of the column families to flush; the new policy, FlushLargeStoresPolicy, selects the large stores for flushing. Second, the granularity of oldestUnflushedSequenceId had to be refined from region to store, i.e. from Map&lt;region, oldestUnflushedSequenceId&gt; to Map&lt;region, Map&lt;store, oldestUnflushedSequenceId&gt;&gt;, and the judgment logic of the three questions above had to be changed to store-level logic accordingly. Let's briefly revisit questions 1 and 3 with the store-level logic.

When can an HLog be expired and reclaimed under the Per-CF Flush policy?

The region-level judgment relied mainly on Map&lt;region, oldestUnflushedSequenceId&gt;, as detailed above. The store-level data structure is Map&lt;region, Map&lt;store, oldestUnflushedSequenceId&gt;&gt;. In fact, a simple transformation reduces it back to the region level: for each region, take the smallest oldestUnflushedSequenceId across its stores, call it minSeqNum; the region-level structure then becomes Map&lt;region, minSeqNum&gt; again, and no other logic needs to change.
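The collapse from the store-level map back to a region-level minimum can be sketched as (illustrative, not HBase source):

```python
def region_oldest_unflushed(per_store):
    """per_store: {(region, store): oldestUnflushedSequenceId}.
    Collapse to the region level by taking the minimum over each
    region's stores (the minSeqNum described in the text): a region
    is only as 'flushed' as its least-flushed store."""
    by_region = {}
    for (region, _store), seq in per_store.items():
        by_region[region] = min(seq, by_region.get(region, seq))
    return by_region
```

The resulting map plugs straight into the region-level HLog-expiry check from question 1.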

Under the Per-CF Flush policy, which data needs to be replayed and which can be skipped when a RegionServer crashes and replays the log?

This problem is a little more complicated, and the first concern is replay granularity. Look back at the composition of an Entry in the HLog. As the figure shows, an Entry consists of a WALKey and a WALEdit. The WALKey contains basic information; this article focuses on its sequenceId. The WALEdit contains the set of inserted/updated KeyValues. Note that these KeyValues may span multiple column families (columns) of one row, so a WALEdit can contain KeyValues belonging to multiple stores.

Under the All-CF Flush strategy, replaying at HLog-Entry granularity is fine, but it no longer works under the Per-CF Flush strategy. The KeyValues of multiple CFs are mixed within one HLog-Entry, and some of those KVs may already have been flushed while others have not. The replay granularity must therefore drop to the KeyValue level, with each KeyValue checked individually during replay.

Having settled the replay granularity, let's focus on which KeyValues need to be replayed and which can be skipped. As mentioned above, each flush persists the corresponding oldestUnflushedSequenceId into the HFile metadata. Under the All-CF Flush policy, all stores of a region persist the same oldestUnflushedSequenceId in one flush, so during replay it suffices to compare the HLog-Entry's sequenceId with that one value: larger means replay, smaller means skip. Under Per-CF Flush this no longer works. Different stores in a region have their own independent oldestUnflushedSequenceId, so during replay each KeyValue must be matched to its store and compared with that store's oldestUnflushedSequenceId: larger means replay, smaller means skip.
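The per-store, per-cell comparison can be sketched as follows (names are illustrative; real HBase resolves each cell's store from the cell itself):

```python
def cells_to_replay(entry_seq_id, region, cell_families, per_store_oldest):
    """entry_seq_id: the sequenceId in the WALKey of one entry.
    cell_families: the column families touched by that entry's WALEdit.
    per_store_oldest: {(region, store): oldestUnflushedSequenceId}.
    A cell is replayed only if the entry's sequenceId is at or beyond
    its own store's oldestUnflushedSequenceId."""
    return [cf for cf in cell_families
            if entry_seq_id >= per_store_oldest.get((region, cf), 0)]
```

Note that one entry can be partially replayed: cells of a store that flushed after the entry was written are skipped, while cells of a store that has not flushed that far are replayed.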

To sum up: skip HLog cells per store when replaying. Note the two key points here: cells (not entries) and per store (not per region).

Full-text summary

This article started from a very important variable in HBase, sequenceId, and used it to explain the WAL and flush modules involved. It only gives the general idea; many details were not studied in depth, and interested readers can dive into the source code guided by this article, which should make it easier. Next, the author will continue with the MVCC mechanism in HBase, again centered on sequenceId. Stay tuned.

