What is the impact of HBase Flush on reading and writing services?


2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains what impact an HBase Flush has on read and write services: when a Flush is triggered, how the Region to flush is chosen, and what each step of the Flush does to concurrent updates and scans.

Trigger conditions for Flush operation of HBase:

1) A manual call, HRegionInterface#flushRegion. User code can invoke it through the flush operation of org.apache.hadoop.hbase.client.HBaseAdmin, which directly triggers the internalFlush of the HRegion.

2) An update operation on the HRegionServer that pushes total memory usage above the warning line. The warning line is globalMemStoreLimit = RS_JVM_HEAPSIZE * conf.getFloat("hbase.regionserver.global.memstore.upperLimit"). Once this value is exceeded, the FlushThread is triggered directly. It selects one HRegion from all those on the server and flushes its MemStore to HDFS, keeping the global memstore size of the RS within a controllable range.
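The warning-line arithmetic can be illustrated with a small sketch. It uses plain Java rather than HBase classes; the class and method names are hypothetical, and the 0.4 fraction is only an example value for hbase.regionserver.global.memstore.upperLimit.

```java
// Sketch only: plain Java, no HBase classes. All names here are illustrative.
final class GlobalMemStoreLimitSketch {
    // Example value assumed for hbase.regionserver.global.memstore.upperLimit.
    static final double EXAMPLE_UPPER_LIMIT = 0.4;

    /** Warning line = RegionServer max heap size * upperLimit fraction. */
    static long globalMemStoreLimit(long maxHeapBytes, double upperLimit) {
        return (long) (maxHeapBytes * upperLimit);
    }

    /** True when the summed MemStore sizes of all Regions exceed the warning line. */
    static boolean isAboveLimit(long[] memStoreSizes, long limit) {
        long total = 0;
        for (long size : memStoreSizes) {
            total += size;
        }
        return total > limit;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        long limit = globalMemStoreLimit(heap, EXAMPLE_UPPER_LIMIT);
        System.out.println("heap=" + heap + " bytes, warning line=" + limit + " bytes");
    }
}
```

With these assumptions, a flush would be forced as soon as isAboveLimit returns true for the current per-Region MemStore sizes.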

Algorithm for selecting the HRegion to flush on the RS:

Step 1: sort the Regions on the RS by the size of their MemStore.

Step 2: among the Regions in which no Store has reached hbase.hstore.blockingStoreFiles StoreFiles, select the one with the largest MemStore. -bestFlushableRegion

Step 3: among all Regions, select the one with the largest MemStore. -bestAnyRegion

Step 4: if the MemStore usage of bestAnyRegion is more than twice that of bestFlushableRegion, choose bestAnyRegion. Although it already holds more than blockingStoreFiles files, the memory pressure on the RS makes flushing it worthwhile even at the risk of triggering a Compaction. Otherwise, use bestFlushableRegion directly.
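The four steps above can be sketched as follows. This is a simplified illustration, not the HBase source: RegionInfo and pickRegionToFlush are hypothetical names, "does not reach" is read as strictly fewer StoreFiles than the limit, and falling back to bestAnyRegion when no flushable Region exists is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FlushRegionSelectionSketch {
    /** Hypothetical stand-in for an HRegion. */
    static final class RegionInfo {
        final String name;
        final long memStoreSize;   // bytes held in the Region's MemStores
        final int maxStoreFiles;   // largest StoreFile count of any Store in the Region

        RegionInfo(String name, long memStoreSize, int maxStoreFiles) {
            this.name = name;
            this.memStoreSize = memStoreSize;
            this.maxStoreFiles = maxStoreFiles;
        }
    }

    static RegionInfo pickRegionToFlush(List<RegionInfo> regions, int blockingStoreFiles) {
        // Step 1: sort by MemStore size, largest first.
        List<RegionInfo> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((RegionInfo r) -> r.memStoreSize).reversed());
        if (sorted.isEmpty()) {
            return null;
        }
        // Step 2: largest Region whose Stores are all below the blocking limit.
        RegionInfo bestFlushable = null;
        for (RegionInfo r : sorted) {
            if (r.maxStoreFiles < blockingStoreFiles) {
                bestFlushable = r;
                break;
            }
        }
        // Step 3: largest Region regardless of StoreFile count.
        RegionInfo bestAny = sorted.get(0);
        // Assumption: with no flushable candidate, fall back to bestAny.
        if (bestFlushable == null) {
            return bestAny;
        }
        // Step 4: prefer bestAny only if it is more than twice as large.
        return bestAny.memStoreSize > 2 * bestFlushable.memStoreSize ? bestAny : bestFlushable;
    }
}
```

The factor of two reflects the trade-off described in step 4: a Region that is already over the file limit is flushed only when the memory it frees is clearly larger.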

The process by which the specified Region is written to hdfs:

Step 1: acquire the write lock of updatesLock, blocking all update operations on the Region. The Flush operation therefore blocks Row updates (Put, Delete, Increment) in the Region. The reason is that the MemStore snapshot is taken while updates are blocked. Without this restriction, the multiple KVs of a single put could end up split between kvset and snapshot, which would violate the row atomicity that HBase guarantees.

Step 2: mvcc advances a write transaction. Each Region maintains an mvcc object (Multi Version Consistency Control) to control the transactionality of read and write operations.

Step 3: get a new newSeqNum from HLog and update the lastSeqWritten of HLog. Because updates to the Region are paused at this point, the Region's record in lastSeqWritten is temporarily removed and then written back to lastSeqWritten. Here lastSeqWritten is the SeqNum that HLog keeps for each Region, recording its last commit operation up to the current moment.

Step 4: perform the snapshot operation for the MemStore of each Store under Region.

As shown in the figure above, the number of Stores in an HRegion is determined by the number of ColumnFamilies in the Table. Each Store is composed of one MemStore and several StoreFile (HFile) files. During a normal update, the updated content is written to the kvset structure in the MemStore. When HRegion performs a Flush, it is in effect writing all the contents of the MemStore to HDFS. Although updates are blocked by the write lock at this point, reads can still continue. So when the MemStore executes snapshot, the snapshot reference is pointed at the current kvset, and kvset is then pointed at a newly allocated area of memory. The code is as follows:
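A minimal sketch of that reference swap, written for illustration rather than copied from HBase. String keys and values stand in for KeyValue objects, MemStoreSnapshotSketch and its methods are hypothetical names, and the read/write lock plays the role of updatesLock (updates take the read lock, the flush takes the write lock).

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class MemStoreSnapshotSketch {
    // Live updates go to kvset; snapshot holds the data currently being flushed.
    volatile ConcurrentSkipListMap<String, String> kvset = new ConcurrentSkipListMap<>();
    volatile ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    // Stand-in for updatesLock.
    final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

    /** An update: many updates may hold the read lock at the same time. */
    void put(String key, String value) {
        updatesLock.readLock().lock();
        try {
            kvset.put(key, value);
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /** Flush steps 1, 4 and 5: block updates, swap the references, unblock. */
    void takeSnapshot() {
        updatesLock.writeLock().lock();
        try {
            if (!snapshot.isEmpty()) {
                return; // a previous snapshot has not been flushed yet
            }
            snapshot = kvset;                      // snapshot now points at the old data
            kvset = new ConcurrentSkipListMap<>(); // kvset points at a fresh, empty map
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    /** A read takes no lock: it checks kvset first, then snapshot. */
    String get(String key) {
        String value = kvset.get(key);
        return value != null ? value : snapshot.get(key);
    }

    /** Called after the snapshot has been written to a StoreFile. */
    void clearSnapshot() {
        snapshot = new ConcurrentSkipListMap<>();
    }
}
```

Because only references are exchanged, the time spent under the write lock is short, and a reader that consults both maps still sees every value.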

Step 5: release the write lock of updatesLock, after which the HRegion can accept update operations again.

Step 6: update the mvcc read version to the current write version number.

Here is a small aside. In an update operation, the call to mvcc.completeMemstoreInsert lies outside the scope of updatesLock. Under multi-threaded, high-concurrency load, it is therefore possible that the KVs have already been written to the MemStore's kvset while the transaction has not yet committed. The relevant code for this scenario is as follows:

From line 4358, we can clearly see that the update operation writes to the MemStore's kvset under updatesLock. Now assume that the Flush thread acquires the updatesLock write lock and performs the snapshot after another update thread has passed line 4363. The read and write transaction numbers in mvcc will then differ, so the Flush thread in the Region needs to call waitForRead(w) and wait until the read version has caught up with the current write version number.
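The interaction between write transactions and waitForRead can be sketched with a much simplified version of the idea. MvccSketch below is a hypothetical class written for this article; it borrows the method names mentioned above but is not the HBase implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MvccSketch {
    /** One in-flight write transaction. */
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    private long writePoint = 0; // highest write number handed out
    private long readPoint = 0;  // highest write number visible to readers
    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();

    /** Start a write transaction (also used by the flush in step 2). */
    synchronized WriteEntry beginMemstoreInsert() {
        WriteEntry entry = new WriteEntry(++writePoint);
        writeQueue.addLast(entry);
        return entry;
    }

    /** Commit: the read point advances only past a contiguous run of completed writes. */
    synchronized void completeMemstoreInsert(WriteEntry entry) {
        entry.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
        notifyAll();
    }

    /** Block until every write up to and including this entry is visible. */
    synchronized void waitForRead(WriteEntry entry) {
        boolean interrupted = false;
        while (readPoint < entry.writeNumber) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized long readPoint() {
        return readPoint;
    }
}
```

In this sketch, a flush thread that holds entry w would complete w and then call waitForRead(w); the call returns only after every earlier update has committed, including one that wrote to kvset before the snapshot but had not yet reached completeMemstoreInsert.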

Step 7: write the snapshot in Store into a temporary StoreFile file.

Step 8: rename the StoreFile, then update the file list and the MemStore state in the Store.

Until step 8 completes, read requests on the whole HRegion are unaffected. During a read request, StoreScanner reads kvset and snapshot together, so even if kvset is switched to snapshot, the scan operation can continue. This part is handled by MemStoreScanner.

During a read, the scanners in a Store come in two kinds: StoreFileScanner and MemStoreScanner. Both implement the KeyValueScanner interface and are encapsulated by a KeyValueHeap in StoreScanner. Similarly, in RegionScannerImpl, the StoreScanner of each Store is encapsulated in a KeyValueHeap, which serves external requests directly.
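The heap that wraps several scanners can be pictured as a merge over sorted sources. The sketch below is an assumption-laden illustration: SimpleScanner and KeyValueHeapSketch are hypothetical classes, and String keys replace KeyValue objects.

```java
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class KeyValueHeapSketch {
    /** Hypothetical scanner over already-sorted keys, standing in for a KeyValueScanner. */
    static final class SimpleScanner {
        private final Iterator<String> it;
        private String current;

        SimpleScanner(List<String> sortedKeys) {
            this.it = sortedKeys.iterator();
            this.current = it.hasNext() ? it.next() : null;
        }

        /** The key this scanner would return next, or null when exhausted. */
        String peek() {
            return current;
        }

        String next() {
            String result = current;
            current = it.hasNext() ? it.next() : null;
            return result;
        }
    }

    // Scanners ordered by the key each one would return next.
    private final PriorityQueue<SimpleScanner> heap =
        new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));

    KeyValueHeapSketch(List<SimpleScanner> scanners) {
        for (SimpleScanner scanner : scanners) {
            if (scanner.peek() != null) {
                heap.add(scanner);
            }
        }
    }

    /** Returns the smallest remaining key across all scanners, or null at the end. */
    String next() {
        SimpleScanner top = heap.poll();
        if (top == null) {
            return null;
        }
        String key = top.next();
        if (top.peek() != null) {
            heap.add(top); // re-insert so the heap reorders on the scanner's new head key
        }
        return key;
    }
}
```

One scanner could represent the MemStore and the others individual StoreFiles; the caller sees a single sorted stream regardless of where each key lives.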

At this point, perhaps careful engineers will have a question: does the Flush operation have any impact on reading?

It does, but the impact is small. In the stages before step 8, MemStoreScanner switches freely between kvset and snapshot.

As shown above, if kvset is reset, theNext will no longer be equal to kvsetNextRow, so the scanner switches to fetching data from the snapshot iterator.

Therefore, between steps 1 and 7 there is little impact on the read service. In step 8, however, the newly generated StoreFile has to be added to the list of available StoreFiles in the Store, and the contents of the snapshot have to be cleared.

At this point, ChangedReadersObserver comes into play.

// Tell listeners of the change in readers.
notifyChangedReadersObservers();

The most important effect here is that the heap StoreScanner uses to encapsulate all StoreFileScanners and MemStoreScanners is emptied. As a result, the next next() call triggers the resetScannerStack operation: all Scanners under the Store are reloaded, and a seek is executed to the last updated key. Because of this process, a flush can cause some next() operations to pause suddenly.
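The cost described here can be made concrete with a toy model. StoreScannerResetSketch is a hypothetical class: a sorted set stands in for all readers of a Store, an iterator stands in for the heap, and seeking just past the last returned key is a simplification of what resetScannerStack does.

```java
import java.util.Iterator;
import java.util.TreeSet;

final class StoreScannerResetSketch {
    private final TreeSet<String> storeKeys; // stand-in for everything readable in the Store
    private Iterator<String> heap;           // stand-in for the KeyValueHeap; null after a flush
    private String lastKey;                  // last key handed to the caller
    int resetCount = 0;                      // how many times the scanner stack was rebuilt

    StoreScannerResetSketch(TreeSet<String> storeKeys) {
        this.storeKeys = storeKeys;
        this.heap = storeKeys.iterator();
    }

    /** Called when a flush changes the Store's readers: throw the heap away. */
    synchronized void updateReaders() {
        heap = null;
    }

    /** Returns the next key, rebuilding the scanner stack first if a flush emptied it. */
    synchronized String next() {
        if (heap == null) {
            resetScannerStack(); // the extra work a flush adds to this particular call
        }
        if (!heap.hasNext()) {
            return null;
        }
        lastKey = heap.next();
        return lastKey;
    }

    /** Reopen the scanners and seek to where the scan had stopped. */
    private void resetScannerStack() {
        resetCount++;
        heap = (lastKey == null)
            ? storeKeys.iterator()
            : storeKeys.tailSet(lastKey, false).iterator();
    }
}
```

Only the first next() after updateReaders() pays for the rebuild and the seek, which matches the observation that a flush stalls some next() calls rather than all of them.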

