This article looks at how HBase keeps queries fast, focusing on its compaction mechanism. I hope it serves as a useful reference.
1. Why compaction?
As mentioned in the previous article on HBase reads and writes, HBase creates multiple scanners to fetch data during a read.
Among them, multiple StoreFileScanners are created to load the relevant data blocks from HFiles. It is easy to see that if there are too many HFiles, a single read involves a lot of disk IO, a phenomenon commonly called "read amplification".
Hence today's topic, a core feature of HBase: compaction.
By running compaction, the number of HFiles stays roughly stable, so the number of IO seeks stays stable, and the response time (RT) of each query stays within a predictable range.
2. Classification of compaction
There are two kinds of compaction: minor compaction and major compaction.
Minor compaction merges some adjacent small files into larger ones. This process only merges files; it does not remove delete-marked data or TTL-expired data.
Major compaction merges all the files under an HStore into a single HFile. This process consumes a lot of system resources, so automatic periodic major compaction is usually disabled in production (by setting hbase.hregion.majorcompaction to 0; flush-triggered compactions still happen), and replaced with manual runs during off-peak hours. Major compaction removes three kinds of data: data marked for deletion, data whose TTL has expired, and data whose version number no longer meets the column family's retention setting.
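To make those three retention rules concrete, here is a small illustrative sketch in Java. The Cell record, its fields, and the retainAfterMajorCompaction method are all hypothetical and do not correspond to HBase's internal classes; they only mirror the rules listed above.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (not HBase internals): which cells survive a major compaction.
public class MajorCompactionRetentionSketch {
    // hypothetical cell model: versionRank 0 = newest version of a cell
    record Cell(String row, long timestamp, boolean deleteMarked, int versionRank) {}

    // keep a cell only if it is not delete-marked, not past its TTL,
    // and within the column family's max-versions setting
    static List<Cell> retainAfterMajorCompaction(List<Cell> cells, long ttlMillis, int maxVersions) {
        long now = System.currentTimeMillis();
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            boolean expired = ttlMillis > 0 && now - c.timestamp() > ttlMillis;
            boolean tooOldVersion = c.versionRank() >= maxVersions;
            if (!c.deleteMarked() && !expired && !tooOldVersion) {
                kept.add(c);
            }
        }
        return kept;
    }
}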
So when exactly is compaction triggered, and which type runs?
Major compaction is triggered if any of the following conditions is met; otherwise a minor compaction runs (a rough sketch of this selection logic follows the list):
The user explicitly requests a major compaction.
No major compaction has run for a long time, and the number of candidate files is below the threshold (hbase.hstore.compaction.max).
The store contains reference files (temporary files produced by a region split), which must be rewritten by a major compaction so that the temporary files can be removed.
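Here is that selection logic as a rough Java sketch, assuming the three inputs (user request, presence of reference files, candidate file list and last major-compaction time) are already known. The method and parameter names are illustrative, not HBase's internal compaction-policy code; only the two configuration keys are real HBase settings.

import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Rough sketch of "major or minor?" based on the three conditions above.
public class CompactionTypeSketch {
    static boolean shouldRunMajorCompaction(List<Path> candidates,
                                            long lastMajorCompactionTs,
                                            boolean userRequestedMajor,
                                            boolean hasReferenceFiles,
                                            Configuration conf) {
        long majorInterval = conf.getLong("hbase.hregion.majorcompaction",
                                          TimeUnit.DAYS.toMillis(7));
        int maxFiles = conf.getInt("hbase.hstore.compaction.max", 10);

        if (userRequestedMajor) return true;   // condition 1: user forced a major compaction
        if (hasReferenceFiles)  return true;   // condition 3: reference files left by a split

        boolean overdue  = System.currentTimeMillis() - lastMajorCompactionTs > majorInterval;
        boolean fewFiles = candidates.size() < maxFiles;
        return overdue && fewFiles;            // condition 2: long overdue and few candidate files
    }
}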
3. When compaction is triggered
Compaction is triggered at three points:
1) MemStore flush:
This was mentioned at the beginning and should be easy to understand: every MemStore flush produces a new HFile, and once the number of files exceeds the limit, compaction is naturally triggered. Note that, as discussed in the earlier article on HBase architecture, MemStore flushes in units of a region: if the MemStore of any HStore in a region fills up, the MemStores of all HStores under that region are flushed, and each HStore may then trigger a compaction.
2) Periodic check by a background thread:
HBase runs a background thread, CompactionChecker, which periodically checks whether a compaction should be performed.
This differs from the flush-triggered case. The checker first looks at whether the number of store files exceeds the threshold and triggers a compaction if it does. If not, it also checks whether the earliest update in the HFiles is older than a configured age (hbase.hregion.majorcompaction), and if so it triggers a major compaction to clean up stale data.
3) Manual trigger:
One reason is the concern that a major compaction will affect production traffic, so it is triggered manually during off-peak hours instead.
Another reason is that after executing a DDL change, the user may want it to take effect on disk right away, which is also done by manually triggering a major compaction.
Finally, disk capacity may be running low, and a major compaction is run manually to clean up invalid data and merge files.
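For the manual case, the standard HBase client Admin API can queue a major compaction for a whole table. A minimal sketch, assuming a reachable cluster and using "my_table" as a placeholder table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ManualMajorCompaction {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Request a major compaction of every region of the table.
            // The call only queues the compaction, which runs asynchronously.
            admin.majorCompact(TableName.valueOf("my_table"));
        }
    }
}

The same request can be issued from the HBase shell with major_compact 'my_table'; either way, the compaction itself runs asynchronously on the RegionServers.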
4. HFile merge process
1) Read the key-values of the HFiles to be merged and write them to a temporary file.
2) Move the temporary file into the corresponding region data directory.
3) Write the compaction's input and output file paths to the WAL, then force a sync.
4) Delete all of the input files from the corresponding region data directory.
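The following is a deliberately simplified sketch of that flow, with plain sorted text files standing in for HFiles, the WAL step reduced to a comment, and hypothetical directory arguments. It only illustrates the "write to a temp file, move into place, delete the inputs" pattern, not HBase's actual compaction code.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class HFileMergeSketch {
    static void compact(List<Path> inputs, Path regionDir, Path tmpDir) throws IOException {
        // 1) read the key-values of all input files and write the merged,
        //    re-sorted result to a temporary file
        List<String> merged = new ArrayList<>();
        for (Path in : inputs) {
            merged.addAll(Files.readAllLines(in));   // each line: "rowKey\tvalue"
        }
        merged.sort(null);                           // keep key order, like an HFile
        Path tmp = Files.createDirectories(tmpDir).resolve("compaction.tmp");
        Files.write(tmp, merged);

        // 2) move the temporary file into the region's data directory
        Path output = Files.createDirectories(regionDir).resolve("merged.data");
        Files.move(tmp, output, StandardCopyOption.REPLACE_EXISTING);

        // 3) real HBase records the compaction's input/output paths in the WAL
        //    here and forces a sync, so the step can be replayed after a crash

        // 4) delete the input files from the region data directory
        for (Path in : inputs) {
            Files.deleteIfExists(in);
        }
    }
}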
5. Side effects of compaction
Of course, compaction itself reads and writes a large number of files, so it causes read-latency spikes while it runs. The compaction process can therefore be seen as trading a short burst of heavy IO for low latency on subsequent queries.
On the other hand, if write volume stays high for a long period and HFiles accumulate faster than compaction can merge them, HBase will temporarily block write requests. At each MemStore flush, if the number of HFiles in an HStore exceeds hbase.hstore.blockingStoreFiles (default 7), the flush is blocked for up to hbase.hstore.blockingWaitTime. The flush resumes once the HFile count drops back below that threshold, or after the blocking time has elapsed. This keeps the number of HFiles stable, but at some cost to write throughput.
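A small illustrative sketch of those two settings; shouldBlockFlush and maxBlockingMillis are made-up helper names, not HBase internals, and only the two configuration keys are real.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushBlockingSketch {
    // a flush is held back while the store already has too many HFiles
    static boolean shouldBlockFlush(int storeFileCount, Configuration conf) {
        int blockingFiles = conf.getInt("hbase.hstore.blockingStoreFiles", 7);
        return storeFileCount >= blockingFiles;
    }

    // the flush proceeds anyway once this much time has passed (milliseconds)
    static long maxBlockingMillis(Configuration conf) {
        return conf.getLong("hbase.hstore.blockingWaitTime", 90000L);
    }

    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        System.out.println("block flush at 9 files? " + shouldBlockFlush(9, conf));
        System.out.println("max blocking time (ms): " + maxBlockingMillis(conf));
    }
}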
Thank you for reading. I hope this analysis of HBase high-performance queries and compaction has been a useful reference.