Today I will talk about HBase major compaction and minor compaction. Many people may not know much about them, so I have summarized the following for you; I hope you get something out of this article.
HBase is built on the LSM-Tree (Log-Structured Merge Tree) storage model. When a client writes data to a RegionServer, the data is first appended to the write-ahead log (WAL), which is stored on HDFS; each RegionServer keeps a WAL that is shared by all the regions it hosts. The data is then written to the MemStore in memory. The default MemStore flush size is 128 MB (the same as the default HDFS block size; changing it is not generally recommended). When a MemStore reaches 128 MB, its contents are flushed to disk as a StoreFile, which is persisted as an HFile on HDFS.
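For reference, here is a minimal sketch (assuming the HBase client libraries and an hbase-site.xml are on the classpath) that simply reads this flush threshold from the configuration; the 128 MB figure is the default of the hbase.hregion.memstore.flush.size setting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MemStoreFlushSize {
    public static void main(String[] args) {
        // Loads hbase-default.xml / hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // MemStore size at which a flush to disk is triggered;
        // the default is 134217728 bytes, i.e. 128 MB.
        long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 134217728L);
        System.out.println("MemStore flush threshold: " + (flushSize / (1024 * 1024)) + " MB");
    }
}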
(The usual HBase storage architecture diagram is omitted here; many versions of it can be found online, so I will not go into it.)
As users keep writing, more and more HFiles accumulate on disk, along with more and more metadata. A single HBase query then requires more and more IO operations, which increases query latency. To keep read performance acceptable, HBase merges small HFiles to reduce the number of files; this is the Compaction mechanism.
Introduction to HBase Compaction mechanism:
HBase Compaction classification:
A. Minor Compaction
Minor Compaction selects several small, adjacent StoreFiles and merges them into one larger StoreFile. It does not clean up Cells that have been deleted or have expired. The result of a Minor Compaction is fewer but larger StoreFiles.
B. Major Compaction
Major Compaction merges all the StoreFiles of a Store into a single StoreFile and cleans up three types of data: deleted data, data whose TTL has expired, and data whose number of versions exceeds the configured maximum (VERSIONS).
Note:
Major compaction generally takes a long time to execute and consumes a lot of resources. Its default period, controlled by the parameter hbase.hregion.majorcompaction, is 7 days. Production clusters generally disable automatic major compaction and run it when business traffic is low, for example at night.
Setting hbase.hregion.majorcompaction = 0 disables the major compactions triggered by the CompactionChecker thread, but not major compactions requested explicitly by the user.
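When automatic major compaction is disabled in this way, it is typically triggered manually or by a script. As an illustration only, the sketch below uses the HBase client Admin API; the table name "my_table" is a placeholder:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ManualMajorCompaction {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {
            // Asynchronously requests a major compaction of every region of the
            // (placeholder) table "my_table"; roughly what major_compact 'my_table'
            // does in the hbase shell.
            admin.majorCompact(TableName.valueOf("my_table"));
        }
    }
}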
The role of Compaction:
a. Merge files
b. Clear deleted, expired, redundant versions of data
c. Improve the efficiency of reading and writing data
Trigger conditions of Compaction
a. compact or major_compact requests in the hbase shell
b. After a memstore flush, HBase checks whether a compaction should be performed; once the conditions for a minor or major compaction are met, one is triggered.
c. majorCompact() calls through the client API
d. Background thread polling. The HBase background thread CompactionChecker periodically checks whether a compaction needs to be executed; the check period is the product of two parameters:
hbase.server.thread.wakefrequency * hbase.server.compactchecker.interval.multiplier
Parameter explanation:
The default value of hbase.server.thread.wakefrequency is 10000 ms, i.e. 10 s. It is the wake-up interval of HBase server threads and is used for the periodic checks of the log roller, memstore flusher and similar operations.
The default value of hbase.server.compactchecker.interval.multiplier is 1000; it is the multiplier factor applied to the wake frequency for the periodic compaction check (a plain number, not seconds).
So the default execution cycle is:
10 s * 1000 = 10,000 s, which is about 2 hours 46 minutes 40 seconds.
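The sketch below just reproduces this arithmetic with the two defaults so you can see where the figure comes from; it is purely illustrative:

public class CompactionCheckPeriod {
    public static void main(String[] args) {
        long wakeFrequencyMs = 10000L;    // hbase.server.thread.wakefrequency default (milliseconds)
        long intervalMultiplier = 1000L;  // hbase.server.compactchecker.interval.multiplier default

        long periodSeconds = wakeFrequencyMs / 1000 * intervalMultiplier; // 10 * 1000 = 10000 s
        long hours = periodSeconds / 3600;
        long minutes = (periodSeconds % 3600) / 60;
        long seconds = periodSeconds % 60;
        // Prints: default compaction check period: 2 h 46 min 40 s
        System.out.printf("default compaction check period: %d h %d min %d s%n", hours, minutes, seconds);
    }
}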
The original article included two screenshots from the HBase source at this point; they are not reproduced here, but if you are interested you can look at the source yourself. The default value there is defined as 10 * 1000 (milliseconds), which means 10 seconds.
Note:
Note that even when the CompactionChecker thread reaches its check time, it does not necessarily run a major compaction; another condition must also be met:
Each HStore is checked in turn, and needsCompaction() determines whether there are enough files to trigger a compaction.
The condition is:
(number of StoreFiles in the HStore) - (number of files currently being compacted) > minFilesToCompact
minFilesToCompact defaults to 3, i.e. merging starts only once there are more than 3 HFiles.
The CompactionChecker path calls a needsCompaction() function to decide whether a compaction should be run; the original screenshot of that code is not reproduced here, but a simplified sketch follows.
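This sketch only mirrors the condition described above; it is not the actual HBase source, and the class and field names are illustrative:

import java.util.Collection;
import java.util.List;

public class NeedsCompactionSketch {

    // Corresponds to hbase.hstore.compactionThreshold, default 3.
    private final int minFilesToCompact = 3;

    // storeFiles: all StoreFiles of the HStore; filesCompacting: files that are
    // already part of a running compaction. A compaction is needed when the
    // remaining candidate files exceed minFilesToCompact.
    public boolean needsCompaction(Collection<?> storeFiles, List<?> filesCompacting) {
        int numCandidates = storeFiles.size() - filesCompacting.size();
        return numCandidates > minFilesToCompact;
    }
}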
Minor Compaction and Major Compaction have a number of related parameters, and many lists of them exist online. The following are the ones I think may need tuning; the others are usually fine at their defaults:
1) hbase.hregion.majorcompaction:
The interval between automatic major compactions, which defaults to 7 days. It can be set to 0 to disable automatic major compaction; major compaction can then be run manually or periodically through scripts.
2) hbase.hstore.compaction.max:
The maximum number of files merged in a single compaction (major or minor). The default is 10; if more files are eligible, a minor compaction only processes part of them at a time. A common production value is 30.
3) hbase.hstore.compactionThreshold:
The threshold (number of StoreFiles) for triggering a compaction (major or minor). The default is 3. To reduce frequent merge operations you can raise it a little; for heavily loaded HBase clusters it can be set to 5.
4) hbase.hstore.compaction.max.size:
StoreFiles larger than this value are excluded from minor compactions. The default, Long.MAX_VALUE, means no limit. In general this parameter does not need adjusting.
5) hbase.regionserver.thread.compaction.small:
The size of the RegionServer thread pool used for minor compactions; the default is 1, and it can be set to 5.
6) hbase.regionserver.thread.compaction.large:
The size of the RegionServer thread pool used for major compactions; the default is 1, and it can be set to 8.
7) hbase.hstore.compaction.kv.max:
The maximum number of KeyValues read from HFiles in one batch during compaction. The default is 10; it can be increased appropriately if there is enough memory.
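These are RegionServer-side settings that would normally be changed in hbase-site.xml and picked up on a (rolling) restart; the sketch below only gathers the example values mentioned above in one place, using Java purely for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionTuningSketch {
    // Returns a Configuration carrying the example values discussed above.
    public static Configuration tunedConf() {
        Configuration conf = HBaseConfiguration.create();
        conf.setLong("hbase.hregion.majorcompaction", 0L);             // disable time-triggered major compaction
        conf.setInt("hbase.hstore.compaction.max", 30);                // up to 30 files per compaction
        conf.setInt("hbase.hstore.compactionThreshold", 5);            // start compacting at 5 StoreFiles
        conf.setInt("hbase.regionserver.thread.compaction.small", 5);  // minor compaction thread pool
        conf.setInt("hbase.regionserver.thread.compaction.large", 8);  // major compaction thread pool
        return conf;
    }
}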
The impact of Minor Compaction and Major Compaction on reading and writing:
1) First of all, compaction reads and writes disk, which inevitably increases the IO load on the whole cluster.
2) Impact on writing:
There are two main parameters:
hbase.hstore.blockingStoreFiles
hbase.hstore.blockingWaitTime
If the number of underlying HFiles in a store exceeds the hbase.hstore.blockingStoreFiles value (default 10), flush operations are blocked. The maximum blocking time is hbase.hstore.blockingWaitTime, which defaults to 90000 ms. During this period, if compaction brings the number of HFiles back below blockingStoreFiles, the blocking stops early; otherwise, once the wait time has elapsed, the flush resumes anyway. This mechanism effectively throttles heavy write traffic, but it is also one of the main factors that slow down write requests.
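For completeness, a small sketch that simply reads these two settings (defaults as described above; newer HBase releases may ship different defaults):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushBlockingSettings {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Flushes are blocked once a store holds more than this many HFiles
        // (10 in the versions described above).
        long blockingStoreFiles = conf.getLong("hbase.hstore.blockingStoreFiles", 10L);
        // Upper bound on how long a flush may stay blocked, in milliseconds (90000 = 90 s).
        long blockingWaitTimeMs = conf.getLong("hbase.hstore.blockingWaitTime", 90000L);
        System.out.println("blockingStoreFiles=" + blockingStoreFiles
                + ", blockingWaitTime=" + blockingWaitTimeMs + " ms");
    }
}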
3) Impact on reading:
Compaction generates considerable bandwidth pressure and short-term IO pressure. In essence, compaction trades short-term IO and bandwidth consumption for lower latency on subsequent queries, and during that short period the latency of read requests can fluctuate significantly.
After reading the above, do you have a better understanding of HBase major and minor compactions? I hope you got something out of this article.