HBase Best practices-CMS GC tuning (tuning from gc itself parameters)

2025-02-24 Update From: SLTechnology News&Howtos



This part could not be more important.

As HBase has developed, optimization of all kinds has never stopped, and GC optimization is the top priority.

HBase GC tuning direction

Since version 0.94 introduced the MemStoreLAB and MemStore Chunk Pool strategies to optimize the write cache (MemStore), through version 0.96's BucketCache and off-heap memory to optimize the read cache (BlockCache), to version 2.0's announced plan to introduce even more off-heap memory, it is clear that HBase treats off-heap memory as a strategic direction for GC optimization.

However, no matter how much off-heap memory is introduced, using JVM memory cannot be avoided entirely on the read and write paths. Take the offheap mode of BucketCache as an example: even though HBase data blocks are cached in off-heap memory, on a read the block in off-heap memory is first loaded into JVM memory and then returned to the user.

// For more on the BucketCache working mode mentioned here, see: http://hbasefly.com/2016/04/26/hbase-blockcache-2/

For example, when allocating memory, heap mode must first allocate from the operating system and then copy into the JVM heap, which takes longer than offheap mode's direct allocation from the operating system. Conversely, when reading from the cache, heap mode reads directly from the JVM heap, while offheap mode must first copy from operating-system memory into the JVM heap, which takes longer.
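The extra copy on the offheap read path can be sketched in a few lines. This is a toy illustration of the idea, not HBase's actual code; the class and method names are invented:

```java
import java.nio.ByteBuffer;

// Sketch of why offheap reads involve an extra copy: a block cached in a
// direct (off-heap) buffer must be copied into an on-heap byte array
// before it can be handed to the user.
public class OffheapReadSketch {
    static byte[] readBlock(ByteBuffer offheapBlock) {
        byte[] onHeap = new byte[offheapBlock.remaining()];
        offheapBlock.duplicate().get(onHeap);   // copy off-heap -> JVM heap
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(4);   // off-heap "cached block"
        block.put(new byte[]{1, 2, 3, 4}).flip();
        System.out.println(readBlock(block).length);       // -> 4
    }
}
```

In heap (LRU) mode the cached block is already an on-heap object, so this copy is skipped on reads, which is exactly the trade-off the paragraph above describes.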

As you can see, no matter how much off-heap memory is used, JVM memory cannot be bypassed entirely. Since it cannot be bypassed, we still need to focus on GC itself and optimize it.

Review the working principle of CMS GC algorithm

(The original post includes a diagram of the JVM memory layout here.)

For more information, please see: https://blog.51cto.com/12445535/2372976

That post introduces the JVM memory structure and CMS GC in detail.

The whole JVM memory consists of three parts: the Young area, the Tenured area, and the Perm area. The Young area is further divided into one Eden area and two Survivor areas. The object lifecycle is briefly described below (be sure to be familiar with it; it is used throughout):

(1) Young area: after an object is allocated, it first enters the Eden area. When the Eden area fills up, a Minor GC is triggered, which checks whether each object in the Eden area is still alive (i.e., still referenced by some other object). Live objects are copied from the Eden area to a Survivor area and their age is incremented by one; dead objects are collected as garbage. The Eden area is then empty again, and once new objects fill it, another Minor GC is triggered, and so on. Note that every Minor GC increments the age of each surviving object.

(2) Tenured area: once a surviving object's age exceeds a certain threshold, it is promoted to the Tenured area, so the Tenured area can be understood as holding long-lived objects. Over time the Tenured area also fills up, triggering a CMS GC (old-generation GC). This GC is more complex, consisting of six steps (initial mark, concurrent mark, concurrent preclean, remark, concurrent sweep, concurrent reset); see the reference article for details. Both Minor GC and CMS GC 'Stop The World', suspending all user threads and leaving only GC threads to collect garbage. The STW time of a Minor GC is mainly spent in the copying stage (the young generation uses a copying collector), while the STW time of a CMS GC is mainly spent in the marking stages (the initial mark and the remark).
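The lifecycle in (1) and (2) can be sketched as a toy simulation. This is an illustration of the aging and promotion rules described above, not JVM internals; all names and structures are invented:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of the generational lifecycle: objects enter the young
// generation, each minor GC collects dead objects and ages survivors,
// and objects past the tenuring threshold are promoted.
public class GenerationalSketch {
    static final int MAX_TENURING_THRESHOLD = 15;   // the JVM default

    static class Obj {
        int age = 0;
        boolean reachable = true;
    }

    final List<Obj> youngGen = new ArrayList<>();   // Eden + Survivor, simplified
    final List<Obj> tenuredGen = new ArrayList<>();

    // One minor GC: reclaim garbage, age survivors, promote the old.
    void minorGc() {
        Iterator<Obj> it = youngGen.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.reachable) {
                it.remove();                        // dead: reclaimed
            } else if (++o.age > MAX_TENURING_THRESHOLD) {
                it.remove();                        // long-lived: promoted
                tenuredGen.add(o);
            }
        }
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        Obj rpcBuffer = new Obj();                  // short-lived, e.g. an RPC object
        Obj memstoreChunk = new Obj();              // long-lived
        heap.youngGen.add(rpcBuffer);
        heap.youngGen.add(memstoreChunk);
        rpcBuffer.reachable = false;                // dies before the first minor GC
        for (int i = 0; i <= MAX_TENURING_THRESHOLD; i++) heap.minorGc();
        System.out.println(heap.tenuredGen.size()); // -> 1: only the long-lived object was promoted
    }
}
```

This also previews the HBase-specific point made later: short-lived RPC objects should die in the Young area, while genuinely long-lived objects (memstore, cache metadata) end up in the Tenured area.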

Let's take a look at the ultimate goal and basic principles of GC tuning:

The average Minor GC time should be as short as possible. The entire Minor GC is STW (with the time mainly spent in the copying phase), so short Minor GCs keep user reads and writes smooth and latency under control. CMS GCs should be as few and as short as possible. On the one hand, a CMS GC usually causes at least a second of application pause, which has a large impact on user reads and writes; on the other hand, frequent CMS GCs produce a large amount of memory fragmentation, which in serious cases triggers a Full GC and can bring the RegionServer down.

The parameter tuning below follows these principles, which matter especially for latency-sensitive systems such as HBase: make GC smoother, keep pauses short, and avoid serious impact on user reads and writes.

// In other words, one principle applies throughout: keep Minor GC short, and reduce the number of CMS GCs as far as possible.

CMS GC Optimization skills (three stages)

It is mainly divided into three stages.

1. The first phase will introduce the configuration of GC parameters applicable to all scenarios, which can be easily understood by readers without too much explanation.

2. The second and third stages each tune and explain one group of parameters. These two groups are generally set according to the application scenario to get the best GC behavior; given their complexity, they are discussed separately below.

Stage 1: default recommended configuration // what each parameter means

-Xmx -Xms -Xmn -Xss -XX:MaxPermSize=M -XX:SurvivorRatio=S -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=N -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=C -XX:-DisableExplicitGC

-Xmx: maximum heap memory allocated to JVM

-Xms: the initial memory allocated to JVM, which is generally the same as the Xmx setting

-Xmn: the memory size allocated to the Young area. This value has a great impact on system performance; its tuning is discussed in the second stage.

-Xss: the stack size assigned to each thread. In some systems that are sensitive to the number of threads, this value is more important. It is generally set to about 256K~1M.

-XX:MaxPermSize=M: the amount of memory allocated to the permanent generation

-XX:SurvivorRatio=S: the ratio of the Eden area to one Survivor area (default 8, i.e. Eden : Survivor = 8 : 1). This value has a great impact on system performance; the third stage focuses on its tuning.

-XX:+UseConcMarkSweepGC: use the CMS GC strategy for the old generation

-XX:+UseParNewGC: indicates that the parallel recycling mechanism is adopted in the young area.

-XX:+CMSParallelRemarkEnabled: run the remark phase of CMS in parallel. Recommended.

-XX:MaxTenuringThreshold=N: the age threshold at which objects are promoted to the Tenured area. This value has a great impact on the system and is discussed in the third stage.

-XX:+UseCMSCompactAtFullCollection: compact the old generation when a Full GC occurs, reducing fragmentation. Recommended.

-XX:+UseCMSInitiatingOccupancyOnly: indicates that cms gc is triggered only based on the parameter CMSInitiatingOccupancyFraction

-XX:CMSInitiatingOccupancyFraction=C: trigger a CMS GC when the Tenured (old generation) occupancy exceeds this percentage of its capacity. Generally set to 70-80.

-XX:-DisableExplicitGC: explicitly keeps System.gc() enabled (the '-' prefix turns the DisableExplicitGC flag off). System.gc() triggers a full GC of the entire JVM; leaving it enabled matters when a lot of direct (off-heap) memory is used, since direct-buffer cleanup can rely on explicit GC.

-XX:+PrintTenuringDistribution: prints the tenuring-distribution log. Strongly recommended for online clusters.
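To make CMSInitiatingOccupancyFraction concrete, a quick sketch of the arithmetic behind the trigger point (the 48 GiB old-generation size is hypothetical):

```java
// When does CMS GC start? Roughly when old-generation occupancy crosses
// capacity * CMSInitiatingOccupancyFraction / 100 (and with
// -XX:+UseCMSInitiatingOccupancyOnly, this is the only trigger).
public class CmsTriggerMath {
    static long cmsTriggerBytes(long oldGenCapacityBytes, int occupancyFraction) {
        return oldGenCapacityBytes * occupancyFraction / 100;
    }

    public static void main(String[] args) {
        long oldGen = 48L << 30;                     // hypothetical 48 GiB tenured area
        long trigger = cmsTriggerBytes(oldGen, 75);  // fraction recommended in the text
        System.out.println(trigger >> 30);           // -> 36 (GiB)
    }
}
```

The headroom between 75% and 100% is what gives the concurrent collector time to finish before the old generation fills; setting the fraction too high risks a concurrent-mode failure and a Full GC.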

GC parameter tuning applicable to all scenarios

From the description of each GC parameter above, we easily arrive at the first-stage recommended settings, which apply to basically all scenarios:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC

Optimize preparation

1. The basic recommended settings above were obtained by explaining the meaning of each GC parameter, and several parameters with a significant impact on performance were identified: Xmn, SurvivorRatio, and MaxTenuringThreshold.

2. Below, the settings of these parameters in the HBase system are tuned through theoretical reasoning and experimental verification.

It is important to emphasize that HBase here is configured in BucketCache mode, not LruBlockCache: a large amount of off-heap memory is used as the read cache, which in itself greatly eases GC.

As can be seen, BucketCache mode performs much better than LruBlockCache mode in terms of GC. Configuring BucketCache mode online is strongly recommended.

GC log analysis

After introducing the basic conditions of the experiment, a brief explanation of the GC log format helps with the analysis below. Note that the tenuring log is printed only when the -XX:+PrintTenuringDistribution parameter is set; enabling it on online clusters is strongly recommended.

Memory Analysis of HBase scene

The HBase system is thus dominated by long-lived objects, so the GC only needs to clear short-lived objects such as RPC-related ones out of the Young area to achieve the best effect.

Stage 2: NewParSize tuning // i.e. the -Xmn parameter; see the blog post for details

NewParSize is the size of the Young area, and the Young area's size directly determines the Minor GC frequency.

The frequency in turn affects two things. On the one hand, it affects the length of a single Minor GC: the more frequent the GC (i.e., the smaller the Young area), the shorter each GC. On the other hand, it affects how quickly objects are promoted to the old generation: the more frequent the GC, the faster surviving objects age and the sooner they are promoted. Concretely:

As the Young area grows, the Minor GC frequency decreases but each GC takes longer (a larger area means more objects to copy per GC), so the latency jitter of business reads and writes is larger; conversely, with a smaller Young area, latency jitter is small and stable. However, shrinking the Young area increases the Minor GC frequency and accelerates promotion to the old generation (each GC adds one to an object's age, and once the age exceeds the threshold the object is promoted, so a higher GC frequency means faster aging), potentially increasing the risk of old-generation GC.

Therefore, setting NewParSize requires a certain balance, neither too large nor too small.
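The balance can be seen with back-of-the-envelope arithmetic. The allocation rate and sizes below are invented for illustration: the minor GC interval is roughly Eden size divided by allocation rate, and an object must survive MaxTenuringThreshold minor GCs before promotion, so shrinking the Young area proportionally shortens the time a mid-lived object takes to reach the old generation.

```java
// Rough model of how young-generation size affects minor-GC frequency
// and promotion speed. All numbers are hypothetical.
public class YoungGenTradeoff {
    // Seconds between minor GCs: Eden fills at allocationRate bytes/sec.
    static double minorGcIntervalSec(long edenBytes, long allocRateBytesPerSec) {
        return (double) edenBytes / allocRateBytesPerSec;
    }

    // Seconds until a surviving object reaches the tenuring threshold
    // (its age increases by one at every minor GC).
    static double timeToPromotionSec(long edenBytes, long allocRate, int tenuringThreshold) {
        return tenuringThreshold * minorGcIntervalSec(edenBytes, allocRate);
    }

    public static void main(String[] args) {
        long rate = 256L << 20;      // hypothetical 256 MiB/s allocation rate
        long bigEden = 2L << 30;     // ~2 GiB Eden
        long smallEden = 512L << 20; // ~512 MiB Eden
        System.out.println(timeToPromotionSec(bigEden, rate, 15));   // -> 120.0 (s)
        System.out.println(timeToPromotionSec(smallEden, rate, 15)); // -> 30.0 (s)
    }
}
```

With the smaller Eden, objects that stay alive for more than 30 seconds get promoted and burden the old generation, while the larger Eden gives them 120 seconds to die young; the cost is that each minor GC has roughly four times as much live data to copy.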

Summary:

1. Xmn=2g is the best choice.

// Too small an Xmn leads to poor CMS GC performance, while too large an Xmn leads to poor Minor GC performance.

// Therefore, when the JVM heap is above 64g, set Xmn between 1g and 3g;

// below 32g, set it to 512m~1g.

Details:

Specifically, it is best to verify with simple online tuning. One thing to emphasize: on many occasions I have seen online clusters set Xmn very large, for example a cluster with an Xmx of 48g and an Xmn of 10g. Checking the logs showed very poor GC performance: single Minor GCs were basically between 300ms~500ms, and many CMS GCs took more than 1s. To be clear, blindly scaling up Xmn does not help GC (neither Minor GC nor CMS GC); do not set it too large.

Stage 3: increase the size of Survivor area (decrease SurvivorRatio) & increase MaxTenuringThreshold

1. -XX:SurvivorRatio=S: the ratio of the Eden area to one Survivor area, default 8. This value has a great impact on system performance and is tuned in this stage.

2. -XX:MaxTenuringThreshold=N: the age threshold at which objects in the Young area are promoted to the Tenured area. This value has a great impact on the system and is tuned in this stage.

Summary:

1. In general, the default MaxTenuringThreshold=15 is relatively large and does not need to be adjusted.

2. For Minor GC, the SurvivorRatio setting does not have much impact. For CMS GC, setting SurvivorRatio too large is a disaster, with extremely poor performance. Compared with the default SurvivorRatio=8, decreasing SurvivorRatio lets short-lived objects be cleared out more thoroughly in the Young area, so SurvivorRatio=2 is recommended.
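A quick check of what SurvivorRatio=2 means in bytes (a sketch assuming the HotSpot layout of one Eden plus two Survivor spaces):

```java
// HotSpot splits the young generation into Eden plus two Survivor spaces;
// -XX:SurvivorRatio=S means Eden : one Survivor = S : 1,
// so each Survivor space is Xmn / (S + 2).
public class SurvivorMath {
    static long survivorBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes / (survivorRatio + 2);
    }

    static long edenBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes - 2 * survivorBytes(youngGenBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long xmn = 2L << 30;  // the recommended -Xmn2g
        System.out.println(survivorBytes(xmn, 2) >> 20); // -> 512 (MiB) with SurvivorRatio=2
        System.out.println(survivorBytes(xmn, 8) >> 20); // -> 204 (MiB) with the default 8
    }
}
```

So dropping SurvivorRatio from 8 to 2 grows each Survivor space from roughly 204 MiB to 512 MiB, giving surviving objects more room to age in the Young area instead of being promoted early.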

CMS tuning conclusion: with the cache in BucketCache offheap mode, use the following configuration for large memory (>64g):

-Xmx64g -Xms64g -Xmn2g -Xss256k -XX:MaxPermSize=256m -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=15 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC

Summary:

1. Xmn can increase moderately as the Java heap grows, but should not exceed 4g; the recommended range is 1g to 3g.

2. SurvivorRatio is generally recommended to choose 2.

3. MaxTenuringThreshold is set to 15

4. For small memory (less than 64g), just change Xmn to 512m~1g in the configuration above.

Finally, a summary:

General scenario

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC -XX:+PrintTenuringDistribution

CMS GC tuning (less than 64g memory)

-Xmx<heap>g -Xms<heap>g -Xmn1g -Xss256k -XX:MaxPermSize=256m -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=15 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC -XX:+PrintTenuringDistribution

For large memory (greater than 64g), the CMS GC configuration is as follows:

-Xmx64g -Xms64g -Xmn2g -Xss256k -XX:MaxPermSize=256m -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=15 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC -XX:+PrintTenuringDistribution

Reader question 1: about the GC log

Hello, blogger. I would like to ask about the GC log: /var/log/hbase/gc.regionserver.log is overwritten every time the process restarts. Can it be configured to append or rotate? Sometimes when HBase reports an error and I want to check whether GC was the problem, a restart loses all the previous GC log information.

A:

Check your JVM configuration and set GC log rotation to 3 or more files: -Xloggc:$HBASE_LOG_DIR/gc-regionserver-`date +%Y%m%d-%H-%M`.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=3

Reader question 2:

Hello, blogger. I have also run a lot of tests and found that under various read/write ratios, LRU (LruBlockCache) beats CBC (CombinedBlockCache, i.e. BucketCache) in both throughput and latency. The article mentions this point, but I still don't understand why CBC is worse, so I would like to ask.

A:

Because in CBC mode bucketcache (offheap mode) uses out-of-heap memory, and reading from out-of-heap memory involves more steps than reading from JVM heap memory. So in full-memory scenarios LRU beats CBC outright, while in scenarios where the cache mostly misses, throughput and latency are basically the same.

What is the full memory scenario?

Scenarios with a small data volume or a large number of hot reads, where most reads are served from BlockCache.

Reader question 3:

Fan Dashen, could you share how large the zookeeper.session.timeout parameter is on your HBase cluster, and whether raising it can reduce the problem of RegionServers losing their ZK session (and hanging) due to GC pauses? If I raise it to, say, 180 seconds, what impact will that have on HBase? Thank you very much!

A:

For offline clusters, setting it larger is fine. For real-time online clusters that are latency-sensitive, it should not be too large.

Another reader question:

Blogger, would it be convenient to run jmap -heap PID, show your configuration, and explain the parameters? That would make it easier for readers to study your configuration.

My answer:

[root@hdfs-master-80-121 hbase]# ps -ef | grep hbase

hbase 3004 2490 0 Mar06 ? 00:54:30 /usr/java/jdk1.8.0_102/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Djava.net.preferIPv4Stack=true -Xms4294967296 -Xmx4294967296 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hbase_hbase-REGIONSERVER-e72e026ce56e3850d7702a2ca6ecc206_pid3004.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-hdfs-master-80-121.log.out -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.9.2-1.cdh6.9.2.p0.3/lib/hbase -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.9.2-1.cdh6.9.2.p0.3/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-5.9.2-1.cdh6.9.2.p0.3/lib/hbase/lib/native/Linux-amd64-64 -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start

hbase 3090 3004 0 Mar06 ? 00:26:57 /bin/bash /usr/lib64/cmf/service/hbase/hbase.sh regionserver start

hbase 6981 3090 0 18:04 ? 00:00:00 sleep 1

root 6983 21889 0 18:04 pts/0 00:00:00 grep --color=auto hbase

[root@hdfs-master-80-121 hbase]# jmap -heap 3004

Attaching to process ID 3004, please wait...

Debugger attached successfully.

Server compiler detected.

JVM version is 25.102-b14

Using parallel threads in the new generation.

Using thread-local object allocation.

Concurrent Mark-Sweep GC

Heap Configuration:

MinHeapFreeRatio = 40

MaxHeapFreeRatio = 70

MaxHeapSize = 4294967296 (4096.0MB)

NewSize = 348913664 (332.75MB)

MaxNewSize = 348913664 (332.75MB)

OldSize = 3946053632 (3763.25MB)

NewRatio = 2

SurvivorRatio = 8

MetaspaceSize = 21807104 (20.796875MB)

CompressedClassSpaceSize = 1073741824 (1024.0MB)

MaxMetaspaceSize = 17592186044415 MB

G1HeapRegionSize = 0 (0.0MB)

Heap Usage:

New Generation (Eden + 1 Survivor Space):

Capacity = 314048512 (299.5MB)

Used = 94743000 (90.35396575927734MB)

Free = 219305512 (209.14603424072266MB)

30.168269034817143% used

Eden Space:

Capacity = 279183360 (266.25MB)

Used = 91102232 (86.8818588256836MB)

Free = 188081128 (179.3681411743164MB)

32.63168406598445% used

From Space:

Capacity = 34865152 (33.25MB)

Used = 3640768 (3.47210693359375MB)

Free = 31224384 (29.77789306640625MB)

10.442426867951127% used

To Space:

Capacity = 34865152 (33.25MB)

Used = 0 (0.0MB)

Free = 34865152 (33.25MB)

0.0% used

Concurrent mark-sweep generation:

Capacity = 3946053632 (3763.25MB)

Used = 16010552 (15.268852233886719MB)

Free = 3930043080 (3747.9811477661133MB)

0.4057357931013544% used

14606 interned Strings occupying 1393920 bytes.

Reader question 6:

Hello, why didn't you use the G1 GC mechanism?

A:

On the one hand, we don't use G1 GC because our heaps are not that large; on the other hand, using G1 GC well requires attention to quite a few parameters. But it will be a major direction in the future.

Follow-up: do you have a G1-tuned configuration? I've been waiting for three years.

Please refer to: http://openinx.github.io/ppt/hbaseconasia2017_paper_18.pdf

Reference link:

http://hbasefly.com/2016/08/09/hbase-cms-gc/
