HBase cluster planning: business planning, capacity planning, and Region planning
Leading questions:
1. Which businesses should run on a given cluster to make the best use of the system's hardware and software resources?
2. For a given business, how should the cluster's hardware capacity be planned so that no resources are wasted?
3. How many Regions should be deployed on a given RegionServer? These questions have presumably puzzled many HBase users.
Cluster business planning
1. An HBase cluster rarely runs only one business; in most cases several businesses share the cluster, which in practice means sharing its hardware and software resources. This raises two problems.
2. The first is resource isolation between businesses: each business should be logically separated and unaffected by the others. The concern arises precisely from sharing: once the traffic of one business spikes for a period, its excessive consumption of system resources will inevitably affect the other businesses.
3. The second is how to maximize resource utilization under sharing. Ideally all hardware and software resources in the cluster are utilized to the greatest extent.
Maximizing the use of cluster resources starts from each business's demand for system resources.
1. Disk-capacity-sensitive businesses
// These businesses have no strong requirements on read/write latency or throughput; the only requirement is disk capacity.
// The upper-layer application usually writes large batches of data at regular intervals, then reads large batches periodically.
// Characteristics: offline writes, offline reads, disk capacity needed
2. Bandwidth-sensitive businesses
// Most of these businesses have high write throughput but no particular requirement on read throughput, for example real-time log storage.
// The upper-layer application streams large volumes of logs in real time through Kafka and requires real-time writes; reads are usually offline analysis, or log retrieval when an upstream business hits an exception.
// Characteristics: online writes, offline reads, bandwidth needed (e.g. ELK + Kafka)
3. IO-sensitive businesses
// IO-sensitive businesses are generally core businesses.
// These businesses demand low read/write latency, especially read latency, usually under 100ms; some businesses may demand even less.
// Examples: online message storage, historical order systems, real-time recommendation systems.
// Characteristics: online writes, online reads, memory and high-IOPS media needed
Tip:
As for CPU, HBase itself is a CPU-sensitive system, with CPU mainly spent on data block compression/decompression; all businesses place a similar demand on CPU.
Summary:
1. For a cluster to maximize resource utilization, one idea is to let the businesses complement one another: match them sensibly so that each takes the resources it needs.
2. In practice the business types above can be mixed and deployed together; it is recommended not to place too many businesses of the same type in one cluster.
3. In theory, the most resource-efficient mix for a cluster is: one disk-capacity-sensitive business + one bandwidth-sensitive business + one IO-sensitive business.
4. It is recommended to mix core and non-core businesses in the same cluster, but strongly recommended not to place too many core businesses in the same cluster at the same time (for operations and maintenance reasons).
// Core businesses sharing resources will inevitably compete (and whichever side loses, the outcome is undesirable). At some point, to guarantee smooth processing of a core business under shared resources, the only option is to sacrifice the non-core businesses, even shutting them down.
Disadvantage: this kind of design produces many small clusters. Many teams instead use resource isolation (rsgroup) to isolate businesses and reduce the number of physical small clusters: a large cluster pins each business to its own independent set of RegionServers, effectively creating many logical small clusters, and the planning ideas above apply to those logical clusters as well.
Cluster capacity planning // key section
Suppose a RegionServer has 3.6T * 12 of disk and 128G of total memory. Is this configuration theoretically wasteful? If so, is the disk or the memory wasted? What does a reasonable disk/memory ratio look like, and what factors influence it?
Here we need the concept of a 'Disk / Java Heap Ratio': how many bytes of disk one byte of RegionServer Java heap can most reasonably be matched with. Before the explanation, the result:
Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2
Under the default configuration: RegionSize = 10G (hbase.hregion.max.filesize); MemstoreSize = 128M (hbase.hregion.memstore.flush.size); ReplicationFactor = 3 (dfs.replication); HeapFractionForMemstore = 0.4 (hbase.regionserver.global.memstore.lowerLimit).
Substituting the defaults: 10G / 128M * 3 * 0.4 * 2 = 80 * 2.4 = 192, meaning 1 byte of RegionServer Java heap should be matched with 192 bytes of disk. Back to the earlier question: out of 128G of total memory, give the RegionServer about 96G of Java heap (rule of thumb: the RegionServer gets 2/3 to 3/4 of system memory, i.e. 128 * 2/3 ≈ 85.4G up to 128 * 3/4 = 96G; I prefer to reserve a fixed 32G for the system when memory is large and give the rest to the RegionServer, and to use the 2/3~3/4 fraction when memory is small). A 96G heap then matches 96G * 192 = 18T of disk, while the machine was purchased with 36T, so under the default configuration nearly half of the disk is wasted.
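To make this arithmetic easy to reproduce, here is a minimal Python sketch; the helper name disk_per_heap_byte is mine, not an HBase API, and it simply encodes the formula above:

GB = 1024 ** 3
MB = 1024 ** 2
TB = 1024 ** 4

def disk_per_heap_byte(region_size, memstore_size, replication, heap_fraction):
    """Bytes of disk that one byte of RegionServer Java heap can serve."""
    return region_size / memstore_size * replication * heap_fraction * 2

# Apache HBase defaults: 10G regions, 128M flush size, 3 replicas, memstore fraction 0.4
ratio = disk_per_heap_byte(10 * GB, 128 * MB, 3, 0.4)
print(ratio)                     # 192.0
print(96 * GB * ratio / TB)      # 18.0 -> a 96G heap matches 18T of disk

# CDH-style memstore fraction of 0.95 (discussed in the Tip below)
ratio_cdh = disk_per_heap_byte(10 * GB, 128 * MB, 3, 0.95)
print(ratio_cdh)                 # 456.0
print(96 * GB * ratio_cdh / TB)  # 42.75 -> about 43T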
Tip:
In CDH:
hbase.regionserver.global.memstore.lowerLimit = 0.95 (default value 0.95)
// The flush algorithm sorts Regions by memstore size first and flushes them in that order until the total memstore size drops below the lower limit. A sketch of this selection logic follows.
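As an illustration of that flush-selection behavior, here is a hedged Python sketch; it is pseudologic for the description above, not HBase's actual implementation:

def pick_regions_to_flush(memstore_sizes, heap_bytes, upper, lower):
    """memstore_sizes: {region: bytes}. Flush the largest regions first until
    the total memstore footprint drops below heap_bytes * lower."""
    total = sum(memstore_sizes.values())
    if total < heap_bytes * upper:       # upper limit not reached: no forced flush
        return []
    victims = []
    for region, size in sorted(memstore_sizes.items(), key=lambda kv: -kv[1]):
        if total <= heap_bytes * lower:  # low-water mark reached: stop
            break
        victims.append(region)
        total -= size
    return victims

# Example: 1G heap, upperLimit 0.40, lowerLimit 0.38
sizes = {"region-a": 300 << 20, "region-b": 120 << 20, "region-c": 60 << 20}
print(pick_regions_to_flush(sizes, 1 << 30, 0.40, 0.38))  # ['region-a']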
With the CDH default, the calculation becomes: 10G / 128M * 3 * 0.95 * 2 = 456, so a 96G heap matches 96G * 456 ≈ 43T of disk.
Note: when we set the memstore flush low-water mark, hbase.regionserver.global.memstore.size.lower.limit, to the equivalent of hbase.regionserver.global.memstore.lowerLimit = 0.4, CDH displays a warning: 'Setting the memstore flush low-water mark to 0.9 or lower can cause performance issues due to under-utilization of available memory.' When memory is plentiful, lowerLimit is generally set about 5% below upperLimit. Some of these parameters need to be tested on large-memory machines.
When I set:
hbase.regionserver.global.memstore.upperLimit = 0.45
hbase.regionserver.global.memstore.lowerLimit = 0.40
the error was:
Exception in thread "main" java.lang.RuntimeException: Current heap configuration for MemStore and BlockCache exceeds the threshold required for successful cluster operation. The combined value cannot exceed 0.8. Please check the settings for hbase.regionserver.global.memstore.size and hfile.block.cache.size in your configuration. hbase.regionserver.global.memstore.size is 0.45 hfile.block.cache.size is 0.4
at org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil.checkForClusterFreeMemoryLimit(HeapMemorySizeUtil.java:85)
at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:82)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:96)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2716)
In other words: hfile.block.cache.size defaults to 0.4, and hbase.regionserver.global.memstore.upperLimit also defaults to 0.4. HBase hard-codes the rule that the sum of these two parameters cannot exceed 0.8. Since hbase.regionserver.global.memstore.upperLimit was raised to 0.45 above, hfile.block.cache.size must be set to a value no greater than 0.35.
// The actual working configuration:
hbase.regionserver.global.memstore.upperLimit = 0.45
hbase.regionserver.global.memstore.lowerLimit = 0.4
hfile.block.cache.size + hbase.regionserver.global.memstore.upperLimit must not exceed 0.8
hfile.block.cache.size = 0.35
Success.
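A tiny pre-flight check in the spirit of the HeapMemorySizeUtil error above (my own helper, not the HBase class) makes the 0.8 rule concrete:

def check_heap_fractions(memstore_size, block_cache_size, threshold=0.8):
    """The 0.8 rule: the memstore and blockcache fractions of heap may not exceed it."""
    combined = memstore_size + block_cache_size
    if combined > threshold:
        raise ValueError(f"memstore {memstore_size} + blockcache {block_cache_size} "
                         f"= {combined:.2f} exceeds {threshold}")
    return combined

try:
    check_heap_fractions(0.45, 0.40)                # fails, like the error above
except ValueError as err:
    print(err)
print(round(check_heap_fractions(0.45, 0.35), 2))   # 0.8 -> accepted, matching the fix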
For case 2 of HBase memory planning (read-heavy/write-light versus write-heavy/read-light workloads; read more, write less + BucketCache), see:
https://blog.51cto.com/12445535/2373788
Where does the formula above come from?
It is actually simple: compute the number of Regions along the disk-capacity dimension and along the Java-heap dimension, then set the two equal:
Number of Regions along the disk-capacity dimension: Disk Size / (RegionSize * ReplicationFactor)
Number of Regions along the Java-heap dimension: Java Heap * HeapFractionForMemstore / (MemstoreSize / 2)
Disk Size / (RegionSize * ReplicationFactor) = Java Heap * HeapFractionForMemstore / (MemstoreSize / 2)
=> Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2
What is the practical significance of this formula?
The most intuitive use is to judge whether a given configuration wastes resources, that is, whether memory and disk match. Looked at the other way, if the hardware is already fixed — say procurement bought machines with 128G of memory (96G allocated to the Java heap) and 40T of disk — the two clearly do not match. Can the HBase configuration be changed to make them match? Yes, by increasing RegionSize or decreasing MemstoreSize. For example, raising the default RegionSize from 10G to 20G gives Disk Size / Java Heap = 384, and 96G * 384 ≈ 36T, which basically matches the disk. // If memory and disk do not match, adjust by increasing RegionSize or decreasing MemstoreSize.
A further question: if memory and disk cannot be made to match under a given configuration, is it better to 'waste' memory or disk in practice? The answer is memory. For example, if the purchased machine's Java heap could be as large as 126G while the total disk is only 18T, part of the heap is necessarily wasted under the defaults; but the surplus memory can be given to the HBase read cache, BlockCache, through configuration, so the Java heap is not actually wasted.
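Conversely, the formula can be inverted to solve for the RegionSize that makes a given disk/heap pair match (again an illustrative helper, not an HBase API):

GB, MB, TB = 1024 ** 3, 1024 ** 2, 1024 ** 4

def required_region_size(disk_bytes, heap_bytes, memstore=128 * MB,
                         replication=3, heap_fraction=0.4):
    # Disk/Heap = RegionSize/Memstore * R * F * 2  =>  solve for RegionSize
    return (disk_bytes / heap_bytes) * memstore / (replication * heap_fraction * 2)

# A 96G heap with 36T of disk needs regions of about 20G
print(required_region_size(36 * TB, 96 * GB) / GB)  # 20.0 -> set hbase.hregion.max.filesize to ~20G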
Besides disk and memory, these resources also deserve attention.
Bandwidth resources:
1. HBase consumes network bandwidth heavily during large scans and high-throughput writes.
2. It is strongly recommended to deploy the HBase cluster on 10-Gigabit switches, with 10-Gigabit NICs + bonding preferred on each machine.
3. If in special circumstances only Gigabit NICs are available, make sure all RegionServer machines are deployed under the same switch; crossing switches causes large write latency and seriously hurts business write performance.
CPU resources:
1. HBase is CPU-sensitive: both writes and reads consume compute resources through heavy compression and decompression, so for HBase, the more CPU the better.
Reference:
http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
Region planning
1. Region planning mainly involves two aspects: the number of Regions and the size of a single Region. The two are not independent but interrelated.
2. Large Regions mean fewer Regions; small Regions mean more Regions.
3. Region planning concerns many HBase operators: how many Regions should run on a RegionServer of a given specification? This question bothered me when I first came into contact with HBase. In practice, both too many and too few Regions have advantages and disadvantages:
Many small Regions
Advantages:
More conducive to even load distribution across the cluster and to efficient, stable compaction, because the HFiles in a small Region are relatively small and compaction is cheap. For details see Stripe Compaction: http://hbasefly.com/2016/07/25/hbase-compaction-2/
Disadvantages:
The most direct impact: when a RegionServer crashes or restarts, redistributing and migrating a large number of small Regions is very time-consuming. A single Region migration takes roughly 1.5s~2.5s, so the more Regions, the longer the migration and hence the failover; for example, 1,000 Regions at ~2s each is over half an hour of failover. Many small Regions also tend to produce more frequent flushes and many small files, which in turn trigger unnecessary compactions. In extreme scenarios, once the Region count exceeds a threshold, a RegionServer-level flush results, seriously blocking reads and writes. Finally, RegionServer management and maintenance costs are high.
A few large Regions:
Advantages:
Conducive to fast RegionServer restart and crash recovery; reduces the total number of RPCs; produces fewer, larger flushes.
Disadvantages:
Compaction performance is very poor, causing large write jitter and poor stability; also not conducive to load balancing across the cluster.
Summary:
// As you can see, under HBase's current working model, neither too many nor too few Regions is good; real environments need a compromise. The official documentation recommends 20 to 200 Regions per RegionServer, with a single Region kept between 10G and 30G. (We recommend about 100 Regions of about 20G each.)
// HBase cannot directly configure the number of Regions on a RegionServer; the count most directly depends on RegionSize, configured via hbase.hregion.max.filesize: once a Region grows beyond this value, HBase splits it.
// It follows that for HBase to work smoothly (Region count between 20 and 200, single Region between 10G and 30G), the maximum data a RegionServer can store is about 200 * 30G * 3 = 18T. Storing more than 18T will inevitably cause performance problems of one kind or another. From the Region-scale perspective, the disk capacity a single RegionServer can reasonably use currently tops out at about 18T.
Tip:
Formula:
Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2
hbase.hregion.max.filesize defaults to 10G. If a RegionServer is expected to run 100 Regions, the estimated data on it is 10G * 100 * 3 = 3T. Conversely, if a RegionServer is to store 12T of data, a 10G Region size implies splitting into roughly 400 Regions, which is clearly unreasonable; in that case hbase.hregion.max.filesize should be raised moderately, to 20G or 30G. In reality the disks a single physical machine can carry keep growing, with 36T already common. To use all of that capacity for data while still assuming 100 Regions per RegionServer, each Region would reach a terrifying 120G, and any compaction would be a disaster.
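The same trade-off can be tabulated quickly (an illustrative sketch): pick hbase.hregion.max.filesize so the resulting Region count stays within the recommended 20~200 band:

GB, TB = 1024 ** 3, 1024 ** 4

def regions_on_server(data_bytes, region_size_bytes, replication=3):
    """Region count if one RegionServer holds data_bytes of replicated data."""
    return data_bytes / (region_size_bytes * replication)

for gib in (10, 20, 30):
    print(f"max.filesize={gib}G -> ~{regions_on_server(12 * TB, gib * GB):.0f} regions")
# 10G -> ~410 regions (too many; the text above rounds to 400)
# 20G -> ~205 regions, 30G -> ~137 regions: inside the 20-200 band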
A new concept for the future: Sub-Region
Problem: with hardware costs falling continuously, a single RegionServer can easily carry 40T+ of disk; by the reasoning above, more and more of that disk is just 'flowers in a mirror, moon in the water' — visible but unusable.
1. The community is aware of this problem and has proposed the concept of Sub-Region beneath the current Region concept.
2. It can be understood simply as dividing a Region into many logically smaller Sub-Regions.
3. A Region remains a Region as before, but compactions that used to run at Region granularity are executed at the smaller Sub-Region granularity.
4. This way a single Region can be configured much larger, say 50G or 100G, and a single RegionServer can store more data.
5. Personally, I think the Sub-Region feature will be a focus of HBase development.
Reader question 1:
A very good post. Do you have any solution for a large performance gap between servers in one cluster? We have both physical machines (64 cores) and virtual machines (2 cores). Can a physical machine carry 32 times as many Regions as a virtual machine? By default each RegionServer tends to carry about the same number of Regions. Thank you.
A:
Currently there is no particularly good way: disable automatic balancing and migrate Regions manually.
Reader question 2:
Hello. In the formula Java Heap * HeapFractionForMemstore / (MemstoreSize / 2), why MemstoreSize / 2?
A:
It is generally assumed that a memstore is, on average, only half full.
Follow-up:
Thanks for the reply. How was the value of 'half' obtained — estimated from ordinary usage experience, like some other HBase configuration parameters?
Also, we currently have about 9T of disk and an HBase Java heap of 64G, writing much and reading little:
hbase.regionserver.global.memstore.size = 0.5
hbase.regionserver.global.memstore.size.lower.limit = 0.45
hfile.block.cache.size = 0.25
hbase.hregion.memstore.flush.size = 256M
hbase.hregion.memstore.block.multiplier = 8
hbase.hregion.max.filesize = 15G // replication=3, at most 200 regions
Are these parameters configured properly?
A:
The disk is relatively small. At this disk size, 15G RegionSize * 3 replicas * 200 Regions is already 9T, and some disk should be kept in reserve, so hbase.hregion.max.filesize = 10G may be more appropriate.
Reply: Thank you. Our machines' disks are actually 10T, so the reserve is already there.
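For reference, plugging this reader's numbers into the Disk / Java Heap formula from earlier (the same illustrative helper as above) suggests the configuration is roughly balanced:

GB, MB, TB = 1024 ** 3, 1024 ** 2, 1024 ** 4

def disk_per_heap_byte(region_size, memstore_size, replication, heap_fraction):
    return region_size / memstore_size * replication * heap_fraction * 2

# 15G regions, 256M flush size, 3 replicas, global memstore fraction 0.5
ratio = disk_per_heap_byte(15 * GB, 256 * MB, 3, 0.5)
print(ratio)                 # 180.0
print(64 * GB * ratio / TB)  # 11.25 -> a 64G heap matches ~11T of disk, close to the ~10T installed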
Reader question 3:
Thanks for sharing. Which disk-array mode is most appropriate for the cluster — JBOD, RAID 0, or something else? How should it be evaluated?
A:
The disk-array mode for HBase theoretically follows the Hadoop disk-array mode, and Hadoop usually prefers JBOD. See: http://zh.hortonworks.com/blog/why-not-raid-0-its-about-time-and-snowflakes/
Why HDFS clusters prefer JBOD — two reasons:
1. Bad-disk impact: with RAID 0, once a disk fails you must unmount it and remount a new disk before reads and writes resume, otherwise the whole DataNode has to be kicked out of the cluster; with JBOD, once a disk fails the cluster automatically detects it and recovers after the disk is unmounted.
2. Long-tail effect: in a RAID 0 setup, read/write performance is essentially bounded by the slowest disk, which can hurt.
Of course, in practice the choice between JBOD and RAID 0 also depends on what your company can provide: RAID 0 has a performance advantage in the short term, while JBOD may be better in the long run.
Follow-up:
Each RegionServer mounts 10T of disk and we currently use RAID 0. The data grows colder over time and the workload is write-heavy and read-light. For various reasons the company uses Gigabit NICs, and we now worry that the failure of one RegionServer would leave the whole cluster short of bandwidth. Is this scenario more suitable for JBOD than RAID 0? Would JBOD become a write-performance bottleneck?
A: A RegionServer failure can indeed cause resource shortage and bandwidth bottlenecks for write and scan performance on Gigabit NICs, but that has nothing to do with JBOD versus RAID 0. One problem with striping all disks into a single RAID 0 is that one poorly performing or degraded disk can drag down scan performance, in addition to the bad-disk impact above. Writes usually make full use of all disk bandwidth on a single RegionServer anyway — generally all disks are writing — so RAID 0 striping brings little extra write performance, and JBOD will not become a write bottleneck.
Follow-up:
Can JBOD still achieve concurrent multi-disk write performance? Would it be better to make a single-disk RAID 0 on each disk, and would the 'bad-disk effect' still exist?
A:
Making a single-disk RAID 0 on each disk is feasible.
Reader question 4:
Hello. If hbase.hregion.max.filesize is set too small, the probability of Region splits rises, and splitting consumes resources, which in turn hurts HBase's real-time write performance. So on my cluster I set hbase.hregion.max.filesize very large, 100G~300G. I sized it from my total storage and pre-split the tables so that regionsize * regionnum = allsize (total storage). I also set hbase.hregion.majorcompaction = 0, which avoids frequent splits and major compactions. But your post says 'each Region would reach a terrifying 120G, and any compaction would be a disaster' — does that refer to major compactions? I have not run a manual major compaction yet; I plan to try major-compacting a table this weekend and do not know whether problems will appear. Also, if the data volume is large and must be written in real time, is there a good way to set hbase.hregion.max.filesize so as to avoid splits and compactions?
A:
Running compaction on Regions that large consumes a lot of IO and bandwidth and badly affects reads and writes. Also, why would splitting affect writes?
// Compaction affects both reads and writes, while splitting has essentially no effect on writes.
Follow-up:
What I need is real-time ingestion: Flume + Kafka + Spark Streaming + HBase, with one batch every 10 seconds. When a table's Region is splitting or major-compacting, my batches slow down: normally a batch takes 5~6 seconds, but in those two situations a batch takes 1~2 minutes and it recurs every couple of minutes, so my streaming job never catches up and the ingestion delay keeps growing. I fixed the slowness following this CSDN post, https://blog.csdn.net/imgxr/article/details/80130456, which seems to be your blog too? If I turn automatic splitting back on, can I still ingest in real time with major compaction enabled?
A:
// Automatic splitting can be enabled, but pre-splitting plus rowkey hashing is required, and major compactions are best triggered manually; see the sketch below.
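As a sketch of what 'pre-splitting plus rowkey hashing' can look like (the helpers below are illustrative, not HBase APIs; the generated boundaries would be passed as SPLITS to the HBase shell's create command, and major compactions scheduled off-peak via major_compact):

import hashlib

def hex_split_keys(num_regions, width=8):
    """Evenly spaced hex boundaries for pre-splitting into num_regions regions."""
    space = 16 ** width
    return [format(space * i // num_regions, f"0{width}x")
            for i in range(1, num_regions)]

def salted_rowkey(natural_key, width=8):
    """Prefix the natural key with a hex hash so writes spread across regions."""
    salt = hashlib.md5(natural_key.encode()).hexdigest()[:width]
    return f"{salt}_{natural_key}"

print(hex_split_keys(4))                     # ['40000000', '80000000', 'c0000000']
print(salted_rowkey("order-20190301-0001"))  # hashed prefix + original key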
Reader question 5:
Number of Regions along the Java-heap dimension: Java Heap * HeapFractionForMemstore / (MemstoreSize / 2)
I am confused by the 2 — what does it mean?
A:
Dividing by 2 reflects the assumption that a memstore is on average only half full rather than completely full.
Reader question 6:
For an HBase cluster there is no limit on the number of tables, as long as the total number of Regions across all tables stays within the recommended 20~200 per RegionServer and a single Region is kept within 10G~30G — right?
A:
Right, there is no restriction on the number of tables.
Reader question 7:
I found that HBase has a Replication mechanism; I looked it up and it is used for HBase cluster backup, with several uses:
1. Data backup and disaster recovery
2. Data aggregation
3. Geographic data distribution
4. Online data serving combined with offline data analysis
I have a few questions:
1. Is data backup necessary? HDFS itself already keeps replicas, so HBase data will hardly ever be lost — unless every HDFS server is blown up — so using the Replication mechanism merely for data backup seems meaningless to me.
2. Disaster recovery: I suppose, for example, I have two data centers, each running an HBase cluster, one serving and one as backup; if the first data center is paralyzed, say its network cable is cut, the second immediately takes over as primary, so recovery is fast.
3. What is geographic data distribution? I could not find material on this either; could you explain briefly?
A:
If HDFS's three replicas are used for data backup and done well enough (across racks and data centers), then using HBase Replication purely for backup indeed does not mean much. Replication's greater value lies in high availability of the serving layer and in geographic distribution of data, which I understand as deployment across data centers in different regions.
Reader question 8:
'In extreme scenarios, once the Region count exceeds a threshold, a RegionServer-level flush results, seriously blocking reads and writes.' Does exceeding the threshold mean that more Regions imply more memstores, and once their total exceeds the configured fraction of the RegionServer heap, an RS-level flush is triggered? Or is it the number of HLog files in the RS exceeding its limit?
A:
Yes, the first understanding: when the total memstore size actually occupied exceeds the configured fraction of the RegionServer heap, an RS-level flush is triggered.
Reference link: http://hbasefly.com/2016/08/22/hbase-practise-cluster-planning/