Shulou (Shulou.com) 06/01 report
Our main HBase cluster has been running stably in production for a year and a half. The largest single table has grown past 7,200 regions, and we add on the order of ten billion records to the store every day. Over that time my understanding of HBase has gone from ignorance to familiarity. To cope with the pressure of business data, our HBase storage evolved from an initial single-machine, multithreaded writer into distributed storage with a disaster-recovery mechanism, and we built an alarm system that comprehensively monitors the HBase cluster and its applications so that problems are detected as early as possible. The tuning experience summarized below (for version 0.94) also serves as a record of two years of HBase work.
Server side
1. hbase.regionserver.handler.count: the number of RPC handler threads. The default is 10; around 100 is a reasonable production value. Bigger is not always better: when request payloads are large (e.g. scans or puts of several megabytes of data), too many handlers hold too much data in memory at once, which can cause frequent GC or even memory overflow.
2. hbase.master.distributed.log.splitting: the default is true; we recommend false, which turns off HBase's distributed log splitting so that the master itself handles replay when a log needs to be replayed.
3. hbase.regionserver.hlog.splitlog.writer.threads: the number of threads used for log splitting. The default is 3; we recommend 10.
4. hbase.snapshot.enabled: enables the snapshot feature. The default is false (off); we recommend true. Periodic snapshots are a good way to back up key tables in particular.
5. hbase.hregion.max.filesize: the default is 10 GB. When the storefiles of any column family exceed this size, the region is split in two. Because a split takes the region briefly offline (usually under 5 seconds), we recommend raising this to 60 GB and splitting manually on a schedule to reduce the impact on the business.
6. hbase.hregion.majorcompaction: the interval between automatic major compactions, 1 day by default. We recommend 0, which disables automatic major compaction. A major compaction rewrites all storefiles under a store into a single storefile and removes data marked for deletion in the process; on a production cluster it can run for several hours. To limit the impact on the business, trigger major compactions manually, by script, or through the API during off-peak periods.
7. hbase.hregion.memstore.flush.size: the default is 128 MB (the value is in bytes). Once a memstore exceeds this size it is flushed. If the regionserver JVM has plenty of memory (more than 16 GB), this can be raised to 256 MB.
8. hbase.hregion.memstore.block.multiplier: the default is 2. If a memstore grows past hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier, writes to that memstore are blocked. To avoid blocking we recommend 5; setting it much larger risks OOM. If the regionserver log shows messages like "Blocking updates for ... on region ...: memstore size ... is >= than blocking ... size", it is time to adjust this value.
9. hbase.hstore.compaction.min: the default is 3. A compaction is triggered once the number of storefiles in any store exceeds this value. Setting it to 5-8 and merging storefiles through scheduled manual major compactions reduces the number of compactions, at the cost of longer individual compactions. The older name for this parameter is hbase.hstore.compactionThreshold.
10. hbase.hstore.compaction.max: the default is 10, the maximum number of storefiles merged in one compaction; the cap exists to avoid OOM.
11. hbase.hstore.blockingStoreFiles: the default is 7. If any store (excluding stores of the .META. table) has more storefiles than this, the memstore flush is delayed (the region is added to the flushQueue) until a split or compaction runs; writes block during this period, until the compaction completes or hbase.hstore.blockingWaitTime (default 90 s) elapses. Setting this to 30 helps prevent memstores from failing to flush in time. When the regionserver log fills with "Region has too many store files; delaying flush up to 90000ms", this value needs adjusting.
12. hbase.regionserver.global.memstore.upperLimit: the default is 0.4, the fraction of total heap that all memstores together may occupy. When this limit is reached, the regionserver flushes the regions most in need of flushing until the overall ratio drops back below it. The default is fine.
13. hbase.regionserver.global.memstore.lowerLimit: the default is 0.35; the default is fine.
14. hbase.regionserver.thread.compaction.small: the default is 1, the size of the regionserver's minor-compaction thread pool. It can be set to 5.
15. hbase.regionserver.thread.compaction.large: the default is 1, the size of the regionserver's major-compaction thread pool. It can be set to 8.
16. hbase.regionserver.lease.period: the default is 60000 (60 s), the lease timeout for a client connected to the regionserver. The client must report back within this time or it is considered dead. Tune this to the actual business workload.
17. hfile.block.cache.size: the default is 0.25, the fraction of regionserver heap reserved for the block cache. For read-heavy workloads this can be raised appropriately. Note that hbase.regionserver.global.memstore.upperLimit plus hfile.block.cache.size must be less than 0.8.
18. dfs.socket.timeout: the default is 60000 (60 s). Set it based on the exceptions actually observed in regionserver log monitoring; we use 900000, for example. Changing this parameter also requires updating hdfs-site.xml.
19. dfs.datanode.socket.write.timeout: the default is 480000 (480 s). When a regionserver runs compactions, datanode write timeouts such as "480000 millis timeout while waiting for channel to be ready for write" can occur. Changing this parameter also requires updating hdfs-site.xml.
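As an illustration, here is an hbase-site.xml fragment collecting several of the values suggested above. This is a sketch, not a drop-in configuration; validate each value against your own workload and HBase 0.94 defaults.

```xml
<!-- hbase-site.xml: values from the tuning notes above (HBase 0.94) -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value> <!-- disable automatic major compaction; run it off-peak instead -->
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>64424509440</value> <!-- 60 GB; split manually on a schedule -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>5</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>30</value>
</property>
```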
JVM and garbage collection parameters:
export HBASE_REGIONSERVER_OPTS="-Xms36g -Xmx36g -Xmn1g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=15 -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/data/logs/gc-$(hostname)-hbase.log"
Because our servers have a lot of memory (96 GB), some of our regionservers run with JVM heaps of up to 64 GB, and so far we have not had a single full GC. HBase has put real effort into controlling memory use, for example the various block cache implementations; interested readers can study the source code.
Client side
1. hbase.client.write.buffer: the client write buffer size, default 2 MB (the value is in bytes). We recommend 5 MB; the larger it is, the more memory it consumes. We also tested writes at 10 MB, and performance was no better than at 5 MB.
2. hbase.client.pause: the default is 1000 (1 s). For low-latency reads and writes, set it to 200. This value drives failure retries, region lookup, and similar operations.
3. hbase.client.retries.number: the default is 10, the maximum number of client retries (11 attempts in total). Combined with the pause parameter above at its default, the total retry time comes to 71 s.
4. hbase.ipc.client.tcpnodelay: the default is false. We recommend true, which disables message buffering (TCP_NODELAY).
5. hbase.client.scanner.caching: the scan cache, default 1. To avoid occupying too much memory on the client and the regionserver, values below 1000 are generally reasonable; if individual rows are large, set it smaller. Usually set it to the number of rows a single business query needs.
If the scanned data will not help subsequent queries, call the scan's setCacheBlocks(false) to avoid polluting the block cache.
6. Close tables after use, and close scanners.
7. Limit the scan range: query only the column families or columns needed, and specify startRow and endRow.
8. Using filters can greatly reduce network traffic.
9. Use multithreaded Java clients for writes and queries, and enforce timeouts. Later I will share the code of my standalone multithreaded HBase writer.
10. Points to note when creating tables:
Enable compression
Design the rowkey carefully
Pre-split regions
Enable bloom filters
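The 71 s total retry time mentioned in client item 3 follows from HBase 0.94's retry backoff table (HConstants.RETRY_BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32}). Here is a quick sketch of that arithmetic in Python; the helper function is illustrative, not part of the HBase API:

```python
# Sketch of HBase 0.94 client retry sleep arithmetic.
# RETRY_BACKOFF mirrors HConstants.RETRY_BACKOFF in the 0.94 branch.
RETRY_BACKOFF = [1, 1, 1, 2, 2, 4, 4, 8, 16, 32]

def total_retry_sleep(pause_ms, retries):
    """Total milliseconds slept across `retries` retries; retries past
    the end of the table reuse the last multiplier."""
    return sum(pause_ms * RETRY_BACKOFF[min(i, len(RETRY_BACKOFF) - 1)]
               for i in range(retries))

print(total_retry_sleep(1000, 10))  # defaults: 71000 ms = 71 s
print(total_retry_sleep(200, 10))   # pause lowered to 200 ms: 14200 ms
```

Lowering hbase.client.pause to 200 ms therefore shrinks the worst-case retry window about fivefold, at the cost of retrying against the cluster more aggressively during recovery.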
Zookeeper tuning
1. zookeeper.session.timeout: the default is 3 minutes. It must not be too short, or session timeouts will cause HBase to stop serving; our production environment uses 1 minute. If it is too long, then when a regionserver dies, ZK waits out the whole timeout before the master can migrate its regions (a patch has fixed this).
2. Number of ZooKeeper nodes: 5 or 7 are recommended. Give each ZooKeeper about 4 GB of memory, preferably with its own disk.
3. hbase.zookeeper.property.maxClientCnxns: the maximum number of ZK client connections. The default is 300 and does not need adjusting.
4. Set the operating system's swappiness to 0, so the swap partition is used only when physical memory is truly exhausted. Swapping makes GC take much longer, and once ZK's session timeout is exceeded it produces false alarms of regionserver downtime.
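Item 4 is usually applied through sysctl; a minimal sketch, assuming a standard Linux host (paths and exact swappiness semantics vary by distribution and kernel version):

```
# /etc/sysctl.conf -- persist the setting across reboots
vm.swappiness = 0

# apply immediately (requires root):
#   sysctl -w vm.swappiness=0
```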
Hdfs tuning
1. dfs.name.dir: the namenode data directories. Configure several on different disks plus one on an NFS remote filesystem, so the namenode data has multiple backups.
2. dfs.namenode.handler.count: the number of RPC handler threads on the namenode. The default is 10; it can be set to 60.
3. dfs.datanode.handler.count: the number of RPC handler threads on each datanode. The default is 3; it can be set to 30.
4. dfs.datanode.max.xcievers: the maximum number of files a datanode handles simultaneously. The default is 256; it can be set to 8192.
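For reference, the HDFS items above as an hdfs-site.xml fragment (a sketch with the values suggested in this article; check them against your Hadoop version's defaults):

```xml
<!-- hdfs-site.xml: values from the HDFS tuning notes above -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>30</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
```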
Other
Column family names, column names, and rowkeys are all stored in the HFile, so keep them as short as possible when designing the table schema.
The number of regions on a regionserver should not exceed 1000. Too many regions mean too many memstores, which can lead to memory exhaustion and lengthen major compactions.
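The memory pressure behind this region-count guideline can be sketched with rough arithmetic, assuming the 36 GB heap and 0.4 upperLimit used earlier in this article and, simplistically, writes spread evenly across regions:

```python
# Rough sketch: memstore budget per region once the global upper
# limit is divided across many regions (even-write-spread assumption).
def per_region_memstore_mb(heap_gb, upper_limit, regions):
    return heap_gb * 1024 * upper_limit / regions

# 36 GB heap, upperLimit 0.4, 1000 regions:
budget = per_region_memstore_mb(36, 0.4, 1000)
print(round(budget, 1))  # ~14.7 MB per region
```

At roughly 14.7 MB per region, each memstore is forced to flush long before reaching the 128 MB hbase.hregion.memstore.flush.size, producing many small storefiles and extra compaction work; that is one concrete reason to keep region counts down.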
For reprint, please indicate the original link: http://blog.csdn.net/odailidong/article/details/41794403