How to tune HBase

Updated 2025-01-19. From: SLTechnology News&Howtos (shulou)

This article shares how to tune HBase, covering table design, read optimization, write optimization, and configuration parameters.

1. Table design

1.1 Pre-create multiple Regions

By default, a single Region is created automatically when an HBase table is created. When data is imported, all HBase clients write to this one Region until it grows large enough to split. One way to speed up batch writes is to create some empty Regions in advance, so that writes are load-balanced across the cluster according to the Region boundaries.
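As a rough illustration of pre-splitting, the sketch below computes evenly spaced split points. It assumes row keys begin with a two-hex-digit prefix (for example, taken from a hash), which this article does not specify; `SplitKeySketch` and `hexSplitKeys` are hypothetical names. The resulting keys, converted to bytes, could then be supplied as split keys when the table is created.

```java
import java.util.ArrayList;
import java.util.List;

class SplitKeySketch {
    // Returns numRegions - 1 split points that evenly divide the
    // two-hex-digit prefix space "00".."ff" (an assumed key layout).
    static List<String> hexSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int boundary = i * 256 / numRegions; // start of the i-th region
            keys.add(String.format("%02x", boundary));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Four regions need three split points.
        System.out.println(hexSplitKeys(4));
    }
}
```

With four regions, the split points fall at the prefixes 40, 80, and c0, so each region covers a quarter of the prefix space.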

1.2 Row Key Design

The row key in HBase is used to retrieve records from a table. Three access methods are supported:

(1) Access by a single RowKey: a Get operation on a specific RowKey value

(2) Scan over a RowKey range: scan within a range defined by a start RowKey and an end RowKey

(3) Full table scan: scan every row in the table

In HBase, a RowKey can be any string up to 64 KB long. In practice it is usually 10 to 100 bytes, is stored as a byte[] array, and is generally designed with a fixed length.

RowKeys are stored in lexicographic order, so design the RowKey to exploit this ordering: store data that is often read together in adjacent rows, and do the same for data that is likely to be accessed soon after it is written.

For example, if the data most recently written to an HBase table is the data most likely to be read, consider using the timestamp as part of the RowKey. Because rows are sorted lexicographically, you can use Long.MAX_VALUE - timestamp as the RowKey, so that newly written data sorts first and is hit quickly on reads.
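A minimal sketch of the reversed-timestamp idea follows. The key layout (an entity id followed by the reversed timestamp as 8 big-endian bytes) and the names `ReverseTimestampKey`, `rowKey`, and `compareKeys` are assumptions made for illustration; `compareKeys` mimics byte-wise lexicographic ordering rather than calling any HBase code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ReverseTimestampKey {
    // Builds entityId + (Long.MAX_VALUE - timestamp) as big-endian bytes.
    // Assumes non-negative timestamps, so the reversed value is non-negative.
    static byte[] rowKey(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned, byte-wise lexicographic comparison of two keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("user1", 1000L);
        byte[] newer = rowKey("user1", 2000L);
        // A negative result means the newer row sorts before the older one.
        System.out.println(compareKeys(newer, older) < 0);
    }
}
```

Because the newer row sorts first, a scan starting at the entity's prefix returns the latest records before older ones.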

1.3 Column Family design

Don't define too many column families in one table. Currently, HBase does not handle tables with more than two or three column families very well: when one column family flushes, its neighboring column families are flushed as well as a side effect, which ultimately causes more I/O in the system.

1.4 In Memory settings (optional)

When you create a table, you can call HColumnDescriptor.setInMemory(true) to keep the column family's data in the RegionServer cache, which makes cache hits on reads more likely.

1.5 version limit

When creating a table, you can set the maximum number of versions kept for the data with HColumnDescriptor.setMaxVersions(int maxVersions). For data that is not particularly important, you can call setMaxVersions(1).

1.6 data lifecycle restrictions

When creating a table, you can set how long the data is retained, in seconds, with HColumnDescriptor.setTimeToLive(int timeToLive); expired data is deleted automatically. For example, if you only need to keep the last two days of data, call setTimeToLive(2 * 24 * 60 * 60).
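The arithmetic behind that value can be sketched as follows. `TtlSketch`, `ttlSeconds`, and `isExpired` are hypothetical names, and `isExpired` is only a simplified model of an age check, not HBase's internal expiry logic.

```java
import java.util.concurrent.TimeUnit;

class TtlSketch {
    // Converts a retention period in days to the seconds value
    // that would be passed to setTimeToLive.
    static int ttlSeconds(int days) {
        return (int) TimeUnit.DAYS.toSeconds(days);
    }

    // Simplified model: a cell counts as expired once its age exceeds the TTL.
    static boolean isExpired(long cellTimestampMillis, long nowMillis, int ttlSeconds) {
        return nowMillis - cellTimestampMillis > ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(ttlSeconds(2)); // same value as 2 * 24 * 60 * 60
    }
}
```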

1.7 Compact and Split (optional)

In practice, compaction and splits can be triggered manually when necessary.

1.8 use compression (optional)

Whether to use compression depends on the actual workload and machine performance: it trades CPU for storage savings, and it can also reduce I/O and network overhead. You can use LZO or Snappy compression, which can shrink data by a factor of roughly 4 to 5.

2. Read optimization

2.1 scan cache

When scanning, you can fetch more than one row per request and cache the results on the client, which reduces I/O cost. In code:

hTable.setScannerCaching(50); // fetch 50 rows per request
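To show why a larger caching value helps, the sketch below counts the fetch requests needed for a scan under a simplified model in which each request returns at most `caching` rows. `ScanCachingSketch` and `roundTrips` are hypothetical names, and the model ignores the extra calls a real scanner makes to open and close the scan.

```java
class ScanCachingSketch {
    // Number of fetch requests needed to read totalRows when each
    // request returns at most `caching` rows (ceiling division).
    static long roundTrips(long totalRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be at least 1");
        }
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(10_000, 1));  // one request per row
        System.out.println(roundTrips(10_000, 50)); // 50 rows per request
    }
}
```

Under this model, reading 10,000 rows takes 10,000 requests with a caching value of 1 and 200 requests with a value of 50. Larger values also use more client memory per request, so the setting is a trade-off.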

2.2 scan only the required columns

If you specify the required column families on a Scan, you reduce the amount of data sent over the network. Otherwise, the default scan returns the data of every column family in each row.

2.3 release resources

After fetching data through a scan, remember to close the ResultScanner. Otherwise the RegionServer may run into problems, because the corresponding server-side resources cannot be released.
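One common way to guarantee the close call in Java is try-with-resources. The sketch below uses a hypothetical stand-in class, `FakeScanner`, rather than the real `ResultScanner`, so that it runs without an HBase cluster; the assumption is that the real scanner can be managed the same way because it is closeable.

```java
class ScannerCloseSketch {
    // Hypothetical stand-in for a scanner that holds server-side resources.
    static class FakeScanner implements AutoCloseable {
        boolean closed = false;

        String next() {
            return "row";
        }

        @Override
        public void close() {
            closed = true; // the real scanner would release server resources here
        }
    }

    // Reads one row and returns the scanner so callers can inspect its state.
    static FakeScanner scanOnce() {
        FakeScanner scanner = new FakeScanner();
        try (FakeScanner s = scanner) {
            s.next();
        } // close() runs here, even if next() had thrown
        return scanner;
    }

    public static void main(String[] args) {
        System.out.println(scanOnce().closed);
    }
}
```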

3. Write optimization

3.1 write cache

When writing to an HBase table, it is best not to send one record at a time but to write in batches, configured in code as follows:

hTable.setAutoFlush(false, false); // do not send each write to HBase immediately

hTable.setWriteBufferSize(1024 * 1024 * 10); // write buffer size: 10 MB

A flush is triggered when the buffered data reaches 10 MB. Calling hTable.flushCommits() or hTable.close() also flushes the buffered data to HBase. The HBase API also supports inserting data as a list.
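The buffering behavior described above can be modeled with a small sketch. `WriteBufferSketch` is a hypothetical class that only counts flushes; it is not the HBase client, and the exact threshold comparison (flush once the buffered bytes grow past the limit) is an assumption of this model.

```java
import java.util.ArrayList;
import java.util.List;

class WriteBufferSketch {
    private final long bufferLimitBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    int flushCount = 0;

    WriteBufferSketch(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Buffers a record and flushes once the buffer grows past the limit.
    void put(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (pendingBytes > bufferLimitBytes) {
            flush();
        }
    }

    // Sends everything buffered so far as one batch; a no-op when empty.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        pending.clear(); // a real client would send the batch to the server here
        pendingBytes = 0;
        flushCount++;
    }

    public static void main(String[] args) {
        WriteBufferSketch writer = new WriteBufferSketch(10);
        for (int i = 0; i < 4; i++) {
            writer.put(new byte[4]);
        }
        writer.flush(); // explicit flush for the remainder, like flushCommits()
        System.out.println(writer.flushCount);
    }
}
```

The point of the model is that many small writes collapse into a few batches, and that an explicit final flush is still needed for whatever remains in the buffer.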

4. Parameter optimization

4.1 connection time

Parameter: zookeeper.session.timeout

The session timeout between RegionServer and ZooKeeper. Default: 3 minutes (180000 ms). We configure 300000 ms (5 minutes).

4.2 Thread count control

Parameter: hbase.regionserver.handler.count

The number of handler threads that process requests in a RegionServer. The default value is 10; we configure 200.

4.3 split threshold

Parameter: hbase.hregion.max.filesize

The size threshold at which a single Region triggers a split. Default: 256 MB; we configure 4 GB.

4.4 enable MSLAB

Parameter: hbase.hregion.memstore.mslab.enabled

Reduces full GCs caused by memory fragmentation and improves overall performance. Default value: true.

4.5 scan cache

Parameter: hbase.client.scanner.caching

The number of rows fetched each time the scanner calls the next method. The default value is 1.

4.6 MemStore size control

Parameter: hbase.regionserver.global.memstore.upperLimit/lowerLimit

hbase.regionserver.global.memstore.upperLimit: prevents memstores that cannot be flushed to StoreFiles fast enough from taking up too much of the heap. When all memstores in a RegionServer together occupy more than 40% of the heap, HBase blocks all updates and flushes these memstores to release memory.

The default value of hbase.regionserver.global.memstore.lowerLimit is fine and does not need to be adjusted.
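As a worked example of the upper limit, the sketch below computes the memstore ceiling for a given heap size and checks whether updates would be blocked. `MemstoreLimitSketch` and its methods are hypothetical names, the limit is expressed as a whole-number percentage for simplicity, and the 8 GiB heap is an assumed figure.

```java
class MemstoreLimitSketch {
    // Heap bytes that all memstores together may occupy before updates block.
    static long upperLimitBytes(long heapBytes, int upperLimitPercent) {
        return heapBytes * upperLimitPercent / 100;
    }

    // Simplified model of the blocking condition described above.
    static boolean shouldBlockUpdates(long totalMemstoreBytes, long heapBytes, int upperLimitPercent) {
        return totalMemstoreBytes > upperLimitBytes(heapBytes, upperLimitPercent);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // assumed 8 GiB RegionServer heap
        System.out.println(upperLimitBytes(heap, 40)); // bytes available to memstores
    }
}
```

With an 8 GiB heap and a 40% limit, memstores may hold roughly 3.2 GiB in total before writes are blocked.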

© 2024 shulou.com SLNews company. All rights reserved.
