Brief Analysis of HBase Client API 04/28 Update SLTechnology News&Howtos

2025-04-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

Yesterday, during a two-hour flight, I read through HBase's Client API once and came away with a few lessons:

1. When putting many small records, it is best to turn off autoFlush and set the WriteBuffer size sensibly:

Each Put costs one RPC call, a WAL write (disabling the WAL gives writes a large boost) and server-side processing. For large batches of small writes, the RPC round-trip time (RTT) therefore becomes the write bottleneck, so it pays to submit Puts in batches through a local buffer. The default WriteBuffer size is 2 MB. With autoFlush turned off, each client Put is appended to an ArrayList, and the accumulated size is checked every 10 Puts; once it exceeds the WriteBuffer size, flushCommits() is performed. The buffered Puts are grouped by RegionServer (RS), and each RS receives its group in a single RPC call.

If an exception occurs while submitting to the server, the Puts that were written successfully are removed from the write buffer, while the failed ones are retained in it for exception handling.
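
To make the batching behaviour concrete, here is a minimal, self-contained Java model of a client-side write buffer with autoFlush off. It is a sketch under stated assumptions, not HBase code: `WriteBufferModel`, its `Put` record, the byte sizes and the row-to-RegionServer mapping are all hypothetical, and the "check every 10 Puts" rule simply follows the description above. It groups buffered Puts by RegionServer, counts one RPC per server, and keeps the Puts whose RPC "failed" in the buffer. In older clients (for example the 0.9x `HTable`) the corresponding real calls are `setAutoFlush(false)`, `setWriteBufferSize(long)` and `flushCommits()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model of a client-side write buffer with autoFlush turned off.
// NOT the HBase API: Put, sizes and the row -> RegionServer mapping are hypothetical.
class WriteBufferModel {
    record Put(String row, long heapSize) {}

    static final int CHECK_EVERY = 10;                     // size is checked every 10 Puts

    private final long writeBufferSize;                    // e.g. 2 MB by default
    private final Function<String, String> locateServer;   // hypothetical row -> RegionServer lookup
    private final Set<String> failingServers;              // servers whose RPC "fails" in this model
    final List<Put> buffer = new ArrayList<>();            // the local ArrayList of pending Puts
    private long currentSize = 0;
    private int putCount = 0;
    int rpcCalls = 0;

    WriteBufferModel(long writeBufferSize, Function<String, String> locateServer,
                     Set<String> failingServers) {
        this.writeBufferSize = writeBufferSize;
        this.locateServer = locateServer;
        this.failingServers = failingServers;
    }

    void put(Put p) {
        buffer.add(p);
        currentSize += p.heapSize();
        putCount++;
        if (putCount % CHECK_EVERY == 0 && currentSize > writeBufferSize) {
            flushCommits();
        }
    }

    void flushCommits() {
        // Group the buffered Puts by RegionServer: one RPC per server.
        Map<String, List<Put>> byServer = new LinkedHashMap<>();
        for (Put p : buffer) {
            byServer.computeIfAbsent(locateServer.apply(p.row()), s -> new ArrayList<>()).add(p);
        }
        List<Put> failed = new ArrayList<>();
        for (Map.Entry<String, List<Put>> e : byServer.entrySet()) {
            rpcCalls++;
            if (failingServers.contains(e.getKey())) {
                failed.addAll(e.getValue());
            }
        }
        // Successful Puts leave the buffer; failed ones stay for exception handling.
        buffer.clear();
        buffer.addAll(failed);
        currentSize = failed.stream().mapToLong(Put::heapSize).sum();
    }

    static WriteBufferModel demo() {
        WriteBufferModel wb = new WriteBufferModel(
                1_000,                                                     // tiny buffer for the demo
                row -> Integer.parseInt(row.substring(3)) % 2 == 0 ? "rs1" : "rs2",
                Set.of("rs2"));                                            // pretend rs2 throws
        for (int i = 0; i < 20; i++) {
            wb.put(new Put("row" + i, 100));
        }
        return wb;
    }

    public static void main(String[] args) {
        WriteBufferModel wb = demo();
        System.out.println("RPCs: " + wb.rpcCalls + ", still buffered: " + wb.buffer.size());
        // prints "RPCs: 2, still buffered: 10"
    }
}
```

With a 1,000-byte buffer and twenty 100-byte Puts, the size check at the 10th Put does not fire (1,000 is not greater than 1,000), the one at the 20th does, and the ten Puts routed to the failing `rs2` remain in the buffer.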

However, the write buffer size must be chosen carefully, because it consumes memory both locally and on the RegionServers.

The local memory footprint is easy to estimate, while the maximum memory consumption on the server side is: hbase.client.write.buffer * hbase.regionserver.handler.count * number of RegionServers
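
As a quick worked example of the server-side formula above, the sketch below multiplies the 2 MB default buffer by a handler count of 30 and a cluster of 10 RegionServers. Both of those numbers are made-up illustrative values, and `WriteBufferMemory` is a hypothetical helper; substitute your own cluster's settings.

```java
// Worst-case server-side memory for buffered writes, following the formula above:
// hbase.client.write.buffer * hbase.regionserver.handler.count * number of RegionServers.
class WriteBufferMemory {
    static long worstCaseBytes(long writeBufferBytes, int handlerCount, int regionServers) {
        return writeBufferBytes * handlerCount * regionServers;
    }

    public static void main(String[] args) {
        long writeBuffer = 2L * 1024 * 1024;   // 2 MB default write buffer
        int handlers = 30;                     // assumed handler count (illustrative)
        int servers = 10;                      // assumed cluster size (illustrative)
        long bytes = worstCaseBytes(writeBuffer, handlers, servers);
        System.out.println(bytes / (1024 * 1024) + " MB");   // prints "600 MB"
    }
}
```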

2. Scanner batch/caching settings:

A Scan is processed as follows:

The caching setting mainly affects the RegionServer next() calls: it is the number of rows returned per call (think of it as a "row-oriented" batch). Batch, by contrast, is the number of KeyValues the RegionServer's RegionScanner fetches per nextInternal() call (a "column-oriented" batch).

Therefore, the number of RPC calls a scan makes is determined by both parameters: RPCs = total number of cells / (caching * min(batch, cells per row))
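
The formula can be checked with a small Java helper. It assumes every row has the same number of cells, rounds up to whole RPCs (my addition), and ignores any extra calls for opening and closing the scanner; `ScanRpcEstimate` and `rpcs` are hypothetical names. In the client API the two knobs themselves are set with `Scan.setCaching(int)` and `Scan.setBatch(int)`.

```java
// Estimates scan RPCs with the formula above:
// RPCs = total cells / (caching * min(batch, cells per row)), rounded up.
// Assumes a uniform number of cells per row; scanner open/close calls are not counted.
class ScanRpcEstimate {
    static long rpcs(long rows, int cellsPerRow, int caching, int batch) {
        long totalCells = rows * cellsPerRow;
        long cellsPerRpc = (long) caching * Math.min(batch, cellsPerRow);
        return (totalCells + cellsPerRpc - 1) / cellsPerRpc;   // ceiling division
    }

    public static void main(String[] args) {
        // 10 rows x 20 cells: batch 10 splits each row into 2 Results, caching 5 ships 5 per RPC
        System.out.println(rpcs(10, 20, 5, 10));    // prints 4
        // batch larger than a row: one Result per row, so caching alone decides
        System.out.println(rpcs(10, 20, 1, 100));   // prints 10
    }
}
```

When batch is smaller than a row, each row is split into several partial results; when it is larger, min(batch, cells per row) collapses to the row width and only caching matters.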

The scanner's next(n) is actually similar to fetching in MySQL JDBC: it is simulated by a loop on the client side rather than being a batch fetch on the server side. In fact, an HBase scan is very similar to a MySQL cursor, so once you understand one, the other comes naturally.

However, like the write buffer, larger caching and batch values cost memory and network transfer, and the scanner should be closed once you are finished with it.
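
The client-side looping behaviour of next(n), and the habit of closing the scanner when done, can be sketched with a toy scanner. `LoopingScanner` is a hypothetical stand-in, not HBase's `ResultScanner`: it hands out one row per next() call, implements next(n) as a plain loop over next(), and relies on try-with-resources so that close() always runs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy scanner: next() returns one row at a time (null when exhausted);
// next(n) is just a client-side loop over next(), not a server-side batch fetch.
// Hypothetical names; this is not the HBase client.
class LoopingScanner implements AutoCloseable {
    private final Iterator<String> rows;
    int nextCalls = 0;
    boolean closed = false;

    LoopingScanner(List<String> rows) {
        this.rows = rows.iterator();
    }

    String next() {
        if (closed) throw new IllegalStateException("scanner closed");
        nextCalls++;
        return rows.hasNext() ? rows.next() : null;
    }

    List<String> next(int n) {
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String row = next();
            if (row == null) break;   // scanner exhausted
            out.add(row);
        }
        return out;
    }

    @Override
    public void close() {
        closed = true;   // a real scanner would release its server-side resources here
    }

    public static void main(String[] args) {
        LoopingScanner s = new LoopingScanner(List.of("r1", "r2", "r3"));
        try (s) {
            System.out.println(s.next(2));   // prints [r1, r2]
            System.out.println(s.next(2));   // prints [r3]
        }
        System.out.println(s.closed);        // prints true
    }
}
```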

3. HConnection handling:

HConnections (HC for short) are created by HConnectionManager (HCManager) and shared: each HC is stored in HCManager's HBASE_INSTANCES map, which means that the same client with the same Configuration shares one HC. The first advantage is that the ZooKeeper (ZK) connection is shared. In addition, when a region splits or merges, only that one HC has to deal with it.

The disadvantage is that these connections stay open until the client process exits, which can push the number of ZK connections past maxClientCnxns and cause connection exceptions.
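
Both the sharing and the leak can be seen in a toy process-wide cache keyed by configuration, loosely modelled on the HBASE_INSTANCES map. `ConnectionCache`, its `Connection` class and the string configuration keys are hypothetical; a real HConnection would also hold a ZK session, which is what eventually runs into maxClientCnxns.

```java
import java.util.HashMap;
import java.util.Map;

// Toy process-wide connection cache keyed by configuration, loosely modelled on
// HConnectionManager's HBASE_INSTANCES map. All names here are hypothetical.
class ConnectionCache {
    static final class Connection {
        final String configKey;
        Connection(String configKey) {
            this.configKey = configKey;   // a real HC would also open a ZK session
        }
    }

    private static final Map<String, Connection> INSTANCES = new HashMap<>();

    // Same configuration -> same shared connection (and so one shared ZK connection).
    static synchronized Connection getConnection(String configKey) {
        return INSTANCES.computeIfAbsent(configKey, Connection::new);
    }

    // Nothing ever evicts entries here, so each distinct configuration keeps its
    // connection until the process exits.
    static synchronized int openConnections() {
        return INSTANCES.size();
    }

    public static void main(String[] args) {
        Connection a = getConnection("quorum=zk1");
        Connection b = getConnection("quorum=zk1");
        System.out.println(a == b);                   // prints true: one shared connection
        for (int i = 0; i < 5; i++) {
            getConnection("quorum=zk1;client=" + i);  // each new config adds a connection
        }
        System.out.println(openConnections());        // prints 6
    }
}
```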

4. Coprocessor:

Comparable to MySQL's triggers and stored procedures. I'll explain it in more detail later.

5. Counter:

Counters are very easy to use, but isn't using HBase for counting a little heavyweight compared to Redis? :P
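
For reference, the semantics of an HBase counter (an atomic read-modify-write on a single cell that returns the new value, as with `HTable.incrementColumnValue(row, family, qualifier, amount)`) can be mimicked in memory. This `CounterModel` is a hypothetical illustration built on `ConcurrentHashMap.merge`, not HBase code; it only shows why concurrent increments do not lose updates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// In-memory stand-in for counter cells: an incrementColumnValue-style atomic
// read-modify-write that returns the new value. Not HBase code; names are hypothetical.
class CounterModel {
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();

    long incrementColumnValue(String row, String family, String qualifier, long amount) {
        String key = row + "/" + family + ":" + qualifier;
        return cells.merge(key, amount, Long::sum);   // atomic per key
    }

    public static void main(String[] args) throws InterruptedException {
        CounterModel counter = new CounterModel();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> counter.incrementColumnValue("page1", "stats", "hits", 1));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Adding 0 reads the current value back.
        System.out.println(counter.incrementColumnValue("page1", "stats", "hits", 0));   // prints 1000
    }
}
```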

6. RowLock:

Row locks should really be avoided; they are an RS killer, since a lock can keep an RPC handler held for the whole lease.period...
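
A toy model of why a held row lock hurts: one client takes the lock and never releases it, and a handler serving another request for the same row is stuck until its wait runs out. The names and the 200 ms "lease" are hypothetical stand-ins for lease.period, and a `java.util.concurrent` lock replaces the RegionServer's own row-lock implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of an RPC handler stuck behind a held row lock.
// The 200 ms "lease" and all names are hypothetical; a ReentrantLock stands in
// for the RegionServer's row lock.
class RowLockStall {
    static final long LEASE_MS = 200;

    // Returns how many milliseconds the "handler" spent waiting for the row lock.
    static long handlerWaitMs() throws InterruptedException {
        ReentrantLock rowLock = new ReentrantLock();
        Thread client = new Thread(rowLock::lock);   // a client grabs the row lock and never unlocks
        client.start();
        client.join();                               // the lock stays held even though the thread ended

        long start = System.nanoTime();
        // The handler serving another request on the same row blocks until the lease runs out.
        boolean acquired = rowLock.tryLock(LEASE_MS, TimeUnit.MILLISECONDS);
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (acquired) {
            rowLock.unlock();
        }
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handler blocked for about " + handlerWaitMs() + " ms");
    }
}
```

If enough requests queue up on locked rows, every handler can end up waiting like this, which is what makes row locks an RS killer.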

7. Management API:

Operations and maintenance tools such as split and compact :)
