What do cache and batch mean in HBase

2025-01-20 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains what cache and batch mean in HBase: what each scan setting controls, and how the two together affect the number of RPCs a scan needs.

Cache:

By default, when you query data from HBase through a ResultScanner, every call to ResultScanner.next() costs one RPC per row returned. Using ResultScanner.next(int nbRows) does not change this, because it simply calls ResultScanner.next() repeatedly on the client side. In other words, HBase executes the query in iterator mode: the query is actually performed only when next() is called, and one RPC is made for each row.

The obvious improvement is to fetch several rows of the query result in a single RPC, which reduces the communication overhead. This is the purpose of the HBase configuration property "hbase.client.scanner.caching". The cache value can be set either statically in the HBase configuration file or dynamically in the program, for example with Scan.setCaching(int).
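The effect of the cache value on round trips can be sketched with a small model. This is a toy simulation, not HBase client code: the class ScanRpcModel and the method fetchRpcs are hypothetical names, and the model counts only the fetch round trips, ignoring any RPCs needed to open or close a scanner.

```java
// Toy model only: it does not talk to HBase. It counts how many fetch
// round trips a client-side scanner would need for a given cache value.
class ScanRpcModel {

    // Hypothetical helper: number of fetch RPCs needed to deliver totalRows
    // rows when each round trip returns at most `caching` rows.
    static int fetchRpcs(int totalRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        int rpcs = 0;       // simulated round trips so far
        int buffered = 0;   // rows currently held in the client-side buffer
        int delivered = 0;  // rows already handed to the caller via next()
        while (delivered < totalRows) {
            if (buffered == 0) {
                // The buffer is empty, so next() has to go to the server.
                rpcs++;
                buffered = Math.min(caching, totalRows - delivered);
            }
            // next() serves one row from the buffer without a round trip.
            buffered--;
            delivered++;
        }
        return rpcs;
    }

    public static void main(String[] args) {
        System.out.println(fetchRpcs(10, 1));   // prints 10
        System.out.println(fetchRpcs(10, 5));   // prints 2
        System.out.println(fetchRpcs(10, 100)); // prints 1
    }
}
```

With a cache value of 1 the model makes one round trip per row, which matches the iterator behavior described above; larger values let most next() calls be answered from the client-side buffer.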

A larger cache value is not always better; you need to strike a balance. The higher the cache value, the fewer RPCs the scan needs and the better the overall query performance, but each next() call that fetches from the server takes longer, because more rows are collected and more data has to be transferred to the client. If the fetched data exceeds the maximum heap available to the client process, the client fails with an OutOfMemoryError. If transferring the rows to the client takes too long, a ScannerTimeoutException is thrown.
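A rough way to reason about this balance is to estimate how much row data a single fetch brings into the client. The sketch below is only back-of-the-envelope arithmetic: FetchSizeEstimate, bytesPerFetch, fitsInHeap, the average row size, and the heap fraction are all assumptions made for illustration, and real memory use also includes object overhead that is not modeled here.

```java
// Back-of-the-envelope estimate only; all names and numbers are illustrative.
class FetchSizeEstimate {

    // Approximate payload of one fetch: cache value times average row size.
    static long bytesPerFetch(int caching, long avgRowBytes) {
        return Math.multiplyExact((long) caching, avgRowBytes);
    }

    // True if one fetch stays within the given fraction of the client heap.
    static boolean fitsInHeap(int caching, long avgRowBytes,
                              long maxHeapBytes, double fraction) {
        return bytesPerFetch(caching, avgRowBytes) <= maxHeapBytes * fraction;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long avgRowBytes = 1_000_000L; // assumed average row size (about 1 MB)
        for (int caching : new int[] {10, 100, 1000}) {
            System.out.println("caching=" + caching
                    + " bytesPerFetch=" + bytesPerFetch(caching, avgRowBytes)
                    + " fitsInQuarterOfHeap="
                    + fitsInHeap(caching, avgRowBytes, maxHeap, 0.25));
        }
    }
}
```

The point of the estimate is only that the per-fetch payload grows linearly with the cache value, so a value that is harmless for small rows can exhaust the heap when rows are large.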

Batch:

The discussion of cache generally assumes relatively small rows. What should you do if a row is particularly large? Keep in mind that the memory consumed in the client process grows with both the cache value and the size of each row. HBase provides a similar setting to address this situation: batch. You can think of cache as a row-oriented optimization and batch as a column-oriented one. Batch controls how many columns each Result returned by next() may contain. For example, with setBatch(5) each Result instance holds at most 5 columns; if a row has 17 columns, you get four Result instances for it, the first three containing 5 columns each and the last containing the remaining 2.
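The column split can be illustrated with a few lines of plain Java. This is a sketch of the arithmetic only, not HBase code: BatchSplit and resultSizes are hypothetical names, and the method just lists how many columns each Result would hold for one row.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the arithmetic only; it does not use the HBase client.
class BatchSplit {

    // Column counts of the Result chunks that one row is split into
    // when each chunk may hold at most `batch` columns.
    static List<Integer> resultSizes(int columnsInRow, int batch) {
        if (batch < 1) {
            throw new IllegalArgumentException("batch must be >= 1");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = columnsInRow; remaining > 0; remaining -= batch) {
            // Every chunk is full except possibly the last one.
            sizes.add(Math.min(batch, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(resultSizes(17, 5));  // prints [5, 5, 5, 2]
        System.out.println(resultSizes(17, 20)); // prints [17]
    }
}
```

A batch value larger than the number of columns leaves the row in a single Result, which is why the formula for the RPC count uses the smaller of the two values.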

RPCs = (Rows × Cols per Row) / Min(Cols per Row, Batch size) / Scanner caching
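As a worked example, the formula can be evaluated for a table of 10 rows with 20 columns each. The sketch below is a direct transcription of the formula under two assumptions that are not in the original: the class and method names (RpcEstimate, estimateRpcs) are hypothetical, and the result is rounded up to a whole number of RPCs. Any extra round trips, such as those for opening or closing the scanner, are not counted.

```java
// Direct transcription of the formula; rounding up is an assumption.
class RpcEstimate {

    static long estimateRpcs(long rows, long colsPerRow, long batch, long caching) {
        // Number of Result instances: total cells divided by cells per Result.
        double results = (double) (rows * colsPerRow) / Math.min(colsPerRow, batch);
        // Each RPC can carry up to `caching` Result instances.
        return (long) Math.ceil(results / caching);
    }

    public static void main(String[] args) {
        // 10 rows with 20 columns each, under different settings.
        System.out.println(estimateRpcs(10, 20, 1, 1));   // prints 200
        System.out.println(estimateRpcs(10, 20, 10, 2));  // prints 10
        System.out.println(estimateRpcs(10, 20, 100, 5)); // prints 2
    }
}
```

When the number of columns is not a multiple of the batch size, the formula is only an approximation, because the last Result of each row is partly empty.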
