Shulou (Shulou.com), SLTechnology News&Howtos, 05/31 report, updated 2025-02-24
This article explains what the maxResultSize attribute of HBase's Scan class is for, walking through the relevant server-side source code and a worked example.
If you have read through HRegionServer's startup code, you will have noticed a similar attribute, maxScannerResultSize (set with hbase.client.scanner.max.result.size in the configuration file). This value is in fact the default maxResultSize used by a Scan. So what is maxResultSize actually for? Let's look at the following source code, taken from the scan method of HRegionServer in HBase 0.98.9:
```java
LOG.info("*4444*maxResultSize:" + maxResultSize + "; rows:" + rows);
synchronized (scanner) {
  while (i < rows) {
    // Stop collecting results if maxScannerResultSize is set and we have exceeded it
    if ((maxScannerResultSize < Long.MAX_VALUE) && (currentScanResultSize >= maxResultSize)) {
      LOG.info("*break i:" + i);
      break;
    }
    // Collect values to be returned here
    boolean moreRows = scanner.nextRaw(values);
    if (!values.isEmpty()) {
      for (Cell cell : values) {
        KeyValue kv = KeyValueUtil.ensureKeyValue(cell);
        LOG.info("*kv:" + kv + "; kv.heapSize():" + kv.heapSize());
        LOG.info("*currentScanResultSize:" + currentScanResultSize);
        currentScanResultSize += kv.heapSize();
        totalKvSize += kv.getLength();
      }
      results.add(Result.create(values));
      i++;
    }
    if (!moreRows) {
      break;
    }
    values.clear();
  }
}
```
The LOG.info calls in this excerpt are debug statements I added; they can be ignored.
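Stripped of logging, the loop's batching decision can be modeled in plain Java. This is a simplified sketch, not HBase code: the class and method names are hypothetical, each row is reduced to the sum of kv.heapSize() over its KeyValues, and an unset Scan maxResultSize is assumed to be any value <= 0, in which case the region server's maxScannerResultSize is used instead.

```java
import java.util.List;

// Hypothetical model of the row-collecting loop in HRegionServer.scan (HBase 0.98.9).
// Each row is represented only by the sum of kv.heapSize() over its KeyValues.
public class ScanBatchModel {

    /**
     * Returns how many rows one scan RPC would return, starting at row index `from`.
     *
     * @param rowHeapSizes         per-row byte totals (sum of kv.heapSize())
     * @param caching              the Scan's caching value (the loop's `rows` bound)
     * @param scanMaxResultSize    Scan maxResultSize; assumed unset when <= 0
     * @param maxScannerResultSize region server value (hbase.client.scanner.max.result.size)
     */
    static int rowsInOneCall(List<Long> rowHeapSizes, int from, int caching,
                             long scanMaxResultSize, long maxScannerResultSize) {
        // Assumption: fall back to the server-wide value when the Scan does not set one.
        long maxResultSize = scanMaxResultSize > 0 ? scanMaxResultSize : maxScannerResultSize;
        long currentScanResultSize = 0;
        int i = 0;
        while (i < caching && from + i < rowHeapSizes.size()) {
            // Same check as the server: it runs BEFORE a row is read, so the
            // row that crosses the limit is still returned (a soft limit).
            if (maxScannerResultSize < Long.MAX_VALUE && currentScanResultSize >= maxResultSize) {
                break;
            }
            currentScanResultSize += rowHeapSizes.get(from + i);
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<Long> rows = List.of(112L, 112L, 112L, 112L);
        long serverDefault = 2L * 1024 * 1024; // hypothetical server setting
        System.out.println(rowsInOneCall(rows, 0, 9, 5, serverDefault));
        System.out.println(rowsInOneCall(rows, 0, 9, 300, serverDefault));
        System.out.println(rowsInOneCall(rows, 0, 9, -1, serverDefault));
    }
}
```

With four 112-byte rows and caching 9, a maxResultSize of 5 yields one row, 300 yields three rows (336 bytes, past the limit by one row), and an unset value yields all four. Note that, as the excerpt is written, the size check only fires when the server's maxScannerResultSize is below Long.MAX_VALUE.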
Look at the condition "if ((maxScannerResultSize < Long.MAX_VALUE) && (currentScanResultSize >= maxResultSize))". Its key part is "currentScanResultSize >= maxResultSize". currentScanResultSize is the running total of kv.heapSize() over every KeyValue in the rows collected so far in the current call. maxResultSize is the value set on the Scan object; if the Scan does not set it, it defaults to the region server's maxScannerResultSize. While HRegionServer scans, it compares the byte count of the data collected so far against this value and stops collecting rows once the total reaches it. So if maxResultSize is small, rows that a single call could otherwise return (up to the caching value) are split across several calls. maxResultSize does not change the result of the query; it only changes how many remote calls the scan makes. This may be a bit abstract, so here is an example:
My HBase table contains, among others, the rows row-10, row-11, ..., row-19, row-20, row-21, ..., row-29, ..., row-91, row-92, ..., row-99.
Each of the three queries below returns the same result:
```
keyvalues={row-10/colfam1:col-5/1423054405356/Put/vlen=8/mvcc=0, row-10/colfam2:col-33/1423054405467/Put/vlen=9/mvcc=0}
keyvalues={row-100/colfam1:col-5/1423054437916/Put/vlen=9/mvcc=0, row-100/colfam2:col-33/1423054437979/Put/vlen=10/mvcc=0}
keyvalues={row-11/colfam1:col-5/1423054405753/Put/vlen=8/mvcc=0, row-11/colfam2:col-33/1423054405869/Put/vlen=9/mvcc=0}
keyvalues={row-12/colfam1:col-5/1423054406160/Put/vlen=8/mvcc=0, row-12/colfam2:col-33/1423054406268/Put/vlen=9/mvcc=0}
keyvalues={row-13/colfam1:col-5/1423054406541/Put/vlen=8/mvcc=0, row-13/colfam2:col-33/1423054406646/Put/vlen=9/mvcc=0}
keyvalues={row-14/colfam1:col-5/1423054406937/Put/vlen=8/mvcc=0, row-14/colfam2:col-33/1423054407028/Put/vlen=9/mvcc=0}
keyvalues={row-15/colfam1:col-5/1423054407305/Put/vlen=8/mvcc=0, row-15/colfam2:col-33/1423054407424/Put/vlen=9/mvcc=0}
keyvalues={row-16/colfam1:col-5/1423054407715/Put/vlen=8/mvcc=0, row-16/colfam2:col-33/1423054407813/Put/vlen=9/mvcc=0}
keyvalues={row-17/colfam1:col-5/1423054408084/Put/vlen=8/mvcc=0, row-17/colfam2:col-33/1423054408198/Put/vlen=9/mvcc=0}
keyvalues={row-18/colfam1:col-5/1423054408490/Put/vlen=8/mvcc=0, row-18/colfam2:col-33/1423054408598/Put/vlen=9/mvcc=0}
keyvalues={row-19/colfam1:col-5/1423054408895/Put/vlen=8/mvcc=0, row-19/colfam2:col-33/1423054409007/Put/vlen=9/mvcc=0}
keyvalues={row-2/colfam1:col-5/1423054402056/Put/vlen=7/mvcc=0, row-2/colfam2:col-33/1423054402181/Put/vlen=8/mvcc=0}
```
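The result contains row-100 and row-2 but not row-20 because HBase orders row keys as raw bytes: "row-100" sorts right after "row-10", and "row-2" sorts just before "row-20". The sketch below reproduces that ordering with a TreeSet of Strings; for plain ASCII keys like these, String.compareTo gives the same order as a byte-wise comparison. It assumes (hypothetically, since the article does not list every key) that the table holds row-1 through row-100, and it treats the start row as inclusive and the stop row as exclusive, as Scan's setStartRow/setStopRow do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration of lexicographic row-key ordering in a scan range.
public class RowKeyRange {

    // Returns the keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> scanRange(NavigableSet<String> keys, String startRow, String stopRow) {
        return new ArrayList<>(keys.subSet(startRow, true, stopRow, false));
    }

    public static void main(String[] args) {
        // Assumed table contents: row-1 .. row-100 (not stated in the article).
        NavigableSet<String> keys = new TreeSet<>();
        for (int n = 1; n <= 100; n++) {
            keys.add("row-" + n);
        }
        List<String> hits = scanRange(keys, "row-10", "row-20");
        System.out.println(hits.size() + " " + hits);
    }
}
```

Running it prints 12 keys in the same order as the 12 Results above: row-10, row-100, row-11 through row-19, and row-2.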
Method 1:
```java
Scan scan3 = new Scan();
scan3.setCaching(9);
scan3.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"))
     .addColumn(Bytes.toBytes("colfam2"), Bytes.toBytes("col-33"))
     .setStartRow(Bytes.toBytes("row-10"))
     .setStopRow(Bytes.toBytes("row-20"));
ResultScanner scanner3 = table.getScanner(scan3);
for (Result res : scanner3) {
  System.err.println(res);
}
scanner3.close();
```
Method 2:
```java
Scan scan3 = new Scan();
// scan3.setCaching(9);
scan3.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"))
     .addColumn(Bytes.toBytes("colfam2"), Bytes.toBytes("col-33"))
     .setStartRow(Bytes.toBytes("row-10"))
     .setStopRow(Bytes.toBytes("row-20"))
     .setMaxResultSize(5);
ResultScanner scanner3 = table.getScanner(scan3);
for (Result res : scanner3) {
  System.err.println(res);
}
scanner3.close();
```
Method 3:
```java
Scan scan3 = new Scan();
scan3.setCaching(9);
scan3.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"))
     .addColumn(Bytes.toBytes("colfam2"), Bytes.toBytes("col-33"))
     .setStartRow(Bytes.toBytes("row-10"))
     .setStopRow(Bytes.toBytes("row-20"))
     .setMaxResultSize(5);
ResultScanner scanner3 = table.getScanner(scan3);
for (Result res : scanner3) {
  System.err.println(res);
}
scanner3.close();
```
Method 1 differs from method 2 in that method 1 sets caching to 9, while method 2 leaves caching unset and instead sets maxResultSize to 5.
Method 3 is method 2 with caching also set to 9.
From this example we can draw the following conclusions:
1. If caching is not set on the Scan, the client in this example fetches only one record per remote call, so querying the rows from row-10 to row-20 takes at least 12 remote calls to HRegionServer, one per returned row.
2. maxResultSize applies to a single remote call from the client. If each call fetches only one row anyway, the setting has no effect; it only acts as a limit when rows are fetched in batches, that is, after caching is set on the Scan. In this example the Scan sets caching to 9 and maxResultSize to 5, and we know in advance that each row is 112 bytes. Given the check in HRegionServer's scan method, even with caching set to 9 a remote call can fetch only one record: after the first row, currentScanResultSize is already 112, so the condition "(maxScannerResultSize < Long.MAX_VALUE) && (currentScanResultSize >= maxResultSize)" is true and the loop breaks. With caching 9 alone, all 12 records could ideally be fetched in two remote calls, but because maxResultSize is 5 bytes, each call returns only one record.
3. Methods 2 and 3 therefore behave exactly the same. Method 1, by contrast, needs only three remote calls from the client to fetch all the data.
4. What maxResultSize means: it limits the total number of bytes the client fetches from HRegionServer in each call, where the total is the sum of the sizes of the KeyValues in the returned rows. Because the check runs before each row is read, the limit is soft: the row that crosses it is still returned.
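To tie these points together, the sketch below replays the example: 12 rows of 112 bytes each, scanned with the settings of the three methods. It is a hypothetical model, not HBase's client or server code. It counts only calls that return at least one row, assumes an effective caching of 1 when caching is not set (method 2), and assumes a finite server-side maxScannerResultSize (2 MB here, an arbitrary value) so that the size check is active. The real client may issue one more call that returns no rows before closing the scanner, which could account for the three calls reported for method 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical end-to-end model of the article's example: how many rows each
// scan RPC returns for a given caching / maxResultSize combination.
public class ScanRpcCount {

    static final long SERVER_MAX = 2L * 1024 * 1024; // assumed maxScannerResultSize

    // Returns the number of rows returned by each RPC that returns data.
    static List<Integer> batches(int totalRows, long rowBytes, int caching, long scanMaxResultSize) {
        // Assumption: a Scan value <= 0 means "unset", so the server value applies.
        long maxResultSize = scanMaxResultSize > 0 ? scanMaxResultSize : SERVER_MAX;
        List<Integer> result = new ArrayList<>();
        int next = 0;
        while (next < totalRows) {
            long currentScanResultSize = 0; // reset at the start of every RPC
            int i = 0;
            // Mirrors the server loop: size check first, then read one row.
            while (i < caching && next < totalRows) {
                if (SERVER_MAX < Long.MAX_VALUE && currentScanResultSize >= maxResultSize) {
                    break;
                }
                currentScanResultSize += rowBytes;
                i++;
                next++;
            }
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("method 1: " + batches(12, 112, 9, -1)); // caching 9, no maxResultSize
        System.out.println("method 2: " + batches(12, 112, 1, 5));  // caching assumed 1, maxResultSize 5
        System.out.println("method 3: " + batches(12, 112, 9, 5));  // caching 9, maxResultSize 5
    }
}
```

Methods 2 and 3 both come out as twelve single-row calls, while method 1 needs two calls that return data (9 rows, then 3), in line with points 2 and 3.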