How to understand the HBase Scan process

2025-02-28 Update From: SLTechnology News&Howtos

This article explains how to understand the HBase Scan process.

The HBase read path looks complicated, mainly for the following reasons:

HBase table data is organized in several levels: HRegion -> HStore -> [HFile, HFile, ..., MemStore]

The RegionServer's LSM-like storage engine keeps flushing, which produces new HFiles and a new MemStore for subsequent writes. To keep Scan performance from degrading when there are too many HFiles to scan, background threads compact HFiles from time to time, generating a new HFile and deleting the HFiles that have been compacted.

Various optimizations in the implementation, such as lazy seek, add to the code complexity.

The read path involves many kinds of Scanner, as shown in the following figure:

[Figure: scanner hierarchy. A RegionScanner at the top fans out to several StoreScanners; a StoreScanner fans out to its StoreFileScanners and a MemStoreScanner; HFileScanners form the bottom level.]

In HBase, a table can have multiple column families. During a Scan, one StoreScanner object is responsible for reading the data of each column family (called a Store below). The data of each Store consists of an in-memory MemStore and HFiles on disk. Accordingly, a StoreScanner uses one MemStoreScanner and N StoreFileScanners to actually read the data.

Logically, reading a row of data requires:

Reading each Store in turn

For each Store, merging the relevant HFiles under the Store with the in-memory MemStore

In the implementation, both steps are done with a heap. RegionScanner reads through a heap made up of the StoreScanners below it, which is held in the RegionScanner member variable KeyValueHeap storeHeap.
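The heap-based merge described above can be illustrated with a minimal sketch. It assumes each scanner yields keys in sorted order and uses plain strings as keys. The names SortedScanner and HeapMergeSketch are hypothetical and are not HBase classes; the real KeyValueHeap works on KeyValueScanner objects and is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class HeapMergeSketch {
    /** Hypothetical stand-in for a scanner: exposes its current key and can advance. */
    static final class SortedScanner {
        private final Iterator<String> it;
        private String current;

        SortedScanner(List<String> sortedKeys) {
            this.it = sortedKeys.iterator();
            this.current = it.hasNext() ? it.next() : null;
        }

        /** Returns the current key without advancing, or null when exhausted. */
        String peek() {
            return current;
        }

        /** Returns the current key and advances to the next one. */
        String next() {
            String result = current;
            current = it.hasNext() ? it.next() : null;
            return result;
        }
    }

    /** Merges several sorted scanners into one sorted list using a min-heap. */
    static List<String> merge(List<SortedScanner> scanners) {
        // The scanner whose current key is smallest sits at the top of the heap.
        PriorityQueue<SortedScanner> heap =
            new PriorityQueue<>(Comparator.comparing(SortedScanner::peek));
        for (SortedScanner s : scanners) {
            if (s.peek() != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Remove the top scanner before advancing it, then re-insert it
            // so the heap order reflects its new current key.
            SortedScanner top = heap.poll();
            out.add(top.next());
            if (top.peek() != null) {
                heap.add(top);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SortedScanner> scanners = Arrays.asList(
            new SortedScanner(Arrays.asList("aaa", "ccc")),
            new SortedScanner(Arrays.asList("bbb", "ddd")),
            new SortedScanner(Arrays.asList("abc")));
        System.out.println(merge(scanners)); // prints [aaa, abc, bbb, ccc, ddd]
    }
}
```

The same pattern applies at both levels: a heap of file and memory scanners inside one Store, and a heap of Store-level scanners inside the region.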

The StoreScanners that make up this heap are obtained in the RegionScannerImpl constructor:

    for (Map.Entry<byte[], NavigableSet<byte[]>> entry : scan.getFamilyMap().entrySet()) {
        Store store = stores.get(entry.getKey());
        // actually of type StoreScanner
        KeyValueScanner scanner = store.getScanner(scan, entry.getValue(), this.readPt);
        if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
                || this.filter.isFamilyEssential(entry.getKey())) {
            scanners.add(scanner);
        } else {
            joinedScanners.add(scanner);
        }
    }

Internally, store.getScanner(scan, entry.getValue(), this.readPt) simply creates a new StoreScanner; all of the logic is in the StoreScanner constructor.

The constructor locates the relevant HFiles and the MemStore and then builds a heap. Note that this heap is at the StoreScanner level: there is one heap per StoreScanner, and its elements are the StoreFileScanners and the MemStoreScanner that correspond to the HFiles and MemStore under that Store.

The logic for finding the relevant HFiles and MemStore is in StoreScanner::getScannersNoCompaction(). It filters out unneeded HFiles according to the TimeRange and KeyRange specified by the request, and also uses the bloom filter to filter out unneeded HFiles. Next, it calls

    seekScanners(scanners, matcher.getStartKey(), explicitColumnQuery && lazySeekEnabledGlobally, isParallelSeekEnabled);

to seek these StoreFileScanners and the MemStoreScanner. The seek key is matcher.getStartKey(), which is constructed as follows:

    return new KeyValue(row, family, null, HConstants.LATEST_TIMESTAMP, Type.DeleteFamily);

Seek semantics

Seek operates on KeyValues. Its semantics are: seek to the specified KeyValue, and if the specified KeyValue does not exist, seek to the next KeyValue after it. For example, suppose a column family named X has two columns, a and b, and the file contains two rows with rowkeys aaa and bbb, as shown in the following table.

Column Family X

Rowkey    column a    column b
aaa       1           abc
bbb       2           def

If the HBase client sets the start key of the scan request to aaa, then matcher.getStartKey() is initialized to (rowkey, family, qualifier, timestamp, type) = (aaa, X, null, LATEST_TIMESTAMP, Type.DeleteFamily). By the KeyValue comparison rules, this KeyValue is smaller than the KeyValue for column a of row aaa (because it has no qualifier), so seeking this StoreFileScanner positions it at column a of row aaa.
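The seek semantics in this example can be sketched with a sorted set. This is a simplified model, not HBase code: the Key class, the ORDER comparator, and sampleFile() are hypothetical, the key type is omitted, and an empty string stands in for the null qualifier. The comparator orders by row, family, and qualifier, then by newest timestamp first.

```java
import java.util.Comparator;
import java.util.TreeSet;

class SeekSemanticsSketch {
    /** Simplified, hypothetical key; the real KeyValue also carries a type and a value. */
    static final class Key {
        final String row;
        final String family;
        final String qualifier; // "" stands in for a null qualifier
        final long timestamp;

        Key(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier;
        }
    }

    /** Orders by row, family, qualifier (empty sorts first), then newest timestamp first. */
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    /** Seek: the exact key if present, otherwise the next larger key, otherwise null. */
    static Key seek(TreeSet<Key> file, Key target) {
        return file.ceiling(target);
    }

    /** Builds the two-row, two-column example from the table above. */
    static TreeSet<Key> sampleFile() {
        TreeSet<Key> file = new TreeSet<>(ORDER);
        file.add(new Key("aaa", "X", "a", 1L));
        file.add(new Key("aaa", "X", "b", 1L));
        file.add(new Key("bbb", "X", "a", 1L));
        file.add(new Key("bbb", "X", "b", 1L));
        return file;
    }

    public static void main(String[] args) {
        TreeSet<Key> file = sampleFile();
        // Start key for row aaa: no qualifier, latest possible timestamp.
        Key start = new Key("aaa", "X", "", Long.MAX_VALUE);
        System.out.println(seek(file, start)); // prints aaa/X:a
        // A row that does not exist seeks to the next existing key.
        System.out.println(seek(file, new Key("aab", "X", "", Long.MAX_VALUE))); // prints bbb/X:a
    }
}
```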

In fact, the call

    seekScanners(scanners, matcher.getStartKey(), explicitColumnQuery && lazySeekEnabledGlobally, isParallelSeekEnabled);

may not actually seek the StoreFileScanners. Instead it can perform a lazy seek, deferring the real seek until it can no longer be avoided. Lazy seek is discussed separately later.
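The deferral idea behind lazy seek can be sketched as follows. This only illustrates "record the request, do the real seek when the value is first needed"; it is an assumption-laden model rather than HBase's implementation, and LazyScanner, requestSeek, and realSeeks are hypothetical names.

```java
import java.util.Arrays;
import java.util.TreeSet;

class LazySeekSketch {
    /** Hypothetical scanner that postpones the real seek until peek() is called. */
    static final class LazyScanner {
        private final TreeSet<String> keys;
        private String pendingSeekKey; // requested but not yet performed
        private String current;
        int realSeeks = 0;             // counts seeks that actually touched the data

        LazyScanner(TreeSet<String> keys) {
            this.keys = keys;
        }

        /** Records the seek target without touching the underlying data. */
        void requestSeek(String key) {
            pendingSeekKey = key;
        }

        /** Performs any pending seek, then returns the current key (or null). */
        String peek() {
            if (pendingSeekKey != null) {
                current = keys.ceiling(pendingSeekKey);
                pendingSeekKey = null;
                realSeeks++;
            }
            return current;
        }
    }

    public static void main(String[] args) {
        LazyScanner scanner = new LazyScanner(new TreeSet<>(Arrays.asList("aaa", "bbb")));
        scanner.requestSeek("aaa");
        scanner.requestSeek("abc");            // supersedes the earlier request
        System.out.println(scanner.realSeeks); // prints 0
        System.out.println(scanner.peek());    // prints bbb
        System.out.println(scanner.realSeeks); // prints 1
    }
}
```

In this model, two seek requests cost only one real seek, which is the kind of saving a deferred seek aims for.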

The steps above produce the StoreScanners for all of the column families involved in the scan request. The following function is then called to build the heap:

    protected void initializeKVHeap(List<KeyValueScanner> scanners,
            List<KeyValueScanner> joinedScanners, HRegion region) throws IOException {
        this.storeHeap = new KeyValueHeap(scanners, region.comparator);
        if (!joinedScanners.isEmpty()) {
            this.joinedHeap = new KeyValueHeap(joinedScanners, region.comparator);
        }
    }

KeyValueScanner is an interface that represents a Scanner from which KeyValues can be iterated. StoreFileScanner, MemStoreScanner, and StoreScanner all implement this interface. The comparator used here is of type KVScannerComparator, which compares two KeyValueScanners; internally it uses KVComparator, which compares two KeyValues. As will become clear later, the defining property of this heap of KeyValueScanners is that the KeyValueScanner at the top of the heap is the one whose current KeyValue is the smallest.

The heap is represented by the class KeyValueHeap. Here is what the KeyValueHeap constructor does:

KeyValueHeap (List
