Shulou(Shulou.com)06/03 Report--
Thursday, February 28, 2019
HBase read/write requests
A detailed explanation of the HBase read and write process
Read request process
1. The client locates the RegionServer hosting the target data by querying ZooKeeper and the -ROOT- and .META. tables (that is, it obtains the host address of the Region containing the data).
2. ZooKeeper returns the result to the client.
3. The client contacts that RegionServer to query the target data.
4. The RegionServer locates the Region containing the target data and issues the query to it.
5. The Region searches the MemStore first; on a hit, it returns immediately.
6. If the data is not found in the MemStore, the Region scans the StoreFiles, applying Bloom filters to quickly determine whether the queried data could be in each StoreFile.
Bloom filter: quickly determines whether an element is in a large set, with one weakness: a certain false-positive rate.
(False positives: a Bloom filter may report that an element is in the set when it actually is not; however, if it reports that an element is not in the set, the element is definitely absent.)
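The no-false-negatives property described above is what lets HBase safely skip a StoreFile when the filter says "absent". A minimal, illustrative Bloom filter sketch (not HBase's implementation; sizes and hash scheme are arbitrary):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item: bytes):
        # Derive num_hashes independent bit positions from SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: bytes) -> bool:
        # False here is definitive: the item was never added.
        # True only means "possibly present" (false positives can occur).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"row-001")
print(bf.might_contain(b"row-001"))  # True (an added item is never missed)
```

HBase stores such filters per StoreFile, so a definite "absent" answer avoids the disk read entirely.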
BlockCache
BlockCache is HBase's read cache.
HBase caches the Blocks read during a file lookup, so that subsequent requests for the same or adjacent data can be served directly from memory, avoiding expensive I/O operations.
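Putting steps 5 and 6 together, the Region-side read path can be sketched with a toy in-memory model (all structures here are hypothetical stand-ins: a dict for the MemStore, and (bloom_set, data_dict) pairs for StoreFiles with their Bloom filters):

```python
def read_row(rowkey, memstore, storefiles):
    """Sketch of the read path: MemStore first, then Bloom-guarded StoreFiles."""
    # Step 5: check the MemStore first; a hit returns immediately.
    if rowkey in memstore:
        return memstore[rowkey]
    # Step 6: otherwise scan StoreFiles, skipping any file whose Bloom
    # filter says the key definitely is not there.
    for bloom_set, data in storefiles:
        if rowkey not in bloom_set:  # definite miss: skip the file "I/O"
            continue
        if rowkey in data:           # possible hit: actually read the file
            return data[rowkey]
    return None

memstore = {"r3": "v3-new"}
storefiles = [
    ({"r1"}, {"r1": "v1"}),
    ({"r2", "r3"}, {"r2": "v2", "r3": "v3-old"}),
]
print(read_row("r3", memstore, storefiles))  # v3-new (MemStore wins)
print(read_row("r1", memstore, storefiles))  # v1
```

Note how the newer MemStore value for "r3" shadows the older on-disk value, matching the lookup order described above.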
This section refers to the link: blog.51cto.com/12445535/2363376
HBase write requests
// For an in-depth analysis of HBase write requests, see: https://blog.51cto.com/12445535/2370653
Write request processing summary
1. The client submits a write request to the RegionServer.
2. The RegionServer finds the target Region.
3. The Region checks whether the data is consistent with the schema.
4. If the client does not specify a version, the current system time is used as the data version.
5. The update is written to the WAL log.
6. The update is written to the MemStore.
7. HBase determines whether the MemStore needs to be flushed to a StoreFile.
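Steps 4 through 7 can be sketched as a toy Region model (all names and the cell-count flush threshold are illustrative; real HBase flushes on MemStore size in bytes):

```python
import time

class RegionSketch:
    """Toy model of steps 4-7: WAL append, MemStore update, flush check."""

    FLUSH_THRESHOLD = 3  # flush after this many cells (illustrative only)

    def __init__(self):
        self.wal = []        # write-ahead log (step 5)
        self.memstore = {}   # sorted in real HBase; a plain dict here
        self.storefiles = []

    def put(self, rowkey, value, version=None):
        if version is None:                  # step 4: default to system time
            version = int(time.time() * 1000)
        self.wal.append((rowkey, value, version))  # step 5: WAL first
        self.memstore[rowkey] = (value, version)   # step 6: then MemStore
        if len(self.memstore) >= self.FLUSH_THRESHOLD:  # step 7: flush?
            self.flush()

    def flush(self):
        # MemStore contents are written out in sorted order as a "StoreFile".
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

region = RegionSketch()
for i in range(4):
    region.put(f"row-{i}", f"value-{i}")
print(len(region.storefiles), len(region.memstore))  # 1 1
```

Writing the WAL before the MemStore is the key ordering: it is what makes the crash recovery described later possible.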
When HBase performs a data insert, it first finds the Region corresponding to the RowKey. How? This is easy, because the .META. table stores the starting RowKey of every Region in every table.
Suggestion: for bulk insert operations, avoid put operations with monotonically increasing RowKeys.
If all RowKeys of the put operations are increasing, then imagine that partway through the insert a split happens: all data after the split is inserted into the second Region, producing a data hotspot.
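One common mitigation is "salting": prefixing each RowKey with a hash-derived bucket so that sequential keys spread across Regions. A minimal sketch (the bucket count and key format are assumptions; in practice you would match the bucket count to your pre-split Regions):

```python
import hashlib

NUM_BUCKETS = 8  # illustrative; align with the number of pre-split Regions

def salted_rowkey(rowkey: str) -> str:
    """Prefix a deterministic bucket so monotonically increasing keys
    spread across Regions instead of all landing in the last one."""
    digest = hashlib.md5(rowkey.encode()).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    return f"{bucket:02d}-{rowkey}"

# Sequential timestamps now scatter across buckets 00..07:
print(salted_rowkey("20190228000001"))
print(salted_rowkey("20190228000002"))
```

The trade-off is that range scans over the original key order now require one scan per bucket.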
Write request process (detailed description)
As mentioned above, HBase uses the MemStore and StoreFiles to store updates to tables.
1. During an update (write), data is first written to the log (WAL) and to memory (MemStore). The data in the MemStore is sorted. When the MemStore accumulates to a certain threshold, a new MemStore is created and the old one is added to a flush queue, from which a separate thread flushes it to disk as a StoreFile.
2. At the same time, the system records a redo point in ZooKeeper, indicating that the changes before this point have been persisted. (minor compaction)
3. If the system crashes, data in memory (the MemStore) may be lost; the log (WAL) is then used to recover the data written after the checkpoint.
4. As mentioned earlier, a StoreFile is read-only and cannot be modified once created, so HBase updates are really append operations.
5. When the number of StoreFiles in a Store reaches a certain threshold, a major compaction merges the modifications to the same key together into one large StoreFile. When a StoreFile's size reaches a certain threshold, it is split into two StoreFiles.
6. Because updates to a table are continuously appended, serving a read request requires accessing all StoreFiles and the MemStore in the Store and merging them by row key. Since StoreFiles and the MemStore are both sorted, and StoreFiles carry in-memory indexes, the merge is still relatively fast.
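Because every source in step 6 is already sorted by row key, the merge is a streaming k-way merge; Python's `heapq.merge` illustrates the idea (the data here is made up, and the "newest source wins" tie-break relies on listing newer sources first):

```python
import heapq

# Each source is already sorted by row key, as the text notes.
memstore   = [("r1", "v1-new"), ("r4", "v4")]
storefile1 = [("r1", "v1-old"), ("r2", "v2")]
storefile2 = [("r3", "v3")]

# heapq.merge streams the sources in key order; for equal keys it yields
# the entry from the earlier iterable first, so list newer sources first.
merged = heapq.merge(memstore, storefile1, storefile2, key=lambda kv: kv[0])

# Keep only the first (newest) entry seen for each row key.
result = {}
for rowkey, value in merged:
    result.setdefault(rowkey, value)
print(result)  # {'r1': 'v1-new', 'r2': 'v2', 'r3': 'v3', 'r4': 'v4'}
```

Real HBase uses scanner heaps over HFile iterators, but the sorted-merge principle is the same.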
Tip:
Client write -> stored in the MemStore until it fills -> flushed into a StoreFile -> StoreFiles accumulate to a certain threshold -> a Compact merge is triggered, combining multiple StoreFiles into one while merging versions and deleting data -> as compactions run, StoreFiles gradually grow larger -> when a single StoreFile exceeds a certain threshold, a Split is triggered: the current Region is divided into two, the original Region goes offline, and the HMaster assigns the two newly split child Regions to the appropriate HRegionServers, so that the load of the original Region is spread across two Regions. This process shows that HBase only ever appends data; updates and deletes are actually carried out in the Compact phase. User write operations therefore only need to reach memory before returning, which ensures high I/O performance.
Supplement on the data write process:
Working mechanism: every HRegionServer has an HLog object; HLog is a class implementing the Write-Ahead Log. Every time a user writes to the MemStore, a copy of the data is also written to the HLog file. The HLog file periodically rolls over to a new file and deletes old files (data already persisted to StoreFiles). When an HRegionServer terminates unexpectedly, the HMaster senses it through ZooKeeper. The HMaster first processes the leftover HLog file, splitting its log data by Region into the corresponding Region directories, and then redistributes the orphaned Regions (along with their just-split logs). An HRegionServer that receives one of these Regions discovers, while loading it, that there is a historical HLog to process, so it replays the HLog data into the MemStore and then flushes to StoreFiles to complete the data recovery.
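The split-then-replay mechanism above can be sketched with hypothetical WAL entries (tuples and region names here are invented for illustration; real HLog entries carry sequence IDs and much more):

```python
from collections import defaultdict

# Hypothetical WAL entries left by a crashed RegionServer,
# appended in arrival order: (region, rowkey, value).
wal_entries = [
    ("region-A", "r1", "v1"),
    ("region-B", "r9", "v9"),
    ("region-A", "r2", "v2"),
]

# Log splitting: group the single HLog by Region, as the HMaster does.
per_region_logs = defaultdict(list)
for region, rowkey, value in wal_entries:
    per_region_logs[region].append((rowkey, value))

# Replay: the RegionServer that picks up each Region rebuilds its MemStore
# by applying the log in order, so later writes overwrite earlier ones.
def replay(log):
    memstore = {}
    for rowkey, value in log:
        memstore[rowkey] = value
    return memstore

print(replay(per_region_logs["region-A"]))  # {'r1': 'v1', 'r2': 'v2'}
```

After replay, flushing the rebuilt MemStore to StoreFiles completes the recovery, exactly as the paragraph above describes.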
Reference link: www.cnblogs.com/qingyunzong/p/8692430.html
http://hbasefly.com/2016/03/23/hbase_writer/