The principle of HBase data storage and the process of reading and writing data

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. The data storage principle of HBase.

An HRegionServer manages many regions. Each region contains multiple stores, and each column family corresponds to one store: if a table has only one column family, each region has only one store; if a table has N column families, each region has N stores. Each store has exactly one memstore, which is an in-memory area. Written data is first buffered in the memstore and later flushed to disk.

A store contains many StoreFiles, and the data is ultimately saved on HDFS as many files in the HFile format.

StoreFile is an abstraction over HFile, so a StoreFile can be thought of as an HFile. Every time the memstore flushes data to disk, a new HFile is generated.
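
The hierarchy described above can be sketched as a toy in-memory model. This is only an illustration, not HBase's implementation: the class names `Store` and `Region`, the `flush_threshold` parameter (counted in cells here, whereas a real cluster measures memstore size in bytes), and the use of plain dicts for the memstore and HFiles are all assumptions made for the example.

```python
class Store:
    """One store per column family: a single memstore plus a list of HFiles."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}   # in-memory write buffer (hypothetical model)
        self.hfiles = []     # each flush appends one new immutable "HFile"
        self.flush_threshold = flush_threshold  # illustrative value, in cells

    def put(self, rowkey, value):
        # Writes are buffered in memory first.
        self.memstore[rowkey] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            # Every flush produces a new file, sorted by rowkey.
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}


class Region:
    """A region holds one store for each column family of the table."""

    def __init__(self, column_families, flush_threshold=2):
        self.stores = {cf: Store(flush_threshold) for cf in column_families}

    def put(self, rowkey, family, value):
        self.stores[family].put(rowkey, value)


region = Region(["info", "stats"])
region.put("row1", "info", "a")
region.put("row2", "info", "b")   # second cell reaches the threshold and flushes
region.put("row1", "stats", "x")  # still buffered in the "stats" memstore

print(len(region.stores))                 # stores in the region
print(len(region.stores["info"].hfiles))  # HFiles produced by the "info" store
print(region.stores["stats"].memstore)    # data still held in memory
```

Flushing each store independently keeps the sketch short; it is a simplification and should not be read as a description of HBase's actual flush policy.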

2. HBase read data flow

Note: An HBase cluster has only one meta table. This table has only one region, and that region's data is stored on a single HRegionServer.

1. The client first connects to ZooKeeper (zk) and finds the region location of the meta table, that is, which HRegionServer holds the meta table's data. The client then connects to that HRegionServer and reads the meta table, which stores the region information of all user tables. The contents of the meta table can be viewed with scan 'hbase:meta'.

2. According to the namespace, table name and rowkey to be queried, find the region information corresponding to the data.

3. Find the RegionServer that hosts the region, then send the request.

4. Find and locate the corresponding region.

5. First look up the data in the memstore; if it is not there, read from the BlockCache. The memory of a RegionServer is divided into two parts: one part is used as memstore, mainly for writing; the other part, the BlockCache, is mainly used for reading.

6. If the data is not found in the BlockCache, read it from the StoreFile. The result is not returned to the client directly: it is first written to the BlockCache to speed up subsequent queries, and then returned to the client.

3. HBase write data flow
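
Before turning to the write steps, the read lookup from section 2 can be sketched in a few lines. This is a simplified model under stated assumptions, not the HBase client or server code: `RegionServer`, `locate`, and the `meta` list are hypothetical names, regions are assumed to be identified by a sorted start key, and the cache stores whole values rather than file blocks.

```python
import bisect


class RegionServer:
    """Toy region server holding the three places a read can be served from."""

    def __init__(self, name):
        self.name = name
        self.memstore = {}     # recent writes not yet flushed
        self.block_cache = {}  # data cached from earlier StoreFile reads
        self.store_files = []  # flushed files, oldest first

    def get(self, rowkey):
        # Step 5: recent writes live in the memstore.
        if rowkey in self.memstore:
            return self.memstore[rowkey], "memstore"
        # Step 5: otherwise try the read cache.
        if rowkey in self.block_cache:
            return self.block_cache[rowkey], "blockcache"
        # Step 6: fall back to StoreFiles, newest first, and cache the hit
        # before returning it.
        for store_file in reversed(self.store_files):
            if rowkey in store_file:
                self.block_cache[rowkey] = store_file[rowkey]
                return store_file[rowkey], "storefile"
        return None, "miss"


def locate(meta, rowkey):
    """Pick the region whose start key is the largest one <= rowkey.

    `meta` is a list of (start_key, server) pairs sorted by start key,
    standing in for the rows of hbase:meta; the first start key is "".
    """
    start_keys = [start for start, _ in meta]
    index = bisect.bisect_right(start_keys, rowkey) - 1
    return meta[index][1]


rs1, rs2 = RegionServer("rs1"), RegionServer("rs2")
meta = [("", rs1), ("m", rs2)]

rs1.memstore["apple"] = "v2"             # unflushed write
rs2.store_files.append({"pear": "v1"})   # data already on disk

server = locate(meta, "pear")
first = server.get("pear")    # served from the StoreFile, then cached
second = server.get("pear")   # served from the BlockCache
recent = locate(meta, "apple").get("apple")
print(server.name, first, second, recent)
```

The second read of the same rowkey never touches the StoreFile, which is the point of writing the result to the cache before returning it.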

1. The client first finds the region location of the meta table from zk, and then reads the data in the meta table. The region information of the user table is stored in the meta table.

2. According to the namespace, table name and rowkey, find the region information corresponding to the data to be written.

3. Find the regionServer corresponding to the region, and send the request

4. Write the data to HLog (write ahead log) and memstore respectively

5. When the memstore reaches its threshold, the data is flushed to disk, generating StoreFile files.

6. Delete the historical HLog data.

Supplement: HLog (write-ahead log), also known as WAL, is similar to the binlog in MySQL and is used for disaster recovery. HLog records all changes to the data, so once data has been modified, the change can be recovered from the log.
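
Steps 4 to 6 and the recovery role of HLog can be illustrated with a small simulation. It is a sketch under assumptions rather than HBase's real write path: `WriteAheadRegionServer`, `crash`, and `recover` are hypothetical names, the log is a Python list instead of a file on HDFS, and the flush threshold is counted in cells.

```python
class WriteAheadRegionServer:
    """Toy write path: log first, buffer in memory, flush at a threshold."""

    def __init__(self, flush_threshold=3):
        self.hlog = []         # append-only write-ahead log (survives a crash)
        self.memstore = {}     # in-memory buffer (lost on a crash)
        self.store_files = []  # flushed files (survive a crash)
        self.flush_threshold = flush_threshold  # illustrative value, in cells

    def put(self, rowkey, value):
        self.hlog.append((rowkey, value))  # step 4: record the change in the log
        self.memstore[rowkey] = value      # step 4: buffer it in the memstore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Step 5: write the buffered data out as a new StoreFile.
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        # Step 6: the flushed edits no longer need to be kept in the log.
        self.hlog = []

    def crash(self):
        # Simulate losing everything that was only in memory.
        self.memstore = {}

    def recover(self):
        # Replay the log in order to rebuild the memstore.
        for rowkey, value in self.hlog:
            self.memstore[rowkey] = value


server = WriteAheadRegionServer(flush_threshold=3)
server.put("r1", "a")
server.put("r2", "b")

server.crash()
server.recover()
recovered = dict(server.memstore)  # unflushed writes restored from the log

server.put("r3", "c")              # third cell triggers a flush
print(recovered, server.store_files, server.hlog)
```

Appending to the log before touching the memstore is what makes the replay in `recover` sufficient: any write that reached the memstore is guaranteed to be in the log as well.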
