2025-01-18 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report--
1. The underlying logical architecture of HBase
(1) Architectural differences between new and old versions of HBase
This is the architecture diagram of the old version of HBase: each regionserver has only one HLog.
This is the diagram of the new version: each regionserver can have multiple HLogs.
Changes between the old and new versions:
- Before 0.96, a regionserver had only one HLog, and metadata was managed through two tables: .META. and -ROOT-.
- From 0.98 on, a regionserver can have multiple HLogs, and metadata is managed through a single table: .META.
(2) Two special tables in HBase
- .META.: records the region mapping information for all user tables (the rowkey range of each region and the node it lives on). .META. itself can have multiple regions. Each region split out of a user table corresponds to one record in the .META. table.
- -ROOT-: records the region information of the .META. table. -ROOT- has only one region and never splits. Likewise, each region split out of the .META. table corresponds to one record in the -ROOT- table.
(3) HBase architecture roles and responsibilities
1) Client
- Before accessing a user table, the Client visits zookeeper to find the location of the -ROOT- table's region, then reads -ROOT- to find the location of the .META. table, then reads .META. to find the location of the user data, and finally accesses the regionserver that holds the data. This takes multiple network round trips, so the client caches what it learns: if a later query touches a location already looked up, it is served from the cache without repeating these steps.
2) zookeeper
- zookeeper provides failover for HBase by electing the active master, avoiding a single point of failure. A master outage has little short-term impact on the cluster: with the master down, the cluster can still serve reads and writes, but tables cannot be created or modified. The master cannot stay down for long, though, because it is needed for tasks such as region assignment and load balancing.
- stores the addressing entry point for all regions: which server the -ROOT- table is on, i.e. the location of the -ROOT- table
- monitors regionserver status in real time and notifies the master when a regionserver comes online or goes offline
- stores the schema of HBase (which tables exist and which column families each table has); by default, /hbase/table is the zookeeper directory where HBase table names are stored
3) Master
- assigns regions to regionservers (and does load balancing)
- detects failed regionservers and reassigns the regions on them: if a regionserver goes down, the master reassigns its regions to other nodes, giving high fault tolerance
- collects garbage files left by HBase on HDFS
- handles HBase schema operations (creating, deleting, and modifying tables; adding column families; ...)
4) RegionServer
- RegionServer maintains the regions assigned to it by the Master and handles IO requests for those regions (that is, inserts, deletes, updates, and reads of table data)
- is responsible for interacting with the underlying file system, hdfs, and persisting data to hdfs
- is responsible for merging the HFiles in a Store
- is responsible for splitting regions that grow too large during operation (Split), and for running Compact (merge) operations
Split rule:
First split: 128 MB * (1 * 1)
Second split: 128 MB * (3 * 3)
Third split: 128 MB * (5 * 5), and so on, until the computed value exceeds 10 GB; from then on regions split at 10 GB.
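The split rule above can be sketched as a small calculation. This follows the document's stated rule of 128 MB * (odd number squared), capped at 10 GB; the class and method names are invented for this sketch, and real HBase split policies are pluggable and configurable:

```java
class RegionSplitPolicy {
    static final long FLUSH_SIZE_MB = 128;       // default memstore flush size, per the text
    static final long MAX_SIZE_MB = 10 * 1024;   // 10 GB cap

    // Threshold (in MB) that triggers the n-th split:
    // 128 * (2n - 1)^2, capped at 10 GB once the computed value exceeds it.
    static long splitThresholdMb(int n) {
        long odd = 2L * n - 1;                   // 1, 3, 5, ...
        long computed = FLUSH_SIZE_MB * odd * odd;
        return Math.min(computed, MAX_SIZE_MB);
    }
}
```

For example, the first three thresholds come out to 128 MB, 1152 MB, and 3200 MB; the fifth computed value (10368 MB) exceeds 10 GB, so from there on the 10 GB cap applies.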
2. The underlying physical storage of HBase
(1) The overall physical structure
The figure above shows the storage layout of one regionserver.
- All rows in a Table are sorted lexicographically by rowkey, and the table is split into different regions by rowkey range
- The default maximum region size is 10 GB. Each table starts with a single HRegion; as data is inserted, the HRegion grows, and when it reaches the threshold it splits into two new HRegions. As the table accumulates rows, the number of HRegions keeps growing
- A Region is the smallest unit of distributed storage and load balancing in HBase. Data within one region is always stored on the same node, but after a region splits, the resulting regions can be stored on different nodes
- Although a Region is the smallest unit of load balancing, it is not the smallest unit of physical storage. A region is composed of one or more Stores, one per column family, each holding all of that column family's data in the region. Each Store consists of one MemStore and zero or more StoreFiles
(2) MemStore and StoreFile
A region consists of multiple Stores, and each Store holds all the data of one column family.
Principle: when data is written, it first goes into the MemStore. When the amount of data in the MemStore reaches a threshold, the regionserver starts a flush process that writes the MemStore contents to disk as an HFile; each flush produces a separate HFile. When many HFiles accumulate on disk, they are compacted into a single larger StoreFile.
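The flush path just described can be modeled with a toy in-memory sketch. ToyStore, FLUSH_THRESHOLD, and the cell-count trigger are all invented for illustration (real HBase flushes on bytes, 128 MB by default, not on a cell count):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ToyStore {
    static final int FLUSH_THRESHOLD = 3;                      // toy trigger: 3 cells
    final TreeMap<String, String> memstore = new TreeMap<>();  // sorted by rowkey
    final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // each flush -> one new file

    void put(String rowkey, String value) {
        memstore.put(rowkey, value);               // writes land in the memstore first
        if (memstore.size() >= FLUSH_THRESHOLD) {
            hfiles.add(new TreeMap<>(memstore));   // persist memstore contents as a new file
            memstore.clear();                      // start an empty memstore
        }
    }
}
```

Each flush yields a separate file, which is why a compaction step is later needed to merge the accumulated files.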
(3) StoreFile and HFile structure
A StoreFile is saved on HDFS in HFile format. The data organization of an HFile is shown in the following figure:
An HFile is variable-length; only two sections have a fixed length: the Trailer and FileInfo.
- Trailer: holds pointers to the starting points of the other data blocks
- FileInfo: records the metadata of the file itself
- Data: stores the table data
- Meta: stores user-defined key-value pairs
- Data Index: the index of the Data blocks; the key of each index entry is the key of the first record in the indexed block
- Meta Index: the index of the Meta blocks; records the starting position of the Meta data
- Magic (in Data blocks): used as a check to detect data corruption
Apart from the fixed-length Trailer and FileInfo sections, all other sections can be compressed.
The K-V (key-value) layout inside a Data block:
It starts with two fixed-length values giving the length of the key and the length of the value. The key follows: a fixed-length number giving the rowkey length, then the rowkey, then a fixed-length value giving the column family name length, then the column family name (kept short, conventionally at most 16 bytes), then the Qualifier (column name), and finally two fixed-length values for the TimeStamp and the KeyType (Put/Delete). The value part has no such internal structure; it is plain binary data.
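The layout above can be made concrete with a small encoder. This is a sketch of the described byte layout using stdlib ByteBuffer only; the class name and the exact field widths (4-byte lengths, 2-byte rowkey length, 1-byte family length, 8-byte timestamp, 1-byte type) are assumptions for illustration, not HBase's KeyValue class itself:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Layout sketched above: [keyLen][valueLen][key][value], where
// key = [rowLen(2B)][row][familyLen(1B)][family][qualifier][timestamp(8B)][type(1B)].
class KeyValueLayout {
    static byte[] encode(String row, String family, String qualifier,
                         long timestamp, byte type, byte[] value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        int keyLen = 2 + r.length + 1 + f.length + q.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLen + value.length);
        buf.putInt(keyLen).putInt(value.length);   // two fixed-length lengths first
        buf.putShort((short) r.length).put(r);     // rowkey length + rowkey
        buf.put((byte) f.length).put(f);           // column family length + family
        buf.put(q);                                // qualifier (length is derivable)
        buf.putLong(timestamp).put(type);          // timestamp + key type (Put/Delete)
        buf.put(value);                            // value is plain binary, no structure
        return buf.array();
    }
}
```

Note how the value is appended untouched at the end: only the key carries internal structure, matching the description above.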
(4) Hlog---WAL
WAL means Write-Ahead Log and is used for disaster recovery. The HLog records all changes to the data; if data is lost before being persisted, it can be recovered from the log.
Disaster recovery explained: initially a region's data sits in the MemStore in memory; until the flush threshold is reached it has not been persisted to disk. If the machine crashes at that point, the in-memory data is lost and the hlog is needed to recover it. The hlog only keeps log entries not yet synchronized to disk; once data has been flushed, the corresponding log entries are moved to the oldWALs directory and deleted 10 minutes later.
The file structure of HLog:
- The key of the HLog Sequence File is an HLogKey object. The HLogKey records the attribution of the written data: besides the table and region names, it includes a sequence number and a timestamp. The timestamp is the write time; the sequence number starts at 0, or at the last sequence number stored in the file system.
- The value of the HLog Sequence File is an HBase KeyValue object, i.e. the same KeyValue as in the corresponding HFile
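The fields an HLog entry key carries, per the description above, can be captured in a minimal record. ToyHLogKey is an invented name for illustration, not the real HLogKey class:

```java
// One WAL entry key: which table/region the mutation belongs to,
// its sequence number, and its write time.
record ToyHLogKey(String tableName, String regionName,
                  long sequenceNumber, long writeTime) { }
```

A recovery process can replay entries in sequence-number order to rebuild a memstore that was lost in a crash.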
3. The addressing mechanism of HBase
Introduction: reads and writes happen on regionservers, and each regionserver serves some number of regions. When a client wants to read or write a row of data, which regionserver should it contact? This is answered by the addressing mechanism.
(1) The old region addressing mechanism (before 0.96)
Explanation:
- The client asks zookeeper for the address of the regionserver hosting -ROOT-.
- The client asks that regionserver for the address of the .META. table, caching the -ROOT- information for faster access next time.
- The client asks the regionserver hosting .META. for the address of the regionserver holding the target data (again with caching).
- The client contacts the regionserver holding the data and gets the corresponding data.
(2) The new region addressing mechanism (from 0.98)
- The Client asks ZooKeeper for the address of the RegionServer hosting .META.
- The Client asks that RegionServer for the address of the RegionServer holding the target data, caching the .META. information for faster access next time
- The Client contacts the RegionServer holding the data and gets the required data
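The 0.98+ flow above can be modeled as a two-step lookup: ZooKeeper points at the server hosting .META., and .META. maps rowkey ranges (keyed by region start key) to the regionserver holding the data. ToyAddressing and its fields are invented for this sketch:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class ToyAddressing {
    // Step 1: ZooKeeper knows which server hosts .META.
    String zookeeperMetaLocation = "rs-hosting-meta";
    // Step 2: .META. maps each region's start key -> the regionserver holding it.
    final NavigableMap<String, String> metaTable = new TreeMap<>();

    // Step 3: the region covering a rowkey is the one with the
    // greatest start key <= rowkey, so floorEntry finds it.
    String locate(String rowkey) {
        return metaTable.floorEntry(rowkey).getValue();
    }
}
```

The client would cache the .META. entries it reads, so repeated lookups for the same range skip both network hops.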
4. The reading and writing process of HBase
(1) Reading process:
- The client uses zookeeper, the -root- table, and the .meta. table to find the regionserver holding the target data (addressing)
- It contacts that regionserver to query the target data
- The region first looks in the memstore and returns immediately on a hit
- If the memstore misses, the storefiles are scanned; to quickly decide whether the target data could be in a given StoreFile, a BloomFilter is consulted
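The read order above, memstore first and then the store files, can be sketched as follows. ToyReader is invented for illustration, and a plain containsKey check stands in for the Bloom filter (a real Bloom filter answers "definitely absent" cheaply but can give false positives):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyReader {
    final Map<String, String> memstore = new HashMap<>();
    final List<Map<String, String>> storefiles = new ArrayList<>();

    String get(String rowkey) {
        String v = memstore.get(rowkey);           // 1) check memory; return on hit
        if (v != null) return v;
        for (Map<String, String> sf : storefiles) {
            // stand-in for the Bloom filter: skip files that cannot hold the row
            if (sf.containsKey(rowkey)) return sf.get(rowkey);
        }
        return null;                               // not found anywhere
    }
}
```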
(2) Writing process:
- The Client first uses the rowkey to find the regionserver holding the corresponding region (addressing)
- The Client submits the write request to that regionserver
- The regionserver locates the target region
- The regionserver checks that the data is consistent with the schema
- If the client did not specify a version, the current system time is used as the data version
- The update is written to the HLog
- The data is written to the memstore
- The regionserver checks whether the memstore needs to be flushed to a storefile
Note:
- On an update, data is written first to the HLog (WAL log) and then to memory (the MemStore); the data in the MemStore is kept sorted.
- A StoreFile is read-only and can never be modified once created, so updates and deletes in HBase are really just appends. Old versions of data are removed later according to the version retention policy.
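The write ordering in the notes above, log first and memstore second, can be sketched in a few lines. ToyWritePath is invented for this sketch; in real HBase the WAL append is made durable on HDFS before the write is acknowledged:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ToyWritePath {
    final List<String> wal = new ArrayList<>();               // HLog: one append per mutation
    final TreeMap<String, String> memstore = new TreeMap<>(); // kept sorted by rowkey

    void put(String rowkey, String value) {
        wal.add(rowkey + "=" + value);  // 1) durable log entry first
        memstore.put(rowkey, value);    // 2) then the in-memory sorted store
    }
}
```

Because the log entry exists before the in-memory write, a crash between the two steps loses nothing: replaying the WAL rebuilds the memstore.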
5. The working mechanisms of the Master and the RegionServer
(1) The working mechanism of the RegionServer
1) Region assignment
At any time, a region is assigned to at most one regionserver. The master tracks which regionservers are currently available, which regions are assigned to which regionservers, and which regions are still unassigned. When a new region needs to be placed, the master sends a load request to a regionserver with spare capacity, assigning the region to it.
2) RegionServer online
The master uses zookeeper to track RegionServer status. When a RegionServer starts, it first creates its own znode under the server directory in ZooKeeper. Because the master subscribes to change notifications on that directory, it is notified in real time when znodes are added or removed there, so it learns immediately when a RegionServer comes online.
3) RegionServer offline
When a RegionServer goes offline, it disconnects from zookeeper, and ZooKeeper automatically releases the exclusive lock on the znode representing that server. The master can then confirm that the regionserver is no longer communicating with zookeeper and may have gone down.
(2) The working mechanism of the Master
1) Master startup steps:
- Acquire from zookeeper the lock that marks the unique active Master, preventing any other Master instance from becoming active
- Scan the server nodes in zookeeper to get the list of currently available regionservers
- Communicate with each regionserver to learn the current mapping between regions and regionservers
- Scan the .META. regions, compute which regions are currently unassigned, and put them on the to-be-assigned list
2) Master offline:
Because the master only maintains table and region metadata and does not participate in data IO, a master outage only freezes metadata changes (no table creation, no schema changes, no region load balancing); table data can still be read and written normally. So the master can safely be offline for a short time. As the startup process shows, everything the master holds is redundant: it can all be collected or recomputed from other parts of the system.