What is the storage structure of hbase?

2025-01-31 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces the storage structure of HBase. Many people run into questions like this in real-world operation, so let the editor walk you through how to handle them. I hope you read carefully and come away with something!

HBase definition

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Using HBase, large-scale structured storage clusters can be built on inexpensive PC servers.

HBase is an open-source implementation of Google Bigtable, and the two are analogous: Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, while HBase uses Hadoop MapReduce for the data in HBase; Bigtable uses Chubby as its coordination service, while HBase uses ZooKeeper as its counterpart.

Characteristics of HBase

Tables in HBase generally have the following characteristics.

1) Large: a table can have hundreds of millions of rows and millions of columns.

2) Column-oriented: storage and permission control are organized per column family, and column families are retrieved independently.

3) Sparse: columns that are NULL take up no storage space, so tables can be designed to be very sparse.
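The sparseness property can be illustrated with a small sketch. This is a conceptual model, not HBase code: the table only ever stores the cells that were actually written, so a NULL cell has no entry and costs nothing.

```python
class SparseTable:
    """Toy model of sparse, column-family-oriented storage."""

    def __init__(self):
        # {row_key: {(column_family, qualifier): value}}
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def get(self, row_key, family, qualifier):
        # A cell that was never written returns None without ever
        # having consumed storage.
        return self.rows.get(row_key, {}).get((family, qualifier))

    def cell_count(self):
        return sum(len(cells) for cells in self.rows.values())

table = SparseTable()
table.put("row1", "cf1", "name", "alice")
table.put("row2", "cf1", "age", "30")
# A table with millions of possible columns stores only the cells written:
print(table.cell_count())                # 2
print(table.get("row1", "cf1", "age"))   # None: no space was used for it
```

This is why a schema with millions of columns is practical: the cost scales with the populated cells, not with the nominal width of the table.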

HBase access interface

HBase supports many kinds of access; the common interfaces for accessing HBase are as follows.

1. Native Java API: the most conventional and efficient access method, suitable for parallel batch processing of HBase table data with Hadoop MapReduce jobs.

2. HBase Shell: HBase's command-line tool and the simplest interface, suitable for HBase administration.

3. Thrift Gateway: uses Thrift serialization and supports many languages, such as C++, PHP, and Python; suitable for heterogeneous systems to access HBase table data online.

4. REST Gateway: supports REST-style HTTP APIs to access HBase, removing language restrictions.

5. Pig: the Pig Latin data-flow language can operate on data in HBase. Similar to Hive, it is ultimately compiled into MapReduce jobs to process HBase table data; suitable for data statistics.

6. Hive: the Hive release current when this was written had not yet added HBase support; support was planned for Hive 0.7.0, which would allow HBase to be accessed with an SQL-like language.

HBase storage structure

As can be seen from HBase's architecture diagram, storage in HBase involves HMaster, HRegionServer, HRegion, Store, MemStore, StoreFile, HFile, HLog, and so on. This section introduces the role of each of these components, that is, the storage structure. [Figure: HBase storage architecture diagram, not reproduced here]

Each table in HBase is divided by row key into multiple sub-tables (HRegions) over key ranges. By default, an HRegion exceeding 256 MB is split into two. This process is handled by the HRegionServer, while the assignment of HRegions is managed by the HMaster.

The role of HMaster:

1. Assign regions to region servers.

2. Be responsible for load balancing across region servers.

3. Detect failed region servers and reassign their regions.

4. Collect garbage files on HDFS.

5. Process schema update requests.

HRegionServer functions:

1. Maintain the regions assigned to it by the master and handle I/O requests for those regions.

2. Split regions that grow too large at runtime.

Note that a client does not need the master to access data in HBase: addressing goes through ZooKeeper and the region servers, and data reads and writes go directly to the region servers. The master only maintains metadata about tables and regions (table metadata is stored on ZooKeeper), so its load is very low. When an HRegionServer serves a sub-table, it creates an HRegion object and then a Store instance for each column family of the table. Each Store has one MemStore and zero or more StoreFiles; each StoreFile corresponds to one HFile, which is the actual storage file. Therefore an HRegion has as many Stores as the table has column families, and an HRegionServer holds multiple HRegions and a single HLog.
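The containment relationship described above can be sketched in a few lines. This is a minimal illustrative model, not the real HBase classes: one Store per column family, each Store holding one MemStore and zero or more StoreFiles.

```python
class Store:
    """One Store per column family: a MemStore plus flushed StoreFiles."""

    def __init__(self, family):
        self.family = family
        self.memstore = {}       # in-memory writes, not yet flushed
        self.storefiles = []     # flushed files (here: plain dicts)

class HRegion:
    """A region creates a Store instance for each column family."""

    def __init__(self, families):
        self.stores = {cf: Store(cf) for cf in families}

region = HRegion(["cf1", "cf2"])
# As many Stores as there are column families:
assert len(region.stores) == 2
assert region.stores["cf1"].storefiles == []
```

A region server would then hold many such HRegion objects plus one shared HLog, mirroring the hierarchy in the paragraph above.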

HRegion

The table is split into multiple regions along the row dimension. A region is the smallest unit of distributed storage and load balancing in HBase: different regions can live on different region servers, but a single region is never split across servers.

Regions are separated by size. Each table starts with a single region; as data is inserted the region grows, and when a column family of the region reaches a threshold (256 MB by default), the region splits into two new regions.
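The split rule can be sketched as follows. This is an assumption-laden toy model: a region is a plain dict of row keys, the threshold is a cell count standing in for the 256 MB default, and the split point is the midpoint of the sorted row keys.

```python
SPLIT_THRESHOLD = 4  # cells, standing in for the 256 MB default

def maybe_split(region):
    """region: dict of row_key -> value. Returns one or two regions."""
    if len(region) <= SPLIT_THRESHOLD:
        return [region]
    keys = sorted(region)
    mid = keys[len(keys) // 2]   # split at the middle row key
    left = {k: v for k, v in region.items() if k < mid}
    right = {k: v for k, v in region.items() if k >= mid}
    return [left, right]

r = {f"row{i:02d}": i for i in range(6)}
parts = maybe_split(r)
print(len(parts))          # 2
print(sorted(parts[0]))    # ['row00', 'row01', 'row02']
```

Each resulting daughter region covers a contiguous, non-overlapping row-key range, which is what lets the master later move them to different region servers independently.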

Each region is identified by the following information:

1. <table name, startRowkey, creation time>

2. The catalog tables (-ROOT- and .META.) record the endRowkey of each region.

HRegion location: the region server to which a region is assigned is completely dynamic, so a mechanism is needed to locate the specific region server where a given region lives.

HBase uses a three-tier structure to locate region:

1. Through the file /hbase/rs in ZooKeeper, obtain the location of the -ROOT- table. The -ROOT- table has only one region.

2. Through the -ROOT- table, find the region of the .META. table that holds the requested row. Each region of the .META. table is a row record in the -ROOT- table.

3. Through the .META. table, find the location of the desired user-table region. Each region of a user table has a row record in the .META. table.

The -ROOT- table is never split into multiple regions, which guarantees that at most three hops are needed to locate any region. The client saves and caches the location information it looks up; the cache never expires proactively, and the client keeps using cached entries until it hits an error. When an error occurs, meaning the region has moved, the client consults .META. to get the region's new location; if the .META. region itself has moved, the client consults -ROOT- again. Therefore, if all of the client's caches are invalid, it takes six network round trips to locate the correct region: three to discover the cache failures and three to obtain the new location information.
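The three-level lookup with client-side caching can be sketched as below. All names and data structures here are illustrative placeholders, not HBase's actual wire format; the point is that a cold lookup costs three hops and a cached one costs none.

```python
# Illustrative catalog data: ZooKeeper locates -ROOT-, -ROOT- locates
# .META., and .META. locates the user region.
zk = {"/hbase/rs": "root-region-server"}
root_table = {"meta-region": "meta-region-server"}
meta_table = {"user-region-A": "region-server-7"}

class Client:
    def __init__(self):
        self.cache = {}   # never expires proactively
        self.hops = 0     # network round trips performed

    def locate(self, user_region):
        if user_region in self.cache:
            return self.cache[user_region]
        self.hops += 1; _root = zk["/hbase/rs"]              # hop 1: ZooKeeper
        self.hops += 1; _meta = root_table["meta-region"]    # hop 2: -ROOT-
        self.hops += 1; server = meta_table[user_region]     # hop 3: .META.
        self.cache[user_region] = server
        return server

c = Client()
c.locate("user-region-A")
c.locate("user-region-A")   # served from cache, no extra hops
print(c.hops)               # 3
```

The fully-stale-cache case in the text (six trips) is this same walk done twice: once failing against each cached level, once succeeding against the fresh catalog data.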

Store

Each region consists of one or more Stores, at least one. HBase puts data that is accessed together into one Store: it creates a Store for each ColumnFamily, so a region has as many Stores as the table has column families. A Store consists of one MemStore and zero or more StoreFiles. HBase decides whether a region needs to split based on the size of its Stores.

MemStore

Writes that reach a region server are first appended to the commit log and then added to the in-memory MemStore. When the size of the MemStore reaches a threshold (64 MB by default), the MemStore is flushed to a file, that is, a snapshot is generated. HBase has a thread responsible for the flush operation of MemStores.
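The write path just described can be sketched as follows. This is a toy model, with a cell-count threshold standing in for the 64 MB default: every put goes to the log first, then to the MemStore, and crossing the threshold snapshots the MemStore into an immutable store file.

```python
FLUSH_THRESHOLD = 3  # cells, standing in for the 64 MB default

class RegionStore:
    def __init__(self):
        self.wal = []         # commit log: appended to before anything else
        self.memstore = {}    # in-memory writes
        self.storefiles = []  # each flush produces one immutable snapshot

    def put(self, key, value):
        self.wal.append((key, value))   # durability first
        self.memstore[key] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.storefiles.append(dict(self.memstore))  # snapshot to "disk"
            self.memstore = {}

s = RegionStore()
for i in range(7):
    s.put(f"k{i}", i)
print(len(s.storefiles))   # 2: flushed after the 3rd and 6th writes
print(len(s.memstore))     # 1: one write still only in memory
```

Note the ordering: because the log is written before the MemStore, everything in memory can be reconstructed from the log after a crash, which is exactly the recovery property the HLog section below relies on.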

StoreFile

After the data in the MemStore is written to a file, the result is a StoreFile. StoreFiles are saved in the HFile format.

HFile

HFile is the storage format for KeyValue data in HBase; it is a binary-format Hadoop file. An HFile is variable in length, with only two fixed-length sections: Trailer and FileInfo. The Trailer holds pointers to the starting points of the other sections, and FileInfo records meta information about the file. The Data Block is the basic unit of HBase I/O; to improve efficiency, the HRegionServer has an LRU-based block cache. The size of each Data Block can be specified when creating a table (64 KB by default): large blocks favor sequential scans, while small blocks favor random lookups. Apart from the Magic value at its start, each Data Block is a sequence of KeyValue pairs; the Magic content is a random number intended to detect data corruption.

The Data Block section stores the data in the table and can be compressed. The Meta Block section (optional) saves user-defined key-value pairs and can be compressed. The FileInfo section stores the meta information of the HFile and cannot be compressed; users can also add their own meta information in this section. The Data Block Index section holds the index of the Data Blocks, and the Meta Block Index section (optional) holds the index of the Meta Blocks. The Trailer is fixed-length and saves the offset of each section. When reading an HFile, the Trailer is read first; it records the starting position of each section (each section's Magic Number is used as a sanity check), and then the Data Block Index is loaded into memory. Thus, to retrieve a key, there is no need to scan the entire HFile: the block containing the key is found in memory, the whole block is read into memory with one disk I/O, and the key is then located inside it. The Data Block Index is evicted via an LRU mechanism. The Data Blocks and Meta Blocks of an HFile are usually stored compressed, which greatly reduces network and disk I/O at the cost of CPU time for compression and decompression. HFile compression supports two codecs: gzip and LZO.
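The index-then-one-block read described above can be sketched like this. The layout and keys are illustrative, not the real HFile encoding: the in-memory index maps each block's first key to the block, so a lookup binary-searches the index and then touches exactly one block.

```python
import bisect

# Illustrative "file": sorted blocks, each labeled by its first key.
blocks = [
    ("apple",  {"apple": 1, "banana": 2}),
    ("cherry", {"cherry": 3, "date": 4}),
    ("fig",    {"fig": 5, "grape": 6}),
]
index_keys = [first for first, _ in blocks]   # in-memory block index

def lookup(key):
    # Find the last block whose first key <= key, then scan only that
    # block (one simulated disk I/O), never the whole file.
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None   # key sorts before the first block
    return blocks[i][1].get(key)

print(lookup("date"))   # 4: resolved after reading exactly one block
print(lookup("kiwi"))   # None: the right block was read, but the key is absent
```

The same search shape explains the block-size trade-off in the text: smaller blocks mean the single block read for a random get is cheaper, while larger blocks amortize better over a sequential scan.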

HLog (WAL): WAL stands for write-ahead log, and it is used for disaster recovery. The HLog records all changes to the data, so if a region server goes down, the data can be recovered from the log.

LogFlusher

As mentioned earlier, data arrives at the HRegionServer in the form of KeyValues and is written to the WAL, a SequenceFile. This looks fine, but data streams are often buffered when written to the file system to improve performance, so some data that is supposedly in the log file may actually still be in memory. Hence the LogFlusher class: it calls HLog.optionalSync(), which periodically invokes HLog.sync() according to hbase.regionserver.optionallogflushinterval (10 seconds by default). In addition, HLog.doWrite() calls HLog.sync() every hbase.regionserver.flushlogentries entries (100 by default). sync() itself calls HLog.Writer.sync(), which is implemented by SequenceFileLogWriter.
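The buffering problem and the entry-count sync trigger can be sketched as follows. This is a toy model, not the HLog implementation: appended entries sit in an in-memory buffer until sync() drains them to "disk", and sync() fires automatically every N entries (the timer-driven trigger would simply call the same sync() method).

```python
FLUSH_EVERY_N_ENTRIES = 100  # standing in for hbase.regionserver.flushlogentries

class Log:
    def __init__(self):
        self.buffer = []     # entries cached by the file-system stream
        self.durable = []    # entries actually persisted
        self.unflushed = 0

    def append(self, entry):
        self.buffer.append(entry)
        self.unflushed += 1
        if self.unflushed >= FLUSH_EVERY_N_ENTRIES:  # entry-count trigger
            self.sync()

    def sync(self):
        # Also invoked periodically by the flusher thread in the real system.
        self.durable.extend(self.buffer)
        self.buffer.clear()
        self.unflushed = 0

log = Log()
for i in range(150):
    log.append(i)
print(len(log.durable))   # 100: synced once by the entry-count trigger
print(len(log.buffer))    # 50: still only in memory until the next sync
```

The 50 buffered entries are exactly the window of data a crash could lose, which is why both a periodic and an entry-count trigger exist to bound it.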

LogRoller

The size of the log is limited by hbase.regionserver.logroll.period in $HBASE_HOME/conf/hbase-site.xml, which defaults to one hour, so a new log file is opened every 60 minutes. Over time there would be many files to maintain, so LogRoller first calls HLog.rollWriter() to roll the log periodically, and then uses HLog.cleanOldLogs() to clear old logs: it gets the largest sequence number in the store files, checks whether any log file's entries all have sequence numbers lower than that value, and if so deletes that log. Each region server maintains a single HLog rather than one per region, so log entries from different regions (and different tables) are mixed together. The point of continuously appending to a single file is to reduce disk seeks compared with writing many files at once, thereby improving table write performance. The trade-off is recovery: if a region server goes offline, the log on it must be split and distributed to other region servers before its regions can be restored.
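The cleanup rule can be sketched in a few lines. This is an illustrative model, not HLog.cleanOldLogs() itself: a log file becomes deletable once every sequence number it contains is at or below the highest sequence number already persisted in the store files.

```python
def clean_old_logs(log_files, max_flushed_seq):
    """log_files: {filename: [sequence numbers of its entries]}.
    Returns only the logs still needed for recovery."""
    return {
        name: seqs
        for name, seqs in log_files.items()
        if max(seqs) > max_flushed_seq   # some entry is not yet flushed
    }

logs = {
    "hlog.0001": [1, 2, 3],
    "hlog.0002": [4, 5, 6],
    "hlog.0003": [7, 8],
}
# Everything up to sequence number 6 is already in store files:
remaining = clean_old_logs(logs, max_flushed_seq=6)
print(sorted(remaining))   # ['hlog.0003']: the earlier logs are fully flushed
```

This is why rolling matters: without periodically starting a new file, the single shared log would never have a wholly-flushed file to delete.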

This is the end of "what is the storage structure of HBase". Thank you for reading.
