2025-04-08 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article explains in detail how the HBase file index is used. It is shared here as a practical reference.
HBase overall structure diagram
Brief introduction of some terms
HMaster
Manages HRegionServer access, manages and allocates Regions, and handles Table creation, deletion, modification, and similar operations.
HRegion
Each Table can be split into multiple Regions, and each Region covers one row range of the Table. For example, a Table with RowKeys 0-100 can be split into two Regions covering 0-50 and 51-100.
HRegionServer
Each HRegionServer manages multiple Regions and handles reads and writes to them.
HLog
Each HRegionServer has one HLog that records all operations. It is mainly used to recover data when it is lost or corrupted. Physically, it is a Hadoop SequenceFile.
Store
Each HRegion manages multiple Stores, and each Store manages the data of one Family in the Table.
StoreFile
The persisted data of a Family.
MemStore
Each Store has one MemStore, which buffers the operations on the Family. When the MemStore reaches a certain size, it is flushed to HDFS as a StoreFile.
HFile
StoreFile is only a lightweight wrapper around HFile. All Table data files saved in HDFS are in HFile format.
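The relationship between Store, MemStore, and StoreFile can be sketched in a few lines of Python. This is a toy model only, not HBase code: the class name ToyStore, the flush_threshold_bytes parameter, and the byte-counting rule are assumptions made for illustration.

```python
class ToyStore:
    """Toy model of a Store: one in-memory MemStore plus flushed StoreFiles."""

    def __init__(self, flush_threshold_bytes):
        # flush_threshold_bytes is a hypothetical setting for this sketch.
        self.flush_threshold_bytes = flush_threshold_bytes
        self.memstore = {}        # row key -> value, held in memory
        self.memstore_bytes = 0   # approximate size of the buffered data
        self.store_files = []     # each flush appends one sorted, immutable file

    def put(self, row_key, value):
        old = self.memstore.get(row_key)
        if old is not None:
            # Overwriting a key: do not count the old value twice.
            self.memstore_bytes -= len(row_key) + len(old)
        self.memstore[row_key] = value
        self.memstore_bytes += len(row_key) + len(value)
        if self.memstore_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        # Write the buffered entries out in sorted key order, then start fresh.
        if not self.memstore:
            return
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.memstore_bytes = 0


store = ToyStore(flush_threshold_bytes=20)
store.put("row1", "aaaa")   # 8 bytes buffered
store.put("row2", "bbbb")   # 16 bytes buffered
store.put("row0", "cccc")   # 24 bytes >= 20, so the MemStore is flushed
```

After the third put, the MemStore is empty and store.store_files holds one sorted file. A real StoreFile is written to HDFS in HFile format, which this sketch does not model.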
The overall structure of the index
From the perspective of the overall architecture, the index in HBase is distributed across the following layers.
A. ZooKeeper stores the address of the RegionServer to which the -ROOT- Table was assigned when HMaster started.
B. The -ROOT- Table stores the RegionServers hosting the Regions that the .META. Table has been split into.
C. The .META. Table stores the addresses of the RegionServers hosting the Regions of each Table.
D. -ROOT- has only one Region, while .META. can be split into multiple Regions.
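-ROOT- and .META. have the same table structure, as follows (Regioninfo, Server, and Serverstartcode belong to the Info family):

| Rowkey | Info: Regioninfo | Info: Server | Info: Serverstartcode |
| --- | --- | --- | --- |
| TableName, StartKey, TimeStamp | Startkey, Endkey, Family list | Address | Start time of the server that loaded the current Region |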
-ROOT- example:
Suppose .META. splits into two Regions, distributed across two RegionServers.

| Rowkey | Info: Regioninfo | Info: Server | Info: Serverstartcode |
| --- | --- | --- | --- |
| .META Table1, Pk0, 12345278 | | RegionServer1 | |
| .META Table1, Pk1000, 123451278 | | RegionServer2 | |
| .META Table2, Pk0, 123431278 | | RegionServer1 | |
| .META Table2, Pk1000, 123457278 | | RegionServer2 | |
Example of .META.:

| Rowkey | Info: Regioninfo | Info: Server | Info: Serverstartcode |
| --- | --- | --- | --- |
| Table1, Pk0, 12345278 | | RegionServer1 | |
| Table1, Pk1000, 123451278 | | RegionServer2 | |
| Table1, Pk2000, 12345878 | | RegionServer3 | |
| …… | …… | …… | …… |
| Table2, Pk0, 12345278 | | RegionServer1 | |
| Table2, Pk1000, 12345478 | | RegionServer2 | |
| Table2, Pk2000, 12345778 | | RegionServer3 | |
Positioning process of RegionServer
When the Client performs a put, get, or delete on the data in a Table, it supplies the TableName and RowKey. The Client first obtains from ZooKeeper the RegionServer that hosts -ROOT-, then uses the RowKey to look up in -ROOT- the RegionServer that hosts the relevant .META. Region, and finally looks up in .META. the RegionServer that hosts the Region containing the RowKey. Because the Client caches the location of -ROOT-, .META., and the Regions as it goes, the location needs to be queried only once in the best case and 6 times in the worst case (when the lookup has to be repeated recursively, starting from the Table's Region).
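The lookup chain described above can be sketched as a small Python simulation. Everything here is hypothetical and exists only for illustration: ToyCatalog, ToyClient, the sample tables, and the per-row cache (a real client caches Region ranges rather than individual rows). The sketch counts three remote lookups on a cold cache and none on a warm one; it does not model the retries caused by stale cache entries.

```python
import bisect


class ToyCatalog:
    """A sorted list of ((table, start_key), server) entries, one per Region."""

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.start_keys = [key for key, _ in self.entries]

    def locate(self, table, row_key):
        # The owning Region is the last one whose start key is <= the row key.
        i = bisect.bisect_right(self.start_keys, (table, row_key)) - 1
        if i < 0 or self.entries[i][0][0] != table:
            raise KeyError((table, row_key))
        return self.entries[i][1]


class ToyClient:
    def __init__(self, zookeeper, root_catalog, meta_regions):
        self.zookeeper = zookeeper        # knows only where -ROOT- lives
        self.root_catalog = root_catalog  # maps keys to .META. Region servers
        self.meta_regions = meta_regions  # server name -> its .META. Region
        self.cache = {}                   # (table, row_key) -> RegionServer
        self.remote_lookups = 0

    def locate(self, table, row_key):
        if (table, row_key) in self.cache:
            return self.cache[(table, row_key)]
        # Step 1: ask ZooKeeper where -ROOT- is. This toy has one -ROOT-
        # catalog, so the returned address is looked up but not used further.
        self.zookeeper["-ROOT-"]
        self.remote_lookups += 1
        # Step 2: ask -ROOT- which server hosts the relevant .META. Region.
        meta_server = self.root_catalog.locate(table, row_key)
        self.remote_lookups += 1
        # Step 3: ask that .META. Region which server hosts the user Region.
        region_server = self.meta_regions[meta_server].locate(table, row_key)
        self.remote_lookups += 1
        self.cache[(table, row_key)] = region_server
        return region_server


zookeeper = {"-ROOT-": "RegionServer1"}
root_catalog = ToyCatalog([
    (("Table1", ""), "RegionServer1"),   # .META. Region covering Table1
    (("Table2", ""), "RegionServer2"),   # .META. Region covering Table2
])
meta_regions = {
    "RegionServer1": ToyCatalog([
        (("Table1", "Pk0"), "RegionServer1"),
        (("Table1", "Pk1000"), "RegionServer2"),
        (("Table1", "Pk2000"), "RegionServer3"),
    ]),
    "RegionServer2": ToyCatalog([
        (("Table2", "Pk0"), "RegionServer1"),
        (("Table2", "Pk1000"), "RegionServer2"),
    ]),
}
client = ToyClient(zookeeper, root_catalog, meta_regions)
first = client.locate("Table1", "Pk1500")    # cold cache: three lookups
second = client.locate("Table1", "Pk1500")   # warm cache: no new lookups
```

Keys are compared as plain strings here, so "Pk1500" sorts between "Pk1000" and "Pk2000" and both calls resolve to RegionServer2.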
The data storage structure of HFile:
As shown in the figure above, an HFile is variable in length; only the File Info and Trailer sections are fixed length. The Trailer holds pointers to the starting points of File Info, Data Index, and Meta Index. The Index blocks record the starting point of each Data and Meta block. In each Data block, a Magic field is used to detect whether the data is corrupted, and each Data block stores multiple KeyValue entries.
The above picture shows the data structure of KeyValue.
The above picture also shows the data structure of HFile.
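The idea of a fixed-length trailer that points back at an index, which in turn points at the data blocks, can be illustrated with a toy file format in Python. This is not the real HFile byte layout: TOY_MAGIC, the JSON encoding, the 16-byte trailer, and the function names are all assumptions, and File Info and Meta Index are left out.

```python
import json
import struct

TOY_MAGIC = b"TOYBLOCK"             # hypothetical marker at the start of each block
TRAILER = struct.Struct(">QQ")      # fixed length: index offset, index length


def write_toy_file(blocks):
    """Lay out data blocks, then the data index, then the fixed-length trailer."""
    body = bytearray()
    index = []
    for block in blocks:
        payload = TOY_MAGIC + json.dumps(block).encode("utf-8")
        # Index entry: first key of the block, its offset, and its length.
        index.append([block[0][0], len(body), len(payload)])
        body += payload
    index_bytes = json.dumps(index).encode("utf-8")
    index_offset = len(body)
    body += index_bytes
    body += TRAILER.pack(index_offset, len(index_bytes))
    return bytes(body)


def read_index(data):
    # The trailer sits at a known position: the last TRAILER.size bytes.
    index_offset, index_length = TRAILER.unpack(data[-TRAILER.size:])
    raw = data[index_offset:index_offset + index_length]
    return json.loads(raw.decode("utf-8"))


def read_block(data, block_number):
    _first_key, offset, length = read_index(data)[block_number]
    payload = data[offset:offset + length]
    if not payload.startswith(TOY_MAGIC):
        raise ValueError("data block is corrupted")
    return json.loads(payload[len(TOY_MAGIC):].decode("utf-8"))


toy_file = write_toy_file([
    [["row1", "a"], ["row2", "b"]],
    [["row3", "c"], ["row4", "d"]],
])
second_block = read_block(toy_file, 1)
```

Because the trailer has a fixed size, a reader can always find it at the end of the file, even though the file as a whole is variable in length.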
The entire region file path looks like this:
/
Each column-family directory contains HFile data files. Each file name is an arbitrary number produced by Java's built-in random number generator. The code guarantees that there are no collisions: if a newly generated number is already in use, it keeps generating until it finds an unused one.
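The retry-until-unused naming scheme can be sketched as follows. This is a Python illustration rather than the HBase implementation; the function name, the draw parameter, and the 63-bit width are assumptions.

```python
import random


def new_store_file_name(existing_names, draw=lambda: random.getrandbits(63)):
    """Return a random numeric name that is not already in existing_names."""
    while True:
        candidate = str(draw())
        if candidate not in existing_names:
            return candidate
        # Collision: the number is already taken, so draw another one.


names = set()
for _ in range(5):
    names.add(new_store_file_name(names))
```

Passing a custom draw function makes the collision branch easy to exercise: with existing name "7" and draws of 7, 7, 42, the function skips the two collisions and returns "42".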
Operation of Region
Once you have located the RegionServer that hosts the RowKey, you can obtain the corresponding Region by its RegionName. The RegionName comes from the RowKey saved in .META.
Get:
1. The HRegion.get API first checks the Family to ensure that the Family in the Get matches the one in the Table.
2. Based on the Family, it finds the corresponding Store, obtains the StoreScanner instance from the Store, and adds it to a scanners queue.
3. A StoreScanner holds two kinds of scanner, MemstoreScanner and HFileScanner, which traverse the KeyValues in the MemStore and the HFiles, respectively.
4. Because there are multiple HFiles, the HFileScanners are filtered first. Using the HFile's DataIndex, which stores the firstKey of each DataBlock, the scanner seeks to the position of the StartRow. If the KeyValue is not in the current HFile, that HFileScanner is closed.
5. Note that once the RegionServer has started, the DataIndex of each HFile is kept in memory.
6. When the StoreScanner looks up a KeyValue, it first uses the MemstoreScanner to search the MemStore. If the data is not there, it uses the HFileScanner to search the DataBlocks of the HFile, where the DataIndex quickly locates the right Block.
7. Because HFiles are persisted in HDFS, each I/O read of an HFile reads only one Data block. The location of that block is found from the HFile's DataIndex.
8. If Bloom filters are enabled, you can quickly check whether a RowKey or value may be present in an HFile, because a Bloom filter can rule out files that definitely do not contain it.
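Steps 4 to 7 can be sketched in Python as a binary search over an in-memory index of first keys, followed by a read of a single block. This is a simplified model under stated assumptions: ToyHFile, ToyReadPath, and block_size are hypothetical names, the MemStore is a plain dict, and the real scanners (which merge results and handle versions and Bloom filters) are not modeled.

```python
import bisect


class ToyHFile:
    """Sorted key/value pairs split into blocks, with an in-memory data index."""

    def __init__(self, sorted_items, block_size=2):
        self.blocks = [sorted_items[i:i + block_size]
                       for i in range(0, len(sorted_items), block_size)]
        # Data index: the first key of every block, kept in memory.
        self.first_keys = [block[0][0] for block in self.blocks]
        self.blocks_read = 0

    def get(self, row_key):
        # Find the last block whose first key is <= row_key.
        i = bisect.bisect_right(self.first_keys, row_key) - 1
        if i < 0:
            return None          # the key sorts before this file's first key
        self.blocks_read += 1    # only this one block is read
        for key, value in self.blocks[i]:
            if key == row_key:
                return value
        return None


class ToyReadPath:
    """Check the MemStore first, then fall back to the HFiles."""

    def __init__(self, hfiles):
        self.memstore = {}
        self.hfiles = hfiles     # assumed to be ordered newest first

    def get(self, row_key):
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in self.hfiles:
            value = hfile.get(row_key)
            if value is not None:
                return value
        return None


hfile = ToyHFile([("row1", "a"), ("row2", "b"), ("row3", "c"),
                  ("row4", "d"), ("row5", "e")])
store = ToyReadPath([hfile])
store.memstore["row9"] = "m"
found = store.get("row4")        # served from the second block of the HFile
```

Looking up "row4" touches exactly one block, a lookup for "row9" is answered from the MemStore without touching the file, and "row0" is rejected by the index alone.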