2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article covers the theory behind HBase, which many people do not understand well. The key points are summarized below, and I hope you find them useful.
I. Data storage
Data storage: HBase relies on HDFS for storage and does not store data itself, so HBase is in effect a tool for managing table data.
Data backup: HBase relies entirely on HDFS replication (three copies by default) for data reliability. HBase therefore does not need to handle data reliability itself; it only needs to keep its own services reliable.
How is the reliability of the HBase service guaranteed?
By transferring management responsibility for Regions.
Regions in HBase (whose actual data lives in HDFS) are managed by RegionServers (RS). When the loss of an RS is detected through ZooKeeper, HMaster migrates the Regions that were on that RS. The so-called migration only transfers management of those Regions to other RSs; the data itself stays where it is on HDFS.
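The reassignment idea can be sketched with a toy model. This is only an illustration under stated assumptions: the function `reassign_regions`, the server and Region names, and the round-robin policy are all hypothetical and are not HBase's actual assignment logic. The point it shows is that only the Region-to-server mapping changes, while the HDFS locations stay the same.

```python
def reassign_regions(assignments, dead_server, live_servers):
    """Return a new region -> server map in which the dead server's
    regions are handed to the live servers round-robin (assumed policy)."""
    if not live_servers:
        raise ValueError("no live RegionServer left to take over")
    new_assignments = dict(assignments)
    orphaned = sorted(r for r, s in assignments.items() if s == dead_server)
    for i, region in enumerate(orphaned):
        new_assignments[region] = live_servers[i % len(live_servers)]
    return new_assignments


# Hypothetical HDFS locations of the Region data; reassignment never touches them.
hdfs_location = {
    "region-a": "/hbase/data/default/t1/region-a",
    "region-b": "/hbase/data/default/t1/region-b",
    "region-c": "/hbase/data/default/t1/region-c",
}
assignments = {"region-a": "rs1", "region-b": "rs2", "region-c": "rs1"}

after = reassign_regions(assignments, "rs1", ["rs2", "rs3"])
print(after)          # region-a and region-c now belong to live servers
print(hdfs_location)  # unchanged: the data did not move
```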
II. Data structure and query
Index: HBase indexes only the rowkey, so it is mainly suited to simple rowkey-based queries. It is not suitable for complex queries such as filtering on multiple fields (there is no index, so the query is slow), and joins between tables are not supported directly.
HBase stores data sorted along three dimensions, so a value can be located quickly by its rowkey (row key), column key (column family and qualifier), and timestamp.
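As a rough illustration of this three-dimensional ordering, the sketch below sorts a few made-up cells by rowkey, then column key, then timestamp with the newest first. The tuple layout and the helper names `sort_key` and `latest` are assumptions for this example; real HBase compares raw bytes and treats column family and qualifier as separate parts of the key.

```python
def sort_key(cell):
    """Order by rowkey, then column key, then newest timestamp first."""
    rowkey, column, timestamp, _value = cell
    return (rowkey, column, -timestamp)


# Made-up cells: (rowkey, "family:qualifier", timestamp, value)
cells = [
    ("row2", "cf:name", 100, "bob"),
    ("row1", "cf:name", 100, "alice"),
    ("row1", "cf:name", 200, "alicia"),
    ("row1", "cf:age", 150, "30"),
]
ordered = sorted(cells, key=sort_key)


def latest(ordered_cells, rowkey, column):
    """Return the newest value for (rowkey, column), or None if absent."""
    for r, c, _ts, value in ordered_cells:
        if (r, c) == (rowkey, column):
            return value  # the first match is the newest version
    return None


for cell in ordered:
    print(cell)
print(latest(ordered, "row1", "cf:name"))
```

Because versions of the same cell sit next to each other with the newest first, reading the current value only needs the first matching entry.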
A rowkey uniquely identifies a row in HBase. There are three ways to query:
get: specify a rowkey to fetch a single row
Range scan: set the startRow and stopRow parameters of a scan to match a range of rowkeys
Full table scan: scan every row in the entire table
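The three access paths above can be imitated with a small in-memory model. This is a sketch, not the HBase client API: the class `SortedTable` and its methods are hypothetical, and it assumes the start row is inclusive and the stop row exclusive.

```python
from bisect import bisect_left


class SortedTable:
    """Toy table that keeps rowkeys sorted, as HBase does."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._keys = sorted(self._rows)

    def get(self, rowkey):
        """Point lookup by rowkey; returns None if the row is absent."""
        return self._rows.get(rowkey)

    def scan(self, start_row=None, stop_row=None):
        """Rows with start_row <= rowkey < stop_row; no bounds means a full scan."""
        lo = 0 if start_row is None else bisect_left(self._keys, start_row)
        hi = len(self._keys) if stop_row is None else bisect_left(self._keys, stop_row)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedTable({"user003": "c", "user001": "a", "user010": "d", "user002": "b"})
print(table.get("user002"))              # single row
print(table.scan("user002", "user010"))  # range scan
print(len(table.scan()))                 # full table scan touches every row
```

The range scan only has to locate the two boundaries in the sorted keys, while the full scan visits every row, which is why rowkey design matters so much for query cost.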
III. HDFS
HDFS: an architecture that separates storage from computing
HBase relies on HDFS for the underlying data storage, and HDFS keeps multiple replicas (three by default) to ensure high availability.
The HDFS directory structure of an HBase table is as follows:
/hbase/data/<namespace>/<table>/<region>/<column family>/<StoreFile>
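To make the layout concrete, here is a minimal sketch that builds and parses such a path. The helper names and the sample values are hypothetical, and the `/hbase` root is assumed to match the layout shown above; in a real cluster the region and StoreFile names are generated by HBase.

```python
from posixpath import join

FIELDS = ("namespace", "table", "region", "family", "store_file")


def store_file_path(namespace, table, region, family, store_file, root="/hbase"):
    """Build the HDFS path of a StoreFile from its five components."""
    return join(root, "data", namespace, table, region, family, store_file)


def parse_store_file_path(path, root="/hbase"):
    """Split a StoreFile path back into its named components."""
    prefix = join(root, "data") + "/"
    if not path.startswith(prefix):
        raise ValueError("not under the table data directory: " + path)
    parts = path[len(prefix):].split("/")
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of path components: " + path)
    return dict(zip(FIELDS, parts))


# Sample values are made up for illustration.
path = store_file_path("default", "orders", "region1", "cf", "file1")
print(path)
print(parse_store_file_path(path))
```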
The HDFS directory structure of HLog is as follows:
/hbase/WALs/<RegionServer>/<WAL file>
IV. The role of HMaster in HBase
Functionally, HMaster is mainly responsible for managing tables and HRegions, including:
1. Managing users' add, delete, modify, and query operations on tables (table-level operations, not reads and writes of row data).
2. Managing load balancing across HRegionServers and adjusting the distribution of HRegions.
3. Assigning the new HRegions after an HRegion splits.
4. Migrating the HRegions of a failed HRegionServer after it goes down.
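The load-balancing duty in the list above can be pictured with a deliberately simple sketch: move Regions from the busiest server to the idlest one until their Region counts differ by at most one. The function `balance` and the count-only criterion are assumptions for illustration; HBase's real balancer weighs more factors than the number of Regions.

```python
def balance(load):
    """load maps server -> list of regions. Returns (new_load, moves),
    where each move is (region, from_server, to_server)."""
    load = {server: list(regions) for server, regions in load.items()}
    moves = []
    if not load:
        return load, moves
    while True:
        busiest = max(load, key=lambda s: len(load[s]))
        idlest = min(load, key=lambda s: len(load[s]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            return load, moves
        region = load[busiest].pop()
        load[idlest].append(region)
        moves.append((region, busiest, idlest))


# Hypothetical cluster: rs1 is overloaded, rs2 is empty.
before = {"rs1": ["r1", "r2", "r3", "r4"], "rs2": [], "rs3": ["r5"]}
after, moves = balance(before)
print(after)
print(moves)
```

As with failover, each move only changes which server manages a Region; the Region's files stay in place on HDFS.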
V. What data is stored in MemStore?
Each Store consists of one MemStore and multiple StoreFiles. The reason for having a MemStore in front of the StoreFiles is that HBase cannot write to disk every time a record arrives: frequent disk writes are inefficient and leave the data non-contiguous. Instead, data is buffered in memory until it reaches a certain size and is then flushed to disk in one pass. Each flush writes the in-memory data to disk, but memory holds data for multiple column families (CFs), and each CF has its own Store, so a single flush produces multiple StoreFiles, some of which may be very small.
This is why an HBase table should not have too many CFs: too many CFs lead to frequent flushes, and the resulting small files trigger time-consuming compaction.
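The effect of the number of CFs on flush output can be seen in a small simulation. This is a sketch under simplifying assumptions: the class `Region`, its methods, and the threshold counted in cells are all hypothetical, whereas real HBase measures MemStore size in bytes and has more nuanced flush policies.

```python
class Region:
    """Toy region with one MemStore per column family (CF)."""

    def __init__(self, families, flush_threshold):
        self.memstores = {cf: {} for cf in families}
        self.store_files = {cf: [] for cf in families}
        self.flush_threshold = flush_threshold  # total buffered cells (assumed unit)

    def put(self, rowkey, family, qualifier, value):
        """Buffer a cell in memory; flush once the region holds enough cells."""
        self.memstores[family][(rowkey, qualifier)] = value
        if sum(len(m) for m in self.memstores.values()) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write every non-empty MemStore out as its own sorted StoreFile."""
        for cf, mem in self.memstores.items():
            if mem:
                self.store_files[cf].append(dict(sorted(mem.items())))
                mem.clear()


region = Region(["cf1", "cf2", "cf3"], flush_threshold=4)
region.put("row1", "cf1", "a", "1")
region.put("row2", "cf1", "a", "2")
region.put("row3", "cf1", "a", "3")
region.put("row1", "cf2", "b", "x")  # reaches the threshold and triggers a flush

for cf, files in region.store_files.items():
    print(cf, [len(f) for f in files])
```

In this run, one flush produces a three-cell file for cf1 and a one-cell file for cf2. The more CFs share a flush, the more undersized files accumulate for compaction to merge later.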
The logical structure of HFile
VI. HMaster, ZooKeeper, and the HDFS NameNode
After reading the above, do you have a better understanding of HBase theory? If you want to learn more about this or related topics, please follow the industry information channel. Thank you for your support.