2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the concepts behind the HBase data model: what HBase is, how its tables are organized, and how the storage architecture underneath them works.
Introduction:
HBase (Hadoop Database): A highly reliable, high-performance, column-oriented, scalable distributed storage system. HBase can be used to build large-scale structured storage clusters on inexpensive PC servers.
HBase is an open-source implementation modeled on Google's BigTable, and it is built on top of Hadoop.
HBase is a non-relational (NoSQL) database.
NoSQL compared with relational databases:
Data model: NoSQL suits structured, semi-structured and unstructured data; a relational database suits structured data.
Scalability: NoSQL is easy to scale out, for example by adding nodes; a relational database is difficult to scale.
Operations: NoSQL interfaces differ from product to product; a relational database uses standard SQL or an SQL-like language.
Schema: NoSQL fields can be added at any time; relational fields must be predefined and are inconvenient to extend.
HBase stores data in table form
Features of the table:
Large: A table can have hundreds of millions of rows and millions of columns.
Column-oriented: storage and permission control are organized by column (family), and each column (family) is retrieved independently.
Sparse: null columns do not take up storage space, so tables can be designed very loosely
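The sparseness property can be illustrated with a small sketch. This is a toy model in plain Python, not HBase client code; the row keys, column names and values are made up for the example. Each row keeps only the columns it actually has, so an absent column costs nothing.

```python
# Toy model of a sparse table: each row stores only the columns it actually has.
# Plain Python for illustration; the rows and columns below are invented.
table = {
    "row1": {"info:name": "alice", "info:email": "alice@example.com"},
    "row2": {"info:name": "bob"},  # no email: nothing is stored for it
    "row3": {"info:name": "carol", "stats:logins": "42"},
}


def stored_cells(table):
    """Count the cells that actually occupy storage."""
    return sum(len(columns) for columns in table.values())


def all_columns(table):
    """Collect every column name that appears in at least one row."""
    return sorted({col for columns in table.values() for col in columns})


columns = all_columns(table)
dense_cells = len(table) * len(columns)  # cells a fixed-width layout would reserve
print(columns)
print(stored_cells(table), "cells stored vs", dense_cells, "in a dense layout")
```

Because rows do not have to share the same set of columns, a new column can appear in one row without touching any other row.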
HBase's place in the Hadoop ecosystem:
HBase is located in the structured storage layer;
Hadoop HDFS provides high-reliability underlying storage support for HBase
Hadoop MapReduce provides high-performance computing power for HBase
ZooKeeper provides stable coordination services and failover for HBase
Pig and Hive provide high-level language support for HBase, making statistical processing of data in HBase very easy
Sqoop provides a convenient RDBMS data import function for HBase, which makes it easy to migrate data from traditional databases to HBase
Key concepts of the data model:
RowKey: The primary key used to retrieve records. There are only three ways to access rows in an HBase table:
Access by a single rowkey
Scan over a rowkey range
Full table scan
Qualifier: A dynamically extensible column that does not need to be defined in the table schema in advance, similar to a column in a relational database
Family: A column family with multiple qualifiers under it. Settings such as TTL, Versions and Compression are all configured at this level.
Version/Timestamp: The value under family:qualifier for a single rowkey can have multiple versions, distinguished by millisecond timestamps. A read can request a specific version or a timestamp range.
Cell: The smallest unit of storage. A value is uniquely identified by rowkey + family:qualifier + timestamp
Namespace: The namespace a table belongs to; the "default" namespace is used if none is specified
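The concepts above can be sketched as a small in-memory structure. This is a toy model under stated assumptions, not the HBase client API: the class ToyTable, its methods and the max_versions parameter are hypothetical names chosen for the illustration. It shows a value addressed by rowkey + family:qualifier + timestamp, a bounded number of versions per cell, and the three access paths (single rowkey, rowkey range, full scan).

```python
import bisect


class ToyTable:
    """In-memory stand-in for an HBase table (hypothetical, for illustration only)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = {}         # rowkey -> {"family:qualifier" -> {timestamp -> value}}
        self.sorted_keys = []  # rowkeys kept in sorted order, as HBase sorts rows by key

    def put(self, rowkey, column, value, timestamp):
        if rowkey not in self.rows:
            bisect.insort(self.sorted_keys, rowkey)
            self.rows[rowkey] = {}
        versions = self.rows[rowkey].setdefault(column, {})
        versions[timestamp] = value
        # Keep only the newest max_versions timestamps for this cell.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, rowkey, column, timestamp=None):
        """Access by a single rowkey; newest version unless a timestamp is given."""
        versions = self.rows.get(rowkey, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, start=None, stop=None):
        """Rowkey range scan over [start, stop); a full table scan when both are None."""
        lo = 0 if start is None else bisect.bisect_left(self.sorted_keys, start)
        hi = len(self.sorted_keys) if stop is None else bisect.bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]


t = ToyTable(max_versions=2)
t.put("user001", "info:name", "alice", 1000)
t.put("user001", "info:name", "alicia", 2000)
t.put("user001", "info:name", "ali", 3000)   # the version at timestamp 1000 is dropped
t.put("user002", "info:name", "bob", 1500)
t.put("user010", "info:name", "carol", 1600)

print(t.get("user001", "info:name"))         # newest version
print(t.get("user001", "info:name", 2000))   # a specific version
print(t.scan("user001", "user010"))          # rowkey range
print(t.scan())                              # full table scan
```

Keeping the rowkeys sorted is what makes the range scan cheap in this sketch, which is why rowkey design matters so much when there is no other index.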
HBase system tables:
hbase:namespace: stores the namespace information for HBase tables
hbase:meta: stores the rowkey range, location and other information for each region of a table
hbase:acl: records table operation permissions. ACLs are supported down to the qualifier level, with the permission classes READ ('R'), WRITE ('W'), EXEC ('X'), CREATE ('C') and ADMIN ('A')
HBase Storage Framework:
ZooKeeper: The HBase cluster depends on this component
The ZooKeeper quorum stores the address of the -ROOT- table and the address of the HMaster
Each HRegionServer registers itself in ZooKeeper as an ephemeral node, so the HMaster can detect the health of every HRegionServer at any time.
HMaster: Mainly responsible for managing Tables and Regions
Manages user operations that create, delete, query and modify tables
Manages load balancing across HRegionServers and adjusts Region distribution
Assigns the new Regions after a Region split
Migrates the Regions of a failed HRegionServer to other servers after an outage
HRegionServer: The core module, mainly responsible for responding to user I/O requests and reading and writing data in the HDFS file system
An HRegionServer manages a list of HRegion objects
Each HRegion corresponds to one Region of a Table, and an HRegion consists of multiple HStores.
Each HStore corresponds to the storage of one Column Family in the Table
A Column Family is a unit of centralized storage, so it is more efficient to put columns with the same I/O characteristics in the same Column Family.
Region: As the number of records grows and a Table becomes larger, it is gradually split into multiple parts, each of which is a region. A region is identified by the half-open key range [startkey, endkey). The Master assigns different regions to RegionServers for management.
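The [startkey, endkey) convention can be sketched with a sorted list of region start keys. This is a toy model only: the class RegionMap, the start keys and the server names rs1 to rs3 are hypothetical, and real HBase clients resolve regions through hbase:meta rather than a local list. The sketch shows how a rowkey maps to exactly one region and how a split adds a new boundary.

```python
import bisect


class RegionMap:
    """Toy lookup table from rowkey to region; names and layout are hypothetical."""

    def __init__(self, start_keys, servers):
        # start_keys must be sorted; "" means "from the beginning of the table".
        self.start_keys = list(start_keys)
        self.servers = list(servers)

    def locate(self, rowkey):
        """Return (startkey, endkey, server) for the region holding rowkey.

        The start key is inclusive and the end key is exclusive;
        endkey is None for the last, open-ended region.
        """
        index = bisect.bisect_right(self.start_keys, rowkey) - 1
        has_next = index + 1 < len(self.start_keys)
        end_key = self.start_keys[index + 1] if has_next else None
        return self.start_keys[index], end_key, self.servers[index]

    def split(self, split_key, new_server):
        """Split the region containing split_key; the upper half goes to new_server."""
        if split_key in self.start_keys:
            raise ValueError("split key is already a region boundary")
        index = bisect.bisect_right(self.start_keys, split_key) - 1
        self.start_keys.insert(index + 1, split_key)
        self.servers.insert(index + 1, new_server)


regions = RegionMap(["", "g", "n", "t"], ["rs1", "rs2", "rs1", "rs3"])
print(regions.locate("apple"))  # falls in the first region
print(regions.locate("g"))      # a start key belongs to the region it opens
regions.split("j", "rs3")       # the region [g, n) becomes [g, j) and [j, n)
print(regions.locate("k"))
```

Because the ranges are half-open and contiguous, every rowkey belongs to exactly one region, both before and after a split.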
HStore: The core of HBase storage, made up of a MemStore and StoreFiles. The MemStore is a sorted in-memory buffer.
HLog: In a distributed environment, system errors and downtime cannot be avoided. If an HRegionServer exits unexpectedly, the in-memory data in its MemStore is lost. HLog is introduced to prevent this data loss.
Every time a user write goes to the MemStore, a record is also written to the HLog file. The HLog file periodically rolls over to a new file, and old files whose data has already been persisted to StoreFiles are deleted. When an HRegionServer terminates unexpectedly, the HMaster detects it through ZooKeeper. The HMaster first processes the leftover HLog files, splitting the log data by region and placing it in the corresponding region directories, and then reassigns the orphaned regions. While loading such a region, the HRegionServer that receives it finds historical HLog data to process, so it replays the HLog data into the MemStore and then flushes it to StoreFiles to complete data recovery.
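The recovery flow described above can be sketched in a few lines. This is a toy model under stated assumptions: ToyRegionServer and recover are hypothetical names, a Python list stands in for the HLog file and a list of dicts stands in for StoreFiles on HDFS. The sketch also assumes the log entry is appended before the MemStore is updated, and it ignores log splitting by region and sequence numbers.

```python
class ToyRegionServer:
    """Toy write path: log, MemStore, StoreFiles (hypothetical, for illustration)."""

    def __init__(self):
        self.wal = []          # stands in for the HLog file (survives a crash)
        self.memstore = {}     # in-memory buffer (lost on a crash)
        self.store_files = []  # stands in for StoreFiles on HDFS (survive a crash)

    def put(self, key, value):
        self.wal.append((key, value))  # assumption: log first, then memory
        self.memstore[key] = value

    def flush(self):
        """Persist the MemStore as a new sorted StoreFile and drop the covered log."""
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}
            self.wal.clear()  # these edits are persisted, so the old log can go

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # newest file first
            if key in store_file:
                return store_file[key]
        return None


def recover(wal, store_files):
    """Rebuild a server from what survived: replay the log, then flush."""
    server = ToyRegionServer()
    server.store_files = list(store_files)
    for key, value in wal:
        server.memstore[key] = value  # replay into the MemStore
    server.flush()                    # persist the replayed edits
    return server


old = ToyRegionServer()
old.put("row1", "a")
old.flush()            # row1 is now in a StoreFile
old.put("row2", "b")   # row2 exists only in the MemStore and the log
old.put("row1", "a2")  # so does the newer value of row1

# Simulated crash: the MemStore is gone, only the log and StoreFiles remain.
new = recover(list(old.wal), old.store_files)
print(new.get("row1"), new.get("row2"), len(new.store_files))
```

Without the replay step, row2 and the newer value of row1 would be lost, because neither had been flushed when the simulated crash happened.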
HDFS: HDFS is a distributed file system. It implements distributed storage by dividing a large file into fixed-size blocks; the default block size is 128 MB. Each block has multiple replicas placed on different data nodes to keep the data safe. Currently, all of HBase's underlying data is stored as files in HDFS. The HBase side itself does not persistently store the data.
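The block arithmetic can be worked through with a short sketch. It uses the 128 MB block size from the text; the replication factor of 3, the data node names and the round-robin placement are assumptions made for the example. Real HDFS placement is rack-aware and is decided by the NameNode, so this only illustrates that replicas of one block land on different nodes.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # the 128 MB default block size mentioned in the text


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes in bytes of the blocks a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified assumption for illustration; not the real HDFS placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    return [
        [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(num_blocks)
    ]


mb = 1024 * 1024
blocks = split_into_blocks(300 * mb)
print([size // mb for size in blocks])  # block sizes in MB
for block_id, nodes in enumerate(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])):
    print(block_id, nodes)
```

In this sketch a 300 MB file occupies two full blocks plus one 44 MB block, and losing any single data node still leaves at least two copies of every block.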