2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Overview
HBase is a distributed column storage system built on HDFS.
HBase is modeled on Google's Bigtable and is a typical key/value system.
HBase is an important member of the Apache Hadoop ecosystem and is mainly used to store massive amounts of structured data.
Logically, HBase stores data by table, row, and column.
Like Hadoop, HBase relies mainly on scaling out: it increases computing and storage capacity by adding inexpensive commodity servers.
Characteristics of HBase tables
Large: a table can have billions of rows and millions of columns
Schemaless: each row has a sortable primary key and any number of columns; columns can be added dynamically as needed, and different rows in the same table can have different columns.
Column-oriented: storage and access control are organized by column (family), and column families are retrieved independently
Sparse: null columns do not take up storage space, so tables can be designed to be very sparse
Multiple versions of data: each cell can hold multiple versions of its data. By default, the version number is assigned automatically and is the timestamp at which the cell was inserted.
Single data type: all data in HBase is stored as uninterpreted strings of bytes and has no type.
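The sparse, multi-version behavior described above can be illustrated with a small in-memory sketch. This is a toy model, not the HBase client API: the `SparseTable` class, its method names, and the millisecond default version are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> column -> {version: value}."""

    def __init__(self):
        # Only cells that were actually written exist, so empty columns cost nothing.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, version=None):
        # Assumption: the default version is the write time in milliseconds.
        if version is None:
            version = int(time.time() * 1000)
        self._rows[row][column][version] = value

    def get(self, row, column, version=None):
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if version is None:
            version = max(versions)  # the newest version is returned by default
        return versions.get(version)

    def columns(self, row):
        return sorted(self._rows.get(row, {}))


table = SparseTable()
table.put(b"row1", "info:name", b"alice", version=1)
table.put(b"row1", "info:name", b"alicia", version=2)   # second version of the same cell
table.put(b"row2", "info:email", b"bob@example.com", version=1)
```

Here `row1` and `row2` have different columns, `info:name` keeps both versions, and a column that was never written for a row simply does not exist.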
HBase data model
HBase logical view
Note the English annotations in the figure above.
Basic concepts of HBase
RowKey: a byte array that acts as the "primary key" of each record in the table and makes fast lookups possible. The design of the RowKey is very important.
ColumnFamily: a column family, which has a name (a string) and contains one or more related columns
Column: belongs to a column family and is addressed as familyName:columnName; columns can be added dynamically for each record
VersionNumber: of type Long; the default is the system timestamp, and it can be set by the user
Value (Cell): a byte array
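To make these coordinates concrete, the sketch below bundles them into one record. The `Cell` type, `parse_column`, and `encode_version` are hypothetical helpers, and the 8-byte encoding only illustrates the size of a Long; it is not HBase's on-disk format.

```python
import struct
from typing import NamedTuple


class Cell(NamedTuple):
    row_key: bytes     # RowKey: byte array
    family: str        # ColumnFamily name
    qualifier: str     # column name inside the family
    version: int       # VersionNumber: fits in a 64-bit Long
    value: bytes       # Value: byte array


def parse_column(column: str):
    """Split 'familyName:columnName' into (family, qualifier)."""
    family, _, qualifier = column.partition(":")
    return family, qualifier


def encode_version(version: int) -> bytes:
    # 8-byte big-endian signed integer, the same width as a Java long.
    return struct.pack(">q", version)


family, qualifier = parse_column("info:name")
cell = Cell(b"user#0001", family, qualifier, 1700000000000, b"alice")
```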
HBase physical model
Each column family is stored in its own files on HDFS, and null values are not saved.
The key and version number are stored again in every column family
HBase maintains a multi-level index for each value, namely:
Physical storage:
1. All rows in a table are sorted in lexicographic order of their RowKey.
2. A table is divided into multiple regions along the row direction.
3. Regions are divided by size. Each table starts with a single region. As data is added the region grows, and when it reaches a threshold it splits into two new regions, so the number of regions increases over time.
4. A region is the smallest unit of distributed storage and load balancing in HBase, and different regions are distributed across different RegionServers.
5. Although a region is the smallest unit of distributed storage, it is not the smallest unit of storage. A region consists of one or more Stores, and each Store holds one column family. Each Store consists of one MemStore and zero or more StoreFiles, and each StoreFile contains an HFile. The MemStore is kept in memory, and the StoreFiles are stored on HDFS.
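Points 1 to 3 above can be sketched as a table that keeps rows sorted and splits a region once it grows past a threshold. This is a simplified illustration: the `Table` and `Region` classes are hypothetical, and the threshold is counted in rows here, which is an assumption made to keep the example small.

```python
import bisect


class Region:
    def __init__(self, start_key, rows=None):
        self.start_key = start_key      # inclusive lower bound of the region
        self.rows = rows or []          # (row_key, value) pairs, sorted by row_key

    def size(self):
        return len(self.rows)


class Table:
    def __init__(self, split_threshold=4):
        self.split_threshold = split_threshold   # hypothetical: counted in rows
        self.regions = [Region(b"")]             # a new table has exactly one region

    def _region_index(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        starts = [r.start_key for r in self.regions]
        return bisect.bisect_right(starts, row_key) - 1

    def put(self, row_key, value):
        idx = self._region_index(row_key)
        region = self.regions[idx]
        keys = [k for k, _ in region.rows]
        pos = bisect.bisect_left(keys, row_key)
        if pos < len(keys) and keys[pos] == row_key:
            region.rows[pos] = (row_key, value)       # overwrite existing row
        else:
            region.rows.insert(pos, (row_key, value)) # keep lexicographic order
        if region.size() >= self.split_threshold:
            self._split(idx)

    def _split(self, idx):
        # Split at the middle row; the upper half becomes a new region.
        region = self.regions[idx]
        mid = region.size() // 2
        upper = Region(region.rows[mid][0], region.rows[mid:])
        region.rows = region.rows[:mid]
        self.regions.insert(idx + 1, upper)


table = Table(split_threshold=4)
for key in [b"d", b"a", b"c", b"b", b"f", b"e"]:
    table.put(key, b"v")
```

After these six writes the single starting region has split twice, and reading the regions in order still yields the rows in RowKey order.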
HBase architecture and basic components
Description of the basic HBase components:
Client
Provides the interfaces for accessing HBase and maintains a cache, for example of region location information, to speed up access to HBase
Master
Assigns regions to RegionServers
Is responsible for load balancing across RegionServers
Detects failed RegionServers and reassigns the regions they hosted
Manages users' create, delete, update, and query operations on tables
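As a rough illustration of the load-balancing duty, the sketch below evens out the number of regions per RegionServer. It is not the algorithm HBase's balancer uses; the `balance` function and the idea of counting regions as the only measure of load are assumptions, and it expects at least one server in the input.

```python
def balance(assignment):
    """Move regions from the busiest to the idlest server until loads differ by at most one."""
    # Work on a copy so the caller's mapping is left untouched.
    assignment = {server: list(regions) for server, regions in assignment.items()}
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return assignment
        assignment[idlest].append(assignment[busiest].pop())


before = {"rs1": ["r1", "r2", "r3", "r4", "r5"], "rs2": ["r6"], "rs3": []}
after = balance(before)
```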
RegionServer
Maintains regions and handles the I/O requests for those regions
Is responsible for splitting regions that grow too large at runtime
Role of ZooKeeper
Through election, guarantees that there is only one Master in the cluster at any time; the Master and the RegionServers register with ZooKeeper when they start
Stores the addressing entry point for all regions
Monitors RegionServers going online and offline in real time and notifies the Master
Stores the HBase schema and table metadata
By default, HBase manages the ZooKeeper instances, for example starting and stopping ZooKeeper
The introduction of ZooKeeper means the Master is no longer a single point of failure.
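The "only one Master at a time" guarantee can be pictured with a toy coordinator. The `Coordinator` class below is a hypothetical stand-in for ZooKeeper, not its API: the first Master to register becomes active, later ones wait as standbys, and a standby takes over when the active Master fails.

```python
class Coordinator:
    """Toy stand-in for ZooKeeper that holds a single 'active master' slot."""

    def __init__(self):
        self.active_master = None
        self.standbys = []

    def register_master(self, name):
        # The first registrant wins the slot; everyone else queues as a standby.
        if self.active_master is None:
            self.active_master = name
        else:
            self.standbys.append(name)
        return self.active_master == name

    def master_failed(self):
        # Promote the oldest standby, if there is one.
        self.active_master = self.standbys.pop(0) if self.standbys else None
        return self.active_master


zk = Coordinator()
first_is_active = zk.register_master("master-a")
second_is_active = zk.register_master("master-b")
new_master = zk.master_failed()
```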
Write-Ahead-Log (WAL)
This mechanism is used for fault tolerance and recovery of data:
Each HRegionServer has one HLog object. HLog is the class that implements the write-ahead log. Each user operation writes a record to the HLog file (see the following HLog file format). The HLog file is rolled periodically: a new file is started, and old files whose data has already been persisted to StoreFiles are deleted. When an HRegionServer terminates unexpectedly, the HMaster learns of it through ZooKeeper. The HMaster first processes the leftover HLog files, splitting the log data by region and placing it in the corresponding region directories, and then reassigns the failed regions. The HRegionServer that receives one of these regions finds, while loading the region, that there is a historical HLog to process, so it replays the HLog data into the MemStore and then flushes it to StoreFiles to complete data recovery.
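The write, flush, and recovery steps can be sketched as follows. This is a simplified simulation under stated assumptions: `RegionServer`, `split_log`, and `recover` are hypothetical names, the log is a plain list rather than a file, and only the log replay is modeled (StoreFiles already written by the failed server are ignored).

```python
from collections import defaultdict


class RegionServer:
    def __init__(self):
        self.hlog = []                          # (region, row, value); stands in for the HLog file
        self.memstore = defaultdict(dict)       # region -> {row: value}; lost on a crash
        self.store_files = defaultdict(list)    # region -> flushed snapshots; stands in for HDFS

    def put(self, region, row, value):
        self.hlog.append((region, row, value))  # write the log record first
        self.memstore[region][row] = value      # then apply the change in memory

    def flush(self, region):
        if self.memstore[region]:
            self.store_files[region].append(dict(self.memstore[region]))
            self.memstore[region].clear()
        # Records already persisted to a store file are no longer needed.
        self.hlog = [entry for entry in self.hlog if entry[0] != region]


def split_log(hlog):
    """Group leftover log records by region, preserving write order."""
    per_region = defaultdict(list)
    for region, row, value in hlog:
        per_region[region].append((row, value))
    return per_region


def recover(region, entries, new_server):
    """Replay one region's records into the new server's MemStore, then flush."""
    for row, value in entries:
        new_server.memstore[region][row] = value
    new_server.flush(region)


failed = RegionServer()
failed.put("r1", b"a", b"1")
failed.put("r2", b"x", b"9")
failed.flush("r1")               # r1's data is persisted, so its log records are dropped
failed.put("r1", b"b", b"2")     # exists only in the MemStore and the HLog

# Simulated crash: the MemStore is gone, but the HLog survives.
leftover = split_log(failed.hlog)
replacement = RegionServer()
for region, entries in leftover.items():
    recover(region, entries, replacement)
```

Only the writes that had not yet been flushed are replayed, which is why the log can safely drop records once their data is in a StoreFile.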
HBase fault tolerance
Master fault tolerance: ZooKeeper elects a new Master
While there is no Master, data reads continue as usual.
While there is no Master, region splitting and load balancing cannot be carried out.
RegionServer fault tolerance: each RegionServer regularly reports a heartbeat to ZooKeeper. If a heartbeat does not arrive in time, the Master reassigns the regions on that RegionServer to other RegionServers.
The write-ahead log on the failed server is split by the Master and dispatched to the new RegionServers
ZooKeeper fault tolerance: ZooKeeper is a reliable service, generally configured with 3 or 5 ZooKeeper instances
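The RegionServer case can be sketched as a timeout check followed by reassignment. The function names, the explicit `now` and `timeout` parameters, and the round-robin placement are all assumptions made for illustration; they are not HBase or ZooKeeper settings.

```python
def expired_servers(last_heartbeat, now, timeout):
    """Return the servers whose last heartbeat is older than `timeout` seconds."""
    return sorted(s for s, seen in last_heartbeat.items() if now - seen > timeout)


def reassign_regions(assignment, failed):
    """Move the regions of failed servers onto the surviving servers, round-robin."""
    survivors = sorted(s for s in assignment if s not in failed)
    if not survivors:
        raise RuntimeError("no live RegionServer left")
    result = {s: list(assignment[s]) for s in survivors}
    orphaned = [r for s in sorted(failed) for r in assignment.get(s, [])]
    for i, region in enumerate(orphaned):
        result[survivors[i % len(survivors)]].append(region)
    return result


last_heartbeat = {"rs1": 100.0, "rs2": 91.0, "rs3": 99.5}
failed = expired_servers(last_heartbeat, now=101.0, timeout=5.0)

assignment = {"rs1": ["a"], "rs2": ["b", "c", "d"], "rs3": ["e"]}
new_assignment = reassign_regions(assignment, failed)
```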
Region location process:
Finding the RegionServer
ZooKeeper --> -ROOT- (single region) --> .META. --> user table
-ROOT-
Contains the list of regions of the .META. table; the -ROOT- table itself only ever has one region
The location of the -ROOT- table is recorded in ZooKeeper.
.META.
Contains the list of all user-space regions, together with the addresses of their RegionServers.
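The lookup chain, together with the client-side cache of region locations, can be sketched as below. The data is made up, the `"root-region-server"` key is a hypothetical name, and two simplifications are assumed: the catalog tables are keyed by row key for a single user table, and the cache is keyed by individual row key rather than by region range.

```python
import bisect

# Step 1 source: ZooKeeper records where the -ROOT- region lives.
ZOOKEEPER = {"root-region-server": "rs1"}

# -ROOT- (single region): start key of each .META. region -> server hosting it.
ROOT = [(b"", "rs2"), (b"m", "rs3")]

# .META. regions by hosting server: start key of each user region -> its RegionServer.
META = {
    "rs2": [(b"", "rs4"), (b"g", "rs5")],
    "rs3": [(b"m", "rs6"), (b"t", "rs4")],
}


def find(entries, key):
    """Return the target of the last entry whose start key is <= key."""
    starts = [start for start, _ in entries]
    return entries[bisect.bisect_right(starts, key) - 1][1]


class Client:
    def __init__(self):
        self.cache = {}      # row key -> RegionServer, so repeated lookups are skipped
        self.lookups = 0     # number of full three-step lookups performed

    def locate(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]
        self.lookups += 1
        _root_server = ZOOKEEPER["root-region-server"]       # step 1: ask ZooKeeper
        meta_server = find(ROOT, row_key)                    # step 2: read -ROOT-
        region_server = find(META[meta_server], row_key)     # step 3: read .META.
        self.cache[row_key] = region_server
        return region_server


client = Client()
first = client.locate(b"h")
second = client.locate(b"z")
again = client.locate(b"h")      # served from the client cache
```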
HBase usage scenarios
Storing large amounts of data (100s of TBs)
Need high write throughput
Need efficient random access (key lookups) within large datasets
Need to scale gracefully with data
For structured and semi-structured data
Don't need full RDBMS capabilities (cross-row/cross-table transactions, joins, etc.)
Large-volume data storage with a high level of concurrent operations
Random read and write operations are required on the data
Read and write access patterns are very simple.
Comparison between HBase and HDFS
Both have good fault tolerance and scalability, and both can scale to hundreds of nodes.
HDFS is suitable for batch-processing scenarios
It does not support random lookup of data.
It is not suitable for incremental data processing
It does not support data updates