Introduction to HBase

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Overview

HBase is a distributed column storage system built on HDFS.

HBase is modeled on Google's Bigtable and is a typical key/value system.

HBase is an important member of the Apache Hadoop ecosystem and is mainly used to store massive amounts of structured data.

Logically, HBase stores data by table, row, and column.

Like Hadoop, HBase relies mainly on scaling out: computing and storage capacity grow by adding inexpensive commodity servers.

Characteristics of HBase tables

Large: a table can have billions of rows and millions of columns

Schemaless: each row has a sortable primary key and any number of columns. Columns can be added dynamically as needed, and different rows in the same table can have different columns.

Column-oriented: storage and access control are organized by column (family), and each column family is retrieved independently

Sparse: null columns do not take up storage space, and tables can be designed to be very sparse

Multiple versions of data: there can be multiple versions of data in each cell. By default, the version number is automatically assigned, which is the timestamp when the cell is inserted.

Single data type: all data in HBase is stored as uninterpreted byte strings and has no type.

HBase data model

HBase logical view

[Figure: HBase logical view, annotated in English]

Basic concepts of HBase

RowKey: a byte array that serves as the "primary key" of each record in the table and makes fast lookups possible. Row key design is very important.

ColumnFamily: a column family, which has a name (a string) and contains one or more related columns

Column: belongs to a column family and is addressed as familyName:columnName; columns can be added dynamically for each record

VersionNumber: of type Long; defaults to the system timestamp and can be set by the user

Value (Cell): a byte array
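The concepts above can be pictured as a nested, sorted map. The following is a minimal in-memory sketch, not HBase's actual implementation: the class name SketchTable, its methods, and the max_versions default are all made up for illustration.

```python
import time


class SketchTable:
    """Toy model: row key -> "family:qualifier" -> {version: value}."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are fixed up front
        self.max_versions = max_versions  # versions kept per cell (assumed >= 1)
        self.rows = {}                    # bytes -> {str -> {int -> bytes}}

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # default version: current time in ms
        versions = self.rows.setdefault(row, {}).setdefault(column, {})
        versions[ts] = value
        # Keep only the newest max_versions versions of the cell.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column, ts=None):
        versions = self.rows.get(row, {}).get(column, {})
        if not versions:
            return None                   # sparse: absent cells store nothing
        if ts is None:
            return versions[max(versions)]  # newest version wins by default
        return versions.get(ts)

    def scan(self):
        # Rows come back in lexicographic order of their byte-array keys.
        for row in sorted(self.rows):
            yield row, self.rows[row]


table = SketchTable(families=["info", "stats"])
table.put(b"row-002", "info:name", b"bob", ts=1)
table.put(b"row-001", "info:name", b"alice", ts=1)
table.put(b"row-001", "info:name", b"alice2", ts=2)
table.put(b"row-001", "stats:visits", b"7", ts=1)
```

Here row-002 never receives a stats:visits column, so nothing is stored for it, and reading info:name for row-001 without a version returns the value written with the larger timestamp.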

HBase physical model

Each column family is stored in its own files on HDFS, and null values are not saved.

The row key and version number are stored with each column family.

HBase maintains a multi-level index for each value, namely: row key, column family, column name, and timestamp.

Physical storage:

1. All rows in a table are sorted in lexicographic order of their row keys.

2. A table is divided into multiple regions along the row direction.

3. Regions are split by size. Each table starts with a single region. As data accumulates, the region grows, and when it reaches a threshold it splits into two new regions, so the number of regions keeps increasing.

4. The region is the smallest unit of distributed storage and load balancing in HBase, and different regions are distributed across different RegionServers.

5. Although the region is the smallest unit of distribution, it is not the smallest unit of storage. A region consists of one or more Stores, each holding one column family. Each Store consists of one MemStore and zero or more StoreFiles, and each StoreFile wraps an HFile. The MemStore is kept in memory, while StoreFiles are stored on HDFS.
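Points 1 to 3 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not HBase's split logic: the names Region and SketchRegionedTable are invented, row count stands in for region size in bytes, and the split point is simply the middle row key.

```python
import bisect


class Region:
    def __init__(self, start_key=b"", rows=None):
        self.start_key = start_key  # inclusive lower bound; b"" = start of table
        self.rows = rows or {}      # row key -> value

    def size(self):
        return len(self.rows)       # row count stands in for bytes on disk


class SketchRegionedTable:
    def __init__(self, split_threshold=4):
        self.split_threshold = split_threshold
        self.regions = [Region()]   # one region at first, kept sorted by start key

    def _locate(self, row):
        # The owning region is the last one whose start key is <= row.
        starts = [r.start_key for r in self.regions]
        return bisect.bisect_right(starts, row) - 1

    def put(self, row, value):
        idx = self._locate(row)
        region = self.regions[idx]
        region.rows[row] = value
        if region.size() > self.split_threshold:
            self._split(idx)

    def _split(self, idx):
        region = self.regions[idx]
        keys = sorted(region.rows)
        mid = keys[len(keys) // 2]  # middle row key becomes the split point
        upper = Region(mid, {k: region.rows.pop(k) for k in keys if k >= mid})
        self.regions.insert(idx + 1, upper)


demo = SketchRegionedTable(split_threshold=4)
for i in range(10):
    demo.put(b"row-%02d" % i, b"v")
print([r.start_key for r in demo.regions])
```

Because ordering is lexicographic on bytes, b"row-10" sorts before b"row-2"; the zero-padded keys in the demo avoid that surprise, which is one reason row key design matters.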

HBase architecture and basic components

Description of HBase basic components:

Client

Provides the interfaces for accessing HBase and maintains a cache, for example of region location information, to speed up access

Master

Assigns regions to RegionServers

Balances load across RegionServers

Detects failed RegionServers and reassigns the regions they were serving

Handles user operations that create, delete, modify, and look up tables

RegionServer

Maintains regions and handles I/O requests for them

Splits regions that grow too large during operation

ZooKeeper's role

Through election, guarantees that the cluster has only one Master at any time; the Master and the RegionServers register with ZooKeeper when they start

Stores the addressing entry point for all regions

Monitors RegionServers coming online and going offline in real time, and notifies the Master

Stores HBase schema and table metadata

By default, HBase manages the ZooKeeper instance, for example starting and stopping ZooKeeper

Introducing ZooKeeper means the Master is no longer a single point of failure.

Write-Ahead-Log (WAL)

This mechanism is used for fault tolerance and recovery of data:

Each HRegionServer has one HLog object; HLog is the class that implements the write-ahead log. Every user operation writes a record to the HLog file. The HLog file is periodically rolled over to a new file, and old files whose data has already been persisted to StoreFiles are deleted. When an HRegionServer terminates unexpectedly, the HMaster detects this through ZooKeeper. The HMaster first processes the leftover HLog file, splitting the log entries of different regions into the corresponding region directories, and then reassigns the regions that became unavailable. An HRegionServer that receives one of these regions finds, while loading it, that there is a historical HLog to process. It replays the HLog data into the MemStore and then flushes it to StoreFiles, which completes data recovery.
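The write, crash, and recovery sequence described above can be sketched as follows. This is a toy model under stated assumptions rather than HBase code: SketchRegionServer, split_log, and recover are invented names, a shared dict stands in for HDFS, a Python list stands in for the HLog file, and log cleanup is simplified to dropping a region's entries as soon as that region is flushed.

```python
class SketchRegionServer:
    def __init__(self, hdfs, flush_threshold=3):
        self.hdfs = hdfs          # region -> list of dicts; stands in for StoreFiles
        self.flush_threshold = flush_threshold
        self.wal = []             # (seq, region, row, value); stands in for the HLog
        self.memstore = {}        # region -> {row: value}; lost on a crash
        self.seq = 0

    def put(self, region, row, value):
        self.seq += 1
        self.wal.append((self.seq, region, row, value))    # 1. write the log
        self.memstore.setdefault(region, {})[row] = value  # 2. then memory
        if len(self.memstore[region]) >= self.flush_threshold:
            self.flush(region)

    def flush(self, region):
        data = self.memstore.pop(region, {})
        if data:
            self.hdfs.setdefault(region, []).append(dict(sorted(data.items())))
        # Entries for this region are now durable, so the log can drop them.
        self.wal = [e for e in self.wal if e[1] != region]

    def get(self, region, row):
        if row in self.memstore.get(region, {}):
            return self.memstore[region][row]
        for storefile in reversed(self.hdfs.get(region, [])):  # newest first
            if row in storefile:
                return storefile[row]
        return None

    def crash(self):
        """Simulate an unexpected stop: memory is lost, the log survives."""
        self.memstore = {}
        return list(self.wal)


def split_log(wal):
    """Group leftover log entries by region (the Master's step in the text)."""
    per_region = {}
    for entry in wal:
        per_region.setdefault(entry[1], []).append(entry)
    return per_region


def recover(server, region, entries):
    """Replay one region's entries into the MemStore in order, then flush."""
    for _, _, row, value in sorted(entries):
        server.memstore.setdefault(region, {})[row] = value
    server.flush(region)


hdfs = {}
rs1 = SketchRegionServer(hdfs)
rs1.put("region-a", b"k1", b"v1")
rs1.put("region-b", b"k9", b"v9")
leftover = rs1.crash()                  # nothing was flushed yet
rs2 = SketchRegionServer(hdfs)
for region, entries in split_log(leftover).items():
    recover(rs2, region, entries)
```

After the loop, rs2 can serve both rows from store files even though rs1 never flushed them, because the log was written before the in-memory update.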

HBase fault tolerance

Master fault tolerance: ZooKeeper elects a new Master

While there is no Master, data reads continue as usual.

While there is no Master, region splitting and load balancing cannot be carried out.

RegionServer fault tolerance: each RegionServer periodically reports a heartbeat to ZooKeeper. If the heartbeat does not arrive in time, the Master reassigns the regions on that RegionServer to other RegionServers.

The write-ahead log on the failed server is split by the Master and dispatched to the new RegionServers.

ZooKeeper fault tolerance: ZooKeeper is a reliable service and is generally deployed with 3 or 5 instances
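The RegionServer failure handling described above comes down to two steps: notice a missed heartbeat, then move the affected regions. A minimal sketch follows, assuming plain numeric timestamps and a "fewest regions first" placement rule; both function names and the placement policy are illustrative assumptions, not HBase's actual balancer.

```python
def find_dead_servers(last_heartbeat, now, timeout):
    """Return servers whose most recent heartbeat is older than the timeout."""
    return sorted(s for s, t in last_heartbeat.items() if now - t > timeout)


def reassign_regions(assignment, dead, live):
    """Move each region of a dead server to the live server with the fewest regions."""
    new_assignment = {s: list(assignment.get(s, [])) for s in live}
    for server in dead:
        for region in assignment.get(server, []):
            # Ties are broken by server name so the result is deterministic.
            target = min(live, key=lambda s: (len(new_assignment[s]), s))
            new_assignment[target].append(region)
    return new_assignment


heartbeats = {"rs1": 100, "rs2": 95, "rs3": 60}
regions = {"rs1": ["a", "b"], "rs2": ["c"], "rs3": ["d", "e"]}

dead = find_dead_servers(heartbeats, now=100, timeout=30)
live = [s for s in regions if s not in dead]
print(reassign_regions(regions, dead, live))
```

In this example rs3 is the only server past the timeout, so its two regions are spread over rs1 and rs2.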

Region location process:

Finding the RegionServer

ZooKeeper --> -ROOT- (single region) --> .META. --> user table

-ROOT-

Contains the list of regions of the .META. table; the -ROOT- table itself always has only one region

The location of the -ROOT- table is recorded in ZooKeeper.

.META.

Contains the list of all user-space regions, along with the addresses of the RegionServers that host them.
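The three-step lookup above, together with the client-side cache mentioned under Client, can be sketched like this. Everything concrete here is a hypothetical illustration: the server names, the "root_location" key, the simplified "table,startrow" catalog key, and the SketchClient class are assumptions, not HBase's real catalog layout.

```python
import bisect


def lookup(entries, key):
    """Return the value of the last entry whose start key is <= key."""
    starts = [start for start, _ in entries]
    return entries[bisect.bisect_right(starts, key) - 1][1]


# Hypothetical cluster state, invented for this example.
ZOOKEEPER = {"root_location": "rs1"}                     # where -ROOT- lives
ROOT = {"rs1": [(b"", "rs2"), (b"users,", "rs3")]}       # -ROOT-: key range -> .META. server
META = {
    "rs2": [(b"orders,", "rs4"), (b"orders,m", "rs5")],  # .META. region hosted on rs2
    "rs3": [(b"users,", "rs6")],                         # .META. region hosted on rs3
}


class SketchClient:
    def __init__(self):
        self.cache = {}   # (table, row) -> server; a real cache would store key ranges
        self.lookups = 0  # number of full three-step lookups performed

    def locate(self, table, row):
        if (table, row) in self.cache:
            return self.cache[(table, row)]
        self.lookups += 1
        key = table + b"," + row
        root_server = ZOOKEEPER["root_location"]        # step 1: ZooKeeper -> -ROOT-
        meta_server = lookup(ROOT[root_server], key)    # step 2: -ROOT- -> .META.
        region_server = lookup(META[meta_server], key)  # step 3: .META. -> user region
        self.cache[(table, row)] = region_server
        return region_server


client = SketchClient()
print(client.locate(b"orders", b"k"))
print(client.locate(b"orders", b"k"))  # second call is answered from the cache
```

The sketch does not handle tables missing from the catalog or stale cache entries after a region moves; a real client has to refresh its cache when a cached location turns out to be wrong.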

HBase usage scenarios

Storing large amounts of data (100s of TBs)

Need high write throughput

Need efficient random access (key lookups) within large datasets

Need to scale gracefully with data

For structured and semi-structured data

Don't need full RDBMS capabilities (cross-row/cross-table transactions, joins, etc.)

Storing large volumes of data under highly concurrent access

Random reads and writes on the data are required

Read and write access consists of very simple operations.

Comparison between HBase and HDFS

Both have good fault tolerance and scalability, and can scale to hundreds of nodes.

HDFS is suited to batch processing scenarios

HDFS does not support random lookups of data.

HDFS is not suitable for incremental data processing

HDFS does not support updating data in place
