2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
1. HBase
A highly reliable, high-performance, column-oriented, scalable, distributed open-source database built on HDFS. It is mainly used to store massive data, with MapReduce used to process the data in HBase and ZooKeeper used as a coordination service. Reads and writes are simple, but complex conditional queries are not supported.
2. Comparison between HBase and HDFS
Both offer good fault tolerance and scalability.
HDFS is suitable for batch-processing scenarios, but it does not support random data lookups, incremental data processing, or in-place data updates.
3. Characteristics of HBase:
Massive data: a table can support millions of columns and is split into multiple regions
Schemaless: each row has a sortable row key and an arbitrary number of columns; columns can be added dynamically as needed, and different rows in the same table can have different sets of columns
Column-oriented storage: storage and permission control are organized per column family, and columns can be retrieved independently
Sparse: empty (NULL) columns take up no storage space
Multiple versions of data: each cell can hold multiple versions of a value, distinguished by timestamp
Single data type: all data is stored as uninterpreted byte arrays (strings)
4. HBase structure composition
Row key:
A byte array
The primary key of each record in the table
Sorted storage enables fast lookups
Timestamp:
The timestamp of each data operation, used as the version number of the data
Column Family:
Has a name (a string)
Contains one or more related columns (Column)
Column: belongs to a column family and is addressed as family:qualifier
Value: an uninterpreted byte array
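The logical layout above can be sketched as a nested map: each cell is addressed by (row key, column family:qualifier, timestamp), and a read returns the newest version by default. This is an illustrative simulation with invented names (ToyTable), not the real HBase storage format or client API.

```python
# Toy model of HBase's logical data model: a cell is addressed by
# (row key, "family:qualifier", timestamp) and holds raw bytes.
class ToyTable:
    def __init__(self):
        # row key -> {"family:qualifier" -> {timestamp -> value}}
        self.rows = {}

    def put(self, row, column, value, ts):
        cell = self.rows.setdefault(row, {}).setdefault(column, {})
        cell[ts] = value  # each timestamp is a distinct version

    def get(self, row, column):
        """Return the most recent version, mirroring HBase's default get."""
        versions = self.rows[row][column]
        return versions[max(versions)]

t = ToyTable()
t.put("user1", "info:name", b"alice", ts=1)
t.put("user1", "info:name", b"alicia", ts=2)  # newer version of the same cell
print(t.get("user1", "info:name"))  # b'alicia'
```

Because versions are kept side by side under their timestamps, old values remain readable until a real store would compact them away.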
5. Supported operation
All operations are keyed by rowkey
Supports CRUD (create, read, update, delete) operations: put, get, multiput, scan
There is no built-in join operation; joins can be implemented with MapReduce
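Because all access is keyed by rowkey and rows are stored in sorted order, a get is a key lookup and a scan is a range walk. The sketch below illustrates that idea with invented names (toy_scan, the sample rows); it is not the HBase API.

```python
# Rows are kept sorted lexicographically by row key, so a range scan
# can pick out a contiguous slice without touching unrelated rows.
rows = {
    "user#001": {"info:name": "alice"},
    "user#002": {"info:name": "bob"},
    "user#003": {"info:name": "carol"},
    "video#001": {"info:title": "intro"},
}

def toy_scan(table, start_row, stop_row):
    # Like HBase, scan the half-open key range [start_row, stop_row)
    for key in sorted(table):
        if start_row <= key < stop_row:
            yield key, table[key]

# All "user" rows, without touching the "video" rows:
user_rows = [k for k, _ in toy_scan(rows, "user#", "user#~")]
print(user_rows)  # ['user#001', 'user#002', 'user#003']
```

This is why rowkey design matters in practice: keys that should be scanned together must sort together.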
6. HBase maintains a multi-level index for each value, namely <key, column family, column qualifier, timestamp>
All rows in a table are sorted lexicographically by row key and split horizontally, by row ranges, into multiple regions. A region is the smallest unit of distributed storage and load balancing in HBase, and different regions are distributed to different RegionServers.
When a region grows beyond a threshold, it is split into two new regions, so the number of regions increases over time.
A region consists of one or more stores, and each store holds one column family
Each store consists of one MemStore and zero or more StoreFiles
The MemStore resides in memory; StoreFiles are stored on HDFS
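The split behavior described above can be simulated in a few lines: each region holds a sorted range of row keys, and once it exceeds a threshold it splits at its middle key into two daughter regions. The threshold and split policy here are simplified assumptions (real HBase splits by store-file size in bytes, not row count), and all names are invented for the example.

```python
import bisect

SPLIT_THRESHOLD = 4  # simplified: real HBase splits by store-file size

def insert(regions, key):
    """regions: list of sorted key lists, ordered by their key ranges."""
    # Find the last region whose start key is <= this key.
    idx = 0
    for i, r in enumerate(regions):
        if r and r[0] <= key:
            idx = i
    bisect.insort(regions[idx], key)
    if len(regions[idx]) > SPLIT_THRESHOLD:
        # Split at the middle row key into two daughter regions.
        mid = len(regions[idx]) // 2
        r = regions.pop(idx)
        regions.insert(idx, r[:mid])
        regions.insert(idx + 1, r[mid:])

regions = [[]]
for k in ["row5", "row1", "row9", "row3", "row7", "row2"]:
    insert(regions, k)
print(regions)  # [['row1', 'row2', 'row3'], ['row5', 'row7', 'row9']]
```

After the split, each daughter region covers a contiguous, non-overlapping key range, which is what lets a master spread regions across servers for load balancing.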
7. Basic components of HBase
Client: communicates with HMaster and HRegionServer via RPC, and maintains a cache to speed up access to HBase
Zookeeper:
Ensures there is exactly one active HMaster in the cluster
Stores the addressing entry point for all regions
Monitors RegionServer online/offline status in real time and notifies the HMaster
Stores the schema and table metadata for HBase
HMaster:
Assigns regions to RegionServers
Responsible for load balancing across RegionServers
Discovers failed RegionServers and reassigns their regions
Manages users' add, delete, modify, and query operations on tables
HRegionServer:
Maintains regions and processes I/O requests to those regions
Responsible for splitting regions that grow too large during operation
8.HBase fault-tolerant mechanism
HMaster fault tolerance: ZooKeeper elects a new HMaster
While there is no HMaster, data reads still proceed as usual.
While there is no HMaster, region splitting and load balancing cannot be performed.
HRegionServer fault tolerance: each RegionServer periodically reports a heartbeat to ZooKeeper; if no heartbeat arrives within a certain period:
The HMaster reassigns the regions on that HRegionServer to other RegionServers
The write-ahead log on the failed server is split by the master and distributed to the new RegionServers
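The recovery step above works because every edit is appended to a durable write-ahead log before it is applied in memory, so a new server can rebuild the lost in-memory state by replaying the log. The sketch below illustrates that pattern; the names (wal, memstore, put) are invented for the example and this is not HBase's actual implementation.

```python
wal = []        # durable write-ahead log (survives the "crash")
memstore = {}   # in-memory state (lost on crash)

def put(row, value):
    wal.append((row, value))   # 1. persist the edit to the log first
    memstore[row] = value      # 2. then apply it in memory

put("r1", "a")
put("r2", "b")
put("r1", "c")

memstore = {}                  # simulate a RegionServer crash: memory lost

for row, value in wal:         # recovery: replay the log on the new server
    memstore[row] = value

print(memstore)  # {'r1': 'c', 'r2': 'b'}
```

Replaying edits in log order reproduces the final state exactly, including the fact that the later write to r1 wins.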
9. HBase access method
Native Java API: conventional and efficient
Create a Configuration object (loads configuration from hbase-default.xml and hbase-site.xml)
Configuration conf = HBaseConfiguration.create();
Build an HTable handle (pass the Configuration object and the name of the table to access)
HTable table = new HTable(conf, tableName);
Provides only row-level transactions: strict row consistency, concurrent reads, sequential writes
Perform operations (put, get, delete, scan, etc., with batching supported)
table.getTableName();
Close the HTable handle (flushes in-memory data to disk and frees resources)
table.close();
HBase Shell: for management
Thrift Gateway: uses Thrift serialization and supports languages such as C++, PHP, and Python
Start the Thrift server: hbase-daemon.sh start thrift
Generate the HBase Thrift client interface files
thrift --gen xxx Hbase.thrift
Write the client code
E.g.: 1. thrift --gen py Hbase.thrift
2. ${HBASE_HOME}/src/examples/thrift/DemoClient.py
3. python DemoClient.py
REST Gateway: REST-style HTTP API
MapReduce: use MapReduce jobs to process HBase data
Provides APIs such as TableMapper, TableReducer, TableInputFormat, and TableOutputFormat
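The MapReduce pattern those classes implement can be shown with a toy job: a map step emits (key, 1) per cell, a shuffle groups by key, and a reduce step sums each group. In the real API, TableMapper and TableReducer play the map and reduce roles; the data and helper names below are invented for this sketch.

```python
from collections import defaultdict

table = [
    ("row1", {"cf:type": "video"}),
    ("row2", {"cf:type": "article"}),
    ("row3", {"cf:type": "video"}),
]

def mapper(row_key, columns):
    # Emit one (value, 1) pair per cell, like a TableMapper would.
    for value in columns.values():
        yield value, 1

def reducer(key, counts):
    # Sum the counts for one key, like a TableReducer would.
    return key, sum(counts)

# Shuffle: group mapper output by key, then reduce each group.
groups = defaultdict(list)
for row_key, columns in table:
    for k, v in mapper(row_key, columns):
        groups[k].append(v)

result = dict(reducer(k, vs) for k, vs in groups.items())
print(result)  # {'video': 2, 'article': 1}
```

The same grouping step is also how a join over HBase data is typically built with MapReduce: map both inputs to a shared key, then combine each group in the reducer.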
10. Main usage of the HBase Shell
Enter the console: bin/hbase shell
Create a table: create 'TABLE_NAME', 'COLUMN_FAMILY1', 'COLUMN_FAMILY2', ..., 'COLUMN_FAMILYN'
Add a record: put 'TABLE_NAME', 'ROW_KEY', 'COLUMN_FAMILY:COLUMN_NAME', 'VALUE'
View a record: get 'TABLE_NAME', 'ROW_KEY' # returns the most recent version by default
Count records: count 'TABLE_NAME'
Delete a table:
disable 'TABLE_NAME'
drop 'TABLE_NAME'
Delete records:
delete 'TABLE_NAME', 'ROW_KEY', 'COLUMN_FAMILY:COLUMN_NAME' # delete one cell
deleteall 'TABLE_NAME', 'ROW_KEY' # delete the whole row
Delete a column family:
disable 'TABLE_NAME'
alter 'TABLE_NAME', {NAME => 'tab1_add', METHOD => 'delete'}
enable 'TABLE_NAME'
Full table scan:
scan 'TABLE_NAME'
Scan all data in a column family: scan 'TABLE_NAME', {COLUMNS => 'COLUMN_FAMILY'}
List all tables: list
View server status: status
View the HBase version: version
View a table's structure: describe 'TABLE_NAME'
Check whether a table exists: exists 'TABLE_NAME'
Check whether a table is enabled: is_enabled 'TABLE_NAME'
Truncate a table: truncate 'TABLE_NAME'