2025-04-01 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report--
HBase installation, architecture, and usage
Install HBase:
1. Download the installation package hbase-1.0.0-bin.tar.gz and create the working directories:
mkdir /data/hbase
mkdir /data/hbase/tmp
hadoop dfs -mkdir /user/hadoop/rongxin/hbase
2. Decompress the package:
[hadoop@master hbase]$ tar -zxvf hbase-1.0.0-bin.tar.gz
3. Go to the lib directory of HBase and replace the Hadoop jar packages with the versions matching your Hadoop installation
4. Modify the configuration files:
hbase-env.sh
hbase-site.xml
[hadoop@master hbase]$ vim hbase-1.0.0/conf/regionservers
slave1
slave2
vim /etc/profile
export HBASE_HOME=/data/hbase/hbase-1.0.0
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
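The article lists hbase-env.sh and hbase-site.xml but does not show their contents. A minimal hbase-site.xml sketch for this master/slave1/slave2 layout might look like the following; the HDFS port (9000) and the exact property values are assumptions, so adjust them to your cluster:

```xml
<configuration>
  <!-- Root directory for HBase data on HDFS (path created above; port 9000 assumed) -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/user/hadoop/rongxin/hbase</value>
  </property>
  <!-- Run in fully distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum hosts -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <!-- Local temporary directory created earlier -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>/data/hbase/tmp</value>
  </property>
</configuration>
```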
Start HBase:
[hadoop@master hbase]$ start-hbase.sh
Check through jps whether it has started:
[hadoop@master hbase]$ jps
32211 HMaster
[hadoop@slave1 lib]$ jps
20584 HRegionServer
[hadoop@slave2 lib]$ jps
2698 HRegionServer
Check through the web UI whether HBase has started:
http://master:16030/master-status
Common HBase command line tools:
Name                         Command expression
Create table                 create 'table name', 'column family 1', 'column family 2', ..., 'column family N'
Add record                   put 'table name', 'row key', 'column family:column', 'value'
View record                  get 'table name', 'row key'
Count records in a table     count 'table name'
Delete record                delete 'table name', 'row key', 'column family:column'
Delete table                 a table must be disabled before it can be dropped:
                             step 1: disable 'table name'; step 2: drop 'table name'
View all records             scan 'table name'
View all data in one column  scan 'table name', ['column family:column']
To update a record, simply put it again; the new value overwrites the old one.
Below are descriptions of common commands; type help in the HBase shell for help. This article introduces the first three command groups; the last two will be introduced in the next blog post.
COMMAND GROUPS:
Group name: general
Commands: status, version
Group name: ddl
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Group name: replication
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
I. General operation
1. Query server status
hbase(main):024:0> status
3 servers, 0 dead, 1.0000 average load
2. Query the HBase version
hbase(main):025:0> version
0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011
II. DDL operations
1. Create a table
hbase(main):011:0> create 'member', 'member_id', 'address', 'info'
0 row(s) in 1.2210 seconds
2. Get a description of the table
hbase(main):012:0> list
TABLE
member
1 row(s) in 0.0160 seconds
hbase(main):006:0> describe 'member'
DESCRIPTION                                                            ENABLED
{NAME => 'member', FAMILIES => [{NAME => 'address', BLOOMFILTER =>     true
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'member_id', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'info', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0230 seconds
3. Delete a column family: alter, disable, enable
We created three column families earlier, but the column family member_id is redundant because it is the primary key, so we need to delete it.
hbase(main):003:0> alter 'member', {NAME => 'member_id', METHOD => 'delete'}
ERROR: Table member is enabled. Disable it first before altering.
An error is reported: when deleting a column family, the table must first be disabled.
hbase(main):004:0> disable 'member'
0 row(s) in 2.0390 seconds
hbase(main):005:0> alter 'member', {NAME => 'member_id', METHOD => 'delete'}
0 row(s) in 0.0560 seconds
hbase(main):006:0> describe 'member'
DESCRIPTION                                                            ENABLED
{NAME => 'member', FAMILIES => [{NAME => 'address', BLOOMFILTER =>     false
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'info', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0230 seconds
The column family has been deleted; now re-enable the table:
hbase(main):008:0> enable 'member'
0 row(s) in 2.0420 seconds
4. List all the tables
hbase(main):028:0> list
TABLE
member
temp_table
2 row(s) in 0.0150 seconds
5. Drop a table
hbase(main):029:0> disable 'temp_table'
0 row(s) in 2.0590 seconds
hbase(main):030:0> drop 'temp_table'
0 row(s) in 1.1070 seconds
6. Query whether the table exists
hbase(main):021:0> exists 'member'
Table member does exist
0 row(s) in 0.1610 seconds
7. Determine whether the table is enabled
hbase(main):034:0> is_enabled 'member'
true
0 row(s) in 0.0110 seconds
8. Determine whether the table is disabled
hbase(main):032:0> is_disabled 'member'
false
0 row(s) in 0.0110 seconds
III. DML operations
1. Insert several records
put 'member', 'scutshuxue', 'info:age', '24'
put 'member', 'scutshuxue', 'info:birthday', '1987-06-17'
put 'member', 'scutshuxue', 'info:company', 'alibaba'
put 'member', 'scutshuxue', 'address:contry', 'china'
put 'member', 'scutshuxue', 'address:province', 'zhejiang'
put 'member', 'scutshuxue', 'address:city', 'hangzhou'
put 'member', 'xiaofeng', 'info:birthday', '1987-4-17'
put 'member', 'xiaofeng', 'info:favorite', 'movie'
put 'member', 'xiaofeng', 'info:company', 'alibaba'
put 'member', 'xiaofeng', 'address:contry', 'china'
put 'member', 'xiaofeng', 'address:province', 'guangdong'
put 'member', 'xiaofeng', 'address:city', 'jieyang'
put 'member', 'xiaofeng', 'address:town', 'xianqiao'
2. Get a record
Get all the data for a row key:
hbase(main):001:0> get 'member', 'scutshuxue'
COLUMN                CELL
address:city          timestamp=1321586240244, value=hangzhou
address:contry        timestamp=1321586239126, value=china
address:province      timestamp=1321586239197, value=zhejiang
info:age              timestamp=1321586238965, value=24
info:birthday         timestamp=1321586239015, value=1987-06-17
info:company          timestamp=1321586239071, value=alibaba
6 row(s) in 0.4720 seconds
Get all the data in one column family for a row key:
hbase(main):002:0> get 'member', 'scutshuxue', 'info'
COLUMN                CELL
info:age              timestamp=1321586238965, value=24
info:birthday         timestamp=1321586239015, value=1987-06-17
info:company          timestamp=1321586239071, value=alibaba
3 row(s) in 0.0210 seconds
Get the data in one column of a column family for a row key:
hbase(main):002:0> get 'member', 'scutshuxue', 'info:age'
COLUMN                CELL
info:age              timestamp=1321586238965, value=24
1 row(s) in 0.0320 seconds
Update a record:
Change the age of scutshuxue to 99:
hbase(main):004:0> put 'member', 'scutshuxue', 'info:age', '99'
0 row(s) in 0.0210 seconds
hbase(main):005:0> get 'member', 'scutshuxue', 'info:age'
COLUMN                CELL
info:age              timestamp=1321586571843, value=99
1 row(s) in 0.0180 seconds
3. Get two versions of data through timestamp
hbase(main):010:0> get 'member', 'scutshuxue', {COLUMN => 'info:age', TIMESTAMP => 1321586238965}
COLUMN                CELL
info:age              timestamp=1321586238965, value=24
1 row(s) in 0.0140 seconds
hbase(main):011:0> get 'member', 'scutshuxue', {COLUMN => 'info:age', TIMESTAMP => 1321586571843}
COLUMN                CELL
info:age              timestamp=1321586571843, value=99
1 row(s) in 0.0180 seconds
4. Full table scan:
hbase(main):013:0> scan 'member'
ROW           COLUMN+CELL
scutshuxue    column=address:city, timestamp=1321586240244, value=hangzhou
scutshuxue    column=address:contry, timestamp=1321586239126, value=china
scutshuxue    column=address:province, timestamp=1321586239197, value=zhejiang
scutshuxue    column=info:age, timestamp=1321586571843, value=99
scutshuxue    column=info:birthday, timestamp=1321586239015, value=1987-06-17
scutshuxue    column=info:company, timestamp=1321586239071, value=alibaba
temp          column=info:age, timestamp=1321589609775, value=59
xiaofeng      column=address:city, timestamp=1321586248400, value=jieyang
xiaofeng      column=address:contry, timestamp=1321586248316, value=china
xiaofeng      column=address:province, timestamp=1321586248355, value=guangdong
xiaofeng      column=address:town, timestamp=1321586249564, value=xianqiao
xiaofeng      column=info:birthday, timestamp=1321586248202, value=1987-4-17
xiaofeng      column=info:company, timestamp=1321586248277, value=alibaba
xiaofeng      column=info:favorite, timestamp=1321586248241, value=movie
3 row(s) in 0.0570 seconds
5. Delete the 'info:age' field of the row whose row key is temp
hbase(main):016:0> delete 'member', 'temp', 'info:age'
0 row(s) in 0.0150 seconds
hbase(main):018:0> get 'member', 'temp'
COLUMN                CELL
0 row(s) in 0.0150 seconds
6. Delete the entire row
hbase(main):001:0> deleteall 'member', 'xiaofeng'
0 row(s) in 0.3990 seconds
7. Query how many rows there are in the table:
hbase(main):019:0> count 'member'
2 row(s) in 0.0160 seconds
8. Add an 'info:age' field to the row 'xiaofeng' and use a counter to implement incrementing
hbase(main):057:0> incr 'member', 'xiaofeng', 'info:age'
COUNTER VALUE = 1
hbase(main):058:0> get 'member', 'xiaofeng', 'info:age'
COLUMN                CELL
info:age              timestamp=1321590997648, value=\x00\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0140 seconds
hbase(main):059:0> incr 'member', 'xiaofeng', 'info:age'
COUNTER VALUE = 2
hbase(main):060:0> get 'member', 'xiaofeng', 'info:age'
COLUMN                CELL
info:age              timestamp=1321591025110, value=\x00\x00\x00\x00\x00\x00\x00\x02
1 row(s) in 0.0160 seconds
Get the current counter value:
hbase(main):069:0> get_counter 'member', 'xiaofeng', 'info:age'
COUNTER VALUE = 2
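The counter value comes back as raw bytes because incr stores it as an 8-byte big-endian signed long rather than a printable string. A small Python sketch (not part of the original article) of how such a value encodes and decodes:

```python
import struct

def encode_counter(n):
    """Encode a counter the way HBase incr stores it: 8-byte big-endian signed long."""
    return struct.pack(">q", n)

def decode_counter(raw):
    """Decode the raw cell bytes back into an integer (what get_counter reports)."""
    return struct.unpack(">q", raw)[0]

raw = encode_counter(2)
print(raw)                  # b'\x00\x00\x00\x00\x00\x00\x00\x02'
print(decode_counter(raw))  # 2
```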
9. Empty the entire table:
hbase(main):035:0> truncate 'member'
Truncating 'member' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 4.3430 seconds
As you can see, HBase implements truncate by first disabling the table, then dropping it, and then recreating it.
The architecture of HBase:
HBase access interfaces
1. Native Java API: the most conventional and efficient access method, suitable for Hadoop MapReduce jobs that batch-process HBase table data in parallel
2. HBase Shell: HBase's command line tool, the simplest interface, suitable for HBase administration
3. Thrift Gateway: uses Thrift serialization and supports multiple languages such as C++, PHP, and Python, suitable for other heterogeneous systems to access HBase table data online
4. REST Gateway: supports REST-style HTTP API access to HBase, removing language restrictions
5. Pig: the Pig Latin dataflow language can be used to manipulate data in HBase; like Hive, it is ultimately compiled into MapReduce jobs to process HBase table data, and is suitable for data statistics
6. Hive: the current release of Hive has not yet added support for HBase, but support is planned for Hive 0.7.0, which will allow HBase to be accessed using a SQL-like language
HBase data model
Table & Column Family
Row Key   Timestamp   Column Family "URI Parser"
R1        T3          url=http://www.taobao.com  title=daily special
          T2          host=taobao.com
          T1
R2        T5          url=http://www.alibaba.com  content=every day ...
          T4          host=alibaba.com
- Row Key: the primary key of a Table; records in a Table are sorted by Row Key
- Timestamp: the timestamp of each data operation, which can be regarded as the data's version number
- Column Family: a Table consists horizontally of one or more Column Families; a Column Family can contain any number of Columns, i.e. Column Families support dynamic extension with no need to predefine the number or type of Columns. All Columns are stored in binary form, and users must handle type conversion themselves.
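The model above can be pictured as a sorted map of maps. A small Python sketch (an illustration of the logical layout only, not HBase's actual implementation):

```python
# Toy model of HBase's logical layout:
# table -> row key -> "family:qualifier" -> {timestamp: value}
table = {}

def put(row, column, value, ts):
    table.setdefault(row, {}).setdefault(column, {})[ts] = value

def get(row, column):
    """Return the newest version of a cell, like an HBase get."""
    versions = table.get(row, {}).get(column, {})
    if not versions:
        return None
    latest = max(versions)          # highest timestamp wins
    return versions[latest]

put("scutshuxue", "info:age", "24", 1321586238965)
put("scutshuxue", "info:age", "99", 1321586571843)  # an update is just a newer version
print(get("scutshuxue", "info:age"))  # 99
```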
Table & Region
As a Table grows with the number of records, it gradually splits into multiple Regions. A Region is denoted by [startkey, endkey); different Regions are assigned by the Master to the corresponding RegionServers for management.
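To make the [startkey, endkey) idea concrete, here is a Python sketch of how a row key maps to a region; the region boundaries and server names are made up for illustration:

```python
import bisect

# Hypothetical region boundaries: region i holds keys in [starts[i], starts[i+1]).
region_starts = ["", "g", "p"]            # region 0: ["", "g"), region 1: ["g", "p"), region 2: ["p", end)
region_servers = ["slave1", "slave2", "slave1"]

def find_region(row_key):
    """Return the index of the region whose [startkey, endkey) range covers row_key."""
    return bisect.bisect_right(region_starts, row_key) - 1

print(find_region("scutshuxue"))  # 2 -> served by slave1
print(find_region("member001"))   # 1 -> served by slave2
```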
-ROOT- & .META. tables
There are two special tables in HBase: -ROOT- and .META.
- .META.: records the Region information of user tables; .META. itself can have multiple Regions
- -ROOT-: records the Region information of the .META. table; -ROOT- has only one Region
- The location of the -ROOT- table is recorded in ZooKeeper
Before a client can access user data, it must first access ZooKeeper, then the -ROOT- table, then the .META. table, and finally find the location of the user data. This requires multiple network round trips, but the client caches the results.
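The three-hop lookup and the client-side cache can be sketched as follows; all names and return values are illustrative stubs standing in for what would be network RPCs in a real client:

```python
# Toy model of the client's region lookup chain with caching.
location_cache = {}

def lookup_via_zookeeper():
    return "root-region@slave1"             # where -ROOT- lives (stub)

def lookup_root(root_loc, table):
    return "meta-region@slave2"             # .META. region covering this table (stub)

def lookup_meta(meta_loc, table, row):
    return "user-region@slave1"             # user region holding this row (stub)

def locate(table, row):
    """Find the RegionServer for (table, row), caching the answer."""
    key = (table, row)
    if key in location_cache:               # cached: no network hops needed
        return location_cache[key]
    root = lookup_via_zookeeper()           # hop 1: ZooKeeper
    meta = lookup_root(root, table)         # hop 2: -ROOT-
    region = lookup_meta(meta, table, row)  # hop 3: .META.
    location_cache[key] = region
    return region

print(locate("member", "scutshuxue"))  # user-region@slave1
```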
MapReduce on HBase
The most convenient and practical model for running batch operations on an HBase system is still MapReduce, as shown in the following figure:
The relationship between an HBase Table and its Regions is similar to that between an HDFS File and its Blocks. HBase provides matching TableInputFormat and TableOutputFormat APIs, making it convenient to use an HBase Table as the Source and Sink of Hadoop MapReduce; MapReduce job developers need to pay little attention to the details of the HBase system itself.
HBase system architecture
Client
HBase Client uses HBase's RPC mechanism to communicate with HMaster and HRegionServer: for management operations, the Client performs RPC with HMaster; for data read and write operations, the Client performs RPC with HRegionServer.
Zookeeper
In addition to storing the address of the -ROOT- table and the address of HMaster, the ZooKeeper Quorum also has each HRegionServer register itself as an Ephemeral node, so that HMaster can sense the health status of each HRegionServer at any time. In addition, ZooKeeper keeps HMaster from being a single point of failure, as described below.
HMaster
HMaster has no single point problem: multiple HMasters can be started in HBase, and ZooKeeper's Master Election mechanism ensures that one Master is always running. Functionally, HMaster is mainly responsible for managing Tables and Regions:
1. Managing users' add, delete, modify and query operations on Tables
2. Managing HRegionServer load balancing and adjusting Region distribution
3. Allocating new Regions after a Region split
4. Migrating the Regions of a failed HRegionServer after it goes down
HRegionServer
HRegionServer is mainly responsible for responding to user I/O requests and reading and writing data to the HDFS file system; it is the core module of HBase.
HRegionServer internally manages a series of HRegion objects, each corresponding to one Region of a Table. An HRegion consists of multiple HStores, and each HStore corresponds to one Column Family of the Table. Each Column Family is thus a centralized storage unit, so columns with common I/O characteristics are best placed in the same Column Family for maximum efficiency.
HStore storage is the core of HBase storage and consists of two parts: the MemStore and the StoreFiles. The MemStore is a Sorted Memory Buffer; data written by the user is first placed in the MemStore, and when the MemStore fills up it is flushed into a StoreFile (whose underlying implementation is HFile). When the number of StoreFiles grows beyond a certain threshold, a Compact merge operation is triggered, merging multiple StoreFiles into one; version merging and data deletion are carried out during this process. So HBase actually only ever appends data, and all updates and deletes happen in later compactions; this lets a user's write return as soon as it reaches memory, ensuring high I/O performance. As StoreFiles are compacted, gradually larger StoreFiles form, and when a single StoreFile exceeds a certain size threshold a Split is triggered: the current Region splits into 2 Regions, the parent Region goes offline, and the 2 newly split child Regions are assigned by HMaster to the appropriate HRegionServers, so the load of the original Region is spread over 2 Regions. The following figure describes the process of Compaction and Split:
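The write/flush/compact cycle described above can be sketched in a few lines of Python; the thresholds and data structures here are illustrative, not HBase's real ones:

```python
# Toy model of an HStore: writes land in a sorted in-memory buffer (MemStore),
# flushes produce immutable store files, and compaction merges them,
# keeping only the newest version of each key.
MEMSTORE_LIMIT = 3       # flush after this many cells (illustrative)
COMPACT_THRESHOLD = 2    # compact when this many store files exist (illustrative)

memstore = {}
storefiles = []          # list of dicts, oldest first

def write(key, value):
    memstore[key] = value            # the write returns as soon as it is in memory
    if len(memstore) >= MEMSTORE_LIMIT:
        flush()

def flush():
    storefiles.append(dict(sorted(memstore.items())))
    memstore.clear()
    if len(storefiles) >= COMPACT_THRESHOLD:
        compact()

def compact():
    merged = {}
    for sf in storefiles:            # later files overwrite older versions
        merged.update(sf)
    storefiles[:] = [merged]

for i in range(7):
    write(f"row{i}", i)
# After 7 writes: two flushes happened, the second triggered a compaction,
# so storefiles holds one merged file and row6 is still in the MemStore.
```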
After understanding the basic principles of HStore, you must also understand the function of HLog, because the HStore mechanism above is only safe while the system works properly. In a distributed environment there can be system errors or downtime, and once an HRegionServer exits unexpectedly the in-memory data in its MemStore would be lost; HLog is introduced for exactly this reason. Every HRegionServer has one HLog object; HLog is a class implementing a Write Ahead Log. Each time a user operation is written to the MemStore, a copy of the data is also written to the HLog file. The HLog file periodically rolls over to a new file and deletes old files (whose data has already been persisted to StoreFiles). When an HRegionServer terminates unexpectedly, HMaster senses it through ZooKeeper. HMaster first processes the leftover HLog files, splitting the log data of the different Regions and placing each split into the corresponding Region's directory, and then redistributes the invalid Regions. The HRegionServers that receive these Regions find, while loading them, that there is historical HLog data to process, and so replay the HLog: the data goes into the MemStore and is then flushed to StoreFiles, completing the data recovery.
HBase storage format
All HBase data files are stored on the Hadoop HDFS file system, mainly in the two file types mentioned above:
1. HFile: the storage format for KeyValue data in HBase. HFile is a binary-format Hadoop file. A StoreFile is in fact a lightweight wrapper around an HFile, i.e. the bottom layer of a StoreFile is an HFile.
2. HLog File: the storage format of HBase's WAL (Write Ahead Log), which is physically a Hadoop Sequence File.
HFile
The following figure shows the storage format of HFile:
First of all, an HFile is of variable length; only two parts have a fixed length: the Trailer and the FileInfo. As shown in the figure, the Trailer contains pointers to the starting points of the other data blocks. File Info records some Meta information about the file, for example AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, MAX_SEQ_ID_KEY, etc. The Data Index and Meta Index blocks record the starting point of each Data block and Meta block.
The Data Block is the basic unit of HBase I/O. To improve efficiency, HRegionServer has an LRU-based Block Cache mechanism. The size of each Data block can be specified by a parameter when creating a Table: large blocks favor sequential scans, while small blocks favor random queries. Apart from the Magic at the beginning, every Data block is a concatenation of KeyValue pairs; the Magic content is a random number intended to guard against data corruption. The internal structure of each KeyValue pair is detailed below.
Each KeyValue pair in an HFile is a simple byte array, but this byte array contains many items and has a fixed structure. Let's look at the concrete internal structure:
It begins with two fixed-length values giving the length of the Key and the length of the Value. Then comes the Key, which begins with a fixed-length value giving the length of the RowKey, followed by the RowKey, then a fixed-length value giving the length of the Family, then the Family, then the Qualifier, and then two fixed-length values representing the Time Stamp and the Key Type (Put/Delete). The Value part has no such complex structure; it is simply binary data.
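A Python sketch of that layout follows; the field widths used here (4-byte lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) match the KeyValue format as I understand it, so treat them as assumptions rather than a normative spec:

```python
import struct

def encode_keyvalue(row, family, qualifier, timestamp, key_type, value):
    """Pack one KeyValue the way the text describes:
    key_len(4) | value_len(4) | row_len(2) | row | fam_len(1) | family
    | qualifier | timestamp(8) | key_type(1) | value."""
    key = (struct.pack(">H", len(row)) + row
           + struct.pack(">B", len(family)) + family
           + qualifier
           + struct.pack(">q", timestamp)
           + struct.pack(">B", key_type))
    return struct.pack(">I", len(key)) + struct.pack(">I", len(value)) + key + value

kv = encode_keyvalue(b"scutshuxue", b"info", b"age",
                     1321586571843, 4, b"99")   # key_type 4 = Put in HBase
key_len, value_len = struct.unpack(">II", kv[:8])
print(key_len, value_len)  # 29 2
```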
HLogFile
The above figure shows the structure of the HLog file. An HLog file is in fact an ordinary Hadoop Sequence File. The Key of the Sequence File is an HLogKey object, which records the attribution information of the written data: in addition to the table and region names, it also includes a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts from 0 or from the last sequence number saved in the file system.