2025-04-01 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report--
HBase installation, architecture, and usage
Install HBase:
1. Download the installation package hbase-1.0.0-bin.tar.gz and create the working directories:
mkdir /data/hbase
mkdir /data/hbase/tmp
hadoop dfs -mkdir /user/hadoop/rongxin/hbase
2. Decompress the package:
[hadoop@master hbase]$ tar -zxvf hbase-1.0.0-bin.tar.gz
3. Go to the lib directory of HBase and replace the Hadoop jar packages with the versions matching your Hadoop installation
4. Modify the configuration files:
hbase-env.sh
hbase-site.xml
[hadoop@master hbase]$ vim hbase-1.0.0/conf/regionservers
slave1
slave2
vim /etc/profile
export HBASE_HOME=/data/hbase/hbase-1.0.0
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
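The article lists hbase-env.sh and hbase-site.xml but does not show their contents. A minimal hbase-site.xml sketch for this master/slave1/slave2 layout might look like the following; the HDFS port (9000) and the exact property values are assumptions, so adjust them to your cluster:

```xml
<configuration>
  <!-- Root directory for HBase data on HDFS (path created above; port 9000 assumed) -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/user/hadoop/rongxin/hbase</value>
  </property>
  <!-- Run in fully distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum hosts -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <!-- Local temporary directory created earlier -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>/data/hbase/tmp</value>
  </property>
</configuration>
```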
Start HBase:
[hadoop@master hbase]$ start-hbase.sh
Check through jps whether it has started:
[hadoop@master hbase]$ jps
32211 HMaster
[hadoop@slave1 lib]$ jps
20584 HRegionServer
[hadoop@slave2 lib]$ jps
2698 HRegionServer
Check through the web UI whether HBase has started:
http://master:16030/master-status
Common HBase command line tools:
Name                         Command expression
Create table                 create 'table name', 'column family 1', 'column family 2', ..., 'column family N'
Add record                   put 'table name', 'row key', 'column family:column', 'value'
View record                  get 'table name', 'row key'
Count records in a table     count 'table name'
Delete record                delete 'table name', 'row key', 'column family:column'
Delete table                 a table must be disabled before it can be dropped:
                             step 1: disable 'table name'; step 2: drop 'table name'
View all records             scan 'table name'
View all data in one column  scan 'table name', ['column family:column']
To update a record, simply put it again; the new value overwrites the old one.
Below are descriptions of common commands; type help in the HBase shell for help. This article introduces the first three command groups; the last two will be introduced in the next blog post.
COMMAND GROUPS:
Group name: general
Commands: status, version
Group name: ddl
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Group name: replication
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
I. General operation
1. Query server status
hbase(main):024:0> status
3 servers, 0 dead, 1.0000 average load
2. Query the HBase version
hbase(main):025:0> version
0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011
II. DDL operations
1. Create a table
hbase(main):011:0> create 'member', 'member_id', 'address', 'info'
0 row(s) in 1.2210 seconds
2. Get a description of the table
hbase(main):012:0> list
TABLE
member
1 row(s) in 0.0160 seconds
hbase(main):006:0> describe 'member'
DESCRIPTION                                                            ENABLED
{NAME => 'member', FAMILIES => [{NAME => 'address', BLOOMFILTER =>     true
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'member_id', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'info', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0230 seconds
3. Delete a column family: alter, disable, enable
We created three column families earlier, but the column family member_id is redundant because it is the primary key, so we need to delete it.
hbase(main):003:0> alter 'member', {NAME => 'member_id', METHOD => 'delete'}
ERROR: Table member is enabled. Disable it first before altering.
An error is reported: when deleting a column family, the table must first be disabled.
hbase(main):004:0> disable 'member'
0 row(s) in 2.0390 seconds
hbase(main):005:0> alter 'member', {NAME => 'member_id', METHOD => 'delete'}
0 row(s) in 0.0560 seconds
hbase(main):006:0> describe 'member'
DESCRIPTION                                                            ENABLED
{NAME => 'member', FAMILIES => [{NAME => 'address', BLOOMFILTER =>     false
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'info', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0230 seconds
The column family has been deleted; now re-enable the table:
hbase(main):008:0> enable 'member'
0 row(s) in 2.0420 seconds
4. List all the tables
hbase(main):028:0> list
TABLE
member
temp_table
2 row(s) in 0.0150 seconds
5. Drop a table
hbase(main):029:0> disable 'temp_table'
0 row(s) in 2.0590 seconds
hbase(main):030:0> drop 'temp_table'
0 row(s) in 1.1070 seconds
6. Query whether the table exists
hbase(main):021:0> exists 'member'
Table member does exist
0 row(s) in 0.1610 seconds
7. Determine whether the table is enabled
hbase(main):034:0> is_enabled 'member'
true
0 row(s) in 0.0110 seconds
8. Determine whether the table is disabled
hbase(main):032:0> is_disabled 'member'
false
0 row(s) in 0.0110 seconds
III. DML operations
1. Insert several records
put 'member', 'scutshuxue', 'info:age', '24'
put 'member', 'scutshuxue', 'info:birthday', '1987-06-17'
put 'member', 'scutshuxue', 'info:company', 'alibaba'
put 'member', 'scutshuxue', 'address:contry', 'china'
put 'member', 'scutshuxue', 'address:province', 'zhejiang'
put 'member', 'scutshuxue', 'address:city', 'hangzhou'
put 'member', 'xiaofeng', 'info:birthday', '1987-4-17'
put 'member', 'xiaofeng', 'info:favorite', 'movie'
put 'member', 'xiaofeng', 'info:company', 'alibaba'
put 'member', 'xiaofeng', 'address:contry', 'china'
put 'member', 'xiaofeng', 'address:province', 'guangdong'
put 'member', 'xiaofeng', 'address:city', 'jieyang'
put 'member', 'xiaofeng', 'address:town', 'xianqiao'
2. Get a record
Get all the data for a row key:
hbase(main):001:0> get 'member', 'scutshuxue'
COLUMN                CELL
address:city          timestamp=1321586240244, value=hangzhou
address:contry        timestamp=1321586239126, value=china
address:province      timestamp=1321586239197, value=zhejiang
info:age              timestamp=1321586238965, value=24
info:birthday         timestamp=1321586239015, value=1987-06-17
info:company          timestamp=1321586239071, value=alibaba
6 row(s) in 0.4720 seconds
Get all the data in one column family for a row key:
hbase(main):002:0> get 'member', 'scutshuxue', 'info'
COLUMN                CELL
info:age              timestamp=1321586238965, value=24
info:birthday         timestamp=1321586239015, value=1987-06-17
info:company          timestamp=1321586239071, value=alibaba
3 row(s) in 0.0210 seconds
Get the data in one column of a column family for a row key:
hbase(main):002:0> get 'member', 'scutshuxue', 'info:age'
COLUMN                CELL
info:age              timestamp=1321586238965, value=24
1 row(s) in 0.0320 seconds
Update a record:
Change the age of scutshuxue to 99:
hbase(main):004:0> put 'member', 'scutshuxue', 'info:age', '99'
0 row(s) in 0.0210 seconds
hbase(main):005:0> get 'member', 'scutshuxue', 'info:age'
COLUMN                CELL
info:age              timestamp=1321586571843, value=99
1 row(s) in 0.0180 seconds
3. Get two versions of data through timestamp
hbase(main):010:0> get 'member', 'scutshuxue', {COLUMN => 'info:age', TIMESTAMP => 1321586238965}
COLUMN                CELL
info:age              timestamp=1321586238965, value=24
1 row(s) in 0.0140 seconds
hbase(main):011:0> get 'member', 'scutshuxue', {COLUMN => 'info:age', TIMESTAMP => 1321586571843}
COLUMN                CELL
info:age              timestamp=1321586571843, value=99
1 row(s) in 0.0180 seconds
4. Full table scan:
hbase(main):013:0> scan 'member'
ROW           COLUMN+CELL
scutshuxue    column=address:city, timestamp=1321586240244, value=hangzhou
scutshuxue    column=address:contry, timestamp=1321586239126, value=china
scutshuxue    column=address:province, timestamp=1321586239197, value=zhejiang
scutshuxue    column=info:age, timestamp=1321586571843, value=99
scutshuxue    column=info:birthday, timestamp=1321586239015, value=1987-06-17
scutshuxue    column=info:company, timestamp=1321586239071, value=alibaba
temp          column=info:age, timestamp=1321589609775, value=59
xiaofeng      column=address:city, timestamp=1321586248400, value=jieyang
xiaofeng      column=address:contry, timestamp=1321586248316, value=china
xiaofeng      column=address:province, timestamp=1321586248355, value=guangdong
xiaofeng      column=address:town, timestamp=1321586249564, value=xianqiao
xiaofeng      column=info:birthday, timestamp=1321586248202, value=1987-4-17
xiaofeng      column=info:company, timestamp=1321586248277, value=alibaba
xiaofeng      column=info:favorite, timestamp=1321586248241, value=movie
3 row(s) in 0.0570 seconds
5. Delete the 'info:age' field of the row whose row key is temp
hbase(main):016:0> delete 'member', 'temp', 'info:age'
0 row(s) in 0.0150 seconds
hbase(main):018:0> get 'member', 'temp'
COLUMN                CELL
0 row(s) in 0.0150 seconds
6. Delete the entire row
hbase(main):001:0> deleteall 'member', 'xiaofeng'
0 row(s) in 0.3990 seconds
7. Query how many rows there are in the table:
hbase(main):019:0> count 'member'
2 row(s) in 0.0160 seconds
8. Add an 'info:age' field to the row 'xiaofeng' and use a counter to implement incrementing
hbase(main):057:0> incr 'member', 'xiaofeng', 'info:age'
COUNTER VALUE = 1
hbase(main):058:0> get 'member', 'xiaofeng', 'info:age'
COLUMN                CELL
info:age              timestamp=1321590997648, value=\x00\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0140 seconds
hbase(main):059:0> incr 'member', 'xiaofeng', 'info:age'
COUNTER VALUE = 2
hbase(main):060:0> get 'member', 'xiaofeng', 'info:age'
COLUMN                CELL
info:age              timestamp=1321591025110, value=\x00\x00\x00\x00\x00\x00\x00\x02
1 row(s) in 0.0160 seconds
Get the current counter value:
hbase(main):069:0> get_counter 'member', 'xiaofeng', 'info:age'
COUNTER VALUE = 2
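The counter value comes back as raw bytes because incr stores it as an 8-byte big-endian signed long rather than a printable string. A small Python sketch (not part of the original article) of how such a value encodes and decodes:

```python
import struct

def encode_counter(n):
    """Encode a counter the way HBase incr stores it: 8-byte big-endian signed long."""
    return struct.pack(">q", n)

def decode_counter(raw):
    """Decode the raw cell bytes back into an integer (what get_counter reports)."""
    return struct.unpack(">q", raw)[0]

raw = encode_counter(2)
print(raw)                  # b'\x00\x00\x00\x00\x00\x00\x00\x02'
print(decode_counter(raw))  # 2
```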
9. Empty the entire table:
hbase(main):035:0> truncate 'member'
Truncating 'member' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 4.3430 seconds
As you can see, HBase implements truncate by first disabling the table, then dropping it, and then recreating it.
The architecture of HBase:
HBase access interfaces
1. Native Java API: the most conventional and efficient access method, suitable for Hadoop MapReduce jobs that batch-process HBase table data in parallel
2. HBase Shell: HBase's command line tool, the simplest interface, suitable for HBase administration
3. Thrift Gateway: uses Thrift serialization and supports multiple languages such as C++, PHP, and Python, suitable for other heterogeneous systems to access HBase table data online
4. REST Gateway: supports REST-style HTTP API access to HBase, removing language restrictions
5. Pig: the Pig Latin dataflow language can be used to manipulate data in HBase; like Hive, it is ultimately compiled into MapReduce jobs to process HBase table data, and is suitable for data statistics
6. Hive: the current release of Hive has not yet added support for HBase, but support is planned for Hive 0.7.0, which will allow HBase to be accessed using a SQL-like language
HBase data model
Table & Column Family
Row Key   Timestamp   Column Family "URI Parser"
R1        T3          url=http://www.taobao.com  title=daily special
          T2          host=taobao.com
          T1
R2        T5          url=http://www.alibaba.com  content=every day ...
          T4          host=alibaba.com
- Row Key: the primary key of a Table; records in a Table are sorted by Row Key
- Timestamp: the timestamp of each data operation, which can be regarded as the data's version number
- Column Family: a Table consists horizontally of one or more Column Families; a Column Family can contain any number of Columns, i.e. Column Families support dynamic extension with no need to predefine the number or type of Columns. All Columns are stored in binary form, and users must handle type conversion themselves.
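The model above can be pictured as a sorted map of maps. A small Python sketch (an illustration of the logical layout only, not HBase's actual implementation):

```python
# Toy model of HBase's logical layout:
# table -> row key -> "family:qualifier" -> {timestamp: value}
table = {}

def put(row, column, value, ts):
    table.setdefault(row, {}).setdefault(column, {})[ts] = value

def get(row, column):
    """Return the newest version of a cell, like an HBase get."""
    versions = table.get(row, {}).get(column, {})
    if not versions:
        return None
    latest = max(versions)          # highest timestamp wins
    return versions[latest]

put("scutshuxue", "info:age", "24", 1321586238965)
put("scutshuxue", "info:age", "99", 1321586571843)  # an update is just a newer version
print(get("scutshuxue", "info:age"))  # 99
```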
Table & Region
As a Table grows with the number of records, it gradually splits into multiple Regions. A Region is denoted by [startkey, endkey); different Regions are assigned by the Master to the corresponding RegionServers for management.
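To make the [startkey, endkey) idea concrete, here is a Python sketch of how a row key maps to a region; the region boundaries and server names are made up for illustration:

```python
import bisect

# Hypothetical region boundaries: region i holds keys in [starts[i], starts[i+1]).
region_starts = ["", "g", "p"]            # region 0: ["", "g"), region 1: ["g", "p"), region 2: ["p", end)
region_servers = ["slave1", "slave2", "slave1"]

def find_region(row_key):
    """Return the index of the region whose [startkey, endkey) range covers row_key."""
    return bisect.bisect_right(region_starts, row_key) - 1

print(find_region("scutshuxue"))  # 2 -> served by slave1
print(find_region("member001"))   # 1 -> served by slave2
```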
-ROOT- & .META. tables
There are two special tables in HBase: -ROOT- and .META.
- .META.: records the Region information of user tables; .META. itself can have multiple Regions
- -ROOT-: records the Region information of the .META. table; -ROOT- has only one Region
- The location of the -ROOT- table is recorded in ZooKeeper
Before a client can access user data, it must first access ZooKeeper, then the -ROOT- table, then the .META. table, and finally find the location of the user data. This requires multiple network round trips, but the client caches the results.
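The three-hop lookup and the client-side cache can be sketched as follows; all names and return values are illustrative stubs standing in for what would be network RPCs in a real client:

```python
# Toy model of the client's region lookup chain with caching.
location_cache = {}

def lookup_via_zookeeper():
    return "root-region@slave1"             # where -ROOT- lives (stub)

def lookup_root(root_loc, table):
    return "meta-region@slave2"             # .META. region covering this table (stub)

def lookup_meta(meta_loc, table, row):
    return "user-region@slave1"             # user region holding this row (stub)

def locate(table, row):
    """Find the RegionServer for (table, row), caching the answer."""
    key = (table, row)
    if key in location_cache:               # cached: no network hops needed
        return location_cache[key]
    root = lookup_via_zookeeper()           # hop 1: ZooKeeper
    meta = lookup_root(root, table)         # hop 2: -ROOT-
    region = lookup_meta(meta, table, row)  # hop 3: .META.
    location_cache[key] = region
    return region

print(locate("member", "scutshuxue"))  # user-region@slave1
```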
MapReduce on HBase
The most convenient and practical model for running batch operations on an HBase system is still MapReduce, as shown in the following figure:
The relationship between an HBase Table and its Regions is similar to that between an HDFS File and its Blocks. HBase provides matching TableInputFormat and TableOutputFormat APIs, making it convenient to use an HBase Table as the Source and Sink of Hadoop MapReduce; MapReduce job developers need to pay little attention to the details of the HBase system itself.
HBase system architecture
Client
HBase Client uses HBase's RPC mechanism to communicate with HMaster and HRegionServer: for management operations, the Client performs RPC with HMaster; for data read and write operations, the Client performs RPC with HRegionServer.
Zookeeper
In addition to storing the address of the -ROOT- table and the address of HMaster, the ZooKeeper Quorum also has each HRegionServer register itself as an Ephemeral node, so that HMaster can sense the health status of each HRegionServer at any time. In addition, ZooKeeper keeps HMaster from being a single point of failure, as described below.
HMaster
HMaster has no single point problem: multiple HMasters can be started in HBase, and ZooKeeper's Master Election mechanism ensures that one Master is always running. Functionally, HMaster is mainly responsible for managing Tables and Regions:
1. Managing users' add, delete, modify and query operations on Tables
2. Managing HRegionServer load balancing and adjusting Region distribution
3. Allocating new Regions after a Region split
4. Migrating the Regions of a failed HRegionServer after it goes down
HRegionServer
HRegionServer is mainly responsible for responding to user I/O requests and reading and writing data to the HDFS file system; it is the core module of HBase.
HRegionServer internally manages a series of HRegion objects, each corresponding to one Region of a Table. An HRegion consists of multiple HStores, and each HStore corresponds to one Column Family of the Table. Each Column Family is thus a centralized storage unit, so columns with common I/O characteristics are best placed in the same Column Family for maximum efficiency.
HStore storage is the core of HBase storage and consists of two parts: the MemStore and the StoreFiles. The MemStore is a Sorted Memory Buffer; data written by the user is first placed in the MemStore, and when the MemStore fills up it is flushed into a StoreFile (whose underlying implementation is HFile). When the number of StoreFiles grows beyond a certain threshold, a Compact merge operation is triggered, merging multiple StoreFiles into one; version merging and data deletion are carried out during this process. So HBase actually only ever appends data, and all updates and deletes happen in later compactions; this lets a user's write return as soon as it reaches memory, ensuring high I/O performance. As StoreFiles are compacted, gradually larger StoreFiles form, and when a single StoreFile exceeds a certain size threshold a Split is triggered: the current Region splits into 2 Regions, the parent Region goes offline, and the 2 newly split child Regions are assigned by HMaster to the appropriate HRegionServers, so the load of the original Region is spread over 2 Regions. The following figure describes the process of Compaction and Split:
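The write/flush/compact cycle described above can be sketched in a few lines of Python; the thresholds and data structures here are illustrative, not HBase's real ones:

```python
# Toy model of an HStore: writes land in a sorted in-memory buffer (MemStore),
# flushes produce immutable store files, and compaction merges them,
# keeping only the newest version of each key.
MEMSTORE_LIMIT = 3       # flush after this many cells (illustrative)
COMPACT_THRESHOLD = 2    # compact when this many store files exist (illustrative)

memstore = {}
storefiles = []          # list of dicts, oldest first

def write(key, value):
    memstore[key] = value            # the write returns as soon as it is in memory
    if len(memstore) >= MEMSTORE_LIMIT:
        flush()

def flush():
    storefiles.append(dict(sorted(memstore.items())))
    memstore.clear()
    if len(storefiles) >= COMPACT_THRESHOLD:
        compact()

def compact():
    merged = {}
    for sf in storefiles:            # later files overwrite older versions
        merged.update(sf)
    storefiles[:] = [merged]

for i in range(7):
    write(f"row{i}", i)
# After 7 writes: two flushes happened, the second triggered a compaction,
# so storefiles holds one merged file and row6 is still in the MemStore.
```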
After understanding the basic principles of HStore, you must also understand the function of HLog, because the HStore mechanism above is only safe while the system works properly. In a distributed environment there can be system errors or downtime, and once an HRegionServer exits unexpectedly the in-memory data in its MemStore would be lost; HLog is introduced for exactly this reason. Every HRegionServer has one HLog object; HLog is a class implementing a Write Ahead Log. Each time a user operation is written to the MemStore, a copy of the data is also written to the HLog file. The HLog file periodically rolls over to a new file and deletes old files (whose data has already been persisted to StoreFiles). When an HRegionServer terminates unexpectedly, HMaster senses it through ZooKeeper. HMaster first processes the leftover HLog files, splitting the log data of the different Regions and placing each split into the corresponding Region's directory, and then redistributes the invalid Regions. The HRegionServers that receive these Regions find, while loading them, that there is historical HLog data to process, and so replay the HLog: the data goes into the MemStore and is then flushed to StoreFiles, completing the data recovery.
HBase storage format
All HBase data files are stored on the Hadoop HDFS file system, mainly in the two file types mentioned above:
1. HFile: the storage format for KeyValue data in HBase. HFile is a binary-format Hadoop file. A StoreFile is in fact a lightweight wrapper around an HFile, i.e. the bottom layer of a StoreFile is an HFile.
2. HLog File: the storage format of HBase's WAL (Write Ahead Log), which is physically a Hadoop Sequence File.
HFile
The following figure shows the storage format of HFile:
First of all, an HFile is of variable length; only two parts have a fixed length: the Trailer and the FileInfo. As shown in the figure, the Trailer contains pointers to the starting points of the other data blocks. File Info records some Meta information about the file, for example AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, MAX_SEQ_ID_KEY, etc. The Data Index and Meta Index blocks record the starting point of each Data block and Meta block.
The Data Block is the basic unit of HBase I/O. To improve efficiency, HRegionServer has an LRU-based Block Cache mechanism. The size of each Data block can be specified by a parameter when creating a Table: large blocks favor sequential scans, while small blocks favor random queries. Apart from the Magic at the beginning, every Data block is a concatenation of KeyValue pairs; the Magic content is a random number intended to guard against data corruption. The internal structure of each KeyValue pair is detailed below.
Each KeyValue pair in an HFile is a simple byte array, but this byte array contains many items and has a fixed structure. Let's look at the concrete internal structure:
It begins with two fixed-length values giving the length of the Key and the length of the Value. Then comes the Key, which begins with a fixed-length value giving the length of the RowKey, followed by the RowKey, then a fixed-length value giving the length of the Family, then the Family, then the Qualifier, and then two fixed-length values representing the Time Stamp and the Key Type (Put/Delete). The Value part has no such complex structure; it is simply binary data.
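A Python sketch of that layout follows; the field widths used here (4-byte lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) match the KeyValue format as I understand it, so treat them as assumptions rather than a normative spec:

```python
import struct

def encode_keyvalue(row, family, qualifier, timestamp, key_type, value):
    """Pack one KeyValue the way the text describes:
    key_len(4) | value_len(4) | row_len(2) | row | fam_len(1) | family
    | qualifier | timestamp(8) | key_type(1) | value."""
    key = (struct.pack(">H", len(row)) + row
           + struct.pack(">B", len(family)) + family
           + qualifier
           + struct.pack(">q", timestamp)
           + struct.pack(">B", key_type))
    return struct.pack(">I", len(key)) + struct.pack(">I", len(value)) + key + value

kv = encode_keyvalue(b"scutshuxue", b"info", b"age",
                     1321586571843, 4, b"99")   # key_type 4 = Put in HBase
key_len, value_len = struct.unpack(">II", kv[:8])
print(key_len, value_len)  # 29 2
```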
HLogFile
The above figure shows the structure of the HLog file. An HLog file is in fact an ordinary Hadoop Sequence File. The Key of the Sequence File is an HLogKey object, which records the attribution information of the written data: in addition to the table and region names, it also includes a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts from 0 or from the last sequence number saved in the file system.