2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
1. HBase
A highly reliable, high-performance, column-oriented, scalable, distributed open-source database built on HDFS. It is mainly used to store massive data, with MapReduce used to process the data in HBase and ZooKeeper used as a coordination service. Reads and writes are simple, but complex conditional queries are not supported.
2. Comparison between HBase and HDFS
Both offer good fault tolerance and scalability.
HDFS is suitable for batch-processing scenarios, but it does not support random data lookups, incremental data processing, or in-place data updates.
3. Characteristics of HBase:
Massive data: a table can support millions of columns and is split into multiple regions
Schemaless: each row has a sortable row key and an arbitrary number of columns; columns can be added dynamically as needed, and different rows in the same table can have different sets of columns
Column-oriented storage: storage and permission control are organized per column family, and columns can be retrieved independently
Sparse: empty (NULL) columns take up no storage space
Multiple versions of data: each cell can hold multiple versions of a value, distinguished by timestamp
Single data type: all data is stored as uninterpreted byte arrays (strings)
4. HBase structure composition
Row key:
A byte array
The primary key of each record in the table
Sorted storage enables fast lookups
Timestamp:
The timestamp of each data operation, used as the version number of the data
Column Family:
Has a name (a string)
Contains one or more related columns (Column)
Column: belongs to a column family and is addressed as family:qualifier
Value: an uninterpreted byte array
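The logical layout above can be sketched as a nested map: each cell is addressed by (row key, column family:qualifier, timestamp), and a read returns the newest version by default. This is an illustrative simulation with invented names (ToyTable), not the real HBase storage format or client API.

```python
# Toy model of HBase's logical data model: a cell is addressed by
# (row key, "family:qualifier", timestamp) and holds raw bytes.
class ToyTable:
    def __init__(self):
        # row key -> {"family:qualifier" -> {timestamp -> value}}
        self.rows = {}

    def put(self, row, column, value, ts):
        cell = self.rows.setdefault(row, {}).setdefault(column, {})
        cell[ts] = value  # each timestamp is a distinct version

    def get(self, row, column):
        """Return the most recent version, mirroring HBase's default get."""
        versions = self.rows[row][column]
        return versions[max(versions)]

t = ToyTable()
t.put("user1", "info:name", b"alice", ts=1)
t.put("user1", "info:name", b"alicia", ts=2)  # newer version of the same cell
print(t.get("user1", "info:name"))  # b'alicia'
```

Because versions are kept side by side under their timestamps, old values remain readable until a real store would compact them away.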
5. Supported operation
All operations are keyed by rowkey
Supports CRUD (create, read, update, delete) operations: put, get, multiput, scan
There is no built-in join operation; joins can be implemented with MapReduce
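Because all access is keyed by rowkey and rows are stored in sorted order, a get is a key lookup and a scan is a range walk. The sketch below illustrates that idea with invented names (toy_scan, the sample rows); it is not the HBase API.

```python
# Rows are kept sorted lexicographically by row key, so a range scan
# can pick out a contiguous slice without touching unrelated rows.
rows = {
    "user#001": {"info:name": "alice"},
    "user#002": {"info:name": "bob"},
    "user#003": {"info:name": "carol"},
    "video#001": {"info:title": "intro"},
}

def toy_scan(table, start_row, stop_row):
    # Like HBase, scan the half-open key range [start_row, stop_row)
    for key in sorted(table):
        if start_row <= key < stop_row:
            yield key, table[key]

# All "user" rows, without touching the "video" rows:
user_rows = [k for k, _ in toy_scan(rows, "user#", "user#~")]
print(user_rows)  # ['user#001', 'user#002', 'user#003']
```

This is why rowkey design matters in practice: keys that should be scanned together must sort together.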
6. HBase maintains a multi-level index for each value, namely <key, column family, column qualifier, timestamp>
All rows in a table are sorted lexicographically by row key and split horizontally, by row ranges, into multiple regions. A region is the smallest unit of distributed storage and load balancing in HBase, and different regions are distributed to different RegionServers.
When a region grows beyond a threshold, it is split into two new regions, so the number of regions increases over time.
A region consists of one or more stores, and each store holds one column family
Each store consists of one MemStore and zero or more StoreFiles
The MemStore resides in memory; StoreFiles are stored on HDFS
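The split behavior described above can be simulated in a few lines: each region holds a sorted range of row keys, and once it exceeds a threshold it splits at its middle key into two daughter regions. The threshold and split policy here are simplified assumptions (real HBase splits by store-file size in bytes, not row count), and all names are invented for the example.

```python
import bisect

SPLIT_THRESHOLD = 4  # simplified: real HBase splits by store-file size

def insert(regions, key):
    """regions: list of sorted key lists, ordered by their key ranges."""
    # Find the last region whose start key is <= this key.
    idx = 0
    for i, r in enumerate(regions):
        if r and r[0] <= key:
            idx = i
    bisect.insort(regions[idx], key)
    if len(regions[idx]) > SPLIT_THRESHOLD:
        # Split at the middle row key into two daughter regions.
        mid = len(regions[idx]) // 2
        r = regions.pop(idx)
        regions.insert(idx, r[:mid])
        regions.insert(idx + 1, r[mid:])

regions = [[]]
for k in ["row5", "row1", "row9", "row3", "row7", "row2"]:
    insert(regions, k)
print(regions)  # [['row1', 'row2', 'row3'], ['row5', 'row7', 'row9']]
```

After the split, each daughter region covers a contiguous, non-overlapping key range, which is what lets a master spread regions across servers for load balancing.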
7. Basic components of HBase
Client: communicates with HMaster and HRegionServer via RPC, and maintains a cache to speed up access to HBase
Zookeeper:
Ensures there is exactly one active HMaster in the cluster
Stores the addressing entry point for all regions
Monitors RegionServer online/offline status in real time and notifies the HMaster
Stores the schema and table metadata for HBase
HMaster:
Assigns regions to RegionServers
Responsible for load balancing across RegionServers
Discovers failed RegionServers and reassigns their regions
Manages users' add, delete, modify, and query operations on tables
HRegionServer:
Maintains regions and processes I/O requests to those regions
Responsible for splitting regions that grow too large during operation
8.HBase fault-tolerant mechanism
HMaster fault tolerance: ZooKeeper elects a new HMaster
While there is no HMaster, data reads still proceed as usual.
While there is no HMaster, region splitting and load balancing cannot be performed.
HRegionServer fault tolerance: each RegionServer periodically reports a heartbeat to ZooKeeper; if no heartbeat arrives within a certain period:
The HMaster reassigns the regions on that HRegionServer to other RegionServers
The write-ahead log on the failed server is split by the master and distributed to the new RegionServers
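The recovery step above works because every edit is appended to a durable write-ahead log before it is applied in memory, so a new server can rebuild the lost in-memory state by replaying the log. The sketch below illustrates that pattern; the names (wal, memstore, put) are invented for the example and this is not HBase's actual implementation.

```python
wal = []        # durable write-ahead log (survives the "crash")
memstore = {}   # in-memory state (lost on crash)

def put(row, value):
    wal.append((row, value))   # 1. persist the edit to the log first
    memstore[row] = value      # 2. then apply it in memory

put("r1", "a")
put("r2", "b")
put("r1", "c")

memstore = {}                  # simulate a RegionServer crash: memory lost

for row, value in wal:         # recovery: replay the log on the new server
    memstore[row] = value

print(memstore)  # {'r1': 'c', 'r2': 'b'}
```

Replaying edits in log order reproduces the final state exactly, including the fact that the later write to r1 wins.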
9. HBase access method
Native Java API: conventional and efficient
Create a Configuration object (loads configuration from hbase-default.xml and hbase-site.xml)
Configuration conf = HBaseConfiguration.create();
Build an HTable handle (pass the Configuration object and the name of the table to access)
HTable table = new HTable(conf, tableName);
Provides only row-level transactions: strict row consistency, concurrent reads, sequential writes
Perform operations (put, get, delete, scan, etc., with batching supported)
table.getTableName();
Close the HTable handle (flushes in-memory data to disk and frees resources)
table.close();
HBase Shell: for management
Thrift Gateway: uses Thrift serialization and supports languages such as C++, PHP, and Python
Start the Thrift server: hbase-daemon.sh start thrift
Generate the HBase Thrift client interface files
thrift --gen xxx Hbase.thrift
Write the client code
E.g.: 1. thrift --gen py Hbase.thrift
2. ${HBASE_HOME}/src/examples/thrift/DemoClient.py
3. python DemoClient.py
REST Gateway: REST-style HTTP API
MapReduce: use MapReduce jobs to process HBase data
Provides APIs such as TableMapper, TableReducer, TableInputFormat, and TableOutputFormat
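The MapReduce pattern those classes implement can be shown with a toy job: a map step emits (key, 1) per cell, a shuffle groups by key, and a reduce step sums each group. In the real API, TableMapper and TableReducer play the map and reduce roles; the data and helper names below are invented for this sketch.

```python
from collections import defaultdict

table = [
    ("row1", {"cf:type": "video"}),
    ("row2", {"cf:type": "article"}),
    ("row3", {"cf:type": "video"}),
]

def mapper(row_key, columns):
    # Emit one (value, 1) pair per cell, like a TableMapper would.
    for value in columns.values():
        yield value, 1

def reducer(key, counts):
    # Sum the counts for one key, like a TableReducer would.
    return key, sum(counts)

# Shuffle: group mapper output by key, then reduce each group.
groups = defaultdict(list)
for row_key, columns in table:
    for k, v in mapper(row_key, columns):
        groups[k].append(v)

result = dict(reducer(k, vs) for k, vs in groups.items())
print(result)  # {'video': 2, 'article': 1}
```

The same grouping step is also how a join over HBase data is typically built with MapReduce: map both inputs to a shared key, then combine each group in the reducer.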
10. Main usage of the HBase Shell
Enter the console: bin/hbase shell
Create a table: create 'TABLE_NAME', 'COLUMN_FAMILY1', 'COLUMN_FAMILY2', ..., 'COLUMN_FAMILYN'
Add a record: put 'TABLE_NAME', 'ROW_KEY', 'COLUMN_FAMILY:COLUMN_NAME', 'VALUE'
View a record: get 'TABLE_NAME', 'ROW_KEY' # returns the most recent version by default
Count records: count 'TABLE_NAME'
Delete a table:
disable 'TABLE_NAME'
drop 'TABLE_NAME'
Delete records:
delete 'TABLE_NAME', 'ROW_KEY', 'COLUMN_FAMILY:COLUMN_NAME' # delete one cell
deleteall 'TABLE_NAME', 'ROW_KEY' # delete the whole row
Delete a column family:
disable 'TABLE_NAME'
alter 'TABLE_NAME', {NAME => 'tab1_add', METHOD => 'delete'}
enable 'TABLE_NAME'
Full table scan:
scan 'TABLE_NAME'
Scan all data in a column family: scan 'TABLE_NAME', {COLUMNS => 'COLUMN_FAMILY'}
List all tables: list
View server status: status
View the HBase version: version
View a table's structure: describe 'TABLE_NAME'
Check whether a table exists: exists 'TABLE_NAME'
Check whether a table is enabled: is_enabled 'TABLE_NAME'
Truncate a table: truncate 'TABLE_NAME'