2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
1. Brief introduction
HBase is a distributed, column-oriented open source database. It originated from the Google paper "Bigtable: A Distributed Storage System for Structured Data" and is an open source implementation of Google Bigtable. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive data stored in HBase, and ZooKeeper as its coordination service.
2. Table structure of HBase
HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.
Column families and their columns:
Column-family1: Column1, Column2
Column-family2: Column1, Column2, Column3
Column-family3: Column1
Cell versions stored under each row key, written as timestamp:value:
Key1: T1:abc, T2:gdxdf, T4:dfads, T3:hello, T2:world
Key2: T3:abc, T1:gdxdf, T4:dfads, T3:hello, T2:dfdsfa, T3:dfdf
Key3: T2:dfadfasd, T1:dfdasddsf, T2:dfxxdfasd, T1:taobao.com
As shown in the table above, key1, key2 and key3 are the unique row keys of three records, and column-family1, column-family2 and column-family3 are three column families, each containing several columns. For example, column-family1 contains two columns, column1 and column2. The values t1:abc and t2:gdxdf belong to the single cell identified by row key1 and column-family1:column1. This cell holds two values, abc and gdxdf, with different timestamps, T1 and T2. By default, HBase returns the value with the latest timestamp to the requester.
The specific meanings of these terms are as follows:
(1) Row Key
As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:
(1.1) access by a single row key
(1.2) a range scan over row keys
(1.3) a full table scan
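The three access paths can be sketched over a sorted key list. This is only an in-memory illustration, not HBase client code; the row keys, column names, and values below are made up for the example.

```python
from bisect import bisect_left

# Hypothetical stand-in for a table: row key (bytes) -> {column: value}.
rows = {
    b"row-a": {"cf1:col1": b"v1"},
    b"row-b": {"cf1:col1": b"v2"},
    b"row-c": {"cf2:col1": b"v3"},
}
# Rows are kept in byte order of the row key.
sorted_keys = sorted(rows)


def get(row_key):
    """(1.1) Look up a single row by its exact row key."""
    return rows.get(row_key)


def range_scan(start, stop):
    """(1.2) Return rows with start <= key < stop, in key order."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [(key, rows[key]) for key in sorted_keys[lo:hi]]


def full_scan():
    """(1.3) Visit every row in key order."""
    return [(key, rows[key]) for key in sorted_keys]
```

Because the keys are sorted, a range scan only needs to locate the two boundaries, whereas a full scan touches every row.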
A row key can be any string (the maximum length is 64 KB; in practice it is usually 10 to 100 bytes). Internally, HBase stores the row key as a byte array.
Data is stored sorted by the byte order of the row key. When designing keys, take full advantage of this sorted storage and place rows that are often read together next to each other (locality).
Note:
Sorting integers lexicographically gives 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, …, 9, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural order of integers, row keys must be left-padded with 0.
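The effect of left-padding can be checked with a short Python sketch. The helper name `padded_key` and the width of 3 are assumptions chosen for this example; a real key design would pick a width that covers the largest expected value.

```python
def padded_key(n, width=3):
    """Encode an integer as a fixed-width, zero-padded byte string."""
    return str(n).zfill(width).encode("ascii")


numbers = [1, 2, 9, 10, 11, 100]

# Byte-order sort of the unpadded keys does not match numeric order.
unpadded = sorted(str(n).encode("ascii") for n in numbers)
print(unpadded)  # [b'1', b'10', b'100', b'11', b'2', b'9']

# With left padding, byte order and numeric order agree.
padded = sorted(padded_key(n) for n in numbers)
print(padded)  # [b'001', b'002', b'009', b'010', b'011', b'100']
```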
Each read or write of a single row is atomic, no matter how many columns are read or written at once. This design decision makes it easy for users to reason about program behavior when the same row is updated concurrently.
(2) Column Family
Every column in an HBase table belongs to a column family. Column families are part of the table's schema (columns are not) and must be defined before the table is used. Column names are prefixed with their column family. For example, courses:history and courses:math both belong to the courses column family.
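The difference between schema-level column families and free-form columns can be sketched as follows. The `Table` class is a hypothetical in-memory model written for this illustration, not an HBase API, and the stored values are made up.

```python
class Table:
    """Toy table: column families are fixed up front, columns are not."""

    def __init__(self, *families):
        self.families = set(families)
        self.rows = {}

    def put(self, row, column, value):
        # A column name has the form "family:qualifier".
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[(family, qualifier)] = value


table = Table("courses")
# New columns can be added freely inside an existing family.
table.put("r1", "courses:history", b"a")
table.put("r1", "courses:math", b"b")
```

Writing to a family that was not declared, such as `table.put("r1", "teachers:name", b"x")`, raises `KeyError` in this sketch.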
Access control and disk and memory usage accounting are all done at the column family level. In practice, permissions on column families help manage different kinds of applications: some applications are allowed to add new base data, some can read base data and create derived column families, and some may only browse data (possibly not even all of it, for privacy reasons).
(3) Cell
In HBase, the storage unit determined by a row and a column is called a cell. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. Cell data is untyped and is stored entirely as raw bytes.
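Because cells are untyped, the application decides how a value is turned into bytes and back. The sketch below contrasts two encodings of the same number; the choice of an 8-byte big-endian layout is an assumption made for illustration.

```python
import struct

value = 97

# Option 1: store the decimal text, as a shell put with '97' would.
as_text = str(value).encode("utf-8")

# Option 2: store a fixed-width 8-byte big-endian integer.
as_int64 = struct.pack(">q", value)

# The store only sees bytes; the reader must know which encoding was used.
decoded_text = int(as_text.decode("utf-8"))
decoded_int64 = struct.unpack(">q", as_int64)[0]
```

Both round trips return 97, but the stored bytes differ in length and content, so readers and writers have to agree on the encoding.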
(4) Timestamp
Each cell can hold multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds, or it can be assigned explicitly by the client. An application that wants to avoid version conflicts must generate its own unique timestamps. Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.
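A minimal sketch of a versioned cell, assuming a plain dictionary keyed by timestamp. The `Cell` class and its method names are hypothetical and only model the behavior described above.

```python
import time


class Cell:
    """Toy cell holding several versions, keyed by millisecond timestamp."""

    def __init__(self):
        self.versions = {}  # timestamp (ms) -> value (bytes)

    def put(self, value, timestamp=None):
        # Without an explicit timestamp, use the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # Writing with an existing timestamp replaces that version.
        self.versions[timestamp] = value

    def get(self, max_versions=1):
        # Versions are returned newest first.
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = Cell()
cell.put(b"abc", timestamp=1)
cell.put(b"gdxdf", timestamp=2)
latest = cell.get()
both = cell.get(max_versions=2)
```

Here `latest` holds only the version with timestamp 2, and `both` lists timestamp 2 before timestamp 1. Two writes that share a timestamp collide in this model, which is why an application that needs every write preserved has to supply unique timestamps.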
To avoid the management burden (storage and indexing) of keeping too many versions, HBase provides two ways to reclaim old versions. One keeps only the last n versions of the data; the other keeps only versions written within a recent period (for example, the last seven days). Both can be set per column family.
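The two retention policies can be sketched as simple filters over a list of (timestamp, value) pairs. The function names and sample data are assumptions for illustration; the exact boundary handling in HBase may differ.

```python
DAY_MS = 24 * 60 * 60 * 1000


def keep_last_n(versions, n):
    """Keep only the n newest versions."""
    newest_first = sorted(versions, key=lambda tv: tv[0], reverse=True)
    return newest_first[:n]


def keep_recent(versions, now_ms, ttl_ms):
    """Keep only versions written within the last ttl_ms milliseconds."""
    newest_first = sorted(versions, key=lambda tv: tv[0], reverse=True)
    return [(ts, v) for ts, v in newest_first if now_ms - ts <= ttl_ms]


now = 10 * DAY_MS
versions = [
    (now - 9 * DAY_MS, b"old"),
    (now - 3 * DAY_MS, b"mid"),
    (now - 1 * DAY_MS, b"new"),
]

last_one = keep_last_n(versions, 1)
last_week = keep_recent(versions, now, 7 * DAY_MS)
```

With this data, `last_one` keeps only the newest version, while `last_week` keeps the two versions written within seven days.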
3. Basic usage of the HBase shell
HBase provides a shell for interactive use. Running help followed by a quoted command name, for example help 'get', shows the help text for that command.
The following example, a student score sheet, demonstrates how to use HBase.
name      grade    course:math    course:art
zkb       5        97             87
baoniu    4        89             80
Here grade is a column family of the table that holds a single unnamed column, and course is a column family made up of two columns, math and art. We can add more columns to the course column family as needed, such as computer and physics. Note that a column under a column family can also have no name, as with the value stored under grade.
(1) Create a table scores with two column families, grade and course
hbase(main):001:0> create 'scores','grade','course'
0 row(s) in 0.4780 seconds
(2) Check which tables exist in the current HBase
hbase(main):002:0> list
TABLE
scores
1 row(s) in 0.0270 seconds
(3) View the structure of the table
hbase(main):004:0> describe 'scores'
DESCRIPTION ENABLED
{NAME => 'scores', FAMILIES => [{NAME => 'course', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'grade', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0390 seconds
(4) Add a row of data: row key zkb, column family grade, empty column name, value 5
hbase(main):006:0> put 'scores','zkb','grade:','5'
0 row(s) in 0.0420 seconds
(5) Add the column math with value 97 to the course column family of row zkb
hbase(main):007:0> put 'scores','zkb','course:math','97'
0 row(s) in 0.0270 seconds
(6) Add the column art with value 87 to the course column family of row zkb
hbase(main):008:0> put 'scores','zkb','course:art','87'
0 row(s) in 0.0260 seconds
(7) Add a row of data: row key baoniu, column family grade, empty column name, value 4
hbase(main):009:0> put 'scores','baoniu','grade:','4'
0 row(s) in 0.0260 seconds
(8) Add the column math with value 89 to the course column family of row baoniu
hbase(main):010:0> put 'scores','baoniu','course:math','89'
0 row(s) in 0.0270 seconds
(9) Add the column art with value 80 to the course column family of row baoniu
hbase(main):011:0> put 'scores','baoniu','course:art','80'
0 row(s) in 0.0270 seconds
(10) View the data for zkb in the scores table
hbase(main):012:0> get 'scores','zkb'
COLUMN CELL
course:art timestamp=1316100110921, value=87
course:math timestamp=1316100025944, value=97
grade: timestamp=1316099975625, value=5
3 row(s) in 0.0480 seconds
(11) View all data in the scores table
Note: the scan command can take STARTROW and STOPROW to scan a range of rows, for example: scan 'user_test', {COLUMNS => 'info:username', LIMIT => 10, STARTROW => 'test', STOPROW => 'test2'}
hbase(main):013:0> scan 'scores'
ROW COLUMN+CELL
baoniu column=course:art, timestamp=1316100293784, value=80
baoniu column=course:math, timestamp=1316100234410, value=89
baoniu column=grade:, timestamp=1316100178609, value=4
zkb column=course:art, timestamp=1316100110921, value=87
zkb column=course:math, timestamp=1316100025944, value=97
zkb column=grade:, timestamp=1316099975625, value=5
2 row(s) in 0.0470 seconds
(12) View all data of the course column family in the scores table
hbase(main):017:0> scan 'scores', {COLUMNS => 'course'}
ROW COLUMN+CELL
baoniu column=course:art, timestamp=1316100293784, value=80
baoniu column=course:math, timestamp=1316100234410, value=89
zkb column=course:art, timestamp=1316100110921, value=87
zkb column=course:math, timestamp=1316100025944, value=97
2 row(s) in 0.0350 seconds
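The scan options used in steps (11) and (12) can be modeled in plain Python over the same score data. This is a rough sketch of the semantics only (start row inclusive, stop row exclusive, optional column filter and row limit); the `scan` function below is hypothetical and is not the HBase client API.

```python
# The score sheet from the shell session, without timestamps.
scores = {
    "baoniu": {"course:art": "80", "course:math": "89", "grade:": "4"},
    "zkb": {"course:art": "87", "course:math": "97", "grade:": "5"},
}


def _matches(column, spec):
    """A spec is either a whole family ('course') or a full column name."""
    if ":" in spec:
        return column == spec
    return column.split(":", 1)[0] == spec


def scan(table, columns=None, startrow=None, stoprow=None, limit=None):
    """Return (row, cells) pairs in row-key order, applying the filters."""
    results = []
    for row in sorted(table):
        if startrow is not None and row < startrow:
            continue  # before the inclusive start row
        if stoprow is not None and row >= stoprow:
            break  # reached the exclusive stop row
        cells = {
            col: val
            for col, val in sorted(table[row].items())
            if columns is None or any(_matches(col, s) for s in columns)
        }
        if cells:
            results.append((row, cells))
            if limit is not None and len(results) >= limit:
                break
    return results


course_only = scan(scores, columns=["course"])
first_row = scan(scores, startrow="baoniu", stoprow="zkb")
```

`course_only` mirrors step (12): both rows are returned without their grade cell. `first_row` contains only baoniu, because the stop row itself is excluded.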
(13) Delete the scores table
hbase(main):024:0> disable 'scores'
0 row(s) in 0.0330 seconds
hbase(main):025:0> drop 'scores'
0 row(s) in 1.0840 seconds