In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
Tuesday, February 19, 2019
-------------------
hbase basic concepts
Introduction to hbase database
Hbase is a database system built on top of hdfs to provide high reliability, high performance, column storage, scalability, and real-time read-write.
It is between nosql and RDBMS, can only retrieve data through the primary key (row key) and the range of the primary key, only supports single-row transactions (you can implement complex operations such as multi-table join through hive support). mainly used to
Store unstructured and semi-structured loose data.
Like Hadoop, Hbase aims to rely primarily on horizontal scaling, increasing compute and storage capacity by adding inexpensive commodity servers.
hbase table structure
Tables in HBase generally have this feature:
1 Large: A table can have 1 billion rows and 1 million columns
2 Column oriented: Column (family) oriented storage and permission control, column (family) independent retrieval. //traditional mysql data is row-oriented storage
Sparse: For null columns, they do not take up storage space, so tables can be designed to be very sparse.
hbase Cluster Structure//See Figure for the role of each component
hbase table structure: When creating a table, there is no need to limit the fields in the table, just specify a number of column families.
When inserting data, column families can store any number of splits (kv pairs, column names & column values)//fields or any number
To query the value of a specific field, you need to specify coordinates, table name---> row key---> column family: column name----> version
Timestamp//A value can have multiple versions, distinguished by version number. Default Return to latest version
HBase stores data in table form. Tables consist of rows and columns. A column is divided into several row families
Row Key Like nosql databases,row key is the primary key used to retrieve records.
There are only three ways to access rows in the hbase table:
1 Access via a single row key
2 Range by row key
3 Full table scanning
This shows that rk design is extremely important in hbase, our next blog focuses on hbase table rk design focus
Row key Row key
It can be any string (the maximum length is 64KB, the actual application length is generally 10-100bytes), in hbase, row key is stored as a byte array. When stored, the data is sorted according to the dictionary order of Row key. When designing keys, you want to fully sort the storage feature, putting together rows that are often read together. (positional dependence)
Note:
The result of lexicographical sorting of int is
1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,…,9,91,92,93,94,95,96,97,98,99。To preserve the natural order of ×××, the row keys must be left padded with zeros. A read or write of a row is an atomic operation (regardless of how many columns are read or written at once). This design decision makes it easy for the user to understand how the program behaves when concurrent updates are made to the same row.
Family (column family)
Each column in the hbase table belongs to a column family.
Column families are part of the schema of a table (columns are not) and must be defined before using the table.
Column names are prefixed with column families. For example, courses:history and courses:math belong to the courses family.
Access control, disk and memory usage statistics are all done at the column family level.
In practice, control rights on column families help us manage different types of apps: some apps can add new base data, some can read base data and create inherited column families, and some allow only browsing data (and perhaps not all data for privacy reasons).
//Hbase column family is not the more the better, the official recommendation is that the column family is preferably less than or equal to 3. The scenarios we use are generally 1 column family.
Timestamp//A value can have multiple versions, distinguished by version number.
HBase is defined by rows and columns as a storage unit called a cell. Each cell holds multiple versions of the same piece of data. Versions are indexed by timestamps. The type of timestamp is a 64-bit integer. The timestamp can be assigned by hbase(automatically when data is written), in which case the timestamp is the current system time accurate to milliseconds. Timestamps can also be explicitly assigned by clients. If the application is to avoid data version conflicts, it must generate its own unique timestamps. In each cell, different versions of data are sorted in reverse chronological order, that is, the latest data is ranked first. In order to avoid the administrative burden (including storage and indexing) caused by too many versions of data, hbase provides two ways to recover data versions. One is to save the last n versions of the data, and the other is to save the most recent version (for example, the last seven days). The user can make settings for each column family.
Cell //is composed of kv group from or kv+version can be understood as a cell in the excell table
A unit uniquely determined by {row key, column( = + ), version}.
The data in the cell has no type and is stored in byte code form.
Comparison of hbase table structure and traditional database table structure//see figure
Summary: In a relational database, you cannot add a field to a single row. If you want to add it, you have to add it all.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.