Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Detailed introduction of hbase table structure

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly explains "detailed introduction of hbase table structure". Interested friends may wish to have a look at it. The method introduced in this paper is simple, fast and practical. Now let the editor to take you to learn the "detailed introduction of hbase table structure"!

HBase design

Every table in HBase is called BigTable. BigTable stores a series of row records, which have three basic types of definitions: Row Key, Time Stamp, and Column.

1. Row Key is the unique identity of the line in the BigTable.

2. Time Stamp is the associated timestamp for each data operation, which can be regarded as the version of SVN.

3. Column is defined as

< family>

:

< label>

Through these two parts, you can specify a unique data storage column, the definition and modification of family requires a DDL operation similar to DB to HBase, while label can be used directly without definition, which also provides a means to dynamically customize columns. Another role of family is to optimize read and write operations in physical storage, which is physically close to the data stored in family, so this feature can be used in the process of business design.

Data model

HBase stores data as a table. A table consists of rows and columns. Columns are divided into several column families (row family), as shown in the following illustration.

1 、 Row Key

Like the NoSQL database, Row Key is the primary key used to retrieve records. There are only three ways to access rows in HBase table:

1) access through a single Row Key.

2) scan the whole table through the range of Row Key.

3) Row Key can make any string (the maximum length is 64KB, and the length in practical application is usually 10 ~ 100bytes). Inside HBase, Row Key is saved as a byte array.

When storing, the data is sorted and stored according to the dictionary order (byte order) of Row Key. When designing a Key, you should fully sort the storage feature, storing together rows that are often read together (location dependency).

Note that the result of the lexicographic order of the int is 1meme10, 100, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 9, 91, 91, 92, 94, 94, 94, 96, 96, 98, 99. To save the natural order of shaping, Row Key must be filled to the left with 0.

One read and write of a row is an atomic operation (no matter how many columns are read and written at a time). This design decision makes it easy for users to understand the behavior of the program when concurrent updates are performed on the same row.

2. Column family

Each column in the HBase table belongs to a column family. Column families are part of the Schema of the table (while columns are not) and must be defined before using the table. Column names are prefixed with column families. For example, courses:history and courses:math belong to the column family of courses.

Access control, disk and memory usage statistics are all carried out at the column family level. In practical applications, control permissions on column families can help us manage different types of applications, for example, some applications are allowed to add new basic data, some applications can read basic data and create inherited column families, and some applications are only allowed to browse data (and may not be able to browse all data for privacy reasons).

3. Time stamp

A storage unit identified by Row and Columns in HBase is called Cell. Each Cell holds multiple versions of the same data. The version is indexed by a timestamp, which is a 64-bit integer. The timestamp can be assigned by HBase (automatically when the data is written), where the timestamp is the current system time accurate to milliseconds. The timestamp can also be displayed and assigned by the customer. If the application wants to avoid data version conflicts, it must generate its own unique timestamps. In each Cell, different versions of the data are sorted in reverse chronological order, meaning that the latest data comes first.

In order to avoid the management burden (including storage and indexing) caused by too many versions of data, HBase provides two ways to recycle data versions. One is to save the last n versions of the data, and the other is to save the most recent version (for example, the last seven days). You can set it for each column family.

4 、 Cell

Cell is made up of {row key,column (=

< family>

+

< label>

), version} the only determined unit The data in Cell is typeless and is all stored in bytecode form.

Physical storage

The Table is split into multiple HRegion in the direction of the row, each HRegion scattered in a different RegionServer.

StoreFile is stored in HDFS in HFile format.

At this point, I believe that you have a deeper understanding of the "detailed introduction of hbase table structure". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report