Understanding HBase column-oriented storage in five minutes

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Row storage

Traditional databases are relational and store data by row. As shown below:

Here, only Zhang San's row is completely filled in; the rows for Li Si, Wang Wu, and Zhao Liu are not. Because the row structure is fixed, every row has the same columns. Even a column you do not use still occupies an empty slot and cannot be left out. Here is a picture to make it concrete:

Whether or not anyone sits in it, the seat is always there.

Column storage

To distinguish them from traditional databases, the newer databases are called non-relational databases, and they store data by column. As shown below:

Column storage can be confusing at first glance. Here is how row storage converts to column storage:

Each column (cell) of Zhang San's original row now becomes a row of its own, so the six columns of Zhang San's data become six rows.

The original six columns sat in one row, so they shared a single primary key (that is, Zhang San). Now there are six rows, and each row needs its own primary key (otherwise there is no way to tell whose data it is), so the original primary key (that is, Zhang San) is repeated six times. As shown below:

Because each original column is now a row, you add a row only when there is a value to store and add nothing otherwise, so no space is wasted. Here is a picture to make it concrete:

(The interior of an airport shuttle bus is one large open floor.)

If you want to stand, you get a space. If you do not, you take up nothing, and the space is left for someone else.
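The row-to-column conversion described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`age`, `phone`, `city`) and all values are hypothetical, and three columns are used instead of six to keep it short. `None` stands for a cell the row format has to keep even though it is empty.

```python
def to_column_rows(row_table):
    """Turn every non-empty cell of a row-oriented table into its own
    (primary_key, column_name, value) row."""
    column_rows = []
    for primary_key, row in row_table.items():
        for column_name, value in row.items():
            if value is not None:  # empty cells are simply not stored
                column_rows.append((primary_key, column_name, value))
    return column_rows


# Hypothetical row-oriented data: every row carries every column.
row_table = {
    "Zhang San": {"age": 30, "phone": "555-0100", "city": "Beijing"},
    "Li Si": {"age": None, "phone": "555-0101", "city": None},
}

column_rows = to_column_rows(row_table)
for entry in column_rows:
    print(entry)
```

Zhang San's primary key is repeated once per stored value, while Li Si contributes a single row because only one of the cells holds data.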

Row and column comparison

① Row storage tends toward a fixed structure, while column storage tends toward a weakened structure.

(Row storage is like a set meal: even if only one person comes, they are served eight dishes and a soup, which causes waste. Column storage is like a buffet: you take what you need, and nothing is wasted when fewer people come.)

② Row storage needs only one primary key to store a row of data, while column storage needs multiple primary keys to store the same row of data.

③ Row storage stores only business data. Column storage stores the column names in addition to the business data.

④ Row storage is more like a Java Bean: all fields are defined in advance and cannot be changed. Column storage is more like a Map: nothing is defined in advance, and key/value pairs are added at will.
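The Bean-versus-Map contrast in point ④ can be illustrated with a small Python analogue. This is a sketch, not Java: the `PersonBean` class, its fields, and the sample values are hypothetical, and `__slots__` is used here only to imitate a fixed set of predeclared fields.

```python
class PersonBean:
    """Row-style record: the fields are fixed when the class is defined."""

    __slots__ = ("name", "phone", "birthday")

    def __init__(self, name, phone=None, birthday=None):
        self.name = name
        self.phone = phone        # stays as an empty slot if unused
        self.birthday = birthday  # stays as an empty slot if unused


bean = PersonBean("Zhang San")
try:
    bean.email = "zs@example.com"  # not a declared field
    bean_accepts_new_field = True
except AttributeError:
    bean_accepts_new_field = False

# Map-style record: nothing is declared in advance.
person_map = {"name": "Zhang San"}
person_map["email"] = "zs@example.com"  # any key can be added later

print(bean_accepts_new_field, person_map)
```

The bean carries `phone` and `birthday` even when they are empty and rejects anything undeclared; the map holds only what was actually put into it.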

Official introduction

Apache HBase is the Hadoop database: a distributed, scalable, big data store.

Use HBase when you need random, real-time read/write access to big data. Its goal is to host very large tables: billions of rows by millions of columns.

HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. Just as Bigtable uses the Google File System for distributed data storage, HBase uses HDFS.

The world of HBase

Although HBase weakens the structure, that does not mean anything goes. A traditional relational database requires the table structure (that is, all columns and their data types) to be strictly defined before any data is inserted.

An HBase table also has something that must be defined before data is put in: the Column Family. "Family" means a household, so a column family is a family of columns. The columns are the family members, and since a family usually has several members, a column family contains multiple columns.

Members of a family are related by blood, so the columns in one column family are usually related too, for example by being similar or belonging to the same category. A column family can therefore be regarded as a kind of classification.

Here is a very common example. When you go to an interview, the receptionist usually asks you to fill out a form. The form typically asks for a lot of information, and every company's form is different, but the information can be roughly divided into three categories: basic personal information, education history, and work history. These three categories are effectively three column families. As shown below:

Each category contains specific items, such as name, phone number, and date of birth. These act as identifiers (like variable names) and are called Column Qualifiers in HBase. A column qualifier lives inside a column family and identifies one piece of data. As shown below:

In HBase, a column family (Column Family) combined with a column qualifier (Column Qualifier) is called a column (Column). The two parts are separated by a colon (:), in the form column family:column qualifier, as shown below:

The unique identifier of each row is called the primary key in a traditional database; in HBase it is called the row key (Row Key). As shown below:

Data is stamped with a timestamp when it enters HBase. This timestamp can be used as a version number.

At time T1, I saved a person's basic information. I then found that the name was wrong and updated it at time T2. The original data was not modified; instead, a new piece of data was inserted and stamped with a new timestamp.

A query now returns the new data, so it looks as if the data was updated. In fact, the old version is still there, and the query simply returns only the latest version by default. As shown below:
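The T1/T2 behavior described above can also be sketched in a few lines of Python. This is a toy in-memory model, not the HBase client API: the `VersionedCell` class, its `put` and `get` methods, and the misspelled first value are all hypothetical. How many old versions HBase actually retains depends on how the table is configured.

```python
class VersionedCell:
    """Toy model of one column value that keeps every written version."""

    def __init__(self):
        self._versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        # A write never overwrites history; it adds a new version.
        self._versions[timestamp] = value

    def get(self, max_versions=1):
        # Newest timestamp first; by default only the latest is returned.
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:max_versions]


name = VersionedCell()
name.put("Zhang Sam", timestamp=1)  # T1: saved with a typo
name.put("Zhang San", timestamp=2)  # T2: the "update" is really an insert

latest = name.get()
history = name.get(max_versions=2)
print(latest)
print(history)
```

The default read sees only the T2 value, which is why the write looks like an update, while asking for more versions shows that the T1 value was never changed.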

The combination of a row key, column family, column qualifier, timestamp, and the data itself is called a Cell. The row key, column family, column qualifier, and timestamp can be thought of as positioning attributes (like coordinates) that together pin down exactly one piece of data. Each line in the following figure corresponds to one cell in HBase:

A row key together with one or more columns (including their data) makes up a Row. All of the data for row key 1001 in the following figure is one row in HBase, and the data for 1002 is another row:

In HBase, once the column families are defined (regardless of the specific columns), the table (Table) is defined. As shown below:
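The relationship between cells, columns, and rows can also be sketched in Python. This is an illustration only, not HBase's internal format: the `Cell` tuple, the `column_name` helper, the family names `basic` and `education`, and every value are hypothetical. Only the row keys 1001 and 1002 come from the text.

```python
from collections import defaultdict, namedtuple

# The first four fields are the "coordinates"; value is the data they locate.
Cell = namedtuple("Cell", ["row_key", "family", "qualifier", "timestamp", "value"])


def column_name(cell):
    """A column is written as column family:column qualifier."""
    return f"{cell.family}:{cell.qualifier}"


cells = [
    Cell("1001", "basic", "name", 1, "Zhang San"),
    Cell("1001", "basic", "phone", 1, "555-0100"),
    Cell("1001", "education", "university", 1, "Example University"),
    Cell("1002", "basic", "name", 1, "Li Si"),
]

# A row is everything that shares one row key.
rows = defaultdict(dict)
for cell in cells:
    rows[cell.row_key][column_name(cell)] = cell.value

for row_key, columns in rows.items():
    print(row_key, columns)
```

Row 1001 ends up with three columns and row 1002 with one, and neither row reserves space for columns it does not use.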

The official documentation warns that carrying over the table / row / column concepts of traditional databases to HBase is not a helpful analogy. Instead, think of an HBase table as a multi-dimensional (two-dimensional) map, that is, a Map of Maps. The column family is the first dimension and the column qualifier is the second dimension.
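The map-of-maps view can be written down directly as nested dictionaries. This is a minimal sketch, assuming an outer level keyed by row key with the two dimensions named above inside it; the `get_value` helper, the family names, and the data are hypothetical, and timestamps are left out for brevity.

```python
# row key -> column family -> column qualifier -> value
table = {
    "1001": {
        "basic": {"name": "Zhang San", "phone": "555-0100"},
        "work": {"company": "Example Corp"},
    },
    "1002": {
        "basic": {"name": "Li Si"},
    },
}


def get_value(table, row_key, family, qualifier):
    """Walk the nested maps; return None if any level is missing."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# A new qualifier can appear in one row without touching any other row.
table["1002"]["basic"]["birthday"] = "1990-01-01"

print(get_value(table, "1001", "basic", "phone"))
print(get_value(table, "1002", "work", "company"))
```

A lookup that falls off the map simply returns `None`, which matches the idea that a value never written takes up no space.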

Note: at very large data volumes, any small difference is magnified enormously, so keeping the names of column families and column qualifiers short can save a considerable amount of space.

Note: by the strict definition of column storage, HBase is not a column store. Some people call it column-oriented storage instead, so keep this distinction in mind.
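A rough back-of-the-envelope calculation shows why name length matters. It assumes, as a simplification, that the column family and column qualifier names are stored with every cell and that no compression or block encoding is applied. The cell count and all names below are hypothetical.

```python
def name_overhead_bytes(family, qualifier, cell_count):
    """Bytes spent only on family and qualifier names across all cells."""
    per_cell = len(family.encode("utf-8")) + len(qualifier.encode("utf-8"))
    return per_cell * cell_count


cell_count = 1_000_000_000  # hypothetical: one billion stored cells

long_names = name_overhead_bytes("basic_information", "date_of_birth", cell_count)
short_names = name_overhead_bytes("b", "dob", cell_count)
saved_gb = (long_names - short_names) / 1e9

print(f"long names:  {long_names / 1e9:.1f} GB")
print(f"short names: {short_names / 1e9:.1f} GB")
print(f"saved:       {saved_gb:.1f} GB")
```

Under these assumptions the shorter names save 26 bytes per cell, which adds up to about 26 GB over a billion cells. Real savings would be smaller wherever compression or encoding already removes the repetition.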

(end)

