
A Complete Explanation of HBase HFile and Prefix Compression: KeyValue Format

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

1. Introduction

HFile (HBase File) is an abstraction of the file storage format used by HBase.

There are currently two versions of HFile: HFileV1 and HFileV2.

Versions of HBase before 0.92 support only HFileV1.

HBase 0.92 and 0.94 support both HFileV1 and HFileV2.

The following are the structure diagrams of HFileV1 and HFileV2:

[Figure: HFileV1 structure diagram]

[Figure: HFileV2 structure diagram]

The data blocks in the figures are where the application data is actually stored.

Each data block consists of a series of KeyValues, arranged in ascending order of Key.

This article explains what a KeyValue is and which algorithms can reduce duplication as the number of KeyValues grows.

First of all, let's look at an example.

Suppose you need to store in HBase each user's basic information and the open source projects that user participates in:

            User basic information      Participating open source projects
-------------------------------------------------------------------------------------------
User Id     Profession    Gender        tomcat                      hbase
-------------------------------------------------------------------------------------------
zhh3009     programmer    male          submits patches, dabbles    submits patches, dabbles

User Id     Profession    Gender        tomcat                      ant
-------------------------------------------------------------------------------------------
jdd1999     Code God      male          founder                     founder
-------------------------------------------------------------------------------------------
Table 1.1

In this example, the columns that hold a user's basic information are easy to fix in advance, but which open source projects a user participates in, and the role played in each project, are not.

That makes it awkward to design the table in a relational database: you do not know how many columns there will be, and you cannot group related columns.

1.1 Column family

HBase is a column-oriented database, and related columns can be grouped into a column family (ColumnFamily).

You do not need to know in advance which columns a column family contains; columns can be added when needed, for example an email column for zhh3009 in the user's basic information.

In the example above, "user basic information" and "participating open source projects" can serve as two column families.

Each column family usually corresponds to its own directory inside HBase, so a column value is stored only under the directory of the column family it belongs to.

1.2 rowKey

We want to retrieve the information in one or more column families by looking up a single column. User Id is such a column.

For example, querying by user Id for zhh3009's email and the open source projects zhh3009 participates in will not return jdd1999's information.

In HBase, this column is called the rowKey.

How does HBase store the information in the example above?

It extracts the user Id column as the rowKey and flattens the information above into the following format:

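<rowKey, column family, column name> => column value
-------------------------------------------------------------------------------------------
<zhh3009, user basic information, profession> => programmer
<zhh3009, user basic information, gender> => male
<zhh3009, participating open source projects, tomcat> => submits patches, dabbles
<zhh3009, participating open source projects, hbase> => submits patches, dabbles
<jdd1999, user basic information, profession> => Code God
<jdd1999, user basic information, gender> => male
<jdd1999, participating open source projects, tomcat> => founder
<jdd1999, participating open source projects, ant> => founder
-------------------------------------------------------------------------------------------
Table 1.2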
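The flattening step described above can be sketched in a few lines of Java. This is only an illustration, not HBase code: the class `FlattenExample`, its methods, and the `<rowKey, family, column> => value` rendering are hypothetical names chosen for this sketch, and the sample data is the jdd1999 row of Table 1.1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenExample {
    // Turn one logical row (column family -> column -> value) into one
    // "<rowKey, family, column> => value" line per cell.
    static List<String> flatten(String rowKey, Map<String, Map<String, String>> families) {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> family : families.entrySet()) {
            for (Map.Entry<String, String> column : family.getValue().entrySet()) {
                cells.add("<" + rowKey + ", " + family.getKey() + ", " + column.getKey()
                        + "> => " + column.getValue());
            }
        }
        return cells;
    }

    // Sample data: the jdd1999 row of Table 1.1 (illustrative only).
    static List<String> demo() {
        Map<String, String> basic = new LinkedHashMap<>();
        basic.put("profession", "Code God");
        basic.put("gender", "male");
        Map<String, String> projects = new LinkedHashMap<>();
        projects.put("tomcat", "founder");
        projects.put("ant", "founder");
        Map<String, Map<String, String>> families = new LinkedHashMap<>();
        families.put("user basic information", basic);
        families.put("participating open source projects", projects);
        return flatten("jdd1999", families);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because every cell carries its own rowKey, column family, and column name, a row may have any number of columns without a schema change. The cost is that the same rowKey and column family name are repeated in every cell.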

Each row in Table 1.2 corresponds to one KeyValue in HBase.

The part to the left of "=>" is the Key of the KeyValue, and the part to the right of "=>" is its Value.

Of course, this is only a simplified view of a KeyValue. The internal format is more involved, so let's look at what a real KeyValue looks like.

2. KeyValue internal format

The KeyValue internal format can be divided into three parts: header, Key, and Value, as shown in Table 2.1.

Name                  Bytes                 Description
-------------------------------------------------------------------------------------------
keyLength             4                     total number of bytes occupied by the Key
valueLength           4                     total number of bytes occupied by the Value
rowKeyLength          2                     number of bytes occupied by the rowKey
rowKey                rowKeyLength          rowKey
columnFamilyLength    1                     number of bytes occupied by the column family name
columnFamily          columnFamilyLength    column family name
columnName            columnNameLength      column name
timestamp             8                     timestamp
type                  1                     Key type, such as add (Put) or delete (Delete)
value                 valueLength           column value
-------------------------------------------------------------------------------------------
Table 2.1

keyLength and valueLength form the header.

The seven items from rowKeyLength to type make up the Key, and the last item, value, is the third part: the Value.
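The layout in Table 2.1 can be sketched as a small encoder. This is a minimal illustration under stated assumptions, not HBase's own implementation: `KeyValueEncoder` and its methods are hypothetical names, the big-endian byte order is simply the default of `java.nio.ByteBuffer` (the article does not state a byte order), and the short ASCII names in `demo()` are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyValueEncoder {
    // Type code 4 is the value this article's example lists for Put.
    static final byte TYPE_PUT = 4;

    // Serialize one cell in the field order of Table 2.1.
    static byte[] encode(byte[] row, byte[] family, byte[] column,
                         long timestamp, byte type, byte[] value) {
        // Key = rowKeyLength(2) + rowKey + columnFamilyLength(1) + columnFamily
        //       + columnName + timestamp(8) + type(1)
        int keyLength = 2 + row.length + 1 + family.length + column.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + value.length);
        buf.putInt(keyLength);              // header: keyLength
        buf.putInt(value.length);           // header: valueLength
        buf.putShort((short) row.length);   // rowKeyLength
        buf.put(row);                       // rowKey
        buf.put((byte) family.length);      // columnFamilyLength
        buf.put(family);                    // columnFamily
        buf.put(column);                    // columnName, stored without a length prefix
        buf.putLong(timestamp);             // timestamp
        buf.put(type);                      // type
        buf.put(value);                     // value
        return buf.array();
    }

    // Illustrative cell with short ASCII names (assumed, not from the article).
    static byte[] demo() {
        return encode("zhh3009".getBytes(StandardCharsets.UTF_8),
                      "info".getBytes(StandardCharsets.UTF_8),
                      "job".getBytes(StandardCharsets.UTF_8),
                      1329663787364L, TYPE_PUT,
                      "dev".getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(demo().length); // 8-byte header + 26-byte Key + 3-byte Value = 37
    }
}
```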

Note that columnFamily is preceded by a columnFamilyLength.

columnName, however, has no columnNameLength in front of it. The field is unnecessary, and omitting it saves space.

When parsing a KeyValue, the end position of columnName within the Key can be determined as keyLength - 8 (timestamp) - 1 (type).
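The parsing rule above can be sketched as follows. This is a hedged illustration, not HBase's parser: `ColumnNameParser` and its methods are hypothetical, the sample bytes are built by hand with short made-up ASCII names, and big-endian byte order (the `ByteBuffer` default) is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ColumnNameParser {
    // Recover columnName from a serialized KeyValue laid out as in Table 2.1.
    static String columnName(byte[] kv) {
        ByteBuffer buf = ByteBuffer.wrap(kv);
        int keyLength = buf.getInt(0);
        int keyStart = 4 + 4;                           // Key begins after keyLength and valueLength
        int rowKeyLength = buf.getShort(keyStart);
        int familyLengthPos = keyStart + 2 + rowKeyLength;
        int familyLength = kv[familyLengthPos] & 0xFF;  // 1-byte length, read as unsigned
        int columnStart = familyLengthPos + 1 + familyLength;
        // columnName has no length field: it ends where timestamp (8) and type (1) begin.
        int columnEnd = keyStart + keyLength - 8 - 1;
        return new String(kv, columnStart, columnEnd - columnStart, StandardCharsets.UTF_8);
    }

    // Hand-built sample cell (names are illustrative assumptions).
    static byte[] sample() {
        byte[] row = "zhh3009".getBytes(StandardCharsets.UTF_8);
        byte[] family = "info".getBytes(StandardCharsets.UTF_8);
        byte[] column = "job".getBytes(StandardCharsets.UTF_8);
        byte[] value = "dev".getBytes(StandardCharsets.UTF_8);
        int keyLength = 2 + row.length + 1 + family.length + column.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(8 + keyLength + value.length);
        buf.putInt(keyLength).putInt(value.length)
           .putShort((short) row.length).put(row)
           .put((byte) family.length).put(family)
           .put(column).putLong(1329663787364L).put((byte) 4).put(value);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(columnName(sample())); // prints "job"
    }
}
```

The start of columnName comes from the two explicit length fields, and its end comes from keyLength minus the fixed-size tail. That is why a separate columnNameLength field would be redundant.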

The first two rows of Table 1.2 produce two KeyValues in the format of Table 2.1:

KeyValue A represents: <zhh3009, user basic information, profession> => programmer

KeyValue B represents: <zhh3009, user basic information, gender> => male

Name                  Bytes                 KeyValue A                KeyValue B
-------------------------------------------------------------------------------------------
keyLength             4                     35                        35
valueLength           4                     4                         2
rowKeyLength          2                     7                         7
rowKey                rowKeyLength          zhh3009                   zhh3009
columnFamilyLength    1                     12                        12
columnFamily          columnFamilyLength    user basic information    user basic information
columnName            columnNameLength      profession                gender
timestamp             8                     1329663787364             1329663787364
type                  1                     4 (Put)                   4 (Put)
value                 valueLength           programmer                male
-------------------------------------------------------------------------------------------
Table 2.2

keyLength is the sum of the Key fields: 2 + 7 + 1 + 12 + 4 + 8 + 1 = 35, which implies a 4-byte column name. The byte counts evidently refer to the original strings rather than the English renderings shown here.
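The keyLength of 35 in Table 2.2 can be checked with a little arithmetic. The sketch below is illustrative: `KeyLengthExample` and its methods are hypothetical, and the column name length of 4 bytes is inferred from the table (35 minus the other Key fields) rather than stated in the article.

```java
public class KeyLengthExample {
    // Sum of the seven Key fields from Table 2.1.
    static int keyLength(int rowKeyLength, int columnFamilyLength, int columnNameLength) {
        return 2 + rowKeyLength        // rowKeyLength field + rowKey
             + 1 + columnFamilyLength  // columnFamilyLength field + columnFamily
             + columnNameLength        // columnName (no length field)
             + 8                       // timestamp
             + 1;                      // type
    }

    // Whole KeyValue: 4-byte keyLength + 4-byte valueLength + Key + Value.
    static int totalLength(int keyLength, int valueLength) {
        return 4 + 4 + keyLength + valueLength;
    }

    public static void main(String[] args) {
        int key = keyLength(7, 12, 4);           // rowKey 7 bytes, family 12, column 4 (inferred)
        System.out.println(key);                 // 35
        System.out.println(totalLength(key, 4)); // KeyValue A: 47
        System.out.println(totalLength(key, 2)); // KeyValue B: 45
    }
}
```

Of the 35 Key bytes, only the 4-byte column name differs between KeyValue A and KeyValue B. The rowKey, column family, timestamp, and type are identical, which is the kind of duplication the introduction refers to.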
