2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
1. Introduction
HFile (HBaseFile) is an abstraction of the file storage format used by HBase.
There are currently two versions of HFile: HFileV1 and HFileV2.
Versions of HBase before 0.92 supported only HFileV1.
HBase 0.92/0.94 supports both HFileV1 and HFileV2.
The structure diagrams of HFileV1 and HFileV2 are as follows:
[Figure: HFileV1 structure]
[Figure: HFileV2 structure]
The Data Block in the figures is where the application data is actually stored.
Each data block consists of a series of KeyValues, arranged in ascending order of Key.
This article explains what a KeyValue is and which algorithms can reduce duplication as the number of KeyValues grows.
First, let's look at an example.
Suppose you need to store in HBase the basic information of each user and information about the open source projects that user participates in:
          User basic information      Participating open source projects
---------------------------------------------------------------------------
User Id   Profession    Gender        tomcat                   hbase
---------------------------------------------------------------------------
zhh3009   programmer    male          casual patch submitter   casual patch submitter

User Id   Profession    Gender        tomcat                   ant
---------------------------------------------------------------------------
jdd1999   Code God      male          founder                  founder
---------------------------------------------------------------------------
                                Table 1.1
In this example, the set of basic user information fields is easy to fix in advance, but which open source projects a user participates in, and the role the user plays in each, cannot be known ahead of time.
This makes the table awkward to model in a relational database: you do not know how many columns there will be, and you cannot group related columns.
1.1 Column family
HBase is a column-oriented database, and related columns can be grouped into a column family (ColumnFamily).
You do not need to know in advance which columns a column family contains; columns can be added when needed, for example adding an email column for zhh3009 to the user's basic information.
In the example above, "user basic information" and "participating open source projects" can serve as two column families.
Each column family usually corresponds to its own directory inside HBase, so the values of a column are stored only under the directory of the column family it belongs to.
1.2 rowKey
We want to be able to retrieve the information of one or more column families by querying on a single column. User Id is such a column.
For example, when we look up the email address of zhh3009 and the open source projects zhh3009 participates in by user Id, the information of jdd1999 is not returned.
Such a column is called the rowKey in HBase.
How does HBase store the information in the example above?
Take the User Id column as the rowKey and flatten the information above into the following format:
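rowKey:column family:column name                      =>  column value
---------------------------------------------------------------------------
zhh3009:user basic information:profession             =>  programmer
zhh3009:user basic information:gender                 =>  male
zhh3009:participating open source projects:tomcat     =>  casual patch submitter
zhh3009:participating open source projects:hbase      =>  casual patch submitter

jdd1999:user basic information:profession             =>  Code God
jdd1999:user basic information:gender                 =>  male
jdd1999:participating open source projects:tomcat     =>  founder
jdd1999:participating open source projects:ant        =>  founder
---------------------------------------------------------------------------
                                Table 1.2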
Each row in Table 1.2 corresponds to one KeyValue in HBase.
The part to the left of "=>" is the "Key" of the KeyValue, and the part to the right of "=>" is its "Value".
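The flattening step can be sketched in a few lines of Python. This is only an illustration of the idea, not HBase code: the nested dictionary, the `flatten` function, and the tuple-based sort order are assumptions made for the example, and the real key ordering in HBase is more involved.

```python
# Hypothetical in-memory model of Table 1.1:
# rowKey -> column family -> column name -> column value.
table = {
    "zhh3009": {
        "user basic information": {"profession": "programmer", "gender": "male"},
        "participating open source projects": {
            "tomcat": "casual patch submitter",
            "hbase": "casual patch submitter",
        },
    },
    "jdd1999": {
        "user basic information": {"profession": "Code God", "gender": "male"},
        "participating open source projects": {"tomcat": "founder", "ant": "founder"},
    },
}


def flatten(table):
    """Return one (key, value) pair per cell, sorted in ascending key order.

    The key is the tuple (rowKey, column family, column name). Sorting by
    this tuple is a simplification of how HBase orders keys.
    """
    pairs = []
    for row_key, families in table.items():
        for family, columns in families.items():
            for column, value in columns.items():
                pairs.append(((row_key, family, column), value))
    pairs.sort(key=lambda pair: pair[0])
    return pairs


for (row_key, family, column), value in flatten(table):
    print(f"{row_key}:{family}:{column} => {value}")
```

Each user contributes only the columns that actually exist for that user, which is why no fixed column list is needed.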
Of course, this is only a simplified view of a KeyValue. The internal format is not that simple, so let's look at what a real KeyValue looks like.
2. KeyValue internal format
The KeyValue internal format can be divided into three parts: header, Key, and Value, as shown in Table 2.1.
Name                 Bytes                Description
---------------------------------------------------------------------------
keyLength            4                    total number of bytes occupied by Key
valueLength          4                    total number of bytes occupied by Value
rowKeyLength         2                    number of bytes occupied by rowKey
rowKey               rowKeyLength         rowKey
columnFamilyLength   1                    number of bytes occupied by the column family name
columnFamily         columnFamilyLength   column family name
columnName           columnNameLength     column name
timestamp            8                    timestamp
type                 1                    Key type, such as add (Put) or delete (Delete)
value                valueLength          column value
---------------------------------------------------------------------------
                                Table 2.1
keyLength and valueLength form the header.
The seven items from rowKeyLength to type make up the Key, and the last item, value, is the third part: the Value.
Note that columnFamily is preceded by a columnFamilyLength field,
but columnName is not preceded by a columnNameLength field. That field is unnecessary, and omitting it saves space.
When parsing a KeyValue, the end position of columnName can be determined from the end of the Key: keyLength - 8 (timestamp) - 1 (type).
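The parsing rule above can be sketched in Python as follows. This is a minimal illustration of the layout in Table 2.1, not HBase's own implementation: the function names, the big-endian byte order, and the sample column family name `info` are assumptions made for the example.

```python
import struct


def encode_key_value(row_key, family, column, timestamp, key_type, value):
    """Serialize one KeyValue in the field order of Table 2.1.

    All arguments except timestamp and key_type are bytes.
    Big-endian integers are an assumption of this sketch.
    """
    key = (
        struct.pack(">H", len(row_key)) + row_key    # rowKeyLength (2) + rowKey
        + struct.pack(">B", len(family)) + family    # columnFamilyLength (1) + columnFamily
        + column                                     # columnName, no length prefix
        + struct.pack(">qB", timestamp, key_type)    # timestamp (8) + type (1)
    )
    # Header: keyLength (4) + valueLength (4).
    return struct.pack(">ii", len(key), len(value)) + key + value


def parse_key_value(buf):
    """Parse one KeyValue starting at offset 0 of buf and return its fields."""
    key_length, value_length = struct.unpack_from(">ii", buf, 0)
    pos = 8                      # the Key starts right after the 8-byte header
    key_end = pos + key_length

    (row_key_length,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    row_key = buf[pos:pos + row_key_length]
    pos += row_key_length

    (family_length,) = struct.unpack_from(">B", buf, pos)
    pos += 1
    family = buf[pos:pos + family_length]
    pos += family_length

    # No columnNameLength is stored: the column name runs up to the
    # trailing 8-byte timestamp and 1-byte type at the end of the Key.
    column_end = key_end - 8 - 1
    column = buf[pos:column_end]

    timestamp, key_type = struct.unpack_from(">qB", buf, column_end)
    value = buf[key_end:key_end + value_length]
    return {
        "key_length": key_length,
        "value_length": value_length,
        "row_key": row_key,
        "family": family,
        "column": column,
        "timestamp": timestamp,
        "type": key_type,
        "value": value,
    }


# Round trip with sample data; 4 is the Put type code used in this article.
sample = encode_key_value(b"zhh3009", b"info", b"gender", 1329663787364, 4, b"male")
print(parse_key_value(sample))
```

Because the timestamp and type have fixed sizes and sit at the end of the Key, the column name length falls out of keyLength without being stored.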
Generate two KeyValues from the first two rows of Table 1.2, in the format of Table 2.1:
KeyValue A represents: zhh3009:user basic information:profession => programmer
KeyValue B represents: zhh3009:user basic information:gender => male
Name                 Bytes                KeyValue A               KeyValue B
---------------------------------------------------------------------------
keyLength            4                    35                       35
valueLength          4                    4                        2
rowKeyLength         2                    7                        7
rowKey               rowKeyLength         zhh3009                  zhh3009
columnFamilyLength   1                    12                       12
columnFamily         columnFamilyLength   user basic information   user basic information
columnName           columnNameLength     profession               gender
timestamp            8                    1329663787364            1329663787364
type                 1                    4 (Put)                  4 (Put)
value                valueLength          programmer               male
---------------------------------------------------------------------------
                                Table 2.2
The byte counts were computed on the original Chinese strings, so they do not match the lengths of the English names shown here.
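As a quick check of the rule that columnName needs no length field, the KeyValue A figures from Table 2.2 (keyLength 35, rowKeyLength 7, columnFamilyLength 12, valueLength 4) are enough to recover the column name length. The variable names below are only for illustration.

```python
# Figures for KeyValue A, taken from Table 2.2.
key_length = 35
value_length = 4
row_key_length = 7
column_family_length = 12

# Fixed field sizes in bytes, taken from Table 2.1.
ROW_KEY_LENGTH_FIELD = 2
COLUMN_FAMILY_LENGTH_FIELD = 1
TIMESTAMP_FIELD = 8
TYPE_FIELD = 1
HEADER = 4 + 4  # keyLength + valueLength

# Whatever remains of the Key after all other fields is the column name.
column_name_length = (
    key_length
    - ROW_KEY_LENGTH_FIELD - row_key_length
    - COLUMN_FAMILY_LENGTH_FIELD - column_family_length
    - TIMESTAMP_FIELD - TYPE_FIELD
)

# Total size of the serialized KeyValue: header + Key + Value.
total_size = HEADER + key_length + value_length

print(column_name_length, total_size)
```

With these inputs the column name works out to 4 bytes and the whole KeyValue to 47 bytes, of which only 4 bytes are the value itself.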