Quick understanding of HBase and BigTable

2025-01-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

People with a relational database background, such as me, tend to run into obstacles understanding the data structures when they first encounter a database like HBase: they unconsciously map HBase's rows and columns onto the rows and columns of a relational database. To speed up understanding of some HBase concepts, I translated the article "Understanding HBase and BigTable" (recommended reading in the official HBase documentation).

The hardest part of learning HBase (an open source implementation of Google's BigTable) is understanding what it actually is, conceptually.

Unfortunately, both of these great systems contain the words "table" and "base" in their names, which tends to make some people (like me) confuse them with relational databases.

This article aims to describe these distributed data storage systems from a conceptual point of view. After reading it, you should be better able to judge when to use HBase and when a "traditional" database is the better choice.

It's all in the terminology.

Fortunately, Google's BigTable paper clearly explains what BigTable is. This is the first sentence of the "Data Model" section:

"A Bigtable is a sparse, distributed, persistent multidimensional sorted map."

Note: please keep every word of the sentence above in mind.

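The BigTable paper continues, explaining that:

"The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes."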

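The HbaseArchitecture page of the Hadoop wiki states:

"HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes."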

Although all of this seems quite mysterious, it becomes much clearer once you break it down word by word. I like to discuss the words in this order: map, persistent, distributed, sorted, multidimensional, and sparse.

Rather than trying to picture the complete system at once, I found it easier to understand HBase by building up a mental framework piece by piece.

Map

At its core, HBase / BigTable is a map. Depending on your programming language background, you may be more familiar with the terms associative array (PHP), dictionary (Python), Hash (Ruby), or Object (JavaScript).

Wikipedia defines a map as "an abstract data type consisting of a set of keys and a set of values, where each key is associated with a value."

Using JSON notation, here is an example of a simple map in which all values are just strings:
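A minimal sketch of such a map, with the JSON text parsed by Python's standard `json` module so that it can be queried. The keys and values are illustrative placeholders, and `simple_map` is a name chosen for this sketch:

```python
import json

# A simple map: every key is associated with exactly one string value.
# Keys and values are illustrative placeholders.
simple_map = json.loads("""
{
  "zzzzz": "woot",
  "xyz": "hello",
  "aaaab": "world",
  "1": "x",
  "aaaaa": "y"
}
""")

# Looking up a key returns the value associated with it.
print(simple_map["aaaab"])
```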

Persistence

Persistence simply means that the data you put into this special map persists after the program that created or accessed it has finished. This is conceptually no different from any other kind of persistent storage, such as a file on a file system.

Sorted

Unlike most map implementations, HBase / BigTable keep the key / value pairs in strict lexicographic (alphabetical) order. That is, the row with the key "aaaaa" should be next to the row with the key "aaaab" and far away from the row with the key "zzzzz".

To continue our JSON example, the sorted version looks like this:
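As a sketch, the same illustrative map can be put into key order in Python. A plain `dict` is not sorted on its own, so the sort is explicit here. Python compares strings by code point, which for these ASCII keys gives the same result as a byte-wise comparison:

```python
import json

# The unordered illustrative map from before.
unordered = json.loads(
    '{"zzzzz": "woot", "xyz": "hello", "aaaab": "world", "1": "x", "aaaaa": "y"}'
)

# Rebuild the map with its entries sorted by key.
ordered = dict(sorted(unordered.items()))

print(json.dumps(ordered, indent=2))
```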

Because these systems tend to be very large and distributed, this sorting is very important. Rows with similar keys are stored close together, so when you have to scan the table, the items you are most interested in are near each other.
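To illustrate why sorted keys make scans cheap, here is a minimal sketch of a range scan over a sorted list of row keys. The helper `scan` and its half-open `[start, stop)` convention are assumptions made for this example, not the HBase client API:

```python
from bisect import bisect_left

# Row keys kept in sorted order, as in the example above.
sorted_keys = ["1", "aaaaa", "aaaab", "xyz", "zzzzz"]

def scan(keys, start, stop):
    """Return the keys in the half-open range [start, stop) of a sorted list."""
    lo = bisect_left(keys, start)  # first position with key >= start
    hi = bisect_left(keys, stop)   # first position with key >= stop
    return keys[lo:hi]             # one contiguous slice, no full pass needed

print(scan(sorted_keys, "aaaaa", "aaaac"))
```

Because the keys are sorted, the matching rows form a single contiguous slice, which is what makes such scans efficient.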

The convention you choose for row keys is therefore important. For example, consider a table whose keys are domain names. It makes the most sense to store them in reverse notation (so "com.jimbojw.www" rather than "www.jimbojw.com"), so that rows for subdomains are stored near the row for the parent domain.

It is worth noting that in HBase / BigTable, the term "sorted" does not mean that the values are sorted. As in a normal map implementation, there is no automatic indexing of anything other than the keys.
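A small sketch of the reverse-notation convention. The helper `reverse_domain` and every domain other than "www.jimbojw.com" are made up for this example:

```python
def reverse_domain(domain):
    """Turn 'www.jimbojw.com' into 'com.jimbojw.www'."""
    return ".".join(reversed(domain.split(".")))

# Illustrative domains; only www.jimbojw.com appears in the article.
domains = ["www.jimbojw.com", "www.example.org", "mail.jimbojw.com", "jimbojw.com"]

# Sorting the reversed names keeps each subdomain next to its parent domain.
row_keys = sorted(reverse_domain(d) for d in domains)

for key in row_keys:
    print(key)
```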

Multi-dimensional

So far, we haven't mentioned any concept of "columns" and have treated the "table" conceptually as a regular hash / map. This was intentional. The word "column" is another loaded word, like "table" and "base", that carries the emotional baggage of years of relational database experience.

Instead, I find it easier to think of this as a multidimensional map, or a map of maps if you like. Adding one dimension to the previous JSON example gives:
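A sketch of the example with one more dimension, again as JSON text parsed in Python. All values are illustrative, and `table` is a name chosen for this sketch:

```python
import json

# Each top-level key now maps to a nested map with the keys "A" and "B".
table = json.loads("""
{
  "1":     {"A": "x",     "B": "z"},
  "aaaaa": {"A": "y",     "B": "w"},
  "aaaab": {"A": "world", "B": "ocean"},
  "xyz":   {"A": "hello", "B": "there"},
  "zzzzz": {"A": "woot",  "B": "1337"}
}
""")

# Two lookups are now needed to reach a value: first the row, then "A" or "B".
print(table["aaaab"]["B"])
```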

In the example above, each key now points to a map with exactly two keys: "A" and "B". From here on, we refer to a top-level key / map pair as a "row". In BigTable / HBase nomenclature, the "A" and "B" mappings are called "column families".

A table's column families are specified when the table is created and are difficult or impossible to modify later. Adding new column families can also be expensive, so it's a good idea to specify all the column families you need from the start.

Fortunately, a column family can have any number of columns, each denoted by a column "qualifier" or "label". Here is a subset of our JSON example, this time with the column qualifier dimension built in:
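A sketch of two rows with the qualifier dimension added. The values are illustrative; the qualifiers "foo", "bar", and the empty string follow the description in the text:

```python
import json

# Row key -> column family -> column qualifier -> value.
table = json.loads("""
{
  "aaaaa": {
    "A": {"foo": "y", "bar": "d"},
    "B": {"": "w"}
  },
  "aaaab": {
    "A": {"foo": "world", "bar": "domination"},
    "B": {"": "ocean"}
  }
}
""")

# Family "A" has the qualifiers "foo" and "bar"; family "B" has only "".
print(sorted(table["aaaaa"]["A"]))
print(list(table["aaaaa"]["B"]))
```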

Notice that in the two rows displayed, the "A" column family has two columns: "foo" and "bar", and the "B" column family has only one column, with an empty string ("") as the qualifier.

When asking HBase / BigTable for data, you must provide the full column name in the form "<family>:<qualifier>". So, for example, both rows in the example above have three columns: "A:foo", "A:bar", and "B:".
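As a sketch, the full column names of a row can be derived from the nested map by joining family and qualifier with a colon. The helper `column_names` is hypothetical and the row values are illustrative:

```python
def column_names(row):
    """List the full 'family:qualifier' names of all columns in one row."""
    return [
        f"{family}:{qualifier}"
        for family, columns in row.items()
        for qualifier in columns
    ]

# One illustrative row: column family -> qualifier -> value.
row = {"A": {"foo": "y", "bar": "d"}, "B": {"": "w"}}

print(column_names(row))
```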

Note that although the column families are static, the columns themselves are not. Consider this expanded row:

In this case, the "zzzzz" row has exactly one column, "A:catch_phrase". Because each row may have any number of different columns, there is no built-in way to query for a list of all columns in all rows. To get that information, you must perform a full table scan. You can, however, query for information about all column families, because they are immutable (more or less).
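A sketch of such a row, whose only column has the qualifier "catch_phrase" in family "A" (the value is illustrative):

```python
import json

# A row with a single column in family "A" and no entry at all for family "B".
row_zzzzz = json.loads("""
{
  "zzzzz": {
    "A": {"catch_phrase": "woot"}
  }
}
""")

print(list(row_zzzzz["zzzzz"]["A"]))
```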
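To make the full-scan point concrete, here is a minimal sketch that collects every distinct column name by visiting every row. The helper `all_columns` and the table contents are assumptions for illustration; a real table could hold far more rows, which is what makes such a scan expensive:

```python
# Illustrative table: row key -> column family -> qualifier -> value.
table = {
    "aaaaa": {"A": {"foo": "y", "bar": "d"}, "B": {"": "w"}},
    "aaaab": {"A": {"foo": "world", "bar": "domination"}, "B": {"": "ocean"}},
    "zzzzz": {"A": {"catch_phrase": "woot"}},
}

def all_columns(table):
    """Visit every row (a full table scan) and collect each distinct column name."""
    seen = set()
    for families in table.values():
        for family, columns in families.items():
            for qualifier in columns:
                seen.add(f"{family}:{qualifier}")
    return sorted(seen)

print(all_columns(table))
```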

The last dimension in HBase / BigTable is time. Every value is versioned by an integer timestamp (by default taken from the clock; in HBase this is milliseconds since the epoch) or by another integer of your choice. The client can specify the timestamp when inserting data.

Take a look at this example, which uses arbitrary integer timestamps:
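A sketch of one row with the time dimension added. JSON object keys must be strings, so this sketch uses a Python dict directly in order to keep the timestamps as integers. The timestamps and values are illustrative, chosen to agree with the queries discussed in the text:

```python
# Row key -> column family -> qualifier -> {timestamp: value}.
table = {
    "aaaaa": {
        "A": {
            "foo": {15: "y", 4: "M"},
            "bar": {15: "d"},
        },
        "B": {
            "": {6: "w", 3: "o", 1: "w"},
        },
    },
}

# The versions of one cell, newest first.
cell = table["aaaaa"]["A"]["foo"]
print(sorted(cell.items(), reverse=True))
```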

Each column family may have its own rules for how many versions of a given cell to keep (a cell is identified by its row key / column pair). In most cases, applications simply ask for a given cell's data without specifying a timestamp. In that common case, HBase / BigTable returns the most recent version (the one with the highest timestamp), because it stores versions in reverse chronological order.

If the application specifies a timestamp, HBase returns the cell data whose timestamp is less than or equal to the one provided.

Using our imaginary HBase table, querying row / column "aaaaa" / "A:foo" returns "y", while querying row / column / timestamp "aaaaa" / "A:foo" / 10 returns "M". Querying "aaaaa" / "A:foo" / 2 returns an empty result.
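The idea of keeping only a limited number of versions per cell can be sketched as follows. The helper `put` and its `max_versions` parameter are hypothetical names for this illustration, not the HBase API, and the limit of 3 is arbitrary:

```python
def put(versions, timestamp, value, max_versions=3):
    """Store a new version of a cell and drop the oldest ones beyond the limit."""
    versions[timestamp] = value
    # Sort timestamps newest first; everything past the limit is discarded.
    for ts in sorted(versions, reverse=True)[max_versions:]:
        del versions[ts]

cell = {}
for ts, value in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    put(cell, ts, value)

# Only the three newest versions remain; the latest one has the highest timestamp.
print(sorted(cell.items(), reverse=True))
print(cell[max(cell)])
```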
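These lookup rules can be sketched with a small function over a nested dict. The helper `get` is an assumption for illustration and not the HBase client API; the table contents are illustrative and chosen to reproduce the three queries above:

```python
# Row key -> column family -> qualifier -> {timestamp: value}.
table = {
    "aaaaa": {
        "A": {"foo": {15: "y", 4: "M"}, "bar": {15: "d"}},
        "B": {"": {6: "w", 3: "o", 1: "w"}},
    },
}

def get(table, row_key, column, timestamp=None):
    """Return the newest version of a cell, optionally no newer than `timestamp`."""
    family, _, qualifier = column.partition(":")
    versions = table.get(row_key, {}).get(family, {}).get(qualifier, {})
    # Walk the versions from newest to oldest and return the first that qualifies.
    for ts in sorted(versions, reverse=True):
        if timestamp is None or ts <= timestamp:
            return versions[ts]
    return None  # no matching version: an empty result

print(get(table, "aaaaa", "A:foo"))
print(get(table, "aaaaa", "A:foo", 10))
print(get(table, "aaaaa", "A:foo", 2))
```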

Sparse

The last keyword is sparse. As mentioned earlier, a given row can have any number of columns in each column family, or none at all. The other kind of sparseness is row-based gaps, which simply means that there may be gaps between keys.

If you now think about HBase / BigTable in the map-based terms of this article, rather than in terms of similar-sounding concepts from relational databases, this article has achieved its purpose.

And that's about it

Well, I hope this helps you understand conceptually what the HBase data model feels like.

As always, I look forward to your thoughts, opinions and suggestions.

