
HBase fundamentals: an introduction

Updated: 2025-03-27. From: SLTechnology News&Howtos


HBase is a distributed, column-oriented, open source database. Its design comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System (GFS), HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a subproject of Apache Hadoop and is now a top-level Apache project. Unlike a typical relational database, HBase is suited to storing unstructured and semi-structured data. Another difference is that HBase organizes data by column family rather than by row.

I. How HBase locates the data for a query

1. Two special tables: -ROOT- and .META.

.META. records the Region information of user tables. .META. itself can have multiple Regions.

-ROOT- records the Region information of the .META. table. -ROOT- has only one Region.

The location of the -ROOT- table is recorded in Zookeeper.

This layout applies to HBase releases before 0.96. Later releases removed -ROOT- and record the location of the meta table directly in Zookeeper.

2. The path a client follows to access data:

Client -> Zookeeper -> -ROOT- -> .META. -> user table

3. A lookup takes several network round trips, but the client caches the Region locations it has already resolved.
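The lookup chain and the client-side cache can be sketched in a few lines of Python. This is a toy in-memory model, not the HBase client API: the class names `ToyCluster` and `ToyClient`, the server names `rs1` to `rs4`, and the `users` table with its split point `"m"` are all made up for illustration.

```python
class ToyCluster:
    """In-memory stand-in for Zookeeper, -ROOT-, and .META. (all names hypothetical)."""

    def __init__(self):
        self.network_calls = 0
        # Zookeeper records which server hosts the single -ROOT- region.
        self.zookeeper = {"root_location": "rs1"}
        # -ROOT- records which server hosts the .META. region.
        self.root = {".META.": "rs2"}
        # .META. records (start_key, end_key, server) for each user region.
        self.meta = {"users": [("", "m", "rs3"), ("m", None, "rs4")]}

    def ask_zookeeper(self):
        self.network_calls += 1
        return self.zookeeper["root_location"]

    def ask_root(self, root_server):
        self.network_calls += 1
        return self.root[".META."]

    def ask_meta(self, meta_server, table, row_key):
        self.network_calls += 1
        for start, end, server in self.meta[table]:
            if start <= row_key and (end is None or row_key < end):
                return (start, end, server)
        raise KeyError(row_key)


class ToyClient:
    """Resolves a row key to a region server, caching every region it learns."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}  # table name -> list of (start_key, end_key, server)

    def locate(self, table, row_key):
        # Serve from the cache when a known region covers the row key.
        for start, end, server in self.cache.get(table, []):
            if start <= row_key and (end is None or row_key < end):
                return server
        # Cache miss: Zookeeper -> -ROOT- -> .META., three round trips.
        root_server = self.cluster.ask_zookeeper()
        meta_server = self.cluster.ask_root(root_server)
        region = self.cluster.ask_meta(meta_server, table, row_key)
        self.cache.setdefault(table, []).append(region)
        return region[2]
```

In this model the first `locate` call for a region costs three round trips, and later calls for row keys in the same region cost none.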

II. HBase architecture

Component description

1. Client:

Uses the HBase RPC mechanism to communicate with HMaster and HRegionServer.

Communicates with HMaster for management operations.

Communicates with HRegionServer for data read and write operations.

2. Zookeeper:

The Zookeeper quorum stores the address of the -ROOT- table and the address of the HMaster.

Each HRegionServer registers itself in Zookeeper as an ephemeral node, so HMaster can track the health of every HRegionServer at any time.

Zookeeper removes the HMaster single point of failure.

3. HMaster:

HMaster is not a single point of failure. Several HMasters can be started in one HBase cluster, and Zookeeper's master election mechanism ensures that one of them is always active. The active HMaster is mainly responsible for managing Tables and Regions:

3.1 Handles users' operations for adding, deleting, modifying, and querying tables.

3.2 Manages load balancing across HRegionServers and adjusts the distribution of Regions.

3.3 Assigns the new Regions produced by a Region split.

3.4 Migrates the Regions of a failed HRegionServer to other servers after it goes down.

4. HRegionServer:

The core module of HBase. It responds to user I/O requests and reads and writes data in the HDFS file system.
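The interaction between Zookeeper's ephemeral nodes and HMaster's region migration (item 3.4) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: `ToyZooKeeper`, `ToyMaster`, and the "least-loaded server first" placement rule are invented for the example and are not HBase's actual assignment logic.

```python
class ToyZooKeeper:
    """Tracks live region servers the way ephemeral nodes would (hypothetical model)."""

    def __init__(self):
        self.live_servers = set()

    def register(self, server):
        # A region server creates its ephemeral node on startup.
        self.live_servers.add(server)

    def session_expired(self, server):
        # When the server dies, its session ends and the node disappears.
        self.live_servers.discard(server)


class ToyMaster:
    """Reassigns regions whose server is no longer registered in Zookeeper."""

    def __init__(self, zk, assignments):
        self.zk = zk
        self.assignments = dict(assignments)  # region name -> server name

    def reassign_orphans(self):
        live = sorted(self.zk.live_servers)
        if not live:
            raise RuntimeError("no live region servers")
        # Count how many regions each live server already holds.
        load = {server: 0 for server in live}
        for server in self.assignments.values():
            if server in load:
                load[server] += 1
        moved = []
        for region in sorted(self.assignments):
            if self.assignments[region] not in load:
                # Assumed policy: move the region to the least-loaded live server.
                target = min(live, key=lambda s: load[s])
                self.assignments[region] = target
                load[target] += 1
                moved.append(region)
        return moved
```

Calling `session_expired` for a server and then `reassign_orphans` on the master moves that server's regions onto the remaining servers while keeping their load roughly even.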

No new technology is a silver bullet that cures every problem the moment you open the box. Adopting Spring or a NoSQL product does not automatically make a system impressive, and anyone who claims otherwise is overselling. Products of the same type pursue the same goal whatever technology they use. A new technology may let you avoid the problems you face now, but new problems follow. In hindsight, it may sometimes pay off more to think harder about the existing system and find ways to improve it.

Traditional databases store data in blocks, row by row. Put simply, the more fields a table has, the more space each row occupies, so a query may have to span multiple data blocks, which slows it down. In a large system a table can easily have hundreds of fields and hundreds of millions of rows, and querying it becomes a bottleneck. The number of records in a table has a large impact on query performance. At that point you will probably think of splitting tables and databases to spread the load, but that brings new problems, such as distributed transactions, globally unique ID generation, and cross-database queries, which are still thorny.

Breaking away from row-based storage and storing data by column is likely to improve matters for large-scale data. Because query conditions are expressed in terms of columns, a column store behaves somewhat as if every column were indexed. The values of each field are stored together, by column. Columns can be added dynamically, and an empty column stores nothing, which saves space. Because each field's values are stored together, a query reads only the columns it needs, which can greatly reduce the amount of data read. There is also no need to split databases and tables by hand: HBase splits stored data automatically and supports highly concurrent reads and writes, so massive data sets scale out more easily.

HashMap in Java is a Key/Value structure, and HBase's data structure can also be thought of as a Key/Value system. A Region in HBase is identified by table name and start row key. Each column family in a Region is managed by an object named HStore. Each HStore consists of one or more MapFiles (a Hadoop file type, replaced by the HFile format in later HBase versions). The concept of a MapFile is similar to Google's SSTable.

HBase has two main concepts, Row key and Column Family, followed by the column qualifier and the Timestamp. Access control and disk and memory usage statistics are all handled at the column family level. Column families are part of the predefined data model, and each Column Family can hold multiple columns, distinguished by qualifier. Each cell can store multiple versions of the same data, distinguished by timestamp, with the latest version first.
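The Key/Value view of the data model described above can be made concrete with a toy table. This is a plain-Python sketch, not the HBase API: `ToyTable`, its `put` and `get` methods, and the example family `info` are hypothetical names chosen for illustration.

```python
class ToyTable:
    """Maps (row key, column family, qualifier) to a list of timestamped versions."""

    def __init__(self, column_families):
        # Column families are fixed up front; qualifiers are added freely.
        self.column_families = set(column_families)
        # row key -> {(family, qualifier): [(timestamp, value), ...] newest first}
        self.rows = {}

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows.setdefault(row_key, {})
        versions = cells.setdefault((family, qualifier), [])
        versions.append((timestamp, value))
        # Keep the latest version at the front of the list.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row_key, family, qualifier, max_versions=1):
        # A column that was never written occupies no storage and returns nothing.
        versions = self.rows.get(row_key, {}).get((family, qualifier), [])
        return [value for _, value in versions[:max_versions]]
```

A `get` with the default `max_versions=1` returns only the newest value of a cell, while a larger `max_versions` also returns older versions in descending timestamp order. Columns that were never written take up no entry in `rows`, mirroring the point that empty columns store nothing.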

These are the basic points you need to master about HBase.
