1 Background knowledge
1.1 Problem solved
HBase solves the problem that HDFS does not support fast lookups and updates of individual records.
1.2 Applicability
HBase suits tables with hundreds of millions of rows or more; for only thousands to a few million rows, an RDBMS is the more appropriate choice.
Make sure your application does not need the advanced features of an RDBMS (secondary indexes, transactions, an advanced query language, etc.).
Make sure you have sufficient hardware, i.e. enough nodes: HDFS does not perform well with fewer than 5 DataNodes, and the same is true for HBase.
2 Design concepts
2.1 Overview
2.1.1 Introduction
HBase is a NoSQL distributed database written in Java.
It does not support some advanced RDBMS features, such as transactions, secondary indexes, and an advanced query language.
It supports linear and modular scaling: capacity and throughput can be increased linearly by adding RegionServers on commodity machines.
2.1.2 HBase features:
Strongly consistent reads and writes: suitable for tasks such as high-speed counter aggregation
Automatic sharding: data is stored across Regions and split automatically as it grows
Automatic RegionServer failover
Integrates with HDFS as the underlying storage
Supports MapReduce for massively parallel processing
Provides a Java client API
Provides Thrift/REST APIs
Block cache and Bloom filters for query optimization (see the sketch after this list)
A visual management interface
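Both the block cache and the Bloom filter are configured per column family. The following is a minimal, hedged sketch using the HBase 2.x Java client API; the family name "contents" and the choice of BloomType.ROW are illustrative assumptions, not something stated in this article.

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class TunedFamily {
    public static void main(String[] args) {
        // A column family with the block cache enabled and a row-level Bloom filter.
        ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                .newBuilder(Bytes.toBytes("contents"))
                .setBlockCacheEnabled(true)        // cache data blocks read from HFiles
                .setBloomFilterType(BloomType.ROW) // skip HFiles that cannot contain the requested row
                .build();
        System.out.println(cf);
    }
}
```

The descriptor would then be passed to table creation or alteration; that step is shown in the table-creation sketch further below.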
2.1.3 disadvantages
WAL replay is slow.
Failure recovery is slow and complex.
Major compaction can cause an I/O storm (a large burst of I/O operations).
2.2 Design Architecture
2.2.1 Basic concepts
2.2.1.1 Concept definitions
Table: a table consists of multiple rows.
Row: a row consists of a row key and one or more columns with values.
Column: a column consists of a column family and a column qualifier; the set of columns can differ greatly from row to row.
Column Family: physically groups a set of columns and their values; exists for performance reasons; column families must be declared when the table is created (e.g. content).
Column Qualifier: identifies a column within a column family and acts as an index for a piece of data; qualifiers do not need to be declared at table creation time and can be added at any time (e.g. content:html).
Cell: a unit identified by row, column family, column qualifier, value, and a timestamp that represents the version.
TimeStamp: indicates the version of the data; the system time is used by default, but it can also be specified explicitly.
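To make the column-family rule concrete, here is a minimal sketch using the HBase 2.x Java client API; the table name "webtable", the family names, and the connection setup are illustrative assumptions rather than anything specified in this article. Column families must be declared when the table is created, while column qualifiers are simply written later.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateWebTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Column families ("contents", "anchor", "people") must be fixed at creation time...
            TableDescriptorBuilder table =
                TableDescriptorBuilder.newBuilder(TableName.valueOf("webtable"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("contents"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("anchor"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("people"));
            admin.createTable(table.build());
            // ...but column qualifiers (e.g. contents:html, anchor:cnnsi.com) need no declaration:
            // they come into existence the first time a Put writes them.
        }
    }
}
```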
2.2.1.2 Example
This example is taken from the official documentation:

Row Key           | Time Stamp | ColumnFamily contents | ColumnFamily anchor           | ColumnFamily people
"com.cnn.www"     | t9         |                       | anchor:cnnsi.com = "CNN"      |
"com.cnn.www"     | t8         |                       | anchor:my.look.ca = "CNN.com" |
"com.cnn.www"     | t6         | contents:html = "…"   |                               |
"com.cnn.www"     | t5         | contents:html = "…"   |                               |
"com.cnn.www"     | t3         | contents:html = "…"   |                               |
"com.example.www" | t5         | contents:html = "…"   |                               | people:author = "John Doe"
Description:
The tabular layout is not the only or the most precise representation; the same data can also be expressed in JSON.
Empty cells in the table take up no physical storage space; they exist only conceptually.
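To show how the rows above map onto the client API, the following hedged sketch (assuming the "webtable" table from the earlier sketch and the HBase 2.x Java client) writes the anchor and contents cells of row "com.cnn.www"; the explicit timestamps only mirror the t-values in the table and would normally be left to the system clock.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutWebTableRows {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("webtable"))) {
            byte[] row = Bytes.toBytes("com.cnn.www");
            // One Put can carry several cells; each addColumn call takes
            // (column family, column qualifier, timestamp/version, value).
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"), 9L, Bytes.toBytes("CNN"));
            put.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("my.look.ca"), 8L, Bytes.toBytes("CNN.com"));
            put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("html"), 6L, Bytes.toBytes("<html>…</html>"));
            table.put(put);
        }
    }
}
```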
2.2.1.3 Operations
Get (Table.get): returns the attributes of the specified row; effectively a Scan restricted to a single row. If no version is specified, the cell with the largest version value is returned (which is not necessarily the most recently written one); the number of versions returned can be changed by setting MaxVersions.
Scan (Table.scan): returns all rows that match the given criteria; versions behave the same way as for Get.
Put (Table.put, which goes through the write buffer, or Table.batch, which does not): inserts if the key does not exist, otherwise updates. The system time is used as the version by default; a cell is overwritten only when key, column, and version are all identical; a version can be specified explicitly when inserting.
Delete (Table.delete): 1. delete a specified column; 2. delete all versions of a column; 3. delete all columns of a column family. A delete is not executed immediately: the data is marked with a tombstone, and the dead data and tombstones are physically removed when the store is cleaned up (major compaction). The time before delete markers are purged can be configured with the hbase.hstore.time.to.purge.deletes property in hbase-site.xml.
Description:
The maximum and minimum number of versions kept can be configured and affects the operations above.
The version (timestamp) also controls how long the data survives; it is best not to set it manually.
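A minimal, hedged sketch of the four operations above against the hypothetical "webtable" table, using the HBase Java client; the column names and version count are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class BasicOperations {
    public static void main(String[] args) throws Exception {
        byte[] cf = Bytes.toBytes("contents");
        byte[] qual = Bytes.toBytes("html");
        byte[] row = Bytes.toBytes("com.cnn.www");
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("webtable"))) {
            // Put: inserts if the key is absent, otherwise updates; the version defaults to the system time.
            table.put(new Put(row).addColumn(cf, qual, Bytes.toBytes("<html>…</html>")));

            // Get: without an explicit version, only the cell with the largest version is returned.
            Get get = new Get(row);
            get.setMaxVersions(3); // ask for up to 3 versions instead of the default 1
            Result result = table.get(get);
            System.out.println(Bytes.toString(result.getValue(cf, qual)));

            // Scan: returns every row matching the criteria (here: the whole "contents" family).
            Scan scan = new Scan().addFamily(cf);
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }

            // Delete: only writes a tombstone; the data disappears physically at the next major compaction.
            table.delete(new Delete(row).addColumns(cf, qual)); // all versions of contents:html
        }
    }
}
```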
2.2.1.4 Limitations
1) Deletes can mask later Puts: because a delete is not executed immediately, the data is only marked with a tombstone. If you issue a Delete for versions less than or equal to T and then Put a cell with version T, the newly written data is also covered by the tombstone, and all marked data is removed during the next clean-up. A query executed before that clean-up therefore does not return the newly Put data. This cannot happen if you never set versions manually and let them default to the system time (a sketch follows below).
2) Clean-up work affects which versions a query sees: suppose three cells are created with versions t1, t2, and t3, and the maximum number of versions is set to 2. A query for all versions returns only t2 and t3. But after deleting versions t2 and t3, version t1 reappears. Once a major compaction has run, this behaviour can no longer occur, because t1 is physically removed at that point.
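A hedged sketch of limitation 1), using a manually chosen version on the hypothetical table from the earlier examples; until a major compaction runs, the final Get returns nothing even though the Put was issued after the Delete.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteMasksPut {
    public static void main(String[] args) throws Exception {
        byte[] cf = Bytes.toBytes("contents");
        byte[] qual = Bytes.toBytes("html");
        byte[] row = Bytes.toBytes("masking-demo");
        long t = 100L; // an explicit, manually chosen version
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("webtable"))) {
            // Tombstone everything at contents:html with version <= t.
            table.delete(new Delete(row).addColumns(cf, qual, t));
            // Put a cell at exactly version t, i.e. not newer than the tombstone.
            table.put(new Put(row).addColumn(cf, qual, t, Bytes.toBytes("value")));
            // The tombstone still covers version t, so the Get sees nothing.
            Result r = table.get(new Get(row).addColumn(cf, qual));
            System.out.println("value found? " + !r.isEmpty()); // prints: value found? false
        }
    }
}
```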
For more information on the data model, see the official documentation.
2.2.2 Architecture
2.2.2.1 Architecture features
1) Master-Slave Architecture
2) there are three components:
Component name  | Main functions
HMaster         | Responsible for Region assignment and DDL operations (creating and deleting tables)
HRegionServer   | Responsible for reading and writing data; communicates with clients
ZooKeeper       | Maintains the live state of the cluster
3) the underlying storage is HDFS
2.2.2.2 Components
hbase:meta: holds the location information of every Region
1) Structure:
Key
Format: ([table], [region start key], [region id])
Values
info:regioninfo (serialized HRegionInfo instance for this Region)
info:server (server:port of the RegionServer hosting this Region)
info:serverstartcode (start time of the RegionServer hosting this Region)
2) Storage location: the location of hbase:meta (i.e. which RegionServer serves it) is kept in ZooKeeper; the table itself is served by a RegionServer like any other table.
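As a hedged illustration (assuming a running cluster and the HBase 2.x Java client), hbase:meta can be read like any other table; the sketch below scans it and prints the info:server column for each Region.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanMetaTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             // TableName.META_TABLE_NAME refers to hbase:meta
             Table meta = conn.getTable(TableName.META_TABLE_NAME);
             ResultScanner scanner = meta.getScanner(new Scan().addFamily(Bytes.toBytes("info")))) {
            for (Result r : scanner) {
                String regionKey = Bytes.toString(r.getRow()); // ([table],[region start key],[region id])
                byte[] server = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("server")); // server:port of the hosting RegionServer
                System.out.println(regionKey + " -> " + (server == null ? "?" : Bytes.toString(server)));
            }
        }
    }
}
```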
HMaster: controller
Assigns Regions: assignment at startup, re-assignment of Regions from failed RegionServers, and assignment when a Region splits
Monitors all RegionServers in the cluster and balances load across them
DDL (Data Definition Language) operations: creating, deleting, and altering tables (alterations apply to column families)
Manage metadata for namespace and table
Rights Management (ACL)
Garbage file collection on HDFS
HRegionServer: the component that actually reads and writes HBase data
Responds to read and write requests from clients and performs the I/O operations (clients talk to it directly, bypassing HMaster)
Interacts with HDFS to manage table data
Splits a Region when its size reaches the configured threshold
For details on this component, see the separate Region Server write-up.
ZooKeeper: coordinator
Ensures that one and only one HMaster in the cluster is active
Stores the location of hbase:meta, i.e. the entry point to the location information of all Regions
Stores metadata information for HBase tables
Monitors the status of the RegionServers and reports it to HMaster
The ZooKeeper ensemble itself uses a consensus protocol (ZAB, a Paxos-like protocol) to keep the state of its nodes consistent.
Region: the basic unit of data storage and management in HBase.
For details, see the separate Region write-up.
2.3 Related processes
2.3.1 First read and write
See the first read/write process in the detailed Region Server write-up.
2.3.2 Write process
See the write process in the detailed Region Server write-up.
2.3.3 Read process
See the read process in the detailed Region Server write-up.
2.4 Related mechanisms
2.4.1 Compaction mechanism (merging of store files)
2.4.1.1 Minor compaction
See the minor compaction section in the detailed Region Server write-up.
2.4.1.2 Major compaction
See the major compaction section in the detailed Region Server write-up.
2.4.2 WAL replay mechanism
See WAL replay in the detailed Region Server write-up.
2.5 Version changes
2.5.1 Meta table => hbase:meta
2.5.1.1 -ROOT- and .META.
Before 0.96.x there were two tables, -ROOT- and .META., used to maintain Region metadata.
1) Structure of -ROOT-:
Key
.META. region key (.META.,,1)
Values
info:regioninfo (serialized HRegionInfo instance of the .META. region)
info:server (server:port of the RegionServer hosting the .META. region)
info:serverstartcode (start time of the RegionServer hosting the .META. region)
2) The process of reading Region location information:
Read from ZooKeeper the address of the HRegionServer hosting the -ROOT- table
From that HRegionServer, read the -ROOT- table to find, based on the requested table name and row key, the HRegionServer hosting the relevant .META. region
Read the .META. table from that HRegionServer to obtain the location of the HRegion this request needs to access
Access that HRegionServer to read the requested data
2.5.1.2 hbase:meta
For comparison, see hbase:meta under 2.2.2.2 Components and the first read/write process under 2.3 Related processes.
2.5.1.3 Purpose of the change
1) Before 0.96.x the design followed Google's BigTable: four steps are needed from issuing a request to actually reading the data. Google designed BigTable for an enormous amount of data, and the multi-level lookup structure can address more Regions, but it also degrades access performance.
2) Most companies do not have data on Google's scale, so removing the -ROOT- table, keeping only the .META. (hbase:meta) table, and increasing the Region size both satisfies the storage needs and improves access performance.
2.5.2 HLog => WAL
Prior to 0.94.x, the WAL implementation in HBase was called HLog and was stored in the /hbase/.logs/ directory.
From 0.94.x onwards it is called WAL and is stored in the /hbase/WALs/ directory.
2.6 links to other frameworks
To be continued.
2.7 performance tuning
To be continued.
2.8 Advanced Features
To be continued.
3 Project practice
3.1 Getting Started Guide
3.1.1 Environment setup
See the HBase deployment getting-started guide.
3.1.2 Getting started
See the HBase Shell exercises, the HBase Java API exercises, and using MapReduce to operate HBase.
3.2 Technical difficulties
To be continued.
3.3 problems encountered in development
To be continued.
3.4 Applications
3.4.1 OpenTSDB development
To be continued.
4 statement
The sections marked "to be continued" will be updated from time to time.