Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Phoenix (sql on hbase

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

Introduction to Phoenix (sql on hbase)

Introduction:

Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular

JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance

On the order of milliseconds for small queries, or seconds for tens of millions of rows.

Deployment:

1:wget http://phoenix-bin.github.com/client/phoenix-2.2.1-install.tar, copy the jar package to HBASE_HOME/lib 2: execute psql.sh localhost.. / examples/web_stat.sql.. / examples/web_stat.csv.. / examples/web_stat_queries.sql, load the sample data 3:sqlline.sh localhost (zookeeper address) and enter the command line client

Related documentation:

Wiki home page (detailed documentation):

Https://github.com/forcedotcom/phoenix/wiki

Quick Start

Https://github.com/forcedotcom/phoenix/wiki/Phoenix-in-15-minutes-or-less

Recently Implemented Features

Https://github.com/forcedotcom/phoenix/wiki/Recently-Implemented-Features

Phoenix Performance vs Hive,Impala

Https://github.com/forcedotcom/phoenix/wiki/Performance#salting

Official real-time performance test results:

Http://phoenix-bin.github.io/client/performance/latest.htm

Syntax:

Http://forcedotcom.github.io/phoenix/index.html

Secondary index correlation (Phoenix API is required for the use of the index):

The use of secondary indexes (when multiple columns) requires the following configuration to be added to the hbase-site.xml

Hbase.regionserver.wal.codec org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec

Example of creating an index:

Create table usertable (id varchar primary key, firstname varchar, lastname varchar); create index idx_name on usertable (firstname) include (lastname). You can check whether the current SQL statement index is valid by the following methods: explain select id, firstname, lastname from usertable where firstname = 'foo'

Description of explain:

RANGE SCAN means that only a subset of the rows in your table will be scanned over. This occurs if you use one or more leading columns from your primary key constraint. Query that is not filtering on leading competes with columns ex. Select * from test where pk2='x'

And pk3='y'; will result in full scan whereas the following query will result in range scan select * from test where pk1='x' and pk2='y';. Note that you can add a secondary index on your "pk2" and "pk3" columns and that would cause a range scan to be done

For the first query (over the index table). DEGENERATE SCAN means that a query can't possibly return any rows. If we can determine that at compile time, then we don't bother to even run the scan. FULL SCAN means that all rows of the table will be scanned over (potentially with a filter applied if you have a WHERE clause) SKIP SCAN means that either a subset or all rows in your table will be scanned over, however it will skip large groups of rows depending on the conditions in your filter. See this blog for more detail. We don't do a SKIP SCAN if you have no filter on the

Leading primary key columns, but you can force a SKIP SCAN by using the / * + SKIP_SCAN * / hint. Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.

Introduction to the use of index:

Primary key index: the primary key index is referenced in the order in which it was created. Such as primary key (id,name,add), then implicitly create (id), (id,name), (id,name,add) three indexes, if you use these three conditions in where, you will use the index, other combinations cannot use the index (FULL SCAN).

Secondary index: in addition to being referenced in the order in which it was created, the index cannot be used if the columns of the query are not all in the index or override index. For example: DDL:create table usertable (id varchar primary key, firstname varchar, lastname varchar); create index idx_name on usertable (firstname); DML:select id, firstname, lastname from usertable where firstname = 'foo'; this query will not use the index because lastname is no longer in the index. The query statement cannot use the index until DDL:create idx_name on usertable (firstname) include (lastname) is executed. Legacy question: what is the specific difference between include and on in Phoenix?

If the primary key index and secondary index exist at the same time in the query condition, Phoenix will choose the best index by itself.

Mapping experiment between SQL structure and Hbase structure of Phoenix

> > create table user3table (id varchar, firstname varchar, lastname varchar CONSTRAINT compete for PRIMARY KEY (id,firstname)) >! describe user3table +- -+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULL | +- +-+ | null | null | USER3TABLE | ID | 12 | VARCHAR | null | null | null | null | 1 | | null | null | USER3TABLE | FIRSTNAME | 12 | VARCHAR | null | null | null | null | 1 | | _ 0 | null | USER3TABLE | LASTNAME | 12 | VARCHAR | null | null | null | null | 1 | +-- -+ > >! index user3table +- -+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT | +- -+- -+- -- +-> > select * from user3table +-+ | ID | FIRSTNAME | LASTNAME | +-+ | hup | zhan | feng | +-+- +-+ > > hbase > > scan 'USER3TABLE' ROW COLUMN+CELL hup\ x00zhan column=_0:LASTNAME Timestamp=1387875092585, value=feng hup\ x00zhan column=_0:_0, timestamp=1387875092585, value= 1 row (s) in 0.0110 seconds > > create index idx_test on user3table (firstname) include (lastname) > >! index user3table +- -+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT | +- -+-+ | null | null | USER3TABLE | true | null | IDX_TEST | | 3 | 1 |: FIRSTNAME | A | null | | null | null | USER3TABLE | true | null | IDX_TEST | 3 | 2 |: ID | A | null | | null | null | USER3TABLE | true | null | IDX_TEST | 3 | _ 0:LASTNAME | null | null | +- -+ > > select * from user3table +-+ | ID | FIRSTNAME | LASTNAME | +-+ | hup | zhan | feng | +-+- +-+ > > hbase > > scan 'USER3TABLE' ROW COLUMN+CELL hup\ x00zhan column=_0:LASTNAME Timestamp=1387875092585, value=feng hup\ x00zhan column=_0:_0, timestamp=1387875092585, value= 1 row (s) in 0.0110 seconds in addition: 1:NASalesforce.com\ x00Login\ X00\ X00\ X00 column=STATS:ACTIVE_VISITOR, timestamp=1387867968156, value=\ x80\ X00\ x1A "\ x17\ xFE0 2:NASalesforce.com\ x00Login\ X00\ X00 column=USAGE:CORE, timestamp=1387867968156 when there are multiple non-primary key columns in the table Value=\ x80\ X00\ xC9\ x01C%\ X17\ xFE0 3:NASalesforce.com\ x00Login\ X00\ X00 column=USAGE:DB, timestamp=1387867968156, value=\ x80\ x00\ X00\ X00\ X02\ x84\ x01C%\ X17\ xFE0 4:NASalesforce.com\ x00Login\ X00\ X00 column=USAGE:_0, timestamp=1387867968156, value=\ x01C%\ x17\ xFE0

Conclusion:

1:Phoenix adds columns defined as "CONSTRAINT competitive PRIMARY KEY (id,firstname)" to the Hbase primary key (split with\ x00), while merging the columns involved in the federated primary key with a column named "_ 0" by default. Its value is empty. Other columns are stored in columns of the same name in Hbase

2:Phoenix maintains a system table (SYSTEM TABLE) in Hbase to store scheme metadata for related Phoenix tables.

3: creating a secondary index (create index) operation does not affect the table structure

4: if the column family is not specified when the table is created, the column family is named in the form of _ 0, _ 1

5: if there are multiple columns, the value value obtained through the HBase API is not directly available (the normal value can only be obtained through the Phoenix API)

Dynamic scheme correlation

1: column modification is supported

Example: ALTER TABLE my_schema.my_table ADD d.dept_id char (10) VERSIONS=10 ALTER TABLE my_table ADD dept_name char (50) ALTER TABLE my_table ADD parent_id char (15) null primary key ALTER TABLE my_table DROP COLUMN d.dept_id ALTER TABLE my_table DROP COLUMN dept_name ALTER TABLE my_table DROP COLUMN parent_id ALTER TABLE my_table SET IMMUTABLE_ROWS=true

2: secondary index modification is supported.

Example: CREATE INDEX my_idx ON sales.opportunity (last_updated_date DESC) CREATE INDEX my_idx ON log.event (created_date DESC) INCLUDE (name, payload) SALT_BUCKETS=10 CREATE INDEX IF NOT EXISTS my_comp_idx ON server_metrics (gc_time DESC, created_date DESC) DATA_BLOCK_ENCODING='NONE',VERSIONS=?,MAX_FILESIZE=2000000 split on (? ALTER INDEX my_idx ON sales.opportunity DISABLE ALTER INDEX IF EXISTS my_idx ON server_metrics REBUILD DROP INDEX my_idx ON sales.opportunity DROP INDEX IF EXISTS my_idx ON server_metrics

3: it should not be supported to modify the primary key index (no relevant information was found. It is also difficult to support in theory, because the primary key index is the value of rowkey. )

Java client sample code (programming directly to JDBC interface):

Import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.PreparedStatement; import java.sql.Statement; public class test {public static void main (String [] args) throws SQLException {Statement stmt = null; ResultSet rset = null; Connection con = DriverManager.getConnection ("jdbc:phoenix:zookeeper"); stmt = con.createStatement (); stmt.executeUpdate ("create table test (mykey integer not null primary key, mycolumn varchar)") Stmt.executeUpdate ("upsert into test values"); stmt.executeUpdate ("upsert into test values"); con.commit (); PreparedStatement statement = con.prepareStatement ("select * from test"); rset = statement.executeQuery (); while (rset.next ()) {System.out.println (rset.getString (mycolumn));} statement.close (); con.close ();}}

Single node test:

Table creation: CREATE TABLE IF NOT EXISTS $table (HOST CHAR (2) NOT NULL,DOMAIN VARCHAR NOT NULL, FEATURE VARCHAR NOT NULL,DATE DATE NOT NULL,USAGE.CORE BIGINT,USAGE.DB BIGINT,STATS.ACTIVE_VISITOR INTEGER CONSTRAINT competes for PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE)) SPLIT ON ('CSGoogle','CSSalesforce','EUApple','EUGoogle','EUSalesforce','NAApple','NAGoogle','NASalesforce') The performance_10000000 data table contains 1000W items as follows: +-+ | HOST | DOMAIN | FEATURE | DATE | CORE | | DB | ACTIVE_VISITOR | +-+ | CS | Apple.com | Dashboard | 2013-12-23 | 363 | | | 8390 | +-+ Query # 1-Count-SELECT COUNT (1) FROM performance_10000000 | COUNT (1)-10000000 Time: 66.044 sec (s) Query # 2-Group By First Competition-SELECT HOST FROM performance_10000000 GROUP BY HOST; HOST-CS EU NA Time: 51.43 sec (s) Query # 3-Group By Second Competition-SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN DOMAIN-Apple.com Google.com Salesforce.com Time: 46.908 sec (s) Query # 4-Truncate + Group By-SELECT TRUNC (DATE,'DAY') DAY FROM performance_10000000 GROUP BY TRUNC (DATE,'DAY'); DAY-2013-12-23 00:00:00 2013-12-24 00:00:00 2013-12-25 00:00:00. Time: 48.132 sec (s) Query # 5-Filter + Count-SELECT COUNT (1) FROM performance_10000000 WHERE CORE

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report