Part 2: Advanced

1. Adapters

1.1 Schema adapters
A schema adapter allows Calcite to read a particular type of data, presenting that data as tables within a schema.
Cassandra adapter (calcite-cassandra)
CSV adapter (example/csv)
Druid adapter (calcite-druid)
Elasticsearch adapter (calcite-elasticsearch3 and calcite-elasticsearch6)
File adapter (calcite-file)
JDBC adapter (part of calcite-core)
MongoDB adapter (calcite-mongodb)
OS adapter (calcite-os)
Pig adapter (calcite-pig)
Solr cloud adapter (solr-sql)
Spark adapter (calcite-spark)
Splunk adapter (calcite-splunk)
1.2 Other adapters

Eclipse Memory Analyzer (MAT) adapter (mat-calcite-plugin)
1.3 Other language interfaces
Piglet (calcite-piglet) runs queries written in a subset of Pig Latin.
1.4 Engines
Many projects and products use Apache Calcite for SQL parsing, query optimization, data virtualization/federation, and materialized view rewriting. Some of them are listed on the "Powered by Calcite" page.
1.5 Drivers
Drivers allow you to connect to Calcite from your application.
JDBC Driver (Java Doc)
The JDBC driver is powered by Avatica. Connections can be local or remote (JSON over HTTP, or Protobuf over HTTP).
JDBC connection string parameters
approximateDecimal: whether approximate results from aggregate functions on DECIMAL types are acceptable.
approximateDistinctCount: whether approximate results from COUNT(DISTINCT ...) aggregate functions are acceptable.
approximateTopN: whether approximate results from "Top N" queries (ORDER BY aggFun() DESC LIMIT n) are acceptable.
caseSensitive: whether identifiers are matched case-sensitively. If not specified, the value from lex is used.
conformance: SQL conformance level. Values: DEFAULT (the default, similar to PRAGMATIC_2003), LENIENT, MYSQL_5, ORACLE_10, ORACLE_12, PRAGMATIC_99, PRAGMATIC_2003, STRICT_92, STRICT_99, STRICT_2003, SQL_SERVER_2008.
createMaterializations: whether Calcite should create materializations. Default false.
defaultNullCollation: how NULL values should be sorted if neither NULLS FIRST nor NULLS LAST is specified in a query. The default, HIGH, sorts NULL values the same as Oracle.
druidFetch: how many rows the Druid adapter should fetch at a time when executing a SELECT query.
forceDecorrelate: whether the planner should try to de-correlate as much as possible. Default true.
fun: collection of built-in functions and operators. Valid values are "standard" (the default), "oracle" and "spatial", and may be combined with commas, e.g. "oracle,spatial".
lex: lexical policy (keywords). Values: ORACLE (the default), MYSQL, MYSQL_ANSI, SQL_SERVER, JAVA.
materializationsEnabled: whether Calcite should use materializations. Default false.
model: URI of the JSON model file.
parserFactory: parser factory. The name of a class that implements the SqlParserImplFactory interface and has a public default constructor or an INSTANCE constant.
quoting: how identifiers are quoted. Values: DOUBLE_QUOTE, BACK_QUOTE, BRACKET. If not specified, the value from lex is used.
quotedCasing: how identifiers are stored if they are quoted. Values: UNCHANGED, TO_UPPER, TO_LOWER. If not specified, the value from lex is used.
schema: name of the initial schema.
schemaFactory: schema factory. The name of a class that implements the SchemaFactory interface and has a public default constructor or an INSTANCE constant. Ignored if model is specified.
schemaType: schema type. Value must be "MAP" (the default), "JDBC" or "CUSTOM" (implicitly CUSTOM if schemaFactory is specified). Ignored if model is specified.
spark: whether Spark should be used as the engine for processing that cannot be pushed to a source system. If false (the default), Calcite generates code that implements the Enumerable interface.
timeZone: time zone, e.g. "gmt-3". Default is the JVM's time zone.
typeSystem: type system. The name of a class that implements the RelDataTypeSystem interface and has a public default constructor or an INSTANCE constant.
unquotedCasing: how identifiers are stored if they are not quoted. Values: UNCHANGED, TO_UPPER, TO_LOWER. If not specified, the value from lex is used.
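These properties can also be passed programmatically rather than in the URL. A minimal sketch, assuming calcite-core is on the classpath; the particular property choices are arbitrary examples:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnectExample {
  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    info.setProperty("lex", "JAVA");          // Java-style lexical policy
    info.setProperty("caseSensitive", "true");
    // Each entry is equivalent to a key=value pair in the connection string.
    try (Connection c = DriverManager.getConnection("jdbc:calcite:", info)) {
      System.out.println(c.getMetaData().getDriverName());
    }
  }
}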
If you are connecting to a single schema based on a built-in schema type, you do not need to specify a model. For example:
jdbc:calcite:schemaType=JDBC; schema.jdbcUser=SCOTT; schema.jdbcPassword=TIGER; schema.jdbcUrl=jdbc:hsqldb:res:foodmart
creates a connection that maps to the foodmart database via the JDBC schema adapter.
Similarly, you can connect to a single schema based on a user-defined schema adapter. For example:
jdbc:calcite:schemaFactory=org.apache.calcite.adapter.cassandra.CassandraSchemaFactory; schema.host=localhost; schema.keyspace=twissandra
creates a connection to the Cassandra adapter, which is equivalent to the following model file:
{"version": "1.0"," defaultSchema ":" foodmart "," schemas ": [{type: 'custom', name:' twissandra', factory: 'org.apache.calcite.adapter.cassandra.CassandraSchemaFactory', operand: {host:' localhost', keyspace: 'twissandra'}}]}
Note: each key in the operand section appears in the connection string with a "schema." prefix.
1.6 Server
The core module of Calcite (calcite-core) supports SQL queries (SELECT) and DML operations (INSERT, UPDATE, DELETE, MERGE), but does not support DDL operations such as CREATE SCHEMA or CREATE TABLE. As we shall see, DDL complicates the state model of the repository and makes the parser more difficult to extend, so we leave DDL out of the core.
The server module (calcite-server) adds DDL support to Calcite. It extends the SQL parser, using the same mechanism used by sub-projects, adding some DDL commands:
CREATE and DROP SCHEMA
CREATE and DROP FOREIGN SCHEMA
CREATE and DROP TABLE (including CREATE TABLE ... AS SELECT)
CREATE and DROP MATERIALIZED VIEW
CREATE and DROP VIEW
The commands are described in detail in the SQL reference.
Enable this feature by including calcite-server.jar on your classpath and adding parserFactory=org.apache.calcite.sql.parser.ddl.SqlDdlParserImpl#FACTORY to the JDBC connection string. Here is a short example using the sqlline shell:

$ ./sqlline
sqlline version 1.3.0
> !connect jdbc:calcite:parserFactory=org.apache.calcite.sql.parser.ddl.SqlDdlParserImpl#FACTORY sa ""
> CREATE TABLE t (i INTEGER, j VARCHAR(10));
No rows affected (0.293 seconds)
> INSERT INTO t VALUES (1, 'a'), (2, 'bc');
2 rows affected (0.873 seconds)
> CREATE VIEW v AS SELECT * FROM t WHERE i > 1;
No rows affected (0.072 seconds)
> SELECT count(*) FROM v;
+--------+
| EXPR$0 |
+--------+
|      1 |
+--------+
1 row selected (0.148 seconds)
> !quit
(Tested on Linux; Windows is not yet supported.)
The calcite-server module is optional. One of its goals is to showcase Calcite's capabilities (such as materialized views, foreign tables and generated columns) using concise examples that you can try from the SQL command line. All of the functionality used by calcite-server is available through APIs in calcite-core.
If you are the author of a sub-project, your grammar extensions are unlikely to match those in calcite-server, so we recommend that you add your SQL syntax extensions by extending the core parser; if you want the DDL commands, you can copy and paste from calcite-server into your project.
Currently, the repository is not persisted. As you execute DDL commands, you modify an in-memory repository by adding and removing objects reachable from the root schema. All commands within the same SQL session see those objects. You can create the same objects in a future session by executing the same script of SQL commands.
Calcite can also act as a data virtualization or federation server: Calcite manages data in multiple external schemas, but to a client the data seems to all be in one place. Calcite chooses where processing should occur, and whether to create copies of data for efficiency. The calcite-server module is a step towards that goal; an industry-strength solution would require further packaging (to make Calcite runnable as a service), repository persistence, authorization and security.
1.7 Extensibility
There are many other APIs that allow you to extend Calcite's capabilities.
In this section, we briefly describe those APIs, to give you an idea of what is possible. To fully use these APIs you will need to read other documentation, such as the javadoc for the interfaces, and possibly seek out the tests that we have written for them.
1.8 Functions and operators
There are several ways to add operators or functions to Calcite. We will describe the simplest (and least powerful) first. User-defined functions are the simplest (but least powerful). They are easy to write (you just write a Java class and register it with your schema), but they do not offer much flexibility in the number and type of arguments, resolving overloaded functions, or deriving the return type.
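For instance, here is a minimal sketch of a user-defined function; the class and the SQL name MY_UPPER are hypothetical, and the registration call assumes a SchemaPlus obtained from a Calcite connection:

import org.apache.calcite.schema.impl.ScalarFunctionImpl;

// A user-defined function is just a Java class with an evaluation method.
public class MyUpperFunction {
  // Calcite derives the SQL signature from this method's parameter
  // and return types: MY_UPPER(VARCHAR) RETURNS VARCHAR.
  public String eval(String s) {
    return s == null ? null : s.toUpperCase();
  }
}

// Registration against a schema (rootSchema is a SchemaPlus):
//   rootSchema.add("MY_UPPER",
//       ScalarFunctionImpl.create(MyUpperFunction.class, "eval"));
// After that: SELECT MY_UPPER(name) FROM ...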
If you need that flexibility, you probably want to write a user-defined operator (see the SqlOperator interface).
If your operator does not adhere to standard SQL function syntax, "f(arg1, arg2, ...)", then you need to extend the parser.
There are many good examples in the tests: the UdfTest class tests user-defined functions and user-defined aggregate functions.
Aggregate functions
User-defined aggregate functions are similar to user-defined functions, but each function has several corresponding Java methods, one for each phase in the life cycle of an aggregate:
init creates an accumulator;
add adds one row's value to an accumulator;
merge combines two accumulators into one;
result finalizes an accumulator, converting it to a result.
For example, the methods (in pseudo-code) for SUM(int) are as follows:

struct Accumulator {
  final int sum;
}

Accumulator init() {
  return new Accumulator(0);
}

Accumulator add(Accumulator a, int x) {
  return new Accumulator(a.sum + x);
}

Accumulator merge(Accumulator a, Accumulator a2) {
  return new Accumulator(a.sum + a2.sum);
}

int result(Accumulator a) {
  return a.sum;
}
Here is the sequence of calls that computes the sum of two rows with column values 4 and 7:
a = init()        # a = {0}
a = add(a, 4)     # a = {4}
a = add(a, 7)     # a = {11}
return result(a)  # returns 11
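In Java, a user-defined aggregate can be written as a class whose init/add/merge/result methods Calcite discovers by name. A minimal sketch; the class and the SQL name MY_SUM are hypothetical:

import org.apache.calcite.schema.impl.AggregateFunctionImpl;

// A user-defined aggregate: Calcite finds the life-cycle methods by name.
public class MySumFunction {
  public long init() { return 0L; }                      // new accumulator
  public long add(long accumulator, int v) { return accumulator + v; }
  public long merge(long a, long b) { return a + b; }    // combine partials
  public long result(long accumulator) { return accumulator; }
}

// Registration (rootSchema is a SchemaPlus):
//   rootSchema.add("MY_SUM", AggregateFunctionImpl.create(MySumFunction.class));
// After that: SELECT MY_SUM(empid) FROM hr.emps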
Window functions
A window function is similar to an aggregate function, but it is applied to a set of rows gathered by an OVER clause rather than by a GROUP BY clause. Every aggregate function can be used as a window function, but there are some key differences. The rows seen by a window function may be ordered, and window functions that rely on order (RANK, for example) cannot be used as aggregate functions.
Another difference is that windows are non-disjoint: a particular row can appear in more than one window. For example, a row timestamped 10:37 appears in both the 10:00-11:00 window and the 10:30-11:30 window.
Window functions are computed incrementally: when the clock ticks from 10:14 to 10:15, two rows might enter the window and three rows might leave. For this, window functions have an extra life-cycle operation:
Remove removes a value from the accumulator.
The pseudo-code for SUM(int) would be:

Accumulator remove(Accumulator a, int x) {
  return new Accumulator(a.sum - x);
}
Here is the sequence of calls that computes the moving sum over the previous two rows, of four rows whose column values are 4, 7, 2 and 3:
a = init()       # a = {0}
a = add(a, 4)    # a = {4}
emit result(a)   # emits 4
a = add(a, 7)    # a = {11}
emit result(a)   # emits 11
a = remove(a, 4) # a = {7}
a = add(a, 2)    # a = {9}
emit result(a)   # emits 9
a = remove(a, 7) # a = {2}
a = add(a, 3)    # a = {5}
emit result(a)   # emits 5
Grouped window functions
Grouped window functions are functions that operate in the GROUP BY clause to gather rows into sets. The built-in grouped window functions are HOP, TUMBLE and SESSION. You can define additional functions by extending SqlGroupedWindowFunction.
Table functions and table macros
User-defined table functions are defined in a similar way to regular "scalar" user-defined functions, but are used in the FROM clause of a query. The following query uses a table function called Ramp:
SELECT * FROM TABLE(Ramp(3, 4))
User-defined table macros use the same SQL syntax as table functions, but are defined differently. Rather than generating data, they generate a relational expression. Table macros are invoked during query preparation, and the relational expression they produce can then be optimized. (Calcite's implementation of views uses table macros.)
The TableFunctionTest class tests table functions and contains several useful examples.
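To sketch what a table function implementation can look like, here is an assumed single-argument variant of the Ramp idea (not Calcite's actual test fixture): the eval method returns a Table, in this case a ScannableTable producing the integers 0..n-1.

import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class RampFunction {
  // Calcite maps TABLE(Ramp(n)) in a FROM clause to this method.
  public static ScannableTable eval(int n) {
    return new RampTable(n);
  }

  private static class RampTable extends AbstractTable implements ScannableTable {
    private final int n;

    RampTable(int n) {
      this.n = n;
    }

    // Declares a single INTEGER column named "I".
    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("I", SqlTypeName.INTEGER)
          .build();
    }

    // Produces the rows 0..n-1.
    @Override public Enumerable<Object[]> scan(DataContext root) {
      final Object[][] rows = new Object[n][];
      for (int i = 0; i < n; i++) {
        rows[i] = new Object[] {i};
      }
      return Linq4j.asEnumerable(rows);
    }
  }
}

// Registration (hypothetical):
//   schema.add("Ramp", TableFunctionImpl.create(RampFunction.class, "eval"));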
1.9 Extending the parser
Suppose you need to extend Calcite's SQL syntax in a way that will be compatible with future changes to the syntax. Making a copy of the grammar file Parser.jj in your project would be foolish, because the grammar is edited quite frequently.
Fortunately, Parser.jj is actually an Apache FreeMarker template that contains variables that can be substituted. The parser in calcite-core instantiates the template with the default values of the variables, typically empty, but you can override them. If your project needs a different parser, you can provide your own config.fmpp and parserImpls.ftl files, and thereby generate an extended parser.
The calcite-server module, created in [CALCITE-707] to add DDL statements such as CREATE TABLE, is an example you can follow. Also see the ExtensionSqlParserTest class.
1.10 Customizing the accepted and generated SQL dialect
To customize which SQL extensions the parser should accept, implement the SqlConformance interface or use one of the built-in values in the SqlConformanceEnum enumeration.
To control how SQL is generated for an external database (usually via the JDBC adapter), use the SqlDialect class. The dialect also describes the engine's capabilities, such as whether it supports OFFSET and FETCH.
1.11 Custom schemas
To customize a schema, you implement the SchemaFactory interface.
During query preparation, Calcite will call this interface to find out which tables and sub-schemas your custom schema contains. When a table in your custom schema is referenced in a query, Calcite will ask your schema to create an instance of the Table interface.
That table will be wrapped in a TableScan and will undergo the query optimization process.
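Here is a minimal sketch of a custom schema; the names (MySchemaFactory, the EVENTS table) are hypothetical, and it assumes a single "host" operand passed via the schema.host connection key:

import java.util.Collections;
import java.util.Map;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaFactory;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.Table;
import org.apache.calcite.schema.impl.AbstractSchema;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class MySchemaFactory implements SchemaFactory {
  // Called once per schema declared in the model or connection string;
  // "operand" carries the schema.* keys, e.g. schema.host=localhost.
  @Override public Schema create(SchemaPlus parentSchema, String name,
      Map<String, Object> operand) {
    final String host = (String) operand.get("host");
    return new AbstractSchema() {
      // Calcite calls this during query preparation to discover tables.
      @Override protected Map<String, Table> getTableMap() {
        return Collections.singletonMap("EVENTS", new MyTable(host));
      }
    };
  }

  // A table that only declares its row type; a real one would also
  // implement ScannableTable (or similar) to supply rows from "host".
  private static class MyTable extends AbstractTable {
    private final String host;

    MyTable(String host) {
      this.host = host;
    }

    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("ID", SqlTypeName.INTEGER)
          .add("NAME", SqlTypeName.VARCHAR)
          .build();
    }
  }
}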
1.12 Reflective schema
A reflective schema (the ReflectiveSchema class) is a way of wrapping a Java object so that it appears as a schema. Its collection-valued fields appear as tables.
It is not a schema factory but an actual schema; you create the object and wrap it in a schema via API calls.
See the ReflectiveSchemaTest class.
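A minimal sketch, modeled on Calcite's well-known "hr" example (the HrSchema and Employee classes here are hypothetical): the array-valued field emps shows up as a table of the same name.

import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.jdbc.CalciteConnection;

public class HrSchema {
  public final Employee[] emps = {
      new Employee(100, "Bill"),
      new Employee(200, "Eric"),
  };

  public static class Employee {
    public final int empid;
    public final String name;

    public Employee(int empid, String name) {
      this.empid = empid;
      this.name = name;
    }
  }

  public static void main(String[] args) throws Exception {
    Connection connection = DriverManager.getConnection("jdbc:calcite:");
    CalciteConnection calciteConnection =
        connection.unwrap(CalciteConnection.class);
    // Wrap the object; its collection-valued fields become tables.
    calciteConnection.getRootSchema()
        .add("hr", new ReflectiveSchema(new HrSchema()));
    // Now "hr"."emps" can be queried with ordinary SQL.
  }
}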
1.13 Custom tables
To customize a table, you implement the TableFactory interface. Whereas a schema factory produces a set of named tables, a table factory produces a single table when bound to a schema with a particular name (and an optional set of extra operands).
Modifying data
If your table is to support DML operations (INSERT, UPDATE, DELETE, MERGE), you must implement the ModifiableTable interface in addition to the Table interface.
Streaming
If your table is to support streaming queries, you must implement the StreamableTable interface in addition to the Table interface.
See the StreamTest class for examples.
Pushing operations down to your table
If you want to push processing down to your custom table's source system, consider implementing either the FilterableTable interface or the ProjectableFilterableTable interface; see the sketch below.
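A minimal sketch of a FilterableTable, assuming a single INTEGER column named "I"; the pattern of removing handled filters from the mutable list follows Calcite's CSV tutorial:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.FilterableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.type.SqlTypeName;

public class SimpleFilterableTable extends AbstractTable
    implements FilterableTable {
  private final int[] values = {4, 7, 2, 3};

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder().add("I", SqlTypeName.INTEGER).build();
  }

  // scan() receives the conjunctive filters; any filter the table
  // handles itself is removed from the (mutable) list, and Calcite
  // evaluates the remaining filters on top of the returned rows.
  @Override public Enumerable<Object[]> scan(DataContext root,
      List<RexNode> filters) {
    Integer target = null;
    for (Iterator<RexNode> it = filters.iterator(); it.hasNext();) {
      final RexNode filter = it.next();
      // Handle only "I = <literal>"; leave all other filters to Calcite.
      if (filter.isA(SqlKind.EQUALS)) {
        final RexCall call = (RexCall) filter;
        final RexNode left = call.getOperands().get(0);
        final RexNode right = call.getOperands().get(1);
        if (left instanceof RexInputRef && right instanceof RexLiteral) {
          target = ((Number) ((RexLiteral) right).getValue2()).intValue();
          it.remove(); // we promise to evaluate this filter ourselves
        }
      }
    }
    final List<Object[]> rows = new ArrayList<>();
    for (int v : values) {
      if (target == null || v == target) {
        rows.add(new Object[] {v});
      }
    }
    return Linq4j.asEnumerable(rows);
  }
}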
If you want more control, write a planner rule instead. This allows you to push down expressions, to make a cost-based decision about whether to push down processing, and to push down more complex operations such as join, aggregation and sort.
1.14 Type system
You can customize some aspects of the type system by implementing the RelDataTypeSystem interface.
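A minimal sketch, assuming you only want to change one limit; the concrete value 38 is an arbitrary example:

import org.apache.calcite.rel.type.RelDataTypeSystemImpl;

// Extend the default implementation and override only what differs.
public class MyTypeSystem extends RelDataTypeSystemImpl {
  @Override public int getMaxNumericPrecision() {
    // Calcite's default is 19; assume our engine supports wider DECIMALs.
    return 38;
  }
}

// Enable it via the typeSystem connection property, e.g.
//   jdbc:calcite:typeSystem=com.example.MyTypeSystem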
Relational operators
All relational operators implement the RelNode interface, and most extend the AbstractRelNode class. The core operators (used by SqlToRelConverter and covering conventional relational algebra) are TableScan, TableModify, Values, Project, Filter, Aggregate, Join, Sort, Union, Intersect, Minus, Window and Match.
Each of these has a "pure" logical subclass, LogicalProject and so on. Any given adapter will have counterparts for the operations that its engine can implement efficiently; for example, the Cassandra adapter has CassandraProject but no CassandraJoin.
You can define your own subclass of RelNode to add a new operator, or to implement an existing operator in a particular engine.
To make an operator useful and powerful, you need planner rules that combine it with existing operators (and also provide metadata; see below). This is algebra, and the effects are combinatorial: you write a few rules, but together they can handle an exponential number of query patterns.
If possible, make your operator a subclass of an existing operator; then you may be able to reuse or adapt its rules. Even better, if your operator is a logical operation that you can rewrite (via planner rules) in terms of existing operators, do that. You will be able to reuse the rules, metadata and implementations of those operators with no extra work.
Planner rules
A planner rule (the RelOptRule class) transforms a relational expression into an equivalent relational expression.
A planner engine has many planner rules registered, and fires them to transform the input query into a more efficient plan. Planner rules are therefore at the heart of the optimization process; but, perhaps surprisingly, each individual rule does not concern itself with cost. The planner engine is responsible for firing rules in a sequence that produces an optimal plan, while each rule only concerns itself with its own correctness.
Calcite has two built-in planner engines: the VolcanoPlanner class uses dynamic programming and is good for exhaustive search, while the HepPlanner class fires a sequence of rules in a more fixed order.
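A minimal sketch of a custom rule, assuming the classic operand-constructor style; the rule name is hypothetical and the transformation body is left as a stub:

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.logical.LogicalFilter;
import org.apache.calcite.rel.logical.LogicalProject;

// Matches a LogicalFilter whose input is a LogicalProject. The engine
// fires onMatch whenever this pattern appears in the plan being explored.
public class MyFilterProjectRule extends RelOptRule {
  public static final MyFilterProjectRule INSTANCE = new MyFilterProjectRule();

  private MyFilterProjectRule() {
    super(operand(LogicalFilter.class, operand(LogicalProject.class, any())),
        "MyFilterProjectRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final LogicalFilter filter = call.rel(0);
    final LogicalProject project = call.rel(1);
    // Build an equivalent expression (for example, the filter pushed
    // beneath the project) and register it with:
    //   call.transformTo(newExpression);
    // The rule is only responsible for correctness; the engine compares
    // the registered alternatives by cost.
  }
}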
Calling conventions
A calling convention is a protocol representing a particular data engine. For example, the Cassandra engine has a collection of relational operators, CassandraProject, CassandraFilter and so on, and these operators can be connected to each other without the data needing to be converted from one format to another.
If data needs to be converted from one calling convention to another, Calcite uses a special subclass of relational expression called a converter (see the Converter class). But, of course, converting data has a run-time cost.
When planning a query that will use multiple engines, Calcite "colors" regions of the relational expression tree according to their calling convention. The planner pushes operations into data sources by firing rules. If an engine does not support a particular operation, the rule does not fire. Sometimes an operation can occur in more than one place, and ultimately the best plan is chosen according to cost.
A calling convention consists of a class that implements the Convention interface, an auxiliary interface (for instance, the CassandraRel interface), and a set of subclasses of RelNode that implement that interface for the core relational operators (Project, Filter, Aggregate and so on).
Built-in SQL implementation
How does Calcite implement SQL if an adapter does not implement all of the core relational operators?
The answer is a particular built-in calling convention, EnumerableConvention. Relational expressions of enumerable convention are implemented "built-in": Calcite generates Java code, compiles it, and executes it within its own JVM. Enumerable convention is less efficient than, say, a distributed engine running over column-oriented data files, but it can implement all of the core relational operators and all of the built-in SQL functions and operators. If a data source cannot implement a relational operator, enumerable convention is the fall-back.
1.15 Statistics and cost
Calcite has a metadata system that allows you to define cost functions and statistics about relational operators, collectively referred to as metadata. Each kind of metadata has (usually) one interface with one method. For example, selectivity is defined by the RelMdSelectivity class and the method getSelectivity(RelNode rel, RexNode predicate).
There are many built-in kinds of metadata, including collation, column origins, column uniqueness, distinct row count, distribution, explain visibility, expression lineage, max row count, node types, parallelism, percentage original rows, population size, predicates, row count, selectivity, size, table references and unique keys; you can also define your own.
You can then supply a metadata provider that computes that kind of metadata for particular subclasses of RelNode. Metadata providers can handle built-in and extended metadata types, as well as built-in and extended RelNode types. While preparing a query, Calcite combines all of the applicable metadata providers and maintains a cache, so that a given piece of metadata (for example, the selectivity of the condition x > 10 in a particular Filter operator) is computed only once.