In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
Part II Advanced (Advanced) 1. Lattices
Lattice is a framework for creating and populating materialized views that can identify and resolve materialized views for specific queries.
A Lattice represents a star (or snowflake) model, not a general-purpose schema. In particular, all connections must be many-to-one, built around the fact table of the star model.
The name Lattice comes from mathematics: lattice is a partially ordered set, and any two elements have a single maximum lower bound and a minimum upper bound.
[HRU96] found that the possible materialized sets of cubes form a lattice, and an algorithm is used to select a set of more optimized materialized combinations. Calcite's recommendation algorithm is also derived from this.
Lattice uses SQL statement to represent the star model to define itself. SQL is a useful phrase that can represent multiple table join relationships, and you can assign aliases to column names (SQL expression is more convenient than inventing a new language to express relationships, join conditions, and cardinals).
Unlike regular sql, is important here in order. If you put An in front of B in the from module and implement a join relationship between An and B, it can be said that there is a many-to-one relationship from A to B. (for example, in the lattice example, the Sales fact table is triggered before the time dimension table and the product dimension table, and the product dimension table is triggered before the product classification dimension table, further down an arm of a snowflake.)
A Lattice means a series of constraints. In the A-B relationship, there is a foreign key on table A (for example, each foreign key in table A corresponds to a value in table B key), and a unique key in table B (for example, the key value appears only once). These constraints are important because they allow the planner to remove join relationships from unused columns and know that the query results will not change.
Calcite does not validate the constraint, and if the constraint rule is violated, Calcite will directly return the wrong result.
Lattice is a large virtual join view. It is not materialized (it may be several times larger than the star model due to de-normalization), and you may not want to query it (there are too many columns). So what exactly does it do?
1) as mentioned above, lattice declares many useful primary and foreign key constraints.
2) it helps query executors map users' query behavior to filter-join-aggregate materialized views (the most useful materialized view type for DW queries)
3) give Calcite a framework to collect statistics about data volumes and user queries
4) allow Calcite to automatically design and populate materialized views
Most star models of force users to define whether a column is a dimension or an indicator. In lattice, each column is a dimension column (that is, it can be a column in the GROUP BY clause to query the star schema on a particular dimension). All column can also be used as metrics. You can specify an aggregate function for a specified column to define a metric (metrics).
if "unit_sales" is generally more likely to be used as a measure than as a dimension, it doesn't matter. The Calcite algorithm will notice in the near future that it is rarely aggregated and will not tend to create aggregation layers on top of it (the current algorithm does not refer to query history when designing tiles).
but some people may wonder whether orders with less than 5 and more than 100 will be more or less profitable. In this scenario, "unit_sales" suddenly becomes a dimension. If the cost of declaring a column as a dimension column is 0, I think we should make all columns dimension columns.
this model allows a specified table to be used multiple times with different table aliases. Two time dimensions OrderDate and ShipDate can be defined in the model.
most SQL systems require that column names in a view be unique, which is difficult to achieve in lattice, because both primary and foreign key columns are often included in a join. Therefore, Calcite allows users to reference columns in two ways, and if the column is unique, you can directly use its name ['unit_sales']. Whether the column is unique in lattice or not, it is unique in its own table, so you can reference it by its table name. Such as the following example
["sales", "unit_sales"] ["ship_date", "time_id"] ["order_date", "time_id"]
A "tile" of is a materialized table in lattice with some specific dimensions (called cuboid--Cube in Kylin contains different pairwise combinations of all dimensions, each combination is a Cuboid). The tiles attribute in the Lattice JSON element defines an initial materialized tile collection.
can omit the tiles attribute if it runs the algorithm. Calcite selects an initial collection. If the tiles attribute is defined, the algorithm starts running with the defined list and then starts looking for other complementary tiles (eg fill in the gaps left by the initail tiles).
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.