Model Design of Quick BI and Analysis of the principle of generating SQL 04/18 Update SLTechnology News&Howtos

Model Design of Quick BI and Analysis of the principle of generating SQL

2025-04-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Model Design of Quick BI and Analysis of the principle of generating SQL

1. Summary

With the rapid development of the Internet, the amount of data shows a blowout growth. How to analyze and use these data to make the data produce commercial value has become more and more important. It is gratifying that more and more people have realized the importance of using data analysis to determine business strategy, and are also carrying out data analysis in various industries. As we all know, the core of data analysis is data. In order to analyze data more easily, the design of data model needs to follow certain specifications. At present, the most popular specification of online analytical processing (OLAP) is the dimensional modeling specification. This paper introduces how Quick BI carries on the dimension modeling and how to automatically generate the SQL statement of the analysis query based on the dimension model, which makes the data analysis easier.

Keywords: Quick BI, OLAP, dimensional modeling, SQL

Second, the classification of dimensional models

OLAP (On-line Analytical Processing, online Analytical processing) can be divided into ROLAP, MOLAP and HOLAP according to the different ways of storing data. ROLAP represents the OLAP implementation based on relational database storage (Relational OLAP), with relational database as the core and relational structure for multi-dimensional data representation and storage; MOLAP represents the OLAP implementation based on multidimensional data storage (Multidimensional OLAP); HOLAP represents the OLAP implementation based on hybrid data storage (Hybrid OLAP), such as low-level relational database storage and high-level multi-dimensional array storage. Next, it mainly introduces the modeling principle of ROLAP based on relational database.

ROLAP divides tables in a multidimensional database into two categories: fact tables and dimension tables. Fact tables are used to store factual data of dimension keywords and numerical types, and are generally designed around business processes, such as sales fact tables, which generally store information such as when and where users bought products, sales and sales. The dimension table is used to store the detailed data of the dimension, for example, the ID of the product dimension is stored in the sales fact table, and the product name and brand information are stored in the product dimension table, which are associated through the product ID.

According to the relationship between fact table and dimension table, ROLAP can be divided into star model (Star Schema) and snowflake model (Snowflake Schema).

1. Star model

The star model consists of fact table (FactTable) and dimension table (DimensionTable). The dimensional foreign keys in the fact table are respectively associated with the primary keys in the corresponding dimension table. After the association, the shape looks like a star, so the image is called the star model. The following example is a star model: where sales_fact_1997 is a fact table that stores information about the customer's purchase, quantity, and sales of a product at a certain time, a store, and a record of an order placing process. The fact table sales_fact_1997 is associated with dimension tables product (product dimension table), customer (customer dimension table), time_by_day (time dimension table) and store (store dimension table) respectively through foreign keys product_id, customer_id, time_id and store_id.

two。 Snowflake model

Snowflake model is when one or more dimensional tables are not directly connected to the fact table, but are connected to the fact table through other dimensional tables, its diagram is like a snowflake, so it is called snowflake model. The following example product (product) dimension table and product_class (product category) dimension table are associated through product_class_id, and the association is many-to-one. Product_class is not directly associated with the sales_fact_1997 fact table.

Third, the principle of SQL generation based on ROLAP model.

After the model is built, the next focus is to generate SQL statements to meet the analysis needs, and then send the SQL statements to DB to query the data and return the analysis results. The following describes how to generate SQL statements through specific requirements scenarios.

1. Generate SQL based on star model (or snowflake model)

Demand scenarios:

Check the total sales and sales volume by date and product, the date is limited to 1997, the total sales is limited to more than 1000 yuan, the results are arranged in reverse order according to the total sales, look at the first five.

Generate SQL ideas

1. Analyze the fields and tables that need to be used, the goal is to identify which tables, inter-table relationships, table grouping fields, aggregate fields, and determine the select and from information in SQL.

two。 Analyze the filter criteria, and the goal is to identify the values to be filtered in the where in the SQL.

3. Analyze the grouping dimension, and the goal is to identify the fields of group by in SQL.

4. The goal of analyzing the aggregated filter criteria is to identify the values that need to be filtered in the having.

5. Analyze the columns and sort types that need to be sorted (ascending or descending).

6. Limit on the number of results generated

7. Generate a query SQL based on the above information:

Select grouping field, aggregate field from table (including table association) where filter condition group by grouping dimension having aggregated filter condition order by sort information result number limit.

Generate SQL

Follow the steps above, and the requirements in this example, to analyze the key information in the query (the following steps correspond to the steps in the idea of generating SQL)

1. The grouping fields used: the_date and product_name. The grouping field the_date is daily granularity and needs to be processed as annual granularity: DATE_FORMAT (`the_ date`,'% Y')

Aggregate fields: store_sales, unit_sales. Aggregation method is sum.

Tables used: sales_fact_1997, product, time_by_day

Relationship between tables: sales_fact_1997. Product_id= product. Product_id

Sales_fact_1997. Time_id= time_by_day .time _ id

two。 Filter criteria:

The_ date` = STR_TO_DATE ('1997-01-01 00 Y-%m-%d 0000','% Y-%m-%d% iRO% s')

3. Grouping dimensions: DATE_FORMAT (`the_ date`,'% Y'), product_name

4. Filter criteria after aggregation: SUM (`store_ sales`) > 1000

5. Sort: alias desc after order by aggregation

6. Limit the number of results: limit 0pl 5

7. The generated SQL is as follows

IV. Appendix-tables used

The table creation statements used in the above example are listed below, which need to be executed under the MySQL database, while other types of databases need to make some adjustments.

1. Sales_fact_1997 table

2. Product table

3. Product_ class table

4. Time_by_ Daylist

5. Customer table

6. Store table

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.