Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Quick BI supports multi-dimensional analysis of multiple data sources

2025-03-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/03 Report--

1. Summary

With the rapid development of the Internet and the explosive growth of data, the storage forms of data also begin to show diversity, such as structured storage, such as Mysql, Oracle, SQLServer, semi-structured or even unstructured storage, such as HBase,OSS. Then people engaged in data analysis are faced with extracting data from a variety of data storage forms and then conducting multi-dimensional analysis, which will be a very challenging thing. Quick BI, as a new generation of intelligent BI service platform, just solves this problem. It not only supports multi-dimensional analysis of a variety of structured data sources, but also supports query analysis after uploading local files, but also supports OLAP analysis of some unstructured data sources, and even supports association analysis of mixed heterogeneous data sources.

The data sources currently supported by Quick BI can come from either Ali Cloud Database or self-built database, as shown below:

Second, multidimensional analysis of structured data sources.

For general data sources, users need to add their own data sources, such as MySQL data sources, to the Quick BI data source interface before doing multi-dimensional analysis, as shown below:

After the data source is added, you can select one or more data tables to analyze to create a dataset, as follows:

After the dataset is created, users can drag and drop dimensions and metrics in the dashboard for multi-dimensional analysis, such as:

The multidimensional analysis of structured data sources is relatively simple, and the general process is to generate the dialect SQL of the corresponding structured data sources according to the query factors such as dimensions, metrics and filter conditions selected by the user, and then send the SQL to the user's own database through the executor. finally, Quick BI receives the returned query results for visual display. The following figure shows the flow chart of multidimensional analysis:

By the way, local file uploads support both csv and Excel file types. The uploaded file will land on an official data source provided by Quick BI: the Discovery Space. Exploring the bottom layer of the space relies on a MPP SQL engine developed by Aliyun, which provides storage + computing services.

The next chapter will focus on the principles of unstructured query analysis and mixed heterogeneous data source association analysis.

III. Query analysis of unstructured data sources

3.1 background

In recent years, some large enterprises prefer to use databases such as semi-structured storage (HBase) and object storage (OSS), which can accommodate databases larger than big data. How to effectively help enterprises to carry out multi-dimensional data analysis of this kind of data sources is a challenge for BI products in the industry.

Among the open source database products, there are some potential solutions. For example, for the query of HDFS data, Hive designed the metastore component, which is specially used to store metadata and solves the mapping relationship between structured query and unstructured data. Users can customize the mapping way more flexibly by using the syntax of creating external table SQL. In addition, Apache Phoenix takes a similar approach to enable users to query data in HBase using SQL statements. After full investigation, according to the business scenario of Quick BI products, combined with open source computing engine, a set of analysis engine for unstructured query is developed.

3.2 Technical principles

The key to OLAP query for unstructured data sources is to support data query in the form of SQL syntax. Quick BI uses SQL syntax to create external tables within the OLAP engine, providing users with a custom way to map from unstructured data to structured storage. For structured data sources such as MySQL and Oracle, there is no need for additional metadata information, while for unstructured data sources, additional metadata information is needed. Metastore maintains the metadata information of all unstructured data sources, which reflects the mapping from unstructured data to structured data. Metadb contains three tables that define the metadata information that can be queried by SQL, as shown in the following figure:

Schems, Tables, and Columns respectively define the structure of the external table, and when the external table is created through SQL, the corresponding records are added to it. When querying unstructured data sources, read the corresponding records and parse the data.

Take a scenario as an example to further illustrate the process of unstructured query. Suppose the user stores business data on OSS in the form of a CSV file, whose name is iris.csv, and its content is as follows:

For this file, the user expects to query it with Quick BI. According to the user's configuration in the Quick BI data source page, a SQL statement is generated inside OLAP to create an external table:

SQL Parser parses the SQL, and AST Builder generates the corresponding AST, as shown below:

Notice that the AST contains all the information needed for the external table, including how to parse the csv file and map it to structured data so that the data can correspond to schema, column, and table in the SQL. Query Execution calls Metastore to store the information in Meatadb. At this point, the CSV file stored on OSS has been logically mapped into a table, as shown in the following figure:

Next, you can query it directly using SQL, for example, using the following SQL to filter out data records with a species of setosa

Similarly, for HBase data sources, the mapping rules are also defined in the form of external tables:

3.3 Summary

Quick BI currently supports query analysis of csv files on OSS. In the future, it will support query analysis of more file types on OSS and multi-dimensional analysis of HBase.

IV. Association analysis of mixed heterogeneous data sources

4.1 background

Quick BI users' analysis requirements for heterogeneous data sources, such as dimension tables in MySQL and fact tables in MaxCompute, need to analyze the association of tables between different types of data sources or between different databases of the same type of data sources.

4.2 Technical principles

In order to support cross-source association analysis between heterogeneous data sources, we first need a computing engine with data source Connector plug-in mechanism, which can be easily extended to support multiple data source type queries. We have selected a MPP memory computing engine with such characteristics from the open source computing engine in the industry, and after secondary development, we have formed a set of cross-source query engine suitable for Quick BI business scenarios.

When the cross-source query engine supports queries from heterogeneous data sources, it only needs to use the complete catalogName.dbName.tableName table name (e.g. odps.quickbi_test.company_sales_record) in the query SQL.

In Quick BI business, each data source configured by users is equipped with a unique identity dsId in the background, so it can be used as a catalogName, and the Catalog involved has already been loaded in the cross-source query engine. After that, when querying, the cross-source query engine parses the SQL, generates a logical plan and a physical plan, then loads the data to the specified data source through the configuration of Catalog, calculates it in memory and returns the final result.

Most multidimensional analysis is a single-source query, so under what circumstances will the cross-source query engine be enabled? it is necessary to realize the intelligent routing of queries from heterogeneous data sources. intelligent routing mainly determines whether it is a single-source query or a multi-source query to be routed to a single-source query engine or a cross-source query engine according to the data source information covered in the query model. The cross-source query process is shown in the following figure:

Note that you need to ensure that the catalogs involved in this query is dynamically registered with the CatalogServer of the cross-source query engine before sending the SQLText of this query to the cross-source query engine.

Example: heterogeneous query between MaxCompute and MySQL SQL:

Query results:

4.3 Summary

Heterogeneous data source analysis is transparent and imperceptible to users. Welcome to the Quick BI experience. Users only need to select data tables from different databases when data sets are associated, and they can easily experience the function of heterogeneous data source analysis. At present, it supports heterogeneous data source query among MaxCompute, MySQL and Oracle or heterogeneous data source cross-database query. In the future, heterogeneous queries for more types of data sources will be supported, such as Hive, SQLServer, PostgresSql and so on.

5. Unfinished to be continued

Quick BI will update iteratively and gradually support multi-dimensional analysis of more types of data sources, such as API data sources, etc.

Author: Saruo Yi Hou

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report