
Implementation of cross-database multi-dimensional analysis background

2025-01-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Problem statement

The backend data of a multidimensional analysis (BI) system often comes from multiple databases, which raises the problem of fetching data across databases.

For example, for performance and cost reasons the capacity of the production database is often limited, so historical data is stored in a separate database, and ETL periodically synchronizes newly generated data from the production database to the history database. The synchronization cycle may be a day, a week, or a month, depending on the volume of data generated. If the multidimensional analysis system fetches data only from the history database, users can analyze only historical data, that is, T+1, T+7, or T+30 multidimensional analysis. To achieve real-time T+0 analysis, data must be fetched from both the production database and the history database, computed, and the results merged. In many cases the production database and the history database are also heterogeneous, which makes direct cross-database mixed computation difficult.
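The routing decision described above can be sketched in a few lines of Python. This is an illustrative model only, not the aggregator's mechanism; the function and source names are invented for the example:

```python
from datetime import date

def pick_sources(start: date, end: date, today: date) -> list[str]:
    """Decide which databases a query over [start, end] must touch.

    The history database holds everything up to the last ETL run; the
    production database holds today's not-yet-synchronized rows. When
    the requested range includes the current day (the T+0 case), both
    sources are needed and their results must be merged.
    """
    sources = []
    if start <= today:
        sources.append("history")      # data already synchronized by ETL
    if start <= today <= end:
        sources.append("production")   # today's rows, not yet synced
    return sources

# A range ending today needs both databases (T+0 analysis):
print(pick_sources(date(2015, 7, 1), date(2015, 7, 18), date(2015, 7, 18)))
# A purely historical range needs only the history database:
print(pick_sources(date(2011, 7, 18), date(2015, 7, 17), date(2015, 7, 18)))
```
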

Even outside the T+0 scenario, a large volume of historical data may be stored across multiple, possibly heterogeneous, databases. The multidimensional analysis system then likewise needs to fetch data from several databases, compute, and merge the results.

Solution approach and process

As a data computing middleware (DCM), building a data-proxy front layer is an important application mode of the aggregator. The aggregator has a programmable gateway mechanism that can connect to multiple databases at the same time, compute across them, and submit the results to the front end for presentation.

Case scenario description

In the following case, the multidimensional analysis system performs self-service analysis of order data. For simplicity, we use the following simulated environment:

- The front end of the multidimensional analysis system is simulated by jdbc.jsp on the Tomcat server. Tomcat is installed in C:\tomcat6 on the Windows operating system.

- The aggregator JDBC driver is integrated into the multidimensional analysis application. jdbc.jsp imitates the application by generating SQL that conforms to the aggregator specification and submitting it through the aggregator JDBC to an aggregator SPL script for processing.

Part of the multidimensional analysis system's data comes from the ORDERS table in the production database demo (Oracle), and the other part comes from the history database test (MySQL). The current day's data is fetched from the production database, enabling real-time analysis.

The ETL process synchronizes the latest day's data to the history database every day, keyed on the order date ORDERDATE; assume the current date is 2015-07-18. The multidimensional analysis queries supply start and end values for ORDERDATE.

The case includes one production database and one history database. The aggregator also supports one production database with multiple history databases, or multiple history databases with no production database at all.

Background data initialization preparation

Use the following SQL file to create the ORDERS table and initialize its data in the Oracle database.

The configuration file goes in the classes directory, and the license file obtained from the official website should also be placed there. The aggregator's jar packages go in the lib directory (refer to the aggregator tutorial for which jars are needed).

Modify the home directory configuration in raqsoftConfig.xml:

C:\tomcat6\webapps\CrossDB\WEB-INF\dfx

Runtime,Server

CrossDB.dfx

2. Edit jdbc.jsp in the CrossDB directory; it simulates the front-end interface submitting SQL and displaying the result.

In the jsp, connect to the aggregator's JDBC and then submit the SQL for execution. The steps are exactly the same as with an ordinary database, giving high compatibility and generality. For multidimensional analysis tools, connecting via JDBC and submitting SQL is done through the user interface, but the underlying principle is exactly the same as in the jsp.

3. Open CrossDB.dfx in the dfx directory and study the SPL code.

The input parameter is sql, for example:

select top 10 ORDERID,CUSTOMERID,EMPLOYEEID,ORDERDATE,AMOUNT from ORDERS where ORDERDATE between date('2011-07-18') and date('2015-07-18') and AMOUNT>100

The SPL script is as follows:

A1: =sql.sqlparse@aw()
B1: =A1.pselect("ORDERDATE between*")
A2: =substr(A1(B1),"date(")
B2: =substr(A1(B1+1),"date(")
A3: =mid(A2,2,10)
B3: =mid(B2,2,10)
A4: if between(date(now()),date(A3):date(B3))
B5: =connect("orcl")
C5: =B5.cursor@dx("select ORDERID,CUSTOMERID,EMPLOYEEID,ORDERDATE,AMOUNT from orders where orderdate=?",date(now()))
B6: =connect("mysql")
C6: =B6.cursor@x("select ORDERID,CUSTOMERID,EMPLOYEEID,ORDERDATE,AMOUNT from orders")
B7: =mcs=[C5,C6].mcursor()
C7: =connect().cursor@x("with orders as (mcs) "+sql)
B8: return C7
A9: else
B9: =connect("mysql")
C9: =B9.cursor@x(sql.sqltranslate("MYSQL"))
C10: return C9

Description:

A1: parses the SQL, takes the where clause, and splits it into a sequence of conditions.

B1, A2, B2: locate the ORDERDATE condition in the A1 sequence and extract the start and end date values.

A3, B3: trim the extracted values down to the date strings.

A4: determine whether the query range contains the current date.

B5-C6: if it does, connect to the production database and the history database and create a fetch cursor on each.

B7: build a multipath cursor from the production and history cursors.

C7, B8: execute the SQL against the multipath cursor and return the result.

A9-C10: if the range does not include the current date, connect only to the history database, translate the SQL into MySQL-compliant syntax, execute it to get a cursor, and return it.

In actual business it is usually necessary to keep some historical data in the production database, so the production and history databases will contain duplicate data, and the code must add a date condition on the production database. With multiple history databases, there is generally no duplicate data between them, and the code can be simplified. For example, assuming the Oracle and MySQL databases in the example contain no duplicate data, CrossDB.dfx can be simplified as follows:

A1: =connect("orcl")
B1: =A1.cursor@dx("select ORDERID,CUSTOMERID,EMPLOYEEID,ORDERDATE,AMOUNT from orders")
A2: =connect("mysql")
B2: =A2.cursor@x("select ORDERID,CUSTOMERID,EMPLOYEEID,ORDERDATE,AMOUNT from orders")
A3: =mcs=[B1,B2].mcursor()
B3: =connect().cursor@x("with orders as (mcs) "+sql)
A4: return B3

4. Start tomcat, access http://localhost:8080/CrossDB/jdbc.jsp in the browser, and view the results.

In the results, the order-date format differs slightly between the two sources; this only needs a display-format setting in the front end of the multidimensional analysis.

We can also continue to test the following situations:

1. Only query the history database

sql = "select top 10 ORDERID,CUSTOMERID,EMPLOYEEID,ORDERDATE,AMOUNT from ORDERS where ORDERDATE between date('2011-07-18') and date('2015-07-18') and AMOUNT>100"

2. Group query

sql = "select CUSTOMERID,EMPLOYEEID,sum(AMOUNT) S,count(1) C from ORDERS where ORDERDATE between date('2011-07-18') and date('2015-07-18') group by CUSTOMERID,EMPLOYEEID"
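A group query over a multipath cursor works because SUM and COUNT decompose over disjoint row sets: each database can aggregate its own rows and the partial results can then be combined. A minimal Python sketch of that combination step (the data and function name are illustrative, not part of any real API):

```python
from collections import defaultdict

def merge_group_results(*partials):
    """Combine per-database GROUP BY results. Each partial maps
    (CUSTOMERID, EMPLOYEEID) -> (sum_amount, count); merging simply adds
    the sums and the counts, which is valid because SUM and COUNT are
    distributive over disjoint row sets. (AVG is not: it must be carried
    as a SUM and a COUNT, as here, and divided only at the end.)"""
    total = defaultdict(lambda: (0.0, 0))
    for part in partials:
        for key, (s, c) in part.items():
            ts, tc = total[key]
            total[key] = (ts + s, tc + c)
    return dict(total)

# Partial aggregates from the production and history databases:
prod_part = {("C1", "E1"): (250.0, 1)}
hist_part = {("C1", "E1"): (200.0, 2), ("C2", "E1"): (80.0, 1)}
print(merge_group_results(prod_part, hist_part))
```
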

ETL process

In this case, the aggregator SPL script can also take on the work of ETL.

After the multidimensional analysis system is online, the latest data of the day should be synchronized regularly every night. Let's assume that the date of the day is 2015-07-18.

The SPL script etl.dfx adds current-day data increments to the history database, as shown below:

A1: =connect("orcl")
A2: =A1.cursor@xd("select ORDERDATE,CUSTOMERID,EMPLOYEEID,ORDERID,AMOUNT from ORDERS where ORDERDATE=?",etlDate)
A3: =connect("mysql")
A4: =A3.update@i(A2,ORDERS,ORDERDATE,CUSTOMERID,EMPLOYEEID,ORDERID,AMOUNT)
A5: >A3.close()

The input parameter for etl.dfx is etlDate, which is the date of the day that needs to be added.

The etl.dfx script can be executed from the Windows or Linux command line and run regularly with scheduled tasks, or invoked from an ETL tool's scheduler.

On Windows, it can be invoked from the command line as:

C:\Program Files\raqsoft\esProc\bin>esprocx.exe C:\etl.dfx 2015-07-18

The linux command is:

/raqsoft/esProc/bin/esprocx.sh /esproc/etl.dfx 2015-07-18
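The incremental synchronization performed by etl.dfx can be sketched in Python, again using in-memory SQLite databases as stand-ins for Oracle and MySQL. The sketch is an assumption-laden illustration, not the aggregator's mechanism: SQLite's `insert or ignore` plays the role of update@i with ORDERID as the key, so re-running the job for the same day does not duplicate rows:

```python
import sqlite3

DDL = "create table ORDERS(ORDERID integer primary key, ORDERDATE text, AMOUNT real)"

def etl_day(prod, hist, etl_date: str) -> int:
    """Copy one day's orders from the production DB to the history DB.
    The primary key on ORDERID makes the copy idempotent: rows already
    present in the history database are skipped, not duplicated."""
    rows = prod.execute(
        "select ORDERID, ORDERDATE, AMOUNT from ORDERS where ORDERDATE = ?",
        (etl_date,)).fetchall()
    hist.executemany("insert or ignore into ORDERS values (?,?,?)", rows)
    hist.commit()
    return len(rows)

# Stand-in production database with yesterday's and today's orders:
prod = sqlite3.connect(":memory:")
prod.execute(DDL)
prod.executemany("insert into ORDERS values (?,?,?)",
                 [(1, "2015-07-17", 80.0), (2, "2015-07-18", 250.0)])
# Empty stand-in history database:
hist = sqlite3.connect(":memory:")
hist.execute(DDL)

etl_day(prod, hist, "2015-07-18")   # first run copies the day's rows
etl_day(prod, hist, "2015-07-18")   # rerun is a no-op thanks to the key
print(hist.execute("select count(*) from ORDERS").fetchone()[0])
```
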

Application and promotion

As a data computing middleware (DCM), the backend data source provided by the aggregator can support a variety of front-end applications: not only multidimensional analysis, but also scenarios such as large-screen displays, management cockpits, real-time reports, big-data list reports, batch report subscription, and so on.

In addition, the backend data source built with the aggregator can also cache data. The data computing gateway and routing implemented by the aggregator can then switch intelligently between the aggregator's cached data and the data warehouse, meeting performance requirements the data warehouse alone cannot, such as the common scenario of computing hot and cold data separately. (For more information, see "The aggregator implements computational routing to optimize BI backend performance".)

In other applications, the aggregator can be completely separated from the database and act as a lightweight multidimensional analysis backend, equivalent to an independent small or medium-sized data warehouse or data mart. (For more information, see "The aggregator implements a lightweight multidimensional analysis backend".)
