

What is external data computing for big-data reports?

2025-04-10 Update · From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report --

What is external data computing for big-data reports? This article analyzes the question in detail and presents a simple, feasible solution for readers facing the same problem.

In report applications, reports that query historical data account for a large proportion. Such reports share two characteristics: first, the data rarely changes, since historical records are essentially immutable once written; second, the data volume is large and keeps growing over time. If this historical data stays in the database, report performance degrades sharply under large data volumes or high concurrency, because the JDBC performance of most databases is very poor (object conversion during fetching is an order of magnitude slower than reading the same data from a file). Clearly, if we move this cold historical data out of the database and store it in the file system, we can obtain far higher IO performance than the database provides and thus improve overall report performance.
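The export step described above can be sketched in plain Python (this is an illustrative analogue, not the aggregator's own API; the table name, cutoff date, and file name are assumptions): historical rows are copied out of the operational database into a flat file and then deleted, so history queries no longer touch the database.

```python
import csv
import sqlite3

# Demo database with a mix of historical and recent orders (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, orderdate TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2019-03-01", 120.0),   # historical
    (2, "2019-07-15", 80.5),    # historical
    (3, "2024-01-10", 42.0),    # recent: stays in the database
])

CUTOFF = "2020-01-01"  # everything before this date counts as history

# Export the cold rows to a file, then remove them from the database.
with open("orders_hist.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "orderdate", "amount"])
    for row in conn.execute(
            "SELECT id, orderdate, amount FROM orders WHERE orderdate < ?",
            (CUTOFF,)):
        w.writerow(row)
conn.execute("DELETE FROM orders WHERE orderdate < ?", (CUTOFF,))
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining)  # only the one recent row is left in the database
```

In production the same idea would run as a periodic job, moving any rows older than the cutoff into the file store.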

However, a report cannot use raw data directly; the data must be computed (filtered, summarized, and so on) before rendering, and a file by itself has no computing ability, so it cannot provide the results the report needs. In addition, the data stored in files is usually very large, making it hard to compute efficiently on the report presentation side alone.

For reports, this requirement can be met with the tool's built-in computing engine (the aggregator); the approach is called out-of-database file computing, or external data computing. Supported file types include text, Excel, and JSON, as well as a more efficient binary format.

Through external data computing, high-volume historical data can be separated from the database. This not only meets the performance requirements of historical-query reports but also enables mixed-source computing (file + database), and in turn real-time queries over large data sets: a large amount of historical data is read from the file system while a small amount of real-time data is read from the database, and the two are computed together. On one hand, this avoids the database's IO bottleneck, quickly improves report performance, and widens the range of data that can be queried; on the other hand, with historical data removed, the database can focus on guaranteeing the consistency of business-system data instead of spending resources on heavy historical-query workloads, which is itself a form of database optimization.
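A minimal sketch of such a mixed-source query in plain Python (again an analogue, not the aggregator's API; file name, schema, and customer names are invented for the demo): per-customer totals are accumulated from the historical file first, then topped up from the small real-time table in the database.

```python
import csv
import sqlite3
from collections import defaultdict

# Historical part: a flat file exported earlier (assumed layout).
with open("hist.csv", "w", newline="") as f:
    csv.writer(f).writerows([["customer", "amount"],
                             ["Alice", "100"], ["Bob", "50"], ["Alice", "25"]])

# Real-time part: a small set of current orders still in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Alice", 10.0), ("Carol", 5.0)])

totals = defaultdict(float)
with open("hist.csv", newline="") as f:
    for rec in csv.DictReader(f):            # bulk history: file IO only
        totals[rec["customer"]] += float(rec["amount"])
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals[customer] += amount               # small real-time part: database

print(sorted(totals.items()))
# [('Alice', 135.0), ('Bob', 50.0), ('Carol', 5.0)]
```

The report sees one combined result set, while the database only ever serves the small real-time slice.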

The following example illustrates the steps to implement external data computing (here carried out with the aggregator):

1. Export the historical data from the database to a file

Users can export historical data to a file in whatever way is convenient; this step can also be done with the aggregator, for example by exporting the data as text. For higher performance, a binary file format can be used instead, which reads 2-5 times faster than text. The following aggregator code converts a text file to the binary format:

file("e:/orderdetail.b").export@b(file("e:/orderdetail.txt").cursor())
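Why the binary format is faster can be sketched in plain Python (an analogue, not the aggregator's format; file names and the sample rows are assumptions): text must be parsed field by field on every read, while a binary dump stores already-typed records that load back directly.

```python
import csv
import pickle

# A small text detail file (assumed columns: product id, quantity, amount).
rows = [("P001", 3, 19.9), ("P002", 1, 5.0)]
with open("detail.txt", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# One-time conversion: parse the text once, save typed records in binary.
with open("detail.txt", newline="") as src, open("detail.b", "wb") as dst:
    typed = [(pid, int(qty), float(amt)) for pid, qty, amt in csv.reader(src)]
    pickle.dump(typed, dst)

# Later reads skip string parsing entirely: records come back typed.
with open("detail.b", "rb") as f:
    print(pickle.load(f)[0])  # ('P001', 3, 19.9)
```

The aggregator's binary format works on the same principle but also supports cursor-style streaming over files too large for memory, which this pickle sketch does not.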

2. Use the built-in computing engine to read the data file

Once the data is external, the reporting tool can use the file as a data source when designing the report, for example counting the number of orders and the order amount per customer from the order detail data. Because the raw order data is very large, the file is read as a stream (a file cursor), step by step.

The parameters used in the script are the report's query conditions: country, area, and city restrict the shipper's location, while begin and end bound the order date range.

Script:

A1	=file("e:/orderdetail.b").cursor@t()
A2	=A1.select(shipperCountry==country && shipperArea==area && shipperCity==city && orderDate>=begin && orderDate<=end)
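The cursor-style computation above can be sketched in plain Python (an illustrative analogue of the script, not the engine itself; field names, parameter values, and the sample file are assumptions): the detail file is read one record at a time rather than loaded into memory, matching rows are kept, and order counts and amounts are aggregated per customer.

```python
import csv
from collections import defaultdict

# A small stand-in for the exported order detail file (assumed columns).
with open("orderdetail.txt", "w", newline="") as f:
    csv.writer(f).writerows([
        ["customer", "country", "orderdate", "amount"],
        ["Alice", "US", "2019-02-01", "100"],
        ["Bob",   "US", "2019-05-01", "40"],
        ["Alice", "US", "2019-06-01", "60"],
        ["Carol", "FR", "2019-03-01", "70"],   # wrong country: filtered out
    ])

# Report parameters (hypothetical values).
country, begin, end = "US", "2019-01-01", "2019-12-31"

orders = defaultdict(int)     # per-customer order count
amounts = defaultdict(float)  # per-customer order amount
with open("orderdetail.txt", newline="") as f:
    for rec in csv.DictReader(f):        # streamed row by row, like a cursor
        if rec["country"] == country and begin <= rec["orderdate"] <= end:
            orders[rec["customer"]] += 1
            amounts[rec["customer"]] += float(rec["amount"])

print(sorted((c, orders[c], amounts[c]) for c in orders))
# [('Alice', 2, 160.0), ('Bob', 1, 40.0)]
```

Because only one record is held in memory at a time, the same loop works no matter how large the historical file grows, which is exactly the point of the file-cursor approach.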


© 2024 shulou.com SLNews company. All rights reserved.
