What is a big data report?

2025-07-06 Update, from SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

What is a big data report? This article answers the question in detail, in the hope of giving readers who face this problem a simpler and more practical way to solve it.

In real business systems some reports are relatively "large": the number of rows can reach tens of millions or even hundreds of millions. A report with this many rows is usually called a "large report". Most large reports are detail list reports; a few are grouped reports.

A query for a big data report usually does not fetch all the records at once and then hand them to the front end for display, because that takes a long time, gives a very poor user experience, and consumes too much memory on the report server.

The common approach is to present a large report page by page: fetch only a small part of the data at a time, hand it to the front end as soon as that part has been fetched, and fetch the data for the corresponding page whenever the page number changes. This speeds up presentation, and users hardly notice any waiting.

How can it be realized? There are several ways.

1. Database paging

The most common practice in the industry is database paging: using the syntax the database provides to return only the records within a specified range of row numbers. The interface computes the row range from the current page number (with a fixed number of rows per page) and passes it into the SQL as parameters, so the database returns only the records of the current page, which produces the paged presentation.

This relies mainly on the capabilities of the relational database itself, and the implementation differs from one database to another: Oracle can use rownum and MySQL can use limit. Detailed implementations are widely documented online and are not discussed here.

Does database paging have shortcomings? Any technology has its scope of application, and the problems of database paging come down to the following four points.
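As a minimal sketch of the idea, the example below computes the row range from a page number and passes it to a LIMIT/OFFSET query. It uses SQLite only because it ships with Python; the `orders` table, its columns, the page size of 20 and the helper names are all assumptions made for illustration.

```python
import sqlite3

PAGE_SIZE = 20  # fixed number of rows per page (assumed value)


def page_bounds(page_no, page_size=PAGE_SIZE):
    """Return (limit, offset) for a 1-based page number."""
    if page_no < 1:
        raise ValueError("page numbers start at 1")
    return page_size, (page_no - 1) * page_size


def fetch_page(conn, page_no, page_size=PAGE_SIZE):
    """Ask the database for the rows of one page only."""
    limit, offset = page_bounds(page_no, page_size)
    # ORDER BY matters: without it the row order, and so the content
    # of each page, is not guaranteed to be stable between queries.
    sql = "SELECT id, amount FROM orders ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (limit, offset)).fetchall()


def make_demo_db(n_rows=100):
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, i * 10) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


conn = make_demo_db()
page3 = fetch_page(conn, 3)  # rows 41..60 of the ordered result
```

Each call to `fetch_page` issues its own SQL statement, which is exactly the property the shortcomings discussed next come from.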

(1) Page turning is inefficient.

Rendering the first page this way is usually fast, but when the user moves to a later page the fetch SQL is executed again and the records belonging to the earlier pages must be skipped. For databases without an OFFSET keyword, the interface has to skip the data itself (fetch it and discard it); Oracle, for example, needs a subquery to generate a sequence number before rows can be filtered by that number. These steps cost time. The effect is not noticeable on the first few pages, but at large page numbers the user starts to feel the wait.

(2) Data inconsistencies may occur.

With this kind of paging, a separate SQL statement is issued for each page. If rows are inserted or deleted between two page fetches, the rows returned reflect the latest data and may no longer line up with the original page numbers. For example, with 20 rows per page, suppose one of the 20 records on page 1 is deleted before the user turns to page 2. The first row of page 2 that the user then gets is actually row 22 from before the delete, while the original row 21 has moved onto page 1, and the user has to go back to page 1 to see it. If summary statistics are computed from the fetched data, the results will be wrong and inconsistent.
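The 20-rows-per-page example can be reproduced in a few lines. This is a sketch on an in-memory SQLite table; the `orders` table and the choice of deleting id 5 are assumptions made for illustration.

```python
import sqlite3

PAGE_SIZE = 20

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 101)])


def fetch_page(page_no):
    """Each page is fetched by its own, independent SQL statement."""
    offset = (page_no - 1) * PAGE_SIZE
    rows = conn.execute(
        "SELECT id FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    )
    return [r[0] for r in rows]


page1 = fetch_page(1)  # ids 1..20 as the user first sees them

# Between the two page requests, a row that was shown on page 1 is deleted.
conn.execute("DELETE FROM orders WHERE id = 5")

page2 = fetch_page(2)  # OFFSET 20 now skips ids 1..4 and 6..21

# Row 21 slid back onto page 1, so the user who only pages onward never sees it.
seen = set(page1) | set(page2)
row_21_missed = 21 not in seen
```

Here `page2` starts at id 22 and `row_21_missed` is true: the row exists in the table but appears on neither page the user looked at.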

To overcome these two problems, another approach is sometimes used: fetch from the database with a SQL cursor, render a page once it has been fetched without closing the cursor, and continue fetching from the same cursor when the user moves to the next page. This effectively overcomes both problems: page turning is efficient and there is no inconsistency. However, the vast majority of database cursors can only be read in one direction, from front to back, so the interface can only move to later pages. That is hard to explain to business users, so this method is rarely used.

Of course, the two methods can be combined: use the cursor when moving to later pages, and re-execute the fetch SQL once the user needs to go back to an earlier page. This gives a better experience than re-fetching for every page, but it does not fundamentally solve the problem.
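The combined approach might look like the sketch below: moving to later pages keeps reading from one open cursor, and moving to an earlier page re-runs the query and skips ahead. The `CursorPager` class and the `orders` table are hypothetical names. Whether an open cursor sees a stable snapshot of the data depends on the database and its isolation level, so SQLite here only illustrates the control flow.

```python
import sqlite3


class CursorPager:
    """Later pages come from one open cursor; going back re-runs the SQL."""

    def __init__(self, conn, sql, page_size=20):
        self.conn = conn
        self.sql = sql
        self.page_size = page_size
        self.cursor = conn.execute(sql)
        self.page_no = 0  # number of the last page read from the cursor

    def next_page(self):
        """Read the next page from the open cursor (cheap, no new query)."""
        rows = self.cursor.fetchmany(self.page_size)
        if rows:
            self.page_no += 1
        return rows

    def goto_page(self, page_no):
        """Return the rows of a 1-based page, or [] past the end."""
        if page_no <= self.page_no:
            # The cursor cannot rewind, so an earlier page means
            # executing the query again and skipping from the start.
            self.cursor.close()
            self.cursor = self.conn.execute(self.sql)
            self.page_no = 0
        rows = []
        while self.page_no < page_no:
            rows = self.next_page()
            if not rows:
                break  # ran out of data before reaching the page
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 51)])

pager = CursorPager(conn, "SELECT id FROM orders ORDER BY id")
first = pager.goto_page(1)   # read from the open cursor
second = pager.goto_page(2)  # continues from the same cursor
again = pager.goto_page(1)   # re-executes the SQL
third = pager.goto_page(3)   # skips page 2, then reads the last 10 rows
```

Going back to page 1 costs a full re-execution, which is why this combination improves the experience without removing the underlying problem.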

(3) Grouped reports cannot be implemented.

Besides list reports, we sometimes have to present grouped reports with a large amount of data, which contain group headers, group summaries and group detail rows. Reading a fixed number of records (one or more pages) at a time according to the page number gives no guarantee that a complete group is read in one fetch, yet group rendering and group summarization must operate on the whole group of data; otherwise the results will be wrong.

(4) Other data sources cannot be used.

Database paging is based on the capabilities of the relational database itself, so it does not work for non-relational data sources. Just imagine: how would a NoSQL store page with SQL, and how would a text file?

We have discussed the diversity of report data sources before. Today, with the rapid growth of data volumes, it is very common to produce reports from non-relational databases.

Is there a better way than database paging?

2. Hard coding implementation

Database-based paging depends strongly on the data source and cannot serve other data source types; in other words, it is tightly coupled to the database. We need an implementation that is loosely coupled to the data source, one that handles diverse data sources and solves the problems of efficiency, accuracy and grouped presentation at the same time.

Following the usual way of solving such problems, this goal can be achieved by writing code, roughly as follows:

When the data source is a database

Fetching and rendering run as two asynchronous threads. After sending the SQL, the fetch thread keeps fetching data and caching it in local storage. The rendering thread computes the row range from the page number and reads those rows from the local cache for display. This way, data that has already been fetched can be presented quickly with no sense of waiting, and it is normal and understandable that data not yet fetched requires a wait. The fetch thread issues only one SQL statement, which runs as a single transaction in the database, so there is no inconsistency. Both problems are thus solved. However, this requires designing a storage format that supports random access to records by row number; otherwise the records have to be counted by traversal and the response will still be slow.

When the data source is not a database

The same two asynchronous threads for fetching and rendering are used. The fetch thread reads data in batches through a file cursor (or the batch fetch interface that another data source provides) and caches it locally; the rendering phase is exactly the same as above.

The open interface of the report tool is then used to interact with the front-end report and complete the paged presentation.

When making a grouped report

A complete group has to be fetched at once and summarized in code. The summary result is inserted into the result set and returned to the front-end report for presentation, and a flag is set on the group records so that the front-end report can render group rows with a different style (such as bold or red).
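For a text file, batch reading through a file cursor can be sketched as a generator that yields one batch at a time. The CSV layout, the batch size and the `read_batches` name are assumptions; each yielded batch would then be appended to the same kind of local cache a database fetch thread would fill.

```python
import csv
import itertools
import os
import tempfile


def read_batches(path, batch_size=1000):
    """Yield lists of rows; the open file object acts as the cursor."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header line if there is one
        while True:
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                return
            yield batch


# Build a small demo file (hypothetical data) and read it back in batches.
tmp_path = os.path.join(tempfile.mkdtemp(), "orders.csv")
with open(tmp_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "amount"])
    writer.writerows([i, i * 10] for i in range(1, 2501))

batch_sizes = [len(batch) for batch in read_batches(tmp_path)]
```

Only one batch is held in memory at a time, so the file can be far larger than the memory available to the report server.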
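A small sketch of the summarize-and-flag step follows. It assumes the rows arrive already sorted by the grouping field, and the field names `region`, `amount` and `is_group` are made up for illustration; placing the summary row before its detail rows is also just one possible layout.

```python
from itertools import groupby
from operator import itemgetter


def with_group_summaries(rows, key="region", value="amount"):
    """Insert one flagged summary row ahead of each complete group.

    `rows` must already be sorted by `key`, so that every group is
    contiguous and can be summarized as a whole.
    """
    out = []
    for group_key, members in groupby(rows, key=itemgetter(key)):
        members = list(members)  # the complete group, read at once
        total = sum(m[value] for m in members)
        # The flag lets the front end style group rows differently.
        out.append({key: group_key, value: total, "is_group": True})
        out.extend({**m, "is_group": False} for m in members)
    return out


detail_rows = [
    {"region": "East", "amount": 10},
    {"region": "East", "amount": 5},
    {"region": "West", "amount": 7},
]
report_rows = with_group_summaries(detail_rows)
```

The front-end report can then check `is_group` on each row to decide whether to render it in bold or red.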

The effect after implementation is roughly as follows: the fetch thread keeps filling the cache, the rendering thread reads data from the cache, and the total number of pages keeps changing as more data arrives.

From the above description we can see that hard coding works but is very complex. Besides multithreaded programming, we also have to consider the storage format of the cached data, file cursors, group reading and summarization, and, when rendering through another report tool, whether that tool exposes flexible enough interfaces. All of these are challenges.

And so far we have only considered rendering. What if we want to export to Excel? What if we also have to print? After all, if a report can be queried, it should also be exportable and printable. These demands are real in the financial and manufacturing industries.

3. Use report tools that support large reports

If the front end uses a report tool to develop reports, the more direct choice is a report tool that natively supports presenting, exporting and printing large reports. Such a tool implements the two-asynchronous-thread mechanism for large reports and solves the problems mentioned above, with everything encapsulated and ready to use.
