In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
When querying data, we sometimes encounter inventory reports with a large amount of data. The query conditions entered by the user are very broad, and tens of millions of rows or even hundreds of millions of records may be found from the database. If you wait for all these records to be retrieved before generating the report presentation, it will take a long time, the user experience is bad; and the report generally uses the in-memory computing mechanism, in most cases can not hold so much data. Therefore, we generally use paging rendering to render the first page as quickly as possible, and then we can turn the page and display it at will, displaying only one page at a time, without causing memory overflow.
So, how do general reporting tools or BI systems implement this mechanism?
The vast majority of products are made using database paging.
Specifically, it uses the syntax provided by the database to return records within a specified range of line numbers. The interface calculates the line number range according to the current page number (showing a fixed number of rows per page) as a parameter into the SQL, the database will only return the records of the current page, thus achieving the effect of paging presentation.
In doing so, there will be two problems:
1. The efficiency is poor when turning the page.
It is usually faster to render the first page in this way, but if you turn the page backwards, the SQL of the original number will be executed again and the records involved in the previous page will be skipped. Some databases do not have the OFFSET keyword, so they can only be skipped by the interface (removed and discarded). For example, ORACLE also needs to use a subquery to generate a sequence number to filter with the sequence number. These actions will be a waste of time, and the first few pages do not feel obvious, but if the page number is relatively large, there will be a sense of waiting.
two。 There may be data inconsistencies
In general, the SQL issued each time the number is fetched by page is independent. In this way, if there is another insert and delete action in the database between the two page fetch, the data fetched will be up-to-date and may not match the original page number. For example, after page 1 takes out 20 rows of records, before page 2, one row is deleted from the 20 rows of records on page 1, then the first line of page 2 will be the original row 22, and the original line 21 will fall to page 1. You have to turn the page back to see it. If you do summary statistics based on these data, there will be wrong results.
There is also a less commonly used method. Issue a fetch SQL to the database to generate a cursor, fetch a page from it and render it, but do not terminate the cursor, and continue to fetch the number when you fetch the next page. This method can overcome the above two problems, and there will be no inconsistencies, but most database cursors can only be fetched backward rather than backwards, so the interface can only turn the page backwards, which is difficult to explain to business users, so this method is rarely used.
It can also be a combination of the two methods, using the latter method when turning the page backwards, and once the page turning forward occurs, the fetch SQL is re-executed. This is a little better than the experience of counting each page, but it doesn't fundamentally solve the problem.
Is there any other good idea?
The fetch thread and the rendering thread are two asynchronous threads. After sending out the SQL, the fetch thread continuously fetches the data and caches it to the local storage. The rendering thread calculates the number of rows according to the number of pages to the local cache to get the data display. In this way, as long as the data that has been fetched can be presented quickly, there will be no sense of waiting, and it is normal and understandable that the unfetched data needs to wait; while the fetching thread only involves a sentence of SQL, which is the same transaction in the database, and there will be no inconsistency. In this way, both problems can be solved. However, this requires the design of a storage format that can randomly access records by line number, otherwise the records will be counted by traversing, and the response will still be slow.
When the current database system does not directly support this mechanism, it is only the report tools or BI systems that are involved in writing these programs. For users with large list report presentation requirements, it is necessary to carefully examine these function points.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.