In this article, the editor shares how to do paged queries when a report connects to Hive and the data volume is relatively large. I hope you gain something from reading it; let's discuss it together!
Hive provides a row_number mechanism similar to Oracle's ROWNUM, which can be used like this (it is not very efficient):
select * from (select row_number() over (order by create_time desc) as rownum, u.* from user u) mm where mm.rownum between 10 and 15
Alternatively, if the table has a unique, ordered identifier field, you can use that field together with limit. For example:
Get the first page of data:
Note: at the same time, record the largest id among these 10 rows as preId; it is the condition for fetching the next page.
select * from table order by id asc limit 10
Get the second page of data:
Note: again, save the largest id in this batch, replacing preId.
select * from table where id > preId order by id asc limit 10
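To make the page-by-page loop concrete, here is a minimal sketch in Python, assuming the pyhive client; the table name my_table, the connection details, and the helper name fetch_page are hypothetical, while the id column, the preId idea, and the page size of 10 follow the queries above.

from pyhive import hive  # assumed Hive client; any DB-API driver works the same way

PAGE_SIZE = 10

def fetch_page(cursor, pre_id=None):
    # Keyset pagination: the first page has no lower bound,
    # later pages only take rows whose id is greater than the last id seen.
    if pre_id is None:
        cursor.execute("select * from my_table order by id asc limit %d" % PAGE_SIZE)
    else:
        cursor.execute(
            "select * from my_table where id > %d order by id asc limit %d"
            % (pre_id, PAGE_SIZE)
        )
    return cursor.fetchall()

conn = hive.connect(host="hive-server", port=10000)  # hypothetical connection details
cur = conn.cursor()

pre_id = None
while True:
    rows = fetch_page(cur, pre_id)
    if not rows:
        break                  # no more data
    for row in rows:
        print(row)             # render / process the current page here
    pre_id = rows[-1][0]       # assumes id is the first column: it is the largest id, the new preId

Because each query is bounded by preId and limit, every page touches only a small slice of the table, which is the point of this approach.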
As for database paging in general, the problems with the existing approach and how a large-list report should be done have already been analyzed elsewhere, and some ideas for improvement are given there for reference:
The fetch thread and the rendering thread are two asynchronous threads. After the SQL is sent out, the fetch thread keeps pulling data and caching it locally; the rendering thread computes the row range from the page number and reads those rows from the local cache for display. This way, any data that has already been fetched can be shown immediately, so there is no sense of waiting, and it is understandable that data not yet fetched has to wait. Meanwhile, the fetch thread issues only a single SQL statement, which runs as one transaction in the database, so there is no inconsistency. Both problems are solved. However, this requires a storage format that can randomly access records by row number; otherwise records have to be located by traversal, and the response will still be slow.
Here is a diagram to illustrate the idea:
In the diagram, ② and ③ are the two threads: one is responsible for fetching data into the local cache, and the other for reading the cache and rendering the report.
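As a rough illustration of this two-thread design, here is a minimal sketch in Python; the names cache, fetch_thread, get_page, and my_table are assumptions for illustration, not an actual product implementation. One thread issues the single SQL statement and keeps buffering rows into a local in-memory cache, while the rendering side reads any page that is already cached by row number.

import threading

PAGE_SIZE = 10
cache = []                       # local row cache, filled in fetch order
cache_lock = threading.Lock()
fetch_done = threading.Event()

def fetch_thread(cursor):
    # ② Fetch thread: one SQL statement, one transaction; keep buffering rows locally.
    cursor.execute("select * from my_table order by id asc")
    while True:
        rows = cursor.fetchmany(1000)    # pull in batches
        if not rows:
            break
        with cache_lock:
            cache.extend(rows)
    fetch_done.set()

def get_page(page_no):
    # ③ Render side: compute the row range from the page number and read the cache.
    start = (page_no - 1) * PAGE_SIZE
    end = start + PAGE_SIZE
    with cache_lock:
        if len(cache) >= end or fetch_done.is_set():
            return cache[start:end]      # already fetched: display immediately
    return None                          # not fetched yet: this page has to wait

# Usage: start the fetch thread once, then have the UI poll get_page(n);
# pages already in the cache return at once, later pages return None until fetched.
# threading.Thread(target=fetch_thread, args=(cursor,), daemon=True).start()

Pages that fall within the fetched range display instantly, and because everything comes from the one statement the fetch thread issued, the data shown is consistent, which matches the behaviour described above.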
If this looks a little complicated, you can simply use a ready-made tool: see "the implementation of massive lists and grouped reports". It also supports exporting to Excel and printing.
After reading this article, I believe you have a certain understanding of how to do paged queries when a report connects to Hive and the data volume is large. If you want to learn more, please follow the industry information channel. Thank you for reading!