2025-01-19 Update From: SLTechnology News&Howtos
This article looks at how Elasticsearch (ES) solves the deep-paging problem, with a worked example of the Scroll query API. The content is fairly detailed; interested readers may find it a useful reference.
1. Preface:
ES's normal paged query has a deep-paging limit, which defaults to 10,000 documents (the deeper the page, the more memory the query consumes). So if you need to read past the first 10,000 documents, you must either narrow the scope of the query or use a different mechanism.
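To see why the limit exists, here is a toy calculation (plain Java, no ES code involved) of what a deep from + size query costs on a sharded index: every shard must return its own top (from + size) hits, and the coordinating node merge-sorts all of them before discarding everything but the requested page.

```java
// Toy illustration of the from + size coordination cost.
// Assumptions: one query, `shards` primary shards, default ES behavior
// of each shard returning its top (from + size) documents.
public class DeepPagingCost {

    /** Documents each shard must score and return for one query. */
    static long perShardWindow(long from, long size) {
        return from + size;
    }

    /** Documents the coordinating node must merge-sort before replying. */
    static long coordinatorWindow(int shards, long from, long size) {
        return (long) shards * (from + size);
    }

    public static void main(String[] args) {
        // Page 1 is cheap: from = 0, size = 10, 3 shards -> sort 30 docs.
        System.out.println(coordinatorWindow(3, 0, 10));          // 30
        // A deep page explodes: from = 1_000_000, size = 10, 3 shards
        // -> each shard returns 1_000_010 docs, coordinator sorts 3_000_030.
        System.out.println(coordinatorWindow(3, 1_000_000, 10));  // 3000030
    }
}
```

The cost grows linearly with `from` on every shard at once, which is why ES caps the window at 10,000 by default rather than letting a single deep page consume that much memory.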
For this, ES provides scroll queries. The first query returns a scroll_id, and each subsequent query passes that scroll_id to fetch the next batch. A scroll effectively holds a snapshot of the current index segments as they were when the scroll query was first executed: any document indexed after that point will not be visible inside the snapshot. Unlike from + size, which computes all preceding hits and throws away the unwanted ones, a scroll records a read position so the next read can continue quickly from where it left off. The trade-off is that a scroll cannot be custom-sorted; results come back in the default document order, which makes scroll best suited to scanning or exporting all the data.
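The two properties described above (a frozen snapshot plus a recorded read position) can be modeled in a few lines of plain Java. This is a toy in-memory simulation, not ES client code; the class and method names are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of scroll semantics: the first call takes a snapshot of the
// index, and each subsequent call only advances a read position inside
// that snapshot, so documents indexed later are never returned.
public class ScrollModel {
    private final List<String> snapshot;  // frozen view at scroll start
    private int position = 0;             // recorded read location

    ScrollModel(List<String> liveIndex) {
        this.snapshot = new ArrayList<>(liveIndex); // the copy is the "snapshot"
    }

    /** Return the next `size` documents, like one scroll round-trip. */
    List<String> next(int size) {
        int end = Math.min(position + size, snapshot.size());
        List<String> page = snapshot.subList(position, end);
        position = end;
        return new ArrayList<>(page);
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(List.of("doc1", "doc2", "doc3"));
        ScrollModel scroll = new ScrollModel(index);

        index.add("doc4"); // indexed AFTER the scroll started

        System.out.println(scroll.next(2)); // [doc1, doc2]
        System.out.println(scroll.next(2)); // [doc3]  -- doc4 is invisible
        System.out.println(scroll.next(2)); // []      -- empty page: stop
    }
}
```

The empty page at the end is exactly the break condition used by the real DEMO below: when a scroll round-trip returns zero hits, the export is finished.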
2. A DEMO

```java
private static void scrollSearch(String indexName, String typeName, String... ids) {
    IdsQueryBuilder qb = QueryBuilders.idsQuery().addIds(ids);
    SearchResponse sResponse = client.prepareSearch(indexName)
            .setTypes(typeName)
            .setQuery(qb)
            .setScroll(new TimeValue(5000))   // keep the scroll context alive
            .setSize(50)                      // batch size per scroll round-trip
            .execute().actionGet();

    int tShards = sResponse.getTotalShards();
    long timeCost = sResponse.getTookInMillis();
    int sShards = sResponse.getSuccessfulShards();
    System.out.println(tShards + "," + timeCost + "," + sShards);

    while (true) {
        SearchHits hits = sResponse.getHits();
        SearchHit[] hitArray = hits.getHits();
        for (int i = 0; i < hitArray.length; i++) {
            SearchHit hit = hitArray[i];
            Map<String, Object> fields = hit.getSource();
            for (String key : fields.keySet()) {
                System.out.println(key);
                System.out.println(fields.get(key));
            }
        }
        // Fetch the next batch using the scroll_id from the previous response.
        sResponse = client.prepareSearchScroll(sResponse.getScrollId())
                .setScroll(new TimeValue(5000))
                .execute().actionGet();
        // Break condition: no hits are returned
        if (sResponse.getHits().getHits().length == 0) {
            break;
        }
    }
}
```

3. A self-implemented basic API

```java
// Interface definition
/**
 * Scroll cursor query over the full data set
 * (sorting is not supported; results come back in doc_id order)
 *
 * @param clazz    entity class type
 * @param query    query parameters
 * @param scrollId cursor
 * @param size     how many documents to fetch per call
 * @return one batch of data plus the cursor for the next call
 */
<T> EsScrollResponse<T> listByQueryScroll(Class<T> clazz, IQuery query, String scrollId, int size);

// Concrete implementation
@Override
public <T> EsScrollResponse<T> listByQueryScroll(Class<T> clazz, IQuery query, String scrollId, int size) {
    // Parameter validation
    if (size < 1 || size > 200) {
        throw new RuntimeException("ES query size out of range, please keep it between 1 and 200");
    }
    // Initialize the response
    EsScrollResponse<T> response = new EsScrollResponse<>();
    List<T> result = new ArrayList<>();
    // Read index metadata from the entity's @Document annotation
    Document document = clazz.getAnnotation(Document.class);
    SearchResponse searchResponse;
    if (StringUtils.isEmpty(scrollId)) {
        // First query: no cursor yet -- fetch data and record the cursor
        SearchRequestBuilder searchRequestBuilder = esDataSource.getClient()
                .prepareSearch(document.indexName())
                .setTypes(document.type())
                .setQuery(query.buildQuery())
                .setScroll(TimeValue.timeValueMinutes(ES_TIME_OUT_MINUTES))
                .setSize(size);
        searchResponse = searchRequestBuilder
                .setTimeout(TimeValue.timeValueMinutes(ES_TIME_OUT_MINUTES))
                .execute().actionGet();
    } else {
        // Subsequent queries: continue from the cursor
        SearchScrollRequestBuilder searchScrollRequestBuilder = esDataSource.getClient()
                .prepareSearchScroll(scrollId)
                .setScroll(TimeValue.timeValueMinutes(ES_TIME_OUT_MINUTES));
        searchResponse = searchScrollRequestBuilder.execute().actionGet();
    }
    // Assemble the results
    for (SearchHit hit : searchResponse.getHits()) {
        T rt = JSON.parseObject(hit.getSourceAsString(), clazz);
        result.add(rt);
    }
    response.setData(result);
    response.setScrollId(searchResponse.getScrollId());
    return response;
}

// The response entity wrapped around an ES scroll query
@Data
public class EsScrollResponse<T> {
    /** the data for this batch */
    private List<T> data;
    /** the cursor */
    private String scrollId;
}
```

4. Comparison of the two paging methods
Paging mode: from + size
Description: The most commonly used paging method: you specify the page size and the offset and obtain the required data directly. But memory consumption is very high and speed only average. With from = 1,000,000 and size = 10, each shard has to return its top 1,000,010 documents; with 3 shards, the coordinating node must pull in and sort about 3,000,030 documents just to return the top 10. This is why ES caps from + size at 10,000 by default: once the data volume reaches the hundreds of thousands or millions, this paging method is clearly unreasonable.
Advantages: The most convenient option for small data sets; flexible and simple to implement.
Shortcomings: Heavy memory consumption and average speed; with a large data set it runs straight into the deep-paging problem.
Scenario: Small data sets, or cases that can tolerate the deep-paging limit.

Paging mode: scroll (cursor)
Description: A snapshot-style query. Once the snapshot is taken, data indexed afterwards cannot be found by that scroll, and a scroll can neither sort nor specify from. To view a specific page you must fetch all the data before that page and discard it, so scroll is generally used to export the full data set.
Advantages: Good performance when exporting full data; solves the deep-paging problem.
Shortcomings: Does not reflect the real-time state of the data (it reads a snapshot version); higher maintenance cost, since a scroll_id has to be carried between calls; no sorting support, only the default doc_id order.
Scenario: Exporting or scanning the full data set.

That is the analysis of how ES solves the deep-paging problem and implements a Scroll query API. Hopefully the content above has been helpful.
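The "cannot jump to a page" shortcoming of scroll can also be made concrete with a little arithmetic (plain Java, invented names, no ES API involved): to display page N, a scroll client has to pull every earlier page over the wire and throw it away.

```java
// Toy arithmetic for the cost of reaching page N with a scroll.
// Assumption: 1-based page numbers and a fixed batch size per round-trip.
public class ScrollJumpCost {

    /** Documents the client must fetch to display page `page`. */
    static long docsFetchedByScroll(long page, long size) {
        return page * size; // every page up to and including the target
    }

    /** Of those, how many are discarded without ever being shown. */
    static long docsDiscarded(long page, long size) {
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // Jumping to page 1000 with size 10 means fetching 10,000 docs
        // and discarding 9,990 of them.
        System.out.println(docsFetchedByScroll(1000, 10)); // 10000
        System.out.println(docsDiscarded(1000, 10));       // 9990
    }
}
```

This is why the table pairs scroll with full exports rather than interactive paging: the work is sequential by design, so it only pays off when every batch is actually consumed.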