How to solve the problem of deep paging in ES

2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how to solve the problem of deep paging in ES (Elasticsearch). It first describes why deep paging is expensive, then walks through two simple, practical approaches: scroll and search_after.

The ES deep paging problem:

The default paging method in ES is from + size, similar to offset + limit in MySQL. When a request reaches deep into the result set the cost is high, so Elasticsearch restricts how far this kind of paging can go. For example, suppose we request 1000 documents (from = 0, size = 1000) from an index with 5 shards. Each shard has to return its top 1000 entries to the coordinating node, so the coordinating node receives 5 × 1000 entries, sorts them, and returns 1000 documents to the client. In general every shard returns from + size entries, so the cost grows with the depth of the page, not just with the page size. Even if each entry carries only a document ID and a _score, the amount of data is large, and a sufficiently deep request can easily cause an OOM in ES.

The setting index.max_result_window, which defaults to 10000, guards against this: if from + size exceeds 10,000, the request is rejected. If the cluster is well provisioned and the query volume is low, you can raise this setting moderately.
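
The arithmetic above can be sketched with a toy model. This is only an illustration under stated assumptions: plain Java lists stand in for shards, integers stand in for scores, and the names FromSizeCostDemo, entriesCollected, and page are hypothetical, not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of from + size paging. No Elasticsearch involved:
// each inner list plays the role of one shard's scores.
class FromSizeCostDemo {

    /** Entries the coordinating node collects: every shard ships its top (from + size). */
    static long entriesCollected(int shards, int from, int size) {
        return (long) shards * (from + size);
    }

    /** Merges each shard's top (from + size) scores and returns the requested page. */
    static List<Integer> page(List<List<Integer>> shardScores, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shardScores) {
            List<Integer> sorted = new ArrayList<>(shard);
            sorted.sort(Comparator.reverseOrder());
            // a shard cannot know which of its entries make the global page,
            // so it has to send its whole top (from + size)
            merged.addAll(sorted.subList(0, Math.min(from + size, sorted.size())));
        }
        merged.sort(Comparator.reverseOrder());
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        System.out.println(entriesCollected(5, 0, 1000));  // 5000
        System.out.println(entriesCollected(5, 9990, 10)); // 50000
        List<List<Integer>> shards =
                List.of(List.of(9, 4, 1), List.of(8, 7, 2), List.of(6, 5, 3));
        System.out.println(page(shards, 2, 3));            // [7, 6, 5]
    }
}
```

Note the second call: a page of only 10 documents at from = 9990 still makes the coordinating node collect 50000 entries, which is why depth rather than page size is the problem.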

Solutions:

1: Traversing with scroll

Scroll works in two steps: initialization and traversal. Initialization creates a search context that preserves the state of the index at that moment for all documents matching the search criteria, which can be thought of as a snapshot. Traversal then reads from this snapshot, so documents inserted, deleted, or updated after initialization do not affect the traversal results. Scroll is therefore not suitable for real-time search and is better suited to background batch tasks.
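
The snapshot behavior can be illustrated with a small in-memory model. This is a sketch of the semantics only, not of how Elasticsearch implements scroll: the class ScrollSnapshotDemo and its next method are hypothetical, and a copied list stands in for the search context.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of scroll semantics: a frozen view of the matching ids plus a cursor.
class ScrollSnapshotDemo {
    private final List<Integer> snapshot; // frozen at initialization
    private int cursor = 0;

    /** "Initialization": copy the live data so that later writes are invisible. */
    ScrollSnapshotDemo(List<Integer> liveIndex) {
        this.snapshot = new ArrayList<>(liveIndex);
    }

    /** "Traversal": returns the next batch, or an empty list once exhausted. */
    List<Integer> next(int size) {
        int end = Math.min(cursor + size, snapshot.size());
        List<Integer> batch = new ArrayList<>(snapshot.subList(cursor, end));
        cursor = end;
        return batch;
    }

    public static void main(String[] args) {
        List<Integer> liveIndex = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        ScrollSnapshotDemo scroll = new ScrollSnapshotDemo(liveIndex);
        System.out.println(scroll.next(2));   // [1, 2]
        liveIndex.add(6);                     // writes after initialization...
        liveIndex.remove(Integer.valueOf(3)); // ...do not change what scroll returns
        List<Integer> batch;
        while (!(batch = scroll.next(2)).isEmpty()) {
            System.out.println(batch);        // [3, 4] then [5]
        }
    }
}
```

The deleted id 3 is still returned and the new id 6 never appears, which is exactly why scroll fits batch jobs better than user-facing, real-time search.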

API description:

1) initialize

POST /book/_search?scroll=1m&size=2
{"query": {"match_all": {}}}

2) traverse

GET /_search/scroll
{"scroll": "1m", "scroll_id": "scroll_id value returned in step 1"}

Reference code using the Java RestHighLevelClient:

@Autowired
private RestHighLevelClient restHighLevelClient;

// scrollId, pageSize and scrollKeepAliveTime are assumed to come from
// the query parameters or from configuration
public Result scrollSearch(... query parameters) {
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
    // add your own search criteria
    queryBuilder.must(QueryBuilders.termQuery("type", "category to which goods belong"));
    queryBuilder.must(QueryBuilders.matchQuery("name", "trade name"));
    // search
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource();
    searchSourceBuilder.query(queryBuilder);
    // sort
    searchSourceBuilder.sort(SortBuilders.fieldSort("order").order(SortOrder.DESC));
    // search results
    SearchResponse searchResponse = null;
    // depending on the actual situation, either call this method several times
    // or use a while loop to traverse everything
    if (StringUtils.isBlank(scrollId)) {
        // first request: initialize the scroll
        searchSourceBuilder.size(pageSize); // number of entries per query
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("index name").source(searchSourceBuilder);
        searchRequest.scroll(new Scroll(TimeValue.timeValueMinutes(scrollKeepAliveTime)));
        searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    } else {
        // later requests: scroll based on the scroll id from the previous response
        SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
        searchScrollRequest.scroll(new Scroll(TimeValue.timeValueMinutes(scrollKeepAliveTime)));
        searchResponse = restHighLevelClient.scroll(searchScrollRequest, RequestOptions.DEFAULT);
    }
    SearchHit[] hits = searchResponse.getHits().getHits();
    // process the search results according to business requirements
    GoodsDTO result = handleSearchData(hits);
    // scroll id to pass in on the next call in order to keep scrolling
    String nextScrollId = searchResponse.getScrollId();
    return Result.success(result);
}

2: Using search_after

search_after paging retrieves the next page in real time by using the last document of the previous page to locate where the next page starts. If documents are added to or deleted from the index while you are paging, those changes are reflected in the cursor in real time. This method is available only from ES 5.x onward. So that the last document of each page identifies an exact position, the sort must include a field whose value is globally unique for every document, for example _id.

API description:

GET /book/_search
{"query": {"match_all": {}}, "size": 2, "sort": [{"_id": "desc"}]}

GET /book/_search
{"query": {"match_all": {}}, "size": 2, "search_after": [3], "sort": [{"_id": "desc"}]}

Each page depends on the last item of the previous page, so you cannot jump directly to an arbitrary page.
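
The cursor idea behind search_after can be sketched in memory as follows. This is a minimal illustration under stated assumptions: a TreeSet of integer ids stands in for an index sorted ascending by a unique field, and SearchAfterDemo and page are hypothetical names, not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of search_after: each page is "the next ids after the last one seen".
class SearchAfterDemo {

    /** Returns up to size ids strictly greater than after (null means first page). */
    static List<Integer> page(TreeSet<Integer> index, Integer after, int size) {
        NavigableSet<Integer> candidates =
                (after == null) ? index : index.tailSet(after, false);
        List<Integer> hits = new ArrayList<>();
        for (Integer id : candidates) {
            if (hits.size() == size) {
                break;
            }
            hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<Integer> index = new TreeSet<>(List.of(1, 2, 4, 5));
        List<Integer> first = page(index, null, 2);
        System.out.println(first);                // [1, 2]
        index.add(3);                             // a write between two page requests
        Integer last = first.get(first.size() - 1);
        System.out.println(page(index, last, 2)); // [3, 4]
    }
}
```

The newly written id 3 shows up on the second page because nothing is frozen between requests. The same model also shows why skipping pages is impossible: page N can only be requested once the last value of page N-1 is known.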

Reference code using the Java RestHighLevelClient:

@Autowired
private RestHighLevelClient restHighLevelClient;

public Result searchAfter(... query parameters) {
    SearchRequest searchRequest = new SearchRequest(index);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(... search criteria);
    searchSourceBuilder.size(1000);
    searchSourceBuilder.sort("_id", SortOrder.ASC);
    // sort value (_id) of the last hit on the previous page; searchAfter takes an Object[]
    searchSourceBuilder.searchAfter(new Object[]{"id of the last piece of data on the previous page"});
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = searchResponse.getHits().getHits();
    // process the search results according to business needs
    GoodsDTO result = handleSearchData(hits);
    return Result.success(result);
}

That concludes this look at how to solve the problem of deep paging in ES. Theory is best paired with practice, so go and try both approaches yourself.
