How to analyze the strange query hit number in Elasticsearch


2025-07-09 Update From: SLTechnology News&Howtos shulou


This article looks at how to analyze an unexpectedly low query hit count in Elasticsearch, working from the observed symptom down to the source code.

A business team reported that, for one month's indices, adding one more index pattern to a search request while leaving the query unchanged caused the number of hits to decrease. As shown in the following figure, querying the index pattern skylay*_alarm-2020.02* returns a total of 49 hits.

[Figure: search response with a hit total of 49]

However, when the index pattern skylar*_alarm-2020.01* is added, so that the request queries both sets of indices, skylay*_alarm-2020.02*,skylar*_alarm-2020.01*, the total number of hits drops to 15:

[Figure: search response with a hit total of 15 and many skipped shards under _shards]

Normally, adding an index to a search can only keep or increase the number of hits; it should never reduce it. A closer look at the response shows a large skipped count under _shards, which is suspicious and most likely related to the problem.

Analysis

The first tool that comes to mind for analyzing a query problem is the Profile API. Adding "profile": true to the request body, however, revealed nothing useful. A web search for others who had hit the skipped problem showed that many people in the community had asked about skipped shards, but almost none of the questions had been answered. The only useful answer was the following:

Are shards skipped when a range query does not create hits

Yes, there is a pre-flight search phase called the can match search phase that removes all shards from a search request that a query can not match (e.g. if a range query does not overlap with the range of values for a shard). This feature exists since 5.6.

In other words, starting with version 5.6, Elasticsearch skips shards that cannot possibly match before the real query begins. In a range query, for example, it skips shards whose values all fall outside the queried range.
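The range example in the quoted answer can be illustrated with a small sketch. This is only a simplified model of the idea, not the Elasticsearch implementation: the class name CanMatchSketch, the method canMatch, and the date-like numbers are all hypothetical, and it assumes each shard knows the minimum and maximum value of the field being queried.

```java
// Simplified model of the "can match" idea for a range query.
// Hypothetical names; NOT the Elasticsearch implementation.
class CanMatchSketch {

    /**
     * Returns true when the inclusive range of values held by a shard
     * overlaps the inclusive range requested by the query.
     */
    static boolean canMatch(long shardMin, long shardMax, long queryFrom, long queryTo) {
        return shardMin <= queryTo && queryFrom <= shardMax;
    }

    public static void main(String[] args) {
        // A shard holding January values cannot match a February range, so it could be skipped.
        System.out.println(canMatch(20200101L, 20200131L, 20200201L, 20200229L)); // false

        // A shard holding February values overlaps the query range, so it must be searched.
        System.out.println(canMatch(20200201L, 20200229L, 20200215L, 20200310L)); // true
    }
}
```

A shard for which the check returns false would be counted as skipped rather than queried.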

There was nothing more to be found, so the analysis had to continue at the source-code level. The method AbstractSearchAsyncAction#buildSearchResponse builds the search response, and the skipped value it reports comes from a counter in that class. Tracing upward from there leads to CanMatchPreFilterSearchPhase#getIterator, where a shard is marked as skipped:

private GroupShardsIterator getIterator(...) {
    for (SearchShardIterator iter : shardsIts) {
        if (possibleMatches.get(i++)) {
            iter.reset();
        } else {
            // Marked as skipped; this shard will not be queried
            iter.resetAndSkip();
        }
    }
    return shardsIts;
}
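To make the effect of this loop easier to see in isolation, here is a self-contained sketch that imitates it with plain Java types. ShardStub and markSkipped are hypothetical stand-ins for SearchShardIterator and the real method, and java.util.BitSet stands in for whatever bit set possibleMatches actually is. It is an illustration under those assumptions, not the Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the skip-marking loop; not the Elasticsearch classes.
class SkipMarkingSketch {

    /** Minimal shard placeholder that only remembers whether it was skipped. */
    static final class ShardStub {
        final String name;
        boolean skipped;

        ShardStub(String name) {
            this.name = name;
        }
    }

    /**
     * Marks every shard whose bit is unset in possibleMatches as skipped
     * and returns how many shards were marked.
     */
    static int markSkipped(List<ShardStub> shards, BitSet possibleMatches) {
        int skippedCount = 0;
        int i = 0;
        for (ShardStub shard : shards) {
            shard.skipped = !possibleMatches.get(i++);
            if (shard.skipped) {
                skippedCount++;
            }
        }
        return skippedCount;
    }

    public static void main(String[] args) {
        List<ShardStub> shards = new ArrayList<>();
        shards.add(new ShardStub("shard-0"));
        shards.add(new ShardStub("shard-1"));
        shards.add(new ShardStub("shard-2"));

        // Only shard-1 can possibly match the query.
        BitSet possibleMatches = new BitSet(shards.size());
        possibleMatches.set(1);

        int skipped = markSkipped(shards, possibleMatches);
        System.out.println("skipped=" + skipped + " of " + shards.size()); // skipped=2 of 3
    }
}
```

The count returned here plays the role of the counter that ends up as the skipped value in the response.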

Because the online problem had to be fixed first, the conditions under which a shard is marked as skipped are not analyzed here. What matters is that CanMatchPreFilterSearchPhase runs only when certain conditions are met; if a request never enters this phase, no shard is ever checked for skipping. The check is implemented in TransportSearchAction#shouldPreFilterSearchShards (the code is not reproduced here). In summary, the CanMatchPreFilterSearchPhase (referred to below as "pre-filtering") is bypassed when any one of the following common conditions holds:

The query contains a global aggregation

The query contains an aggregation that sets "min_doc_count": 0

The number of shards to be queried does not exceed the value of the query parameter pre_filter_shard_size, which defaults to 128

The query uses a suggester
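The conditions above can be summarized as a small decision function. The sketch below is a simplified model built only from the four conditions listed in this article. The class and parameter names are hypothetical, the real shouldPreFilterSearchShards works on request objects and may check more than this, and the exact boundary comparison (strictly greater than) is an assumption.

```java
// Simplified model of the pre-filtering decision; hypothetical names,
// not the TransportSearchAction code.
class PreFilterDecisionSketch {

    /** Default value of pre_filter_shard_size as described in the article. */
    static final int DEFAULT_PRE_FILTER_SHARD_SIZE = 128;

    /** Returns true when the request would go through "pre-filtering". */
    static boolean shouldPreFilter(boolean hasGlobalAggregation,
                                   boolean hasMinDocCountZeroAggregation,
                                   boolean hasSuggester,
                                   int shardCount,
                                   int preFilterShardSize) {
        // Any of these features bypasses pre-filtering outright.
        if (hasGlobalAggregation || hasMinDocCountZeroAggregation || hasSuggester) {
            return false;
        }
        // Assumption: pre-filtering runs only when the request targets
        // more shards than pre_filter_shard_size.
        return shardCount > preFilterShardSize;
    }

    public static void main(String[] args) {
        // 200 shards with the default threshold: pre-filtering runs, so shards may be skipped.
        System.out.println(shouldPreFilter(false, false, false, 200, DEFAULT_PRE_FILTER_SHARD_SIZE)); // true

        // Same request with pre_filter_shard_size=1000: pre-filtering is bypassed.
        System.out.println(shouldPreFilter(false, false, false, 200, 1000)); // false

        // An aggregation with min_doc_count 0 also bypasses it.
        System.out.println(shouldPreFilter(false, true, false, 200, DEFAULT_PRE_FILTER_SHARD_SIZE)); // false
    }
}
```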

Solution

Once the conditions that bypass "pre-filtering" are known, the problem is fairly easy to solve. There are two approaches that do not affect the existing business:

For queries with aggregations, add "min_doc_count": 0 to the aggregation

For queries without aggregations, set the pre_filter_shard_size parameter in the request to a value larger than the number of shards being queried

For example: _search?pre_filter_shard_size=1000
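For a client that assembles search URLs itself, adding the parameter could look roughly like the sketch below. The host http://localhost:9200, the index patterns, and the helper name searchUri are placeholders chosen for illustration. The sketch only builds the URI and does not send a request.

```java
import java.net.URI;

// Hypothetical helper that appends pre_filter_shard_size to a search URL.
class SearchUriSketch {

    /**
     * Builds a _search URI for the given comma-separated index patterns,
     * with pre_filter_shard_size set to the given value.
     */
    static URI searchUri(String baseUrl, String indexPatterns, int preFilterShardSize) {
        return URI.create(baseUrl + "/" + indexPatterns
                + "/_search?pre_filter_shard_size=" + preFilterShardSize);
    }

    public static void main(String[] args) {
        // Placeholder host and index patterns; substitute the real ones.
        URI uri = searchUri("http://localhost:9200", "alarm-2020.01*,alarm-2020.02*", 1000);
        System.out.println(uri);
    }
}
```

Choosing a value comfortably above the number of shards the request can reach keeps the request out of "pre-filtering" even as more indices are added.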

Of course, you can also simply use the pre_filter_shard_size parameter for all queries. With this fix in place, the skipped shards disappear.

While using ES, it is rarely necessary to read the code to solve a problem. However, the official documentation does not describe how shards come to be skipped during a search, and the manual does not even explain what the skipped field means. The business side was running ES 5.6.12 in this case; whether newer versions of ES have a similar problem is not yet known. The "pre-filtering" mechanism in newer versions is left for a future analysis.

That concludes the analysis of the strange hit count in Elasticsearch. If you run into a similar problem, the analysis above may serve as a reference.
