How to use filter to improve query efficiency in ES

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how to use filter to improve query efficiency in ES. The technique is very practical, so it is shared here for reference.

A brief introduction to the bool query

Bool queries are widely used in Elasticsearch (hereinafter ES). In non-real-time paging queries and export scenarios, we often use a bool query to combine several query conditions.

A bool query supports four kinds of clauses:

must

filter

should

must_not

Only the must and filter clauses are covered here, because they are the subject of this article. See the official documentation for the others.

must: the returned documents must satisfy the must clause, and the clause takes part in calculating the score.

filter: the returned documents must satisfy the filter clause, but unlike must, no score is calculated and the result can be cached.

From the description above, if you only look at which documents are returned, must and filter behave the same. The difference lies in the use case: use must when the clause should affect the score, and otherwise consider filter.

That is rather abstract, so here is an example. The following two requests match the same documents.

Use filter to filter the time range

GET kibana_sample_data_ecommerce/_search
{
  "size": 1000,
  "query": {
    "bool": {
      "must": [
        {"term": {
          "currency": "EUR"
        }}
      ],
      "filter": {
        "range": {
          "order_date": {
            "gte": "2020-01-25T23:45:36.000+00:00",
            "lte": "2020-02-01T23:45:36.000+00:00"
          }
        }
      }
    }
  }
}

Use must to filter the time range

GET kibana_sample_data_ecommerce/_search
{
  "size": 1000,
  "query": {
    "bool": {
      "must": [
        {"term": {
          "currency": "EUR"
        }},
        {"range": {
          "order_date": {
            "gte": "2020-01-25T23:45:36.000+00:00",
            "lte": "2020-02-01T23:45:36.000+00:00"
          }
        }}
      ]
    }
  }
}

Both requests return the same set of hits:

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1087,
      "relation": "eq"
    },
    ...
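The two requests differ only in where the range clause sits. As a minimal sketch, the Python helper below rewrites a search body by moving clauses of chosen types from must into filter. The name `move_to_filter` and its `kinds` parameter are hypothetical, and the sketch assumes that must is written as a list, as in the requests above.

```python
import copy
import json


def move_to_filter(body, kinds=("range",)):
    """Return a copy of a search body with the selected clause types moved
    from bool.must to bool.filter (hypothetical helper, for illustration)."""
    result = copy.deepcopy(body)
    bool_clause = result["query"]["bool"]
    must = bool_clause.get("must", [])
    existing = bool_clause.get("filter", [])
    if isinstance(existing, dict):  # filter may be a single clause object
        existing = [existing]

    # A clause such as {"range": {...}} is identified by its single top-level key.
    moved = [clause for clause in must if next(iter(clause)) in kinds]
    kept = [clause for clause in must if next(iter(clause)) not in kinds]

    if kept:
        bool_clause["must"] = kept
    else:
        bool_clause.pop("must", None)
    if existing or moved:
        bool_clause["filter"] = existing + moved
    return result


# The "must" version of the request from the article.
must_body = {
    "size": 1000,
    "query": {
        "bool": {
            "must": [
                {"term": {"currency": "EUR"}},
                {"range": {"order_date": {
                    "gte": "2020-01-25T23:45:36.000+00:00",
                    "lte": "2020-02-01T23:45:36.000+00:00",
                }}},
            ]
        }
    },
}

filter_body = move_to_filter(must_body)
print(json.dumps(filter_body, indent=2))
```

The printed body keeps the term clause in must and places the range clause in filter. The input body is left unchanged because the helper works on a deep copy.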

Why filter is more efficient

The previous section covered the basic usage of must and filter and the difference between them. Put simply, if your business scenario does not need scoring, switching to filter can noticeably speed up the query.

To explain why filter queries are efficient, we need to introduce two ES concepts: query context and filter context.

Query context

Query context focuses on how well a document matches the query. The degree of match is expressed as a relevance score, and the higher the score, the better the match. A query in this context therefore not only checks whether the document satisfies the conditions but also has to calculate the relevance score.

Filter context

Filter context is concerned only with whether a document matches the query conditions. There are just two outcomes, yes or no, and no additional calculation is done. A common use is filtering by a time range.

ES can also cache filter context results automatically, which further improves efficiency.

For bool queries, must uses query context, while filter uses filter context.
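The difference between the two contexts can be pictured with a toy in-memory model. This is only a conceptual sketch, not how ES or Lucene is implemented: the sample documents, the `note` field, the term-count score, and both function names are made up for illustration.

```python
# Three made-up documents keyed by id.
docs = {
    1: {"currency": "EUR", "note": "red shoes red laces"},
    2: {"currency": "EUR", "note": "blue shoes"},
    3: {"currency": "USD", "note": "red hat"},
}


def filter_context(field, value):
    """Yes/no matching: returns only the set of matching document ids."""
    return {doc_id for doc_id, doc in docs.items() if doc[field] == value}


def query_context(field, word):
    """Matching plus scoring: returns (doc_id, score) pairs, best match first.
    The score is a plain term count, a stand-in for real relevance scoring."""
    scored = []
    for doc_id, doc in docs.items():
        count = doc[field].split().count(word)
        if count:
            scored.append((doc_id, float(count)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(filter_context("currency", "EUR"))
print(query_context("note", "red"))
```

The filter function does one comparison per document and returns a plain set, while the query function must also compute and sort by a score. That extra per-document work is what a filter clause avoids.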

We can verify this with an example. Continuing with the requests from the first section, let's look at how ES executes each one using the Search Profiler that comes with Kibana.

For the query that uses must, the profile clearly shows that a relevance score is calculated, and the score portion accounts for about one-tenth of the query time.

A screenshot of the filter query is omitted here. The difference is that its score portion is 0, which means no relevance score is calculated.
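The score share can also be read from a Profile API response, which is what you get when a search request body includes `"profile": true`. The sketch below assumes the documented response layout (profile, shards, searches, query, breakdown) and looks only at top-level query nodes. The function name `score_share` is hypothetical, and the sample response is hand-written with made-up numbers rather than real measurements.

```python
def score_share(profile_response):
    """Return score time divided by total time over the top-level query
    nodes of a Profile API response (hypothetical helper)."""
    score_nanos = 0
    total_nanos = 0
    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for node in search["query"]:
                score_nanos += node["breakdown"].get("score", 0)
                total_nanos += node["time_in_nanos"]
    return score_nanos / total_nanos if total_nanos else 0.0


# Hand-written sample with made-up numbers, trimmed to the fields used above.
sample = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "time_in_nanos": 1000000,
                                "breakdown": {"score": 100000, "next_doc": 400000},
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

print(f"score share: {score_share(sample):.0%}")
```

Running the same helper on the profile of the must request and of the filter request gives a quick numeric comparison of how much time each spends on scoring.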

Besides skipping the relevance score calculation, frequently used filters are automatically cached by Elasticsearch to improve performance.

I once optimized a business query in a project where the production index held about 30 million documents. After switching the query to filter, it ran almost twice as fast.
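As a rough mental model of why cached filters help, the sketch below memoizes the set of matching document ids per filter, so a repeated filter skips the scan. It is a conceptual toy only: ES manages its cache internally and decides on its own which filters to cache, and the class `ToyFilterCache` and the sample documents are invented for illustration.

```python
import json


class ToyFilterCache:
    """Memoizes filter results keyed by the filter's JSON form."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.scans = 0  # how many times the documents were actually scanned

    def term_filter(self, field, value):
        key = json.dumps({"term": {field: value}}, sort_keys=True)
        if key not in self.cache:
            self.scans += 1
            self.cache[key] = frozenset(
                doc_id
                for doc_id, doc in self.docs.items()
                if doc.get(field) == value
            )
        return self.cache[key]


docs = {1: {"currency": "EUR"}, 2: {"currency": "USD"}, 3: {"currency": "EUR"}}
cache = ToyFilterCache(docs)

first = cache.term_filter("currency", "EUR")   # scans the documents
second = cache.term_filter("currency", "EUR")  # served from the cache
print(sorted(first), cache.scans)
```

A yes/no result does not depend on any score, so it can be stored once and reused by every later request that contains the same filter. A scored clause cannot be reused as simply, because its contribution to the score has to be computed for each query.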

In the screenshots I took of the timings, the total query time is cut roughly in half.

That concludes this article on how to use filter to improve query efficiency in ES. I hope it is of some help.
