Common operations of Elasticsearch: query and aggregation

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

0 description

This article is based on ES 5.4 and ES 5.6. It lists the queries I use most often at work (where only the Java API is used). For complete coverage, refer to the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search.html

1 query

This section begins with a quick start. The queries listed after it are the ones used most in my work environment; less commonly used queries are not covered here.

1.1 Quick start

1.1.1 query all

GET index/type/_search
{
  "query": {"match_all": {}}
}

A search without a request body returns the same result:

GET index/type/_search

1.1.2 pagination (take term as an example)

GET index/type/_search
{
  "from": 0,
  "size": 100,
  "query": {"term": {"area": "GuangZhou"}}
}

1.1.3 return only specified fields (take term as an example)

GET index/type/_search
{
  "_source": ["hobby", "name"],
  "query": {"term": {"area": "GuangZhou"}}
}

1.1.4 sort (take term as an example)

Sort by one or more fields; here by user_id ascending, then by salary descending:

GET index/type/_search
{
  "query": {"term": {"area": "GuangZhou"}},
  "sort": [
    {"user_id": {"order": "asc"}},
    {"salary": {"order": "desc"}}
  ]
}

1.2 full-text query

Full-text queries run against fields that are analyzed at index time. The analyzer (or search analyzer) of each field is applied to the query string before the query is executed.

1.2.1 match query

{
  "query": {
    "match": {
      "content": {"query": "Lippi Evergrande", "operator": "and"}
    }
  }
}

operator defaults to or: "Lippi Evergrande" is analyzed into the terms "Lippi" and "Evergrande", and a document matches if either term appears in content. When operator is set to and, a document matches only if both terms appear.
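The difference between the two operators can be sketched in a few lines of Python. This is only an illustration of the matching rule, not how Elasticsearch is implemented: the `analyze` function is a toy stand-in (lowercase and split on whitespace) for a real analyzer, and both function names are hypothetical.

```python
def analyze(text):
    """Toy analyzer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def match(field_text, query_text, operator="or"):
    """Return True if the analyzed query terms match the analyzed field."""
    doc_terms = set(analyze(field_text))
    hits = [term in doc_terms for term in analyze(query_text)]
    # "and": every query term must be present; "or": one term is enough.
    return all(hits) if operator == "and" else any(hits)


print(match("Lippi returns to Evergrande", "Lippi Evergrande", "and"))
print(match("Lippi resigns", "Lippi Evergrande", "or"))
print(match("Lippi resigns", "Lippi Evergrande", "and"))
```

The three calls cover a document containing both terms, and a document containing only one term under each operator.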

1.2.2 match_phrase query

A document matches only if it meets both of the following criteria:

(1) after analysis, all terms of the query appear in the field.
(2) the terms appear in the field in the same order as in the query (and, with the default slop of 0, next to each other).

{
  "query": {"match_phrase": {"content": "Lippi Evergrande"}}
}

1.3 term search

Term-level queries match exactly against the terms stored in the inverted index. They are used for structured data such as numbers, dates, and enumerated types.

1.3.1 term query

{
  "query": {"term": {"postdate": "2015-12-10 00:41:00"}}
}

1.3.2 terms query

An extended version of term: the field (postdate, as queried above) can be given multiple values.

{
  "query": {
    "terms": {"postdate": ["2015-12-10 00:41:00", "2016-02-01 01:39:00"]}
  }
}

Because term is an exact match, there is no way to set an and relationship between the values in []: a field holding a single value cannot exactly equal two different values at once. The values are therefore always combined with or.

1.3.3 range query

Matches documents whose numeric, date, or string field falls within a range. Note that a range query targets a single field; it cannot be applied to several fields at once.

Numerical value:

{
  "query": {
    "range": {
      "reply": {"gte": 245, "lte": 250}
    }
  }
}

The supported operators are as follows:

gt: greater than, gte: greater than or equal to, lt: less than, lte: less than or equal to

Date:

{
  "query": {
    "range": {
      "postdate": {
        "gte": "2016-09-01 00:00:00",
        "lte": "2016-09-30 23:59:59",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

format can be omitted if the date strings already match the format defined in the field's mapping.
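When range conditions are assembled in code, a small helper keeps the operators consistent. The following is a minimal Python sketch that only builds the request body as a dictionary; the helper name `range_query` and its parameters are hypothetical, and nothing here talks to a cluster.

```python
import json

# The four operators supported by a range query.
RANGE_OPERATORS = {"gt", "gte", "lt", "lte"}


def range_query(field, fmt=None, **bounds):
    """Build a range query body for a single field."""
    unknown = set(bounds) - RANGE_OPERATORS
    if unknown:
        raise ValueError("unsupported range operators: %s" % sorted(unknown))
    if not bounds:
        raise ValueError("at least one of gt, gte, lt, lte is required")
    clause = dict(bounds)
    if fmt is not None:
        clause["format"] = fmt  # only meaningful for date fields
    return {"query": {"range": {field: clause}}}


print(json.dumps(range_query("reply", gte=245, lte=250)))
print(json.dumps(range_query(
    "postdate",
    fmt="yyyy-MM-dd HH:mm:ss",
    gte="2016-09-01 00:00:00",
    lte="2016-09-30 23:59:59",
)))
```

Rejecting unknown operators early is a design choice of this sketch: a typo such as `gte_` fails in the helper instead of producing a request that the server rejects.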

1.3.4 exists query

Returns documents that have at least one non-null value in the given field, that is, the field has a value (what counts as a value is shown by the examples later in this section).

{
  "query": {"exists": {"field": "user"}}
}
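The "at least one non-null value" rule can be expressed as a short Python sketch. It models documents as plain dictionaries and is an illustration of the rule only; the function name `exists` is hypothetical, and mapping options that change null handling are not modelled.

```python
def exists(doc, field):
    """Return True if `field` holds at least one non-null value in `doc`."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a single value like a one-element array.
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


samples = [
    {"user": "jane"},
    {"user": ""},            # empty string is still a value
    {"user": ["jane", None]},
    {"user": None},
    {"user": []},
    {"user": [None]},
    {"foo": "bar"},
]
for doc in samples:
    print(doc, exists(doc, "user"))
```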

The explanation here follows the book "From Lucene to Elasticsearch: Full-Text Retrieval in Practice".

The following documents match the above query:

Document                     Description
{"user": "jane"}             has a user field with a non-null value
{"user": ""}                 has a user field; the value is an empty string, which is non-null
{"user": "-"}                has a user field with a non-null value
{"user": ["jane"]}           has a user field with a non-null value
{"user": ["jane", null]}     has a user field; at least one value is non-null

The following documents will not be matched:

Document                     Description
{"user": null}               has a user field, but the value is null
{"user": []}                 has a user field, but it contains no values
{"user": [null]}             has a user field, but its only value is null
{"foo": "bar"}               has no user field

1.3.5 ids query

Queries the document with the specified id.

{
  "query": {
    "ids": {"type": "news", "values": ["2101"]}
  }
}

type is optional, and values can list multiple ids:

{
  "query": {
    "ids": {"values": ["2101", "2301"]}
  }
}

1.4 compound query

1.4.1 bool query

My work with ES is mostly aggregation, statistics, and classification projects, which often require complex multi-condition queries, so the bool query is used a great deal. Because the number of query conditions varies, the logic is usually handled by wrapping everything in one large outer bool query. (In the project this is done through the Java API.)

A bool query can combine any number of simple queries. The logic between the simple queries is as follows:

Attribute    Description
must         The document must match the query conditions under must; equivalent to logical AND.
should       The document may or may not match the query conditions under should; equivalent to logical OR.
must_not     The opposite of must: documents matching the query conditions under must_not are not returned.
filter       Like must, the document must match the query conditions under filter, but filter does not score and only filters.

An example is as follows:

{
  "query": {
    "bool": {
      "must": {"match": {"content": "Lippi"}},
      "must_not": {"match": {"content": "CSL"}}
    }
  }
}

Note that each of must, must_not, should, and filter can appear only once as a key under the same bool.

What if you need more than one must condition, for example matching "Lippi" and "Evergrande" at the same time while deliberately keeping the two keywords separate? (A single must containing a match whose operator is and would also work.) Use an array under must and put multiple match objects in it:

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {"match": {"content": "Lippi"}},
        {"match": {"content": "Evergrande"}}
      ]
    }
  },
  "sort": [{"id": {"order": "desc"}}]
}

Of course, the elements of the array under must can themselves be bool queries, for more complex conditions.
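Because the number of conditions varies from request to request, the outer bool body is usually assembled in code. The following is a minimal Python sketch of that idea (the project described in this article uses the Java API instead); the helper names `bool_query` and `match` are hypothetical, and the code only builds dictionaries.

```python
import json


def match(field, text):
    """Build a simple match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Build a bool query; each clause is an array, empty clauses are omitted."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filters,
    }
    return {"bool": {name: list(qs) for name, qs in clauses.items() if qs}}


# A variable number of keywords becomes a variable-length must array.
keywords = ["Lippi", "Evergrande"]
body = {
    "query": bool_query(
        must=[match("content", k) for k in keywords],
        must_not=[match("content", "CSL")],
    )
}
print(json.dumps(body, indent=2))

# bool queries nest: an inner bool is just another element of an outer array.
nested = bool_query(must=[bool_query(should=[match("content", "Lippi")])])
```

Always emitting arrays, even for a single condition, avoids special-casing the one-condition form.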

The two-match query above is equivalent to:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": {"query": "Lippi Evergrande", "operator": "and"}
        }
      }
    }
  },
  "sort": [{"id": {"order": "desc"}}]
}

1.5 nested queries

Create the following index first:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested",
          "properties": {
            "first": {"type": "keyword"},
            "last": {"type": "keyword"}
          }
        },
        "group": {"type": "keyword"}
      }
    }
  }
}

Add data:

PUT my_index/my_type/1
{
  "group": "GuangZhou",
  "user": [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"}
  ]
}

PUT my_index/my_type/2
{
  "group": "QingYuan",
  "user": [
    {"first": "Li", "last": "Wang"},
    {"first": "Yonghao", "last": "Ye"}
  ]
}

Query:

Simpler query:

{
  "query": {
    "nested": {
      "path": "user",
      "query": {"term": {"user.first": "John"}}
    }
  }
}

More complex queries:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "user",
            "query": {"term": {"user.first": {"value": "Li"}}}
          }
        },
        {
          "nested": {
            "path": "user",
            "query": {"term": {"user.last": {"value": "Wang"}}}
          }
        }
      ]
    }
  }
}

1.6 supplement: array queries and testing

Create an index:

PUT my_index2
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {"type": "text"},
        "keywords": {"type": "keyword"}
      }
    }
  }
}

Add data:

PUT my_index2/my_type/1
{"message": "keywords test1", "keywords": ["Beauty", "Animation", "Movie"]}

PUT my_index2/my_type/2
{"message": "keywords test2", "keywords": ["Movie", "Beauty", "Advertising"]}

Search for:

{
  "query": {"term": {"keywords": "Advertising"}}
}

Note 1: the keywords field is mapped as keyword, so the term query can match exactly. If it were mapped as text, the term might not be found. With a suitable analyzer configured it can still be found; without one the default analyzer is used, which (for Chinese text) splits the value into single characters, so the whole word is not found. Pay particular attention to this.

Note 2: array fields can also be used in bucket aggregations. Each value in the array is bucketed as a separate value rather than the array as a whole. You can test this with the data above, but the field type must not be text, otherwise the aggregation fails.

Note 3: following from the notes above, a plain array is well suited to storing tag data, as in the example above. Map the field as keyword rather than text, and use exact matching when searching.
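The bucketing behaviour described in Note 2 can be illustrated with a few lines of Python over the two sample documents. This is a sketch of the counting rule only, with a hypothetical helper name `terms_buckets`; it does not call Elasticsearch.

```python
from collections import Counter

docs = [
    {"message": "keywords test1", "keywords": ["Beauty", "Animation", "Movie"]},
    {"message": "keywords test2", "keywords": ["Movie", "Beauty", "Advertising"]},
]


def terms_buckets(docs, field):
    """Count documents per distinct value; array elements are bucketed separately."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        # A document counts once per distinct value it contains.
        for v in set(values):
            counts[v] += 1
    return counts


for key, doc_count in sorted(terms_buckets(docs, "keywords").items()):
    print(key, doc_count)
```

Each tag gets its own bucket, and a document with three tags contributes to three buckets.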

1.7 scroll query

Fetching, say, 100000 documents in a single request performs very poorly. In that case you generally use scroll to fetch the data batch after batch until everything has been queried and processed. The scroll id (scrollId) returned by ES can be understood as the handle ES keeps for this query: each time you send the scroll id, ES returns the next batch, and you loop until the data is exhausted or the time window expires.

A scroll search fetches one batch of data, then the next batch, and so on until all the data has been read. On the first search, scroll saves a snapshot of the view at that moment, and later batches are served only from that snapshot; changes made to the data in the meantime are not visible to the user. Every scroll request must also carry a scroll parameter, a time window that tells ES how long to keep the search context alive until the next request (the scroll id and the view snapshot are valid only within this window).
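The batch-after-batch loop can be sketched independently of any client library. In the Python sketch below, `first_page` and `next_page` are hypothetical callables standing in for the initial search request and the follow-up scroll request, and `FakeScrollBackend` is an in-memory stand-in used only to make the example runnable; none of these names belong to an Elasticsearch client.

```python
def scroll_all(first_page, next_page):
    """Yield every hit, batch by batch, until an empty batch is returned."""
    scroll_id, hits = first_page()
    while hits:
        for hit in hits:
            yield hit
        # Send the latest scroll id back to get the next batch.
        scroll_id, hits = next_page(scroll_id)


class FakeScrollBackend:
    """In-memory stand-in: serves batches from a snapshot taken at creation."""

    def __init__(self, docs, size):
        self.snapshot = list(docs)  # later changes to `docs` are not visible
        self.size = size
        self.offset = 0

    def first_page(self):
        return self.next_page("fake-scroll-id")

    def next_page(self, scroll_id):
        batch = self.snapshot[self.offset:self.offset + self.size]
        self.offset += len(batch)
        return scroll_id, batch


source = list(range(25))
backend = FakeScrollBackend(source, size=10)
source.append(999)  # change after the snapshot was taken
fetched = list(scroll_all(backend.first_page, backend.next_page))
print(len(fetched))
```

Stopping on the first empty batch is the termination rule assumed here; a real loop would also need to handle an expired time window.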

GET spnews/news/_search?scroll=1m
{
  "query": {"match_all": {}},
  "size": 10,
  "_source": ["id"]
}

GET _search/scroll
{
  "scroll": "1m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAADShFmpBMjJJY2F2U242RFU5UlAzUzA4MWcAAAAAAAA0oBZqQTIySWNhdlNuNkRVOVJQM1MwODFnAAAAAAAANJ8WakEyMkljYXZTbjZEVTlSUDNTMDgxZw=="
}

2 aggregation

2.1 Metric aggregation

Equivalent to the aggregate function of MySQL.

Max

{
  "size": 0,
  "aggs": {"max_id": {"max": {"field": "id"}}}
}

If size is not set to 0, the matching documents are returned in addition to the aggregation result.

Min

{
  "size": 0,
  "aggs": {"min_id": {"min": {"field": "id"}}}
}

Avg

{
  "size": 0,
  "aggs": {"avg_id": {"avg": {"field": "id"}}}
}

Sum

{
  "size": 0,
  "aggs": {"sum_id": {"sum": {"field": "id"}}}
}

Stats

{
  "size": 0,
  "aggs": {"stats_id": {"stats": {"field": "id"}}}
}

2.2 bucket aggregation

Equivalent to the GROUP BY operation of MySQL. Do not attempt to bucket-aggregate text fields in ES, or the aggregation will fail.

Terms

It is equivalent to a grouping query, aggregating according to the field.

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {"size": 100, "field": "vtype", "min_doc_count": 1}
    }
  }
}

You can also compute metric aggregations inside each bucket, which is equivalent to applying max, min, avg, sum, stats, and so on after a GROUP BY in MySQL:

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {"field": "vtype"},
      "aggs": {
        "stats_follower": {"stats": {"field": "realFollowerCount"}}
      }
    }
  }
}

Filter

Equivalent to MySQL filtering rows with a WHERE condition and then applying max, min, avg, sum, stats, and so on.

{
  "size": 0,
  "aggs": {
    "gender_1_follower": {
      "filter": {"term": {"gender": 1}},
      "aggs": {
        "stats_follower": {"stats": {"field": "realFollowerCount"}}
      }
    }
  }
}

The aggregation above computes each metric over the documents whose gender is 1.
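The WHERE-then-aggregate analogy can be made concrete with a small Python sketch over made-up sample data. The documents, the helper names `stats` and `filter_stats`, and the shape of the result for an empty selection are all assumptions for illustration; the sketch does not reproduce the exact response format of Elasticsearch.

```python
def stats(values):
    """Compute count, min, max, avg and sum over an iterable of numbers."""
    values = list(values)
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def filter_stats(docs, field, value, metric_field):
    """Keep documents where `field` equals `value`, then aggregate `metric_field`."""
    return stats(d[metric_field] for d in docs if d.get(field) == value)


# Hypothetical sample documents.
docs = [
    {"gender": 1, "realFollowerCount": 100},
    {"gender": 2, "realFollowerCount": 300},
    {"gender": 1, "realFollowerCount": 500},
]
print(filter_stats(docs, "gender", 1, "realFollowerCount"))
```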

Filters

Building on Filter, filters computes separate metrics for several filter conditions at once; that is, the metrics are aggregated for the result set of each filter.

{
  "size": 0,
  "aggs": {
    "gender_1_2_follower": {
      "filters": {
        "filters": [
          {"term": {"gender": 1}},
          {"term": {"gender": 2}}
        ]
      },
      "aggs": {
        "stats_follower": {"stats": {"field": "realFollowerCount"}}
      }
    }
  }
}

Range

{
  "size": 0,
  "aggs": {
    "follower_ranges": {
      "range": {
        "field": "realFollowerCount",
        "ranges": [
          {"to": 500},
          {"from": 500, "to": 1000},
          {"from": 1000, "to": 1500},
          {"from": 1500, "to": 2000},
          {"from": 2000}
        ]
      }
    }
  }
}

to: less than (the upper bound is excluded); from: greater than or equal to (the lower bound is included)
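The boundary rule (from is inclusive, to is exclusive) decides which bucket a value such as 500 or 1000 falls into. A minimal Python sketch of that rule, with hypothetical helper names and made-up follower counts:

```python
RANGES = [
    {"to": 500},
    {"from": 500, "to": 1000},
    {"from": 1000, "to": 1500},
    {"from": 1500, "to": 2000},
    {"from": 2000},
]


def in_range(value, bucket):
    """from is inclusive, to is exclusive; a missing bound is open-ended."""
    lower_ok = "from" not in bucket or value >= bucket["from"]
    upper_ok = "to" not in bucket or value < bucket["to"]
    return lower_ok and upper_ok


def range_counts(values, ranges):
    """Return the number of values falling into each range, in order."""
    return [sum(1 for v in values if in_range(v, r)) for r in ranges]


followers = [0, 499, 500, 999, 1000, 2000, 5000]  # made-up sample values
print(range_counts(followers, RANGES))
```

Because each upper bound equals the next lower bound, every value lands in exactly one bucket.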

Date Range

Similar to Range, except that the field is of date type and the range values are dates.

Date Histogram Aggregation

This feature is very useful: it buckets the data by year, month, or day.

Index the following documents:

DELETE my_blog

PUT my_blog
{
  "mappings": {
    "article": {
      "properties": {
        "title": {"type": "text"},
        "postdate": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}
      }
    }
  }
}

PUT my_blog/article/1
{"title": "Elasticsearch in Action", "postdate": "2014-09-23 23:34:12"}

PUT my_blog/article/2
{"title": "Spark in Action", "postdate": "2015-09-13 14:12:22"}

PUT my_blog/article/3
{"title": "Hadoop in Action", "postdate": "2016-08-23 23:12:22"}

Aggregate data on an annual basis:

GET my_blog/article/_search
{
  "size": 0,
  "aggs": {
    "agg_year": {
      "date_histogram": {
        "field": "postdate",
        "interval": "year",
        "order": {"_key": "asc"}
      }
    }
  }
}

Response:

{
  "took": 18,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 3, "max_score": 0, "hits": []},
  "aggregations": {
    "agg_year": {
      "buckets": [
        {"key_as_string": "2014-01-01 00:00:00", "key": 1388534400000, "doc_count": 1},
        {"key_as_string": "2015-01-01 00:00:00", "key": 1420070400000, "doc_count": 1},
        {"key_as_string": "2016-01-01 00:00:00", "key": 1451606400000, "doc_count": 1}
      ]
    }
  }
}

Aggregate data on a monthly basis:

GET my_blog/article/_search
{
  "size": 0,
  "aggs": {
    "agg_year": {
      "date_histogram": {
        "field": "postdate",
        "interval": "month",
        "order": {"_key": "asc"}
      }
    }
  }
}

In this way, every month between the earliest and the latest document gets a bucket, whether or not it contains any documents (empty buckets have a doc_count of 0).
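The monthly bucketing, including the empty buckets, can be reproduced for the three sample documents with a short Python sketch. It assumes that buckets are generated from the earliest to the latest month containing a document; the helper name `month_histogram` and the `YYYY-MM` bucket labels are illustrative and do not match the response format of Elasticsearch.

```python
from collections import Counter
from datetime import datetime

POSTDATES = [
    "2014-09-23 23:34:12",
    "2015-09-13 14:12:22",
    "2016-08-23 23:12:22",
]


def month_histogram(postdates, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (month label, doc count) pairs, including empty months."""
    counts = Counter()
    for text in postdates:
        d = datetime.strptime(text, fmt)
        counts[(d.year, d.month)] += 1
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    buckets = []
    while (year, month) <= last:
        # Counter returns 0 for months without documents.
        buckets.append(("%04d-%02d" % (year, month), counts[(year, month)]))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


buckets = month_histogram(POSTDATES)
print(len(buckets), buckets[0], buckets[-1])
```

For this data the histogram spans September 2014 through August 2016, so most of the buckets are empty.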

Aggregate data on a daily basis:

GET my_blog/article/_search
{
  "size": 0,
  "aggs": {
    "agg_year": {
      "date_histogram": {
        "field": "postdate",
        "interval": "day",
        "order": {"_key": "asc"}
      }
    }
  }
}

In this way, every day between the earliest and the latest document gets a bucket, whether or not it contains any documents.
