How to use Elasticsearch aggregations

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to use Elasticsearch aggregations and works through practical examples of the most common ones.

An aggregation summarizes the data matched by a search, and aggregations can be combined to carry out complex operations. Aggregations can group documents, compute statistics over them, and so on. With aggregations we get an overview of the data: the goal is to analyze and summarize the whole data set rather than to find individual documents.

Bucket Aggregation: groups documents into buckets, each holding the documents that meet a given criterion, similar to "GROUP BY" in MySQL.

Metric Aggregation: mathematical operations for statistical analysis of document fields, such as max, min, and sum.

Pipeline Aggregation: a second round of aggregation over the results of other aggregations.

Matrix Aggregation: operates on multiple fields and produces a result matrix. In version 7.x it is grouped under Metric Aggregation.

{
  "size": 0,
  ["query": {...},]?
  "aggs": {
    "${my_name}": {
      "${aggregation_type}": {...}
      [, "meta": {...}]?
      [, "aggs": {...}]?
    }
    [, "${my_name}": {...}]*
  }
}

Aggregations can be nested: in the syntax above, an "aggs" appears inside another "aggs".

"aggs" is an abbreviation; you can also write the full "aggregations".

The top-level "size" is generally set to 0, because aggregations are used for statistics and the matching documents do not need to be returned.

"query" is optional and restricts the documents that are aggregated.

${my_name} is a name you choose for the aggregation.
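
The request layout described above can be sketched in Python. This is only an illustration: build_agg_request is a hypothetical helper, not part of any Elasticsearch client, and the field names are assumed to come from the Kibana flights sample data.

```python
import json


def build_agg_request(name, agg_type, params, sub_aggs=None, query=None):
    """Assemble a search body with one named aggregation (hypothetical helper)."""
    agg = {agg_type: params}
    if sub_aggs:
        # Nested aggregations live in an "aggs" key next to the aggregation type.
        agg["aggs"] = sub_aggs
    body = {"size": 0}  # size 0: return statistics only, no documents
    if query is not None:
        body["query"] = query  # the query part is optional
    body["aggs"] = {name: agg}
    return body


body = build_agg_request(
    "dest_count",
    "terms",
    {"field": "DestCountry"},
    sub_aggs={"max_price": {"max": {"field": "AvgTicketPrice"}}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a "GET <index>/_search" line in the Kibana console.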

Use Kibana to import the "kibana_sample_data_flights" sample data set, which contains flight information such as region, price, and weather.

Path: Home -> Add data -> Sample data -> Sample flight data

1. Grouping aggregation

Group by destination country (DestCountry) and count the flights for each.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_count": {
      "terms": { "field": "DestCountry" }
    }
  }
}

2. Numeric interval grouping

Group by price range, for example how many flights cost 0 to 100 and how many cost 100 to 200.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "price_stat": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100
      }
    }
  }
}

Here "price_stat" is a name you choose, and "interval" sets the width of each bucket.
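
How histogram assigns a value to a bucket can be checked with a few lines of Python. This is a minimal sketch with made-up prices, assuming the round-down rule (the key is the value rounded down to a multiple of the interval), so a key of 100 covers values from 100 up to, but not including, 200.

```python
import math
from collections import Counter


def bucket_key(value, interval):
    """Round value down to the nearest multiple of interval."""
    return math.floor(value / interval) * interval


prices = [42.0, 99.99, 100.0, 150.5, 199.0, 730.25]  # made-up ticket prices
counts = Counter(bucket_key(p, 100) for p in prices)
for key in sorted(counts):
    print(key, counts[key])
```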

In the output, a bucket with key 100.0 holds the values from 100.0 up to, but not including, 200.0. The bucket key is calculated as follows:

bucket_key = Math.floor(value / interval) * interval

3. Date interval grouping

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "price_stat": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      }
    }
  }
}

Note: for the date interval, version 7.x uses "calendar_interval"; older versions use "interval".

Supported interval expressions:

Minute: minute, 1m

Hour: hour, 1h

Day: day, 1d

Week: week, 1w

Month: month, 1M

Quarter: quarter, 1q

Year: year, 1y
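
With a calendar interval of one month, each bucket is keyed by the start of its month. The grouping can be sketched in Python as follows; the timestamps are made up and UTC is assumed, and month_bucket is a hypothetical helper rather than an Elasticsearch function.

```python
from collections import Counter
from datetime import datetime, timezone


def month_bucket(ts):
    """Truncate a timestamp to the first instant of its calendar month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


timestamps = [  # made-up flight timestamps
    datetime(2025, 6, 30, 23, 59, tzinfo=timezone.utc),
    datetime(2025, 7, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2025, 7, 19, 8, 30, tzinfo=timezone.utc),
]
counts = Counter(month_bucket(ts) for ts in timestamps)
for start in sorted(counts):
    print(start.date().isoformat(), counts[start])
```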

Metric aggregations compute metrics from values extracted, in one way or another, from the documents being aggregated. The values are usually taken from document fields, but they can also be generated by scripts.

Supported metrics include max, min, count (value_count), sum, avg, stats (several statistics at once), cardinality (distinct count), percentiles, and geo_bounds (geographic bounding box).

1. Maximum and minimum values

Output the maximum and minimum flight price.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "max_price": { "max": { "field": "AvgTicketPrice" } },
    "min_price": { "min": { "field": "AvgTicketPrice" } }
  }
}

2. Nested aggregation

Output the maximum and minimum flight price for each destination.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_count": {
      "terms": { "field": "DestCountry" },
      "aggs": {
        "max_price": { "max": { "field": "AvgTicketPrice" } },
        "min_price": { "min": { "field": "AvgTicketPrice" } }
      }
    }
  }
}

3. stats

Output several statistics at once: count, min, max, sum, and avg.

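GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_stats": {
      "stats": { "field": "AvgTicketPrice" }
    }
  }
}

4. cardinality

Counts the number of distinct values (a deduplicated count). The result is an approximation, since Elasticsearch computes it with the HyperLogLog++ algorithm.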

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_cardinality": {
      "cardinality": { "field": "DestCountry" }
    }
  }
}

5. top_hits

top_hits returns the top few matching documents in each bucket.

To get the cheapest flights to each country: in the query below, "size": 5 returns five destination countries, and "size": 2 returns the two lowest-priced flights for each of them.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_count": {
      "terms": { "field": "DestCountry", "size": 5 },
      "aggs": {
        "my_min_price": {
          "top_hits": {
            "size": 2,
            "sort": [
              { "AvgTicketPrice": { "order": "asc" } }
            ]
          }
        }
      }
    }
  }
}

6. range: custom range grouping

For example, group prices into below 200, 200 to 500, and above 500. You can also specify the key that is output for each bucket.
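
The boundary behaviour of these ranges is easy to get wrong: "from" is inclusive and "to" is exclusive, so a price of exactly 200 lands in the second bucket. A minimal Python sketch of that rule follows. The keys "<200" and "200-500" are made up for illustration, and range_bucket is a hypothetical helper that assumes the ranges do not overlap.

```python
def range_bucket(value, ranges):
    """Return the key of the first range containing value."""
    for r in ranges:
        low = r.get("from", float("-inf"))  # a missing "from" means unbounded below
        high = r.get("to", float("inf"))    # a missing "to" means unbounded above
        if low <= value < high:             # "from" inclusive, "to" exclusive
            return r["key"]
    return None


ranges = [
    {"key": "<200", "to": 200},
    {"key": "200-500", "from": 200, "to": 500},
    {"key": ">500", "from": 500},
]
for price in (199.99, 200, 499.5, 500):
    print(price, range_bucket(price, ranges))
```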

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_price_range": {
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          { "to": 200 },
          { "from": 200, "to": 500 },
          { "key": ">500", "from": 500 }
        ]
      }
    }
  }
}

7. Percentile aggregation

The results of a percentile aggregation can be used to evaluate the distribution of the data, for example to judge whether it is skewed or bimodal. It is often used in stress testing: the value at the 95th percentile is greater than or equal to 95% of all observed values. Suppose the result is "10%: 12ms, ..., 70%: 55ms, 99%: 100ms". This means that under normal circumstances (up to the 70th percentile) response times lie between 12ms and 55ms, and 99% of the web pages load within 100ms.
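
To make the reading of such a result concrete, here is a small Python sketch that computes percentiles with the exact nearest-rank method over made-up response times. This is not how Elasticsearch computes them (its percentiles aggregation is approximate), and the helper only handles whole-number percentages.

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    # Integer ceiling of p% of n, so no floating-point rounding is involved.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


latencies_ms = [12, 18, 25, 31, 40, 47, 55, 70, 85, 100]  # made-up response times
for p in (10, 70, 99):
    print(f"{p}%: {percentile(latencies_ms, p)}ms")
```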

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_price_percentiles": {
      "percentiles": {
        "field": "AvgTicketPrice",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

8. Geographical boundary aggregation

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_geo_bounds": {
      "geo_bounds": {
        "field": "DestLocation",
        "wrap_longitude": true
      }
    }
  }
}

9. Optimizing the performance of Terms aggregation

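Setting eager_global_ordinals to true in the field mapping preloads the global ordinals that Terms aggregation relies on into memory, so they do not have to be built on the first query.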
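
As an illustration, the mapping update could look like the body built below. This is a sketch under assumptions: it targets the DestCountry field of the sample index and assumes that field is mapped as keyword.

```python
import json

index = "kibana_sample_data_flights"
mapping_update = {
    "properties": {
        "DestCountry": {
            "type": "keyword",  # assumed mapping type of this field
            "eager_global_ordinals": True,
        }
    }
}
# Print the request in Kibana console form.
print(f"PUT {index}/_mapping")
print(json.dumps(mapping_update, indent=2))
```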

A pipeline aggregation analyzes the results of other aggregations again.

Pipeline aggregations fall into two categories:

Sibling: the result sits at the same level as the existing aggregation results. Examples: min_bucket, max_bucket, avg_bucket, sum_bucket, stats_bucket, percentiles_bucket.

Parent: the result is embedded in the existing aggregation results. Examples: derivative (the difference from the previous bucket, used to see the trend), cumulative_sum (running total), moving_avg (moving average, the average of the data in a fixed-size window).

Note: the buckets_path parameter specifies the path to the metric being analyzed. If the path spans two levels, separate the levels with ">".
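
The difference between the two categories can be sketched in Python over a made-up list of per-bucket averages. A sibling-style result is a single summary placed next to the buckets, while a parent-style result adds one value inside each bucket. The variable names are illustrative only.

```python
from itertools import accumulate

# Made-up average price per bucket, standing in for the output of an "avg" metric.
avg_price = [500.0, 520.0, 480.0, 610.0]

# Sibling style (like stats_bucket): one summary for the whole bucket list.
stats = {
    "count": len(avg_price),
    "min": min(avg_price),
    "max": max(avg_price),
    "avg": sum(avg_price) / len(avg_price),
    "sum": sum(avg_price),
}

# Parent style (like derivative and cumulative_sum): one value per bucket.
# The first bucket has no previous bucket, so it has no derivative.
derivative = [None] + [b - a for a, b in zip(avg_price, avg_price[1:])]
cumulative = list(accumulate(avg_price))

print(stats)
print(derivative)
print(cumulative)
```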

1. Example of a Sibling aggregation

Compute the average ticket price for each destination, then compute statistics over those averages.

Note that my_distance, my_avg_price, and my_result are names you choose yourself, and buckets_path specifies the path.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_distance": {
      "terms": { "field": "DestCountry" },
      "aggs": {
        "my_avg_price": { "avg": { "field": "AvgTicketPrice" } }
      }
    },
    "my_result": {
      "stats_bucket": { "buckets_path": "my_distance>my_avg_price" }
    }
  }
}

2. Example of a Parent aggregation

Compute the average fare for every 50 km of distance and look at how it changes from one bucket to the next.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_distance": {
      "histogram": { "field": "DistanceKilometers", "interval": 50 },
      "aggs": {
        "my_avg_price": { "avg": { "field": "AvgTicketPrice" } },
        "my_result": { "derivative": { "buckets_path": "my_avg_price" } }
      }
    }
  }
}

Sort by document count (_count); buckets with the same count are then sorted by key (_key).

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_count": {
      "terms": {
        "field": "DestCountry",
        "order": [
          { "_count": "asc" },
          { "_key": "desc" }
        ]
      }
    }
  }
}

Sort by the result of a sub-aggregation, such as my_stats below.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "my_distance": {
      "terms": {
        "field": "DestCountry",
        "order": { "my_stats.min": "asc" }
      },
      "aggs": {
        "my_stats": { "stats": { "field": "AvgTicketPrice" } }
      }
    }
  }
}

Aggregation analysis

Terms aggregation results can be inaccurate because the data is spread across multiple shards, and the coordinating node cannot see the complete data set.

Set show_term_doc_count_error to true to also get an error bound for each bucket. Two values in the response help to judge accuracy:

doc_count_error_upper_bound: an upper bound on the document count of a term that was left out of the returned buckets.

sum_other_doc_count: the total number of documents in the buckets that were not returned (the overall total minus the total of the returned buckets).

A case where Terms aggregation is inaccurate

So how can this be handled?

Option 1: when the amount of data is small, set the number of primary shards to 1 so that the results are exact.

Option 2: when the data is distributed over several shards, set the shard_size parameter to improve accuracy. How it works: each shard returns more candidate terms than requested, which increases accuracy but also increases the response time.

The default value is "shard_size = size * 1.5 + 10", and it can be changed according to your needs.
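
The effect of shard_size can be reproduced with a simplified Python model of three shards. This is a sketch, not how Elasticsearch is implemented: the per-shard term counts are made up, terms_agg is a hypothetical helper, and truncating the default to an integer is an assumption of the sketch.

```python
from collections import Counter


def default_shard_size(size):
    """Default from the text: size * 1.5 + 10 (truncated to an integer here)."""
    return int(size * 1.5 + 10)


def terms_agg(shards, size, shard_size):
    """Merge each shard's local top terms, as a coordinating node would."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own shard_size most frequent terms.
        for term, count in shard.most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Term "b" is the most frequent overall (15) but is never first on any shard.
shards = [
    Counter({"a": 8, "b": 5}),
    Counter({"c": 7, "b": 5}),
    Counter({"d": 6, "b": 5}),
]
print(terms_agg(shards, size=1, shard_size=1))  # "b" is missed
print(terms_agg(shards, size=1, shard_size=2))  # "b" is found
print(default_shard_size(10))
```

With shard_size equal to 1, no shard reports "b", so the merged result picks "a" even though "b" has the highest true count. Asking each shard for one more term is enough to fix the result in this small case.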

