Elasticsearch Aggregation notes

2025-07-12 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

Overview of Aggregation

Aggregations can coexist with ordinary query results, and a single request can contain several unrelated aggregations. If you only care about the aggregation results and not the matching documents, set the size of the SearchSource to 0, which avoids fetching hits and can noticeably improve performance.
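As a minimal sketch of the point above, the request body below is written as a plain Python dict that mirrors the JSON sent to the search endpoint. The field names `amount` and `payerId` and the aggregation names are assumptions made for illustration; they do not come from the article.

```python
import json


def build_aggregation_only_request():
    """Build a search body that returns aggregation results but no hits."""
    return {
        "size": 0,  # do not return matching documents, only aggregations
        "query": {"match_all": {}},
        "aggs": {
            # Two unrelated aggregations in the same request.
            # "amount" and "payerId" are hypothetical field names.
            "total_amount": {"sum": {"field": "amount"}},
            "distinct_payers": {"cardinality": {"field": "payerId"}},
        },
    }


body = build_aggregation_only_request()
print(json.dumps(body, indent=2))
```

With `size` set to 0 the response still carries the `aggregations` section, but its list of hits is empty.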

Aggregation Types

Metrics:

A simple aggregation type that computes aggregate metrics over all documents in the target set and generally has no nested sub aggregations. Examples are average (avg), summation (sum), count (value_count) and cardinality (cardinality), where cardinality corresponds to a distinct count.

Bucketing:

The bucket aggregation type: aggregate metrics are computed over a series of buckets rather than over all documents, and each bucket represents the subset of the original result set that meets a certain condition. Bucket aggregations generally contain nested sub aggregations. Typical examples are TermsAggregation and HistogramAggregation.

Matrix:

Matrix aggregation is multi-dimensional: a two-dimensional or even multi-dimensional table of aggregate metrics is computed from two or more aggregation dimensions. There currently seems to be only one kind, the matrix stats aggregation (matrix_stats), and it does not support scripting.
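For illustration only, a matrix stats request could look like the sketch below, again written as a Python dict that mirrors the JSON body. The field names `amount` and `fee` and the aggregation name `field_stats` are hypothetical.

```python
def build_matrix_stats_request(fields):
    """Build a matrix_stats request over the given numeric fields."""
    return {
        "size": 0,
        "aggs": {
            # matrix_stats takes a list of fields and computes statistics
            # across them, such as covariance and correlation.
            "field_stats": {"matrix_stats": {"fields": list(fields)}}
        },
    }


matrix_request = build_matrix_stats_request(["amount", "fee"])
```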

Pipeline:

Pipeline aggregation is computed on top of the results of previous aggregations and is often combined with Bucketing aggregations. For example: first calculate the total transaction amount per day over the past 30 days (Bucketing aggregation), then count the number of days on which the total exceeds 10000 (Pipeline aggregation).

Aggregation structure

Aggregation request:

Two-tier structure:

Aggregation -> SubAggregation

A sub aggregation performs a further aggregate calculation within the results of its parent Aggregation.
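The two-tier request structure can be sketched as follows, assuming a hypothetical index with a `type` field (transaction type) and an `amount` field. The outer `terms` aggregation forms the buckets and the nested `aggs` block holds the sub aggregation computed inside each bucket.

```python
def build_total_by_type_request():
    """Terms aggregation (first tier) with a sum sub aggregation (second tier)."""
    return {
        "size": 0,
        "aggs": {
            "by_type": {  # first tier: one bucket per transaction type
                "terms": {"field": "type"},
                "aggs": {  # second tier: computed inside every bucket
                    "total_amount": {"sum": {"field": "amount"}}
                },
            }
        },
    }


request = build_total_by_type_request()
```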

Aggregation response:

Three-tier structure (for Bucketing aggregation): MultiBucketsAggregation -> Buckets -> Aggregations

Aggregation attribute:

Name: corresponds to the name of the Aggregation in the request

Buckets: each Bucket holds one possible value of the aggregation dimension in the Aggregation result, together with the aggregate result for that value.

Bucket attribute:

Key: the value of the aggregation dimension for this bucket. What the key is depends on the type of Aggregation. For a Terms aggregation (for example, total amount by transaction type), the bucket keys are all the transaction types that occur (credit, debit, etc.). For a DateHistogram aggregation (for example, the number of transactions per day), the bucket keys are the specific dates.

DocCount: the number of documents in the bucket.

Value: the computed value of the aggregate metric. Note that in a multi-tier Aggregation the intermediate tiers, such as a Terms aggregation, generally carry no value; only the metric Aggregation at the bottom tier has one.

Aggregations: the results of the sub aggregations of the current Aggregation in the request (if any).
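The three-tier response structure can be walked as in the sketch below. The response is a hand-written sample, not real output, and it uses the JSON key names of the REST response (`key`, `doc_count`, `value`), which correspond to the Key, DocCount and Value attributes described above. The aggregation names `by_type` and `total_amount` are hypothetical.

```python
# Hand-written sample response for a terms aggregation with a sum sub aggregation.
sample_response = {
    "aggregations": {
        "by_type": {
            "buckets": [
                {"key": "credit", "doc_count": 3, "total_amount": {"value": 120.5}},
                {"key": "debit", "doc_count": 2, "total_amount": {"value": 80.0}},
            ]
        }
    }
}


def totals_by_key(response, bucket_agg, metric_agg):
    """Map each bucket key to (document count, metric value)."""
    buckets = response["aggregations"][bucket_agg]["buckets"]
    return {b["key"]: (b["doc_count"], b[metric_agg]["value"]) for b in buckets}


result = totals_by_key(sample_response, "by_type", "total_amount")
```

The terms aggregation itself has no value of its own; the values sit on the sub aggregation inside each bucket.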

Mapping SQL to Aggregation

Premise of the SQL mapping: it covers aggregate computation only, that is, the SELECT part of the SQL contains at least one column with an aggregate function.

The mapping process is hard to describe directly; examples make it easier to follow. In any case, the structure of SQL is nothing more than SELECT/FROM/WHERE/GROUP BY/HAVING/ORDER BY. ORDER BY is not discussed for now, since aggregate results generally do not depend much on order. FROM is also easy to understand: it is the name of the index.

ES builders corresponding to each SQL component:

| SQL component | ES aggregation builder | Code to build |
| --- | --- | --- |
| select column (aggregate function) | MetricsAggregationBuilder, determined by the column's aggregate function (e.g. MaxAggregationBuilder) | |
| select column (group by field) | Bucket key | |
| where | FiltersAggregationBuilder + FiltersAggregator.KeyedFilter | keyedFilter = new FiltersAggregator.KeyedFilter("combineCondition", sub QueryBuilder); AggregationBuilders.filters("whereAggr", keyedFilter) |
| group by | TermsAggregationBuilder | AggregationBuilders.terms("aggregation name").field(fieldName) |
| having | MetricsAggregationBuilder, determined by the aggregate function in the having condition (e.g. MaxAggregationBuilder) + BucketSelectorPipelineAggregationBuilder | PipelineAggregatorBuilders.bucketSelector(aggregationName, bucketPathMap, script) |
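As a rough sketch of the WHERE row, the request below wraps a combined condition in a keyed filters aggregation and computes a metric inside it. The names `whereAggr` and `combineCondition` follow the table; the condition itself (`type` equal to a given value and `amount` at least a given minimum) and the `max_amount` metric are assumptions made for illustration.

```python
def build_where_filter_request(payment_type, min_amount):
    """Express a WHERE clause as a keyed filters aggregation."""
    # Hypothetical condition: type = payment_type AND amount >= min_amount
    combined_condition = {
        "bool": {
            "must": [
                {"term": {"type": payment_type}},
                {"range": {"amount": {"gte": min_amount}}},
            ]
        }
    }
    return {
        "size": 0,
        "aggs": {
            "whereAggr": {
                # Keyed form: each named filter produces one bucket.
                "filters": {"filters": {"combineCondition": combined_condition}},
                # Metrics are computed only over documents in that bucket.
                "aggs": {"max_amount": {"max": {"field": "amount"}}},
            }
        },
    }


where_request = build_where_filter_request("credit", 100)
```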

ES builders corresponding to commonly used SQL operators and aggregate functions:

| SQL element | Aggregation type | Code to build |
| --- | --- | --- |
| count(field) | ValueCountAggregationBuilder | AggregationBuilders.count(metricsName).field(fieldName) |
| count(distinct field) | CardinalityAggregationBuilder | AggregationBuilders.cardinality(metricsName).field(fieldName) |
| sum(field) | SumAggregationBuilder | AggregationBuilders.sum(metricsName).field(fieldName) |
| min(field) | MinAggregationBuilder | AggregationBuilders.min(metricsName).field(fieldName) |
| max(field) | MaxAggregationBuilder | AggregationBuilders.max(metricsName).field(fieldName) |
| avg(field) | AvgAggregationBuilder | AggregationBuilders.avg(metricsName).field(fieldName) |
| AND | BoolQueryBuilder | QueryBuilders.boolQuery().must().add(sub QueryBuilder) |
| OR | BoolQueryBuilder | QueryBuilders.boolQuery().should().add(sub QueryBuilder) |
| NOT | BoolQueryBuilder | QueryBuilders.boolQuery().mustNot().add(sub QueryBuilder) |
| = | TermQueryBuilder | QueryBuilders.termQuery(fieldName, value) |
| IN | TermsQueryBuilder | QueryBuilders.termsQuery(fieldName, values) |
| LIKE | WildcardQueryBuilder | QueryBuilders.wildcardQuery(fieldName, value) |
| > | RangeQueryBuilder | QueryBuilders.rangeQuery(fieldName).gt(value) |
| >= | RangeQueryBuilder | QueryBuilders.rangeQuery(fieldName).gte(value) |

… 1530.20

The most complex SQL in history! The focus here is the handling of the HAVING part, which uses the Pipeline-type BucketSelectorPipelineAggregationBuilder. Two kinds of child nodes are added under the Terms aggregation that corresponds to the last GROUP BY condition:

Sub aggregations: include not only the aggregate functions of the SELECT part but also the aggregate functions used in the HAVING condition.

Pipeline aggregations: include the BucketSelectorPipelineAggregationBuilder that corresponds to the HAVING condition.

The main attributes of BucketSelectorPipelineAggregationBuilder are:

bucketsPathMap: maps each path name to the corresponding aggregate attribute.

Script: describes the HAVING condition as a script, in which the left side of the condition is written with the path name rather than directly with the attribute name.

Note that although the HAVING condition logically applies to the aggregate results computed earlier, in the structure of the ES Aggregation the BucketSelectorPipelineAggregationBuilder and the Aggregation that computes the metric used in the HAVING condition are siblings, not parent and child!

Also note that the path used by the script is a path relative to the sibling nodes, not an absolute path from the root node, and that it uses the name of the aggregate attribute rather than the name of the Aggregation itself. Moreover, the path must resolve to a single Bucket: the BucketSelector only decides whether the current Bucket is selected according to the condition, and this Boolean test cannot be applied if the path returns multiple Buckets.
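To make the sibling relationship concrete, here is a sketch for a hypothetical statement such as `select type, sum(amount) from Payment group by type having sum(amount) > 1000`. This statement, the field names and the threshold are assumptions for illustration, not one of the article's examples. The sum aggregation and the bucket selector both sit directly under the terms aggregation, and `buckets_path` refers to the sum by a relative path.

```python
def build_having_request(threshold):
    """GROUP BY type HAVING sum(amount) > threshold, as a bucket_selector."""
    return {
        "size": 0,
        "aggs": {
            "by_type": {
                "terms": {"field": "type"},
                "aggs": {
                    # Sub aggregation: the metric used by SELECT and HAVING.
                    "total_amount": {"sum": {"field": "amount"}},
                    # Pipeline aggregation: a sibling of total_amount.
                    "having_filter": {
                        "bucket_selector": {
                            # Path name "total" -> sibling metric, relative path.
                            "buckets_path": {"total": "total_amount"},
                            # The script refers to the path name, not the field.
                            "script": f"params.total > {threshold}",
                        }
                    },
                },
            }
        },
    }


having_request = build_having_request(1000)
```

Buckets for which the script evaluates to false are dropped from the response, which is what gives the HAVING behaviour.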

6. select count(paymentId) from Payment group by timeRange(createdAt, '1D', 'yyyy/MM/dd')

Here a custom function, timeRange, indicates that the attribute createdAt is aggregated by day; the corresponding ES aggregation type is DateHistogramAggregation.
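A possible request body for this statement is sketched below. The aggregation names `per_day` and `payment_count` are hypothetical, and `calendar_interval` is the parameter name used by newer Elasticsearch versions; older versions use `interval` instead.

```python
def build_daily_count_request():
    """count(paymentId) grouped by day of createdAt."""
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": "createdAt",
                    "calendar_interval": "1d",  # one bucket per day
                    "format": "yyyy/MM/dd",  # format of key_as_string
                },
                "aggs": {
                    # count(paymentId) maps to a value_count metric
                    "payment_count": {"value_count": {"field": "paymentId"}}
                },
            }
        },
    }


daily_request = build_daily_count_request()
```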

Other considerations

Bucket count

Distinct count: Elasticsearch uses an approximate algorithm based on HyperLogLog++, so the result of a cardinality aggregation is an estimate rather than an exact count.
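The accuracy of the approximation can be tuned with the `precision_threshold` option of the cardinality aggregation. The sketch below assumes a hypothetical helper and aggregation name; counts below the threshold are expected to be close to exact, at the cost of more memory, and Elasticsearch treats values above 40000 as 40000.

```python
MAX_PRECISION_THRESHOLD = 40000  # values above this have no extra effect


def build_distinct_count_request(field, precision_threshold=3000):
    """count(distinct field) as a cardinality aggregation."""
    # Clamp locally so the request reflects the effective threshold.
    effective = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,
        "aggs": {
            "distinct_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": effective,
                }
            }
        },
    }


distinct_request = build_distinct_count_request("paymentId", 50000)
```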

Reference

https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
