Does increasing the number of shards make aggregations faster in Elasticsearch?

2025-07-09 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article examines whether increasing the number of shards in Elasticsearch makes aggregations faster.

During an aggregation test, we hoped to make the aggregation finish faster by increasing the number of shards. We prepared an index with 260 million docs, 70 GB in size, and 2 shards, and named it index2. We then split it into 40 shards to produce a new index named index40.

The cluster has two nodes, the JVM is configured with 30 GB, and each index has been force-merged. The cluster was left idle before the aggregation test was run. The purpose of the test is to measure the speed of bucket aggregation on high-cardinality data, to understand how it works, and to see what can be optimized. To generate a large number of buckets, the test uses deeply nested aggregations:

{
  "aggs": {
    "sip": {
      "terms": {
        "field": "sip",
        "size": 10000
      },
      "aggs": {
        "dip": {
          "terms": {
            "field": "dip",
            "size": 10000
          },
          "aggs": {
            "proto": {
              "terms": {
                "field": "proto",
                "size": 10000
              }
            }
          }
        }
      }
    }
  }
}
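The nested request can also be built programmatically, which makes it easier to see how quickly the bucket count can grow. The following is a minimal sketch; the helper names `nested_terms` and `worst_case_buckets` are illustrative and not part of any Elasticsearch client, and the bound only describes the final response (at most `size` buckets per level under each parent), not what the data actually produces.

```python
import json


def nested_terms(fields, size):
    """Build a terms aggregation nested one level per field, outermost first."""
    body = None
    # Build from the innermost field outward so each level wraps the previous one.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if body is not None:
            level["aggs"] = body
        body = {field: level}
    return {"aggs": body}


def worst_case_buckets(levels, size):
    """Upper bound on buckets in the final response: size + size^2 + ... ."""
    return sum(size ** depth for depth in range(1, levels + 1))


request_body = nested_terms(["sip", "dip", "proto"], 10000)
print(json.dumps(request_body, indent=2))
print(worst_case_buckets(3, 10000))
```

Real data rarely reaches the bound, but the calculation shows why three levels of size 10000 can create a very large number of temporary objects.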

By default, Elasticsearch limits a search to 5 concurrent shard requests per node. To raise parallelism, the test sets the max_concurrent_shard_requests parameter. The expectation was that having all shards aggregate at the same time would greatly shorten the aggregation, but the result was surprising: with higher aggregation parallelism, query latency barely changed, while CPU utilization rose sharply. The table is as follows:
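As a rough illustration of how the parameter is passed, the sketch below builds the search URL with max_concurrent_shard_requests as a query-string parameter. The host `http://localhost:9200` and the helper name `search_url` are assumptions for the example; the value 40 simply matches the shard count of index40.

```python
from urllib.parse import urlencode


def search_url(host, index, **params):
    """Return the _search URL for `index` with the given query-string parameters."""
    query = urlencode(params)
    return f"{host}/{index}/_search" + (f"?{query}" if query else "")


# The host is a placeholder; 40 lets every shard of index40 be queried at once.
url = search_url("http://localhost:9200", "index40", max_concurrent_shard_requests=40)
print(url)
```

The aggregation body is then sent to this URL as the request payload.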

Why doesn't higher aggregation parallelism improve execution speed?

The aggregation latency of index2 is basically the same as that of index40 with 40-way concurrency, while CPU utilization is significantly higher. Where does the extra CPU go? The output of hot_threads and jstack shows only the aggregation code running, nothing unusual, and profiling does not reveal the problem either. There is almost no disk I/O during the aggregation.

It looked as if aggregation speed had nothing to do with shard size. To verify this idea, the parameter preference=_shards:0 was added to the aggregation request so that only shard 0 is used, and the result came back in 3 seconds. The more shards involved in the aggregation, the slower it returns, so the idea was overturned. At the same time, strange changes in CPU utilization were observed. As shown in the figure, when aggregating over four shards, at first (left side of the figure) four cores are fully loaded, as expected, and then the utilization of many other cores rises. What is the system doing?
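To repeat this experiment over a growing number of shards, the preference values can be generated rather than typed by hand. This is a small sketch with hypothetical helper names (`shard_preference`, `sweep`); it relies only on the `_shards:` preference form used above, with several shard numbers separated by commas. Note that which node hosts each shard also affects the result, and this sketch does not account for that.

```python
def shard_preference(shard_ids):
    """Format the `preference` value that restricts a search to the given shards."""
    return "_shards:" + ",".join(str(shard) for shard in shard_ids)


def sweep(max_shards):
    """Yield preference values for 1, 2, 4, ... shards, capped at max_shards."""
    count = 1
    while count <= max_shards:
        yield shard_preference(range(count))
        count *= 2


for preference in sweep(4):
    print(preference)
```

Timing the same aggregation once per generated value shows how latency changes as more shards take part.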

Aggregation runs only in the search thread pool, and the aggregation of a single shard is not executed concurrently. So what are the other cores busy with? Running jstack several times and comparing the hot threads pointed to the most suspicious call chain:

at org.apache.lucene.store.ByteBufferIndexInput.buildSlice(ByteBufferIndexInput.java:277)
....
org.elasticsearch.search.aggregations.LeafBucketCollector#collect(int, long)

Looking at the buildSlice code shows that it allocates a new ByteBuffer array on every call:

final ByteBuffer slices[] = new ByteBuffer[endIndex - startIndex + 1];

collect handles only one document per call. Throughout the aggregation, this is the only place that needs to allocate large numbers of ByteBuffer objects dynamically, so there must be a memory problem. Checking with jstat during the index40 aggregation with max_concurrent_shard_requests=40 showed very frequent GC. Some screenshots are as follows:

During the entire aggregation, GC ran for about 9 seconds. Neither hot_threads nor jstack points directly at the GC threads as a hotspot, which is why the investigation took a detour. Combined with the GC log, this confirms that the extra cores were busy with concurrent GC.

By comparison, during the index2 aggregation YGC ran only 2 times, and the total GC time (GCT) was less than 1 second.
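The YGC and GCT comparison can be computed from two jstat samples taken before and after the aggregation. The sketch below parses `jstat -gcutil` style output by column name and reports the difference. The helper names are illustrative, and the numbers in SAMPLE are made-up placeholders used only to exercise the parser; they are not the measurements from this test.

```python
def parse_gcutil(text):
    """Parse `jstat -gcutil` output into one dict per sample line."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]

    def to_number(cell):
        # jstat prints "-" for columns that do not apply; keep those as None.
        return None if cell == "-" else float(cell)

    return [dict(zip(header, map(to_number, row))) for row in rows]


def gc_delta(samples):
    """Return young-GC count and total GC seconds between first and last sample."""
    first, last = samples[0], samples[-1]
    return {
        "YGC": int(last["YGC"] - first["YGC"]),
        "GCT": round(last["GCT"] - first["GCT"], 3),
    }


# Placeholder values, NOT the article's measurements.
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.10  12.30  40.00  95.00  90.00    100    3.000     2    0.500    3.500
  0.00  50.20  60.70  42.00  95.00  90.00    130    5.250     2    0.500    5.750
"""
print(gc_delta(parse_gcutil(SAMPLE)))
```

Reading columns by header name rather than by position keeps the parser working if the JDK version in use prints additional columns.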

The problem has been located at this point.

This leads to some thoughts:

The collect step of aggregation is too inefficient: it handles one document at a time, whereas many compute engines now use vectorization to process a batch of data at once.

During aggregation, memory is allocated dynamically too often, producing a large number of temporary objects and putting heavy pressure on YGC.

Adding shards and raising aggregation parallelism does not necessarily speed up aggregation; the memory pressure created by the aggregation statements has to be considered. In today's example, if the 40 shards were spread across more nodes, GC would not be a problem and the overall aggregation should be much faster. Likewise, if the aggregation produced fewer buckets, raising parallelism could noticeably increase the overall aggregation speed.

Aggregations need to take node memory pressure into account, but this is hard to quantify, so it is advisable to run a stress test before going to production. Bucket aggregation on high-cardinality data is demanding.
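The first two points can be illustrated with a toy model. The sketch below is not Elasticsearch or Lucene code; it only counts how many temporary objects are created when values are collected one document at a time versus in batches. The function names, the sample values, and the use of a Python list as the "temporary object" are all assumptions made for illustration.

```python
def collect_per_doc(values):
    """Count terms one document at a time, creating a temporary on every call."""
    counts, allocations = {}, 0
    for value in values:
        scratch = [value]  # stands in for the per-call temporary object
        allocations += 1
        counts[scratch[0]] = counts.get(scratch[0], 0) + 1
    return counts, allocations


def collect_batched(values, batch_size):
    """Count terms in batches, creating one temporary per batch."""
    counts, allocations = {}, 0
    for start in range(0, len(values), batch_size):
        scratch = values[start:start + batch_size]  # one temporary per batch
        allocations += 1
        for value in scratch:
            counts[value] = counts.get(value, 0) + 1
    return counts, allocations


values = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]
per_doc_counts, per_doc_allocs = collect_per_doc(values)
batched_counts, batched_allocs = collect_batched(values, batch_size=4)
print(per_doc_allocs, batched_allocs)
```

Both functions produce the same counts; only the number of short-lived objects differs, which is the quantity that drives young-generation GC pressure in the scenario described above.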
