

How to optimize Elasticsearch write speed


This article explains how to optimize the write speed of Elasticsearch. It is quite practical, so it is shared here as a reference; follow along for a closer look.

The version used for this optimization is 7.9.2. ES versions move so fast that we are now well past the 5.x era.

1. Which operations take up resources

To optimize, you first need to understand the write path of ES and which steps are the most time-consuming.

First of all, there is the question of replicas. To guarantee even minimal high availability, the number of replicas here is set to 1, and that cost cannot be avoided. Setting the number of replicas to 0 is therefore only appropriate during the initial bulk import of data.
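As a minimal sketch of that idea, replicas can be dropped before the initial import and restored afterwards via the settings API (the index name my-index below is a placeholder):

curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/my-index/_settings' -d '{
  "index.number_of_replicas": 0
}'

# ...run the initial bulk import, then restore one replica:
curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/my-index/_settings' -d '{
  "index.number_of_replicas": 1
}'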

A piece of data goes through several steps before it is finally persisted. Along the way there is also a safety mechanism, the translog.

The underlying storage of ES is Lucene, which consists of a series of inverted indexes. Such an index is called a segment. However, a record is not written directly to a segment; it is written to a buffer first.

When the buffer is full, or the data has stayed in the buffer long enough to hit the refresh interval (important), the contents of the buffer are written into a segment in one go.

This is why the refresh_interval setting can seriously affect performance. If you do not need high real-time visibility, you might as well make it a little larger.

The buffer defaults to 10% of the heap, with a minimum of 48mb, shared among the shards on the node. If you have many indexes and heavy writes, this memory footprint is considerable and can be increased appropriately.
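As a sketch, the node-level indexing buffer can be raised in elasticsearch.yml; the percentages below are only illustrative choices, not recommendations:

# elasticsearch.yml -- node-level indexing buffer (values are illustrative)
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb

These are static node settings, so they require a restart, unlike the dynamic index settings shown later.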

2. Start optimization

There are three main actions involved in writing data: flush, refresh, and merge. By adjusting their behavior you trade off between performance and data reliability.

Flush

As the introduction above shows, the translog records the full stream of writes, somewhat like the binlog in MySQL or the AOF in Redis, to keep data safe in abnormal situations.

This is because after we write data, we still have to call fsync to flush it to disk. If we do not, the data can be lost when the machine loses power.

By default, ES flushes the translog once for every request, but this is not necessary for logs. You can make this process asynchronous with the following parameters:

Curl-H "Content-Type: application/json"-XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true'-d' {"index.translog.durability": "async", "index.translog.flush_threshold_size": "512mb", "index.translog.sync_interval": "60s"}'

This is arguably the most important optimization step and has the greatest impact on performance, but in extreme cases some data can be lost. For a logging system, that is bearable.

Refresh

In addition to writing the translog, ES writes the data to a buffer. But note: at this point the contents of the buffer cannot be searched; they must first be written into a segment.

That is the refresh action, which by default happens every 1 second. In other words, the data you write cannot be searched until up to 1 second later.

So ES is not a real-time search system; it is a near-real-time system.

This refresh interval can be modified through index.refresh_interval.

For a logging system, it is of course fine to make it larger. Xjjdog sets it to 120s here; flushing into segments less often naturally makes writes faster.

Curl-H "Content-Type: application/json"-XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true'-d' {"index.refresh_interval": "120s"}'

Merge

Merge is actually a Lucene mechanism. It mainly merges small segments into larger ones to improve retrieval speed.

The reason is that the refresh process produces a large number of small segment files, and deletions also leave space fragments. So merge, in plain terms, is like a defragmentation process. PostgreSQL similarly has a vacuum process that does the same kind of work.
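To see how fragmented an index currently is, the _cat/segments API lists the segments per shard (my-index below is a placeholder name):

curl 'http://localhost:9200/_cat/segments/my-index?v'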

Obviously, this kind of housekeeping not only consumes I/O, it also wastes CPU.

Merge has three strategies.

tiered: the default option. It merges index segments of similar size, taking into account the maximum number of segments allowed per tier.

log_byte_size: uses the logarithm of the byte size as the unit of calculation and selects multiple segments to merge into a new one.

log_doc: uses the number of documents in a segment as the unit of calculation and selects multiple segments to merge into a new one.

Each strategy has very detailed, targeted configuration options, which will not be covered in detail here.

Since a logging system does not do random deletes, we can simply keep the defaults.

3. Fine-tuning

The new version simplifies the thread pool configuration; you no longer need to configure separate search, bulk, and index thread pools. The relevant settings are thread_pool.get.size, thread_pool.write.size, thread_pool.listener.size, and thread_pool.analyze.size. They can be observed and adjusted based on the data exposed by the _cat/thread_pool API.
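As a sketch of that workflow, the write pool's rejections can be watched while writing, and the queue enlarged in elasticsearch.yml if rejections appear (the queue value below is illustrative; the pool size itself is normally left tied to the CPU count):

curl 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected'

# elasticsearch.yml -- absorb short bursts of bulk requests (value is illustrative)
thread_pool.write.queue_size: 1000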

In fact, I/O pressure can be spread by configuring multiple disks, but this easily causes data hotspots to concentrate on a single disk.
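For reference, multiple data paths can be listed in elasticsearch.yml (the paths below are hypothetical). Shards, not individual files, are spread across the paths, which is why a single hot shard can still pin one disk:

# elasticsearch.yml -- spread data over several disks (paths are hypothetical)
path.data:
  - /mnt/disk1/es-data
  - /mnt/disk2/es-data
  - /mnt/disk3/es-data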

Building indexes in Lucene consumes a lot of CPU, so you can cut CPU cost by reducing the number of inverted indexes. The first optimization is to reduce the number of fields; the second is to reduce the number of indexed fields. Concretely, set the index property to not_analyzed or no for fields that do not need to be searched. As for _source and _all, they did not make much difference in actual testing and are not discussed in detail.
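Note that not_analyzed/no is the pre-5.x mapping syntax; in 7.x the equivalent idea is setting "index": false on fields that never need to be searched. A minimal sketch (the index and field names are hypothetical):

curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/my-logs' -d '{
  "mappings": {
    "properties": {
      "message":  { "type": "text" },
      "trace_id": { "type": "keyword", "index": false },
      "payload":  { "type": "keyword", "index": false, "doc_values": false }
    }
  }
}'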

In addition, if the logs arrive through a component such as Filebeat or Logstash, batch mode is generally enabled. Batching improves performance, but the batches should not be too large; tune the size based on actual observation, usually somewhere between 1,000 and 10,000 events.
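As a sketch, in Filebeat the batch size is controlled by bulk_max_size on the Elasticsearch output (the values below are just a starting point, not a recommendation):

# filebeat.yml -- batch size for the Elasticsearch output (values are illustrative)
output.elasticsearch:
  hosts: ["localhost:9200"]
  bulk_max_size: 2000
  worker: 2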

Thank you for reading! This concludes the article on how to optimize the write speed of Elasticsearch. I hope the content above is helpful; if you found it useful, feel free to share it with others.
