2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article analyzes in detail how Elasticsearch writes data at the bottom layer. Interested readers can use it as a reference; I hope it helps you.
Terminology
Document: A document is a JSON document stored in Elasticsearch, equivalent to a row in a table of a relational database.
Shard: Index data can be split into smaller shards, with each shard placed on a different server to improve concurrency. A Lucene index corresponds to a shard in ES.
Segment: A shard consists of multiple segments; each segment is an independent, immutable inverted index. Segments provide the search functionality.
Transaction Log: Elasticsearch uses a translog to record index, delete, update, and bulk requests to ensure that data is not lost. If Elasticsearch needs to recover data, it can replay the translog. Each shard has its own translog file.
Commit point: Records all known segments. Each shard has a commit point, which holds all segments that the shard has successfully written to disk.
Lucene index: A collection of segments plus a Commit point.
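The relationships above can be sketched in a few lines of plain Python. This is a toy model, not Elasticsearch or Lucene code; the names ToySegment and ToyShard are invented for illustration. It shows a segment as an immutable term-to-doc-ids mapping and a shard as a collection of segments whose search consults every segment:

```python
class ToySegment:
    """Toy model of a segment: an immutable inverted index (term -> doc ids)."""

    def __init__(self, docs):
        # docs: {doc_id: text}; build postings once, then never mutate them.
        postings = {}
        for doc_id, text in docs.items():
            for term in set(text.lower().split()):
                postings.setdefault(term, []).append(doc_id)
        self._postings = {t: tuple(sorted(ids)) for t, ids in postings.items()}

    def search(self, term):
        return self._postings.get(term.lower(), ())


class ToyShard:
    """Toy model of a shard: a collection of segments searched together."""

    def __init__(self):
        self.segments = []

    def add_segment(self, segment):
        # New data arrives only as whole new segments (segments are immutable).
        self.segments.append(segment)

    def search(self, term):
        hits = []
        for seg in self.segments:
            hits.extend(seg.search(term))
        return sorted(hits)
```

Because segments are immutable, adding documents never rewrites existing segments; a shard only grows by appending new ones, which is exactly why a shard-level search must merge results from all segments.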
Memory allocation
OS Cache: Lucene's inverted-index segments are stored in files. To improve access speed, they are loaded into the OS cache, which greatly improves Lucene performance, so it is recommended to leave at least half of the system memory to the OS for Lucene.
Node Query Cache: Caches filter query results. There is one per node, shared by all shards on that node; filter results do not involve score calculation, which makes them cheap to cache and reuse.
Indexing Buffer: A buffer that holds newly indexed documents; when it fills up, the documents in the buffer are written out as a new segment.
Shard Request Cache: Caches the results of requests such as aggregations, suggestions, and hits.total.
Field Data Cache: fielddata is loaded lazily by default. The first time you aggregate on, sort on, or use a text field in a script, the field must be materialized into the fielddata structure, which loads that field's inverted index from all segments into heap memory. This is not recommended because fielddata occupies a lot of heap space; use doc_values for aggregation and sorting instead.
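The idea behind the Node Query Cache can be sketched with a toy example (invented name ToyFilterCache, not Elasticsearch code): because filter results carry no relevance score, they can be cached as plain sets of matching doc ids, keyed by segment and filter, and reused by identical later queries:

```python
class ToyFilterCache:
    """Toy model of a filter-result cache: score-free results are reusable."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def matching_docs(self, segment_id, filter_key, compute):
        """Return the cached set of matching doc ids, computing it on first use."""
        key = (segment_id, filter_key)
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            # A filter result is just membership, so a frozen set is enough.
            self._cache[key] = frozenset(compute())
        return self._cache[key]
```

Keying by segment works precisely because segments are immutable: a cached result for a segment can never go stale until the segment itself is replaced.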
Write flow: the data bottom layer
1. Data is written to the Index Buffer and the Translog file.
To ensure that data is not lost, the translog is by default fsync-persisted to disk for every index, delete, update, and bulk request.
2. Refresh: the process of writing the Index Buffer out as a segment is called Refresh.
Every 1s, the data in the Index Buffer is written to a new segment (in the OS cache), at which point the segment is opened and can serve searches. Data only becomes searchable after a Refresh, which is why Elasticsearch is called a near-real-time search engine.
Refresh is also triggered when the Index Buffer is full; its default size is 10% of JVM heap memory.
Refresh does not perform an fsync and does not clear the Transaction Log.
3. Steps 1-2 repeat: new segments keep being added, the Index Buffer keeps being cleared, and data keeps accumulating in the Transaction Log.
4. An ES Flush (a Lucene Commit) occurs every 30 minutes, or when the Transaction Log is full (default 512 MB):
4.1 Call Refresh to clear Index Buffer.
4.2 Call fsync to write Segments from cache to disk.
4.3 Clear (delete) the Transaction Log.
4.4 The commit point is written to disk; each shard has a commit point that records all the segments the shard has successfully written to disk.
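The four steps above can be simulated with a toy Python sketch (invented name ToyShardWriter; greatly simplified, not Elasticsearch code): writes go to an in-memory buffer plus a translog; refresh turns the buffer into a searchable segment without touching the translog; flush fsyncs segments, writes the commit point, and clears the translog:

```python
class ToyShardWriter:
    """Toy simulation of the write flow: buffer -> refresh -> flush."""

    def __init__(self):
        self.index_buffer = []   # step 1: newly written docs, not yet searchable
        self.translog = []       # step 1: replayed on recovery if needed
        self.segments = []       # step 2: searchable after refresh
        self.commit_point = []   # step 4: segments known to be durable on disk

    def write(self, doc):
        # Step 1: every write lands in both the index buffer and the translog.
        self.index_buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Step 2: the buffer becomes a new (immutable) segment and is now
        # searchable. Note: no fsync here, and the translog is NOT cleared.
        if self.index_buffer:
            self.segments.append(tuple(self.index_buffer))
            self.index_buffer.clear()

    def flush(self):
        # Step 4: refresh, fsync segments to disk (pretended here), write the
        # commit point, and clear the translog.
        self.refresh()
        self.commit_point = list(self.segments)
        self.translog.clear()

    def search_all(self):
        # Only data in segments is visible to search.
        return [doc for seg in self.segments for doc in seg]
```

The key distinction the sketch makes visible: refresh controls *searchability* (near-real-time), while flush controls *durability* of segments (until then, the translog is the safety net).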
Parameter description

Refresh:

PUT my-index/_settings
{
  "refresh_interval": "60s"   # default 1s
}

Translog:

PUT my-index/_settings
{
  "translog.flush_threshold_size": "1gb",  # default 512mb; a flush is triggered when the translog exceeds this size
  "translog.sync_interval": "60s",         # default 5s
  "translog.durability": "async"           # default is "request": fsync on every request, ignoring translog.sync_interval;
                                           # set to "async" for asynchronous writes, fsynced on the index.translog.sync_interval schedule
}

This concludes the analysis of how Elasticsearch writes data at the bottom layer. I hope the above content is helpful to everyone; if you think the article is good, share it so that more people can see it.