
How to run Elasticsearch in a production environment


This article explains how to run Elasticsearch in a production environment. The content is simple and clear, and easy to learn and understand; please follow along to study how to run Elasticsearch in production.

Basic knowledge: clusters, nodes, indexes, and shards

If you are new to Elasticsearch (ES), let me first explain some basic concepts. This section does not deal with best practices at all; it focuses on explaining the terminology. Most people can skip it.

Elasticsearch is a distributed management layer for running Apache Lucene, a Java-based search engine. Lucene is where data is actually stored, indexed, and searched. ES sits on top of it and lets you run thousands of Lucene instances in parallel.

The highest-level unit of ES is the cluster. A cluster is a collection of ES nodes and indexes.

A node (Node) is an instance of ES. This can be a single server, or one ES process running on a server: a node is not the same as a server, since one VM or physical server can run many ES processes, each of which is a node. A node can belong to only one cluster. There are different node types, of which the two most interesting are the data node and the master-eligible node; a node can have several types at once. Data nodes run all data operations, that is, storing, indexing, and retrieving data. Master-eligible nodes vote to elect the master node, which manages the cluster and its indexes.

An index (Index) is a high-level abstraction over your data. Indexes do not hold data themselves; they are just another abstraction over the actually stored data. Any action performed on data, such as inserting, deleting, indexing, and searching, goes through an index. An index belongs to exactly one cluster and is made up of shards.

A shard (Shard) is an instance of Apache Lucene. A shard can hold many documents, and it is the object that actually stores, indexes, and searches data. A shard belongs to exactly one node and one index. There are two types of shards, primary and replica, which are basically identical: they hold the same data, and searches run in parallel across all of them. Of all the shards holding the same data, one is the primary; it is the only shard that can accept indexing requests. If the node holding the primary shard dies, a replica takes over and becomes the new primary. ES then creates a new replica and copies the data over.

To sum up: the cluster contains nodes, nodes hold shards, shards belong to indexes, and each shard is a Lucene instance.
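
A quick way to see this structure on a live cluster is the _cat/shards API, which lists every shard, its type (primary or replica), and the node it lives on. A minimal sketch in Python, assuming a cluster reachable on localhost:9200:

import requests

# One row per shard: index, shard number, prirep (p = primary, r = replica),
# state, and the node holding it.
resp = requests.get("http://localhost:9200/_cat/shards", params={"v": "true"})
print(resp.text)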

Learn more about Elasticsearch

If you want to run a system, I believe you need to understand it. In this section I explain the parts of Elasticsearch that you need to understand if you want to manage it in production. This section also contains no specific recommendations; those come later. Its purpose is only to introduce the necessary background.

Quorum

It is important to understand that Elasticsearch is a (flawed) electoral system. Nodes vote to decide who should manage them: the master node. The master node runs many cluster-management processes and has the final say in many matters. The ES election is flawed because only a small subset of nodes, the master-eligible nodes, may vote. Master-eligible nodes are enabled with the following configuration:

node.master: true

When the cluster starts, or when the master node leaves the cluster, all master-eligible nodes elect a new master. For this to work safely, you must have 2n + 1 master-eligible nodes. Otherwise a split-brain can occur, for example when two partitions each get 50% of the votes: all data in one of the two partitions would then be lost. To prevent this, you need 2n + 1 master-eligible nodes.
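
As a minimal sketch, here is the relevant elasticsearch.yml for a three-master setup. On Elasticsearch 6.x and earlier you must also set the quorum size yourself; from 7.0 on, quorum is managed automatically (and from 7.9 on, node.master gives way to node.roles):

# elasticsearch.yml on each of the three master-eligible nodes
node.master: true
node.data: false

# Elasticsearch 6.x and earlier only: a majority of 3 master-eligible
# nodes is 2, so require 2 votes to elect a master
discovery.zen.minimum_master_nodes: 2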

How nodes join the cluster

When an ES node starts, it is alone in the big wide world. How does it know which cluster it belongs to? There are different ways to do this; today the most common is the seed hosts (Seed Hosts) approach.

Basically, Elasticsearch nodes constantly talk with all the other nodes they have seen. Because of that, a node initially only needs to know a few other nodes in order to learn about the whole cluster. Let's look at an example with a three-node cluster:

Initial state

Initially, nodes A and C only know B. B is the seed host. Seed hosts are either given to ES in a configuration file or put directly into elasticsearch.yml.
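
A minimal sketch of what nodes A and C would carry in elasticsearch.yml, assuming B is reachable as node-b on the default transport port (on 6.x and earlier the setting is called discovery.zen.ping.unicast.hosts instead):

# elasticsearch.yml on nodes A and C (Elasticsearch 7.x syntax)
discovery.seed_hosts:
  - node-b:9300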

Node A connects with B and exchanges information

Once node A connects to B, B knows that A exists. For A, nothing changes.

Node C connects and shares information with B

Now C comes up. Once that happens, B tells C that A exists. C and B now know all the nodes in the cluster. And once A reconnects to B, it will also learn that C exists.

Segment and segment merging

I said above that data is stored in shards, but that is only partially true. Ultimately, the data is stored on the file system in the form of files. In Lucene and Elasticsearch these files are called segments (Segment). A shard has between one and a few thousand segments.

Again, segments are actual, real files that you can look at in the data directory of your Elasticsearch installation. This means that using segments has overhead: if you want to look inside one, you have to find and open the file, and if many files have to be opened, there will be a lot of overhead. The problem is that segments in Lucene are immutable. They are written once and can never be changed. This in turn means that every document you put into ES creates a segment containing only that single document. So does a cluster with a billion documents have a billion segments, meaning literally a billion files on the file system? No.

In the background, Lucene does constant segment merging. It cannot change segments, but it can create new segments by merging the data of two smaller ones.

In this way, Lucene constantly tries to keep the number of segments (that is, the number of files, that is, the overhead) small. You can also force a merge.
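
To watch this on a live cluster, the _cat/segments API lists the segments behind every shard. A minimal sketch, assuming a cluster on localhost:9200:

import requests

# One row per segment: index, shard, segment name, doc count, size on disk.
resp = requests.get("http://localhost:9200/_cat/segments", params={"v": "true"})
print(resp.text)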

Message routing

In Elasticsearch, you can run any command against any node in a cluster, and the result is the same. Interestingly, a document ends up in exactly one primary shard and its replicas, and ES keeps no per-document mapping of which shard a document lives in; the target shard is derived from the document's routing value.

When a search is performed, the ES node that receives the request broadcasts it to all shards of the index, that is, to either the primary or a replica of each shard. These shards then look through all of their segments that could contain matching documents.

When an insert is performed, the ES node that receives the request determines the target primary shard by hashing the document's routing value (by default its ID) and puts the document there. It is then written to the primary shard and all of its replicas.
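
Conceptually, the routing step works like the sketch below. Elasticsearch actually uses a Murmur3 hash of the _routing value modulo the number of primary shards; Python's built-in hash here is only a stand-in to illustrate the idea:

def pick_primary_shard(routing: str, number_of_primary_shards: int) -> int:
    # ES: shard_num = murmur3_hash(_routing) % number_of_primary_shards.
    # hash() is a placeholder, not the real Murmur3 implementation.
    return hash(routing) % number_of_primary_shards

# A document with ID "log-000042" in an index with 5 primary shards:
print(pick_primary_shard("log-000042", 5))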

How do I run Elasticsearch in a production environment?

This is the hands-on section. As I mentioned earlier, I manage ES mainly for logging; this article tries to avoid that bias, but it may fail.

Sizing

The first question you need to ask, and then answer, is about sizing: how big does your ES cluster need to be?

Memory

I talk about RAM first, because RAM constrains all the other resources.

Heap

ES is written in Java. Java uses a heap; you can think of it as memory reserved by Java. Listing everything that matters about the heap would triple the size of this document, so I will stick to the most important part: heap size.

Use as much heap as possible, but no more than 30 GB.

Here is a secret about the heap that many people don't know: every object in the heap needs a unique address, an object pointer. Addresses have a fixed length, which limits the number of objects that can be addressed. The short version is that up to a certain heap size, Java can use compressed object pointers; beyond it, Java has to fall back to uncompressed pointers, and every memory access then involves extra steps that slow things down dramatically. So under no circumstances set the heap above this threshold (around 32 GB).
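
You can check whether the JVMs in your cluster are still using compressed object pointers via the node info API. A minimal sketch, assuming a cluster on localhost:9200:

import requests

# Reports true/false per node; false means the heap has crossed the
# ~32 GB compressed-oops threshold.
resp = requests.get(
    "http://localhost:9200/_nodes/jvm",
    params={"filter_path": "nodes.*.jvm.using_compressed_ordinary_object_pointers"},
)
print(resp.json())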

I once spent a whole week in a dark room doing nothing but benchmarking Elasticsearch with esrally across combinations of file systems, heap sizes, and BIOS settings. To make a long story short, here is what came out of it for heap size:

Indexing latency (lower is better)

The naming convention is fs_heapsize_biosflags. As you can see, from a heap size of 32 GB on, performance suddenly deteriorates. The same goes for throughput:

Median indexing throughput (higher is better)

To make a long story short: if you feel lucky, use a 29 GB or 30 GB heap, use XFS, and enable the hardware prefetcher and LLC prefetch where possible.
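
The heap size itself is set in Elasticsearch's jvm.options file (or, in newer versions, in a file under jvm.options.d/). A minimal sketch for the 30 GB recommendation, with minimum and maximum pinned to the same value:

# jvm.options: keep Xms and Xmx identical, below the ~32 GB
# compressed-oops threshold
-Xms30g
-Xmx30g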

File caching

Most people run Elasticsearch on Linux, and Linux uses spare memory as the file system cache. A common recommendation is to give the ES server 64 GB of memory: half for the cache, half for the heap. I haven't tested the file cache systematically. However, it is easy to see that large ES clusters, such as those used for logging, benefit a lot from a big file cache. If all your indexes fit into the heap anyway, it helps less.

CPU

This depends on what you do with the cluster. If you do a lot of indexing, you need more and faster CPUs than if you only do logging. For logging, I found eight CPU cores to be more than sufficient, but I have seen many people use much bigger configurations that did nothing for their use case.

Disk

Disks are not as straightforward as you might think. First of all, if your indexes fit into RAM, the disk only matters when a node is cold. Second, the amount of data you can actually store depends on your index layout. Every shard is a Lucene instance, and every shard has a memory footprint, so there is a limit to how many shards fit into the heap. I discuss this in more detail in the index layout section.

Generally, you can put all your data disks into a RAID 0. Replication happens at the Elasticsearch level, so losing a node does not matter. Do not use LVM with multiple disks: LVM writes to only one disk at a time, so you get none of the benefit of multiple disks.

With regard to the file system and RAID settings, I sorted out the following points:

Scheduler: cfq and deadline outperform noop. Kyber might be good if you have NVMe, but I haven't tested it.

Queue depth: as high as possible.

Read-ahead: yes, please.

RAID chunk size: no impact.

FS block size: no impact.

FS type: XFS > ext4.

Index layout

A lot here depends on your use case. I can only speak from a logging background (specifically, with Graylog).

Shards

The short version:

For write-heavy workloads: number of primary shards = number of nodes

For search-heavy workloads: number of primary shards * number of replicas = number of nodes

More replicas = more search performance

Maximum write performance is given by the following formula:

max_write_throughput = node_throughput * number_of_primary_shards

The reason is simple: if you have only one primary shard, you can write data only as fast as a single node can, because a shard always lives on exactly one node. If you really want to optimize write performance, you should make sure that every node holds exactly one shard (primary or replica), since a replica receives the same writes as its primary and writes depend heavily on disk IO. Note: with a lot of indexes this strategy can break down, and the bottleneck may be something else entirely.

If you want to optimize search performance instead, it is given by this formula:

max_search_throughput = node_throughput * (number_of_primary_shards + number_of_replicas)

For search, primary shards and replicas are basically identical. So if you want more search performance, just add replicas.
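
Shard and replica counts are set when an index is created. A minimal sketch for the write-heavy layout above, assuming a four-node cluster on localhost:9200 and a hypothetical index name:

import requests

# Write-heavy layout: primary shards = number of nodes (4 here).
resp = requests.put(
    "http://localhost:9200/logs-write-heavy",
    json={
        "settings": {
            "number_of_shards": 4,
            "number_of_replicas": 1,
        }
    },
)
print(resp.json())

The number of replicas can still be changed later through the index _settings API; the number of primary shards is fixed at creation time.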

Size

I have discussed index sizing in many places already. Here is my rule of thumb:

30 GB heap = at most 140 shards per node

With more than 140 shards per node, Elasticsearch crashes with out-of-memory errors. This is because every shard is a Lucene instance, and every instance needs a certain amount of memory, so there is a limit to the number of shards a node can hold.

Given your number of nodes and your shards per index, the number of indexes you can hold can be calculated with the following formula:

number_of_indices = (140 * number_of_nodes) / (number_of_primary_shards * replication_factor)

From that and your disk size, the index size is easy to calculate:

index_size = (number_of_nodes * disk_size) / number_of_indices
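
Put together, the two formulas make a tiny sizing calculator. A sketch with made-up numbers, where replication_factor counts all copies of a shard, primary included:

# Hypothetical cluster: 6 nodes, 2 TB of usable disk each.
number_of_nodes = 6
disk_size_gb = 2000
number_of_primary_shards = 6
replication_factor = 2  # primary + 1 replica

# At most ~140 shards per node with a 30 GB heap:
number_of_indices = (140 * number_of_nodes) // (
    number_of_primary_shards * replication_factor
)
index_size_gb = (number_of_nodes * disk_size_gb) / number_of_indices

print(number_of_indices, index_size_gb)  # 70 indexes of ~171 GB each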

However, the larger the index, the slower the search. For logging it is fine to be a bit slower, but for applications with a really high search volume you should size your indexes to fit your RAM.

Segment merging

Remember: every segment is an actual file on the file system, and more segments mean more read overhead. Basically, every search query goes to all shards of an index, and from there to all segments of each shard. Having many segments multiplies the cluster's read IOPS, up to the point where it becomes unusable. So the number of segments should be kept as low as possible.

The _forcemerge API lets you merge segments down to a given number, such as 1. If you rotate indexes (for example, when using Elasticsearch for logging), it is a good idea to run regular force merges while the cluster is idle. Force merging takes a lot of resources and will slow your cluster down significantly, so it is better not to let Graylog do it for you, but to do it yourself during quiet hours. If you have many indexes, you definitely need to merge segments regularly; otherwise the cluster becomes very slow and eventually falls over.
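
A minimal sketch of such an off-peak merge, assuming a rotated index that no longer receives writes (index name and host are made up):

import requests

# Merge the rotated index down to one segment per shard. Only do this to
# indexes that are no longer written to, and only off-peak.
resp = requests.post(
    "http://localhost:9200/graylog_2024_01_15/_forcemerge",
    params={"max_num_segments": 1},
)
resp.raise_for_status()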

Cluster layout

For everything but the smallest setups, it is best to use dedicated master-eligible nodes. The main reason is that this way you can always guarantee 2n + 1 master-eligible nodes for quorum, while adding and removing data nodes at will without worrying about that requirement. Besides, you never want high load on a data node to affect the master.

Finally, master nodes are the ideal seed hosts. Remember, seed hosts are the easiest way to do node discovery in Elasticsearch. Since master nodes rarely change, they are the best choice: they most likely already know all the other nodes in the cluster.

Master nodes can be quite small; one CPU core and 4 GB of RAM are enough for most clusters. As always, keep an eye on actual usage and adjust.

Monitoring

ES gives you a huge number of metrics, and it exposes all of them as JSON, which makes them easy to feed into monitoring tools. Here are some useful ones (see the sketch after the list):

Number of segments

Heap usage

Heap GC time

Average search, indexing, merge time

IOPS

Disk utilization
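
Most of these come straight out of the node stats API. A minimal sketch pulling a few of them, assuming a cluster on localhost:9200 (IOPS and disk utilization are better read from the OS):

import requests

# filter_path trims the response down to the metrics listed above.
resp = requests.get(
    "http://localhost:9200/_nodes/stats",
    params={
        "filter_path": ",".join([
            "nodes.*.indices.segments.count",
            "nodes.*.jvm.mem.heap_used_percent",
            "nodes.*.jvm.gc.collectors.*.collection_time_in_millis",
            "nodes.*.indices.search.query_time_in_millis",
            "nodes.*.indices.indexing.index_time_in_millis",
            "nodes.*.indices.merges.total_time_in_millis",
        ])
    },
)
print(resp.json())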
