How can you prevent the split-brain problem in Elasticsearch? This article analyzes the problem in detail and walks through a simple, practical way to avoid it, in the hope of helping anyone who wants to solve it.
What is split-brain?
Let's look at a simple case: an Elasticsearch cluster with two nodes. The cluster maintains a single index with one shard and one replica. Node 1 is elected master at startup and holds the primary shard (0P), while node 2 holds the replica shard (0R).
Now, what happens if communication between the two nodes is interrupted? This can happen because of a network failure, or simply because one of the nodes becomes unresponsive (for example, during a stop-the-world garbage collection).
Both nodes believe the other is dead. Node 1 does not need to do anything, because it was already elected master. But node 2 elects itself as master, because it believes its part of the cluster no longer has one. In an Elasticsearch cluster, it is the master node's responsibility to allocate shards to nodes. Node 2 holds the replica shard, but it believes the primary is unavailable, so it promotes its replica to a primary shard.
Now our cluster is in an inconsistent state. Indexing requests that reach node 1 write data to its copy of the shard, while requests that reach node 2 write to its copy. The data diverges into two separate sets, and it is hard to merge them again without a full re-index. Worse, for an indexing client that is not cluster-aware (for example, one using the REST interface) the problem is nearly invisible: indexing requests complete successfully every time, no matter which node they hit. The problem only becomes vaguely apparent when searching: depending on which node a search request hits, the results will differ.
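To make the divergence concrete, here is a minimal sketch of how a client could observe it, assuming both nodes expose the REST interface on port 9200; the hostnames and the index name my_index are hypothetical placeholders:

```python
# Compare the document count each node reports for the same index.
# In a healthy cluster both nodes return the same number; after a
# split-brain the counts can drift apart.
import json
import urllib.request

NODES = ["http://node1:9200", "http://node2:9200"]  # placeholder hostnames
INDEX = "my_index"                                   # placeholder index name

counts = {}
for node in NODES:
    with urllib.request.urlopen(f"{node}/{INDEX}/_count", timeout=5) as resp:
        counts[node] = json.load(resp)["count"]

print(counts)
if len(set(counts.values())) > 1:
    print("Nodes disagree on the document count -- the data has diverged.")
```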
How to avoid the split-brain problem
Elasticsearch's default configuration is good, but the Elasticsearch team cannot know every detail of your particular scenario, which is why some parameters need to be adjusted to fit your needs. All parameters mentioned in this post can be changed in elasticsearch.yml, located in the config directory of your Elasticsearch installation.
To prevent split-brain, the first parameter to look at is discovery.zen.minimum_master_nodes. This parameter determines how many nodes need to be in contact with each other before a master can be elected. Its default value is 1. A good rule of thumb is to set it to N/2 + 1, where N is the number of nodes in the cluster. For example, in a three-node cluster minimum_master_nodes should be set to 3/2 + 1 = 2 (rounding down to the nearest integer).
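As a quick illustration of that rule of thumb, here is a minimal sketch (the helper function is just for illustration; the resulting elasticsearch.yml line appears in a comment):

```python
def minimum_master_nodes(n: int) -> int:
    """Quorum rule from the post: N/2 + 1, rounded down (integer division)."""
    return n // 2 + 1

# The computed value then goes into elasticsearch.yml, e.g. for a 3-node cluster:
#   discovery.zen.minimum_master_nodes: 2
for n in (2, 3, 5):
    print(n, "nodes ->", minimum_master_nodes(n))
# 2 nodes -> 2, 3 nodes -> 2, 5 nodes -> 3
```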
Let's revisit the earlier scenario, this time with discovery.zen.minimum_master_nodes set to 2 (2/2 + 1). If communication between the two nodes fails, node 1 loses its master status and node 2 is never elected master. Neither node accepts indexing or search requests, so all clients notice the problem immediately, and neither shard ends up in an inconsistent state.
Another parameter worth adjusting is discovery.zen.ping.timeout. Its default value is 3 seconds, and it determines how long a node will wait for a response from other nodes in the cluster before assuming they have failed. Slightly increasing this value is a good idea on slow networks. This parameter helps not only with high network latency, but also when a node responds slowly because it is overloaded.
Two-node cluster?
If you feel (and rightly so) that setting minimum_master_nodes to 2 is wrong for a two-node cluster, you are right: in that case, if one node dies, the whole cluster dies with it. Although this eliminates the possibility of split-brain, it also negates one of Elasticsearch's nice features: building high availability with replica shards.
If you are just getting started with Elasticsearch, the recommendation is a three-node cluster. That way you can set minimum_master_nodes to 2, which limits the possibility of split-brain while still keeping high availability: the cluster can survive the loss of one node and keep functioning.
But what if you already have a two-node Elasticsearch cluster running? You can either accept the possibility of split-brain in order to keep high availability, or give up high availability to avoid split-brain. To avoid this compromise, the best option is to add a node to the cluster. That may sound drastic, but it does not have to be: for each Elasticsearch node you can set the node.data parameter, which controls whether the node stores data. Its default value is "true", meaning that by default every Elasticsearch node is also a data node.
In a two-node cluster, you can add a new node with node.data set to "false". This node will never hold any shards, but it can still be elected master (the default behavior). Because it is not a data node, it can run on a cheap server. You now have a three-node cluster, can safely set minimum_master_nodes to 2, avoid split-brain, and can still lose one node without losing data.
The split-brain problem is difficult to solve completely. Elasticsearch's issue tracker still has an open issue describing a corner case in which split-brain can occur even when minimum_master_nodes is set correctly. The Elasticsearch team is working on a better implementation of the master-election algorithm, but if you are already running an Elasticsearch cluster, you need to be aware of this potential problem.
Detecting the problem as early as possible is important. A relatively simple way to do this is to periodically check the response of the /_nodes endpoint on each node. This endpoint returns a short status report of all nodes in the cluster. If two nodes report a different composition of the cluster, that is a telltale sign of split-brain.
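Here is a minimal sketch of such a check, assuming each node's REST interface is reachable on port 9200 (the hostnames are placeholders): it queries /_nodes on every node and compares which cluster members each node reports.

```python
# Minimal split-brain check: every node should report the same cluster membership.
import json
import urllib.request

NODES = ["http://node1:9200", "http://node2:9200", "http://node3:9200"]  # placeholders

views = {}
for node in NODES:
    with urllib.request.urlopen(f"{node}/_nodes", timeout=5) as resp:
        body = json.load(resp)
    # The "nodes" object lists every node this node currently knows about.
    views[node] = sorted(info["name"] for info in body["nodes"].values())

reference = views[NODES[0]]
for node, members in views.items():
    print(node, "sees:", members)
if any(members != reference for members in views.values()):
    print("Nodes disagree about cluster membership -- possible split-brain!")
```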
That is the answer to the question of how to prevent split-brain in Elasticsearch. I hope the above content is of some help to you.