The ultimate fix for an Elasticsearch cluster health value of red

Updated 2025-07-08. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article walks through, in detail, how to fix an Elasticsearch cluster whose health value is red.

The problem appeared after I cleared the page cache (echo 3 > /proc/sys/vm/drop_caches).

Elasticsearch then reported cluster health value: red, the red alert state, and some shards were shown in gray.

The Elasticsearch startup log shows that the connection to the cluster service timed out:

observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]

The problem took a long time to troubleshoot, but it has now been solved.

The troubleshooting process and the solution are written up in detail below.

1. Interpreting the cluster status

The Head plug-in shows the cluster status in different colors:

1) green: the healthiest state; all primary and replica shards are available.

2) yellow: all primary shards are available, but some replica shards are not.

3) red: some primary shards are not available. (Queries can still return part of the data in this state, but the problem should be fixed as soon as possible.)

Reference (official website): http://t.cn/RltLEpN (some Chinese blog translations of the cluster health states are not accurate enough; the official website prevails).

If the cluster status is red, the Head plug-in displays "cluster health value: red". This means that at least one primary shard failed to allocate.

As a result, some data and parts of the index are no longer available.

Even so, Elasticsearch still lets us run queries; it is up to the application builder to decide whether to tell users that the results may be incomplete or to suspend queries altogether.
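The three colors can be summarized as a small decision rule. The following is a minimal sketch, not Elasticsearch's own implementation: the function name `cluster_health` and the (prirep, state) tuple layout are assumptions made for illustration, and a shard is treated as available when its state is STARTED or RELOCATING.

```python
# Shard states treated as "available" in this sketch (an assumption).
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def cluster_health(shards):
    """Classify cluster health from (prirep, state) pairs.

    prirep is "p" for a primary shard and "r" for a replica shard;
    state is a shard state such as "STARTED" or "UNASSIGNED".
    """
    primaries_ok = all(
        state in ACTIVE_STATES for prirep, state in shards if prirep == "p"
    )
    replicas_ok = all(
        state in ACTIVE_STATES for prirep, state in shards if prirep == "r"
    )
    if not primaries_ok:
        return "red"      # at least one primary shard is unavailable
    if not replicas_ok:
        return "yellow"   # primaries are fine, some replicas are not
    return "green"        # every primary and replica is available


if __name__ == "__main__":
    print(cluster_health([("p", "STARTED"), ("r", "UNASSIGNED")]))
```

Checking primaries first mirrors the order of severity: a missing replica only reduces redundancy, while a missing primary makes data unavailable.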

2. What is an unassigned shard?

In one sentence: a shard that has not been allocated to any node.

If you keep refreshing the Head plug-in while ES starts, you will see the cluster's shards turn purple, then gray, and finally green.

3. Why do unassigned shards appear?

If shards cannot be allocated, for example because you have configured more replica shards than the number of nodes in the cluster can hold, they remain in the UNASSIGNED state.

The reason reported is ALLOCATION_FAILED.

You can check the shard status across the nodes and indexes of the cluster with the following request.

GET _cat/shards?h=index,shard,prirep,state,unassigned.reason

4. What are the symptoms of unassigned shards?

In the Head plug-in, one or more shards are still gray long after Elasticsearch has started.
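The gray shards can also be picked out of the text returned by the `_cat/shards` request from section 3. The sketch below assumes the five columns requested there (index, shard, prirep, state, unassigned.reason) separated by whitespace; the function name `unassigned_shards`, the sample text, and the index names are made up for illustration and are not output from a real cluster.

```python
def unassigned_shards(cat_output):
    """Return (index, shard, prirep, reason) for every UNASSIGNED row."""
    rows = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Expect at least: index, shard, prirep, state.
        if len(fields) < 4 or fields[3] != "UNASSIGNED":
            continue
        index, shard, prirep = fields[0], int(fields[1]), fields[2]
        # The reason column is empty for shards that are assigned.
        reason = fields[4] if len(fields) > 4 else ""
        rows.append((index, shard, prirep, reason))
    return rows


# Hypothetical sample in the column order requested in section 3.
SAMPLE = """\
logs-a 0 p STARTED
logs-a 0 r UNASSIGNED ALLOCATION_FAILED
logs-b 1 p UNASSIGNED ALLOCATION_FAILED
"""

if __name__ == "__main__":
    for row in unassigned_shards(SAMPLE):
        print(row)
```

A row with prirep "p" in this list is what turns the cluster red; rows with "r" only make it yellow.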

5. What can cause unassigned shards?

6. How do you troubleshoot a red cluster status?

7. How do you fix unassigned shards?

Option 1: the extreme case. The shard's data is no longer needed, so delete it.

ES has no API for deleting a shard directly, short of removing the node when none of its data is needed any more. What you can delete is the index:

curl -XDELETE 'localhost:9200/index_name/'

Option 2: make sure the number of nodes in the cluster >= the maximum number of replicas of any index in the cluster + 1.

N >= R + 1

Where:

N - the number of nodes in the cluster

R - the maximum number of replicas across all indexes in the cluster.

Background: when a node joins or leaves the cluster, the master node automatically reallocates shards so that multiple copies of the same shard never end up on the same node. In other words, the master node does not assign a primary shard to the same node as one of its replicas, nor does it assign two replicas of the same shard to the same node.

If there are not enough nodes to allocate shards this way, some shards may stay unassigned.

My cluster has only one node, that is, N = 1, so only R = 0 satisfies the formula.

The problem therefore reduces to one of:

1) add nodes, that is, increase N

2) remove the replica shards, that is, set R to 0.

R can be set to 0 with the following command line:

root@tyg:/# curl -XPUT "http://localhost:9200/_settings" -d '{"number_of_replicas": 0}'
{"acknowledged": true}
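The N >= R + 1 rule and the settings body above can be checked with a few lines of code. This is a minimal sketch under the assumptions of this section: the helper names are hypothetical, and the code only builds the JSON body; it does not send anything to a cluster.

```python
import json


def replicas_fit(nodes, max_replicas):
    """True when N >= R + 1, so every copy of a shard can get its own node."""
    return nodes >= max_replicas + 1


def largest_allowed_replicas(nodes):
    """Largest R that still satisfies N >= R + 1 (never below 0)."""
    return max(nodes - 1, 0)


def replica_settings_body(nodes):
    """JSON body for the _settings request, sized for the given node count."""
    return json.dumps({"number_of_replicas": largest_allowed_replicas(nodes)})


if __name__ == "__main__":
    # The single-node cluster from this article: N = 1.
    print(replicas_fit(1, 1))
    print(replica_settings_body(1))
```

For the single-node case this yields the same body as the curl command above; with more nodes the replica count could be raised accordingly instead of being forced to 0.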

Option 3: reallocate the shards with allocate.

If option 2 does not resolve the problem, consider reallocating the shards.

Possible reasons:

1) The node may have run into problems while restarting. Normally, when a node reconnects to the cluster, it sends information about its shards to the master node, which then changes those shards from "unassigned" to "assigned / started".

2) When that process fails for some reason, for example because the node's storage has been corrupted, the shards may remain unassigned.

In this case you must decide how to proceed: either try to recover the original node and rejoin it to the cluster (without forcing the allocation of primary shards),

or force the allocation with the Reroute API and re-index the missing data from the original data source or from a backup.

If you decide to allocate unassigned primary shards, be sure to add the "allow_primary": "true" flag to the request.

The script used by ES5.X is as follows:

For ES 2.X and earlier versions, change allocate_replica to allocate; everything else stays the same.

Script interpretation:

Step 1: locate the shards (and their nodes) that are UNASSIGNED

curl -s 'localhost:9200/_cat/shards' | fgrep UNASSIGNED

Step 2: reallocate the UNASSIGNED shards with allocate_replica.
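Step 2 amounts to sending one reroute command per unassigned shard. The sketch below is not the author's script; it only shows, under stated assumptions, how the JSON body for a POST to `_cluster/reroute` could be assembled. The function name `reroute_body`, the node name `node-1`, and the index names are hypothetical. The command name defaults to allocate_replica as in the text, and optional flags such as allow_primary are passed through unchecked, since which flags a given ES version accepts is not verified here.

```python
import json


def reroute_body(unassigned, node, command="allocate_replica", extra_flags=None):
    """Build the JSON body for a cluster reroute request.

    unassigned  -- iterable of (index, shard_number) pairs
    node        -- name of the node that should receive the shards
    command     -- "allocate_replica" (ES 5.X) or "allocate" (ES 2.X), per the text
    extra_flags -- optional dict merged into each command, e.g. allow_primary
    """
    commands = []
    for index, shard in unassigned:
        args = {"index": index, "shard": shard, "node": node}
        if extra_flags:
            args.update(extra_flags)
        commands.append({command: args})
    return json.dumps({"commands": commands})


if __name__ == "__main__":
    # Hypothetical shard list, e.g. taken from the step 1 output.
    print(reroute_body([("logs-a", 0), ("logs-b", 1)], "node-1"))
```

Building the body separately from sending it makes it easy to print and review the commands before forcing any allocation, which matters because forcing a primary can lose data.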

8. Core knowledge points

1) Routing

The principle is simple: each user's data is indexed into a separate shard, and only that user's shard is searched at query time. This is what routing is for.

The advantage of routing: it is a powerful mechanism for optimizing a cluster.

It lets us place documents according to the application's logic, so that faster queries can be built with fewer resources.

2) Using routing during indexing

We can use routing to control which shard Elasticsearch sends a document to.

The routing value itself does not matter; it can be any value. What matters is that you use the same value for all documents that should land on the same shard.
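Why the same routing value always lands on the same shard can be illustrated with a toy model: hash the routing value and take it modulo the number of primary shards. This is only a sketch of the idea. `zlib.crc32` is a stand-in chosen for illustration, Elasticsearch uses its own hash function, and the name `shard_for` is hypothetical, so the shard numbers computed here will not match a real cluster.

```python
import zlib


def shard_for(routing, number_of_shards):
    """Map a routing value to a shard number in [0, number_of_shards).

    crc32 is an illustrative stand-in, not the hash Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    # Documents indexed with the same routing value share a shard.
    print(shard_for("A", 5) == shard_for("A", 5))
```

Because the mapping depends only on the routing value and the shard count, a query that passes the same routing value only needs to visit that one shard.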

3) Querying with a specified routing value

Routing lets users build more efficient queries: why send a query to all nodes when we only need data from a specific subset of the index?

Example of a query with a specified routing value:

curl -XGET 'localhost:9200/documents/_search?pretty&q=*:*&routing=A'

4) Cluster rerouting (reroute)

The reroute command lets you explicitly execute a cluster reroute allocation made up of specific commands.

For example, a shard can be moved from one node to another, an allocation can be cancelled, or an unassigned shard can be explicitly allocated to a specific node.

5) How allocate works

allocate assigns an unassigned shard to a node. It accepts the index name and shard number, as well as the node to allocate the shard to.

It also accepts the allow_primary flag, which explicitly allows a primary shard to be allocated (this may result in data loss).

9. Summary

1) Troubleshooting this problem took more than 6 hours before a solution was found. For a long time I had almost no leads and wanted to give up, but I gritted my teeth and solved it in the end.

2) Remember: first-hand information matters!

When Elasticsearch has problems, the most efficient place to look for a solution is the official English ES documentation, followed by the English ES forums, ES GitHub issues, Stack Overflow, and other English forums and blogs. Last come the Elasticsearch Chinese community and other Chinese technical blogs.
