Troubleshooting MongoDB sharded clusters

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Preface to troubleshooting MongoDB sharded clusters

Part1: foreword

A few errors come up repeatedly when running MongoDB in a sharded environment. This article is a translation of the troubleshooting cases listed in the official MongoDB documentation.

Part2: overall environment

MongoDB 3.4.4

Hands-on

Part1: an application server or mongos instance is down

If each application server runs its own mongos instance, the other application servers can continue to access the database when one of them fails. A mongos instance also keeps no persistent state, so it can be restarted without losing any state or data, although it is unavailable while it starts up. When a mongos instance starts, it retrieves the cluster metadata from the config servers and can then begin routing queries.

Part2: a mongod process in a sharded cluster is down

Replica sets give the shards of a sharded cluster high availability. If the unavailable mongod is the primary, the replica set elects a new primary. If it is a secondary, it simply disconnects from the replica set, and the primary and the remaining secondaries continue to hold all the data. In a three-member replica set, even if one member suffers a catastrophic failure, the other two members each have a complete copy of the data.

Always investigate availability interruptions and failures. If a system cannot be recovered, replace it as soon as possible and add a new member to the replica set to restore the lost redundancy.
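The fault tolerance described above follows from the rule that a replica set needs a majority of its voting members to elect a primary. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and it assumes every member has one vote.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that forms a majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the reachable members can still elect a primary."""
    return reachable >= majority(voting_members)


def fault_tolerance(voting_members: int) -> int:
    """How many members may fail while a primary can still be elected."""
    return voting_members - majority(voting_members)
```

For a three-member set, fault_tolerance(3) is 1, which matches the single-failure case in the text.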

Part3: all members of a shard are unavailable

If all members of a shard's replica set are unavailable, all data held on that shard is unavailable. The data on all other shards remains available, and the other shards can still be read from and written to. The application, however, must be able to handle partial results, and the DBA should investigate the cause of the interruption and try to recover the shard as soon as possible.
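One way an application might cope with partial results is to ask the router for whatever the reachable shards can return. The sketch below assumes a PyMongo-style collection object whose find() accepts allow_partial_results=True; the function name is hypothetical, and the caller must treat the returned list as possibly incomplete.

```python
from typing import Any, Dict, List


def find_tolerating_down_shards(collection: Any,
                                query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run a query that may return partial results when shards are down.

    Assumption: `collection` behaves like a PyMongo Collection, whose
    find() takes the keyword argument allow_partial_results.
    """
    cursor = collection.find(query, allow_partial_results=True)
    # Materialize the cursor; documents on unreachable shards are missing.
    return list(cursor)
```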

Part4: config server replica set members are unavailable

Changed in version 3.2: starting with MongoDB 3.2, the config servers of a sharded cluster can be deployed as a replica set. Replica set config servers must run the WiredTiger storage engine. MongoDB 3.2 deprecates the use of three mirrored mongod instances as config servers.

The replica set provides high availability for the config servers. If the unavailable config server is the primary, the replica set elects a new primary.

If the config server replica set loses its primary and cannot elect a new one, the cluster's metadata becomes read-only. Data can still be read from and written to the shards, but chunk migrations and chunk splits cannot happen until a primary is available again. If all config databases become unavailable, the cluster can become inoperable.
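The three situations in this part can be summarized as a small decision function. This is a simplified Python model, not MongoDB code: the names are hypothetical, every member is assumed to vote, and a reachable majority is assumed to mean a primary has been elected.

```python
from enum import Enum


class ConfigState(Enum):
    NORMAL = "metadata writable"
    READ_ONLY = "metadata read-only"
    UNAVAILABLE = "cluster may be inoperable"


def config_state(total_members: int, reachable: int) -> ConfigState:
    """Classify the config server replica set by reachable members."""
    if reachable <= 0:
        return ConfigState.UNAVAILABLE
    if reachable > total_members // 2:
        # A majority is reachable, so a primary can be elected.
        return ConfigState.NORMAL
    # Some members are up but no primary can be elected.
    return ConfigState.READ_ONLY


def can_migrate_chunks(state: ConfigState) -> bool:
    """Chunk migrations and splits need writable metadata."""
    return state is ConfigState.NORMAL
```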

Part5: a cursor fails because of stale config data

When one or more mongos instances have not yet refreshed their cache of the cluster metadata from the config database, a query returns the following warning:

Could not initialize cursor across all shards because: stale config detected

This warning should not propagate back to the application. It repeats until all mongos instances have refreshed their caches. To force an instance to refresh its cache, run the flushRouterConfig command on it.
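Because the warning repeats until every router has refreshed, the command usually has to be sent to each mongos in turn. The Python sketch below shows one way to do that. The function name and the injected run_admin_command callable are hypothetical; with PyMongo, that callable could be something like `lambda host, cmd: MongoClient(host).admin.command(cmd)`, which is an assumption and is not taken from the article.

```python
from typing import Any, Callable, Dict, Iterable

# Command document sent to the admin database of each mongos.
FLUSH_COMMAND: Dict[str, Any] = {"flushRouterConfig": 1}


def flush_all_routers(
    mongos_hosts: Iterable[str],
    run_admin_command: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Send flushRouterConfig to every mongos and collect the replies."""
    results: Dict[str, Any] = {}
    for host in mongos_hosts:
        # Pass a copy so a caller-supplied runner cannot mutate the constant.
        results[host] = run_admin_command(host, dict(FLUSH_COMMAND))
    return results
```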

Part6: shard keys and cluster availability

The most important considerations when choosing a shard key are:

1. Make sure that MongoDB can distribute data evenly among the shards.

2. Make sure that write operations can scale across the cluster.

3. Make sure that mongos can isolate most queries to a specific mongod.

In addition:

1. Each shard should be a replica set. If a particular mongod instance fails, the replica set members elect another member as primary and continue operating. However, if the entire shard is unreachable or fails for some reason, its data is unavailable.

2. If the shard key lets mongos isolate most operations to a single shard, the failure of a single shard makes only some of the data unavailable.

3. If the shard key spreads the data needed by every operation across the whole cluster, the failure of a single shard makes the whole cluster unavailable.

In essence, this concern for availability underlines the importance of choosing a shard key that isolates query operations to a single shard.
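The difference between an isolated query and one that needs every shard can be illustrated with a toy router. This Python sketch is a simplified model, not mongos's actual routing logic. The chunk map, shard names, and integer shard key are hypothetical, and a broadcast query is assumed to fail when any shard is down.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical chunk map: shard name -> half-open [low, high) key range.
ChunkMap = Dict[str, Tuple[int, int]]


def shards_for_query(chunks: ChunkMap, key: Optional[int]) -> Set[str]:
    """Shards a router must contact.

    With a shard key value, the query targets the single owning shard.
    Without one (key=None), it is broadcast to every shard.
    """
    if key is None:
        return set(chunks)
    return {name for name, (low, high) in chunks.items() if low <= key < high}


def query_succeeds(chunks: ChunkMap, key: Optional[int], down: Set[str]) -> bool:
    """True if at least one shard is needed and none of them is down."""
    needed = shards_for_query(chunks, key)
    return bool(needed) and needed.isdisjoint(down)
```

With chunks {"shardA": (0, 100), "shardB": (100, 200)} and shardB down, a query for key 42 still succeeds, while a query for key 150 or a query without a shard key does not.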

Part7: config database string error

Starting with MongoDB 3.2, config servers can be deployed as a replica set. The mongos instances of a sharded cluster must all specify the same config server replica set name, but each may list the hostnames and ports of different members of that replica set.

Starting with MongoDB 3.4, using mirrored mongod instances as config servers (SCCC) is no longer supported. Before upgrading a sharded cluster to 3.4, the config servers must be converted from SCCC to CSRS.

In earlier versions of MongoDB, where the config servers use the topology of three mirrored mongod instances, every mongos instance in the sharded cluster must specify exactly the same configDB string.
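Both rules can be checked mechanically before starting the routers. The Python sketch below assumes the replica set form of the string is `<replSetName>/<host:port>,<host:port>` and the legacy form is a plain comma-separated host list. The function names and any hostnames used with them are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_configdb(value: str) -> Tuple[Optional[str], List[str]]:
    """Split a configDB string into (replica set name, hosts).

    The name is None for a legacy string without a "<name>/" prefix.
    """
    name, sep, hosts = value.partition("/")
    if not sep:
        # No "/" found: legacy mirrored (SCCC) host list.
        return None, [h.strip() for h in value.split(",") if h.strip()]
    return name, [h.strip() for h in hosts.split(",") if h.strip()]


def consistent_configdb(values: List[str]) -> bool:
    """Check that all mongos configDB strings are compatible."""
    names = {parse_configdb(v)[0] for v in values}
    if names == {None}:
        # Legacy strings must be identical on every mongos.
        return len(set(values)) == 1
    # Replica set strings only need to agree on the replica set name.
    return len(names) == 1
```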

Part8: avoid downtime when moving config servers

Use CNAMEs to identify your config servers to the cluster, so that you can rename and renumber the config servers without downtime.

Part9: moveChunk commit failed error

At the end of a chunk migration, the shard must connect to the config database to update the chunk's record in the cluster metadata. If the shard cannot connect to the config database, MongoDB reports the following error:

"ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>" and "ERROR: TERMINATING"

When this occurs, the primary of the shard's replica set terminates to protect data consistency. If a secondary member can reach the config database, the data on the shard becomes accessible again once that member has been elected as the new primary.
