
MongoDB Replica Set troubleshooting

2025-04-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

1. Check the status of the Replica Set

Run db.runCommand({"replSetGetStatus": 1}) against the admin database, or use the rs.status() helper.
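As a rough illustration, the status document can be reduced to the members that need attention. The sketch below is plain JavaScript operating on an already-fetched document; the helper names and the sample document are hypothetical, while members, name, health and stateStr are fields of the real replSetGetStatus output.

```javascript
// Hypothetical helper: one summary object per entry in status.members.
function summarizeMembers(status) {
  return status.members.map(function (m) {
    return {
      name: m.name,
      state: m.stateStr,
      healthy: m.health === 1 // health is 1 when the member is reachable, 0 otherwise
    };
  });
}

// Members that are unreachable or in a state other than PRIMARY/SECONDARY/ARBITER.
function unhealthyMembers(status) {
  var ok = { PRIMARY: true, SECONDARY: true, ARBITER: true };
  return summarizeMembers(status).filter(function (m) {
    return !m.healthy || !ok[m.state];
  });
}

// Illustrative sample only; in the mongo shell, pass rs.status() instead.
var sampleStatus = {
  set: "rs0",
  members: [
    { name: "m1.example.net:30001", health: 1, stateStr: "PRIMARY" },
    { name: "m2.example.net:30002", health: 1, stateStr: "SECONDARY" },
    { name: "m3.example.net:30003", health: 0, stateStr: "(not reachable/healthy)" }
  ]
};
console.log(unhealthyMembers(sampleStatus));
```

In the legacy mongo shell, printjson can stand in for console.log.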

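2. Check replication delay time

Run db.printSlaveReplicationInfo() in a mongo shell connected to the Primary (this is the helper's name in the 2.4 release that the references below describe). The output looks like this:

source: m1.example.net:30001
    syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
        = 7475 secs ago (2.08hrs)
source: m2.example.net:30002
    syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
        = 7475 secs ago (2.08hrs)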
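The same kind of figure can be derived from a replSetGetStatus document. The following is a minimal sketch, not the shell's own implementation: it measures each Secondary against the Primary's optimeDate rather than against the current time, and the helper name and sample values are hypothetical (the sample is chosen so the lag matches the 7475 seconds shown above).

```javascript
// Hypothetical helper: seconds each Secondary trails the Primary's last applied operation.
function replicationLagSeconds(status) {
  var primary = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  if (!primary) { return null; } // without a Primary there is nothing to compare against
  return status.members
    .filter(function (m) { return m.stateStr === "SECONDARY"; })
    .map(function (m) {
      var lag = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
      // lagHrs is rounded to two decimals, like the "(2.08hrs)" in the shell output
      return { name: m.name, lagSecs: lag, lagHrs: Math.round(lag / 36) / 100 };
    });
}

// Illustrative sample only; in the mongo shell, pass rs.status() instead.
var lagSample = {
  members: [
    { name: "m0.example.net:30000", stateStr: "PRIMARY",
      optimeDate: new Date("2012-10-02T17:38:15Z") },
    { name: "m1.example.net:30001", stateStr: "SECONDARY",
      optimeDate: new Date("2012-10-02T15:33:40Z") }
  ]
};
console.log(replicationLagSeconds(lagSample));
```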

The possible reasons for replication delays are:

Network delay

Use ping and traceroute to check latency and routing between the members.

Disk throughput

If a Secondary cannot flush data to disk as quickly as the Primary, it will not be able to keep up. Use iostat or vmstat to check disk usage.

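Concurrency

In some cases, a long-running operation on the Primary can block replication on the Secondaries. Consider requiring acknowledgement from Secondaries through the write concern, and check whether there are slow queries that coincide with the lag.

Appropriate Write Concern

A large bulk load written without acknowledgement can outpace the Secondaries. Requiring acknowledged writes at regular intervals gives them a chance to catch up. Related topics in the MongoDB manual:

Replica Acknowledged Write Concern

Replica Set Write Concern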
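One way to apply a replica-aware write concern during a bulk load is to insert in batches and wait for acknowledgement after each batch. The sketch below is an assumption-laden illustration: loadInBatches is a hypothetical helper, and it assumes a shell version whose collections provide insertMany with a writeConcern option (this is newer than the 2.4 release covered by the references). The w and wtimeout values are examples, not recommendations.

```javascript
// Hypothetical helper: insert docs in batches, each batch acknowledged by a majority.
// `collection` is assumed to expose insertMany(docs, options), as newer mongo shells do.
function loadInBatches(collection, docs, batchSize) {
  var options = { writeConcern: { w: "majority", wtimeout: 5000 } };
  var batches = 0;
  for (var i = 0; i < docs.length; i += batchSize) {
    // Each call returns only after the write concern is satisfied (or wtimeout expires),
    // which throttles the load to a pace the Secondaries can follow.
    collection.insertMany(docs.slice(i, i + batchSize), options);
    batches += 1;
  }
  return batches;
}
```

In a shell session this might be called as loadInBatches(db.getCollection("events"), docs, 1000), where the collection name is only an example.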

3. Test connections between all members

Every member of the Replica Set must be able to connect to every other member, in both directions. If a connection fails, check the firewall settings.
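Because connectivity has to be verified in both directions, the number of checks grows quickly with the member count. The sketch below simply enumerates them; connectionChecks is a hypothetical helper, members are given as "host:port" strings, and the generated command uses the mongo shell's --host and --port options.

```javascript
// Hypothetical helper: list every directed member-to-member connection to test.
function connectionChecks(members) {
  var checks = [];
  members.forEach(function (from) {
    members.forEach(function (to) {
      if (from === to) { return; } // a member does not need to connect to itself
      var parts = to.split(":");
      checks.push({
        runOn: from,
        command: "mongo --host " + parts[0] + " --port " + parts[1]
      });
    });
  });
  return checks;
}

// Example hosts only.
console.log(connectionChecks([
  "m1.example.net:30001",
  "m2.example.net:30002",
  "m3.example.net:30003"
]));
```

Each command is meant to be run by hand on the host named in runOn; the sketch does not open any connections itself.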

4. Socket exceptions when restarting multiple Secondaries

When restarting multiple members of the Replica Set, make sure that enough members stay available for a Primary to be elected. If the application sees socket connection errors during maintenance, check the TCP keepalive setting:

cat /proc/sys/net/ipv4/tcp_keepalive_time

On Linux, tcp_keepalive_time defaults to 7200 seconds (two hours). You can set it to 300 seconds on every server that hosts a MongoDB instance:

echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time

This change is lost on reboot. To make it permanent, set net.ipv4.tcp_keepalive_time = 300 in /etc/sysctl.conf and then run sysctl -p.
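The check can also be scripted. The following Node.js sketch reads the current value and compares it with the 300-second target mentioned above; keepaliveAdvice is a hypothetical helper, and the script only reports, it does not change the setting.

```javascript
var fs = require("fs");

// Hypothetical helper: compare a tcp_keepalive_time value (in seconds) with a target.
function keepaliveAdvice(rawValue, targetSecs) {
  var current = parseInt(String(rawValue).trim(), 10);
  if (isNaN(current)) { return "could not parse tcp_keepalive_time"; }
  if (current <= targetSecs) { return "ok: " + current + "s"; }
  return "lower tcp_keepalive_time from " + current + "s to " + targetSecs + "s";
}

// Only Linux exposes this file; on other systems the check is skipped.
var keepalivePath = "/proc/sys/net/ipv4/tcp_keepalive_time";
if (fs.existsSync(keepalivePath)) {
  console.log(keepaliveAdvice(fs.readFileSync(keepalivePath, "utf8"), 300));
}
```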

5. Check the size of the Oplog

The larger the oplog, the further a Secondary can fall behind before it can no longer catch up from the oplog.

Use db.printReplicationInfo() to check the size of the oplog:

db.printReplicationInfo();
configured oplog size:   50278.6203125MB
log length start to end: 143109secs (39.75hrs)
oplog first event time:  Wed Mar 18 2015 00:36:53 GMT+0800 (CST)
oplog last event time:   Thu Mar 19 2015 16:22:02 GMT+0800 (CST)
now:                     Thu Mar 19 2015 17:32:42 GMT+0800 (CST)

If you resize the oplog on one member, resize it to the same size on all members.
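The "log length start to end" figure is simply the time between the first and last oplog events, which is the window a lagging Secondary has to catch up in. A small sketch of that arithmetic, using the timestamps from the output above rewritten as ISO 8601 strings (oplogWindow is a hypothetical helper, not a shell function):

```javascript
// Hypothetical helper: oplog window computed from first/last event times (Date objects).
function oplogWindow(firstEvent, lastEvent, now) {
  var lengthSecs = (lastEvent.getTime() - firstEvent.getTime()) / 1000;
  return {
    lengthSecs: lengthSecs,
    lengthHrs: Math.round(lengthSecs / 36) / 100, // hours, rounded to two decimals
    secsSinceLastEvent: (now.getTime() - lastEvent.getTime()) / 1000
  };
}

var oplogInfo = oplogWindow(
  new Date("2015-03-18T00:36:53+08:00"), // oplog first event time
  new Date("2015-03-19T16:22:02+08:00"), // oplog last event time
  new Date("2015-03-19T17:32:42+08:00")  // now
);
console.log(oplogInfo);
```

With these inputs lengthSecs is 143109 and lengthHrs is 39.75, matching the shell output.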

6. Oplog Entry Timestamp Error

If the following error appears in the log, the most recent oplog entry may have an invalid timestamp:

replSet error fatal couldn't query the local local.oplog.rs collection. Terminating mongod after 30 seconds.
[rsStart] bad replSet oplog entry?
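The MongoDB 2.4 troubleshooting guide listed under Reference attributes this error to a ts field of the wrong type in the last oplog entry (the correct type is Timestamp, BSON type 17) and suggests comparing the newest oplog entry with the newest entry whose ts is a Timestamp. A sketch of that comparison follows; lastOplogEntryHasTimestamp is a hypothetical helper, it takes the shell's db object as an argument, and comparing the two documents through JSON.stringify is an assumption made for brevity.

```javascript
// Hypothetical helper: true when the newest oplog entry has a Timestamp-typed ts field.
function lastOplogEntryHasTimestamp(db) {
  var oplog = db.getSiblingDB("local").getCollection("oplog.rs");
  // Newest entry, whatever the type of its ts field.
  var newest = oplog.find().sort({ $natural: -1 }).limit(1).toArray();
  // Newest entry among those whose ts is BSON type 17 (Timestamp).
  var newestTyped = oplog.find({ ts: { $type: 17 } })
    .sort({ $natural: -1 }).limit(1).toArray();
  if (newest.length === 0) { return true; }       // empty oplog: nothing to check
  if (newestTyped.length === 0) { return false; } // no entry has a Timestamp ts
  // If the two queries return different documents, the last entry's ts is mistyped.
  return JSON.stringify(newest[0]) === JSON.stringify(newestTyped[0]);
}
```

In a mongo shell connected to the affected member, calling lastOplogEntryHasTimestamp(db) and getting false would point to a mistyped ts value in the last entry.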

Reference:

http://docs.mongodb.org/v2.4/tutorial/troubleshoot-replica-sets/

http://john88wang.blog.51cto.com/2165294/1564543

http://docs.mongodb.org/v2.4/faq/diagnostics/#faq-keepalive
