
Reflections Triggered by a Node Outage in MongoDB


This article presents the reflections prompted by a MongoDB node outage. The content is concise and easy to follow, and I hope you take something away from the detailed walkthrough below.

Brief introduction

Recently, a node in a MongoDB cluster was abnormally powered off, which caused a business interruption before service returned to normal.

The service errors were also picked up by ELK-based alerting.

The operations team investigated the node's power loss and found it was caused by nothing more than a resource allocation mistake. After the problem was resolved, people still raised some questions about the interruption:

"The current MongoDB cluster uses a sharded replica set architecture. What is the impact when a primary node in it fails?"

"Doesn't a MongoDB replica set fail over automatically? Isn't that supposed to take only seconds?"

With these questions in mind, the following is an analysis of the replica set's automatic failover mechanism.

Log analysis

First of all, it was confirmed that the powered-off machine hosted the primary of one replica set, and that a primary/secondary switchover took place when the power was lost.

The corresponding logs were found on the other two secondary nodes:

Log of secondary node 1

2019-05-06T16:51:11.766+0800 I REPL [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2019-05-06T16:51:11.766+0800 I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected
2019-05-06T16:51:11.766+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.30.129.78:30071
2019-05-06T16:51:11.767+0800 I REPL [ReplicationExecutor] VoteRequester(term 3 dry run) received a yes vote from 172.30.129.7:30071; response message: { term: 3, voteGranted: true, reason: "", ok: 1.0 }
2019-05-06T16:51:11.767+0800 I REPL [ReplicationExecutor] dry election run succeeded, running for election
2019-05-06T16:51:11.768+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.30.129.78:30071
2019-05-06T16:51:11.771+0800 I REPL [ReplicationExecutor] VoteRequester(term 4) received a yes vote from 172.30.129.7:30071; response message: { term: 4, voteGranted: true, reason: "", ok: 1.0 }
2019-05-06T16:51:11.771+0800 I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 4
2019-05-06T16:51:11.771+0800 I REPL [ReplicationExecutor] transition to PRIMARY
2019-05-06T16:51:11.771+0800 I REPL [ReplicationExecutor] Entering primary catch-up mode.
2019-05-06T16:51:11.771+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host 172.30.129.78 due to bad connection status; 2 connections to that host remain open
2019-05-06T16:51:11.771+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.30.129.78:30071
2019-05-06T16:51:13.350+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 172.30.129.78:30071; ExceededTimeLimit: Couldn't get a connection within the time limit

Log of secondary node 2

2019-05-06T16:51:12.816+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host 172.30.129.78 due to bad connection status; 0 connections to that host remain open
2019-05-06T16:51:12.816+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 172.30.129.78:30071; ExceededTimeLimit: Operation timed out, request was RemoteCommand 72553 -- target:172.30.129.78:30071 db:admin expDate:2019-05-06T16:51:12.816+0800 cmd:{ replSetHeartbeat: "shard0", configVersion: 96911, from: "172.30.129.7:30071", fromId: 1, term: 3 }
2019-05-06T16:51:12.821+0800 I REPL [ReplicationExecutor] Member 172.30.129.160:30071 is now in state PRIMARY

It can be seen that secondary node 1 initiated an election at 16:51:11 and became the new primary, and secondary node 2 learned of the new primary at 16:51:12, so the primary/secondary switchover had completed by that point.

At the same time, there is a large amount of heartbeat-failure information for the original primary (172.30.129.78:30071).

So how does a secondary detect that the primary has gone down, how do the heartbeats between the primary and secondaries work, and what impact does all this have on data replication?

Next, let's dig into the replica set's automatic failover mechanism.

How the replica set implements failover

The following describes a replica set with a PSS (one primary, two secondaries) architecture. Besides replicating data from the primary to the two secondaries, the three nodes monitor each other's liveness through heartbeats.

Once the primary fails, the secondaries will detect within a certain window that the primary is unreachable, and one of them will then take the lead in calling an election and eventually become the new primary. This detection window is determined by the electionTimeoutMillis parameter, which defaults to 10s.
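To make that window concrete, here is a minimal standalone sketch (illustrative only, not MongoDB source; SecondaryState and its members are invented names) of the rule a secondary applies: remember when a healthy primary was last seen, and call an election once electionTimeoutMillis passes without one.

// Illustrative model of the failover detection window (not MongoDB code).
#include <chrono>
#include <iostream>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

struct SecondaryState {
    Millis electionTimeout{10000};                     // electionTimeoutMillis, default 10s
    Clock::time_point lastSeenPrimary{Clock::now()};

    // called whenever a heartbeat response shows a healthy primary
    void onPrimaryHeartbeat() { lastSeenPrimary = Clock::now(); }

    // checked periodically; true means this node should call an election
    bool shouldStartElection(Clock::time_point now) const {
        return (now - lastSeenPrimary) >= electionTimeout;
    }
};

int main() {
    SecondaryState s;
    // simulate the primary's heartbeats stopping: 10s later the check fires
    const auto later = Clock::now() + Millis(10001);
    std::cout << std::boolalpha << s.shouldStartElection(later) << "\n";   // true
}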

Next, let's take a look at how this mechanism is implemented through some source code:

db/repl/replication_coordinator_impl_heartbeat.cpp

Related methods

ReplicationCoordinatorImpl::_startHeartbeats_inlock starts heartbeats to all members

ReplicationCoordinatorImpl::_scheduleHeartbeatToTarget schedules a task that initiates a heartbeat to a member

ReplicationCoordinatorImpl::_doMemberHeartbeat actually sends the heartbeat to a member

ReplicationCoordinatorImpl::_handleHeartbeatResponse handles the heartbeat response

ReplicationCoordinatorImpl::_scheduleNextLivenessUpdate_inlock schedules the liveness (keep-alive) check timer

ReplicationCoordinatorImpl::_cancelAndRescheduleElectionTimeout_inlock cancels and reschedules the election timeout timer

ReplicationCoordinatorImpl::_startElectSelfIfEligibleV1 proactively initiates an election

db/repl/topology_coordinator_impl.cpp

Related methods

TopologyCoordinatorImpl::prepareHeartbeatRequestV1 constructs heartbeat request data

TopologyCoordinatorImpl::processHeartbeatResponse processes the heartbeat response and constructs the next Action instance

The following diagram describes the call relationships between these methods:

Figure: the main relationships between these methods
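Based on the methods listed above, the call chain shown in the figure can be summarized roughly as follows (a simplified outline, not an exact trace):

_startHeartbeats_inlock
  -> _scheduleHeartbeatToTarget            (schedule a heartbeat to each member)
       -> _doMemberHeartbeat               (build the request via prepareHeartbeatRequestV1 and send it)
            -> _handleHeartbeatResponse    (a healthy response from the primary reschedules the election timer)
                 -> processHeartbeatResponse      (parse the response and return the next Action)
                 -> _scheduleHeartbeatToTarget    (schedule the next heartbeat)
  -> _scheduleNextLivenessUpdate_inlock    (arm the liveness-check timer)
_cancelAndRescheduleElectionTimeout_inlock (re-arms the election timeout; when it expires, _startElectSelfIfEligibleV1 starts an election)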

How the heartbeat is implemented

First, after the replica set is initialized, each node starts sending heartbeats to the other members through the ReplicationCoordinatorImpl::_startHeartbeats_inlock method:

void ReplicationCoordinatorImpl::_startHeartbeats_inlock() {
    const Date_t now = _replExecutor.now();
    _seedList.clear();
    // iterate over the replica set members
    for (int i = 0; i < _rsConfig.getNumMembers(); ++i) {
        if (i == _selfIndex) {
            continue;
        }
        // send a heartbeat to every other member
        _scheduleHeartbeatToTarget(_rsConfig.getMemberAt(i).getHostAndPort(), i, now);
    }
    // only refreshes the local heartbeat state data
    _topCoord->restartHeartbeats();
    // the V1 election protocol is used (after 3.2)
    if (isV1ElectionProtocol()) {
        for (auto&& slaveInfo : _slaveInfo) {
            slaveInfo.lastUpdate = _replExecutor.now();
            slaveInfo.down = false;
        }
        // schedule the liveness check timer
        _scheduleNextLivenessUpdate_inlock();
    }
}

After obtaining the member information of the current replica set, it calls the _scheduleHeartbeatToTarget method to send a heartbeat to each of the other members.

The implementation of _scheduleHeartbeatToTarget is fairly simple: the heartbeat is actually initiated by _doMemberHeartbeat, as follows:

void ReplicationCoordinatorImpl::_scheduleHeartbeatToTarget(const HostAndPort& target, int targetIndex, Date_t when) {
    // perform the scheduling: call _doMemberHeartbeat at the given point in time
    _trackHeartbeatHandle(
        _replExecutor.scheduleWorkAt(when,
                                     stdx::bind(&ReplicationCoordinatorImpl::_doMemberHeartbeat,
                                                this,
                                                stdx::placeholders::_1,
                                                target,
                                                targetIndex)));
}

The ReplicationCoordinatorImpl::_doMemberHeartbeat method is implemented as follows:

void ReplicationCoordinatorImpl::_doMemberHeartbeat(ReplicationExecutor::CallbackArgs cbData,
                                                    const HostAndPort& target,
                                                    int targetIndex) {
    LockGuard topoLock(_topoMutex);
    // cancel the callback tracking
    _untrackHeartbeatHandle(cbData.myHandle);
    if (cbData.status == ErrorCodes::CallbackCanceled) {
        return;
    }
    const Date_t now = _replExecutor.now();
    BSONObj heartbeatObj;
    Milliseconds timeout(0);
    // versions after 3.2
    if (isV1ElectionProtocol()) {
        const std::pair hbRequest =
            _topCoord->prepareHeartbeatRequestV1(now, _settings.ourSetName(), target);
        // construct the request and set a timeout
        heartbeatObj = hbRequest.first.toBSON();
        timeout = hbRequest.second;
    } else {
        ...
    }
    // construct a remote command
    const RemoteCommandRequest request(target, "admin", heartbeatObj,
                                       BSON(rpc::kReplSetMetadataFieldName ...
    ...
}

The response is then handled as follows (details omitted):

    ... getTerm()) {
        // cancel and reschedule the electionTimeout timer
        cancelAndRescheduleElectionTimeout();
    }
    ...
    // call topCoord's processHeartbeatResponse method to process the heartbeat response state
    // and return the Action to be executed next
    HeartbeatResponseAction action =
        _topCoord->processHeartbeatResponse(now, networkTime, target, hbStatusResponse, lastApplied);
    ...
    // schedule the next heartbeat at the time provided by the action
    _scheduleHeartbeatToTarget(target, targetIndex, std::max(now, action.getNextHeartbeatStartDate()));
    // execute the handling indicated by the Action
    _handleHeartbeatResponseAction(action, hbStatusResponse, false);
}

Many details are omitted here, but you can still see that handling a heartbeat response involves the following steps (a simplified sketch follows this list):

On a successful response from the primary, the electionTimeout timer is rescheduled (the previous schedule is cancelled and a new one started)

The heartbeat response is parsed by the processHeartbeatResponse method of the _topCoord object, which returns an Action indicating the next step

The next heartbeat task is scheduled according to the next-heartbeat time carried in the Action

The handling indicated by the Action is executed
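The following standalone sketch ties those four steps together in one place (illustrative only; HeartbeatOutcome, scheduleAt and the other names are invented for the example and are not MongoDB APIs):

// Illustrative model of the heartbeat response-handling cycle (not MongoDB code).
#include <chrono>
#include <functional>
#include <iostream>

using Millis = std::chrono::milliseconds;

struct HeartbeatOutcome {
    bool ok = false;               // did the target answer in time?
    bool targetIsPrimary = false;  // was the target a healthy primary?
};

struct Node {
    Millis heartbeatInterval{2000};   // default 2s

    void resetElectionTimer() { std::cout << "election timer rescheduled\n"; }
    void handleAction(const HeartbeatOutcome&) { /* state change, step down, ... */ }
    void scheduleAt(Millis delay, std::function<void()> sendHeartbeat) {
        std::cout << "next heartbeat in " << delay.count() << "ms\n";
        (void)sendHeartbeat;  // a real implementation would hand this to an executor
    }

    void onHeartbeatResponse(const HeartbeatOutcome& outcome) {
        // (1) a healthy response from the primary postpones any election
        if (outcome.ok && outcome.targetIsPrimary) {
            resetElectionTimer();
        }
        // (2)+(3) decide when the next heartbeat goes out:
        //         retry at once after a failure, otherwise wait for heartbeatInterval
        const Millis next = outcome.ok ? heartbeatInterval : Millis(0);
        scheduleAt(next, [] { /* send the next heartbeat */ });
        // (4) act on whatever the response implies
        handleAction(outcome);
    }
};

int main() {
    Node n;
    n.onHeartbeatResponse({true, true});    // healthy primary: wait 2s
    n.onHeartbeatResponse({false, false});  // timeout: retry immediately
}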

So how long does a node wait before sending the next heartbeat after receiving a response? In the TopologyCoordinatorImpl::processHeartbeatResponse method, the logic is:

If the heartbeat response succeeds, the node waits for heartbeatInterval, a configurable parameter that defaults to 2s.

If the heartbeat response fails, the next heartbeat is sent immediately (without waiting).

The code is as follows:

HeartbeatResponseAction TopologyCoordinatorImpl::processHeartbeatResponse(...) {
    ...
    const Milliseconds alreadyElapsed = now - hbStats.getLastHeartbeatStartDate();
    Date_t nextHeartbeatStartDate;
    // calculate the start time of the next heartbeat
    // numFailuresSinceLastStart corresponds to the number of consecutive failures (fewer than 2)
    if (hbStats.getNumFailuresSinceLastStart() <= ...) {
        // after a failure, retry immediately
        nextHeartbeatStartDate = now;
    } else {
        // otherwise wait for heartbeatInterval
        nextHeartbeatStartDate = now + heartbeatInterval;
    }
    ...
}

When the liveness-check timer scheduled by _scheduleNextLivenessUpdate_inlock fires, each member's last heartbeat update time is examined:

    if (now - slaveInfo.lastUpdate >= _rsConfig.getElectionTimeoutPeriod()) {
        // the member has not been updated within the liveness period, mark it as down
        slaveInfo.down = true;
        // if the current node is the primary and a secondary is detected as down,
        // enter the member-down handling path
        if (_memberState.primary()) {
            // call _topCoord's setMemberAsDown method to record that the member is unreachable
            // and obtain the next Action; when a majority of the nodes are not visible,
            // the returned action instructs the primary to step down
            HeartbeatResponseAction action =
                _topCoord->setMemberAsDown(now, memberIndex, _getMyLastDurableOpTime_inlock());
            // execute the instruction
            _handleHeartbeatResponseAction(action, makeStatusWith(), true);
        }
    }
    // continue scheduling the next check cycle
    _scheduleNextLivenessUpdate_inlock();
}

As you can see, this timer is mainly used to implement the primary's liveness detection of the other members:

When the primary finds that a majority of the nodes are unreachable (the majority requirement is no longer satisfied), it steps itself down to a secondary.

Therefore, in a three-node replica set, when both secondaries go down, the primary automatically steps down. This design mainly avoids unexpected data inconsistency.

Figure: the primary automatically stepping down
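The majority rule behind this behavior is simple arithmetic; here is a minimal sketch of the check (illustrative only, not MongoDB source; the function name is invented):

// Illustrative check for the "majority principle" (not MongoDB code).
#include <iostream>

bool primaryShouldStepDown(int totalMembers, int reachableMembers /* including itself */) {
    const int majority = totalMembers / 2 + 1;
    return reachableMembers < majority;
}

int main() {
    // 3-node replica set (PSS): the majority is 2.
    std::cout << std::boolalpha
              << primaryShouldStepDown(3, 2) << "\n"   // one secondary down: stay primary (false)
              << primaryShouldStepDown(3, 1) << "\n";  // both secondaries down: step down (true)
}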

Next comes the _cancelAndRescheduleElectionTimeout_inlock function, which is the key to implementing automatic failover.

Its logic revolves around the election timeout timer, with the following code:

void ReplicationCoordinatorImpl::_cancelAndRescheduleElectionTimeout_inlock() {
    // if a previous timer has already been armed, cancel it first
    if (_handleElectionTimeoutCbh.isValid()) {
        LOG(4) ...
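Conceptually, cancelling and rescheduling the election timeout means dropping any previously armed election callback and arming a new one for now + electionTimeout, so the regular heartbeat responses from a healthy primary keep pushing the deadline forward and the timer never fires. A generic, minimal sketch of that idea (illustrative only; ElectionTimer and ScheduledCall are invented names, not MongoDB types):

// Illustrative cancel-and-reschedule election timer (not MongoDB code).
#include <chrono>
#include <functional>
#include <iostream>
#include <optional>
#include <utility>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

struct ScheduledCall {
    Clock::time_point when;
    std::function<void()> fn;
};

class ElectionTimer {
public:
    explicit ElectionTimer(Millis timeout) : _timeout(timeout) {}

    // called after every healthy heartbeat response from the primary
    void cancelAndReschedule(std::function<void()> startElection) {
        _pending.reset();                                    // cancel the previous timer
        _pending = ScheduledCall{Clock::now() + _timeout,    // arm a new one
                                 std::move(startElection)};
    }

    // driven by the executor's event loop
    void tick(Clock::time_point now) {
        if (_pending && now >= _pending->when) {
            auto fire = std::move(_pending->fn);
            _pending.reset();
            fire();                                          // no primary seen in time: elect
        }
    }

private:
    Millis _timeout;
    std::optional<ScheduledCall> _pending;
};

int main() {
    ElectionTimer timer(Millis(10000));                      // electionTimeoutMillis = 10s
    timer.cancelAndReschedule([] { std::cout << "start election\n"; });
    timer.tick(Clock::now());                                // too early: nothing happens
    timer.tick(Clock::now() + Millis(10001));                // deadline passed: election starts
}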
