What is the principle of replication sets in MongoDB? 07/12 Update SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains the principles behind MongoDB replica sets in detail. Interested readers can use it as a reference; I hope it is helpful.

Introduction to replica sets

A MongoDB replica set consists of a group of mongod instances (processes): one Primary node and one or more Secondary nodes. The MongoDB Driver (client) writes all data to the Primary, and the Secondaries replicate the written data from the Primary. As a result, all members of the replica set store the same dataset, which keeps the data highly available.

The following figure (from the official MongoDB documentation) shows a typical MongoDB replica set containing one Primary node and two Secondary nodes.

Primary election

A replica set is initialized with the replSetInitiate command (or rs.initiate() in the mongo shell). After initialization, the members exchange heartbeat messages and start a Primary election. The node that receives votes from a "majority" of the members becomes Primary, and the remaining nodes become Secondaries.

Initializing a replica set

config = {
    _id: "my_replica_set",
    members: [
        { _id: 0, host: "rs1.example.net:27017" },
        { _id: 1, host: "rs2.example.net:27017" },
        { _id: 2, host: "rs3.example.net:27017" }
    ]
}
rs.initiate(config)

The definition of "majority"

Assume the replica set has N voting members (described later); a majority is then $\lfloor N/2 \rfloor + 1$. When fewer than a majority of the voting members are alive, the replica set cannot elect a Primary, so it cannot serve writes and is effectively read-only.

Voting members | Majority | Failures tolerated
1 | 1 | 0
2 | 2 | 0
3 | 2 | 1
4 | 3 | 1
5 | 3 | 2
6 | 4 | 2
7 | 4 | 3
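The table follows directly from the majority formula. A minimal JavaScript sketch that reproduces it; majority and toleratedFailures are illustrative helper names, not MongoDB APIs:

```javascript
// Majority for N voting members: floor(N / 2) + 1.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail while a majority is still reachable.
function toleratedFailures(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Rebuild the table for 1..7 voting members.
for (let n = 1; n <= 7; n++) {
  console.log(`${n} | ${majority(n)} | ${toleratedFailures(n)}`);
}
```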

It is generally recommended to give a replica set an odd number of members. As the table shows, replica sets with 3 and 4 nodes can each tolerate only one node failure, so from the point of view of service availability they are equivalent (although 4 nodes do store the data more redundantly).

Special Secondary members

Normally, a Secondary in a replica set takes part in Primary elections (and may itself be elected Primary) and replicates recent writes from the Primary, so that it stores the same data as the Primary.

Secondaries can also serve reads, which increases the read capacity of the replica set and improves its availability. In addition, MongoDB lets you configure Secondary members flexibly to suit a variety of scenarios.

Arbiter

An Arbiter node only votes in elections: it cannot be elected Primary and does not replicate data from the Primary.

For example, if you deploy a two-node replica set (one Primary and one Secondary) and either node goes down, the replica set can no longer provide write service, because no Primary can be elected. In this case you can add an Arbiter to the replica set; a Primary can then still be elected even if one of the data nodes is down.

An Arbiter stores no data and is very lightweight. When a replica set has an even number of members, it is best to add an Arbiter to improve the availability of the replica set.
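The two-node example can be checked with a simplified model that only counts votes and ignores optime details (an assumption for illustration). canElectPrimary and the up field are hypothetical; votes, priority, and arbiterOnly mirror real member settings:

```javascript
// Simplified model: an election succeeds only if the surviving voters form a
// majority of ALL voting members and at least one surviving data-bearing,
// electable (priority > 0) member exists to become Primary.
function canElectPrimary(members) {
  const voters = members.filter((m) => m.votes > 0);
  const needed = Math.floor(voters.length / 2) + 1;
  const aliveVoters = voters.filter((m) => m.up).length;
  const electable = members.some((m) => m.up && !m.arbiterOnly && m.priority > 0);
  return aliveVoters >= needed && electable;
}

const twoNodes = [
  { host: "A", votes: 1, priority: 1, up: true },
  { host: "B", votes: 1, priority: 1, up: false }, // B is down
];
const withArbiter = [...twoNodes, { host: "arb", votes: 1, priority: 0, arbiterOnly: true, up: true }];

console.log(canElectPrimary(twoNodes)); // false: 1 of 2 votes is not a majority
console.log(canElectPrimary(withArbiter)); // true: 2 of 3 votes, and A can become Primary
```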

Priority 0

A Priority 0 node has an election priority of 0 and is never elected Primary.

For example, if you deploy a replica set across data centers A and B and want the Primary to always be in data center A, set the priority of the members in data center B to 0; the Primary must then be a member in data center A. (Note: with this kind of deployment, it is best to place a majority of the nodes in data center A; otherwise no Primary may be electable during a network partition.)
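A sketch of the data-center setup, assuming each member object carries a hypothetical dc label (not a real replica set field; real deployments would typically use member tags). The hypothetical helper returns an object in the shape accepted by rs.reconfig(), with dc stripped out:

```javascript
// Hypothetical helper: keep each member's priority in the preferred data center
// (default 1) and set priority 0 everywhere else, so only members in the
// preferred data center can be elected Primary.
function pinPrimaryToDataCenter(config, preferredDc) {
  return {
    ...config,
    members: config.members.map(({ dc, ...member }) => ({
      ...member,
      priority: dc === preferredDc ? (member.priority ?? 1) : 0,
    })),
  };
}

// Two members in A (a majority of the three), one in B.
const cfg = pinPrimaryToDataCenter(
  {
    _id: "my_replica_set",
    members: [
      { _id: 0, host: "rs1.example.net:27017", dc: "A" },
      { _id: 1, host: "rs2.example.net:27017", dc: "A" },
      { _id: 2, host: "rs3.example.net:27017", dc: "B" },
    ],
  },
  "A"
);
console.log(cfg.members.map((m) => m.priority)); // priorities 1, 1, 0
```

Placing two of the three members in data center A also follows the note above, so A alone still holds a majority during a partition.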

Vote 0

In MongoDB 3.0, a replica set can have at most 50 members, of which at most 7 can vote in Primary elections. The votes setting of the remaining members must be 0, meaning they do not vote (Vote 0 members).
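A sketch of splitting a large member list into voting and non-voting members. assignVotes and the example hosts are hypothetical; votes and priority are real member settings. The extra rule that non-voting members must also have priority 0 comes from later MongoDB releases (3.2 and newer), so it goes beyond what the text above describes:

```javascript
const MAX_VOTING_MEMBERS = 7; // voting-member limit described above

// Hypothetical helper: the first seven members keep one vote each; every other
// member gets votes: 0 and, as newer MongoDB versions require, priority: 0.
function assignVotes(members) {
  return members.map((member, i) =>
    i < MAX_VOTING_MEMBERS
      ? { ...member, votes: 1 }
      : { ...member, votes: 0, priority: 0 }
  );
}

// Nine hypothetical members: seven voters and two non-voting members.
const members = Array.from({ length: 9 }, (_, i) => ({
  _id: i,
  host: `node${i}.example.net:27017`,
}));
const assigned = assignVotes(members);
console.log(assigned.filter((m) => m.votes === 1).length); // 7
```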

Hidden

A Hidden node cannot be elected Primary (its priority is 0) and is not visible to the Driver.

Because a Hidden node receives no requests from the Driver, you can use it for tasks such as data backup and offline computation without affecting the service provided by the replica set.
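The "not visible to the Driver" behavior can be pictured with a simplified model of the member lists a server reports to drivers during discovery (the hosts, passives, and arbiters fields of the isMaster/hello response). advertisedHosts is a hypothetical function, and the model leaves out many details of the real response:

```javascript
// Simplified model: hidden members appear in none of the lists, so a driver
// never learns about them and never sends them requests.
function advertisedHosts(members) {
  const visible = members.filter((m) => !m.hidden);
  const hostsOf = (list) => list.map((m) => m.host);
  return {
    hosts: hostsOf(visible.filter((m) => !m.arbiterOnly && (m.priority ?? 1) > 0)),
    passives: hostsOf(visible.filter((m) => !m.arbiterOnly && m.priority === 0)),
    arbiters: hostsOf(visible.filter((m) => m.arbiterOnly)),
  };
}

const topology = advertisedHosts([
  { host: "rs1.example.net:27017", priority: 1 },
  { host: "rs2.example.net:27017", priority: 1 },
  { host: "backup.example.net:27017", priority: 0, hidden: true }, // hypothetical backup node
]);
console.log(topology.hosts); // rs1 and rs2 only; the hidden backup node is absent
```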

Delayed

A Delayed node has priority 0 and should be a Hidden node; its data lags behind the Primary by a configurable period (for example, 1 hour).

Because a Delayed node's data lags behind the Primary, if incorrect or invalid data is written to the Primary, the data on the Delayed node can be used to recover to an earlier point in time.
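A sketch of the recovery window a delayed member provides, under the simplifying assumption that it applies each oplog entry exactly delaySecs after the entry was written. The member settings are hypothetical example values; the delay field is called slaveDelay before MongoDB 5.0 and secondaryDelaySecs from 5.0 on, and hasApplied is a hypothetical helper:

```javascript
// Example settings for a one-hour delayed, hidden member (MongoDB 5.0+ field
// name; older versions use slaveDelay instead of secondaryDelaySecs).
const delayedMember = {
  _id: 3,
  host: "rs4.example.net:27017",
  priority: 0,
  hidden: true,
  secondaryDelaySecs: 3600,
};

// Has the delayed member already applied a write made at writeTimeSecs?
function hasApplied(writeTimeSecs, nowSecs, delaySecs) {
  return nowSecs - writeTimeSecs >= delaySecs;
}

// A bad write at t = 0 is still absent from the delayed member 30 minutes later,
// so its pre-write data can be used for recovery; after one hour it is too late.
console.log(hasApplied(0, 1800, delayedMember.secondaryDelaySecs)); // false
console.log(hasApplied(0, 3600, delayedMember.secondaryDelaySecs)); // true
```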

Data synchronization

Data is synchronized between the Primary and the Secondaries through the oplog. Each write on the Primary produces an oplog entry in a special collection, local.oplog.rs; the Secondaries continuously fetch new oplog entries from the Primary and apply them to their own data.

Because the oplog keeps growing, local.oplog.rs is a capped collection: when it reaches its configured size limit, the oldest entries are deleted. In addition, because an oplog entry may be applied more than once on a Secondary, oplog entries must be idempotent; that is, applying one repeatedly gives the same result.
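A minimal model of the capped-collection behavior, assuming capacity is counted in entries (the real local.oplog.rs is capped by size in bytes). CappedLog is a hypothetical class, not a MongoDB API:

```javascript
// Fixed-capacity, insertion-ordered log that discards its oldest entry when full.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }

  append(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // drop the oldest entry, as a capped collection does
    }
  }

  oldest() {
    return this.entries[0];
  }
}

const oplog = new CappedLog(3);
for (let ts = 1; ts <= 5; ts++) oplog.append({ ts });
console.log(oplog.entries.map((e) => e.ts)); // ts 3, 4, 5 remain; 1 and 2 were discarded
```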

An oplog entry looks like the following and contains the fields ts, h, v, op, ns, o, and others:

{
    "ts": Timestamp(1446011584, 2),
    "h": NumberLong("1687359108795812092"),
    "v": 2,
    "op": "i",
    "ns": "test.nosql",
    "o": { "_id": ObjectId("563062c0b085733f34ab4129"), "name": "mongodb", "score": "100" }
}

ts: the operation time, made up of the current timestamp plus a counter; the counter is reset every second.

h: a globally unique identifier for the operation.

v: the oplog version.

op: the operation type, one of:
  i: insert
  u: update
  d: delete
  c: command (such as create or dropDatabase)
  n: no-op, used for special purposes

ns: the namespace (database.collection) the operation applies to.

o: the operation content; for an update operation, the modification to apply.

o2: the query condition of the operation; only update operations contain this field.
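The field meanings can be illustrated with a small decoder for the sample entry. describeOplogEntry is a hypothetical helper; it models Timestamp(t, i) as a plain { t, i } object and NumberLong/ObjectId values as strings, since the shell types are not available in plain JavaScript:

```javascript
const OP_TYPES = { i: "insert", u: "update", d: "delete", c: "command", n: "no-op" };

// Turn the raw fields of an oplog entry into readable values.
function describeOplogEntry(entry) {
  const dot = entry.ns.indexOf(".");
  return {
    time: new Date(entry.ts.t * 1000).toISOString(), // t = seconds since the epoch
    ordinal: entry.ts.i, // counter within that second
    operation: OP_TYPES[entry.op] ?? `unknown (${entry.op})`,
    database: dot === -1 ? entry.ns : entry.ns.slice(0, dot),
    collection: dot === -1 ? "" : entry.ns.slice(dot + 1),
  };
}

// The sample entry from above, with shell types replaced by plain values.
const sample = {
  ts: { t: 1446011584, i: 2 },
  h: "1687359108795812092",
  v: 2,
  op: "i",
  ns: "test.nosql",
  o: { _id: "563062c0b085733f34ab4129", name: "mongodb", score: "100" },
};
console.log(describeOplogEntry(sample));
```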

When a Secondary synchronizes data for the first time, it first performs an initial sync (init sync), copying all data from the Primary (or from another Secondary with up-to-date data). After that, it continuously queries the latest entries in the Primary's local.oplog.rs collection through a tailable cursor and applies them to itself.

Initial sync consists of the following steps:

1. At time T1, copy the data of all databases except local from the Primary, using a combination of the listDatabases, listCollections, and cloneCollection commands. Assume that all of this finishes at time T2.

2. Apply all oplog entries written on the Primary during [T1, T2]. Some of these operations may already be reflected in the data copied in step 1, but because oplog entries are idempotent, they can safely be applied again.

3. For each collection, build on the Secondary the indexes defined on the corresponding collection of the Primary. (The _id index of each collection was already built in step 1.)
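A toy simulation of steps 1 and 2, assuming documents live in a Map keyed by _id and every oplog entry records resulting field values (the way an idempotent entry must). applyEntry and initialSync are hypothetical names, and index building (step 3) is omitted:

```javascript
// Apply one oplog entry; "i"/"u" record resulting values, so re-applying is harmless.
function applyEntry(store, entry) {
  if (entry.op === "i" || entry.op === "u") {
    store.set(entry.id, { ...(store.get(entry.id) ?? {}), ...entry.fields });
  } else if (entry.op === "d") {
    store.delete(entry.id);
  }
}

// writesDuringClone[k] hits the Primary just before document k is copied.
function initialSync(primary, writesDuringClone) {
  const secondary = new Map();
  const oplogWindow = []; // entries written on the Primary between T1 and T2
  const ids = [...primary.keys()]; // documents present at T1

  // Step 1: copy documents one by one while the Primary keeps accepting writes.
  ids.forEach((id, k) => {
    const write = writesDuringClone[k];
    if (write) {
      applyEntry(primary, write);
      oplogWindow.push(write);
    }
    if (primary.has(id)) secondary.set(id, { ...primary.get(id) });
  });

  // Step 2: replay all of [T1, T2]; some entries are already in the copy.
  oplogWindow.forEach((entry) => applyEntry(secondary, entry));
  return secondary;
}

const primary = new Map([[1, { n: 1 }], [2, { n: 2 }], [3, { n: 3 }]]);
const writes = [
  { op: "u", id: 3, fields: { n: 30 } }, // hits a document not yet copied
  { op: "u", id: 1, fields: { n: 10 } }, // hits a document already copied (stale copy)
  { op: "i", id: 4, fields: { n: 4 } }, // inserts a document that was not there at T1
];
const secondary = initialSync(primary, writes);
console.log(JSON.stringify([...secondary]) === JSON.stringify([...primary])); // true
```

Without step 2, document 1 would keep its stale value and document 4 would be missing; replaying the window fixes both, and replaying the write to document 3 a second time changes nothing.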

The size of the oplog collection should be configured according to the database size and the application's write volume. If it is too large, storage space is wasted; if it is too small, the Secondary's initial sync may fail. For example, if the database is large and the oplog is small, the oplog cannot hold all entries written during [T1, T2] while step 1 runs, so the Secondary cannot sync the complete dataset from the Primary.
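The sizing rule can be turned into a back-of-the-envelope check. The function names and numbers are hypothetical, and the model assumes a steady oplog growth rate; on a live system, rs.printReplicationInfo() in the mongo shell reports the actual oplog size and the time span it currently covers:

```javascript
// Hours of history the oplog can hold at a steady write rate.
function oplogWindowHours(oplogSizeGB, oplogGBPerHour) {
  return oplogSizeGB / oplogGBPerHour;
}

// Step 1 of initial sync (T1 to T2) must finish before the oplog wraps around
// and discards the entries written at T1.
function initialSyncCanComplete(oplogSizeGB, oplogGBPerHour, copyHours) {
  return oplogWindowHours(oplogSizeGB, oplogGBPerHour) >= copyHours;
}

// Hypothetical numbers: a 10 GB oplog filling at 2 GB/hour gives a 5-hour window.
console.log(initialSyncCanComplete(10, 2, 3)); // true: a 3-hour copy fits in the window
console.log(initialSyncCanComplete(10, 2, 8)); // false: the oplog is too small
```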

Modifying the replica set configuration

When you need to change the replica set, for example to add or remove members or to change a member's settings (such as priority, votes, hidden, or delay), you can reconfigure it with the replSetReconfig command (rs.reconfig() in the mongo shell).

For example, to set the priority of the second member of the replica set to 2, run the following commands:

cfg = rs.conf()
cfg.members[1].priority = 2
rs.reconfig(cfg)

The Primary election in detail

Besides the election that takes place when a replica set is initialized, a Primary election is also triggered in the following scenarios:

The replica set is reconfigured (reconfig).

A Secondary detects that the Primary is down and starts an election for a new Primary.

The Primary voluntarily steps down to Secondary (stepDown), which also triggers a new election.

The outcome of an election depends on several factors, including heartbeats between nodes, member priorities, and the time of each member's latest oplog entry (optime).

Heartbeat between nodes

By default, replica set members send each other heartbeats every 2 seconds. If no heartbeat is received from a node within 10 seconds, that node is considered down. If the node that is down is the Primary, a Secondary (provided it is eligible to become Primary) starts a new Primary election.
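A simplified failure detector for this rule, assuming the Secondary records when it last heard from each member (in milliseconds). The helper names and hosts are hypothetical; the 10-second threshold is the default described above:

```javascript
const DOWN_AFTER_MS = 10000; // no heartbeat for 10 s => member considered down

// Hosts whose last heartbeat response is at least DOWN_AFTER_MS old.
function membersConsideredDown(lastHeartbeatMs, nowMs) {
  return Object.entries(lastHeartbeatMs)
    .filter(([, lastSeen]) => nowMs - lastSeen >= DOWN_AFTER_MS)
    .map(([host]) => host);
}

// An electable Secondary (priority > 0) calls an election once the Primary looks down.
function shouldCallElection(self, primaryHost, lastHeartbeatMs, nowMs) {
  return self.priority > 0 && membersConsideredDown(lastHeartbeatMs, nowMs).includes(primaryHost);
}

// Heartbeats normally arrive every 2 s; the Primary's last response was at t = 0.
const lastSeen = { "rs1.example.net:27017": 0, "rs3.example.net:27017": 8000 };
console.log(shouldCallElection({ priority: 1 }, "rs1.example.net:27017", lastSeen, 9000)); // false
console.log(shouldCallElection({ priority: 1 }, "rs1.example.net:27017", lastSeen, 11000)); // true
```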

Node priority

Each node tends to vote for the node with the highest priority.

A node with priority 0 never starts a Primary election itself.

When the Primary finds a Secondary with a higher priority whose data lags no more than 10 seconds behind, the Primary steps down voluntarily, giving the higher-priority Secondary a chance to become Primary.
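The priority takeover rule can be sketched as a predicate, assuming optimes are expressed in seconds. shouldStepDownFor is a hypothetical name, and the model ignores other preconditions the server checks:

```javascript
const MAX_LAG_SECS = 10; // the lag threshold described above

// Should the current Primary step down so a higher-priority Secondary can take over?
function shouldStepDownFor(primary, secondary) {
  const lagSecs = primary.optimeSecs - secondary.optimeSecs;
  return secondary.priority > primary.priority && lagSecs <= MAX_LAG_SECS;
}

const primary = { priority: 1, optimeSecs: 1000 };
console.log(shouldStepDownFor(primary, { priority: 2, optimeSecs: 995 })); // true: 5 s behind
console.log(shouldStepDownFor(primary, { priority: 2, optimeSecs: 980 })); // false: 20 s behind
console.log(shouldStepDownFor(primary, { priority: 1, optimeSecs: 1000 })); // false: not higher priority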

Optime

Only a node with the latest optime (the timestamp of its most recent oplog entry) can be elected Primary.
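Taken together, the priority and optime rules suggest the following simplified candidate choice. It assumes members is a list of data-bearing members with a numeric optime, and pickCandidate is a hypothetical helper; the real protocol works through vote requests, in which a voter refuses a candidate whose optime is older than its own:

```javascript
// Among reachable members, only those holding the newest optime are eligible;
// among those, prefer the highest priority. Priority-0 members never qualify.
function pickCandidate(members) {
  const up = members.filter((m) => m.up);
  if (up.length === 0) return null;
  const newest = Math.max(...up.map((m) => m.optime));
  const eligible = up.filter((m) => m.priority > 0 && m.optime === newest);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, m) => (m.priority > best.priority ? m : best));
}

const choice = pickCandidate([
  { name: "A", up: true, priority: 1, optime: 100 },
  { name: "B", up: true, priority: 2, optime: 100 },
  { name: "C", up: true, priority: 5, optime: 90 }, // highest priority, but behind
  { name: "D", up: false, priority: 9, optime: 100 }, // unreachable
]);
console.log(choice.name); // B
```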

Network partition

Only a node that can reach a majority of the voting members can be elected Primary. If the Primary loses connectivity to a majority of the nodes, it steps down to Secondary on its own. During a network partition, two Primaries may briefly coexist, so it is best for the Driver to write with a "majority" write concern: even if there are multiple Primaries, only one of them can successfully write to a majority of the nodes.
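A sketch of the partition rule and of why a "majority" write concern protects against a second, stale Primary. The helper names are hypothetical, and each side of the partition is modelled as a list of voting members:

```javascript
// Index of the side that holds a majority of ALL voting members, or -1 if none does.
function sideWithMajority(partitions, totalVoters) {
  const needed = Math.floor(totalVoters / 2) + 1;
  return partitions.findIndex((side) => side.length >= needed);
}

// With w: "majority", a write is acknowledged only once a majority of the voting
// members (Primary included) have applied it.
function majorityWriteAcknowledged(replicatedTo, totalVoters) {
  return replicatedTo >= Math.floor(totalVoters / 2) + 1;
}

// Five members split 3 / 2: only side 0 can hold a Primary.
const partitions = [["A", "B", "C"], ["D", "E"]];
console.log(sideWithMajority(partitions, 5)); // 0
// A stale Primary on the two-node side can reach at most 2 members, so its writes
// are never acknowledged with w: "majority".
console.log(majorityWriteAcknowledged(2, 5)); // false
```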

Read and write settings for replica sets

Read Preference

By default, all read requests to a replica set are sent to the Primary. The Driver can route reads to other nodes by setting a Read Preference:

primary: the default; all reads go to the Primary.

primaryPreferred: reads go to the Primary; if the Primary is unreachable, they go to a Secondary.

secondary: all reads go to Secondaries.

secondaryPreferred: reads go to a Secondary; if no Secondary is reachable, they go to the Primary.

nearest: reads go to the nearest reachable node (determined by ping latency).
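A simplified router for the five modes, assuming each member is described by a hypothetical { host, role, up, pingMs } object. Real drivers also honor tag sets and choose randomly among members within a latency window (localThresholdMS) rather than always taking the single lowest ping, so this only illustrates the fallback rules:

```javascript
// Simplified read-preference routing over a list of { host, role, up, pingMs } members.
function selectReadTarget(mode, members) {
  const up = members.filter((m) => m.up);
  const primary = up.find((m) => m.role === "primary");
  const secondaries = up.filter((m) => m.role === "secondary");
  // Lowest-latency member of a list, or undefined if the list is empty.
  const closest = (list) =>
    list.length > 0 ? list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a)) : undefined;

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary ?? closest(secondaries);
    case "secondary":
      return closest(secondaries);
    case "secondaryPreferred":
      return closest(secondaries) ?? primary;
    case "nearest":
      return closest(up.filter((m) => m.role !== "arbiter"));
    default:
      throw new Error(`unknown read preference mode: ${mode}`);
  }
}

const members = [
  { host: "rs1.example.net:27017", role: "primary", up: false, pingMs: 5 },
  { host: "rs2.example.net:27017", role: "secondary", up: true, pingMs: 12 },
  { host: "rs3.example.net:27017", role: "secondary", up: true, pingMs: 3 },
];
console.log(selectReadTarget("primary", members)); // undefined: the Primary is unreachable
console.log(selectReadTarget("primaryPreferred", members).host); // rs3.example.net:27017
```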

Write Concern

By default, the Primary acknowledges a write as soon as it has completed the write itself. The Driver can change what counts as success by setting a Write Concern (https://docs.mongodb.org/manual/core/write-concern/).

The following write concern requires the write to succeed on a majority of nodes, with a timeout of 5 seconds:

db.products.insert(
    { item: "envelopes", qty: 100, type: "Clasp" },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
)

This setting applies to a single request. Alternatively, you can change the replica set's default write concern so that you do not need to set it on every request:

cfg = rs.conf()
cfg.settings = cfg.settings || {}
cfg.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000 }
rs.reconfig(cfg)
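A simplified evaluation of { w: "majority", wtimeout: 5000 }, assuming every voting member is data-bearing and that we know how long each took to apply the write. evaluateMajorityWrite is a hypothetical helper:

```javascript
// ackTimesMs: for each voting member (Primary included), how many ms after the
// write it applied the write, or Infinity if it never did.
function evaluateMajorityWrite(ackTimesMs, wtimeoutMs) {
  const needed = Math.floor(ackTimesMs.length / 2) + 1;
  const sorted = [...ackTimesMs].sort((a, b) => a - b);
  const majorityAckMs = sorted[needed - 1]; // moment the needed-th member has applied it
  if (majorityAckMs > wtimeoutMs) {
    // An error is returned to the client, but members that already applied
    // the write keep it: a write concern timeout does not undo the write.
    return { ok: false, error: "write concern timeout" };
  }
  return { ok: true, acknowledgedAfterMs: majorityAckMs };
}

// Three members: the Primary at 0 ms, one Secondary at 120 ms, one unreachable.
console.log(evaluateMajorityWrite([0, 120, Infinity], 5000)); // ok after 120 ms
console.log(evaluateMajorityWrite([0, Infinity, Infinity], 5000)); // timeout
```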

Exception handling (rollback)

If the Primary goes down before some of its writes have been replicated to any Secondary, and the new Primary has since accepted writes, then when the old Primary rejoins it must roll back those unreplicated operations so that its dataset is consistent with the new Primary.

The old Primary writes the rolled-back data to a separate rollback directory, and the database administrator can use mongorestore to recover as needed.
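A simplified picture of what gets rolled back, assuming oplog entries can be matched by their ts alone (the real server compares more than the timestamp when searching for the common point). splitForRollback is a hypothetical helper:

```javascript
// Keep the old Primary's oplog up to the last entry it shares with the new Primary
// (the common point); everything after that point is rolled back.
function splitForRollback(oldPrimaryOplog, newPrimaryOplog) {
  const shared = new Set(newPrimaryOplog.map((e) => e.ts));
  let commonIndex = -1;
  for (let i = 0; i < oldPrimaryOplog.length && shared.has(oldPrimaryOplog[i].ts); i++) {
    commonIndex = i;
  }
  return {
    keep: oldPrimaryOplog.slice(0, commonIndex + 1),
    rollBack: oldPrimaryOplog.slice(commonIndex + 1), // saved to the rollback directory
  };
}

// ts 3 and 4 were written on the old Primary but never replicated before it went
// down; the new Primary has since written ts 5.
const result = splitForRollback(
  [{ ts: 1 }, { ts: 2 }, { ts: 3 }, { ts: 4 }],
  [{ ts: 1 }, { ts: 2 }, { ts: 5 }]
);
console.log(result.rollBack.map((e) => e.ts)); // 3, 4
```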

That concludes this look at the principles behind MongoDB replica sets. I hope it helps you learn more; if you found the article useful, feel free to share it.
