
How to Understand Mongos and Cluster Balancing


This article explains how mongos works and how a MongoDB sharded cluster keeps its data balanced. The approach is straightforward and practical; let's walk through it step by step.

MongoDB can run as a single replica set, with clients connecting directly to mongod to read and write data.

With a single replica set, the responsibility for horizontal scaling of data falls on the application layer (splitting across instances, databases, and tables). MongoDB also natively provides a sharded-cluster solution, whose overall structure is roughly as follows:

A MongoDB cluster is a typical decentralized distributed system. It mainly solves the following problems for users:

Consistency and high availability of metadata (Consistency + Partition Tolerance)

Multi-replica disaster recovery of business data (guaranteed by replica set technology)

Dynamic automatic sharding

Dynamic automatic data balancing

The following sections analyze the principles of a MongoDB cluster step by step by introducing its components.

ConfigServer

All MongoDB cluster metadata is stored in the configServer, which is itself a replica set of at least three mongod instances.

The configServer's only job is to provide create, read, update, and delete operations on metadata. Like most metadata management systems (etcd, ZooKeeper), it guarantees consistency and partition tolerance, but it does not perform centralized scheduling.

ConfigServer and the replica set

The configServer's partition tolerance (P) and data consistency (C) come from the nature of the replica set itself.

MongoDB's read-write consistency is governed by two parameters: writeConcern and readConcern.

Different combinations of the two yield different consistency levels.

Specifying writeConcern: majority guarantees that written data is not lost and is not rolled back when a new primary is elected.

readConcern: majority + writeConcern: majority provides strongly consistent reads.

readConcern: local + writeConcern: majority provides eventually consistent reads.
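For example, with pymongo these combinations can be set per collection handle; a minimal sketch, where the connection string and the database/collection names ("mydb", "foo") are placeholders:

from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern
client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["mydb"]
# strongly consistent reads: readConcern majority + writeConcern majority
strong = db.get_collection("foo",
                           read_concern=ReadConcern("majority"),
                           write_concern=WriteConcern("majority"))
# eventually consistent reads: readConcern local + writeConcern majority
eventual = db.get_collection("foo",
                             read_concern=ReadConcern("local"),
                             write_concern=WriteConcern("majority"))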

MongoDB writes to the configServer with writeConcern: majority, so metadata is guaranteed not to be lost.

Reads of the configServer specify ReadPreference: PrimaryOnly, obtaining strongly consistent reads of metadata by giving up A and P in CAP.
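By way of analogy, a driver-level read pinned to the primary looks like this in pymongo; this is only a sketch of the same readPreference concept, not mongos's internal code, and the names ("mydb", "foo") and connection string are placeholders:

from pymongo import MongoClient, ReadPreference
client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
# reads routed only to the primary, mirroring how mongos reads configServer metadata
coll = client["mydb"].get_collection("foo", read_preference=ReadPreference.PRIMARY)
doc = coll.find_one()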

Mongos and automatic data sharding

For a read or write operation, mongos needs to know which replica set (shard) it should be routed to. Mongos implements routing by dividing the shard-key space into several intervals (chunks) and determining which replica set owns the interval containing the operation's shard key.

For example, Collection1 is divided into four chunks:

chunk1 covers (-INF, 1) and chunk3 covers [20, 99); both are placed on shard1.

chunk2 covers [1, 20) and chunk4 covers [99, INF); both are placed on shard2.
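A minimal routing sketch using these example ranges; the chunk table and shard names are illustrative, not mongos's actual data structures:

from bisect import bisect_right
# (exclusive upper bound, shard) per chunk, in shard-key order; the last bound is +infinity
chunks = [(1, "shard1"), (20, "shard2"), (99, "shard1"), (float("inf"), "shard2")]
upper_bounds = [ub for ub, _ in chunks]
def route(shard_key):
    # the owning chunk is the first one whose exclusive upper bound exceeds the key
    return chunks[bisect_right(upper_bounds, shard_key)][1]
print(route(0), route(5), route(50), route(1000))  # shard1 shard2 shard1 shard2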

Chunk information is stored in the config.chunks collection on the configServer's mongod instances, in the following format:

{"_ id": "mydb.foo-a_\" cat\ "," lastmod ": Timestamp (1000, 3)," lastmodEpoch ": ObjectId (" 5078407bd58b175c5c225fdc ")," ns ":" mydb.foo "," min ": {" animal ":" cat "}," max ": {" animal ":" dog "} "shard": "shard0004"}

It is worth noting that a chunk is a logical organizational structure; it does not correspond to any particular underlying file layout.
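The same information can be read back from any mongos with a quick pymongo query; a minimal sketch, assuming the MongoDB versions discussed here (where chunks are keyed by the ns field) and a placeholder connection string:

from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017")  # placeholder mongos address
for chunk in client["config"]["chunks"].find({"ns": "mydb.foo"}):
    print(chunk["min"], chunk["max"], chunk["shard"])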

Heuristically triggered chunk splitting

In the default MongoDB configuration, each chunk is 64MB; once a chunk exceeds this size, it needs to be split. Chunk splitting is initiated by mongos, but the data lives on mongod, so mongos cannot know a chunk's exact size after each insert, update, or delete. Mongos therefore uses a heuristic to trigger splits:

Mongos keeps an in-memory hash table mapping chunk_id -> incr_delta.

For insert and update operations, an upper bound on the written size is estimated (WriteOp::targetWrites) and added to incr_delta; when incr_delta exceeds a threshold, a chunk split is triggered (a rough sketch of this bookkeeping follows the notes below).

It is worth noting that:

1) the chunk_id -> incr_delta map lives only in mongos memory and is lost on restart

2) each mongos maintains its own map; different mongos instances do not share this data

3) updates that do not carry the shard key cannot be attributed to a chunk and therefore do not update chunk_id -> incr_delta

This heuristic splitting method is therefore quite imprecise, yet apart from manual split commands it is the only chunk-splitting trigger that mongos provides.
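A rough sketch of this bookkeeping; the threshold formula and function names are made up for illustration, and the real logic lives in mongos's C++ write path:

from collections import defaultdict
MAX_CHUNK_SIZE = 64 * 1024 * 1024        # assumed default chunk size
SPLIT_THRESHOLD = MAX_CHUNK_SIZE // 5    # illustrative threshold, not the real constant
incr_delta = defaultdict(int)            # chunk_id -> estimated bytes written; in-memory only
def on_write(chunk_id, estimated_bytes):
    # called for each insert/update whose shard key maps to a known chunk
    incr_delta[chunk_id] += estimated_bytes
    if incr_delta[chunk_id] >= SPLIT_THRESHOLD:
        incr_delta[chunk_id] = 0         # reset the counter and attempt a split
        request_chunk_split(chunk_id)    # kicks off the splitVector/splitChunk flow below
def request_chunk_split(chunk_id):
    pass                                 # placeholder; see "The execution process of chunk splitting"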

The execution process of chunk splitting

1) mongos issues the splitVector command to the corresponding mongod to obtain the chunk's candidate split points

2) after receiving these split points, mongos issues a splitChunk command to the mongod

SplitVector execution process:

1) compute the average document size of the collection: avgRecSize = coll.size / coll.count

2) compute how many documents each resulting chunk should hold: split_count = maxChunkSize / (2 * avgRecSize)

3) linearly traverse the collection's shard-key index over the chunk's [chunk_min_index, chunk_max_index] range, emitting a split point every split_count documents
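A simplified sketch of that calculation; the collection statistics and the index scan are passed in as plain Python values, and the names are illustrative:

def split_vector(shard_keys_in_index_order, coll_size_bytes, coll_count, max_chunk_size):
    avg_rec_size = coll_size_bytes / coll_count                     # average document size
    split_count = max(1, int(max_chunk_size / (2 * avg_rec_size)))  # docs per resulting chunk
    # walk the chunk's shard-key index range, emitting a split point every split_count docs
    return [key for i, key in enumerate(shard_keys_in_index_order)
            if i > 0 and i % split_count == 0]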

SplitChunk execution process:

1) acquire the distributed lock for the collection being split (implemented by writing a record to the configSvr's mongod)

2) refresh this shard's version number (reading it from configSvr) and check that it matches the version number carried by the command initiator

3) write the split-chunk information to configSvr; on success, update the local chunk information and the shard version number

4) write the change log to configSvr

5) notify mongos that the operation is complete, and mongos modifies its own metadata
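A condensed sketch of that sequence on the shard side; all object and method names here are hypothetical, and the real implementation is C++ inside mongod:

def split_chunk(config_svr, shard, ns, split_points, initiator_version):
    # 1) take the collection's distributed lock, backed by a record on configSvr
    with config_svr.distributed_lock(ns):
        # 2) refresh this shard's version and compare it with the initiator's
        if config_svr.read_shard_version(ns, shard.name) != initiator_version:
            raise RuntimeError("shard version mismatch, abort the split")
        # 3) persist the split chunks on configSvr, then update local chunk info and version
        new_version = config_svr.write_split_chunks(ns, shard.name, split_points)
        shard.apply_local_split(split_points, new_version)
        # 4) record a change-log entry on configSvr
        config_svr.write_changelog("split", ns, split_points)
    # 5) the command reply tells mongos the split is done, and mongos refreshes its metadata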

Flow chart of the execution of chunk split:

Problems and thoughts

Question 1: after mongos receives the result of splitVector, why does it let mongod execute splitChunk rather than executing it itself? Why isn't it mongos that modifies the metadata and then notifies mongod?

We know that chunk metadata is held in three places: configServer, mongos, and mongod. If the chunk metadata were changed by mongos, neither the other mongos instances nor mongod would get the latest metadata right away. The following problem could then occur, as illustrated in the figure below:

The metadata change made by one mongos is not yet known to mongod or to the other mongos instances; since the other mongos instances' version numbers still match mongod's, they end up writing to the wrong chunk.

If instead the chunk metadata is changed by mongod, then mongod learns that this shard's metadata has changed before any mongos does. Since every mongos read or write request to mongod carries a version number (the version held from the initiating mongos's point of view), mongod returns StaleShardingError when it finds a request whose version number is lower than its own, which prevents reads and writes against the wrong chunk.

Mongos routing of reads and writes

Read request:

Mongos routes the read request to the corresponding shard; if it gets a StaleShardingError, it refreshes its local metadata (reading the latest metadata from configServer) and retries.
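A schematic of this retry loop; the class and method names below are hypothetical stand-ins, with StaleShardingError representing the stale-version error returned by the shard:

class StaleShardingError(Exception):
    """Raised when the request's shard version is older than the shard's own version."""
def routed_read(mongos, request):
    while True:
        shard, version = mongos.route(request.shard_key)   # consult the cached chunk map
        try:
            return shard.execute(request, shard_version=version)
        except StaleShardingError:
            # reload the latest chunk metadata from configServer, then retry
            mongos.refresh_metadata_from_config_server()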

Write request:

Mongos routes the write request to the corresponding shard, but if it gets a StaleShardingError it does not retry the way a read does. This is unreasonable; as of the version examined here, mongos only leaves a TODO (batch_write_exec.cpp:185):

185 // TODO: It may be necessary to refresh the cache if stale, or maybe just
186 // cancel and retarget the batch

Chunk migration

Chunk migration is performed by the balancer module, which is not a separate service but a thread inside mongos. Only one balancer is active at a time, which is guaranteed by mongos registering a distributed lock in configServer.

For each collection, the balancer looks at the chunk distribution and calculates which chunks need to be migrated and which shard each chunk should move to. The calculation itself, in the BalancerPolicy class, is fairly straightforward.
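At its core the policy moves chunks from overloaded shards toward underloaded ones; the following is a greatly simplified sketch, where the threshold and function names are illustrative rather than BalancerPolicy's actual rules:

def plan_migrations(chunk_counts, threshold=2):
    # chunk_counts: shard name -> number of chunks of one collection on that shard
    moves = []
    counts = dict(chunk_counts)
    while True:
        donor = max(counts, key=counts.get)        # most loaded shard
        recipient = min(counts, key=counts.get)    # least loaded shard
        if counts[donor] - counts[recipient] <= threshold:
            break                                  # distribution is balanced enough
        counts[donor] -= 1
        counts[recipient] += 1
        moves.append((donor, recipient))           # migrate one chunk from donor to recipient
    return moves
print(plan_migrations({"shard1": 10, "shard2": 2}))  # three moves from shard1 to shard2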

Chunk Migration, Step 1

For each collection, the balancer (MigrationManager::scheduleMigrations) attempts to acquire the collection's distributed lock (by applying to configSvr). If the acquisition fails, the collection already has a migration in progress, which shows that only one migration task can run for a given collection at a time. Even though a collection spread across different shards has fully isolated I/O that could in principle allow higher concurrency, mongos does not take advantage of this.

If the lock is acquired successfully, a moveChunk command is issued to the source shard.

Chunk Migration, Step 2

Mongod executes the moveChunk command

CloneStage

1) The source mongod builds a query plan from the upper and lower bounds of the chunk to be migrated and scans it using the shard index. It issues the recvChunkStart command to the target mongod so that the target enters the data-pull phase.

2) While this stage is in progress, the source mongod buffers the _id fields of any documents it modifies in memory (the MigrationChunkClonerSourceLegacy class). To keep the buffer from growing without bound during the migration, its size is capped at 500MB; if the volume of changed keys exceeds the buffer size, the migration fails.

3) After receiving the recvChunkStart command, the target mongod:

a. Cleans up, based on the chunk range, any stale data that may already exist on this mongod.

b. Issues the _migrateClone command to the source and obtains a snapshot of the chunk data via the shard-index scan query built in 1).

c. After the snapshot has been copied, issues the _transferMods command to the source to fetch the changes held in the in-memory buffer from 2).

d. On receiving _transferMods, the source looks up the corresponding documents in the collection by the recorded object ids and returns the actual data to the destination.

e. After receiving the _transferMods data, the target enters the steady state and waits for the source's next command. Note that user data keeps arriving, so in theory there will always be new data for the _transferMods phase; at some point the data flow has to be cut off and the source data (the data in the migrating chunk) made unwritable before the routing change can be initiated. "All the data in the _transferMods phase" therefore refers only to a particular point in time; new data will still arrive afterwards.

f. The source checks via heartbeat whether the destination has reached the steady state; if so, it blocks writes to the chunk and issues the _recvChunkCommit command to the destination, after which no further changes happen to the chunk on the source.

g. After the destination receives the _recvChunkCommit command, it pulls the remaining modifications from the source chunk and applies them. Once this succeeds, the source disables its routing for the chunk and cleans up the source chunk's data.
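Putting the clone stage together, here is a rough end-to-end outline of the protocol; every function name below is a hypothetical stand-in for the internal commands described above:

def move_chunk(source, target, chunk_range):
    # clone stage: the target pulls a snapshot while the source buffers concurrent changes
    target.recv_chunk_start(chunk_range)            # target cleans stale data in the range
    source.start_buffering_mods(chunk_range)        # buffer _ids of concurrent writes (capped, ~500MB)
    target.migrate_clone(source, chunk_range)       # snapshot copy via the shard-key index scan
    while not target.is_steady():
        target.transfer_mods(source)                # drain buffered modifications from the source
    # commit stage: the source blocks writes to the chunk, the target applies the final tail
    source.block_writes(chunk_range)
    target.recv_chunk_commit(source)                # pull and apply the remaining modifications
    source.switch_routing_and_cleanup(chunk_range)  # routing moves off the source; old data removed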

At this point, you should have a deeper understanding of mongos and cluster balancing. The best way to consolidate it is to try it out in practice.
