How to understand sharding in MongoDB

2025-04-29 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

How should we understand sharding in MongoDB? This article walks through the relevant concepts in detail, in the hope of giving readers who face this question a simple and workable way to approach it.

Purpose: to gain a deep understanding of MongoDB's sharding mechanism.

Key to reading: a clear understanding of the concepts MongoDB uses around sharding.

1: introduction to Shard

Sharding means storing data horizontally across multiple nodes. Put simply, when the data may not fit on one machine, we spread it over several machines. For example, we could store national ID records on different Shard Servers province by province, where each Shard Server is one of the distributed Mongo machines.

The application connects to the sharded cluster through the mongos process, and mongos routes and dispatches each query; other requests are handled the same way. From the user's point of view there is only one layer and one node.

2: load balancing and failover

When the amount of data on one shard exceeds a certain threshold relative to the other shards, the data is redistributed automatically to keep the system balanced. Put simply, if writes push one machine beyond what it can hold, part of its data is moved to other machines. By comparison, HBase often runs into write hotspots on a single machine in this situation; how Mongo handles that case is not yet clear to the author.
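The redistribution described above can be illustrated with a toy model. This is a minimal sketch and not MongoDB's balancer: it assumes balance is measured by chunk count per shard, and the function name `balance`, the `threshold` parameter, and the shard and chunk names are all made up for the example.

```python
def balance(chunks_per_shard, threshold=2):
    """Move chunks from the fullest shard to the emptiest one.

    chunks_per_shard maps a shard name to the list of chunk ids it holds
    and is modified in place. Returns the list of migrations performed as
    (chunk, source_shard, destination_shard) tuples.
    """
    if threshold < 2:
        # With a threshold of 1 a single chunk could bounce back and forth forever.
        raise ValueError("threshold must be at least 2")

    migrations = []
    while True:
        fullest = max(chunks_per_shard, key=lambda s: len(chunks_per_shard[s]))
        emptiest = min(chunks_per_shard, key=lambda s: len(chunks_per_shard[s]))
        gap = len(chunks_per_shard[fullest]) - len(chunks_per_shard[emptiest])
        if gap < threshold:
            return migrations  # the cluster counts as balanced
        chunk = chunks_per_shard[fullest].pop()
        chunks_per_shard[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))


if __name__ == "__main__":
    cluster = {"shardA": ["c1", "c2", "c3", "c4", "c5", "c6"], "shardB": [], "shardC": []}
    for move in balance(cluster):
        print("moved %s from %s to %s" % move)
    print(cluster)
```

The point of the sketch is only that balancing moves whole chunks, one at a time, until the difference between the fullest and the emptiest shard drops below a threshold.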

In a normal configuration, each shard should be a group of two or more nodes. Such a group is usually called a replica set. A replica set has N servers, one of which is the primary while the others are secondaries. If the primary dies, the set automatically switches one of the secondaries over to become the new primary.
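The failover idea can be sketched as follows. This is a heavily simplified illustration, not MongoDB's election protocol: it assumes only that a new primary needs a majority of members to be healthy, and the `Member` class, its `last_applied` field, and the tie-breaking rule are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool = True
    priority: int = 1       # members with priority 0 never become primary
    last_applied: int = 0   # hypothetical measure of how up to date the member is


def elect_primary(members):
    """Return the name of the member that should become primary, or None.

    Simplified rule for this sketch: a majority of all members must be
    healthy, and the most up-to-date healthy member wins (ties are broken
    by higher priority).
    """
    healthy = [m for m in members if m.healthy]
    if len(healthy) * 2 <= len(members):
        return None  # no majority, so no primary can be chosen
    eligible = [m for m in healthy if m.priority > 0]
    if not eligible:
        return None
    best = max(eligible, key=lambda m: (m.last_applied, m.priority))
    return best.name


if __name__ == "__main__":
    replica_set = [
        Member("node1", healthy=False, last_applied=10),  # the old primary has died
        Member("node2", last_applied=9),
        Member("node3", last_applied=10),
    ]
    print(elect_primary(replica_set))
```

With three members, the set survives the loss of one node; if two nodes are down, the sketch returns None because the remaining node is not a majority.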

Shard architecture diagram:

4: shard key

To enable sharding, we need to specify a shard key for the collection, which is comparable to the partition column of a relational database.

The shard key usually needs an index, and it can consist of one or more fields.

5: chunks

A chunk is a contiguous range of data in a collection. When a chunk reaches a certain size, it splits. When a shard holds more than a certain amount of data, chunks are migrated to other shards, and adding a shard also causes chunks to move. As an analogy, suppose you write logs to local files and cap each file at 60 MB: once the cap is exceeded a new file is started, and once the machine can no longer hold the files, they are transferred to another machine.
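Splitting can be sketched in a few lines. This is a toy model and not MongoDB's split logic: it assumes each document is represented as a (shard_key, size_in_bytes) pair, that keys are unique and already sorted, and that an oversized chunk is simply cut in half at its middle key. The name `split_chunk` and the `max_bytes` parameter are made up for the example.

```python
def split_chunk(docs, max_bytes):
    """Split one chunk into contiguous chunks no larger than max_bytes.

    docs is a list of (shard_key, size_in_bytes) pairs sorted by shard key.
    Returns a list of chunks, each of which is a contiguous slice of docs.
    A chunk holding a single document is never split further.
    """
    total = sum(size for _, size in docs)
    if total <= max_bytes or len(docs) < 2:
        return [docs]
    mid = len(docs) // 2  # cut at the middle key so both halves stay contiguous
    return split_chunk(docs[:mid], max_bytes) + split_chunk(docs[mid:], max_bytes)


if __name__ == "__main__":
    chunk = [(key, 10) for key in range(8)]  # 8 documents of 10 bytes each
    for piece in split_chunk(chunk, max_bytes=30):
        keys = [key for key, _ in piece]
        print("chunk covering keys", keys[0], "to", keys[-1])
```

Because every split keeps the keys in order, each resulting chunk still covers one unbroken range of shard key values, which is what lets a chunk be moved to another shard as a unit.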

MongoDB sharding is a database cluster system that scales very large amounts of data horizontally, with the data of a database's collections stored across the nodes of the sharded cluster.

Compared with a relational database, a collection corresponds to a table and a document to a row; a chunk is a contiguous range of those rows ordered by shard key, roughly like one range partition of the table.

Shard Server

These are the shards that store the actual data. Each shard can be a single mongod instance or a replica set made up of a group of mongod instances. To get automatic failover within each shard, MongoDB officially recommends that every shard be a replica set. For how to install and build a replica set, see my other article: http://gong1208.iteye.com/blog/1558355

Config Server

To store a collection across multiple shards, you specify a shard key for that collection, for example {age: 1}; the shard key determines which chunk a record belongs to. The config servers store the configuration of all shard nodes, the shard key range of each chunk, the distribution of chunks across the shards, and the sharding configuration of every database and collection in the cluster.

Put simply, the config servers hold the cluster's metadata.
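To make the idea of chunk metadata concrete, here is a minimal sketch of a lookup table keyed on the {age: 1} shard key from the text. It is an illustration only and does not reflect how the config servers store their data; the table contents, the shard names, and the function `find_chunk` are assumptions made for the example. Each chunk covers a range whose lower bound is inclusive and whose upper bound is exclusive.

```python
import bisect

# Hypothetical chunk table, sorted by lower bound: (lower_bound_of_age, shard_name).
# Each chunk runs from its own lower bound up to the next chunk's lower bound.
CHUNKS = [
    (float("-inf"), "shard0"),
    (18, "shard1"),
    (40, "shard0"),
    (65, "shard2"),
]
LOWER_BOUNDS = [lower for lower, _ in CHUNKS]


def find_chunk(age):
    """Return ((lower, upper), shard_name) for the chunk that owns this age."""
    index = bisect.bisect_right(LOWER_BOUNDS, age) - 1
    lower = LOWER_BOUNDS[index]
    if index + 1 < len(LOWER_BOUNDS):
        upper = LOWER_BOUNDS[index + 1]
    else:
        upper = float("inf")
    return (lower, upper), CHUNKS[index][1]


if __name__ == "__main__":
    for age in (17, 18, 70):
        print(age, find_chunk(age))
```

Note that two chunks in the table live on the same shard: the metadata maps ranges to shards, and one shard normally holds many chunks.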

Route Process

This is the front-end router that clients connect to. It asks the config servers which shard a query or write should go to, connects to that shard to perform the operation, and finally returns the result to the client. The client simply sends the router the same query or update it would otherwise have sent to mongod, without needing to know which shard holds the record.

Put simply, the route process is a dispatcher: it forwards each client request to the appropriate shard and returns the requested data from that shard to the client.
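The dispatching role can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the mongos implementation: the class `ToyRouter`, its methods, and the two-shard layout are hypothetical, and the shard key is assumed to be `age` as in the earlier example. It shows one consequence of routing by shard key: a query that includes the shard key can go to a single shard, while a query without it has to be sent to every shard.

```python
class ToyRouter:
    """In-memory stand-in for a router sitting in front of several shards."""

    def __init__(self, chunk_table):
        # chunk_table: list of (lower, upper, shard_name); upper bound is exclusive.
        self.chunk_table = chunk_table
        self.shards = {shard: [] for _, _, shard in chunk_table}

    def _owner(self, age):
        for lower, upper, shard in self.chunk_table:
            if lower <= age < upper:
                return shard
        raise ValueError("no chunk covers age %r" % (age,))

    def insert(self, doc):
        # The client never names a shard; the router picks it from the shard key.
        self.shards[self._owner(doc["age"])].append(doc)

    def find_by_age(self, age):
        """Query on the shard key: only the owning shard is contacted."""
        shard = self._owner(age)
        hits = [d for d in self.shards[shard] if d["age"] == age]
        return hits, [shard]

    def find_by_name(self, name):
        """Query without the shard key: every shard has to be asked."""
        contacted = sorted(self.shards)
        hits = [d for s in contacted for d in self.shards[s] if d["name"] == name]
        return hits, contacted


if __name__ == "__main__":
    router = ToyRouter([(float("-inf"), 30, "shard0"), (30, float("inf"), "shard1")])
    router.insert({"name": "ann", "age": 25})
    router.insert({"name": "bob", "age": 41})
    print(router.find_by_age(41))
    print(router.find_by_name("ann"))
```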

Let's build a simple Sharding Cluster on the same physical machine:

The architecture diagram is as follows:

That is the answer to the question of how to understand sharding in MongoDB. I hope the content above is of some help to you.
