These are my notes from last year.
If a chunk represents a single shard key value, MongoDB cannot split it, even when the chunk exceeds the size at which splits normally occur. The choice of shard key is therefore very important.
Here is an example. Suppose we use the date (accurate to the day) as the shard key. When a single day produces a lot of data, the chunk for that shard key value (such as 2015-12-12) grows very large, well beyond 64 MB, yet it cannot be split. The result is data imbalance across the shards, which leads to performance problems.
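Whether a cluster already contains such unsplittable chunks can be checked in the config database, where the balancer flags them. A minimal mongo-shell sketch (this assumes the 3.x-era config.chunks schema, in which flagged chunks carry a jumbo field):

// List chunks the balancer has flagged as jumbo (too big, but unsplittable)
db.getSiblingDB("config").chunks.find({ jumbo: true })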
So we should choose a field with high selectivity as the shard key. If a field (such as log level) has low selectivity, we can pair it with a second, high-selectivity field and use the two together as a compound shard key. If we shard on a date, we can avoid jumbo chunks by making the shard key precise to the minute or second instead of the day.
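A compound shard key is declared when the collection is sharded. A minimal mongo-shell sketch (the test1.logs collection and the day/ts field names are hypothetical, chosen only to illustrate the idea):

// Low-selectivity field first, high-selectivity field second:
// chunks that share the same day can still be split on ts.
sh.enableSharding("test1")
sh.shardCollection("test1.logs", { day: 1, ts: 1 })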
If your chunk ranges get down to a single key value then no further splits are possible and you get "jumbo" chunks.
Here is an example of a large chunk:
http://dba.stackexchange.com/questions/72626/mongo-large-chunks-will-not-split
A common mistake:
Mongos version 3.0.1 Split Chunk Error with Sharding
http://dba.stackexchange.com/questions/96732/mongos-version-3-0-1-split-chunk-error-with-sharding?rq=1
Splitting chunks manually:
http://www.cnblogs.com/xuegang/archive/2012/12/27/2836209.html
First, use splitFind to split a divisible chunk manually.
sh.splitFind(namespace, query): the query must include the shard key. It splits the chunk matched by the query into two chunks of roughly equal size.
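As a minimal sketch of the call shape (the namespace and ObjectId are the ones used in the full before/after transcript below; a query on a non-shard-key field fails, as the transcript also shows):

// mongos finds the chunk containing the matching document and
// splits it at its median key value
sh.splitFind("test1.users003", { "_id": ObjectId("568bdf16e05cf980cec8c455") })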
mongos> db.users003.getShardDistribution()

Shard shard1 at shard1/192.168.137.111:27017,192.168.137.75:27017
 data : 212KiB docs : 3359 chunks : 2
 estimated data per chunk : 106KiB
 estimated docs per chunk : 1679

Shard shard2 at shard2/192.168.137.138:27018,192.168.137.75:27018
 data : 211KiB docs : 3337 chunks : 2
 estimated data per chunk : 105KiB
 estimated docs per chunk : 1668

Shard shard3 at shard3/192.168.137.111:27019,192.168.137.138:27019
 data : 209KiB docs : 3304 chunks : 2
 estimated data per chunk : 104KiB
 estimated docs per chunk : 1652

Totals
 data : 633KiB docs : 10000 chunks : 6
 Shard shard1 contains 33.58% data, 33.58% docs in cluster, avg obj size on shard : 64B
 Shard shard2 contains 33.37% data, 33.37% docs in cluster, avg obj size on shard : 64B
 Shard shard3 contains 33.03% data, 33.04% docs in cluster, avg obj size on shard : 64B

mongos> AllChunkInfo("test1.users003", true)
ChunkID,Shard,ChunkSize,ObjectsInChunk
test1.users003-_id_MinKey,shard1,106368,1662
test1.users003-_id_-6148914691236517204,shard1,108608,1697
test1.users003-_id_-3074457345618258602,shard3,107072,1673
test1.users003-_id_0,shard3,104384,1631
test1.users003-_id_3074457345618258602,shard2,110592,1728
test1.users003-_id_6148914691236517204,shard2,102976,1609
*Summary Chunk Information*
Total Chunks: 6
Average Chunk Size (bytes): 106666.66666666667
Empty Chunks: 0
Average Chunk Size (non-empty): 106666.66666666667

mongos> db.users003.count()
10000
After splitFind executes, the chunk is split into two chunks of essentially the same size:
mongos> sh.splitFind("test1.users003", { "name": "Upright 100" })
{ "ok" : 0, "errmsg" : "no shard key found in chunk query { name: \"Upright 100\" }" }

mongos> sh.splitFind("test1.users003", { "_id": ObjectId("568bdf16e05cf980cec8c455") })
{ "ok" : 1 }

mongos> AllChunkInfo("test1.users003", true)
ChunkID,Shard,ChunkSize,ObjectsInChunk
test1.users003-_id_MinKey,shard1,106368,1662
test1.users003-_id_-6148914691236517204,shard1,54272,848
test1.users003-_id_-4665891797978533183,shard1,54336,849
test1.users003-_id_-3074457345618258602,shard3,107072,1673
test1.users003-_id_0,shard3,104384,1631
test1.users003-_id_3074457345618258602,shard2,110592,1728
test1.users003-_id_6148914691236517204,shard2,102976,1609
*Summary Chunk Information*
Total Chunks: 7
Average Chunk Size (bytes): 91428.57142857143
Empty Chunks: 0
Average Chunk Size (non-empty): 91428.57142857143

mongos> db.users003.getShardDistribution()

Shard shard1 at shard1/192.168.137.111:27017,192.168.137.75:27017
 data : 212KiB docs : 3359 chunks : 3
 estimated data per chunk : 70KiB
 estimated docs per chunk : 1119

Shard shard2 at shard2/192.168.137.138:27018,192.168.137.75:27018
 data : 211KiB docs : 3337 chunks : 2
 estimated data per chunk : 105KiB
 estimated docs per chunk : 1668

Shard shard3 at shard3/192.168.137.111:27019,192.168.137.138:27019
 data : 209KiB docs : 3304 chunks : 2
 estimated data per chunk : 104KiB
 estimated docs per chunk : 1652

Totals
 data : 633KiB docs : 10000 chunks : 7
 Shard shard1 contains 33.58% data, 33.58% docs in cluster, avg obj size on shard : 64B
 Shard shard2 contains 33.37% data, 33.37% docs in cluster, avg obj size on shard : 64B
 Shard shard3 contains 33.03% data, 33.04% docs in cluster, avg obj size on shard : 64B
Second, use splitAt to split a divisible chunk manually.
sh.splitAt(namespace, query). The official explanation:
sh.splitAt() splits the original chunk into two chunks. One chunk has a shard key range that starts with the original lower bound (inclusive) and ends at the specified shard key value (exclusive). The other chunk has a shard key range that starts with the specified shard key value (inclusive) as the lower bound and ends at the original upper bound (exclusive).
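Unlike splitFind, splitAt does not split at the median: the value you pass becomes the exact boundary. A minimal mongo-shell sketch (same test1.users003 namespace as above; the ObjectId split point is hypothetical):

// The specified _id becomes the inclusive lower bound of the new upper chunk
sh.splitAt("test1.users003", { "_id": ObjectId("568bdf16e05cf980cec8c455") })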
Third, migrate chunks manually.
db.adminCommand({
    moveChunk: "myapp.users",
    find: { username: "smith" },
    to: "mongodb-shard3.example.net"
})
Note:
moveChunk: the fully qualified collection name, i.e. the database name plus the collection name, for example test.yql.
find: a query that matches a document in the chunk to be migrated; the system determines the source (from) shard automatically.
to: the destination shard for the chunk.
The command returns once the destination shard and the source shard agree that the specified chunk has been taken over by the destination shard. Migrating a chunk is a complex process that involves two internal communication protocols:
1. Copying the data, including any changes that occur during the copy.
2. Confirming with all components involved in the migration (the destination shard, the source shard, and the config servers) that the migration has completed.
The command blocks until the migration is complete.
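Once the command returns, the result can be verified from the config database. A hedged sketch (this assumes the 3.x-era config.chunks schema, where chunks are keyed by the ns string; newer versions key them by collection UUID):

// Show each chunk's range and owning shard for the migrated collection
db.getSiblingDB("config").chunks.find(
    { ns: "myapp.users" },
    { shard: 1, min: 1, max: 1 }
).sort({ min: 1 })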
Fourth, related scripts.
Display the chunk distribution of a collection:

db.collection.getShardDistribution()

Script to display per-chunk information:

AllChunkInfo = function(ns, est) {
    var chunks = db.getSiblingDB("config").chunks.find({"ns": ns}).sort({min: 1}); // this will return all chunks for the ns ordered by min
    // some counters for overall stats at the end
    var totalChunks = 0;
    var totalSize = 0;
    var totalEmpty = 0;
    print("ChunkID,Shard,ChunkSize,ObjectsInChunk"); // header row
    // iterate over all the chunks, print out info for each
    chunks.forEach(function printChunkInfo(chunk) {
        var db1 = db.getSiblingDB(chunk.ns.split(".")[0]); // get the database we will be running the command against later
        var key = db.getSiblingDB("config").collections.findOne({_id: chunk.ns}).key; // will need this for the dataSize call
        // dataSize returns the info we need on the data, but using the estimate option to use counts is less intensive
        var dataSizeResult = db1.runCommand({dataSize: chunk.ns, keyPattern: key, min: chunk.min, max: chunk.max, estimate: est});
        // printjson(dataSizeResult); // uncomment to see how long it takes to run and status
        print(chunk._id + "," + chunk.shard + "," + dataSizeResult.size + "," + dataSizeResult.numObjects);
        totalSize += dataSizeResult.size;
        totalChunks++;
        if (dataSizeResult.size == 0) { totalEmpty++; } // count empty chunks for summary
    });
    print("*Summary Chunk Information*");
    print("Total Chunks: " + totalChunks);
    print("Average Chunk Size (bytes): " + (totalSize / totalChunks));
    print("Empty Chunks: " + totalEmpty);
    print("Average Chunk Size (non-empty): " + (totalSize / (totalChunks - totalEmpty)));
}

Example:

mongos> AllChunkInfo("test1.users001", true)
ChunkID,Shard,ChunkSize,ObjectsInChunk
test1.users001-_id_MinKey,shard3,11347710,171935
test1.users001-_id_-6148914691236517204,shard1,11293458,171113
test1.users001-_id_-3074457345618258602,shard1,11320716,171526
test1.users001-_id_0,shard3,11349096,171956
test1.users001-_id_3074457345618258602,shard2,11340054,171819
test1.users001-_id_6148914691236517204,shard2,11328966,171651
*Summary Chunk Information*
Total Chunks: 6
Average Chunk Size (bytes): 11330000
Empty Chunks: 0
Average Chunk Size (non-empty): 11330000