Problem
After a recent upgrade of an online three-shard cluster from version 3.2 to version 4.0, the CPU load on the cluster nodes rose sharply (from about 10% to about 40%), even though nothing other than the version changed: the project logic and the operation volume stayed the same. After the balancer was shut off, CPU load returned to normal and stabilized below 10%. As a workaround, balancing was disabled for the collection currently being written to and only re-enabled every Tuesday; whenever the balancer is running, node CPU load stays at around 40%. The cluster has three shards, and apart from the MongoDB version nothing in the project changed. So what is the reason behind the big change in CPU load after the upgrade?
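For reference, balancing can be disabled per collection (or cluster-wide) from a mongos with the standard shell helpers; the namespace here is just an example:

// Run from a mongos; the namespace is illustrative.
sh.disableBalancing("dt2log.tbl_log_item_20191001")   // stop balancing this collection
sh.enableBalancing("dt2log.tbl_log_item_20191001")    // re-enable it, e.g. for the Tuesday window
sh.stopBalancer()                                      // cluster-wide stop
sh.startBalancer()                                     // cluster-wide start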
Monitoring and logging
First, it is clear that the increase in CPU load after the upgrade is related to the balancer migrating data. Observe the load and the mongostat output while the balancer was enabled on a Tuesday after the upgrade:
The rise in CPU load clearly tracks the delete operations. After a chunk is migrated, the source node has to delete the migrated data, so a large number of deletes is expected. The post-migration deletion also produces log entries like the following:
2019-10-08T10:09:24.035199+08:00 I SHARDING [Collection Range Deleter] No documents remain to delete in dt2log.tbl_log_item_20191001 range [{ _id: -3074457345618258602 }, { _id: -3033667061349287050 })
2019-10-08T10:09:24.035222+08:00 I SHARDING [Collection Range Deleter] Waiting for majority replication of local deletions in dt2log.tbl_log_item_20191001 range [{ _id: -3074457345618258602 }, { _id: -3033667061349287050 })
2019-10-08T10:09:24.035274+08:00 I SHARDING [Collection Range Deleter] Finished deleting documents in dt2log.tbl_log_item_20191001 range [{ _id: -3074457345618258602 }, { _id: -3033667061349287050 })
So, judging from monitoring and logs, the high CPU load is mainly caused by the deletions that follow chunk migration. Moreover, the collections in this cluster are all sharded on { _id: hashed }, with a large amount of data but small individual documents. Each chunk holds on average more than 100,000 documents, and the deletion rate is roughly 200-300 documents per second, so the deletion triggered by moving a single chunk lasts about 10 minutes.
Statistics of moveChunk activity over the last two balancing rounds after the balancer was enabled:
As the table above shows, in this scenario the data of the { _id: hashed } sharded collections is already essentially uniform, so there is little real need to keep the balancer enabled for them; and because each chunk contains so many documents, the deletions are expensive.
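One way to gather such statistics is to query the changelog on the config servers, which records splits and migrations; a sketch (the 24-hour window is arbitrary):

use config
// Count moveChunk operations per collection over the last 24 hours.
db.changelog.aggregate([
    { $match: { what: "moveChunk.from", time: { $gte: new Date(Date.now() - 24 * 3600 * 1000) } } },
    { $group: { _id: "$ns", moveChunks: { $sum: 1 } } },
    { $sort: { moveChunks: -1 } }
])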
Disabling the balancer for these collections solves the load increase after the upgrade, but why is the CPU load higher on 4.0 while version 3.2 was stable at a low level? The only plausible explanation is that moveChunk happens much more often in 4.0, so continuous deletion keeps the CPU load high, while in 3.2 moveChunk happens rarely and the load stays low because little data is being deleted.
So the root of the question is: do balancer and moveChunk behave differently in 4.0 than in 3.2? Under the same workload, why does the 4.0 cluster perform more moveChunk operations?
Code: splitChunk, balancer, and moveChunk
When insert, update and delete operations go through mongos, mongos estimates the amount of data in the affected chunks. If a threshold is exceeded, a splitChunk operation is triggered, and splitChunk may leave chunks unevenly distributed across the cluster. The balancer watches the data distribution; when it is uneven, it issues moveChunk tasks to migrate chunks from the shards with more chunks to the shards with fewer chunks. After the migration, the source node deletes the migrated chunk's data asynchronously.
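The current chunk distribution of a collection can be inspected directly; for example (the namespace is illustrative, and config.chunks is keyed by ns in 3.x/4.0):

use config
// Number of chunks per shard for one collection.
db.chunks.aggregate([
    { $match: { ns: "dt2log.tbl_log_item_20191001" } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } }
])
// Alternatively: db.getSiblingDB("dt2log").tbl_log_item_20191001.getShardDistribution()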
The biggest difference between 3.2 and 4.0 in this part of the logic is where the balancer runs: in 3.2 it runs inside mongos, while since 3.4 (and therefore in 4.0) it runs on the config server. The moveChunk process and the data-deletion logic are essentially unchanged.
splitChunk
Generally speaking, chunk splits are driven by insert, update and delete traffic: mongos issues the splitVector command to the shard, and the shard then decides whether a split is needed. mongos does not know the real amount of data in each chunk; it relies on a simple estimation algorithm.
At startup, mongos assumes the initial size of each chunk to be a random value between 0 and 1/5 of maxChunkSize.
After that, for every update/insert on the chunk, the estimate grows by the document size: chunkSize = chunkSize + docSize.
When chunkSize > maxChunkSize / 5, a potential split is triggered: the shard's mongod executes the splitVector command and returns the chunk's split points. If the result is empty, no split is needed; otherwise splitChunk proceeds.
In other words, splitChunk lags behind the data. Even when the data itself is evenly distributed, differences in when splits actually get executed can leave the chunk distribution temporarily uneven, which then results in a large number of moveChunk operations.
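A minimal sketch of the estimation described above, in plain shell JavaScript with illustrative names (this is not MongoDB's actual implementation):

// Illustrative sketch of mongos' per-chunk write estimation (not real MongoDB code).
var maxChunkSize = 64 * 1024 * 1024;                     // default chunk size: 64 MB
var estimatedSize = Math.random() * maxChunkSize / 5;    // initial estimate: random value in [0, maxChunkSize/5)

// Stub standing in for the splitVector command sent to the shard's mongod;
// an empty result means the chunk does not need to be split yet.
function splitVectorStub() { return []; }

function onInsertOrUpdate(docSize) {
    estimatedSize += docSize;                            // chunkSize = chunkSize + docSize
    if (estimatedSize > maxChunkSize / 5) {              // possible split triggered
        var splitPoints = splitVectorStub();
        if (splitPoints.length > 0) {
            print("would run splitChunk at", splitPoints);
        }
        estimatedSize = 0;                               // (sketch) reset the running estimate after the check
    }
}

onInsertOrUpdate(512);                                   // record a 512-byte write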
Balancer
In both 3.2 and 4.0 the balancer's default detection interval is 10 s, shortened to 1 s when a moveChunk took place in the previous round. The basic balancing flow is roughly the same:
Read shard information from config.shards.
Read all collection information from config.collections, shuffle it randomly, and store it in an array.
For each collection, read its chunk information from config.chunks.
The shard with the most chunks (maxChunksNum) is the source shard, and the shard with the fewest chunks (minChunksNum) is the destination shard. If maxChunksNum - minChunksNum exceeds the migration threshold, the collection is considered unbalanced and needs migration. The first chunk of the source shard's chunks becomes the chunk to migrate, and a migration task (source shard, destination shard, chunk) is built.
In each round the balancer inspects all collections, and each collection gets at most one migration task. While building tasks, if the shard with the most chunks or the shard with the fewest chunks for a collection is already involved in another migration task, that collection is skipped for this round. The next balancing round does not start until all migration tasks from the current round have finished.
Because the balancer shuffles the list of collections, when several collections need balancing the migrations are interleaved across collections in random order, rather than finishing one collection before moving on to the next.
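A condensed sketch of the source/destination selection in one balancing round, with the threshold passed in as a parameter (illustrative JavaScript, not the actual balancer code):

// Illustrative sketch of picking a migration for one collection (not real MongoDB code).
// chunkCounts maps shard name -> number of chunks, e.g. { shard1: 53, shard2: 47 }.
function pickMigration(chunkCounts, threshold) {
    var shards = Object.keys(chunkCounts);
    var source = shards[0], dest = shards[0];
    shards.forEach(function (s) {
        if (chunkCounts[s] > chunkCounts[source]) source = s;   // shard with the most chunks
        if (chunkCounts[s] < chunkCounts[dest]) dest = s;       // shard with the fewest chunks
    });
    if (chunkCounts[source] - chunkCounts[dest] >= threshold) {
        return { from: source, to: dest };                      // migrate one chunk from source to dest
    }
    return null;                                                // considered balanced
}

pickMigration({ shard1: 53, shard2: 47 }, 8);   // null: balanced under the 3.2 threshold for 80+ chunks
pickMigration({ shard1: 53, shard2: 47 }, 2);   // migration under the 4.0-style threshold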
The key point is the migration threshold mentioned above: it differs between versions 3.2 and 4.0.
In version 3.2, the threshold is 2 when the total number of chunks is below 20, 4 when it is below 80, and 8 otherwise. In other words, if a collection in a two-shard cluster has 100 chunks, split 47 and 53 between the shards, the balancer considers it balanced and no migration happens.
int threshold = 8;
if (balancedLastTime || distribution.totalChunks() < 20)
    threshold = 2;
else if (distribution.totalChunks() < 80)
    threshold = 4;
In version 4.0, a migration is triggered once the chunk-count imbalance reaches 2. In the same example, with 47 and 53 chunks per shard, the balancer now considers the collection unbalanced and a migration takes place.
const size_t kDefaultImbalanceThreshold = 2;
const size_t kAggressiveImbalanceThreshold = 1;

const size_t imbalanceThreshold = (shouldAggressivelyBalance || distribution.totalChunks() < 20)
    ? kAggressiveImbalanceThreshold
    : kDefaultImbalanceThreshold;
// Although the threshold can be 1 here, a gap of 1 does not actually trigger a migration,
// because another criterion is also checked: the ideal (rounded-up average) number of chunks
// per shard. A migration is scheduled only when a shard holds more chunks than this value.
const size_t idealNumberOfChunksPerShardForTag =
    (totalNumberOfChunksWithTag / totalNumberOfShardsWithTag) +
    (totalNumberOfChunksWithTag % totalNumberOfShardsWithTag ? 1 : 0);
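Applying this formula to the 47/53 example above (the arithmetic only, expressed in shell JavaScript for clarity):

// 100 chunks over 2 shards: ideal = 100 / 2 + (100 % 2 ? 1 : 0) = 50 chunks per shard.
// The shard holding 53 chunks exceeds the ideal of 50 by 3, which is at least the 4.0 threshold of 2,
// so a migration is scheduled; 3.2 instead compares 53 - 47 = 6 against its threshold of 8 and does nothing.
var total = 100, shards = 2;
var ideal = Math.floor(total / shards) + (total % shards ? 1 : 0);   // 50
print(ideal, 53 - ideal >= 2);                                       // 50 true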
This threshold is also described in the official documentation:
To minimize the impact of balancing on the cluster, the balancer only begins balancing after the distribution of chunks for a sharded collection has reached certain thresholds. The thresholds apply to the difference in number of chunks between the shard with the most chunks for the collection and the shard with the fewest chunks for that collection. The balancer has the following thresholds:
The balancer stops running on the target collection when the difference between the number of chunks on any two shards for that collection is less than two, or a chunk migration fails.
From the code, however, this threshold logic changed in version 3.4, and the documentation has not been fully updated to reflect it.
moveChunk
moveChunk is a relatively complex operation; the general flow is as follows:
On the destination shard, the first step is to delete any existing data in the range of the chunk being moved, so there is a delete task.
The behavior of the inserts and deletes performed during moveChunk can be adjusted via the _secondaryThrottle and _waitForDelete settings in config.settings:
_secondaryThrottle: when true, the balancer waits for at least one secondary to acknowledge each insert during migration; when false, it does not wait for secondaries. It can also be set to an explicit write concern, which is then used during migration. The default is true in 3.2 and false from 3.4 onward.
_waitForDelete: whether to wait for the migrated chunk's data to be deleted synchronously after migration. Default is false; the orphaned data is deleted asynchronously by a separate thread.
The settings are as follows:
use config
db.settings.update(
    { "_id": "balancer" },
    { $set: { "_secondaryThrottle": { "w": "majority" }, "_waitForDelete": true } },
    { upsert: true }
)
Since _secondaryThrottle defaults to true in 3.2 and false from 3.4 onward, moveChunk finishes faster on 4.0 than on 3.2, and the destination shard absorbs more inserts per second during a migration, which also has some impact on CPU load.
In addition, versions 3.4.18, 3.6.10, 4.0.5 and later expose the following parameters to adjust the insert rate during migration:
migrateCloneInsertionBatchDelayMS: the delay between insert batches while migrating data. Default is 0, i.e. no waiting.
migrateCloneInsertionBatchSize: the number of documents inserted per batch while migrating data. Default is 0, i.e. no limit.
The settings are as follows:
db.adminCommand({ setParameter: 1, migrateCloneInsertionBatchDelayMS: 0 })
db.adminCommand({ setParameter: 1, migrateCloneInsertionBatchSize: 0 })
Asynchronous deletion thread
The asynchronous deletion thread is implemented slightly differently in 3.2 and 4.0, but the basic process is the same: a queue holds the ranges that need to be deleted, and the thread repeatedly takes a range from the queue and deletes its data. Ranges are therefore deleted one at a time, in the order in which the chunks were enqueued. The entry points are:
Version 3.2: db/range_deleter.cpp, thread entry point RangeDeleter::doWork().
Version 4.0: db/s/metadata_manager.cpp, scheduleCleanup; a single dedicated thread executes the cleanup tasks.
In version 4.0, the data is deleted in batches, with the size of each batch computed as follows:
int maxToDelete = rangeDeleterBatchSize.load();
if (maxToDelete <= 0) {
    // Falls back to internalQueryExecYieldIterations (default 128) when the batch size is unset.
    maxToDelete = std::max(int(internalQueryExecYieldIterations.load()), 1);
}
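Analogous to the insert-side parameters above, the deletion batch size and delay can also be tuned at runtime via setParameter (the values below are illustrative, and this assumes the rangeDeleterBatchSize / rangeDeleterBatchDelayMS parameters are available in the 4.0.x build in use):

// Delete in smaller batches and pause between batches to smooth out the deletion load.
db.adminCommand({ setParameter: 1, rangeDeleterBatchSize: 32 })
db.adminCommand({ setParameter: 1, rangeDeleterBatchDelayMS: 20 })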