This article explains how to analyze and solve an efficiency problem caused by MongoDB index seeks. It is fairly detailed and should be a useful reference for anyone hitting a similar issue.
Background
Recently, an online work-order analysis service had become unstable, and the monitoring platform raised warnings from time to time that database operations were timing out.
After talking with the operations team, we learned that several business operations were failing at around 1:00 a.m. every day, yet the database monitoring showed no obvious anomalies.
The analysis service's logs showed that a database operation had thrown a SocketTimeoutException.
At first, the developers hoped to work around the problem by adjusting the timeout parameters of the MongoDB Java Driver.
After detailed analysis, however, it became clear that this would not cure the problem, and it was hard to judge how the timeout should be set in the first place.
The following describes how the problem was analyzed and tuned.
Preliminary analysis
Judging from the error message, the database operation's response timed out, and the SocketReadTimeout configured on the client side is 60s.
So what caused the database to take more than 60 seconds to respond?
Business operation
The work order table (t_work_order) records the information of each work order, including the work order number (oid) and the last modification time (lastModifiedTime).
The analysis service is a Java application that runs at 1:00 every morning and pulls the work orders modified during the previous day (sorted by work order number, as required) for processing.
Because the work order table is very large (tens of millions of rows), processing is done in pages (1,000 records per batch), paging forward by work order number:
First pull:
db.t_work_order.find({"lastModifiedTime": {$gt: new Date("2019-04-09T09:44:57.106Z"), $lt: new Date("2019-04-09T10:44:57.106Z")}, "oid": {$exists: true}}).sort({"oid": 1}).limit(1000)
The second pull starts from the work order number of the last record returned by the first pull:
db.t_work_order.find({"lastModifiedTime": {$gt: new Date("2019-04-09T09:44:57.106Z"), $lt: new Date("2019-04-09T10:44:57.106Z")}, "oid": {$exists: true, $gt: "VXZ190"}}).sort({"oid": 1}).limit(1000)
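For clarity, here is a minimal sketch of what the full paging loop could look like in the mongo shell (the real service is a Java application); the field names and time window follow the article, and the batch-processing step is only a placeholder.

// Minimal paging-loop sketch (mongo shell). Assumes the time window and field
// names used in the article; the processing step is a placeholder.
var start = new Date("2019-04-09T09:44:57.106Z");
var end = new Date("2019-04-09T10:44:57.106Z");
var lastOid = null;
while (true) {
    var cond = {
        "lastModifiedTime": { $gt: start, $lt: end },
        "oid": lastOid === null ? { $exists: true } : { $exists: true, $gt: lastOid }
    };
    var batch = db.t_work_order.find(cond).sort({ "oid": 1 }).limit(1000).toArray();
    if (batch.length === 0) break;              // nothing left in the window
    // ... process the batch here ...
    lastOid = batch[batch.length - 1].oid;      // page forward by work order number
    if (batch.length < 1000) break;             // last (partial) page
}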
Based on this query pattern, the developer created the following index on the table:
db.t_work_order.ensureIndex({"oid": 1, "lastModifiedTime": -1})
Although the index basically matches the query fields, it performs poorly in practice:
The first pull takes a very long time, often exceeding 60 seconds and triggering an error, while subsequent pulls are much faster.
To reproduce this scenario accurately, we preloaded a small portion of the data into a test environment and ran explain on the first-pull query:
db.t_work_order.find({"lastModifiedTime": {$gt: new Date("2019-04-09T09:44:57.106Z"), $lt: new Date("2019-04-09T10:44:57.106Z")}, "oid": {$exists: true}}).sort({"oid": 1}).limit(1000).explain("executionStats")
The output is as follows:
"nReturned": 1000,
"executionTimeMillis": 589,
"totalKeysExamined": 136661,
"totalDocsExamined": 1000,
...
"indexBounds": {
    "oid": [
        "[MinKey, MaxKey]"
    ],
    "lastModifiedTime": [
        "(new Date(1554806697106), new Date(1554803097106))"
    ]
},
"keysExamined": 136661,
"seeks": 135662
The execution shows that about 136,000 index keys had to be scanned in order to return just 1,000 records, and almost all of that overhead was spent on seeks operations.
The cause of the index seeks
The official documentation explains seeks as follows:
The number of times that we had to seek the index cursor to a new position in order to complete the index scan.
In other words, seeks is the number of times the executor had to reposition the index cursor to a new location in order to complete the index scan (stage).
MongoDB indexes are implemented as B+ trees (in 3.x and later), and scanning consecutive leaf nodes is very fast (only a single addressing step is needed); a large number of seeks therefore means that the scan had to reposition the cursor many times (skipping over non-target nodes).
Moreover, the seeks metric was only introduced in version 3.4, which suggests that this operation has a real impact on performance.
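To make these counters easier to check while tuning, the sketch below shows one possible way to pull keysExamined and seeks out of the explain("executionStats") output in the mongo shell. The stage-walking helper is an assumption for illustration, since the exact nesting of stages (LIMIT, FETCH, IXSCAN, ...) depends on the chosen plan.

// Walk the executionStages tree and return the IXSCAN stage, if any.
function findIxscan(stage) {
    if (!stage) return null;
    if (stage.stage === "IXSCAN") return stage;
    if (stage.inputStage) return findIxscan(stage.inputStage);
    if (stage.inputStages) {
        for (var i = 0; i < stage.inputStages.length; i++) {
            var found = findIxscan(stage.inputStages[i]);
            if (found) return found;
        }
    }
    return null;
}
var res = db.t_work_order.find({
    "lastModifiedTime": { $gt: new Date("2019-04-09T09:44:57.106Z"),
                          $lt: new Date("2019-04-09T10:44:57.106Z") },
    "oid": { $exists: true }
}).sort({ "oid": 1 }).limit(1000).explain("executionStats");
var ixscan = findIxscan(res.executionStats.executionStages);
printjson({
    totalKeysExamined: res.executionStats.totalKeysExamined,   // overall keys scanned
    seeks: ixscan ? ixscan.seeks : null                        // cursor repositionings (3.4+)
});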
To explore how the seeks were generated, we tried some changes to the query statement:
Removing the $exists condition
The $exists condition is there for historical reasons (some old records do not contain the work order number field). To check whether this predicate was the key issue, we modified the query as follows:
db.t_work_order.find({"lastModifiedTime": {$gt: new Date("2019-04-09T09:44:57.106Z"), $lt: new Date("2019-04-09T10:44:57.106Z")}}).sort({"oid": 1}).limit(1000).explain("executionStats")
The result after execution is:
"nReturned": 1000
"executionTimeMillis": 1533
"totalKeysExamined": 272322
"totalDocsExamined": 272322
...
"inputStage": {
"stage": "FETCH"
"filter": {
"$and": [
{
"lastModifiedTime": {
$lt: ISODate ("2019-04-09T10:44:57.106Z")
}
}
{
"lastModifiedTime": {
$gt: ISODate ("2019-04-09T09:44:57.106Z")
}
}
]
}
...
"indexBounds": {
"oid": [
"[MinKey, MaxKey]"
]
"lastModifiedTime": [
"[MaxKey, MinKey]"
]
}
"keysExamined": 272322
"seeks": 1
Here we find that after removing $exists, seeks drops to 1, but the whole query now scans 272,000 index keys, exactly twice as many as before!
Seeks dropping to 1 means the scan now walks the leaf nodes sequentially; however, because the scan range is very large, a huge number of records that do not match the conditions must be read and filtered out before the target records are found.
The filter that appears in the FETCH stage confirms this. We also noticed a characteristic of the data: each work order number has two records! This explains the behavior:
When the $exists condition is present, the executor chooses to seek (jump) through the index by work order number.
Without the $exists condition, the executor scans the leaf nodes sequentially.
The $gt condition and reverse order
Besides the first-pull query, we also analyzed the subsequent paging queries, such as:
db.t_work_order.find({"lastModifiedTime": {$gt: new Date("2019-04-09T09:44:57.106Z"), $lt: new Date("2019-04-09T10:44:57.106Z")}, "oid": {$exists: true, $gt: "VXZ190"}}).sort({"oid": 1}).limit(1000).explain("executionStats")
This statement adds the condition $gt: "VXZ190", and its execution stats are as follows:
"nReturned": 1000, "executionTimeMillis": 6, "totalKeysExamined": 1004, "totalDocsExamined": 1000... "indexBounds": {"oid": ["(\" VXZ190\ ", {})"], "lastModifiedTime": ["(new Date (1554806697106), new Date (1554803097106))"]}, "keysExamined": 1004, "seeks": 5
We can see that the number of seeks is very small and the scan examines only 1,004 index keys, which is very efficient.
So does this mean that, toward the end of the index, the records matching the query are very dense?
To verify this, we changed the first-page query to sort by work order number in descending order (scanning from back to front):
db.t_work_order.find({"lastModifiedTime": {$gt: new Date("2019-04-09T09:44:57.106Z"), $lt: new Date("2019-04-09T10:44:57.106Z")}, "oid": {$exists: true}}).sort({"oid": -1}).limit(1000).explain("executionStats")
The execution of the new "reverse query statement" is as follows:
"nReturned": 1000, "executionTimeMillis": 6, "totalKeysExamined": 1001, "totalDocsExamined": 1000. Direction ":" backward "," indexBounds ": {" oid ": [" [MaxKey, MinKey] "]," lastModifiedTime ": [" (new Date (1554803097106), new Date (1554806697106)) "]}," keysExamined ": 1001," seeks ": 2
As you can see, this runs much more efficiently and needs almost no seeks at all!
After some checking we confirmed that, across the data as a whole, the larger the work order number, the later the update time; the target data we want to query is concentrated toward the end of the index.
This matches what was described at the beginning: the first pull is very slow, or even times out, while the later pulls are fast.
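One rough way to spot-check this distribution from the shell is to bucket the records by update hour and look at the smallest and largest work order number in each bucket. The aggregation below is only an illustration (the hourly granularity is an assumption, and on a table of this size it should be run on a sample or on a secondary).

// Bucket records by update hour and report the min/max work order number per bucket.
db.t_work_order.aggregate([
    { $match: { "oid": { $exists: true } } },
    { $group: {
        _id: { $dateToString: { format: "%Y-%m-%d %H", date: "$lastModifiedTime" } },
        minOid: { $min: "$oid" },
        maxOid: { $max: "$oid" },
        count: { $sum: 1 }
    } },
    { $sort: { _id: 1 } }
])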
To summarize, the two execution paths compared above are: with the added $gt condition, the search starts from the middle of the index; with the reverse sort, the scan starts from the end of the index.
Optimization idea
The analysis shows that the crux of the problem is that the index scan range is too large. So how can the query be optimized to avoid scanning a large number of index entries?
Given the existing index and conditions, the combination of $gt, $exists and the time-range predicate evaluated on the index entries makes seeks operations unavoidable.
Moreover, the query performance is unstable, because it depends heavily on the data distribution and the specific query conditions.
So the idea mentioned at the beginning, raising the socketTimeout threshold, treats the symptom rather than the cause: once the distribution of the indexed values changes or the data volume keeps growing, something worse may happen.
Going back to the original requirement: the scheduled job needs to read the work orders updated each day (sorted by work order number) and process them in batches.
So, consolidating the fragments into a whole, we add a lastModifiedDay field that stores the date part of lastModifiedTime (truncated to the day), so that all work order records updated on the same day carry the same value.
We then create the compound index {lastModifiedDay: 1, oid: 1}, and the query condition becomes:
{"lastModifiedDay": new Date ("2019-04-09 00oid"), "oid": {$gt: "VXZ190"}-- limit 1000
The implementation results are as follows:
"nReturned": 1000
"executionTimeMillis": 6
"totalKeysExamined": 1000
"totalDocsExamined": 1000
...
"indexBounds": {
"lastModifiedDay": [
(new Date (1554803000000), new Date (1554803000000)
]
"oid": [
"(\" VXZ190\ ", {})"
]
}
"keysExamined": 1000
"seeks": 1
With this optimization, each query scans at most 1,000 index entries, and the queries are very fast!
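To put the pieces together, here is a sketch of how the whole scheme could be wired up in the mongo shell. The index and field names follow the article, but the backfill loop is only an assumption about how existing records could be given the new field; on tens of millions of rows it would need to be batched and throttled in practice.

// Create the new compound index used by the daily batch job.
db.t_work_order.createIndex({ "lastModifiedDay": 1, "oid": 1 });
// Backfill lastModifiedDay from lastModifiedTime (truncated to the day) for existing records.
// This naive loop is only a sketch; a production backfill should be batched.
db.t_work_order.find({ "lastModifiedDay": { $exists: false } }).forEach(function (doc) {
    var day = new Date(doc.lastModifiedTime);
    day.setUTCHours(0, 0, 0, 0);
    db.t_work_order.update({ _id: doc._id }, { $set: { "lastModifiedDay": day } });
});
// The new paging query: an equality match on the day plus a range on the work order number,
// so the scan stays inside that day's portion of the index.
db.t_work_order.find({ "lastModifiedDay": new Date("2019-04-09T00:00:00Z"),
                       "oid": { $gt: "VXZ190" } }).sort({ "oid": 1 }).limit(1000)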
That is all for "how to solve the efficiency problem of MongoDB index seeks". Thank you for reading, and I hope the content is helpful to you!