How to solve the problem of high cpu occupancy in mongo database caused by regular expression matching 07/01 Update SLTechnology News&Howtos

How to solve the problem of high cpu occupancy in mongo database caused by regular expression matching

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article mainly introduces how to solve the problem of high cpu occupancy rate of mongo database caused by regular expression matching, which has a certain reference value. Interested friends can refer to it. I hope you can learn a lot after reading this article.

One day, it was monitored that the utilization rate of cpu in mongo database was much higher. After checking it, it was found that it was caused by the following statement:

Db.example_collection.find ({"idField": {"$regex": "123456789012345678"}, "dateField": {"$regex": "2019-10-10"}})

Usually, my first reaction to this situation is the lack of an index for the relevant field, resulting in a full table scan each time such a statement is executed.

But I analyzed it with the explain () statement and found that the two fields idField and dateField mentioned above are indexed, and the statement also uses an index. The following is the result of explain ():

Mgset-11111111:PRIMARY > db.example_collection.find ({"idField": {"$regex": "123456789012345678"}, "dateField": {"$regex": "2019-10-10"}). Explain ("queryPlanner") {"queryPlanner": {"plannerVersion": 1, "namespace": "example_db.example_collection", "indexFilterSet": false "parsedQuery": {"$and": [{"idField": {"$regex": "123456789012345678"}} {"dateField": {"$regex": "2019-10-10"}]}, "winningPlan": {"stage": "FETCH" "inputStage": {"stage": "IXSCAN" "filter": {"$and": [{"idField": {"$regex": "123456789012345678"}} {"dateField": {"$regex": "2019-10-10"}]} "keyPattern": {"idField": 1, "dateField": 1}, "indexName": "idField_1_dateField_1", "isMultiKey": false, "multiKeyPaths": {"idField": [] "dateField": []}, "isUnique": false, "isSparse": false, "isPartial": false, "indexVersion": 2, "direction": "forward" "indexBounds": {"idField": ["[\"\ ", {})", "[/ 123456789012345678ax / 123456789012345678 /]"], "dateField": ["[\"\ " {}) "," [/ 2019 ok 10 /]]}, "ok": []}

Looking at mongo's log, we find that it is really slow to 800~900ms this statement at one time. Unless the database has a large number of cpu cores, cpu will soon be full as long as the concurrency of such statements is slightly higher per second.

Then I searched and found that there might be a problem with the regular expression. It turns out that although the statement does use an index, there is also a field "indexBounds" in the output of the explain () statement that represents the range of indexes to be scanned when the statement is executed. To tell you the truth, I never understood the index range in the above output. The above statement makes a general regular expression matching for both idField and dateField fields, and I guess it scans the entire index tree, so the index does not actually improve the query efficiency of the statement.

I looked at the data in the database and found that there is no need for regular matching in the idField and dateField fields, just ordinary text matching. After removing the regular matching operation $regex, and analyzing it again, the result is as follows:

Mgset-11111111:PRIMARY > db.example_collection.find ({"idField": "123456789012345678", "dateField": "2019-10-10"}). Explain ("queryPlanner") {"queryPlanner": {"plannerVersion": 1, "namespace": "example_db.example_collection", "indexFilterSet": false "parsedQuery": {"$and": [{"idField": {"$eq": "123456789012345678"}} {"dateField": {"$eq": "2019-10-10"}]}, "winningPlan": {"stage": "FETCH" "inputStage": {"stage": "IXSCAN", "keyPattern": {"idField": 1, "dateField": 1}, "indexName": "idField_1_dateField_1", "isMultiKey": false "multiKeyPaths": {"idField": [], "dateField": []}, "isUnique": false, "isSparse": false, "isPartial": false, "indexVersion": 2 "direction": "forward", "indexBounds": {"idField": ["[\" 123456789012345678\ ",\" 123456789012345678\ "]]," dateField ": [[\" 2019-10-10\ " \ "2019-10-10\"]}, "rejectedPlans": []}, "ok": 1}

As you can see, the index is still used, and the index scan range is limited to one value.

Later, I confirmed with the developer that there was really no need to use regular matching in this statement, so I asked him to remove the regular matching. There was no problem after that, and the statement did not appear in the mongo slow log.

Thank you for reading this article carefully. I hope the article "how to solve the problem of high cpu occupancy in mongo database caused by regular expression matching" shared by the editor will be helpful to everyone. At the same time, I hope you will support us and pay attention to the industry information channel. More related knowledge is waiting for you to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.