Author: Wu Xufei, senior engineer at Tencent Cloud Database, mainly responsible for the development of Tencent Cloud Redis and MongoDB.
The story begins with a MongoDB connection timeout. The exception caused two server-merge attempts to fail. A packet capture had already been taken on the affected server and the dump downloaded. Below is a screenshot of the client timeout:
It is not hard to see from the screenshot that this is a Node.js error message, so the DBA presumably used the Node.js MongoDB driver to connect to the database. I found the driver's official repository, node-mongodb-native (https://-mongodb-native), and cloned the code. After a quick read, combined with the screenshot, the preliminary analysis was that the error reported is a timeout on the 38th connection.
1. Analyzing the packet capture
Since a packet capture had already been taken on the failing server, I started by opening the dumped files in Wireshark. Wireshark is clever and analyzes many common protocols automatically, which is convenient, but it can also mislead. For example, many of our database connections were misidentified as the X11 protocol:
At first I puzzled over this, but of course it was not really X11; the payload just happened to match the X11 pattern. Once X11 dissection was disabled in Wireshark, the ports and connections made it obvious that these were database connections.
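If you hit the same misdetection, the dissector can be overridden from the command line as well as from the GUI. A sketch with tshark, assuming the capture file is named dump.pcap and the instance listens on the default MongoDB port 27017:

    tshark -r dump.pcap --disable-protocol x11 -d tcp.port==27017,mongo

In the GUI, the equivalent is Analyze > Enabled Protocols (to switch off X11) together with Analyze > Decode As.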
A careful examination of the capture showed roughly the following:
(1) At the beginning, one connection pulled more than 3 MB of data from the database.
(2) After that, the usual TCP three-way handshakes all completed successfully, but essentially no payload was transferred before the connections went through a normal TCP teardown.
(3) From the capture, there is no case of the server failing to respond to a client SYN packet.
(4) All TCP connections were actively closed by the client (client-side FIN); in no case did the server close a client connection first.
Well, nothing analyzed here explains the timeout at all, so the next step was to talk to the user and gather more information.
2. The error code
Through communication with the user, I got the code snippet for the failing part of the tool (at first without the complete function; the full function came later):
function merge_union_info(dbs) {
    var union_data = [];
    // Step 1: read the top unions from every source DB (one connection per DB, all in parallel).
    async.each(dbs, function (path, cb) {
        mongodb.connect(path, (err, db) => {
            if (err) {
                cb(err);
                return;
            }
            db.collection('union').find().sort({ level: -1, exp: -1 }).toArray((err1, v) => {
                if (err1) {
                    db.close();
                    console.log(err1);
                    return;
                }
                // Keep at most the top 50 unions and flag them as merged.
                let loop = v.length > 50 ? 50 : v.length;
                let u_data = [];
                for (let i = 0; i < loop; i++) {
                    v[i].merge_flag = 1;
                    u_data.push(v[i]);
                }
                union_data.push(u_data);
                db.close();
                cb();
            });
        });
    }, function (err) {
        if (err) {
            console.log("[ERROR]merge union-data failed !!!");
            return;
        }
        async.waterfall([
            // Step 2: drop the old union collection on the target DB.
            function (cb1) {
                mongodb.connect(dbs[0], (err1, db) => {
                    if (err1) {
                        cb1(`[ERROR]gen union-data [drop] failed for ${err1}`);
                        return;
                    }
                    db.collection('union').drop((err2, r2) => {
                        db.close();
                        cb1(err2);
                    });
                });
            },
            // Step 3: insert the merged union data, again one connection per batch.
            function (cb1) {
                async.each(union_data, function (u_data, cb2) {
                    mongodb.connect(dbs[0], (err1, db) => {
                        if (err1) {
                            cb2(`[ERROR]gen union-data [insert] failed for ${err1}`);
                            return;
                        }
                        db.collection('union').insertMany(u_data, (err2, r2) => {
                            db.close();
                            cb2(err2);
                        });
                    });
                }, function (errN) {
                    cb1(errN);
                });
            },
        ], function (errX, r) {
            if (errX) {
                console.log("[ERROR]gen union-data failed for", errX);
            } else {
                console.log("4-update union-data ok !!!");
            }
        });
    });
}
Those familiar with Node.js know that its strength is non-blocking, asynchronous I/O, which gives it relatively high performance. So my first reaction from reading the code was that too much data causes too many TCP connections to be opened at once. Since MongoDB uses a thread-per-connection processing model, this could easily push the server past its max open files limit, or spawn so many threads that the whole system degrades until it stops responding.
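If unbounded parallelism really were the culprit, the conventional mitigation would be to cap the concurrency. Below is a minimal sketch using the async library's eachLimit (my illustration only, reusing the names from the user's snippet; this is not a change the user actually made):

    // Cap concurrency at 5 parallel connections instead of one per batch.
    async.eachLimit(union_data, 5, function (u_data, cb2) {
        mongodb.connect(dbs[0], (err, db) => {
            if (err) { cb2(err); return; }
            db.collection('union').insertMany(u_data, (err2) => {
                db.close();
                cb2(err2);
            });
        });
    }, function (err) {
        if (err) console.log("[ERROR] bounded union insert failed:", err);
    });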
3. Attempting to reproduce
Based on the code analysis, I quickly wrote a test function hoping to reproduce the issue. MongoDB was set up on a virtual machine, and the code is as follows:
function doLoopInsertTest(mongourl: string) {
    // One new connection per 500-document batch, all started in parallel.
    for (var i = 0; i < maxInsertCount / 500; i++) {
        mongodb.MongoClient.connect(mongourl, function (err, client) {
            if (err != null) {
                console.log("error:", err, "\n");
                return;
            }
            console.log("Connected successfully to server");
            const db = client.db("testdb");
            db.collection("testfei").insertMany(getNewDoc(i * 500), (err, result: mongodb.InsertWriteOpResult) => {
                if (err != null) {
                    console.log("write error:", err);
                    return;
                }
            });
        });
    }
}
This simulates the user's pattern of one insert connection per 500 merged documents. On the self-built MongoDB, the default 1024 max open files limit was quickly exceeded. After raising the limit with ulimit and restarting the mongod process, I tested again; the client soon stopped responding, but the error message was not exactly the same as the user's. Across several attempts, a timeout error message did appear among them.
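For reference, the open-files limit can be inspected and raised with standard Linux commands (the 100000 here is just an example value):

    ulimit -n                         # show the current shell's limit (often 1024)
    ulimit -n 100000                  # raise it for this shell before starting mongod
    cat /proc/$(pidof mongod)/limits  # verify the limits of a running mongod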
4. User feedback
It seemed to be solved, so the next day I shared the findings with the user. The user replied that the failing step did not involve that much data at all: only about 100 documents were inserted in total, and they provided the data of the two tables being merged.
I restored the provided data into the test environment with mongorestore: one table held 257 documents, the other 121, 378 documents in total! That simply cannot fail, even if a new connection were opened for every single insert.
Out of caution, I wrote a small code snippet and tested it in a local virtual machine environment: no problem at all. In case it was a real problem unique to the cloud server, I also applied for a test MongoDB instance, imported the data, and ran the Node.js test code against it. Still no problem!
So I went back to the user: was the MongoDB driver version perhaps out of date?
The feedback was that the driver was indeed not the newest, but it was the same code they had used for several previous server merges, all of which succeeded. The user did not believe the code was at fault, nor accept that the library version was the issue.
Nor was it practical for the user to test directly with the code I provided: every test requires announcing a service outage, and if a merge fails, part of the data has to be rolled back by hand, which is too risky.
5. Light at the end of the tunnel
We seemed to have hit a dead end: we could not believe that inserting 300-odd documents would fail, and the user did not accept our conclusion. I even asked the user whether the IP or port might be written wrong (in fact, the port in the error log was correct).
At that moment it suddenly occurred to me that a server merge cannot possibly merge only the unions (the code above handles the union part of the merge); the player roles must be merged before that. The user confirmed this and sent a code screenshot:
and pointed out that the failing function was update_union_info.
Here the problem becomes clear: update_user_info handles by far the largest amount of data, and given how Node.js works, update_user_info also runs asynchronously. In other words, by the time execution reaches update_union_info, update_user_info cannot possibly have finished! And Node.js is single-threaded! Single-threaded! Single-threaded! Say important things three times.
So if update_user_info involves a lot of computation, then even though the TCP connection succeeds at the network layer, the connection callback may never get a chance to run; by the time the CPU is released, the driver's timeout has long since fired.
6. Trying to reproduce again
Based on this analysis, I wrote a quick, rough piece of code to reproduce it:
The code is rough: setTimeout stands in for the user's role-data merge, assumed to run for 50 s.
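That snippet was shared as a screenshot and is not reproduced here; the following is only my minimal sketch of the same idea, with a synchronous busy-wait standing in for the 50 s of CPU-bound role merging (the URL and timings are illustrative):

    const { MongoClient } = require('mongodb');

    // Illustrative address; the original test pointed at the user's instance.
    const url = 'mongodb://127.0.0.1:27017';

    // Start the "union" connection first, as update_union_info would.
    MongoClient.connect(url, { connectTimeoutMS: 30000 }, (err, client) => {
        // This callback cannot be scheduled while the busy loop below holds
        // the event loop, so it eventually fires with a timeout error.
        if (err) {
            console.log('connect error:', err);
            return;
        }
        client.close();
    });

    // Simulate update_user_info: 50 s of CPU-bound role merging that never
    // yields, starving the event loop.
    const end = Date.now() + 50 * 1000;
    while (Date.now() < end) { /* busy-wait */ }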
Soon our timeout breakpoint was hit:
Presented with this analysis and the reproduction, the user agreed to change the code, but wanted to test it in advance.
7. Problem solving
As it happens, our MongoDB rollback feature provides a temporary instance, and the rollback process has no effect on the online instance at all. Isn't that convenient?
The temporary instance created by a rollback can either replace the online instance or be kept as a standalone temporary instance (retained for 2 days), and 2 days was enough for the user to test. After modifying the code to run entirely serially, the user rolled back one of the twenty-odd servers for testing, and the final test succeeded!
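For illustration, the serialized flow could be shaped like this (a sketch only, reusing the function names from the user's screenshot; it assumes each function reports completion through an error-only callback, and it is not the user's actual patch):

    // Run the heavy role merge to completion first, and only then open
    // connections for the union merge, so nothing races with the CPU work.
    async.series([
        (cb) => update_user_info(dbs, cb),
        (cb) => update_union_info(dbs, cb),
    ], (err) => {
        if (err) console.log('[ERROR] merge failed:', err);
        else console.log('merge ok');
    });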