This article was first published on the WeChat official account "vivo Internet Technology".
Link: https://mp.weixin.qq.com/s/-OcCDI4L5GR8vVXSYhXJ7w
Authors: Huang Weibing, Chen Jinxia
1. Deadlock in the Tomcat container, version 9.0.26
1.1 Symptoms
1.1.1 Background of the deadlock
About 3 minutes into a stress test of the /get.do interface, the rate of successful transactions (TPS) plummeted from 1W (10,000) to 0.
1.1.2 A large number of CLOSE_WAIT connections on the Tomcat server
On the server under stress test, the number of TCP connections in the CLOSE_WAIT state was about 2W (20,000).
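The article does not say how the CLOSE_WAIT count was obtained; a tool such as netstat or ss is the usual way. As a minimal Java sketch, assuming a Linux host, the same number can be read from /proc/net/tcp and /proc/net/tcp6, where the fourth column holds the socket state in hex and 08 means CLOSE_WAIT. The class and method names here are illustrative and not part of the original investigation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class CloseWaitCounter {
    // In /proc/net/tcp the fourth column ("st") is the socket state in hex; 08 is CLOSE_WAIT.
    static final String CLOSE_WAIT = "08";

    // Counts the rows whose state column is CLOSE_WAIT. The header row has "st" there and is skipped.
    static long countCloseWait(List<String> procNetTcpLines) {
        long count = 0;
        for (String line : procNetTcpLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length > 3 && CLOSE_WAIT.equalsIgnoreCase(cols[3])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        // Assumption: Linux procfs is available; files that cannot be read are ignored.
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(file);
            if (Files.isReadable(path)) {
                total += countCloseWait(Files.readAllLines(path));
            }
        }
        System.out.println("CLOSE_WAIT sockets: " + total);
    }
}
```

Sampling this count every few seconds during a stress test would show whether CLOSE_WAIT connections pile up as TPS drops.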
1.2 Initial diagnosis: start with the thread stack information
We dumped the Tomcat thread stacks with jstack and found the message "Found 1 deadlock".
Found one Java-level deadlock:
=============================
"http-nio-8080-exec-409":
  waiting to lock monitor 0x00007f064805aa78 (object 0x00000006c0ebf148, a java.util.HashSet),
  which is held by "http-nio-8080-ClientPoller"
"http-nio-8080-ClientPoller":
  waiting to lock monitor 0x00007f05e8061058 (object 0x00000007bfe40a70, a java.lang.Object),
  which is held by "http-nio-8080-exec-205"
"http-nio-8080-exec-205":
  waiting to lock monitor 0x00007f0614018448 (object 0x00000006c0e8e088, a java.util.HashSet),
  which is held by "http-nio-8080-BlockPoller"
"http-nio-8080-BlockPoller":
  waiting to lock monitor 0x0000000001ed06e8 (object 0x00000007bfe110f8, a java.lang.Object),
  which is held by "http-nio-8080-exec-380"
"http-nio-8080-exec-380":
  waiting to lock monitor 0x00007f064805aa78 (object 0x00000006c0ebf148, a java.util.HashSet),
  which is held by "http-nio-8080-ClientPoller"
1.2.1 Quick fix
After internal discussion, we concluded that the current version of Tomcat might contain a bug. To avoid delaying the project, we took the simple route of downgrading the Tomcat used by Spring Boot from 9.0.26 to Tomcat 8. A repeat stress test after the downgrade showed no problem, which made it fairly certain that Tomcat 9.0.26 has a deadlock problem.
1.3 Further follow-up
1.3.1 Feedback to the Apache community
To confirm the problem, we submitted a bug report to the Tomcat project.
The stack information shows a deadlock involving 5 threads of 3 types (exec worker threads, ClientPoller, and BlockPoller), caused by the lack of a consistent locking order. The locking process is shown in the following figure.
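The waits-for relationships in the jstack report above can also be checked mechanically. The sketch below copies the five "waiting to lock ... held by" pairs from the report into a map and follows the edges until a thread repeats. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WaitForGraph {
    // waiter -> thread holding the monitor the waiter is blocked on, copied from the jstack report.
    static Map<String, String> fromReport() {
        Map<String, String> waitsFor = new LinkedHashMap<>();
        waitsFor.put("http-nio-8080-exec-409", "http-nio-8080-ClientPoller");
        waitsFor.put("http-nio-8080-ClientPoller", "http-nio-8080-exec-205");
        waitsFor.put("http-nio-8080-exec-205", "http-nio-8080-BlockPoller");
        waitsFor.put("http-nio-8080-BlockPoller", "http-nio-8080-exec-380");
        waitsFor.put("http-nio-8080-exec-380", "http-nio-8080-ClientPoller");
        return waitsFor;
    }

    // Follows the waits-for edges from start and returns the threads on the cycle
    // that is reached, or an empty list if the chain ends without repeating.
    static List<String> findCycle(Map<String, String> waitsFor, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = waitsFor.get(current);
        }
        if (current == null) {
            return new ArrayList<>();
        }
        // Everything from the first occurrence of the repeated thread onward forms the cycle.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    public static void main(String[] args) {
        List<String> cycle = findCycle(fromReport(), "http-nio-8080-exec-409");
        System.out.println("Cycle: " + String.join(" -> ", cycle));
    }
}
```

Starting from exec-409, the walk closes on ClientPoller, exec-205, BlockPoller, and exec-380. Those four threads form the cycle, while exec-409 is the fifth blocked thread, waiting on a monitor that ClientPoller holds.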
1.4 Cause analysis
The deadlock sequence is now clear, but which step went wrong? Answering that requires going down to the source code. First download the OpenJDK source code, then the Tomcat 9.0.26 source code, and navigate to the relevant code locations using the stack information. This yields the following figure, which describes the Tomcat 9.0.26 deadlock process.
To understand the figure above, you need some familiarity with NIO. In Tomcat, that mainly means understanding NioEndpoint.
Poller is a wrapper around Selector, while each worker thread named exec-xx works on a Channel. In NIO, a Channel registers with a Selector, and the association is recorded in a SelectionKey. At this point, all the protagonists are on stage.
Poller's run method, running in a background thread, keeps polling (select) for ready SelectionKeys and, along the way, deregisters the SelectionKeys held in cancelledKey. When an exec-xx worker thread processes a connection, it first checks the connection state; on a failure, an exception, and so on, it calls the Channel's close method to close the connection.
The Channel's close in fact just adds the SelectionKey to cancelledKey. Both paths have to acquire locks first, but they acquire them in inconsistent orders, which results in the deadlock.
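The effect of an inconsistent locking order can be reproduced in isolation. The following is a minimal sketch, not the Tomcat or JDK code: two daemon threads take two plain monitors in opposite orders, and the JVM's own detector (ThreadMXBean.findMonitorDeadlockedThreads, the programmatic counterpart of the jstack report) confirms the deadlock. The lock and thread names are hypothetical stand-ins.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderDeadlockDemo {
    // Starts a daemon thread that takes `first`, waits until both threads hold
    // their first lock, and then asks for `second`.
    static Thread lockInOrder(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread thread = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached when the two threads use opposite orders.
                }
            }
        }, name);
        thread.setDaemon(true); // Daemon threads do not keep the JVM alive after main returns.
        thread.start();
        return thread;
    }

    // Polls the JVM's monitor-deadlock detector; returns how many threads it reports, or 0 on timeout.
    static int awaitDeadlock(long timeoutMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    static int reproduce() {
        // Hypothetical stand-ins for the two kinds of monitors in the report (a HashSet and an Object).
        Object cancelledKeysLock = new Object();
        Object channelLock = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        lockInOrder("demo-poller", cancelledKeysLock, channelLock, bothHoldFirst);
        lockInOrder("demo-exec", channelLock, cancelledKeysLock, bothHoldFirst);
        return awaitDeadlock(5000);
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads reported: " + reproduce());
    }
}
```

If both threads took the two monitors in the same order, the second thread would simply wait for the first to finish. That is why a single, consistent locking order is the usual remedy for this class of bug.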
1.4.1 Communication with Tomcat developers
After we submitted the bug, we quickly got a reply from Remy Maucherat, who first pointed to the deadlock inside NIO. We then noted that the deadlock within NIO is caused by Poller.run and Poller.cancelledKey running concurrently.
Remy Maucherat quickly fixed the problem, mainly by moving the close in Poller.cancelledKey into a finally block, that is, letting Poller.run acquire the lock first.
After the fix, we ran the stress test again with the patched code, and the deadlock did not recur. Remy Maucherat also mentioned that the latest OpenJDK contains fixes for related issues, but only in JDK versions 11 and 14.
The details of the communication are shown in the picture below.
1.4.2 Verification of the fix on GitHub
https://github.com/apache/tomcat/commit/9b1a8b67bffe462fc745b19e15ed59c37e2e1dcf
1.5 Result verification
We took the fixed code from https://github.com/apache/tomcat/commit/9b1a8b67bffe462fc745b19e15ed59c37e2e1dcf, repackaged tomcat-embed-core.jar to replace the 9.X.XX version, and ran the stress test again. TPS held steady at about 1.5W (15,000).
At this point, the problem has been clearly located and fixed. Remy Maucherat also replied: "The fix will be in Tomcat 9.0.31+".
At the time of writing, the latest Tomcat release is 9.0.30, so the fix will only arrive with version 9.0.31. Until then, Tomcat 8 is recommended.
2. Related links and references
Download OpenJDK source code
Tomcat source code
From the Alibaba Cloud Yunqi Community: Troubleshooting and fixing a Tomcat bug triggered by Mtop under high concurrency during a network outage
In-depth interpretation of NIO Model in Tomcat