DeadLock problem in Tomcat 9.0.26 High concurrency scenario

Updated 2025-01-18. Source: SLTechnology News&Howtos (Servers)

This article was first published on the WeChat official account "vivo Internet Technology".

Link: https://mp.weixin.qq.com/s/-OcCDI4L5GR8vVXSYhXJ7w

Authors: Huang Weibing, Chen Jinxia

1. Deadlock problem in Tomcat container version 9.0.26

1.1 Problem phenomenon

1.1.1 Background of the deadlock

After about 3 minutes of stress testing the interface /get.do, the successful-transaction TPS plummeted from about 10,000 (1W) to 0.

1.1.2 A large number of CLOSE_WAIT connections appear on the Tomcat server

On the server under stress test, the number of TCP connections in the CLOSE_WAIT state was about 20,000 (2W).
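One way to watch this number during a stress test, sketched here under the assumption of a Linux host, is to count the rows of /proc/net/tcp (and /proc/net/tcp6) whose state column is 08, the kernel's hex code for CLOSE_WAIT. The class name CloseWaitCounter and its method are hypothetical and not part of the original investigation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class CloseWaitCounter {
    // In /proc/net/tcp the 4th whitespace-separated column ("st") holds the
    // socket state as two hex digits; 08 means CLOSE_WAIT.
    static final String CLOSE_WAIT = "08";

    /** Counts CLOSE_WAIT rows in the given lines; the first line is the header. */
    static long countCloseWait(List<String> procNetTcpLines) {
        return procNetTcpLines.stream()
                .skip(1) // skip the header row
                .map(line -> line.trim().split("\\s+"))
                .filter(cols -> cols.length > 3 && CLOSE_WAIT.equals(cols[3]))
                .count();
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        // A JVM often uses dual-stack sockets, so the IPv6 table is read as well.
        for (String name : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path table = Paths.get(name);
            if (Files.isReadable(table)) {
                total += countCloseWait(Files.readAllLines(table));
            }
        }
        System.out.println("CLOSE_WAIT sockets: " + total);
    }
}
```

A steadily growing count means the peer has closed its side while the server has not yet closed the socket, which is consistent with request threads being stuck.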

1.2 Preliminary diagnosis: start with the thread stack information

Dumping the Tomcat thread stacks with jstack reports "Found 1 deadlock":

    Found one Java-level deadlock:
    =============================
    "http-nio-8080-exec-409":
      waiting to lock monitor 0x00007f064805aa78 (object 0x00000006c0ebf148, a java.util.HashSet),
      which is held by "http-nio-8080-ClientPoller"
    "http-nio-8080-ClientPoller":
      waiting to lock monitor 0x00007f05e8061058 (object 0x00000007bfe40a70, a java.lang.Object),
      which is held by "http-nio-8080-exec-205"
    "http-nio-8080-exec-205":
      waiting to lock monitor 0x00007f0614018448 (object 0x00000006c0e8e088, a java.util.HashSet),
      which is held by "http-nio-8080-BlockPoller"
    "http-nio-8080-BlockPoller":
      waiting to lock monitor 0x0000000001ed06e8 (object 0x00000007bfe110f8, a java.lang.Object),
      which is held by "http-nio-8080-exec-380"
    "http-nio-8080-exec-380":
      waiting to lock monitor 0x00007f064805aa78 (object 0x00000006c0ebf148, a java.util.HashSet),
      which is held by "http-nio-8080-ClientPoller"

1.2.1 Quick fix

After internal discussion, we concluded that the current Tomcat version might contain a bug. To avoid delaying the project, we simply downgraded the Tomcat used by Spring Boot from 9.0.26 to Tomcat 8. Repeating the stress test after the downgrade showed no problem, which made us fairly certain that Tomcat 9.0.26 has a deadlock problem.

1.3 Further follow-up

1.3.1 Feedback to the Apache community

To confirm the problem, we submitted a bug report to the Tomcat project.

The stack information shows a deadlock involving 5 threads of 3 types (exec worker threads, ClientPoller, and BlockPoller), caused by locks being acquired in no consistent order. The locking sequence is shown graphically in the following figure.
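The same kind of cycle that jstack prints can also be detected from inside the JVM with the standard java.lang.management.ThreadMXBean API. The following is a minimal sketch, not the tooling used in the original investigation: the class DeadlockProbe and the thread names demo-locker-1 and demo-locker-2 are hypothetical, and the two demo threads deliberately take two monitors in opposite order to reproduce a lock-ordering deadlock in miniature.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockProbe {
    /** Returns the ids of threads deadlocked on object monitors, or an empty array. */
    static long[] findDeadlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? new long[0] : ids;
    }

    /** Starts two daemon threads that take lockA and lockB in opposite order. */
    static void startOpposedLockers(Object lockA, Object lockB) {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        spawn("demo-locker-1", lockA, lockB, bothHoldFirst);
        spawn("demo-locker-2", lockB, lockA, bothHoldFirst);
    }

    private static void spawn(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds 'second' and waits for 'first'
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though these threads never finish
        t.start();
    }

    /** Polls until a deadlock is reported or the timeout elapses. */
    static long[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long[] ids = findDeadlocked();
        while (ids.length == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = findDeadlocked();
        }
        return ids;
    }

    public static void main(String[] args) {
        startOpposedLockers(new Object(), new Object());
        long[] ids = awaitDeadlock(5000);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

A periodic check like findDeadlocked() could be wired into a health endpoint so that a TPS drop to 0 is attributed to a deadlock sooner, although whether that is worthwhile depends on the deployment.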

1.4 Cause analysis

The deadlock sequence is now clear, but which step actually went wrong? Answering that requires going down to the source level. First download the OpenJDK source, then the Tomcat 9.0.26 source, and navigate to the code locations named in the stack traces. From this we derived the following figure, which describes the Tomcat 9.0.26 deadlock process.

To follow the figure above, you need some understanding of NIO. In Tomcat, this mainly means understanding NioEndpoint.

Poller is a wrapper around a Selector, while the worker threads named exec-xx work on Channels. In NIO, a Channel registers with a Selector, and the relationship is recorded by a SelectionKey. With that, all the main actors are on stage.

The run method of Poller, running as a background thread, continuously polls (select) for ready SelectionKeys and, along the way, deregisters the SelectionKeys found in the cancelled-key set (cancelledKey). When a worker thread exec-xx processes a request, it first checks the state of the connection; on failure, an exception, and so on, it calls the Channel's close method to close the connection.

The Channel's close in fact just adds the SelectionKey to cancelledKey. Both paths have to take locks first, but they take them in inconsistent order, which results in the deadlock.
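The deferred nature of cancellation can be observed with plain java.nio, independent of Tomcat. The sketch below is an illustration only (the class CancelledKeyDemo is hypothetical): it registers one end of a Pipe with a Selector, cancels the key, and records the size of the selector's key set at each step. According to the Selector Javadoc, a cancelled key is only placed in the cancelled-key set and is removed from the key set during the next selection operation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class CancelledKeyDemo {
    /** Returns {keys after register, keys after cancel, keys after the next select}. */
    static int[] observeKeyCounts() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                int afterRegister = selector.keys().size();

                key.cancel(); // only queues the key in the cancelled-key set
                int afterCancel = selector.keys().size();

                selector.selectNow(); // the selection operation performs the deregistration
                int afterSelect = selector.keys().size();

                return new int[] {afterRegister, afterCancel, afterSelect};
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = observeKeyCounts();
        System.out.println("after register=" + counts[0]
                + ", after cancel=" + counts[1]
                + ", after select=" + counts[2]);
    }
}
```

Because the cancelling thread and the selecting thread both touch the cancelled-key set, that set is a shared, locked structure, which matches the java.util.HashSet monitors seen in the jstack output.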

1.4.1 Communication with Tomcat developers

After we submitted the bug, we quickly got a reply from Remy Maucherat. He first pointed to a deadlock inside NIO. We then noted that the deadlock inside NIO is caused by Poller.run and Poller.cancelledKey executing concurrently.

Remy Maucherat quickly fixed it, mainly by moving the close in Poller.cancelledKey into a finally block, that is, letting Poller.run acquire the lock first.
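The general idea behind such a fix can be illustrated with a generic sketch. This is not the Tomcat patch: the class DeferredCloseSketch, its two locks, and its methods are hypothetical stand-ins. The poller-like path takes the cancelled-set lock and then the channel lock. The worker-like path never holds both at once: it updates channel state under the channel lock, leaves that block, and only then (in finally) takes the cancelled-set lock, so no cycle of waiting threads can form.

```java
import java.util.HashSet;
import java.util.Set;

class DeferredCloseSketch {
    private final Object channelLock = new Object();           // stand-in for a per-channel lock
    private final Set<Integer> cancelledIds = new HashSet<>();  // stand-in for a cancelled-key set
    private int closedCount = 0;                                // guarded by channelLock

    /** Poller-like path. Lock order: cancelled set, then channel. */
    int drainCancelled() {
        synchronized (cancelledIds) {
            int drained = cancelledIds.size();
            cancelledIds.clear();
            synchronized (channelLock) {
                // e.g. finish deregistration while holding both locks
            }
            return drained;
        }
    }

    /** Worker-like path. The second lock is taken only after the first is released. */
    void close(int id) {
        try {
            synchronized (channelLock) {
                closedCount++; // update channel state under its own lock only
            }
        } finally {
            // channelLock is no longer held here, so this cannot complete a lock cycle
            synchronized (cancelledIds) {
                cancelledIds.add(id);
            }
        }
    }

    /** Runs one closing thread against a draining loop; returns how many ids were drained. */
    static int runDemo(int closes) {
        DeferredCloseSketch sketch = new DeferredCloseSketch();
        Thread worker = new Thread(() -> {
            for (int i = 0; i < closes; i++) {
                sketch.close(i);
            }
        }, "demo-exec");
        worker.setDaemon(true);
        worker.start();

        int drained = 0;
        long deadline = System.currentTimeMillis() + 5000;
        while (drained < closes && System.currentTimeMillis() < deadline) {
            drained += sketch.drainCancelled();
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println("drained " + runDemo(10000) + " cancelled ids without deadlock");
    }
}
```

Had close() added to cancelledIds while still inside the synchronized (channelLock) block, the two paths would acquire the same pair of locks in opposite order, which is the pattern the jstack output shows.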

After the fix, we repeated the stress test with the replaced code, and the deadlock no longer occurred. Remy Maucherat also mentioned that related issues have been fixed in OpenJDK, but only in JDK versions 11 and 14.

The details of the communication are shown in the picture below.

1.4.2 Verification of the fix on GitHub

https://github.com/apache/tomcat/commit/9b1a8b67bffe462fc745b19e15ed59c37e2e1dcf

1.5 Result verification

Using the fixed code from https://github.com/apache/tomcat/commit/9b1a8b67bffe462fc745b19e15ed59c37e2e1dcf, we repackaged tomcat-embed-core.jar to replace the 9.X.XX version and ran the stress test again. TPS held steady at about 15,000 (1.5W).

At this point the problem has been clearly located and fixed. Remy Maucherat also replied: "The fix will be in Tomcat 9.0.31+".

At the time of writing, the latest Tomcat release is 9.0.30, so you still need to wait for version 9.0.31. Until then, Tomcat 8 is recommended.

II. Related links and references

Download OpenJdk source code

Tomcat source code

From the Alibaba Cloud Yunqi Community: troubleshooting and fixing a Tomcat bug triggered by Mtop under high concurrency during a network outage

In-depth interpretation of NIO Model in Tomcat
