How to use Arthas's trace command to troubleshoot the long average response time of online services 04/27 Update SLTechnology News&Howtos


2025-04-27 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article explains how to use Arthas's trace command to troubleshoot an online service whose average response time is too long. The method is simple, fast and practical, so interested readers may want to take a look.

Preface

Recently, one of our service interfaces in the online environment responded too slowly, which led to a poor user experience. How can this kind of problem be tracked down quickly?

Adding detailed logging to the code: not recommended. Code that has been changed to add detailed logs cannot easily be redeployed to the online environment, and detailed log output generates a large number of log files that take up a lot of server disk space.

Building a test environment that simulates the online environment and reproducing the problem there: when the problem occurs, there is usually not enough time to set up and review such an environment, so this is not recommended either.

Using Arthas, the online diagnostic tool open-sourced by Alibaba: it is built specifically for troubleshooting problems in online environments and provides many commands. When the response time is too long, as described above, you can use its trace command. The trace command measures the time spent along the entire call path inside a method and tracks that path, so you can find the most time-consuming methods and then investigate them.

The rest of this article covers two parts:

Setting up an environment that simulates an online service interface with a long response time: a SpringBoot service interface, plus JMeter to simulate calls to it.

Using the trace command of the Arthas diagnostic tool to troubleshoot the long response time.

Step 1: simulate the online environment by creating a SpringBoot project and writing the service interface.

Note: for simplicity, the service interface code just runs large loops to simulate time-consuming work. In real projects, many other common situations cause the same symptom, for example:

The service interface method performs many JDBC operations. Because the database holds a large amount of data, many queries are slow, and a missing index can make them slower still, so the response time of the service interface grows.

The service interface calls other service interfaces internally. If one of those internally invoked interfaces has a problem and takes a long time to execute, the response time of the calling interface grows as well.

The service interface code is as follows:

The test1 and test2 methods are as follows:
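As a rough, hypothetical illustration of this kind of code (not the author's actual listing), the sketch below is plain Java without Spring so it runs on its own. The names StringUtil.test1, StringUtil.test2 and process come from this article; the loop bodies and sizes are assumptions. test1 does a little work, test2 runs a large nested loop, and process calls both, which is the shape the trace command later exposes.

```java
/**
 * Hypothetical stand-in for the article's slow service interface. The names
 * StringUtil.test1/test2 and process come from the article; the loop bodies are assumptions.
 */
public class SlowServiceDemo {

    /** Hypothetical version of com.lyl.util.StringUtil. */
    public static final class StringUtil {
        private StringUtil() {}

        /** Short loop: cheap compared with test2. Returns 0 + 1 + ... + 999. */
        public static long test1() {
            long sum = 0;
            for (int i = 0; i < 1_000; i++) {
                sum += i;
            }
            return sum;
        }

        /** Large nested loop (200 million iterations) that simulates a time-consuming method. */
        public static long test2() {
            long count = 0;
            for (int i = 0; i < 20_000; i++) {
                for (int j = 0; j < 10_000; j++) {
                    count += (i ^ j) & 1; // counts (i, j) pairs whose parities differ
                }
            }
            return count;
        }
    }

    /** Plain-Java stand-in for TestController.process(); the real one is a Spring controller method. */
    public static String process() {
        long a = StringUtil.test1();
        long b = StringUtil.test2();
        return "ok:" + a + ":" + b;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String result = process();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result + " in " + elapsedMs + " ms");
    }
}
```

Running main prints ok:499500:100000000 followed by an elapsed time that depends on the machine; almost all of that time is spent in test2.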

Step 2: configure a JMeter test script that simulates users calling the interface:

Step 3: the project locations of the SpringBoot service interface code and the JMeter test script:

Once the service interface code is ready, package it as a Jar using IntelliJ IDEA or other means.

To simulate a real online environment as closely as possible, copy the packaged service interface Jar to a server and start it with java -jar *.jar. Then call the interface from JMeter: the Aggregate Report shows that the average response time is too long, as shown in the figure:
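The Average column of JMeter's Aggregate Report is the mean elapsed time over all samples, so a handful of very slow calls can raise it noticeably. To make that metric concrete, here is a small Java sketch, meant only as an illustration and not as a replacement for JMeter, that times repeated calls and computes the average and maximum. The class name ResponseTimeStats and the sleeping workload are hypothetical; the workload stands in for one request to the service interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Minimal response-time statistics, loosely modelled on the Average and Max columns of JMeter's Aggregate Report. */
public class ResponseTimeStats {

    /** Invokes the call the given number of times and records each elapsed time in milliseconds. */
    public static List<Double> sample(Runnable call, int times) {
        List<Double> elapsedMs = new ArrayList<>(times);
        for (int i = 0; i < times; i++) {
            long start = System.nanoTime();
            call.run();
            elapsedMs.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return elapsedMs;
    }

    /** Arithmetic mean of the samples; NaN for an empty list. */
    public static double average(List<Double> elapsedMs) {
        if (elapsedMs.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0;
        for (double ms : elapsedMs) {
            sum += ms;
        }
        return sum / elapsedMs.size();
    }

    /** Largest single sample; throws NoSuchElementException for an empty list. */
    public static double max(List<Double> elapsedMs) {
        return Collections.max(elapsedMs);
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for one call to the service interface.
        Runnable call = () -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        };
        List<Double> samples = sample(call, 20);
        System.out.printf("samples=%d average=%.2f ms max=%.2f ms%n",
                samples.size(), average(samples), max(samples));
    }
}
```

A high average tells you that the interface is slow, but not where the time goes inside it; that is the question the trace command answers.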

If users report that a feature responds too slowly, don't worry: the following method lets you find the cause quickly and easily.

Arthas troubleshooting:

First, download arthas-boot.jar, the Jar package of Alibaba's open-source Arthas diagnostic tool, and put it on the server where the service interface project is deployed. Then use the ps command to find the process ID of the running service interface program. For example, the Jar of the simulated service interface in this article is named springboot_arthas-1.0.0.jar, so the command is: ps -ef | grep springboot_arthas-1.0.0. Next, start the Arthas diagnostic tool with: java -jar arthas-boot.jar. The startup screen is shown in the figure:

At this point the diagnostic tool has not finished starting: it lists all Java processes with their sequence numbers, and you need to choose the Java process to diagnose/monitor manually. Just enter the sequence number in front of the target process, here [1], as shown in the figure:

Once Arthas is attached, you can use the trace command to measure how long each method called inside the service interface method takes.

The trace command searches for the method call path matching class-pattern/method-pattern, tracks that call path, and renders the performance overhead (time spent) at every node along it.
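Arthas produces this per-call breakdown at runtime through bytecode enhancement, without changing or redeploying any code. Purely as an analogy for what the numbers mean, the hypothetical Java sketch below times each step of a method by hand, accumulates the cost per step, and reports the slowest one. The class ManualTrace, its methods and the sleep-based steps are assumptions for illustration, and the rendered lines only loosely resemble real trace output.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hand-instrumented timing, illustrating the kind of per-call cost breakdown
 * that Arthas trace reports. Arthas itself needs no code changes; this is only an analogy.
 */
public class ManualTrace {
    // Accumulated elapsed time per step name, kept in call order.
    private final Map<String, Long> costsNanos = new LinkedHashMap<>();

    /** Runs one step, adds its elapsed time to the step's total, and returns its result. */
    public <T> T step(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            costsNanos.merge(name, System.nanoTime() - start, Long::sum);
        }
    }

    /** Name of the step with the largest accumulated cost, or null if no step ran. */
    public String slowestStep() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : costsNanos.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    /** Renders the root's total cost followed by one line per step, in milliseconds. */
    public String render(String root, long totalNanos) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT, "[%.3fms] %s%n", totalNanos / 1e6, root));
        for (Map.Entry<String, Long> e : costsNanos.entrySet()) {
            sb.append(String.format(Locale.ROOT, "    +---[%.3fms] %s%n", e.getValue() / 1e6, e.getKey()));
        }
        return sb.toString();
    }

    /** Sleeps for the given number of milliseconds to simulate work; returns ms. */
    public static long pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return ms;
    }

    public static void main(String[] args) {
        ManualTrace trace = new ManualTrace();
        long start = System.nanoTime();
        // Hypothetical costs standing in for the two utility calls inside process().
        trace.step("StringUtil.test1()", () -> pause(5));
        trace.step("StringUtil.test2()", () -> pause(80));
        long total = System.nanoTime() - start;
        System.out.print(trace.render("TestController.process()", total));
        System.out.println("slowest: " + trace.slowestStep());
    }
}
```

Hand instrumentation like this is exactly the "add code and redeploy" approach the article advises against for production; it is shown only to make the meaning of trace's per-node costs concrete.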

Command format: trace [fully qualified class name] [method name in the class]

For example, to monitor this service interface: the fully qualified class name is com.lyl.controller.TestController, and the method is process in the TestController class.

The command is:

trace com.lyl.controller.TestController process

The output of the trace command is shown in the figure:

❝

The trace command measures the execution time of each method in the call path. The output shows that the test2() method of the called com.lyl.util.StringUtil class takes comparatively long, so you need to check that method's code for problems. If that method in turn has many calls of its own, run the trace command again on it to measure its internal call path and narrow down the specific methods that may be at fault.

❞

Arthas, Alibaba's open-source diagnostic tool, provides many other commands worth exploring. You can also search Bilibili for "Amateur Grass" to watch the video tutorials I recorded earlier.

In addition, there are two things to note:

First, the code diagnosed by Arthas should not be obfuscated when it is packaged; otherwise the trace command will not find the class or method.

Second, while the trace command is collecting statistics, the JMeter test script must be running and calling the service interface. If the method is not called, trace has no invocations to measure, so the time spent on the internal call path cannot be counted.

At this point, I believe you have a better understanding of how to use Arthas's trace command to troubleshoot long average response times of online services. Try it out in practice!

