Using Arthas to accurately locate the problem of excessive CPU load in Java applications 07/02 Update SLTechnology News&Howtos

Using Arthas to accurately locate the problem of excessive CPU load in Java applications

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/02 Report--

The official community of Arthas is holding an essay solicitation activity. If you participate, you will get the prize and click to contribute.

Author | Zhang Yunxiang

Recently, we have an application server online. CPU can always run to 99%. I wonder why it doesn't have much traffic, so why can I make myself so tired? So I logged on to this server to see what it was doing!

If you encounter similar problems in the past, you may consider using the top-Hp plus jstack command to troubleshoot. Although you can roughly locate the scope of the problem, there is still too little effective information, and most of the time you still have to guess. Today, I recommend a more efficient and accurate tool: Arthas! Arthas is Alibaba's open source Java diagnostic tool that can help us quickly locate online problems. For basic installation and use, please refer to the official document: https://alibaba.github.io/arthas

This time we use it to troubleshoot the problem of high CPU load. CPU overload is usually caused by a problem with one or more threads, so we try the first command: thread, which displays information about all threads and prioritizes threads with high CPU usage.

[arthas@384] $threadThreads Total: 112, NEW: 0, RUNNABLE: 26, BLOCKED: 0, WAITING: 31, TIMED_WAITING: 55, TERMINATED: 0ID NAME STATE% CPU TIME108 h..ec-0 RUNNABLE 51 4011 h..ec-2 RUNNABLE 48 100 h..ec-2 RUNNABLE 48 4011

Some unimportant information has been deleted in order to facilitate reading.

You can see that CPU resources are almost full by the first two threads and have been executed for more than 4000 minutes, and our server has only been started for two days, so they have not been idle for a moment these two days! What on earth are they doing? We can look at the thread stack with the command: thread id.

[arthas@384] $thread 108 "http-nio-7001-exec-10" Id=108 cpuUsage=51% RUNNABLE at c.g.c.c.HashBiMap.seekByKey (HashBiMap.java) at c.g.c.c.HashBiMap.put (HashBiMap.java:270) at c.g.c.c.HashBiMap.forcePut (HashBiMap.java:263) at c.y.r.j.o.OaInfoManager.syncUserCache (OaInfoManager.java:159)

You can also use the thread-n 3 command to print out the first three threads with the highest percentage of CPU, which is about the effect of > top-Hp > & > printf > & > jstack >

As you can see, this thread has been executing the HashBiMap.seekByKey method (the thread id can be executed several times to ensure that the method executed by the thread is not changing all the time). There are generally two reasons for this problem:

The seekByKey method is called in a loop. There is an endless loop inside the seekByKey.

To see if it is the first, let's use the tt command to listen for calls to this method:

Tt-t com.google.common.collect.HashBiMap seekByKey-n 100

Note: when executing this command online, be sure to add the-n parameter, otherwise the huge online traffic may instantly burst your JVM memory execution results show that the seekByKey method has not been called, the probability is that there is an endless loop inside the seekByKey method. Looking at the logic inside this method, we can decompile it with the jad com.google.common.collect.HashBiMap seekByKey command, which has the advantage of being high-end, but I'm going to go straight to the source code, and maybe even comments. The source code is as follows

Private BiEntry seekByKey (@ Nullable Object key, int keyHash) {for (BiEntry entry = hashTableKToV [keyHash & mask]; entry! = null; entry = entry.nextInKToVBucket) {if (keyHash = = entry.keyHash & & Objects.equal (key, entry.key)) {return entry;}} return null;}

Then there are no comments, but fortunately this method is relatively simple in logic and easy to understand.

Find the bucket through hash, and each bucket is a linked list; iterate through the linked list to find the entry corresponding to the key. Note here that the next node of entry is nextInKToVBucket, which will be used later.

There is a dead loop, and we guess it may be because the linked list has a loop. So is there any way to verify this conjecture? The answer is yes! So how to verify it? First we need to get the HashBiMap object so that we can query the data in the object. There are many ways to get this object, such as listening to a method of the object and then actively triggering the method. Here is a more general method, which is very useful in SpringMVC programs. Because we are SpringMVC applications, all requests will be intercepted by RequestMappingHandlerAdapter. We use the tt command to monitor the execution of invokeHandlerMethod, and then click on the page to get the following content:

[arthas@384] $tt-t org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter invokeHandlerMethod-n 10Press Q or Ctrl+C to abort.Affect (class-cnt:1, method-cnt:1) cost in 622 ms. INDEX COST (ms) OBJECT CLASS METHOD---- 1000 481.203383 0x481eb705 RequestMappingHandlerAdapter invokeHandlerMethod 1001 3.432024 0x481eb705 RequestMappingHandlerAdapter invokeHandlerMethod...

The tt command records all input parameters and return values when the method is called, the exception thrown, the object itself, and so on. The INDEX field represents a call, and many subsequent tt commands specify the record operation based on this number.

We can view the details of this record through the-I parameter followed by the corresponding INDEX number. Then, with the-w parameter, specify an OGNL expression to find the relevant objects:

[arthas@384] $tt-I 1000-w 'target.getApplicationContext ()'@ AnnotationConfigServletWebServerApplicationContext [reader=@AnnotatedBeanDefinitionReader [org.springframework.context.annotation.AnnotatedBeanDefinitionReader@50294e97], scanner=@ClassPathBeanDefinitionScanner [org.springframework.context.annotation.ClassPathBeanDefinitionScanner@5eeeaae2], annotatedClasses=@LinkedHashSet [isEmpty=true;size=0], basePackages=null

OGNL usage document: https://commons.apache.org/proper/commons-ognl/language-guide.html

Arthas will put the currently executed object in the target variable, get the SpringContext object through target.getApplicationContext (), and then we can do whatever we want!

Next we need to write a function in OGNL to implement the loop detection of the linked list. It is not easy to write a section of loop detection code in OGNL. Here I use a tricky pseudo implementation.

# loopCnt=0,#foundCycle=: [# this = = null? False: # loopCnt > 50? True: (# loopCnt = # loopCnt + 1, # foundCycle (# this.nextInKToVBucket))]

Because I know that a bucket is unlikely to have more than 50 nodes, I determine whether there are loops by traversing more than 50 times.

Complete command:

Tt-I 1000-w 'target.getApplicationContext (). GetBean ("oaInfoManager"). UserCache.entrySet (). {delegate}. {^ # loopCnt = 0 grammar cycle =: [# this = = null? false: # loopCnt > 50? true: (# loopCnt = # loopCnt + 1, # foundCycle (# this.nextInKToVBucket)], # foundCycle (# this)} .get (0)'-x 2

Command parsing:

Get the HashBiMap object: target.getApplicationContext (). GetBean ("oaInfoManager"). UserCache iterates through all the entry and takes out the first entry-x parameter with a loop to specify the deployment level. We need to set this parameter larger than the loop to ensure that the loop can be found. Here our loop is very small, so it is set to 2.

The implementation results are as follows:

@ BiEntry [key=@String [Zhang San], value=@Long [1111], nextInKToVBucket=@BiEntry [key=@String [Li Si], value=@Long [2222], nextInKToVBucket=@BiEntry [Zhang San = 1111]

We can see that there is a ring road like Zhang San-> Li Si-> Zhang San. At this point, the cause of the endless cycle has been determined. Combining two threads starting almost at the same time and executing the HashBiMap.forcePut method at the same time, it is easy to think that concurrency leads to data inconsistency, which can also be verified, but due to the limited space, I will not repeat it here. If you find a problem, you will be 99% successful. The solution to this problem is very simple: add a synchronized keyword to the syncUserCache method!

Conclusion

The problem encountered this time is not complicated and can be solved with the jstack command. But we hope that through such a case, we can show you some powerful features of Arthas to help you open your mind and have more handy tools when you encounter more complex scenes in the future!

Arthas essay solicitation activity is in full swing.

Arthas officials are holding essay solicitation activities, if you have:

Use the problems checked by Arthas to interpret the source code of Arthas and make suggestions to Arthas.

No restrictions, other Arthas-related content

Welcome to participate in the essay solicitation activities, and there are prizes to get oh ~ click to contribute

"Alibaba Cloud Native focus on micro services, Serverless, containers, Service Mesh and other technology areas, focus on cloud native popular technology trends, cloud native large-scale landing practice, to be the official account of cloud native developers."

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.