This article walks through how a Java process drove the CPU of a Kubernetes node to a sustained high, and how we located and fixed the problem. The write-up is kept concise and easy to follow; I hope the detailed steps give you something you can reuse.
1. Finding the problem
Some time after the system went live, we found that certain nodes would run fine for a long while and then see their CPU climb and stay high. As a result, Kubernetes would evict the Pods on that node, and if a Pod happened to be rescheduled onto another node with the same problem, it again failed to work. We tried killing the Pod and steering it to a healthy node by hand (using labels), and the affected node can also be excluded from scheduling altogether, but the problem came back after a while. We also checked the traffic over the affected period in our monitoring system, and it did not correlate with the sustained CPU usage, which made us realize the problem was probably in the program itself.
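As a stop-gap before the root cause is found, a node like this can be taken out of scheduling while the Pod is recreated elsewhere. A minimal sketch with kubectl; the node and Pod names here are placeholders, not the ones from our cluster:
# mark the affected node unschedulable so new Pods avoid it
kubectl cordon <node-name>
# delete the misbehaving Pod; its Deployment/ReplicaSet recreates it on another node
kubectl -n app delete pod <pod-name>
# once the root cause is fixed, allow scheduling on the node again
kubectl uncordon <node-name>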
2. Troubleshooting and locating the Pod
Here we use the kubectl top pods command to find out which Pod is consuming the most CPU.
kubectl -n app top pods
(Because the problem has since been fixed, the output shown here is only illustrative.)
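If you first need to confirm which node is affected, kubectl top can also report per-node usage (this relies on metrics-server being installed in the cluster). A quick sketch; the --sort-by flag is available on reasonably recent kubectl versions:
# per-node CPU and memory usage
kubectl top nodes
# per-Pod usage in the app namespace, highest CPU first
kubectl -n app top pods --sort-by=cpu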
Troubleshooting tool: Arthas
Here we use Arthas, Alibaba's open-source Java diagnostic tool. Arthas can help you answer questions like the following when you are otherwise at a loss:
From which jar file was this class loaded? Why are all sorts of class-related Exceptions being thrown?
Why is the code I changed not being executed? Did I forget to commit? Am I on the wrong branch?
When you hit a problem you cannot debug online, is adding log statements and redeploying really the only option?
You run into a problem with one user's data online, but you cannot debug in production and cannot reproduce it offline.
Is there a global view of the system's health?
Is there a way to monitor the JVM's running state in real time?
How do you quickly locate an application's hot spots and generate a flame graph?
Troubleshooting the problem
After locating the problematic Pod, use kubectl exec to get a shell inside its container:
kubectl -n app exec -it 49a89b2f-73c6-40ac-b6de-c6d0e47ace64-5d489d9c48qwc7t -- /bin/bash
Download Arthas in the container
wget https://arthas.gitee.io/arthas-boot.jar
Since our image packages only a single service, there is only one Java process in the Pod; the 1 below is that process's PID.
java -jar arthas-boot.jar 1
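If a container happens to run more than one Java process, arthas-boot can also be started without a PID; it then lists the running Java processes and lets you pick one interactively. A sketch:
# list Java processes in the container and choose which one to attach to
java -jar arthas-boot.jar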
Run the dashboard command to get a live overview of the process:
[arthas@1]$ dashboard
The upper half of the dashboard lists the threads, and from it we can see which thread ID is hogging the CPU.
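If the dashboard output is too busy, the thread command can also list just the most CPU-hungry threads directly; a sketch, assuming we want the top three:
[arthas@1]$ thread -n 3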
For example, take a thread ID found in the dashboard, say 12262, and pass it to the thread command to print that thread's stack:
[arthas@1]$ thread 12262
The thread's stack trace is printed:
"com.alibaba.nacos.client.Worker.addr-bj-internal.edas.aliyun.com-7362814c-538b-4c26-aa07-1fd47765a145" Id=20190 cpuUsage=7% TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@d30d0a4e (in native)
    at sun.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@d30d0a4e
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:813)
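For a broader view of where the CPU is going, the flame-graph question from the list above can be answered in the same session: Arthas ships with an async-profiler integration. A sketch, assuming a reasonably recent Arthas version; the report is written under the arthas-output directory by default:
[arthas@1]$ profiler start
(let it sample for 30-60 seconds while the CPU is high)
[arthas@1]$ profiler stop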
3. Solving the problem
After the troubleshooting above, and finally with help from the community and Alibaba Cloud engineers, the cause was identified as a bug in Nacos 2.0.0.RELEASE. We upgraded the Nacos client, and after testing the problem no longer appeared; the exercise also sharpened our ability to debug workloads inside a Kubernetes cluster. The dependency after the upgrade:
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
    <version>2.0.1.RELEASE</version>
</dependency>
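Before redeploying, it is worth double-checking that the build actually resolves to the new client version rather than an older transitive one. A quick sketch for a standard Maven build:
# list the resolved Nacos artifacts in the dependency tree
mvn dependency:tree | grep nacos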
That is how a Java process drove the CPU of our Kubernetes nodes high, and how we tracked the problem down and fixed it.