
Troubleshooting and Analyzing a Java Memory Leak


This article walks through a real case of troubleshooting a Java memory leak: how the problem first surfaced as interface timeouts, how it was narrowed down with jstat, jstack, jmap, and MAT, and how the root cause was finally fixed in code. I hope it helps anyone who runs into a similar situation.

Problems arise

Starting a little after seven o'clock in the evening, I began receiving a steady stream of alert emails showing that several interfaces of our probe service had timed out.

Most of the execution stacks looked like this:

java.io.BufferedReader.readLine(BufferedReader.java:371)
java.io.BufferedReader.readLine(BufferedReader.java:389)
java_io_BufferedReader$readLine.call(Unknown Source)
com.domain.detect.http.HttpClient.getResponse(HttpClient.groovy:122)
com.domain.detect.http.HttpClient.this$2$getResponse(HttpClient.groovy)

I have seen errors on this thread stack many times. The HTTP DNS timeout we set is 1 s, the connect timeout is 2 s, and the read timeout is 3 s.
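For reference, this is roughly what such timeout settings look like on a plain JDK HTTP client. It is only a minimal sketch: the probe actually uses its own Groovy HttpClient, and the endpoint below is made up.

import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; the real probe service is not shown in the article.
        URL url = new URL("http://example.com/detect");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(2000); // connect timeout: 2 s
        conn.setReadTimeout(3000);    // read timeout: 3 s
        // The 1 s DNS timeout mentioned above comes from their HTTP DNS setup;
        // the JDK client has no separate per-request DNS timeout.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}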

Errors like this mean that the probe service sent the HTTP request normally and the server received and answered it normally, but the response packet was lost somewhere in the network's hop-by-hop forwarding, so the requesting thread's stack stays at the point where it reads the interface response.

The typical signature of this case is that the corresponding log records can be found on the server, and those logs show that the server responded completely normally.

In contrast, when the thread stack stays at the Socket connect, the connection failed while it was being established, and the server is completely unaware of the request.

I noticed that one interface reported errors far more frequently than the others. It uploads a 4 MB file to the server, runs it through a series of business logic, and then returns 2 MB of text data.

The other interfaces involve only simple business logic, so I guessed that this one simply uploads and downloads too much data, making timeouts caused by packet loss more likely.

Following this hunch, I logged on to the server and searched the recent service logs with the request's request_id. Sure enough, the interface had timed out because of network packet loss.

Of course, the leader would not be satisfied with that; someone had to take responsibility for the conclusion. So I quickly contacted the ops and network teams to confirm the state of the network at the time.

The network team replied that the switch in the machine room hosting our probe service was old and had an unidentified forwarding bottleneck that was being optimized. That put my mind at ease, so I gave a brief explanation in the department group and considered the matter closed.

The problem broke out

I thought that would be the only small ripple of this on-call shift, but a little after eight o'clock in the evening, alert emails from all kinds of interfaces swarmed in, catching me, who was getting ready to pack up for my Sunday off, completely off guard.

This time almost every interface was timing out, and our interfaces with heavy network I/O were timing out on every single probe. Could the whole machine room be failing?

Once again, the server and the monitoring showed that the metrics of every interface were normal, and when I tested an interface myself it was completely fine. Since online services were not affected, I planned to stop the probe tasks through the probe service's API and then troubleshoot at my own pace.

But when my request to the interface that pauses probe tasks went unanswered for a long time, I knew it was not going to be that simple.

Solve the problem

Memory leak

So I quickly logged in to the probe server and ran the usual top, free, df trio, and sure enough found some anomalies:

The CPU usage of our probe process was exceptionally high, reaching 900%.

Our Java process does not do heavy CPU computation; normally its CPU usage sits between 100% and 200%. When CPU soars like this, the process is either stuck in an infinite loop or doing a lot of GC.

Using jstat -gc pid [interval] to check the GC status of the Java process showed that, sure enough, full GC was running about once per second:

With that much full GC, a memory leak was a near certainty, so I used jstack pid > jstack.log to save a snapshot of the thread stacks.

I saved the heap with jmap -dump:format=b,file=heap.log pid, then restarted the probe service, and the alert emails finally stopped.
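As an aside, a heap dump can also be triggered from inside a running JVM through the HotSpot diagnostic MXBean. This is only a complementary sketch (the dump path is illustrative), not what was used here:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpDemo {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // Second argument true = dump only live (reachable) objects.
        diag.dumpHeap("/tmp/heap.hprof", true);
    }
}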

jstat

jstat is a very powerful JVM monitoring tool. Its general usage is:

jstat [-options] pid interval

The views it supports include:

class: class loading information.

compile: JIT compilation statistics.

gc: garbage collection information.

gcXXX: per-region GC details, such as -gcold.

It is very helpful for locating memory problems in the JVM.
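GC activity can also be watched from inside the process through the standard management API, which exposes roughly the same counters that jstat reads. A minimal sketch, not part of the original troubleshooting:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatch {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // On a typical JDK 8 setup, the old-generation collector's count
                // roughly corresponds to the FGC column of jstat -gc.
                System.out.printf("%s: collections=%d, totalTime=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(1000); // same idea as the jstat interval argument
        }
    }
}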

Troubleshoot the problem

Although the problem has been solved, it is still necessary to find out the root cause in order to prevent it from happening again.

Analyzing the stack

Stack analysis is simple: check whether there are too many threads and what most of the stacks are doing:

> grep 'java.lang.Thread.State' jstack.log | wc -l
464

Only a little over 400 threads, nothing abnormal there:

> grep -A 1 'java.lang.Thread.State' jstack.log | grep -v 'java.lang.Thread.State' | sort | uniq -c | sort -n
 10 at java.lang.Class.forName0(Native Method)
 10 at java.lang.Object.wait(Native Method)
 16 at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
 44 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
344 at sun.misc.Unsafe.park(Native Method)

The thread states show nothing unusual either, so the next step is to analyze the heap file.
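Incidentally, a similar per-state head count can be taken from inside a running JVM with the standard ThreadMXBean API, a rough in-process counterpart to grepping the jstack output. A sketch for completeness:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class ThreadStateCount {
    public static void main(String[] args) {
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        // Group live threads by state and count them.
        Map<Thread.State, Long> byState = Arrays.stream(infos)
                .collect(Collectors.groupingBy(ThreadInfo::getThreadState, Collectors.counting()));
        byState.forEach((state, count) -> System.out.println(state + ": " + count));
    }
}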

Download heap dump files

The heap dump is binary data, which is very awkward to inspect on the command line, and the analysis tools that ship with Java are graphical and cannot be used on the Linux server, so the file has to be downloaded to a local machine first.

Because we set the heap to 4 GB, the dumped heap file is also very large and genuinely troublesome to download, but it can be compressed first.

gzip is a very powerful compression command; in particular, you can pass -1 through -9 to specify its compression level.

The higher the level, the better the compression ratio and the longer it takes. Levels beyond -6 or -7 are too slow for little extra benefit, so -6 or -7 is the sweet spot; in the time the extra compression would take, the additional bytes could already have been downloaded.

Using MAT to analyze the JVM heap

MAT is a powerful tool for analyzing Java heap memory. Open our heap file with it (change the file suffix to .hprof first), and it will ask which kind of analysis to run.

For this problem, choose the Leak Suspects report without hesitation:

The pie chart in the report showed that most of the heap was occupied by the same set of objects. Drilling into the heap details and tracing back to the referrers quickly revealed the culprit.

Analyzing the code

Having identified the leaking object, I searched the project globally for its class name. It turned out to be a Bean object, and the leak traced back to one of its properties, a Map.

This Map stores the response results of each probe interface in an ArrayList keyed by type; every time a probe completes, the result is appended to the corresponding ArrayList for later analysis.

Because the Bean object is never collected and this property has no cleanup logic, after more than ten days without a service restart the Map grew larger and larger until it filled the heap.
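In code, the pattern looks roughly like the following. This is a hypothetical reconstruction; the article does not show the real class or field names.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DetectResultHolder {
    // Never cleaned up, so it grows with every completed probe until the heap is full.
    private final Map<String, List<String>> resultsByType = new HashMap<>();

    public void record(String type, String responseBody) {
        resultsByType.computeIfAbsent(type, k -> new ArrayList<>()).add(responseBody);
    }
}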

Once the heap was full, no more memory could be allocated for HTTP response data, so threads stayed stuck in readLine. And our interfaces with heavy I/O alarmed far more often, presumably because their larger responses needed more memory.
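A minimal sketch of one possible fix, using the same hypothetical names as above (the article does not show what the actual PR changed): have the analysis job drain the accumulated results so the Map stops growing without bound.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DetectResultHolder {
    private final Map<String, List<String>> resultsByType = new HashMap<>();

    public synchronized void record(String type, String responseBody) {
        resultsByType.computeIfAbsent(type, k -> new ArrayList<>()).add(responseBody);
    }

    // Called by the analysis job: returns everything collected for a type
    // and removes it from the Map, releasing the memory.
    public synchronized List<String> drainResults(String type) {
        List<String> drained = resultsByType.remove(type);
        return drained != null ? drained : new ArrayList<>();
    }
}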

A PR was submitted to the code owner, and the problem was solved satisfactorily.

That concludes this walkthrough of troubleshooting a Java memory leak. Pairing the theory with hands-on practice is the best way to make it stick, so give these tools a try on your own services.
