2025-04-04 Update, from SLTechnology News & Howtos (shulou.com)
This article takes a detailed look at how to analyze a JVM memory overflow in a production environment. I hope you will have a better understanding of the topic after reading it.
If our business volume is relatively large and JVM memory overflows occur frequently in the production environment, how can we respond quickly, locate the cause quickly, and recover quickly?
This article uses a real online JVM memory-overflow case to introduce the handling approach and analysis method.
Case: the architecture team received feedback from a project team that Zabbix monitoring showed JMX as unavailable, and was asked for assistance.
Analysis ideas:
When JMX is unavailable, the cause is often a problem such as a long garbage-collection pause or a memory overflow.
The principle of online fault analysis is: first take measures to quickly shield the business from the fault, then collect information to analyze and locate the problem, and finally give a solution.
The specific analysis process is as follows.
How to restore business quickly
Usually an online failure has a significant impact on the business and the user experience, so when an online server fails you should limit the impact on the business. However, you cannot simply restart the server, because you need to preserve the scene (heap, logs, thread stacks) as much as possible to lay the foundation for subsequent problem analysis.
So how can we quickly shield the business from the impact while preserving the scene?
The common practice is to isolate the failed server.
Usually online servers are deployed in a cluster, and a good distributed load-balancing scheme will automatically remove faulty machines, achieving a highly available architecture. If a faulty machine is not removed automatically, the operations staff need to remove it manually and preserve the scene for analysis.
Memory leaks are usually caused by the code and cannot be fixed immediately. They easily trigger a chain reaction in which application servers go down one after another, so the fault area slowly expands. In this situation you should quickly locate the cause of the leak and degrade the affected service so that other services are not impacted. The simplest way to degrade is to use the F5 (or Nginx) forwarding policy to direct the suspect function to a separate cluster, isolating it from other traffic. This ensures other businesses are unaffected and buys valuable time for troubleshooting.
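The isolation described above can be sketched as an Nginx configuration. This is a minimal illustration under assumed names, not the team's actual setup; the upstream names, addresses, and the /export path are all hypothetical:

```nginx
# Sketch: route only the suspect endpoint to an isolated upstream,
# so a leak there cannot take down the main cluster.
# Names (app_main, app_export, /export) are illustrative.
upstream app_main   { server 10.0.0.11:8080; server 10.0.0.12:8080; }
upstream app_export { server 10.0.0.21:8080; }  # isolated "quarantine" node

server {
    listen 80;

    location /export {          # suspect function only
        proxy_pass http://app_export;
    }
    location / {                # everything else stays on the healthy cluster
        proxy_pass http://app_main;
    }
}
```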
Analyze and solve problems
First, check the log to determine which kind of memory overflow occurred. A heap-related overflow can occur in the Java heap space (heap) or in the perm space (permanent generation; Metaspace on Java 8 and later).
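When grepping the logs, the standard HotSpot error messages distinguish these cases (the PermGen variant applies to Java 7 and earlier; Java 8+ reports Metaspace instead):

```text
java.lang.OutOfMemoryError: Java heap space             # object heap exhausted
java.lang.OutOfMemoryError: PermGen space               # class metadata, Java 7 and earlier
java.lang.OutOfMemoryError: Metaspace                   # class metadata, Java 8+
java.lang.OutOfMemoryError: GC overhead limit exceeded  # GC running almost constantly
```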
Collect memory overflow Dump files
There are two ways to collect Dump files:
Set JVM startup parameters
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/jvmdump
Each time a memory overflow occurs, the JVM automatically dumps the heap, and the dump file is written to the path specified by -XX:HeapDumpPath.
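If the process is already running without these flags, a heap dump can also be triggered from code through the HotSpotDiagnostic MXBean, equivalent in effect to a jmap dump. A minimal sketch; the output path is illustrative:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /** Dump the current JVM's heap to `path`.
        live = true dumps only reachable objects, like jmap's "live" option. */
    public static void dump(String path, boolean live) throws Exception {
        // dumpHeap refuses to overwrite, so remove any stale file first
        new java.io.File(path).delete();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws Exception {
        dump("/tmp/manual-dump.hprof", true);  // path is illustrative
        System.out.println("heap dump written");
    }
}
```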
Use the jmap command to collect
Run `jmap -dump:live,format=b,file=/opt/jvm/dump.hprof <pid>`.
Analyze Dump files
After obtaining the Dump file, you can analyze it with MAT (Eclipse Memory Analyzer), which can be downloaded from the Eclipse website.
After opening the Dump file with MAT, the screenshot of the home page is as follows:
Tool button description:
Histogram view, which lists the memory consumed by each class in the heap, as shown in the figure:
Tree view, which organizes memory as a tree with the thread as the dimension, expanded layer by layer, as shown in the figure:
Thread stack view, whose screenshot is as follows:
According to the figure, the total heap size is 1.9 GB, almost all of it occupied by four threads, so other threads could no longer allocate memory and a heap-memory-overflow error was thrown.
Next, my usual practice is to look directly at this view (with the thread as the basic dimension, find the objects occupying memory within each thread) to provide the basis for subsequent troubleshooting.
The following key information points can be obtained from the screenshot above:
org.apache.ibatis.executor.result.DefaultResultHandler holds a List internally whose elements are java.util.HashMap instances. From this class we can basically tell that it is related to a database query: the rows returned by the database are decoded and assembled into HashMaps.
This List contains a total of 146,033 elements, so we can preliminarily conclude that a single query fetched too much data from the database, causing the memory overflow.
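The pattern MAT revealed can be reproduced in miniature. The sketch below imitates what an unpaginated query does through MyBatis' DefaultResultHandler: each row is decoded into a HashMap and appended to a single List, and nothing is released until the whole result set is in memory. The row count matches the dump; the column names are made up:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnboundedQuery {
    /** Simulates an unpaginated query: every row becomes a HashMap
        appended to one ever-growing List. Columns are illustrative. */
    public static List<Map<String, Object>> fetchAll(int rowCount) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", (long) i);
            row.put("name", "user-" + i);
            rows.add(row);   // held until the entire result set is materialized
        }
        return rows;
    }

    public static void main(String[] args) {
        // 146,033 rows, as in the dump; with wide rows this easily reaches GBs
        System.out.println(fetchAll(146_033).size());  // → 146033
    }
}
```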
Since the SQL query code uses HashMap to receive the fields returned by the database, we cannot immediately tell which query is responsible. So can we find out exactly which query, which line of code, or even which SQL statement it was?
The answer is yes; we can find it in the same view.
Tip: expand along the item that consumes the most memory, layer by layer, until you reach the specific memory-consuming object.
Next we use this view to find out which method and which SQL statement triggered the overflow.
Specific method: first, fully expand one thread and read upward from the bottom of the expanded tree:
The thread's entry point (controller-layer code):
Continue upward; to find the MyBatis statement, look for the class that handles the SQL result set, as shown in the figure:
Then expand boundSql to find the SQL statement:
Place the cursor on the sql property, right-click, and copy the SQL statement.
Since the company's code is confidential, the specific SQL statement is not shown here.
Further analysis showed that the export function did not query the data page by page and write it to the Excel file in batches; instead it queried all the data at once. As a result, once the concurrency of the export function exceeded 4, all heap memory was exhausted.
Solution:
First, at the operations level, route export requests to a designated server so that the export task is isolated from other tasks and does not affect other important services.
The project team fixes the code by querying the data page by page and writing each page into the Excel file.
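The fix can be sketched as a paging loop. Everything here is illustrative: `fetchPage` stands in for the real MyBatis mapper (e.g. a LIMIT-based query) and `writeRows` for the Excel writer; neither name comes from the project's code:

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class PagedExport {
    /** Export in fixed-size pages instead of one unbounded query.
        fetchPage(offset, size) stands in for the real mapper;
        writeRows stands in for the Excel writer. Returns rows exported. */
    public static int export(
            BiFunction<Integer, Integer, List<Map<String, Object>>> fetchPage,
            Consumer<List<Map<String, Object>>> writeRows,
            int pageSize) {
        int offset = 0, total = 0;
        while (true) {
            List<Map<String, Object>> page = fetchPage.apply(offset, pageSize);
            if (page.isEmpty()) break;  // no more rows
            writeRows.accept(page);     // write this page, then let it be GC'd
            total += page.size();
            offset += pageSize;
        }
        return total;
    }

    public static void main(String[] args) {
        // fake in-memory data source: 250 rows
        java.util.List<Map<String, Object>> all = new java.util.ArrayList<>();
        for (int i = 0; i < 250; i++) {
            Map<String, Object> m = new java.util.HashMap<>();
            m.put("id", i);
            all.add(m);
        }
        int total = export(
            (off, size) -> all.subList(Math.min(off, all.size()),
                                       Math.min(off + size, all.size())),
            page -> { /* write to Excel here */ },
            100);
        System.out.println(total);  // → 250
    }
}
```

Only one page of rows is live at a time, so peak heap usage is bounded by the page size regardless of the total row count.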
That is all for the analysis of JVM memory overflow in the production environment. I hope the above content is helpful. If you think the article is good, feel free to share it.