Disruptor Memory Overflow: An Example Analysis

Memory overflow problems are hard for many developers to pin down. This article walks through locating and fixing one such problem in detail; I hope you gain something from it.
Preface
Many of us have run into OutOfMemoryError, which is far harder to locate and fix than everyday business exceptions such as array index out of bounds or null pointer errors.
What follows is how we recently located and resolved a memory overflow in a production system, working through four steps: symptoms -> investigation -> locating -> solution.
Symptoms
Recently, one of our production applications kept throwing memory overflow errors, and the frequency rose as business volume grew.
The program's business logic is very simple: it consumes data from Kafka and persists it in batches.
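For orientation, here is a minimal sketch of that shape; the topic name, broker address, group id, and persistBatch helper are hypothetical stand-ins, not the original code:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConsumeAndPersist {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "persist-group");           // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));    // hypothetical topic
            while (true) {
                // a single poll can return hundreds of records, as it did in production
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                List<String> batch = new ArrayList<>();
                for (ConsumerRecord<String, String> r : records) {
                    batch.add(r.value());
                }
                persistBatch(batch);                      // batch write to storage (stubbed)
            }
        }
    }

    static void persistBatch(List<String> batch) {
        // placeholder for the real batch persistence logic
    }
}
```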
The pattern was clear: the more Kafka messages arrived, the more frequent the exceptions. Since other work was pressing at the time, we asked operations to restart the application and to monitor heap memory and GC.
Restarting is a handy stopgap, but it cannot fix the problem at its root.
Investigation
So we tried to determine where the problem lay from the memory metrics collected by operations and from the GC logs.
It turned out that old-generation usage stayed high even after GC ran, and it climbed higher and higher over time.
Combined with jstat output (for example jstat -gcutil <pid>, watching the O, FGC, and FGCT columns), we found that even full GC (FGC) on the old generation could no longer reclaim memory; usage had hit its ceiling.
Some instances had even racked up hundreds of full GCs, with frighteningly long pause times.
This made it clear that the application's memory usage was genuinely broken: a large number of troublesome objects were staying alive and could never be reclaimed.
Locating
The production heap dump was enormous, dozens of gigabytes, partly because our heap size was set so large, so analyzing it in MAT would have taken a long time. We therefore tried to reproduce the problem locally, where it would be far easier to pin down.
To reproduce it as quickly as possible, I capped the local application's maximum heap at 150 MB.
Then, instead of consuming from Kafka, I mocked the data source with a while loop that generated data continuously, as sketched below.
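Here is a sketch of that local Mock, again with hypothetical names; the detail that matters in hindsight is that it hands over one record at a time:

```java
public class MockLoop {
    public static void main(String[] args) throws InterruptedException {
        long i = 0;
        while (true) {
            String record = "mock-" + i++; // exactly one record per iteration
            handle(record);                // feed a single record downstream
            Thread.sleep(1);               // mild throttle so monitoring stays readable
        }
    }

    static void handle(String record) {
        // placeholder for the same downstream logic as production
    }
}
```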
When the application started, I attached VisualVM to it to monitor memory usage and GC in real time.
After running for more than ten minutes, memory usage looked fine: the monitoring graph showed every GC reclaiming memory effectively, so the problem did not reproduce.
A problem you cannot reproduce is hard to locate. So we went back to the code and noticed that the production logic was not quite the same as the while-loop Mock.
The production logs showed that each poll pulled several hundred records from Kafka, whereas the Mock generated only one record at a time.
To simulate production as closely as possible, I ran a producer program on a server that kept sending data to Kafka, along the lines of the sketch below.
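A minimal sketch of such a load generator, using the same hypothetical topic and broker address as above:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class LoadProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long i = 0;
            while (true) {
                // keep the topic saturated so each consumer poll returns a large batch
                producer.send(new ProducerRecord<>("demo-topic", "payload-" + i++));
            }
        }
    }
}
```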
Sure enough, the application buckled after little more than a minute. The monitoring showed GC firing very frequently, but the memory it reclaimed was negligible by comparison.
At the same time, the log began printing memory overflow errors: the problem was reproduced.
Solution
Judging from this behavior, a large number of objects in memory were held by strong references and could not be reclaimed.
To see which objects were occupying so much memory, I used VisualVM's heap dump feature to dump the running application's heap on the spot.
It turned out that objects of type com.lmax.disruptor.RingBuffer accounted for nearly 50% of the memory.
Seeing that package name naturally brings Disruptor's ring queue to mind.
Reviewing the code once more, we found that each batch of roughly 700 records pulled from Kafka was being thrown directly into Disruptor.
This also explains why the first simulation failed to reproduce the problem: the Mock put a single object into each queue slot, while production put a 700-record batch into each slot, a 700x difference in retained data.
As a ring queue, Disruptor keeps every object placed in a slot alive until a later write overwrites that slot; see the sketch below.
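The retention behavior can be seen in a minimal sketch against the real com.lmax.disruptor API; the BatchEvent type and record contents are hypothetical:

```java
import com.lmax.disruptor.RingBuffer;

import java.util.List;

public class RetentionSketch {
    // Hypothetical event type: one slot holds an entire Kafka batch (~700 records).
    static class BatchEvent {
        List<String> records; // strong reference held until the slot is overwritten
    }

    public static void main(String[] args) {
        RingBuffer<BatchEvent> ring =
                RingBuffer.createSingleProducer(BatchEvent::new, 1024);

        long seq = ring.next();                  // claim the next slot
        try {
            ring.get(seq).records = List.of("r1", "r2" /* ... ~700 records */);
        } finally {
            ring.publish(seq);                   // hand the slot to consumers
        }
        // Even after every consumer has processed this event, the record list
        // stays reachable through the slot until sequence seq + 1024 reuses it.
    }
}
```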
I ran an experiment to confirm this. With the queue size set to 8, I wrote 10 items, 0 through 9; item 8 wrapped around and overwrote the slot of item 0, and so on (the same modulo-style slot indexing that HashMap uses). A small demo follows.
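The wrap-around follows from the power-of-two mask that ring buffers use for slot indexing, which this demo makes concrete:

```java
public class RingIndexDemo {
    public static void main(String[] args) {
        int size = 8; // ring size must be a power of two
        for (long sequence = 0; sequence < 10; sequence++) {
            // modulo via bit mask, the same trick HashMap uses for bucket indexing
            long slot = sequence & (size - 1);
            System.out.println("write " + sequence + " -> slot " + slot);
        }
        // items 8 and 9 land back in slots 0 and 1, overwriting the earlier entries
    }
}
```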
So in production, assuming a queue size of 1024, the system would eventually fill all 1024 slots with live objects, each slot holding a 700-record batch: roughly 1024 * 700 = 716,800 records pinned in memory.
Then I checked the production RingBuffer configuration of Disruptor: the size was 1024 * 1024. At that order of magnitude, with 700 records per slot, the retained memory is terrifying.
To verify that this was the culprit, I changed the value locally to 2, the smallest size I was willing to try, as in the sketch below.
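A sketch of where that value lives, assuming the Disruptor DSL and the hypothetical BatchEvent type from before; the fix is nothing more than the size argument passed at construction:

```java
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

import java.util.List;

public class DisruptorConfigSketch {
    static class BatchEvent {
        List<String> records;
    }

    public static void main(String[] args) {
        int ringBufferSize = 2; // was 1024 * 1024 in production; must be a power of two
        Disruptor<BatchEvent> disruptor = new Disruptor<>(
                BatchEvent::new, ringBufferSize, DaemonThreadFactory.INSTANCE);
        disruptor.handleEventsWith((event, sequence, endOfBatch) -> {
            // persist event.records here (omitted in this sketch)
        });
        disruptor.start();
    }
}
```

A ring this small forces the producer to wait for the consumer, so at most two batches are ever pinned in memory at once.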
Still pulling data continuously from Kafka, with the same 128 MB heap, the application ran for more than 20 minutes without trouble; monitoring showed every GC reclaiming most of the memory, the healthy sawtooth pattern.
With that, the problem was found. The exact size to use in production can only be decided from the business requirements, but the original 1024 * 1024 clearly could not stay.
Although the fix came down to a single line (in fact no code changed at all; only a configuration value), the troubleshooting process itself was worthwhile.
It should also give an intuitive feel for the JVM to the many developers who find it an intimidating black box.