

How to solve the memory leak problem with tomcat and Netty


This article walks through how a memory leak involving tomcat and Netty was tracked down and fixed. The content is straightforward and easy to follow, so let's go through the troubleshooting process step by step.

The troubleshooting process is as follows:

The first step is to get the log.

The exception log provided by the business team looks like this (since the company forbids screenshots, photos and sharing of any information, here is a similar error report I found online):

LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: #1:
    io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:273)
    io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
    io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
    io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
    io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    java.lang.Thread.run(Thread.java:748)

At a glance, the ByteBuf was not released, resulting in a memory leak.

Step two, look at memory metrics.

Now that we know it is a memory leak, I asked the ops team to check memory usage, especially off-heap memory (since Netty is involved). According to their feedback, heap memory usage was normal while off-heap memory remained high.

OK, at this point we can safely conclude that off-heap memory is leaking.

At this point there were two parallel actions: one was to swap the gateway for Zuul and observe it under load testing, and the other was to track down the memory leak.

The third step is to look at the code.

I asked the person in charge of the project to send me the code. When I opened it, I was dumbfounded: it was just a simple Spring Cloud Gateway project containing only two classes, an AuthFilter for permission checks and an XssFilter to guard against XSS attacks.

The fourth step is an initial suspicion.

A quick scan of each class shows ByteBuf-related code in XssFilter. However, there is no obvious sign of a ByteBuf not being released. The simplest thing to do is to disable this class first and see whether the memory leak goes away.

But how do we detect the memory leak? We can't just delete the class and run it in production.

The fifth step, tweak parameters and add monitoring.

It's actually quite simple. Anyone who has read the Netty source code knows that Netty defaults to a pooled, direct-memory implementation of ByteBuf, namely PooledDirectByteBuf. So, to make debugging easier, the first thing to do is turn off pooling.

Direct memory is another name for off-heap memory.

Why turn off pooling?

Because the pool caches memory: it allocates chunks of 16 MB at a time and does not release them immediately, so with pooling enabled it is hard to observe individual allocations unless you debug very slowly.
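
For reference, a quick way to see the pool's chunk size on your own Netty 4.1.x version (a small check added here for illustration, not part of the original project):

import io.netty.buffer.PooledByteBufAllocator;

public class ChunkSizeCheck {
    public static void main(String[] args) {
        // chunk size = pageSize << maxOrder; on older 4.1.x defaults this is 8192 << 11 = 16 MiB
        int chunkSize = PooledByteBufAllocator.defaultPageSize() << PooledByteBufAllocator.defaultMaxOrder();
        System.out.println("pooled chunk size = " + chunkSize + " bytes");
    }
}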

So, how do you turn off pooling?

In Netty, every ByteBuf is created by something called a ByteBufAllocator. The ByteBufAllocator interface holds a default allocator; find where that default is created and you can see the relevant code.

public interface ByteBufAllocator {
    ByteBufAllocator DEFAULT = ByteBufUtil.DEFAULT_ALLOCATOR;
}

public final class ByteBufUtil {
    static final ByteBufAllocator DEFAULT_ALLOCATOR;

    static {
        String allocType = SystemPropertyUtil.get(
                "io.netty.allocator.type", PlatformDependent.isAndroid() ? "unpooled" : "pooled");
        allocType = allocType.toLowerCase(Locale.US).trim();

        ByteBufAllocator alloc;
        if ("unpooled".equals(allocType)) {
            alloc = UnpooledByteBufAllocator.DEFAULT;
            logger.debug("-Dio.netty.allocator.type: {}", allocType);
        } else if ("pooled".equals(allocType)) {
            alloc = PooledByteBufAllocator.DEFAULT;
            logger.debug("-Dio.netty.allocator.type: {}", allocType);
        } else {
            alloc = PooledByteBufAllocator.DEFAULT;
            logger.debug("-Dio.netty.allocator.type: pooled (unknown: {})", allocType);
        }
        DEFAULT_ALLOCATOR = alloc;
    }
}

As you can see, it is controlled by the parameter io.netty.allocator.type.

OK, add this parameter to the JVM startup arguments and set it to unpooled, i.e. -Dio.netty.allocator.type=unpooled.
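
To double-check that the switch took effect, you can print the default allocator at startup (a small sanity check added here, not code from the original project):

import io.netty.buffer.ByteBufAllocator;

public class AllocatorCheck {
    public static void main(String[] args) {
        // With -Dio.netty.allocator.type=unpooled this should print UnpooledByteBufAllocator
        System.out.println("default allocator: " + ByteBufAllocator.DEFAULT.getClass().getSimpleName());
    }
}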

After pooling is turned off, how can we observe in real time whether memory is actually leaking?

This is also simple: Netty's PlatformDependent class keeps a count of all direct memory usage.

Recently I have been studying the Netty source code (this article comes from Tongge's source-code reading), so I am fairly familiar with these details. I am still preparing the material, and once it is ready I plan to start writing a Netty column.

So all we have to do is write a scheduled task that prints this statistic periodically. Here is the code:

@Component
public class Metrics {

    @PostConstruct
    public void init() {
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
        // Print Netty's direct memory counter once per second
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            System.out.println("used direct memory: " + PlatformDependent.usedDirectMemory());
        }, 1, 1, TimeUnit.SECONDS);
    }
}

Just drop it into the same package as the startup class, or a sub-package of it.

At this point, pooling is disabled and monitoring is in place; next comes debugging.

Step 6, preliminary debugging

Run the startup class directly and observe the log.

used direct memory: 0
used direct memory: 0

At first, direct memory is normal, always 0.

I sent a random request and got a 404. Direct memory did not change, still 0, which shows that an arbitrary request is not enough: it is rejected by Spring early on and never reaches the Netty forwarding logic.

Step 7, modify the configuration

An arbitrary request will not do, so I had to simulate normal request forwarding. I quickly spun up a SpringBoot project, defined an endpoint in it, and modified the gateway configuration so the request could be forwarded:

spring:
  cloud:
    gateway:
      routes:
      - id: test
        uri: http://localhost:8899/test
        predicates:
        - Path=/test
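
For completeness, the downstream SpringBoot endpoint can be as simple as the sketch below (my assumption of what that test service looked like; the class and method names are made up, only the /test path and port 8899 come from the route above):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class TestApplication {

    @GetMapping("/test")
    public String test() {
        return "ok";
    }

    public static void main(String[] args) {
        // assumes server.port=8899 in application.properties so the gateway route can reach it
        SpringApplication.run(TestApplication.class, args);
    }
}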

Step 8, debug again

After modifying the configuration, start two projects, one gateway and one springboot, and send requests to observe the use of direct memory:

used direct memory: 0
used direct memory: 1031
used direct memory: 1031
used direct memory: 1031

Sure enough, the memory was not released.

Step 9, delete XssFilter

To verify the previously suspected XssFilter, delete it, start the project again, send a request, and observe the use of direct memory.

used direct memory: 0
used direct memory: 1031
used direct memory: 1031
used direct memory: 1031

The problem still exists, and it is the same size as the previous leak.

A side note: Netty determines the size of each read-buffer allocation by a guess (the default AdaptiveRecvByteBufAllocator adjusts this guess over time), and the initial value of the guess is 1024.

@Override
public ByteBuf allocate(ByteBufAllocator alloc) {
    return alloc.ioBuffer(guess());
}
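
For context, the guess comes from the channel's RecvByteBufAllocator. In a plain Netty server you could configure it explicitly like this (an illustration under my own assumptions, not part of the gateway project):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.AdaptiveRecvByteBufAllocator;
import io.netty.channel.ChannelOption;

public class RecvAllocatorConfig {
    public static void main(String[] args) {
        ServerBootstrap bootstrap = new ServerBootstrap();
        // minimum 64, initial guess 1024, maximum 65536 (these match the usual defaults)
        bootstrap.childOption(ChannelOption.RCVBUF_ALLOCATOR,
                new AdaptiveRecvByteBufAllocator(64, 1024, 65536));
    }
}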

Didn't expect Netty to have such a cute side, did you? Ahem, I digress; back to the topic.

On top of that, there are 7 bytes used to store carriage returns, line feeds and so on. These 7 bytes are never released, and 1024 + 7 adds up to 1031.

private static final byte[] ZERO_CRLF_CRLF = { '0', CR, LF, CR, LF };
// 2B
private static final ByteBuf CRLF_BUF = unreleasableBuffer(directBuffer(2).writeByte(CR).writeByte(LF));
// 5B
private static final ByteBuf ZERO_CRLF_CRLF_BUF = unreleasableBuffer(
        directBuffer(ZERO_CRLF_CRLF.length).writeBytes(ZERO_CRLF_CRLF));

Well, it's kind of interesting, since it's not XssFilter's problem, is it AuthFilter's problem?

Step 10, kill AuthFilter.

Just do it: delete AuthFilter, restart the project, send requests, and observe direct memory:

used direct memory: 0
used direct memory: 1031
used direct memory: 1031
used direct memory: 1031

The problem still exists, with the same familiar memory size.

At this point my thinking went off track; what follows is the wrong road.

Step 11, think.

After removing XssFilter and AuthFilter, only the startup class is left, plus, of course, the newly added monitoring class.

Could there be something wrong with Spring Cloud Gateway itself? Hey, it felt like I had discovered a new world: if I found a bug in Spring Cloud Gateway, I could brag about it later (inner fantasy).

Since allocated memory is not being released, let's find the place where the memory is allocated and set a breakpoint there.

From the previous analysis we already know that the allocator in use is UnpooledByteBufAllocator, so set a breakpoint in its newDirectBuffer() method, because what is leaking here is direct memory.
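
As a complementary technique (not something the original troubleshooting used, just an option worth knowing), Netty's built-in leak detector can be cranked up so that every leaked buffer is reported with full access records:

import io.netty.util.ResourceLeakDetector;

public class LeakDetectionSetup {
    public static void main(String[] args) {
        // Equivalent to -Dio.netty.leakDetection.level=paranoid; tracks every buffer, so use only while debugging
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        // ... start the application after this point
    }
}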

Step 12, debug step by step

Following the idea from step 11, I set a breakpoint in UnpooledByteBufAllocator's newDirectBuffer() method, stepped through the code, and finally arrived at this method:

// io.netty.handler.codec.ByteToMessageDecoder.channelRead
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    if (msg instanceof ByteBuf) {
        CodecOutputList out = CodecOutputList.newInstance();
        try {
            first = cumulation == null;
            // 1. Returns msg itself; msg is a ByteBuf
            cumulation = cumulator.cumulate(ctx.alloc(), first ? Unpooled.EMPTY_BUFFER : cumulation, (ByteBuf) msg);
            // 2. Decode
            callDecode(ctx, cumulation, out);
        } catch (DecoderException e) {
            throw e;
        } catch (Exception e) {
            throw new DecoderException(e);
        } finally {
            if (cumulation != null && !cumulation.isReadable()) {
                numReads = 0;
                // 3. Release the memory
                cumulation.release();
                cumulation = null;
            } else if (++numReads >= discardAfterReads) {
                // We did enough reads already try to discard some bytes so we not risk to see a OOME.
                // See https://github.com/netty/netty/issues/4275
                numReads = 0;
                discardSomeReadBytes();
            }
            int size = out.size();
            firedChannelRead |= out.insertSinceRecycled();
            // 4. Propagate the decoded messages remaining in out
            fireChannelRead(ctx, out, size);
            // 5. Recycle out
            out.recycle();
        }
    } else {
        ctx.fireChannelRead(msg);
    }
}

This took several hours, especially because whenever I accidentally stepped past something in the ChannelPipeline I had to start all over again. It really could only be done step by step.

This method is mainly used to convert a ByteBuf into a Message. A Message is a decoded message, which can be understood as a plain Java object; the main logic is annotated in the code above.

As you can see, there is a cumulation.release(); that is where the memory should be released, but it is not fully released. Before that line is called, the reference count of msg (== cumulation) is 4, and after the release it is 2, so there are still outstanding counts and the buffer cannot be reclaimed.
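
For readers unfamiliar with Netty's reference counting, here is a tiny standalone illustration of how retain/release move the count (my own example, not code from the gateway):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

public class RefCountDemo {
    public static void main(String[] args) {
        ByteBuf buf = ByteBufAllocator.DEFAULT.directBuffer(16); // refCnt = 1
        buf.retain();                                            // refCnt = 2
        buf.release();                                           // refCnt = 1
        boolean freed = buf.release();                           // refCnt = 0, memory is reclaimed
        System.out.println("freed = " + freed + ", refCnt = " + buf.refCnt());
    }
}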

After steps 4 and 5 in the channelRead() code, out has been recycled, but msg has not. This must be where the problem lies.

I got stuck here for a long time, going through the decode code over and over. The object decoded from the unreleased msg is a DefaultHttpContent, which represents an HTTP body; in this case it is the body of the HTTP response coming back from the downstream service.

This also puzzled me: I tried, and the HTTP request body does not seem to go through this logic at all. I spent a long time looking for where the HTTP request body is handled, but made no progress.

By 9 o'clock in the evening, the office was empty and the lights were off (during the epidemic, only a few people could go to each department every day), and I packed up and went home.

Step 13, take a taxi home

When I was in the car, I kept thinking about this question and recalling the whole process. Could it be that I was going in the wrong direction?

Spring Cloud Gateway has been out for a long time, and I haven't heard of any memory leaks. At this point, I began to doubt myself.

No. When I got home, I would write a project of my own and try running Spring Cloud Gateway on its own.

Step 14, write a project that uses Spring Cloud Gateway

When I got home, I quickly turned on the computer, wrote one project using Spring Cloud Gateway and one SpringBoot project, turned on the monitoring, disabled pooling, started the projects, sent requests, and watched direct memory.

used direct memory: 0
used direct memory: 0

Nani?! At this point it was clear that the problem was not in Spring Cloud Gateway itself, so what was it?

It must be that the company project was using it the wrong way somehow. But there was nothing else in that project; I had deleted all the classes, leaving only the startup class.

Ah, wait, the pom file.

I opened the jump server, logged in to the company machine, checked pom.xml, and found it was mostly references to SpringBoot and SpringCloud themselves.

Well, not quite: there was also a common package that the business team had written themselves. Clicking into it, I found it pulled in three jar packages, and one of them was particularly eye-catching: tomcat!

Oh my God, at that moment I really wanted to curse. What on earth was going on?

In fact, I should have thought of the pom when I deleted AuthFilter. At the time, I was too busy fantasizing about finding a bug in Spring Cloud Gateway and dove straight in.

We know that Spring Cloud Gateway uses Netty as its server to receive requests and then forward them to downstream systems. So what happens if tomcat is also on the classpath? That is an interesting question.

Step 15, kill tomcat.

In the pom file, exclude the tomcat jar package, restart the project, send a request, and observe direct memory:

used direct memory: 0
used direct memory: 0

Oh, there's no problem. It's just tomcat.
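
For reference, the exclusion usually looks something like this in the pom (assuming the embedded Tomcat came in via spring-boot-starter-web; in the actual project it came in through the team's common package, so the exclusion would go on that dependency instead):

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>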

So how did tomcat cause trouble? With tomcat added, requests are still answered normally: they are forwarded correctly and the response is returned to the client. What is even more surprising is that Netty really is used internally for reading and writing the request, which is a bit magical.

Step 16, discover the new world

To verify this, I left the jump server, went back to my own computer, added tomcat to the pom, and started the project. Sure enough, it really started up. Interesting.

Could tomcat and Netty both be listening on the same port, both running at once?

Take a look at the project startup log:

Connected to the target VM, address: '127.0.0.1', transport: 'socket'
  ... (Spring Boot banner) ...
  :: Spring Boot ::        (v2.2.6.RELEASE)

2020-05-19 08:50:.. INFO 7896 --- [main] com.alan.test.Application               : No active profile set, falling back to default profiles: default
2020-05-19 08:50:.. INFO 7896 --- [main] o.s.cloud.context.scope.GenericScope    : BeanFactory id=082e67ca-d4c7-3a8c-b051-e806722fd225
2020-05-19 08:50:.. INFO 7896 --- [main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2020-05-19 08:50:.. INFO 7896 --- [main] o.apache.catalina.core.StandardService  : Starting service [Tomcat]
2020-05-19 08:50:.. INFO 7896 --- [main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.33]
2020-05-19 08:50:.. INFO 7896 --- [main] o.s.c.g.r.RouteDefinitionRouteLocator   : Loaded RoutePredicateFactory [After]
2020-05-19 08:50:.. INFO 7896 --- [main] o.s.c.g.r.RouteDefinitionRouteLocator   : Loaded RoutePredicateFactory [Before]
  ... (Between, Cookie, Header, Host, Method, Path, Query, ReadBodyPredicateFactory, RemoteAddr, CloudFoundryRouteService) ...
2020-05-19 08:50:.. INFO 7896 --- [main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2020-05-19 08:50:.. INFO 7896 --- [main] com.alan.test.Application               : Started Application in 4.271 seconds (JVM running for 5.0)

It turns out that only tomcat was started, so how did it hand over the request to Netty for processing?

Step 17, tomcat-> Netty

Anyone who has studied NIO knows that it provides two kinds of socket channels: ServerSocketChannel and SocketChannel. A ServerSocketChannel is created when a service starts, to listen for incoming client connections, while a SocketChannel represents a single connection between a client and the server.

Those who have read the NIO source code also know that server-side SocketChannels are created by the ServerSocketChannel.
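
A minimal sketch of that relationship in plain NIO (illustration only, not code from the article's project):

import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NioAcceptDemo {
    public static void main(String[] args) throws Exception {
        // The server-side channel listens for incoming connections
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));

        // Each accepted connection is represented by a SocketChannel created by the server channel
        SocketChannel client = server.accept();
        System.out.println("accepted connection from " + client.getRemoteAddress());
    }
}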

Those who have read the Netty source code also know that Netty divides its Channels into NioXxxChannel, EpollXxxChannel and so on according to the underlying transport, and for each transport the Channels are further divided into NioServerSocketChannel, NioSocketChannel and so on.

On Windows, the NioXxxChannel family is used by default, and from the above we would expect a NioSocketChannel to be created by a NioServerSocketChannel. When Netty is used normally, that is indeed the case.

(The original article shows a figure of the thread stack when a NioSocketChannel is created by Netty alone; the figure is not reproduced here.)

However, our current scenario is tomcat + Netty, so what happens there?

At this point, set a breakpoint in the constructor of NioSocketChannel, send a request, and when the breakpoint is hit, observe the thread stack (reading from bottom to top; the screenshot is not reproduced here):

As you can see, the call passes through tomcat -> spring -> reactor -> reactor-netty -> netty, and a NioSocketChannel is finally created.

The situation here is a little complicated. We will analyze it in detail when we have time later.

Step 18, memory leak

As we can see from the above, tomcat ultimately hands the request over to Netty for processing, but why does that leak memory? That question remains.

After comparing the two setups, the problem still lies in the code from step 12. When a request goes through a normal Netty setup, fireChannelRead() results in a task being added to the NioEventLoop:

MonoSendMany.SendManyInner.AsyncFlush:

final class AsyncFlush implements Runnable {
    @Override
    public void run() {
        if (pending != 0) {
            ctx.flush();
        }
    }
}

This task is what actually writes out the data sitting in the write buffer and also cleans that buffer up; in other words, the client only receives the response once this method runs. With tomcat + Netty, this task is never executed, yet the data still reaches the client (my guess is that it is sent back over the tomcat connection rather than through the NioSocketChannel itself). This is left as an open question; I will look into it later, as my head is a bit of a mess right now.

Thank you for reading. That is the story of how the memory leak involving tomcat and Netty was solved. I hope it has given you a deeper understanding of the problem; the specifics still need to be verified in your own practice.
