
From Locating a Latency Glitch Problem to Netty Performance Statistics Design

2025-01-16 Update From: SLTechnology News&Howtos


In this article, the editor shares how to go from locating a latency glitch problem to designing Netty performance statistics. Most readers are probably not very familiar with this topic, so it is shared here for your reference; I hope you will gain a lot from reading it. Let's get started!

1. Background:

Usually, users treat Netty as a black box: they use it to read and send protocol messages and to encode and decode them, without paying attention to its underlying implementation details.

In high-concurrency scenarios, it is often necessary to collect the system's key performance KPI data and, combined with logs and alarms, analyze and locate faults. Without understanding Netty's underlying implementation details, it is hard to know which key performance data to collect and how to collect it correctly. Wrong or inaccurate data can mislead the analysis and prevent the problem from being solved.

2. The arduous course of troubleshooting the latency glitch:

Problem phenomenon: during business peak hours in an e-commerce production environment, service invocations occasionally showed a sudden increase in latency, with no fixed pattern as to which services were affected. Although the proportion of affected calls was very low, the impact on customer experience was significant, so the cause had to be located and fixed as soon as possible.

Preliminary analysis of the latency glitch:

The service invocation latency increased, but no exception was thrown, so the run logs contained no ERROR entries and traditional logs alone could not locate the problem. A distributed message tracing system was therefore used to demarcate the fault in the distributed environment.

By sorting and filtering service invocations by latency, we found the detailed call chains of the invocations whose latency had increased. The business server reported very fast processing, while the consumer-side statistics showed the server was very slow. What is going on when the data seen at the two ends of the call chain are inconsistent?

Analysis of the call chain details showed that the latency reported by the server is only the time consumed by the business service interface invocation and does not include:

(1) the time the server spends reading the request message, decoding it, dispatching it internally, and waiting in the business thread pool queue.

(2) the time spent encoding the response message, waiting in the message send queue, and writing it into the Socket send buffer.

The service invocation chain works as follows:

Figure 1 how the service invocation chain works

Expanding the message flow in the call chain in detail, taking the server reading a request and sending a response message as an example, gives the following figure:

Figure 2 details of the server call chain

The server-side processing time should include, in addition to the invocation time of the business service itself, the processing time of the service framework, as follows:

(1) the decoding (deserialization) time of the request message.

(2) the time the request message waits in the business thread pool queue before execution.

(3) response message encoding (serialization) time.

(4) the queue time of the response message ByteBuf in the sending queue.

Because the server-side call chain only collected the invocation time of the business service interface, not the scheduling and processing time of the service framework itself, the fault could not be demarcated: the server simply did not measure the framework's processing time. It was therefore suspected that a backlog in the message send queue or the business thread pool queue was causing the latency increase.
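To make the gap concrete, the following minimal sketch (all names are illustrative assumptions, not the original framework code) expresses the server-side cost that the improved trace points need to account for; the original instrumentation reported only the business call time:

final class ServerSideCost {
    // Sum of the framework components listed above plus the business call itself.
    // The original server-side trace point reported only bizCallMicros, which is
    // why the consumer-side and server-side latency figures disagreed.
    static long totalMicros(long decodeMicros, long bizQueueMicros, long bizCallMicros,
                            long encodeMicros, long sendQueueMicros, long socketWriteMicros) {
        return decodeMicros + bizQueueMicros + bizCallMicros
                + encodeMicros + sendQueueMicros + socketWriteMicros;
    }
}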

Service invocation chain improvements:

Optimize the trace points (burying points) of the service invocation chain. The specific measures are as follows:

(1) include the message encoding and decoding time on both the client and server sides.

(2) include the queuing time of request and response messages in their queues.

(3) include the queuing time of the response message in the communication thread's send queue (array); one way to capture this is sketched after this list.
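As a reference, here is a minimal sketch of how measure (3) could be instrumented with a standard Netty outbound handler. It records the time from entering write() until Netty reports the message written to the SocketChannel, which covers the send-queue wait plus the actual socket write. The class and field names are assumptions for illustration, not the original framework code.

import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;
import java.util.concurrent.TimeUnit;

public class SendPathTimeHandler extends ChannelDuplexHandler {
    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception {
        final long enqueueNanos = System.nanoTime();
        promise.addListener(f -> {
            // fires when the message has been written to the SocketChannel (or failed)
            long costMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - enqueueNanos);
            // report costMicros to the call-chain trace point here
            System.out.println(ctx.channel() + " send-path cost: " + costMicros + " us");
        });
        ctx.write(msg, promise);
    }
}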

At the same time, to make problem location easier, Netty performance statistics logging is added, mainly including:

(1) the total number of links in the current system and the status of each link (a sketch follows this list).

(2) the total number of bytes received on each link, the number of bytes received in cycle T, and the message receive throughput.

(3) the total number of bytes sent on each link, the number of bytes sent in cycle T, and the message send throughput.
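For statistic (1), one simple approach (sketched below under the assumption that a Netty ChannelGroup is used to track links; the class name LinkRegistry is illustrative) is to register every Channel in a DefaultChannelGroup, which automatically drops closed channels, and to dump the group periodically:

import io.netty.channel.Channel;
import io.netty.channel.group.ChannelGroup;
import io.netty.channel.group.DefaultChannelGroup;
import io.netty.util.concurrent.GlobalEventExecutor;

public final class LinkRegistry {
    private static final ChannelGroup LINKS = new DefaultChannelGroup(GlobalEventExecutor.INSTANCE);

    public static void register(Channel ch) {
        // closed channels are removed from the group automatically
        LINKS.add(ch);
    }

    public static void dumpStatus() {
        System.out.println("total links: " + LINKS.size());
        for (Channel ch : LINKS) {
            System.out.println(ch + " active=" + ch.isActive() + " writable=" + ch.isWritable());
        }
    }
}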

After the optimized service call chain had been running online for a while, analysis and comparison of the Netty performance statistics logs and the call chain logs showed that the two did not agree, and the data in the Netty performance statistics logs also disagreed with what was seen on the front-end portal. It was therefore suspected that the newly added performance statistics feature contained a BUG, and the problem had to be located further.

It's all caused by synchronous thinking:

In a traditional synchronous service invocation, after initiating the call the business thread blocks and waits for the response; once the response arrives, the business thread continues to execute and accumulates the sent-message counters to obtain performance KPI data.

After switching to Netty, all network I/O operations are performed asynchronously: calling Channel's write method does not mean the message has actually been written to the TCP buffer. If the number of bytes sent is counted right after the write call, the statistics are inaccurate.

A code review of the message sending function showed that the code counted the bytes of the request message immediately after calling writeAndFlush. The code looked like this:

public void channelRead(ChannelHandlerContext ctx, Object msg) {
    int sendBytes = ((ByteBuf) msg).readableBytes();
    ctx.writeAndFlush(msg);
    totalSendBytes.getAndAdd(sendBytes);
}

Calling writeAndFlush does not mean the message has been sent to the network; it only starts an asynchronous send operation. After writeAndFlush is called, Netty performs a series of operations and finally writes the message to the network. The process is as follows:

Fig. 3 writeAndFlush processing flow chart

Through an in-depth analysis of the writeAndFlush method, we find that the performance statistics code ignores the following time-consuming:

(1) execution time of the service ChannelHandler.

(2) the queue time of the asynchronously encapsulated WriteTask/WriteAndFlushTask in the NioEventLoop task queue.

(3) the queue time of ByteBuf in ChannelOutboundBuffer queue.

(4) the time the JDK NIO class library takes to write the ByteBuffer to the network.

Because the performance statistics omit the execution time of these four key steps, the measured send rate is higher than the actual value, which misleads problem location.

Correct message send rate statistics strategy:

The correct way to measure the message send rate is as follows:

(1) get the ChannelFuture after calling the writeAndFlush method.

(2) create a message-send ChannelFutureListener and register it on the ChannelFuture to listen for the send result; when the message has been successfully written to the SocketChannel, Netty calls back the listener's operationComplete method.

(3) performance statistics are carried out in the operationComplete method of message sending ChannelFutureListener.

Examples of correct performance statistics codes are as follows:

public void channelRead(ChannelHandlerContext ctx, Object msg) {
    int sendBytes = ((ByteBuf) msg).readableBytes();
    ChannelFuture writeFuture = ctx.write(msg);
    writeFuture.addListener((f) ->
    {
        totalSendBytes.getAndAdd(sendBytes);
    });
    ctx.flush();
}

Analysis of the relevant Netty message-sending source code shows that the ByteBuf is cleaned up only when the number of bytes actually written is greater than 0. The (abridged) code is as follows:

protected void doWrite(ChannelOutboundBuffer in) throws Exception {
    // code omitted...
    if (localWrittenBytes <= 0) {
        incompleteWrite(true);
        return;
    }
    // code omitted...
    in.removeBytes(localWrittenBytes);
    // code omitted...
}

Removing the written bytes from the ChannelOutboundBuffer completes the corresponding write promises, which in turn notifies the registered listeners; for example, the progressive-listener notification loop in DefaultPromise looks like this:

for (GenericProgressiveFutureListener<?> l : listeners) {
    if (l == null) {
        break;
    }
    notifyProgressiveListener0(future, l, progress, total);
}

After the problem was located, the Netty performance statistics code was fixed according to the correct practice. After it went live, combined with the call chain logs, the occasional service latency glitches during business peak hours were quickly pinned down, and the problem was solved by tuning the business thread pool configuration.

Common misunderstandings in message delivery performance statistics:

Common misunderstandings in performance statistics in real business are as follows:

(1) counting the send rate immediately after calling the write/writeAndFlush method.

(2) doing performance statistics during message encoding: after encoding, reading the number of readable bytes of the output buffer and accumulating it. Encoding completion does not mean the message has been written to the SocketChannel, so such statistics are also inaccurate (see the sketch below).
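A minimal sketch of misunderstanding (2), assuming a simple String encoder (the class name is illustrative): the counter is updated when encoding finishes, which is before the bytes reach the SocketChannel, so the measured send rate runs ahead of what is actually written to the network.

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.MessageToByteEncoder;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

public class MisleadingCountingEncoder extends MessageToByteEncoder<String> {
    private final AtomicLong totalSendBytes = new AtomicLong();

    @Override
    protected void encode(ChannelHandlerContext ctx, String msg, ByteBuf out) {
        out.writeCharSequence(msg, StandardCharsets.UTF_8);
        // Counting here is the misunderstanding: the bytes are only in the outbound
        // buffer at this point, not yet written to the SocketChannel.
        totalSendBytes.getAndAdd(out.readableBytes());
    }
}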

Collection of key performance indicators of Netty:

Besides the message send rate, there are other important indicators that need to be collected and monitored, whether they are shown in the call chain details or collected and displayed by operations. These performance indicators are very helpful for fault demarcation and location.

Netty I/O thread pool performance metrics:

The Netty I/O thread pool is responsible not only for reading and writing network I/O messages but also for processing ordinary tasks and scheduled tasks, so the task backlog in its message queue is an important indicator of the Netty I/O thread pool's load. Because the Netty NIO thread pool uses a thread pool/group mechanism composed of multiple single-threaded thread pools, there is no need to count the number of worker threads, maximum threads, and so on as with the native JDK thread pool. The relevant code is as follows:

public void channelActive(ChannelHandlerContext ctx) throws Exception {
    kpiExecutorService.scheduleAtFixedRate(() ->
    {
        Iterator<EventExecutor> executorGroups = ctx.executor().parent().iterator();
        while (executorGroups.hasNext())
        {
            SingleThreadEventExecutor executor = (SingleThreadEventExecutor) executorGroups.next();
            int size = executor.pendingTasks();
            if (executor == ctx.executor())
                System.out.println(ctx.channel() + "-->" + executor + " pending size in queue is:-->" + size);
            else
                System.out.println(executor + " pending size in queue is:-->" + size);
        }
    }, 0, 1000, TimeUnit.MILLISECONDS);
}

The running result is as follows:

Figure 5 Netty I/O thread pool performance statistics KPI data

Backlog of messages in the Netty send queue:

The backlog of the Netty message send queue reflects the network speed, the read speed of the communication peer, and the local send speed, so it is very helpful for fine-grained analysis of service invocation latency and for problem location. An example of the collection code is as follows:

public void channelActive(ChannelHandlerContext ctx) throws Exception {
    writeQueKpiExecutorService.scheduleAtFixedRate(() ->
    {
        long pendingSize = ((NioSocketChannel) ctx.channel()).unsafe().outboundBuffer().totalPendingWriteBytes();
        System.out.println(ctx.channel() + "-->" + " ChannelOutboundBuffer's totalPendingWriteBytes is: "
                + pendingSize + " bytes");
    }, 0, 1000, TimeUnit.MILLISECONDS);
}

The implementation results are as follows:

Figure 6 message delivery queue performance KPI data corresponding to Netty Channel

Because totalPendingSize is volatile, the statistics thread can correctly read its latest value even though it is not a Netty I/O thread.

Netty message read rate performance statistics:

For the message read rate statistics of a Channel, you can add a performance statistics ChannelHandler before the decoder ChannelHandler to count the read rate. A related code example (the ServiceTraceProfileServerHandler class) is as follows:

public void channelActive(ChannelHandlerContext ctx) throws Exception {
    kpiExecutorService.scheduleAtFixedRate(() ->
    {
        int readRates = totalReadBytes.getAndSet(0);
        System.out.println(ctx.channel() + "--> read rates " + readRates);
    }, 0, 1000, TimeUnit.MILLISECONDS);
    ctx.fireChannelActive();
}

public void channelRead(ChannelHandlerContext ctx, Object msg) {
    int readableBytes = ((ByteBuf) msg).readableBytes();
    totalReadBytes.getAndAdd(readableBytes);
    ctx.fireChannelRead(msg);
}

The running result is as follows:

Figure 7 Netty Channel message read rate performance statistics

That is all of the article "From Locating a Latency Glitch Problem to Netty Performance Statistics Design". Thank you for reading! I hope the content shared here has been helpful; if you want to learn more, welcome to follow the industry information channel!
