Locating online performance issues without changing a single line of code

2025-01-19 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02

Background

I have had a run of bad luck lately and have been plagued by production problems almost every day. I fixed a HashSet concurrency problem a few days ago, and on Monday another performance problem showed up.

The symptom was roughly this:

An OpenAPI endpoint we provide responded quickly at times and slowly at others: tens of milliseconds when fast, several seconds when slow.

Trying to solve the problem

Since this was not a business-logic bug, it could not be pinpointed directly. I tried to reproduce it in the test environment, but unfortunately everything was fast there.

I had no choice but to bite the bullet.

Along the way, hoping to get lucky, I asked the operations staff to check the OpenAPI response times in Nginx, intending to shift the blame to the network. That backfired: the Nginx logs showed the same slow response times.

To get a clear picture of the problem, I briefly went through the call flow.

The whole flow is a common layered architecture:

The client sends a request to Nginx.

Nginx load-balances it to a back-end web service.

The web service calls the back-end Service through RPC.

The logging approach

The first thing that comes to mind is to add logging that records the processing time at each method or interface that might be slow, to determine where the problem lies.

But given the call chain above, the request path is not short. Adding logs means a lot of changes, and if a spot is missed the problem may still not be located.

In addition, changing the code means releasing a new version to production.
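
For comparison, the manual approach would mean wrapping every suspect call in timing code along the lines of the sketch below. The helper name timed and the label are hypothetical and only illustrate the idea; they are not taken from the project.

```java
import java.util.function.Supplier;

// Hypothetical helper: illustrates hand-written timing logs, not code from the project.
class TimingLog {

    // Runs the task, prints how long it took, and returns the task's result.
    static <T> T timed(String label, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Every suspect call site would need a wrapper like this one.
        String result = timed("selectDB", () -> "row");
        System.out.println(result);
    }
}
```

Each wrapped call site is a code change that has to be reviewed and released, which is exactly the cost described above.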

Tool analysis

So the best way is to analyze the problem without changing a single line of code.

For that you need an agent tool. We chose TProfiler, which Alibaba open-sourced some time ago.

You only need to add -javaagent:/xx/tprofiler.jar to the startup parameters to measure how long the methods you care about take, and it can produce a report for you, which is very convenient. It is not intrusive to the code and has little impact on performance.
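
To show why no application code has to change, here is a minimal sketch of how a -javaagent jar hooks into the JVM through the standard java.lang.instrument API. It is not TProfiler's source: the class name SketchAgent and the helper shouldProfile are hypothetical, and the sketch only prints the classes it would instrument instead of rewriting any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Minimal agent skeleton; an illustration of the -javaagent mechanism, not TProfiler's source.
class SketchAgent {

    // Hypothetical package filter; class names arrive in internal form (slashes, not dots).
    static boolean shouldProfile(String internalClassName, String includePackage) {
        return internalClassName != null
                && internalClassName.replace('/', '.').startsWith(includePackage);
    }

    // The JVM calls premain before the application's main when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                if (shouldProfile(className, "top.crossoverjie.cicada.example.action")) {
                    // A real profiler would rewrite the bytecode here to time each method.
                    System.out.println("would instrument " + className);
                }
                return null; // null means "leave the class bytes unchanged"
            }
        });
    }
}
```

For the JVM to find premain, the agent jar's manifest has to name the class in a Premain-Class attribute.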

Tool use

Let's briefly show how to use this tool.

The first step is naturally to clone the source code and package it; you can clone my modified version.

The project has not been maintained for many years and still has some bugs, so I fixed one bug that affected usage and made some optimizations on top of the original.

Just run the following commands.

git clone https://github.com/crossoverJie/TProfiler

mvn assembly:assembly

After this, the jar we need is generated at TProfiler/pkg/TProfiler/lib/tprofiler-1.0.1.jar in the project.

Next, you just need to add the jar to the startup parameters, along with the path to a configuration file.

The configuration file below is copied from the official documentation, explanations included.

# log file names
logFileName = tprofiler.log
methodFileName = tmethod.log
samplerFileName = tsampler.log

# basic configuration items
# sampling start time
startProfTime = 1:00:00
# sampling end time
endProfTime = 23:00:00
# duration of each sampling run
eachProfUseTime = 10
# interval between sampling runs
eachProfIntervalTime = 1
samplerIntervalTime = 20
# port; just make sure it does not conflict with another one
port = 50000
debugMode = false
needNanoTime = false
# whether to ignore getter/setter methods
ignoreGetSetMethod = true

# file paths (log paths)
logFilePath = /data/work/logs/tprofile/${logFileName}
methodFilePath = /data/work/logs/tprofile/${methodFileName}
samplerFilePath = /data/work/logs/tprofile/${samplerFileName}

# include & exclude items
excludeClassLoader = org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader
# packages to monitor
includePackageStartsWith = top.crossoverjie.cicada.example.action
# packages that do not need monitoring
excludePackageStartsWith = com.taobao.sketch;org.apache.velocity;com.alibaba;com.taobao.forest.domain.dataobject
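
The path entries in this file refer to other entries through ${...} placeholders. The sketch below shows one way such placeholders could be expanded after loading the file with java.util.Properties. The class ProfileConfigSketch and its resolve method are hypothetical and are not TProfiler's own parser; the key names follow the placeholder spelling used in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands ${key} references inside property values.
class ProfileConfigSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} with the value of key from the same properties;
    // unknown keys are left untouched.
    static String resolve(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "logFileName = tprofiler.log\n"
              + "logFilePath = /data/work/logs/tprofile/${logFileName}\n"));
        System.out.println(resolve(props.getProperty("logFilePath"), props));
    }
}
```

With the two entries above, the resolved log path is /data/work/logs/tprofile/tprofiler.log.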

The final startup parameters are as follows:

-javaagent:/TProfiler/lib/tprofiler-1.0.1.jar

-Dprofile.properties=/TProfiler/profile.properties

To simulate troubleshooting a slow interface, I implemented an HTTP interface with cicada that calls two time-consuming methods.

When I start the application, TProfiler records the method information it collects in the directory I configured.

After I visit the interface http://127.0.0.1:5688/cicada-example/demoAction?name=test&id=10 a few times, it writes the per-call timing of each method to tprofiler.log.

The columns, from left to right, are:

Thread ID, method stack depth, method number, elapsed time in milliseconds.
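
If you want to process these detail lines in code, the four columns map naturally onto a small value class. The following is only a sketch: the class ProfileLine is hypothetical, and it assumes the columns are separated by whitespace, which the article does not state.

```java
// Hypothetical value class for one detail line; field order follows the column list above.
class ProfileLine {
    final long threadId;
    final int stackDepth;
    final int methodId;
    final long elapsedMs;

    ProfileLine(long threadId, int stackDepth, int methodId, long elapsedMs) {
        this.threadId = threadId;
        this.stackDepth = stackDepth;
        this.methodId = methodId;
        this.elapsedMs = elapsedMs;
    }

    // Assumes the four columns are separated by whitespace (tabs or spaces).
    static ProfileLine parse(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 4) {
            throw new IllegalArgumentException("expected 4 columns: " + line);
        }
        return new ProfileLine(
                Long.parseLong(cols[0]),
                Integer.parseInt(cols[1]),
                Integer.parseInt(cols[2]),
                Long.parseLong(cols[3]));
    }
}
```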

But tmethod.log is still empty.

At this point, we only need to run the following command to flush the latest method sampling information into tmethod.log.

java -cp /TProfiler/tprofiler.jar com.taobao.profile.client.TProfilerClient 127.0.0.1 50000 flushmethod

Flushmethod success

The port is the port in the configuration file.

Open tmethod.log again:

It now contains the method information.

The number at the start of each line is the method number. You can use it to look up the time spent on each call in tprofiler.log (the details file).

The number at the end of the line is the line number of the last line of the method in the source code.

In practice, most performance analysis comes down to the average time spent in a method.

So you also need to run the following command, which generates the average time of each method from tprofiler.log and tmethod.log.

java -cp /TProfiler/tprofiler.jar com.taobao.profile.analysis.ProfilerLogAnalysis tprofiler.log tmethod.log topmethod.log topobject.log

Print result success

topmethod.log then shows the average time spent in each method.

4 is the number of requests.

205 is the average time.

818 is the total time.

It is consistent with the actual situation.
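
The three figures are related in the usual way: the total is the sum of the individual call times and the average is the total divided by the call count. The sketch below reproduces them with Java's LongSummaryStatistics. The four sample values are made up so that they add up to 818, and rounding 204.5 up to 205 is an assumption about how the report rounds, not something the tool documents here.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical helper: derives count, total and average from per-call times.
class MethodStats {

    static LongSummaryStatistics summarize(long... elapsedMs) {
        return LongStream.of(elapsedMs).summaryStatistics();
    }

    public static void main(String[] args) {
        // Made-up samples chosen to add up to the 818 total reported above.
        LongSummaryStatistics s = summarize(203, 205, 204, 206);
        System.out.println(s.getCount() + " calls, total " + s.getSum()
                + ", average " + Math.round(s.getAverage()));
    }
}
```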

Timing details for a single method

There may be other needs; for example, what if I want to see every timing record for one method?

This is not officially supported, but it can still be done with a little more effort.

For example, say I want to check the timing details of selectDB():

First you need the number of this method, which can be found in tmethod.log.

2 top/crossoverjie/cicada/example/action/DemoAction:selectDB:84

The number is 2.

We already know that tprofiler.log records the details, so you can check it with the following command.

grep 2 tprofiler.log

The rows whose third column (the method number) is 2 are the details of each execution.

But this approach is clearly not friendly enough: the noise has to be filtered out by hand and there are many steps, so I am going to add a feature for it.

You will only need to pass in a method name to query all the collected timing details for that method.
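
A lookup by method name could work roughly like the sketch below: find the method number in the tmethod.log lines, then keep only the detail lines whose third column equals that number, which avoids the false matches a plain grep on "2" produces. This is a hypothetical illustration, not the feature as implemented in the project; the class MethodDetailQuery and the whitespace-separated column format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "query details by method name" helper.
class MethodDetailQuery {

    // Finds the method number for a method name in tmethod.log lines, or -1 if absent.
    // Expects lines such as "2 top/.../DemoAction:selectDB:84".
    static int findMethodId(List<String> methodLogLines, String methodName) {
        for (String line : methodLogLines) {
            String[] idAndSignature = line.trim().split("\\s+", 2);
            if (idAndSignature.length < 2) {
                continue; // skip blank or malformed lines
            }
            String[] parts = idAndSignature[1].split(":");
            if (parts.length >= 2 && parts[1].equals(methodName)) {
                return Integer.parseInt(idAndSignature[0]);
            }
        }
        return -1;
    }

    // Returns the elapsed times (fourth column) of the detail lines
    // whose third column equals methodId.
    static List<Long> elapsedFor(List<String> profileLogLines, int methodId) {
        String wanted = Integer.toString(methodId);
        List<Long> result = new ArrayList<>();
        for (String line : profileLogLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length == 4 && cols[2].equals(wanted)) {
                result.add(Long.parseLong(cols[3]));
            }
        }
        return result;
    }
}
```

Matching on the third column only means a line with thread ID 2, stack depth 2, or an elapsed time such as 22 is no longer picked up by accident.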

Summary

Back to the original problem: analyzing it online with this tool gave the following results.

Some methods do run fast at times and slow at others, and all of them are database-related. Because the database is currently under heavy pressure, we are going to separate hot and cold data and shard the databases and tables.

Until that first step is in place, change some operations to write to the database asynchronously to reduce the response time.

Consider adopting an APM tool such as pinpoint.
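
As an illustration of the asynchronous write mentioned in the second point, the sketch below hands a slow write to a separate thread pool so the request thread can return immediately. Everything in it is hypothetical: writeToDb merely sleeps to stand in for a slow database operation, and the pool size is arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: moving a slow database write off the request thread.
class AsyncWriteSketch {

    // Hypothetical pool dedicated to database writes; daemon threads so the JVM can exit.
    private static final ExecutorService WRITE_POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "db-writer");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for a slow INSERT/UPDATE; it only sleeps.
    static void writeToDb(String record) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hands the write to the pool and returns without waiting for it.
    static CompletableFuture<Void> handle(String record) {
        return CompletableFuture.runAsync(() -> writeToDb(record), WRITE_POOL);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        CompletableFuture<Void> pending = handle("order-1");
        System.out.println("responded after "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
        pending.join(); // only so the demo waits for the write before exiting
    }
}
```

The trade-off is that a failed write no longer surfaces in the response, so failures need to be logged or retried separately.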

There are many tools like TProfiler; just pick one that suits you.

Until we adopt a distributed tracing tool like pinpoint, we will rely heavily on this tool, so I may also customize it further, for example by adding a visual interface, to make troubleshooting more efficient.

Your likes and sharing are the greatest support for me.
