
Troubleshooting a serious performance problem caused by logging

Updated 2025-04-06, from SLTechnology News&Howtos (shulou.com)


I. Symptom description

The new system worked normally after launch, but one day the customer suddenly reported that login was very slow. The first step was to reproduce the problem. In the test environment, a single-user login test showed a response time within 100 ms, which was fine. The concurrent test, however, was alarming: with 100 concurrent users, the response time soared to about 20 s. This was far worse than expected and reproduced the customer's problem. The next step was to troubleshoot and fix it.

II. Troubleshooting

1. The problem appears under multi-user concurrency. With 100 concurrent users, the JMeter console shows a throughput of only about 4 requests per second.

At this point the machine's CPU utilization was only about 2%, and nothing abnormal was seen on disk or network. So what pushed the response time to about 20 seconds? The first guess was lock contention in the database layer under concurrency, so the investigation started there.
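As a rough sanity check (not part of the original investigation), Little's law relates the three figures above: throughput is approximately the number of concurrent users divided by the response time. The sketch below assumes a closed test with no think time between requests, which the source does not state; the class and method names are hypothetical.

```java
final class LoadMath {
    /** Little's law for a closed test with no think time: throughput = users / response time. */
    static double throughputPerSecond(int concurrentUsers, double responseSeconds) {
        return concurrentUsers / responseSeconds;
    }

    public static void main(String[] args) {
        // Figures from the concurrent test: 100 users, about 20 s per request.
        System.out.println(throughputPerSecond(100, 20.0)); // prints 5.0
    }
}
```

An estimate of about 5 requests per second is in line with the JMeter reading, and together with the near-idle CPU it suggests that requests are queuing on some shared resource rather than doing work.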

2. Run the concurrent test again and watch whether table locks occur in the database. No table locks appeared during the test, and the AWR report showed no slow SQL statements for login verification, which rules out the database layer.

Since the database layer was not the problem, the middleware layer was next. The system is written in Java and uses Tomcat as the application server, so the natural next step was to analyze the Java process.

3. Run the concurrent test again and observe each thread of the Java process. Under load, top showed no thread in the running state. This already felt close to the truth.

If no thread is running under concurrency, what are the threads doing? To find out, the state of each thread has to be checked.

4. Use jstack to dump the state of every thread in the process to a file for later analysis.

Command format: jstack pid > stack.log

The dump shows the key finding: a large number of threads are in the BLOCKED state, and they are blocked waiting for log-related resources.
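With many threads in a dump, counting states by eye is tedious. The following is a minimal sketch of tallying them, assuming the usual jstack layout in which each thread has a line containing "java.lang.Thread.State: " followed by the state name. The class name, thread names and sample dump are made up for illustration and are not taken from the original stack.log.

```java
import java.util.Map;
import java.util.TreeMap;

final class ThreadStateCounter {
    private static final String MARKER = "java.lang.Thread.State: ";

    /** Counts threads per state in the text of a jstack dump. */
    static Map<String, Integer> countStates(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dump.split("\\R")) {
            int at = line.indexOf(MARKER);
            if (at < 0) {
                continue; // not a state line
            }
            // The state is the first word after the marker,
            // e.g. "BLOCKED" in "BLOCKED (on object monitor)".
            String rest = line.substring(at + MARKER.length()).trim();
            counts.merge(rest.split("\\s+")[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical three-thread dump, shortened for illustration.
        String sample = String.join("\n",
            "\"worker-1\" daemon waiting for monitor entry",
            "   java.lang.Thread.State: BLOCKED (on object monitor)",
            "\tat org.apache.log4j.Category.callAppenders(Category.java)",
            "\"worker-2\" daemon waiting for monitor entry",
            "   java.lang.Thread.State: BLOCKED (on object monitor)",
            "\"main\" runnable",
            "   java.lang.Thread.State: RUNNABLE");
        System.out.println(countStates(sample)); // prints {BLOCKED=2, RUNNABLE=1}
    }
}
```

In practice the input would be the contents of stack.log; a BLOCKED count close to the number of request threads points at a single contended lock.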

At this point the cause was basically identified. Since it is related to logging, first change the log level from DEBUG to ERROR and test whether anything changes.

5. With the log level changed from DEBUG to ERROR, run the concurrent test again. The problem no longer occurs: TPS rises to about 322/s, the 90% line response time is 740 ms, and CPU utilization reaches 40%-50%. Everything is back to normal.

6. Knowing that the problem is caused by the log configuration, run a few more tests on it to see what happens.

The system uses log4j for log output, and the log priorities from high to low are ERROR, WARN, INFO, and DEBUG. The performance problem occurred with the DEBUG configuration and disappeared with ERROR. Testing again with INFO also showed no problem, so the problem occurs only under the DEBUG configuration. After walking through the code, the developers found that the log statements these threads print are only written at the DEBUG level. This confirms once more that the problem is caused by the DEBUG log configuration.

The system ships with the log level set to INFO, but someone had manually changed it to DEBUG in the customer environment, so setting it back to INFO resolved the problem. Although the on-site problem is solved for now, the cause of the problem under the DEBUG configuration needs further analysis.
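Why INFO and ERROR both avoid the problem can be illustrated with a simplified model of level filtering: a message is passed on to the appenders only if its level is at or above the configured level, so DEBUG calls are dropped early under an INFO configuration. The enum and method below are hypothetical stand-ins written for this sketch, not log4j classes.

```java
final class LevelGate {
    /** Levels in ascending priority, matching the order given in the text. */
    enum Level { DEBUG, INFO, WARN, ERROR }

    /** True if a message at level `message` would be logged when the logger is set to `configured`. */
    static boolean isEnabled(Level configured, Level message) {
        return message.compareTo(configured) >= 0; // enums compare by declaration order
    }

    public static void main(String[] args) {
        for (Level configured : Level.values()) {
            System.out.println(configured + " configuration logs DEBUG messages: "
                    + isEnabled(configured, Level.DEBUG));
        }
    }
}
```

Only the DEBUG configuration lets DEBUG messages through, which matches the test results: under INFO or ERROR the busy DEBUG statements never reach the output stage where the threads were seen waiting.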

III. Further analysis

The thread dump shows the blocked threads calling org.apache.log4j.Category.callAppenders, which contains a synchronized block. Under concurrency, threads compete for this lock and end up BLOCKED.

The fix for this problem has not yet been verified, so the following solutions to similar problems are offered for reference only:
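The effect of a single synchronized section on many threads can be reproduced without log4j. The sketch below is an illustration under stated assumptions, not the log4j source: a plain object stands in for the lock taken in callAppenders, and a latch simulates a slow write while the lock is held. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class AppenderLockDemo {
    /** Starts `threads` workers that share one synchronized "append" and returns how many are BLOCKED. */
    static int countBlockedThreads(int threads) {
        final Object appenderLock = new Object();             // stand-in for the shared logging lock
        final CountDownLatch firstInside = new CountDownLatch(1);
        final CountDownLatch finishWrite = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                Thread t = new Thread(() -> {
                    synchronized (appenderLock) {             // only one thread may "write" at a time
                        firstInside.countDown();
                        try {
                            finishWrite.await();              // simulated slow write while holding the lock
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }, "worker-" + i);
                workers.add(t);
                t.start();
            }
            firstInside.await();                              // one worker now owns the lock

            // Poll until every other worker is parked on the monitor (or 5 s pass).
            int blocked = 0;
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (System.nanoTime() < deadline) {
                blocked = 0;
                for (Thread t : workers) {
                    if (t.getState() == Thread.State.BLOCKED) {
                        blocked++;
                    }
                }
                if (blocked == threads - 1) {
                    break;
                }
                Thread.sleep(10);
            }

            finishWrite.countDown();                          // let the writes complete
            for (Thread t : workers) {
                t.join();
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED threads: " + countBlockedThreads(100));
    }
}
```

With 100 workers, one thread holds the lock and the others wait on the monitor without using CPU, which is the same picture as the thread dump: almost nothing running, almost everything BLOCKED.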

1. Use Apache Commons Logging for log output. The code is as follows:

private static final Log log = LogFactory.getLog("xxx");

2. Modify the log4j configuration file to add a buffer configuration item.
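The source does not say which buffer item is meant. In log4j 1.x the FileAppender options BufferedIO and BufferSize are likely candidates, but that is an assumption. The general idea is that buffering shortens the time each thread spends on a write while it holds the lock. The sketch below illustrates this with the standard library only: a hypothetical CountingWriter stands in for the log file, and a BufferedWriter batches many log lines into few underlying writes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

final class BufferedAppendDemo {
    /** Stand-in for a slow log file: counts how often the underlying write is hit. */
    static final class CountingWriter extends Writer {
        int writes = 0;
        long chars = 0;

        @Override public void write(char[] buf, int off, int len) {
            writes++;
            chars += len;
        }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Writes `lines` log lines through a buffer of `bufferChars` and returns the underlying write count. */
    static int underlyingWrites(int lines, int bufferChars) {
        CountingWriter file = new CountingWriter();
        try (BufferedWriter out = new BufferedWriter(file, bufferChars)) {
            for (int i = 0; i < lines; i++) {
                out.write("DEBUG login step\n");   // copied into memory, not written through
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }                                           // closing the writer flushes the remainder
        return file.writes;
    }

    public static void main(String[] args) {
        System.out.println("underlying writes for 1000 lines: " + underlyingWrites(1000, 8192));
    }
}
```

The trade-off is that buffered lines can be lost if the process dies before a flush, so this suits high-volume debug output better than audit logs.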
