Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Thoughts on troubleshooting of excessive CPU occupancy rate of MySQL Services during stress testing

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/03 Report--

0. Experience summary: when you focus on the TPS of the business interface, you should also pay attention to the QPS of the database server. If a business process contains multiple queries, the magnifying effect of the rise of the business interface TPS on the database server QPS will be obvious. If the query result set is small, try to use one query statement to return multiple result sets through a subquery to avoid splitting multiple result sets into multiple database queries, otherwise it will cause too many database connection / query operations, consume IO resources, reduce TPS, and increase CPU occupancy. In business code, try to avoid writing database queries in loop statements. Build the index according to the frequency of the SQL statement, and write the query condition fields according to the field order of the joint index (matching order from left to right). Pay attention to the slow search log 1. Background description

Then we continued to write the last article on Grpc performance tuning. This time, we are testing an API for querying user group list information. This API needs to query all the group information of a user, including the name and number of members of each group.

After the previous optimization of the JVM parameters of the business machine, there are no problems such as frequent GC of the business machine, high CPU occupancy rate, not going to the TPS, and so on. But we have encountered two new problems: when the concurrency of the business interface is about 50 and TPS600, the pressure test interface has a timeout error, and the database server CPU occupancy rate is more than 93%!

Second, the testing process

Data preparation:

Group table of tweak _ group:

T_info_group_member group membership table:

Data convention: in the test data, there are 3000 members in each group; each member has 20 groups.

1. The first pressure test

We first carried out a test, using two pressure testing machines to jointly initiate 300 concurrency, continuous stress testing for 2 minutes. (if only one pressure measuring machine is used to initiate 300 concurrency, due to the limited machine port, no new request can be made when the TPS exceeds 5W, and the pressure measuring machine will report an error.)

(1) configuration of database connection pool

C3p0.initialPoolSize=15

C3p0.minPoolSize=15

C3p0.acquireIncrement=10

C3p0.maxPoolSize=32

(2) Database slow check log and database server monitoring index

The top command shows that the CPU utilization rate is more than 91%, and the slow check log has been brushed all the time!

By analyzing the SQL statement in the slow search log and the corresponding database table, it is found that the query statement has "where aversion.. and breadth.. and caching..", but the field of a federated index is "a, c". According to the matching rules of the federated index, this sql cannot use the index, resulting in a slow search. After replacing the index with "a, b, c", the efficiency of a single sql query is doubled.

2. The second pressure test

Use two pressure testing machines to initiate 50 concurrency and continue the pressure test for 2 minutes.

(1) other configurations will not be changed.

(2) Database slow check log and database server monitoring index

After the above adjustment, the slow log check is gone, but the CPU utilization rate is still more than 90%. This is an intolerable data, and if concurrency continues to improve, the database server will soon burst.

Since other interfaces involving sql query have not experienced such a high CPU occupancy rate during stress testing, we have eliminated the problem of database server installation and configuration.

We began to think about the possible cause of the problem:

Parameters such as database connection pooling are defective, resulting in insufficient connection pooling and new connections are being established all the time. The business code logic may be flawed, causing the query statement to consume too much IO or memory resources. The database server CPU needs to occupy so much when the number of concurrency is high. (this is the most helpless and least willing to accept the ending)

Let's first look at the results of show processlist; and find that many connections are actually idle, which means that the current connection pool is sufficient.

When we try to zoom in on the connection pool, the configuration is changed to:

C3p0.initialPoolSize=20

C3p0.minPoolSize=20

C3p0.acquireIncrement=20

C3p0.maxPoolSize=128

In this configuration, there is no effect on the occupancy of CPU. Then troubleshoot the connection pool configuration.

Is it possible that you need to consume so much CPU under the current concurrency conditions?

We compared another interface that involves SQL queries, which has a TPS of 1100 under 60 concurrency, while the current stress test interface, TPS, is only 600. Then under the condition of the same concurrency, the same query statement complexity and the same amount of query result data, the current stress test interface not only has a low TPS, but also takes up too much CPU, which may not be a database problem. This excludes the assumption that the high concurrency will inevitably lead to more than 90% CPU occupancy.

We continue to look at the monitoring metrics of the database server and use Ali's orzdba script to monitor the MySQL server.

The MySQL server metric data of the current stress test API:

Against the metric data of the API:

From the above two comparison diagrams, we can see that the database QPS of the current pressure test interface is as high as 3000. Comparing the data, we can see some problems:

Current interface:

Concurrency: 6050 TPSVOR 600, database CPU:92%, database QPS:3000

Compare with the API:

Concurrent 60th TPSVO1000, database CPU:20%, database QPS:1400

At present, the processing of the pressure test interface is more time-consuming, which may be due to the fact that many database operations are involved in one interface business.

Then the next step is to troubleshoot the database operations in the business code. Do code review!

Core business pseudo code:

/ / query the user's group list

List groupList = groupDao.selectGroups (userId)

For (Dto dto:groupList) {

/ / query the number of users in each group, it will cycle 20 times!

Int userNumber = groupDao.getGroupNumber (dto.getAsIong (groupId))

}

Why is this code so awkward? The first query has already been used to query the group information, but also to use the for loop to traverse 20 times to count the number of users in each group.

In this way, an interface request will operate many database queries and lead to more network and IO operations. Then the problem of excessive CPU occupancy is likely to be caused by this situation.

Our optimization measure is to optimize the query statement corresponding to groupDao.selectGroups (userId), using the subquery to return the user's group list and the number of users in each group, removing the for loop.

3. The third pressure test

Use two pressure testing machines to initiate 50 concurrency and continue the pressure test for 2 minutes.

(1) other configurations will not be changed.

(2) Database slow check log and database server monitoring index

The database slow check log does not prompt the slow check SQL statement, the database server CPU occupancy rate is stable below 80%, the database QPS has increased to nearly 7000! This optimization effect can be said to be very obvious.

4. The questions left to me

Some people may ask: the QPS of the database after code optimization is higher than before, so the use of CPU should be more, but why does the CPU occupancy rate of the database come down instead? Why is that? In fact, I haven't figured it out either. friends who know the reason are welcome to leave a message for discussion.]

Recently, I have been thinking about this problem and consulting relevant information. Suddenly, we can see from the performance data before and after interface optimization: before optimization, the TPS of the business interface is only 600, and the database QPS is only 3000, while the optimized business interface TPS has reached 1100, which is almost doubled. At the same time, the database QPS has reached 6000 and doubled, but the database CPU utilization has decreased instead. From these data, we can draw such a conclusion: the growth of interface TPS naturally leads to the peer growth of database QPS, indicating that before and after optimization, the pressure bottleneck of the database is far from being reached, so the soaring utilization of database server CPU is not caused by SQL processing!

The resource consumption of the database server, in addition to SQL execution, disk IO, in fact, there are network connections.

In the code before optimization, a TPS will make multiple database connections and submit SQL query requests, and the network connection consumption caused by this operation is also very considerable. In the optimized code, a TPS will make a database connection and submit a SQL query request, which reduces the resource consumption a lot.

Before interface optimization:

Concurrency: 6050 TPSVOR 600, database CPU:92%, database QPS:3000

After the interface is optimized:

Concurrency: 60 TPS CPU 1100, database QPS:6000: less than 80%

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report