
[Case] The second Thread Pool performance problem caused by resident queries

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

1. Phenomenon

One of a business's four instances has a much higher connection count than the other three (the normal instances sit around 4K, the problem instance above 8K), even though all four are configured identically. The business developers report that some connections are failing.

The output of show processlist:

There are a large number of connections in the Killed state, 6K+ connections in the Connect state, and 6 binlog dump connections (if you have read the previous article: could these be the cause?).
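When triaging an instance like this, it helps to tally the processlist by Command to spot anomalies. A minimal Python sketch, assuming the rows have already been fetched; the sample data below is a hypothetical abbreviation of the output shown later in this article:

```python
from collections import Counter

# Hypothetical sample rows mirroring SHOW PROCESSLIST columns:
# (Id, User, Host, db, Command, Time, State, Info)
rows = [
    (4261363, "xxxx", "10.9.6.57:10843", "xxxx_0133", "Killed", 246196, "", None),
    (4261459, "xxxx", "10.8.2.61:60919", None, "Binlog Dump", 246110,
     "Master has sent all binlog to slave; waiting for binlog to be updated", None),
    (4261491, "unauthenticated user", "connecting host", None, "Connect", None, "login", None),
]

# Tally connections by Command; a flood of Killed/Connect entries is the red flag.
by_command = Counter(row[4] for row in rows)
print(by_command["Killed"], by_command["Connect"])  # 1 1
```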

The output of pt-pmp:

mysqld is very idle.

The output of show engine innodb status:

There are no idle large transactions.

2. Treatment

Based on the knowledge from the previous article, the preliminary judgment is that some of the Thread Pool groups on this instance are blocked. (Most problems that can block a query at the login phase are Thread Pool scheduling issues; of course, lock waits inside the login logic itself cannot be ruled out either.)

After adjusting thread_pool_oversubscribe, all connections in the Connect/Killed state disappeared and the connection count returned to normal.

3. Problem analysis

Although the problem is solved, many questions remain. Obviously, with the root cause unknown, a similar incident striking unexpectedly at business peak could have serious consequences, so we began digging into the underlying cause.

[Twists and turns]

Since the problem was solved by adjusting thread_pool_oversubscribe, it is clear that a group was blocked, so the key is to find out what was blocking the Thread Pool.

The most conspicuous phenomenon this time was, of course, the 126 connections in the Killed state. We know that Killed is displayed while a connection that has been killed is still running, in the rollback phase. Generally this phase is very short (unless there is a great deal of rollback work to do; but here the State column is empty, so it is clearly not rolling back), and the pt-pmp results confirm this. At first we suspected these Killed connections were blocking some Thread Pool group, but could not find a reasonable explanation, and a lot of time was wasted here.

[Light at the end of the tunnel]

Since the Killed sessions led nowhere, we could only look at the other sessions, and it turns out the thread ids of the blocked Connect connections are very regular:

| 4261587 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |
| 4261619 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |
| 4261651 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |
| 4261683 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |
| 4261715 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |
| 4261747 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |

The ids increase in steps of 32, so it is obvious that one particular group is blocked. Taking each id modulo 32, all of them fall into group No. 19 — and it seems the binlog dump threads have not been running either.
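The step-of-32 pattern can be checked mechanically. Assuming a Percona-style Thread Pool that assigns a connection to group thread_id % thread_pool_size (a pool size of 32 is assumed here, matching the observed increment), a quick sketch:

```python
THREAD_POOL_SIZE = 32  # assumed pool size, matching the observed step of 32

# Thread ids of the blocked Connect sessions from the processlist above.
blocked_connect_ids = [4261587, 4261619, 4261651, 4261683, 4261715, 4261747]

# Map each id to its Thread Pool group.
groups = {tid % THREAD_POOL_SIZE for tid in blocked_connect_ids}
print(groups)  # {19} -- every blocked login maps to group No. 19
```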

Taking the thread ids of the binlog dump threads modulo 32 shows that 4 of the 6 land in group 19, while thread_pool_oversubscribe is only 3 (the internal per-group limit is 3 + 1), so group 19 is completely blocked.
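The same arithmetic in code. Two assumptions are made here: the per-group active-thread budget is thread_pool_oversubscribe + 1 (Percona-style behavior), and the dump-thread ids other than 4261459 (the one visible in the processlist) are hypothetical values chosen to reproduce the observed 4-of-6 split:

```python
THREAD_POOL_SIZE = 32
OVERSUBSCRIBE = 3
GROUP_LIMIT = OVERSUBSCRIBE + 1  # assumed internal per-group limit: 3 + 1 = 4

# 4261459 appears in the article's processlist; the other ids are hypothetical.
binlog_dump_ids = [4261459, 4261427, 4261203, 4261171, 4261300, 4261250]

# Which long-running dump threads occupy group 19?
in_group_19 = [tid for tid in binlog_dump_ids if tid % THREAD_POOL_SIZE == 19]
print(len(in_group_19))                 # 4 of the 6 dump threads sit in group 19
print(len(in_group_19) >= GROUP_LIMIT)  # True: the group has no headroom left
```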

This fully explains the cause of the congestion. The 126 Killed sessions in this case greatly misled our judgment.

[In-depth analysis]

Looking back, some readers will ask: where did those 126 Killed sessions come from?

Here we need to explain how Thread Pool handles kill:

When a running connection is killed, the SQL it is executing fails, its thd->killed is set to THD::KILL_CONNECTION, and Thread Pool is notified through a callback function, in which it emits an io signal. A worker must pick up that event (it is handled like any normal event) before the session can exit; otherwise the session keeps showing up in show processlist. In this case, right after these 126 sessions were killed, a binlog dump connection happened to arrive in group 19, which was about to become congested.

| 4261363 | xxxx | 10.9.6.57:10843 | xxxx_0133 | Killed | 246196 | | NULL |
| 4261395 | xxxx | 10.8.9.18:35401 | xxxx_0133 | Killed | 246186 | | NULL |
| 4261459 | xxxx | 10.8.2.61:60919 | NULL | Binlog Dump | 246110 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 4261491 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |
| 4261502 | xxxx | 10.8.2.41:11862 | xxxx_0133 | Sleep | 1 | | NULL |
| 4261523 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL |

Note connection 4261459, the Binlog Dump immediately following the Killed connections: it is what wedged group No. 19 completely shut, so the poor Killed connections never got a chance to exit. This is the origin of the 126 Killed connections.
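The trap described above can be reduced to a toy model (a deliberately simplified sketch, not the real scheduler): a killed session's final exit is itself an event that a worker of its group must service, so once long-running threads saturate the group, both new logins and kill-exit events wait indefinitely. The per-group limit of 4 below is an assumption (oversubscribe of 3, plus 1):

```python
from collections import deque

GROUP_LIMIT = 4                       # assumed per-group budget: oversubscribe (3) + 1

active = ["binlog dump"] * 4          # long-running threads occupying group 19
pending = deque(["kill-exit"] * 126 + ["login"] * 10)  # queued events

serviced = []
# A worker only picks up a pending event while the group is under its limit.
while pending and len(active) < GROUP_LIMIT:
    serviced.append(pending.popleft())

print(len(serviced), len(pending))  # 0 136 -- nothing is ever serviced
```

Because the four resident threads already fill the budget, the 126 kill-exit events stay queued forever, which is exactly why the Killed rows lingered in show processlist.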



