Play processlist, efficiently trace the root cause of the soaring number of MySQL active connections 07/02 Update SLTechnology News&Howtos

Play processlist, efficiently trace the root cause of the soaring number of MySQL active connections

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

I. introduction

One of the problems often encountered in the operation and maintenance of MySQL is the surge in the number of active connections. Once encounter such a problem, according to the processlist information saved in the background, or connect to the MySQL environment, analyze the connection of MySQL. After dealing with many similar failures, I came up with the idea of making a gadget that can quickly analyze and count the connection status in the current MySQL from various dimensions every time I receive such an alarm. For example, the current distribution of connections, activity, and so on.

In addition, when dealing with real faults, it is often not enough to know the distribution of connections, we need to know what the current MySQL is busy doing, that is, what kind of SQL is being implemented. Moreover, sometimes even if we know the current implementation of the SQL, it is difficult to find the root cause, because once the active connection soars, the CPU is basically full, and the load of the IO is also very high, even if the usually fast SQL becomes a slow SQL, not to mention the slow SQL itself. So how do we identify which of these SQL are the culprits and which are just victims?

With these requirements and questions, this article gradually unfolds and analyzes and answers one by one to show the function of our gadget.

Second, connection analysis

The most intuitive way to know the connection information for the current MySQL is to look at MySQL's processlist. If you want to see the complete SQL, you can execute show full processlist, or directly look up the processlist table in information_schema. When the number of connections in MySQL is relatively small, it can also be analyzed by human flesh, but if the number of connections is more, it is difficult to examine the processlist to analyze the problem.

At first, our approach is to write a script that uses the MySQL client to log in to MySQL on the command line, execute show full processlist, and then analyze the output as a text. Originally, this implementation works well on MySQL5.5 and MariaDB, but when running on MySQL5.6, there is a problem, and there will be an extra line of Warning: Using a password on the commandline interface can be insecure in the console output. I believe many students running orzdba have also encountered this situation. This is the security tip of MySQL5.6 itself. When entering a plaintext password, there is no way to avoid it. Ali's classmates have also shared that they have made source code modifications for this, because many of their tasks rely on the command line to execute MySQL commands and capture the results.

Another way to avoid this problem is to use the mysql_config_editor tool, but this requires some additional configuration, as well as security risks. We don't have the ability to transform the source code, but we don't want to use mysql_config_editor, so we use another way, instead of logging in from the command line, we use the information_schema table processlist as the data source and query it to get the processlist information. There is another table performance_schema.threads, which also contains the same results and even richer background thread information, and compared with information_schema.processlist, you do not have to apply for mutex when querying, which has less impact on the system department, but this requires you to open perfomance schema, and interested students can try it on their own.

Once the source of the connection information is determined, the statistical dimension of the information will be analyzed. Take a look at the processlist table. The table structure is as follows (for example, MySQL5.6, MariaDB may have additional information):

ID: thread ID, this information is not very useful for statistics.

USER: the account used for the connection. This is a statistical dimension that is used to count the number of connections from each account.

HOST: the IP/hostname+ network port number of the connecting client, which is also a statistical dimension used to determine the client that initiated the connection

DB: the default database,DB used by the connection usually corresponds to a specific service and can be used to determine the connection distribution of the service. This is a statistical dimension.

COMMAND: the action of a connection actually means which stage the connection is in. The common ones are Sleep, Query, Connect, Statistics and so on. This is also a statistical dimension, which is mainly used to determine whether the connection is idle.

TIME: the time the connection is in the current state (in s). This will be analyzed later and is not included in the statistical dimension of the connection status.

STATE: the status of the connection, indicating what the current MySQl connection is doing. This is a statistical dimension, and there are many possible values. For more information, please see the official documentation.

INFO: connect the executing SQL, which is analyzed in the next section, but is not included in the statistical dimension of connection status.

Through the above analysis, five statistical dimensions of connections are summarized: user, host, db, command and state. With these five statistical dimensions, we can start writing gadgets.

The most basic functional requirement is to query the information_schema.processlist table, and then group the connections in MySQL according to the five statistical dimensions just summarized, and sort them by the number of statistics. The host field of the processlist table needs to be dealt with in some details, because its value is actually a combination of IP/hostname+ network port numbers. We need to cut out the port numbers so that we can count them according to the client, otherwise the port numbers of each client connection are different and cannot be grouped.

The final output is as follows:

With the most basic functions, it can meet the most basic statistical needs. However, when actually troubleshooting and dealing with online problems, you may not care about all the statistical dimensions and only need to count according to some of the above five dimensions; in addition, you may want host to appear before user and give priority to counting according to the IP or hostname of the client. Therefore, this requires that this tool has the function of adding or deleting statistical dimensions flexibly, and can dynamically adjust the order of statistical dimensions.

The final example output is as follows:

At first, we built this tool to analyze active connections, but free connections are included in the statistical results, so free connections need to be excluded from the statistical results. Of course, in addition to idle connections, there may be some connections of MySQL itself, such as binlog dump threads, Slave threads, and so on, which you also want to exclude from the results. This requires a function to exclude according to any statistical dimension. Now that we have the exclusion function, we can also add the inclusion function, that is, we can filter according to any statistical dimension so that connections with fixed conditions can appear in the statistical results.

With this connection statistics, we can understand the current connection status within MySQL and roughly determine which business or module has a problem.

III. SQL analysis

It is not enough to analyze the granularity of the business or module, which interface or which function is the problem? According to the connection status information above, there is no way to answer this question accurately. Let's go further, analyze the SQL in processlist, and go back to the INFO field of the table information_schema.processlist that we temporarily ignored in the previous section, which holds the SQL information that is being executed on each active connection. Through the analysis and statistics of SQL, we really have a clear understanding of the current internal activities of MySQL and what active connections are doing. In this way, we can help RD students quickly locate problems and find problematic interfaces or functional modules.

In fact, it is not easy to count SQL, because SQL is ever-changing, each SQL is not the same, even if it is a unified functional module SQL, the parameters may not be the same. In this case, how to count SQL? Here we draw lessons from the design ideas of pt-toolkit. In pt-query-digest 's analysis, there is a fingerprint field, which is actually a hash value, which represents a type of SQL whose SQL structure is identical except for different parameters. So we introduce this idea into the specific implementation, through regularization, remove all the specific conditions in the SQL, and then count the regular SQL with the same structure of SQL as the same SQL, and then we can do grouping statistics. For example, there are two SQL entries in the application, one of which is as follows:

SELECT * FROM `xxxxxxxxxxxxxxxxxxxx` `t`WHERE `t`.`ucid` = '1000000020018048' LIMIT 1

SELECT * FROM `xxxxxxxxxxxxxxxxxxxxx` `t`WHERE `t`.`ucid` = '1000000020281039' LIMIT 1

These two SQL have the same SQL structure except that the values of the ucid field in the last where condition are different. After regular matching, the value of ucid and the number of rows of limit are removed, and in the final statistical result, the two SQL become the following SQL:

SELECT * FROM `xxxxxxxxxxxxxxxx` `t`WHERE `t`.`ucid` =? LIMIT?

In this way, the grouping statistics of SQL is realized.

The sample output is as follows:

Of course, you can also add some additional information as needed to facilitate the location and analysis of problems, such as user, Host, and so on.

IV. Transaction analysis

With SQL analysis and statistics, in some scenarios, you can basically locate the problem, such as high-frequency well-planned SQL. However, if the response of the whole system becomes slow due to slow SQL, can the above simple SQL statistics effectively and quickly identify the problem? Certainly not, because at this time, simply from the statistical results, it is impossible to tell which is the slow SQL that causes the system response to slow down, and which is the affected SQL. Of course, in the statistical results, the SQL with more times may be slow SQL, but it may also be some high-frequency interface calls, because the system response becomes slow, resulting in the accumulation of requests. Therefore, the best way is to be able to add some other auxiliary information to help determine which requests may be slow queries. What kind of auxiliary information should be added? There are two options.

First of all, let's go back to the Time field of the table information_schema.processlist, which we ignored in the first section, which can be used to roughly determine the execution of SQL on the connection, and the difference between the actual length and the actual length depends on the time spent at each stage of SQL execution. Second, because online tables are InnoDB tables, they can be associated with InnoDB transaction statistics. InnoDB transactions are divided into read-only transactions and read-write transactions, and the information is stored in the information_schema.INNODB_TRX table. For some large transaction scenarios, a transaction contains multiple operations, and the results obtained in this way will be biased. If it is a non-InnoDB engine, this approach does not apply.

When analyzed here, take the read-only transaction, that is, the select statement, as an example. In implementation, we simplify the problem by estimating the execution time of a SQL through the value of the time field in processlist or the execution time of the transaction, so as to determine which requests are slow queries and which are affected slow queries among the large backlog of connections in processlist. When making use of transaction judgment, associate the ID field in processlist with the trx_MySQl_thread_id field in information_schema.INNODB_TRX. The specific SQL is select p. Judge, now ()-t.trx_started as runtime frominformation_schema.processlist p, information_schema.INNODB_TRX t where p.id = t.trx_MySQl_thread_id. Finally, the total execution time and the average execution time of each type of SQL after the regularization are calculated. The longer the execution time, the more likely we are to think that it is the culprit that causes the problem.

The sample output is as follows:

RT: the total execution time of this type of SQL up to now, in S (seconds)

AVGRT: the average execution time of each transaction in S (seconds) for this type of SQL to date.

After adding additional information such as user and Host, the output is as follows:

5. Conclusion

Through the above three dimensions, we have basically dug up almost all the information available in MySQL's processlist. When we investigate and deal with practical problems, we often use this tool. After practical testing, the efficiency of problem location is still relatively efficient.

However, there are still many areas for improvement. For example, in the analysis of SQL statements, different values should be considered as different SQL strictly speaking, because the execution time may vary greatly. In addition, in the SQL execution time analysis, the result of the read-only transaction analysis for a single select statement is very accurate, but for the read-write transaction, how to reduce the error of the result, because the read-write transaction is more complex than the read-only transaction, because some additional cases such as lock waiting may be involved. All of these are already in our improvement plan, if you have any good ideas or ideas, you are welcome to communicate.

We do these things ourselves, in fact, the accumulation and precipitation of daily operation and maintenance experience, if it happens that the ideas and realization of a certain student are similar, it is inevitable.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.