How to understand the problem of 100% CPU usage

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to understand the problem of 100% CPU usage. Many people run into this question in daily operations, so the editor has consulted various materials and sorted out simple, practical methods for analyzing it, using MySQL on Linux as the example.

How to understand CPU utilization

Taking the Linux top command as an example, the output looks like this:

top command output

CPU usage is shown in the %Cpu(s) line, where each value is a percentage of total CPU time:

%us: CPU time used by user processes (not rescheduled with nice)

%sy: CPU time used by system processes, mainly the kernel

%ni: CPU time used by user processes whose priority has been adjusted with nice

%id: idle CPU time

%wa: time the CPU spends waiting for I/O

%hi: time the CPU spends handling hard interrupts

%si: time the CPU spends handling soft interrupts

%st: CPU time stolen from the virtual machine by the hypervisor

When we say that CPU usage is too high, we generally mean the %us metric, which is usually the value reported as CPU utilization in monitoring (there are other ways to calculate it, but for simplicity they are not considered here). Abnormally high values in the other metrics also indicate that MySQL is in an abnormal state, but for simplicity this article focuses on the scenario where %us is too high.
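To make the percentages above concrete, here is a minimal sketch of how they can be derived from two samples of the aggregate cpu line in /proc/stat. It assumes the usual Linux column order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are illustrative, and this only approximates what a tool like top does.

```python
# Column order of the aggregate "cpu" line in /proc/stat, mapped to top's labels.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels expose fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return each state's share of CPU time between two snapshots, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_snapshot(path="/proc/stat"):
    """Read the current counters; the aggregate line is the first line of the file."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

On a Linux host, calling read_snapshot() twice a second or so apart and passing both results to cpu_percentages() gives values comparable to the %us, %sy and %id figures discussed above.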

MySQL and Thread

MySQL uses a single-process, multithreaded architecture, which means that on a server dedicated to MySQL, the top command shows only one line for it.

top command output

What you see here is the process ID of MySQL. To see the individual threads, use top -H.

top command output

What you see here is the ID of each MySQL thread. After MySQL starts, it creates many internal threads to do its work.
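The per-thread view can also be approximated by reading /proc directly. The sketch below is an illustration under stated assumptions, not what top -H does internally. It lists the cumulative user and system CPU time of every thread in a process, relying on utime and stime being fields 14 and 15 of /proc/<pid>/task/<tid>/stat. Unlike top -H it reports totals since thread start rather than a current percentage, and the function names are hypothetical.

```python
import os


def parse_thread_stat(text):
    """Extract (tid, name, utime_ticks, stime_ticks) from one task stat line."""
    # The thread name sits in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    tid = int(text[:left])
    name = text[left + 1:right]
    rest = text[right + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return tid, name, int(rest[11]), int(rest[12])


def thread_cpu_times(pid):
    """Return (tid, name, user_seconds, system_seconds) per thread, busiest first."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    task_dir = f"/proc/{pid}/task"
    rows = []
    for entry in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{entry}/stat") as f:
                tid, name, utime, stime = parse_thread_stat(f.read())
        except OSError:
            continue  # the thread exited while we were iterating
        rows.append((tid, name, utime / ticks_per_second, stime / ticks_per_second))
    rows.sort(key=lambda row: row[2] + row[3], reverse=True)
    return rows
```

Passing the mysqld process ID to thread_cpu_times() and sampling twice shows which threads accumulated CPU time in between, which is the same question top -H answers interactively.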

These internal threads include the system threads that MySQL itself uses to flush dirty pages and to read and write data, as well as the threads that handle users' SQL, which we will call user threads. User threads have one special property: a SQL statement sent by the application to MySQL is executed by only one user thread (one-thread-per-connection). As a result, when handling a complex query, MySQL can end up in the awkward situation where one core struggles while the other cores look on.

Referring back to the definition of %us: on Linux, the MySQL process and all the threads it starts are not kernel processes, so both MySQL system threads and user threads show up in the %us metric when they are busy.

When will CPU be 100%?

What is MySQL doing when CPU reaches 100%? From the previous analysis, MySQL consists mainly of two types of threads: system threads and user threads. On a server dedicated to MySQL, therefore, paying attention to these two types of threads covers most problem scenarios.

System thread

In real environments, system threads rarely cause problems. Multiple system threads seldom run at full load at the same time, so as long as the server has at least 4 available cores, they generally do not drive CPU to 100%. Of course, some bugs can have an impact, such as this one:

MySQL BUG

Although this situation is relatively rare, system threads still need to be checked during routine troubleshooting.

User thread

When user threads are busy, experience immediately points to slow queries. It is true that more than 90% of cases are caused by slow queries, but as a matter of method, the conclusion should still come from analysis.

According to the definition of %us, a high value means that user threads are taking up a lot of CPU time.

One possibility is long-running computation, such as order by, group by, temporary tables, joins and so on. The cause may be low query efficiency, where a single SQL statement occupies CPU time for a long time, or simply a large amount of data, which leads to a large amount of computation. The other possibility is plain QPS pressure that fills up the CPU time. For example, a 4-core server may be used to support 20k to 30k point queries per second. Each SQL statement takes little CPU time, but because the overall QPS is very high, the CPU time is used up.
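The QPS example above comes down to simple arithmetic: each core provides 1000 ms of CPU time per second, and that time is shared by all queries. A minimal sketch follows (the function name is illustrative, and the model ignores system threads and kernel time):

```python
def cpu_budget_per_query_ms(cores, qps):
    """Average CPU time (ms) each query can use before all cores are saturated."""
    if cores <= 0 or qps <= 0:
        raise ValueError("cores and qps must be positive")
    return cores * 1000.0 / qps


for qps in (20_000, 30_000):
    budget = cpu_budget_per_query_ms(4, qps)
    print(f"{qps} QPS on 4 cores: {budget:.3f} ms of CPU per query")
```

On 4 cores, 20k QPS leaves about 0.2 ms of CPU per query and 30k QPS leaves about 0.13 ms, so even very cheap point queries can fill the CPU.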

Locating the problem

With the analysis done, it is time for practice. Based on the previous analysis, here are some classic CPU 100% scenarios, each with a brief method for locating the cause, as a reference.

PS: Scenarios caused by system thread bugs are skipped here; they will be analyzed later as a detailed case.

Slow query

Once CPU has reached 100%, genuinely slow queries are mixed with ordinary queries that are merely slowed down by the CPU shortage, so it is hard to find the culprit just by looking at processlist or slowlog. Some clear characteristics are needed to identify it.

From the previous analysis, slow queries with low efficiency usually fall into the following cases:

Full table scan: the value of Handler_read_rnd_next increases dramatically, and the Rows_examined value of this type of query is very high in slowlog.

Inefficient or wrong index: the value of Handler_read_next increases sharply. Note that this can also be caused by a sudden increase in business volume, so it needs to be looked at together with QPS/TPS. Finding this kind of query in slowlog is more troublesome. Generally, the value of Rows_examined differs noticeably before and after the fault, or is unreasonably high.

For example, in a data skew scenario, if the Rows_examined of a small range query is very high for one particular range but relatively low for other ranges, the index is probably inefficient.

Heavy sorting: queries such as order by and group by are usually hard to judge directly from the Handler metrics. If a missing or poor index means the sorting operation cannot be eliminated, you can usually see many statements of this kind in processlist and slowlog.
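The Handler counters above are cumulative, so what matters is how fast they grow. The sketch below assumes two snapshots of SHOW GLOBAL STATUS have already been loaded into dicts keyed by variable name (how they are fetched is left out). It computes per-second rates and relates them to Questions, which helps separate a real efficiency problem from a plain increase in business volume. The function names are illustrative.

```python
WATCHED = ("Handler_read_rnd_next", "Handler_read_next", "Questions")


def counter_rates(before, after, interval_seconds):
    """Per-second growth of the watched status counters between two snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        # SHOW GLOBAL STATUS returns values as strings, hence the int() calls.
        name: (int(after[name]) - int(before[name])) / interval_seconds
        for name in WATCHED
    }


def rows_read_per_question(rates):
    """Average rows read through the two Handler counters per statement."""
    questions = rates["Questions"]
    if questions <= 0:
        return 0.0
    rows = rates["Handler_read_rnd_next"] + rates["Handler_read_next"]
    return rows / questions
```

If rows_read_per_question() jumps while Questions per second stays flat, inefficient queries are the likelier cause. If both rise together, the load itself has probably grown.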

Of course, if you do not want to analyze MySQL metrics in detail, or the situation is urgent, you can simply divide Rows_examined by Rows_sent in slowlog. For example, queries with Rows_examined/Rows_sent > 1000 can be singled out as suspects. Generally speaking, this kind of problem can be solved by optimizing the index.

PS: 1000 is only an empirical value; the right threshold depends on the actual business situation.
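The ratio check above is easy to automate. The following sketch assumes the plain-text slow log format, where each entry has a header line of the form "# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...". It is not a complete slow log parser: extra fields that some builds emit on that line are ignored, and the helper names are hypothetical.

```python
import re

STATS = re.compile(
    r"^# Query_time:\s*(?P<query_time>[\d.]+)\s+Lock_time:\s*[\d.]+\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_slow_log(lines):
    """Collect one dict per slow log entry, with its row counters and SQL text."""
    entries, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        match = STATS.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "rows_sent": int(match["rows_sent"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": [],
            }
            entries.append(current)
        elif line.startswith("#"):
            current = None  # header of the next entry (# Time, # User@Host)
        elif current is not None and line.strip():
            if not line.startswith("SET timestamp="):
                current["sql"].append(line.strip())
    return entries


def suspects(entries, threshold=1000):
    """Return (ratio, sql) pairs whose examined/sent ratio exceeds the threshold."""
    flagged = []
    for entry in entries:
        # Treat zero rows sent as one to avoid division by zero.
        ratio = entry["rows_examined"] / max(entry["rows_sent"], 1)
        if ratio > threshold:
            flagged.append((ratio, " ".join(entry["sql"])))
    return sorted(flagged, reverse=True)
```

Typical use would be suspects(parse_slow_log(open(path))), with path pointing at the slow log file and the threshold tuned to the business as noted above.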

Large amount of calculation

This kind of problem is usually caused by a large amount of data. Even if the index is fine and the execution plan is OK, CPU can still reach 100%. Given MySQL's one-thread-per-connection model, it does not take much concurrency to fill up the CPU. This kind of query is actually easier to spot, because its execution time is generally long, which makes it very conspicuous in processlist. It may not appear in slowlog, however, because statements that have not finished executing are not recorded.
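Spotting these statements in processlist can be sketched as a simple filter. The example assumes the rows of SHOW FULL PROCESSLIST have already been fetched into dicts keyed by column name (Id, Command, Time, Info and so on). The 60-second threshold and the function name are assumptions for illustration.

```python
def long_running_queries(processlist, min_seconds=60):
    """Pick statements that have been executing for at least min_seconds."""
    hits = [
        row for row in processlist
        if row.get("Command") == "Query"          # skip Sleep, Binlog Dump, etc.
        and row.get("Info")                       # a statement is actually running
        and int(row.get("Time") or 0) >= min_seconds
    ]
    # Longest-running statements first.
    return sorted(hits, key=lambda row: int(row["Time"]), reverse=True)
```

Each returned row still carries its Id, so the statement can be inspected further or, if appropriate, terminated by the operator.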

Generally speaking, there are three common solutions to this type of problem:

Separate reads from writes, and route this kind of query to a read-only slave database that is not normally used by business traffic.

Split the SQL on the application side, breaking a single large query into multiple small queries.

Use an OLAP-oriented solution such as HBase or Spark to support it.
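The second option above, splitting one large query on the application side, can be sketched as follows. It assumes the table has an integer primary key with known minimum and maximum values. The table name orders, the column id and the chunk size are hypothetical, and the application is expected to combine the partial results itself.

```python
def id_ranges(min_id, max_id, chunk_size):
    """Yield inclusive (start, end) primary-key ranges covering [min_id, max_id]."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield start, end
        start = end + 1


# Hypothetical aggregate split into small range queries; sum the partial counts.
TEMPLATE = "SELECT COUNT(*) FROM orders WHERE id BETWEEN %s AND %s"

for low, high in id_ranges(1, 10, 4):
    print(TEMPLATE, (low, high))
```

Each small query runs briefly on its own, so the statements can be spread out over time or throttled instead of one statement pinning a core for a long period.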

High QPS

This kind of problem is simply a hardware resource bottleneck. The Rows_examined/Rows_sent ratio, the indexes used by the SQL, the execution plans and the amount of computation per statement all look normal, but the QPS metric is relatively high, and you may not see anything unusual in processlist, for example:

This concludes the discussion of how to understand the problem of 100% CPU usage. Combining the theory with practice is the best way to learn it, so go and try it out.
