How to calculate the CPU utilization in Linux

This article comes from the WeChat official account "Developing Internal Skills Practice" (ID: kfngxl). Author: Zhang Yanfei (allen)

Hello, everyone. I'm Brother Fei!

When checking how online services are running on a production server, most people first run the top command to see the overall CPU utilization of the system. For example, on a machine picked at random, top displays the following utilization information:

The output looks simple, yet it is not so easy to understand all of it. For example:

Question 1: How is the utilization information in top's output calculated, and is it accurate?

Question 2: The ni column stands for nice. What kind of CPU cost does it report?

Question 3: wa stands for io wait. Is the CPU busy or idle during that time?

Today we take an in-depth look at how CPU utilization is counted. After this, you will not only understand the details of the statistics, but also have a deeper understanding of indicators such as nice and io wait.

Unlike previous articles, today we will not dive straight into the Linux implementation, but start with our own thinking!

1. Think about it yourself first

Setting the Linux implementation aside, suppose you are given the following task: there is a quad-core server with four processes running on it.

You are asked to design a way to calculate the CPU utilization of the entire system, supporting output like that of the top command, and meeting the following requirements:

CPU usage should be as accurate as possible.

It should reflect the instantaneous CPU state at second-level granularity.

You can stop to read and think for a few minutes.

Okay, end of thinking. After thinking about it, you will find that this seemingly simple requirement is actually a little complicated.

One idea is to add up the execution time of all processes and divide it by the total elapsed system time * 4.

There is nothing wrong with this idea. It works for calculating CPU utilization over a long period, and the statistics are accurate enough.

But anyone who has used top knows that the CPU utilization it prints is not a long-run constant; it is refreshed every 3 seconds by default (the interval can be set with -d). This scheme can report the overall utilization, but it can hardly reflect the instantaneous state. You might think: can't I just compute it once every three seconds? But when does each three-second window start? The granularity is hard to control.

The core problem with the previous idea is how to capture the instantaneous state. You might then think of instantaneous sampling: look at how many cores are busy at this very moment. If two of the four cores are busy, the utilization is 50%.

This idea is also in the right direction, but it has two problems:

The numbers you calculate are all integer multiples of 25%.

The instantaneous value makes the displayed CPU usage swing wildly.

For example, the following figure:

At instant T1, the CPU utilization of the system is undoubtedly 100%, but at T2 it drops to 0%. The idea is in the right direction, but this crude calculation clearly does not work as gracefully as the top command.

Let's improve it once more by combining the two ideas above. We make the sampling period finer, but make the calculation period coarser.

We introduce the concept of a sampling period, for example sampling once every 1 millisecond. If the CPU is running at the sampling instant, that 1 ms is recorded as used. Each sample yields an instantaneous CPU usage, and all of them are saved.

To report the CPU usage over 3 seconds, such as the interval between T1 and T2 in the figure above, add up all the instantaneous values in that interval and take the average. This solves the problems above: the statistics are relatively accurate, the value no longer swings wildly, and the granularity is no longer too coarse (changing only in steps of 25%).
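The combined idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: four cores, one sample per millisecond, a 3-second reporting window, and a made-up workload (the `core_is_busy` function is hypothetical and exists only to give the sampler something to observe).

```python
# Toy model: sample 4 cores every 1 ms, then average the samples over a 3 s window.
NUM_CORES = 4
WINDOW_MS = 3000


def core_is_busy(core: int, t_ms: int) -> bool:
    """Hypothetical workload: core k is busy for the first (k + 1) * 100 ms of every second."""
    return (t_ms % 1000) < (core + 1) * 100


def instantaneous_utilization(t_ms: int) -> float:
    """Fraction of cores busy at one instant; always a multiple of 1 / NUM_CORES."""
    busy = sum(core_is_busy(c, t_ms) for c in range(NUM_CORES))
    return busy / NUM_CORES


def windowed_utilization(start_ms: int, window_ms: int = WINDOW_MS) -> float:
    """Average of the 1 ms instantaneous samples taken inside the window."""
    samples = [instantaneous_utilization(t) for t in range(start_ms, start_ms + window_ms)]
    return sum(samples) / len(samples)


print("instant at t=50 ms: ", instantaneous_utilization(50))
print("instant at t=950 ms:", instantaneous_utilization(950))
print("3 s window average: ", windowed_utilization(0))
```

The two instantaneous readings jump between the extremes, while the windowed average settles on a value that is not restricted to multiples of 25% in general.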

Some readers may ask: what if the CPU state changes between two samples, as shown in the figure below?

When the current sampling point arrives, process A has just finished running; its short slice of time was counted neither by the previous sampling point nor by this one. Process B, on the other hand, has only just started, so charging the whole 1 ms to it seems a bit too much.

This problem does exist. But since we sample once every 1 ms and in practice look at the numbers at a granularity of seconds or more, each reading covers thousands of sampling points, so this error does not affect our grasp of the overall picture.

In fact, this is how Linux counts system CPU utilization. There may be some error, but it is good enough as a statistic. In the implementation, Linux accumulates all the instantaneous values into a single set of counters rather than actually storing many copies of instantaneous data.
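A minimal sketch of the "accumulate instead of store" idea follows. It does not mirror any kernel data structure: `TickCounters`, its two fields, and the tick patterns are hypothetical and chosen only for illustration. The point is that two snapshots of running totals are enough to recover the average over any interval.

```python
from dataclasses import dataclass


@dataclass
class TickCounters:
    """Running totals in ticks; no individual sample is kept."""
    busy: int = 0
    idle: int = 0

    def on_tick(self, cpu_busy: bool) -> None:
        # Each sample only bumps one counter.
        if cpu_busy:
            self.busy += 1
        else:
            self.idle += 1

    def snapshot(self) -> tuple:
        return (self.busy, self.idle)


def utilization_between(before: tuple, after: tuple) -> float:
    """Utilization over the interval between two snapshots."""
    busy = after[0] - before[0]
    idle = after[1] - before[1]
    total = busy + idle
    return busy / total if total else 0.0


counters = TickCounters()
for t in range(1000):            # first second: busy on 1 tick out of 10
    counters.on_tick(t % 10 == 0)
t1 = counters.snapshot()
for t in range(3000):            # next three seconds: busy on every other tick
    counters.on_tick(t % 2 == 0)
t2 = counters.snapshot()

print(t1, t2, utilization_between(t1, t2))
```

Because only differences matter, the history before T1 does not influence the result for the T1 to T2 interval.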

Next, let's go to Linux to see its implementation of system cpu utilization statistics.

2. Where does the top command get its data?

As mentioned in the previous section, Linux accumulates the instantaneous values into a set of counters. The kernel exposes these counters to user space through the /proc/stat pseudo file, and this is the data used when calculating system CPU utilization.

Overall, the internals of how the top command works are shown in the figure below.

The top command reads /proc/stat to get the usage values for each CPU.

The kernel calls the stat_open function to handle access to /proc/stat.

The data the kernel reads comes from the kernel_cpustat array and is summed up.

The result is printed out to user space.

Next, let's take a closer look at each step.

You can see which files top opens by using strace to trace its system calls.

    # strace top
    openat(AT_FDCWD, "/proc/stat", O_RDONLY) = 4
    openat(AT_FDCWD, "/proc/2351514/stat", O_RDONLY) = 8
    openat(AT_FDCWD, "/proc/2393539/stat", O_RDONLY) = 8

In addition to /proc/stat, there is also a /proc/{pid}/stat for each process, which is used to calculate the CPU utilization of each process.
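To make the file format concrete, here is a small sketch that parses the aggregate `cpu` line of /proc/stat into named fields. The helper names are hypothetical, and the `SAMPLE` line contains illustrative numbers rather than output from the author's machine. Older kernels print fewer columns, so the parser simply maps however many values are present.

```python
# Column order of the "cpu" lines in /proc/stat.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line: str) -> dict:
    """Turn one 'cpu ...' line into a {field: ticks} dictionary."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    values = [int(v) for v in parts[1:]]
    return dict(zip(CPU_FIELDS, values))


def read_total_cpu(path: str = "/proc/stat") -> dict:
    """Read the aggregate line (all cores summed) from a live Linux system."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise RuntimeError("no aggregate cpu line found in " + path)


# Illustrative sample line, so the parser can be tried on any platform.
SAMPLE = "cpu  10132153 290696 3084719 46828483 16683 0 25195 0 175628 0"
print(parse_cpu_line(SAMPLE))
```

On a Linux machine, `read_total_cpu()` returns the same structure from the real file.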

The kernel defines handler functions for each pseudo file. For the /proc/stat file, the handlers are in proc_stat_operations.

    // file: fs/proc/stat.c
    static int __init proc_stat_init(void)
    {
        proc_create("stat", 0, NULL, &proc_stat_operations);
        return 0;
    }

    static const struct file_operations proc_stat_operations = {
        .open = stat_open,
    };

proc_stat_operations contains the operations for this file. When /proc/stat is opened, stat_open is called. stat_open in turn calls single_open_size with show_stat, which outputs the data. Let's take a look at its code:

    // file: fs/proc/stat.c
    static int show_stat(struct seq_file *p, void *v)
    {
        u64 user, nice, system, idle, iowait, irq, softirq, steal;

        for_each_possible_cpu(i) {
            struct kernel_cpustat *kcs = &kcpustat_cpu(i);

            user += kcs->cpustat[CPUTIME_USER];
            nice += kcs->cpustat[CPUTIME_NICE];
            system += kcs->cpustat[CPUTIME_SYSTEM];
            idle += get_idle_time(kcs, i);
            iowait += get_iowait_time(kcs, i);
            irq += kcs->cpustat[CPUTIME_IRQ];
            softirq += kcs->cpustat[CPUTIME_SOFTIRQ];
            ...
        }

        // converted to ticks and printed out
        seq_put_decimal_ull(p, "cpu  ", nsec_to_clock_t(user));
        seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));
        seq_put_decimal_ull(p, " ", nsec_to_clock_t(system));
        seq_put_decimal_ull(p, " ", nsec_to_clock_t(idle));
        seq_put_decimal_ull(p, " ", nsec_to_clock_t(iowait));
        seq_put_decimal_ull(p, " ", nsec_to_clock_t(irq));
        seq_put_decimal_ull(p, " ", nsec_to_clock_t(softirq));
        ...
    }

In the code above, for_each_possible_cpu iterates over kcpustat_cpu, the variable that stores CPU usage data. It is a per-CPU variable with one array element for each logical core. It stores the various kinds of time accumulated on that core, including user, nice, system, idle, iowait, irq, softirq and so on.

In this loop, each kind of usage is summed over all cores. Finally, the totals are output through seq_put_decimal_ull.

Note that the kernel actually records each of these times in nanoseconds, but converts them uniformly to tick (beat) units on output. How long a tick is will be covered in the next section. In short, the output of /proc/stat is read from the per-CPU variable kernel_cpustat.
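A minimal sketch of the nanosecond-to-tick conversion described above. It assumes the common case where the tick rate divides one second evenly. The value 100 used in the demo is an assumption (the usual USER_HZ that user space sees, which need not equal the kernel's internal timer frequency), and `system_user_hz` is a hypothetical helper that asks the running system for the real value through sysconf.

```python
import os

NSEC_PER_SEC = 1_000_000_000


def nsec_to_ticks(nsec: int, user_hz: int) -> int:
    """Convert nanoseconds to whole ticks of length 1 / user_hz seconds."""
    return nsec // (NSEC_PER_SEC // user_hz)


def ticks_to_seconds(ticks: int, user_hz: int) -> float:
    """Convert a tick count read from /proc/stat back to seconds."""
    return ticks / user_hz


def system_user_hz() -> int:
    """Hypothetical helper: tick rate exposed to user space (Unix only)."""
    return os.sysconf("SC_CLK_TCK")


ASSUMED_USER_HZ = 100  # assumption for the demo; query system_user_hz() on a real machine
print(nsec_to_ticks(2_500_000_000, ASSUMED_USER_HZ))
print(ticks_to_seconds(250, ASSUMED_USER_HZ))
```

Anything shorter than one tick is truncated by the integer division, which is one more reason the exported numbers are approximate.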

Let's go on to see when the data in this variable is added.

3. Where do the statistics come from?

We mentioned earlier that the kernel counts CPU usage by sampling. The sampling period depends on the timer in the Linux time subsystem.

The Linux kernel raises a timer interrupt (IRQ 0) at a fixed interval, which is a bit like a beat in sheet music. At every beat (tick), Linux responds and does some processing.

The length of a tick is defined by CONFIG_HZ, which specifies how many timer interrupts occur per second. The value may vary from system to system; a tick is usually between 1 ms and 10 ms. You can find the setting in your own Linux config file.

    # grep ^CONFIG_HZ /boot/config-5.4.56.bsk.10-64
    CONFIG_HZ=1000

From the result above you can see that my machine ticks 1000 times per second, that is, once every 1 ms.
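The relationship between CONFIG_HZ and the tick length is simple arithmetic, sketched below. The function names are hypothetical, the config text is an inline example rather than a real /boot/config file, and the plain integer division ignores any rounding the kernel may apply when HZ does not divide one second evenly.

```python
NSEC_PER_SEC = 1_000_000_000


def parse_config_hz(config_text: str) -> int:
    """Find the CONFIG_HZ=<n> line in kernel config text."""
    for line in config_text.splitlines():
        if line.startswith("CONFIG_HZ="):
            return int(line.split("=", 1)[1])
    raise ValueError("CONFIG_HZ not found")


def tick_length_ns(config_hz: int) -> int:
    """Nanoseconds per tick for a given timer frequency."""
    return NSEC_PER_SEC // config_hz


# Inline example text; on a real system this would come from the kernel config file.
example_config = "CONFIG_HZ_1000=y\nCONFIG_HZ=1000\n"
hz = parse_config_hz(example_config)
print(hz, "ticks per second,", tick_length_ns(hz) / 1_000_000, "ms per tick")
```

With HZ set to 100 or 250, the same function gives 10 ms and 4 ms per tick respectively.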

Every time the timer interrupt arrives, update_process_times is called to update the system time. The updated time is stored in the per-CPU variable kcpustat_cpu that we mentioned earlier.

Let's take a closer look at the source code of update_process_times, where the accumulation happens. It is located in kernel/time/timer.c.

    // file: kernel/time/timer.c
    void update_process_times(int user_tick)
    {
        struct task_struct *p = current;

        // perform time accumulation
        account_process_tick(p, user_tick);
    }

The parameter user_tick of this function indicates whether the sample was taken in kernel mode or user mode. Next, account_process_tick is called.

    // file: kernel/sched/cputime.c
    void account_process_tick(struct task_struct *p, int user_tick)
    {
        cputime = TICK_NSEC;
        ...
        if (user_tick)
            // 3.1 account user-mode time
            account_user_time(p, cputime);
        else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
            // 3.2 account kernel-mode time
            account_system_time(p, HARDIRQ_OFFSET, cputime);
        else
            // 3.3 account idle time
            account_idle_time(cputime);
    }

In this function, cputime is first set to TICK_NSEC, which is defined as the number of nanoseconds in one tick. Then, depending on the result of the checks, account_user_time, account_system_time or account_idle_time is executed to account user-mode, kernel-mode or idle time respectively.
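The three-way branch can be modeled in user space as a pure function. This is a simplified sketch, not kernel code: the boolean parameters are hypothetical stand-ins for the kernel checks, and reading `irq_count() != HARDIRQ_OFFSET` as "the tick interrupted other interrupt processing" is an interpretation made for this example.

```python
def classify_tick(user_tick: bool, running_idle_task: bool, in_other_interrupt: bool) -> str:
    """Decide which broad bucket one tick is charged to.

    user_tick:          the tick arrived while user-mode code was running.
    running_idle_task:  stands in for `p == rq->idle`.
    in_other_interrupt: stands in for `irq_count() != HARDIRQ_OFFSET`
                        (assumed meaning: interrupt work beyond the timer tick itself).
    """
    if user_tick:
        return "user"
    if (not running_idle_task) or in_other_interrupt:
        return "system"
    return "idle"


for args in [(True, False, False), (False, False, False),
             (False, True, True), (False, True, False)]:
    print(args, "->", classify_tick(*args))
```

Note that a tick landing on the idle task is still charged as kernel time if interrupt work was in progress; only a truly quiet idle task yields idle time.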

3.1 User-mode time statistics

    // file: kernel/sched/cputime.c
    void account_user_time(struct task_struct *p, u64 cputime)
    {
        // user-mode CPU usage is accounted in two cases
        int index;
        index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;

        // accumulate the time into /proc/stat
        task_group_account_field(p, index, cputime);
    }

The account_user_time function distinguishes two cases:

If the nice value of the process is greater than 0, it will be added to the nice field of the CPU statistical structure.

If the nice value of the process is less than or equal to 0, it is added to the user field of the CPU statistical structure.

With this, question 2 from the opening is answered: user-mode time is not only the user field, but also nice. The reason nice is kept separate is to give Linux users a clearer view of how many CPU cycles are consumed by processes running with a positive nice value.

If we want to observe the time the system spends in user mode, we should add up the user and nice values in top's output, instead of looking at user alone!
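The user/nice split and the "add them up" advice can be illustrated with a short sketch. The task list is made up, and `user_bucket` and `account_user_ticks` are hypothetical names that only imitate the rule stated above (nice value greater than 0 goes to nice, everything else to user).

```python
def user_bucket(nice_value: int) -> str:
    """Pick the counter for user-mode time based on the task's nice value."""
    return "nice" if nice_value > 0 else "user"


def account_user_ticks(tasks) -> dict:
    """tasks: iterable of (nice_value, ticks_spent_in_user_mode)."""
    totals = {"user": 0, "nice": 0}
    for nice_value, ticks in tasks:
        totals[user_bucket(nice_value)] += ticks
    return totals


# Made-up workload: two normal/high-priority tasks and two niced tasks.
totals = account_user_ticks([(0, 300), (-5, 100), (10, 250), (19, 50)])
user_mode_total = totals["user"] + totals["nice"]
print(totals, "total user-mode ticks:", user_mode_total)
```

Looking at the user counter alone would miss the ticks consumed by the two niced tasks.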

Next, task_group_account_field is called to add the time to the kernel_cpustat kernel variable we saw earlier.

    // file: kernel/sched/cputime.c
    static inline void task_group_account_field(struct task_struct *p, int index, u64 tmp)
    {
        __this_cpu_add(kernel_cpustat.cpustat[index], tmp);
        ...
    }

3.2 Kernel-mode time statistics

Let's look at how kernel-mode time is counted, starting from the account_system_time code.

    // file: kernel/sched/cputime.c
    void account_system_time(struct task_struct *p, int hardirq_offset, u64 cputime)
    {
        if (hardirq_count() - hardirq_offset)
            index = CPUTIME_IRQ;
        else if (in_serving_softirq())
            index = CPUTIME_SOFTIRQ;
        else
            index = CPUTIME_SYSTEM;

        account_system_index_time(p, cputime, index);
    }

This function decides which counter the tick is charged to:

If the CPU is currently in hard interrupt context, the time is counted in the irq field.

If the CPU is currently in soft interrupt context, the time is counted in the softirq field.

Otherwise, it is counted in the system field.

After determining which field to add to, account_system_index_time and then task_group_account_field are called to add the time to the kernel variable kernel_cpustat.

    // file: kernel/sched/cputime.c
    static inline void task_group_account_field(struct task_struct *p, int index, u64 tmp)
    {
        __this_cpu_add(kernel_cpustat.cpustat[index], tmp);
    }

3.3 Idle time accumulation

That's right: the kernel variable kernel_cpustat tracks not only the various user-mode and kernel-mode usage times, but also idle time.

If, at the moment of sampling, the CPU is in neither kernel mode nor user mode, the time of the current tick is added to idle.

    // file: kernel/sched/cputime.c
    void account_idle_time(u64 cputime)
    {
        u64 *cpustat = kcpustat_this_cpu->cpustat;
        struct rq *rq = this_rq();

        if (atomic_read(&rq->nr_iowait) > 0)
            cpustat[CPUTIME_IOWAIT] += cputime;
        else
            cpustat[CPUTIME_IDLE] += cputime;
    }

When the CPU is idle, the kernel further checks whether any task on this CPU's run queue is waiting for I/O (such as disk I/O). If so, the idle time is added to iowait; otherwise it is added to idle. From this we can see that iowait is in fact CPU idle time, spent while waiting for I/O to finish.

With this, question 3 from the opening also has a clear answer: io wait is really a count of time the CPU spent idle; it differs from idle only in that the CPU was idle while waiting for I/O.
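The idle/iowait decision can be tried out on a made-up trace. In this sketch each list element is the number of tasks blocked on I/O at one idle tick (a stand-in for `rq->nr_iowait`); the function names and the trace itself are hypothetical.

```python
def idle_bucket(nr_iowait: int) -> str:
    """An idle tick counts as iowait only if some task is blocked on I/O."""
    return "iowait" if nr_iowait > 0 else "idle"


def account_idle_ticks(nr_iowait_per_tick) -> dict:
    """Charge one tick per element to either 'idle' or 'iowait'."""
    cpustat = {"idle": 0, "iowait": 0}
    for nr_iowait in nr_iowait_per_tick:
        cpustat[idle_bucket(nr_iowait)] += 1
    return cpustat


# Ten idle ticks; tasks are blocked on I/O during ticks 3 to 6.
trace = [0, 0, 0, 1, 2, 2, 1, 0, 0, 0]
stats = account_idle_ticks(trace)
print(stats, "CPU was not executing for", stats["idle"] + stats["iowait"], "ticks")
```

In both buckets the CPU is doing no work; the split only records why it was idle.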

4. Summary

This article has analyzed in depth how Linux counts system CPU utilization internally. The whole article can be summarized in the following picture:

The timer in Linux samples the usage of each CPU core at a fixed interval, such as 1 ms, and then adds the whole duration of the current tick to one of user / nice / system / irq / softirq / io_wait / idle.

The top command reads the CPU utilization data from /proc/stat, which the kernel produces by summing up kernel_cpustat.

Back to question 1 from the opening: how is the utilization information in top's output calculated, and is it accurate?

The /proc/stat file outputs the number of ticks accumulated by each metric as of a given point in time. To output percentages like top does, read the stat file at two points in time, T1 and T2, and then a simple arithmetic operation on the differences yields the current CPU utilization.
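The two-snapshot calculation can be sketched as follows. This is not the author's script, only a minimal Python illustration of the arithmetic: the function names are hypothetical, the two snapshots in the demo are made-up numbers, and only the first eight columns of the aggregate line are used.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_ticks(path: str = "/proc/stat") -> dict:
    """Read the aggregate 'cpu' line, which is the first line of the file."""
    with open(path) as f:
        first = f.readline().split()
    return dict(zip(FIELDS, map(int, first[1:1 + len(FIELDS)])))


def percentages(t1: dict, t2: dict) -> dict:
    """Share of each field in the ticks that elapsed between two snapshots."""
    deltas = {k: t2[k] - t1[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def sample(interval_s: float = 3.0) -> dict:
    """Take two live snapshots interval_s seconds apart (Linux only)."""
    t1 = read_cpu_ticks()
    time.sleep(interval_s)
    return percentages(t1, read_cpu_ticks())


# Offline demo with made-up snapshots so the arithmetic can be checked anywhere.
t1 = dict(user=1000, nice=100, system=500, idle=8000, iowait=200, irq=0, softirq=50, steal=0)
t2 = dict(user=1300, nice=150, system=600, idle=8500, iowait=250, irq=0, softirq=50, steal=0)
pct = percentages(t1, t2)
print({k: round(v, 1) for k, v in pct.items()})
```

On a Linux machine, calling `sample()` performs the same computation on live data with the 3-second interval that top uses by default.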

I have also provided a simple shell script that you can download and use to check the CPU utilization of your own server. I put it on my GitHub.

GitHub address: https://github.com/yanfeizhang/coder-kung-fu/blob/main/tests/cpu/test06/cpu_stat.sh

As for whether it is accurate: this statistical method is sampling, and anything based on sampling cannot be 100% accurate. However, since we usually look at CPU usage over a second or more, which covers a large number of sampling points, it is good enough for checking the overall picture.

In addition, we have learned from this article that the CPU time items in top's output can be roughly divided into three categories:

The first category: user-mode time, including user and nice. If you want to see the consumption in user mode, you should add up user and nice.

The second category: kernel-mode time, including irq, softirq and system.

The third category: idle time, including io_wait and idle. io_wait is also an idle state of the CPU, just one in which it is waiting for I/O to finish. If you just want to see how idle the CPU is, you should add up io_wait and idle.
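The three categories can be expressed as a small grouping function. The percentages in `top_line` are made-up values keyed by the field names used in this article (top itself abbreviates them as us, ni, sy, hi, si, id and wa), and `group_cpu_time` is a hypothetical helper.

```python
CATEGORIES = {
    "user_mode": ("user", "nice"),
    "kernel_mode": ("system", "irq", "softirq"),
    "idle": ("idle", "iowait"),
}


def group_cpu_time(values: dict) -> dict:
    """Sum per-field values (ticks or percentages) into the three categories."""
    return {
        name: sum(values.get(field, 0) for field in fields)
        for name, fields in CATEGORIES.items()
    }


# Made-up percentages for one refresh of top's CPU summary line.
top_line = {"user": 12.5, "nice": 2.5, "system": 6.0, "irq": 0.5,
            "softirq": 1.5, "idle": 70.0, "iowait": 7.0}
grouped = group_cpu_time(top_line)
print(grouped)
```

The same function works on raw tick deltas from /proc/stat, since it only adds fields together.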
