How to quickly analyze the performance of a Linux server

Updated 2025-07-08. From: SLTechnology News&Howtos (Shulou.com)

Many people without much experience do not know where to start when they need to analyze the performance of a Linux server quickly. This article summarizes the common causes of performance problems and the ways to identify them, in the hope that it helps you solve this problem.

The main task of a Linux system operator is to tune the system configuration so that applications run in the best possible state. However, hardware problems, software problems, the network environment and other factors are complex and changeable, which makes system optimization extremely complicated.

1 CPU performance evaluation

The CPU is one of the main factors that affect Linux performance. Here are a few commands for checking CPU performance.

1.1 vmstat command

This command displays a brief performance summary for the various resources of the system. Here we mainly use it to look at the CPU load.

The following is the output of the vmstat command on a system:

[root@node1]# vmstat 2 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 162240   8304  67032    0    0    13    21 1007   23  0  1 98  0  0
 0  0      0 162240   8304    ...    0    0     1     0 1010   20  0  1 ...  0  0
 0  0      0 162240   8304  67032    0    0     1   ... 1009   18  0  1 99  0  0

The output of each of the above items is explained as follows:

The procs item: the r column is the number of processes that are running or waiting for a CPU time slice. If this value stays above the number of CPUs in the system for a long time, CPU capacity is insufficient and more CPU is needed. The b column is the number of processes blocked waiting for a resource, such as I/O or memory swapping.

The memory item: the swpd column is the amount of memory (in KB) that has been moved to the swap area. If swpd is non-zero, or even fairly large, there is generally nothing to worry about as long as si and so stay at 0 over time; system performance is not affected. The free column is the amount of physical memory currently free (in KB). The buff column is the amount of memory used as buffer cache, which generally buffers reads and writes to block devices. The cache column is the amount of memory used as page cache, which generally caches file system data, so frequently accessed files end up cached. A high cache value means that many files are cached; if bi in the io item is small at the same time, the file system cache is working efficiently.

The swap item: the si column is the amount of memory swapped in from disk, that is, read back from the swap area into memory. The so column is the amount of memory swapped out to disk, that is, written from memory to the swap area.

In general, si and so are 0. If they stay above 0 for a long time, the system is short of memory and more memory needs to be added.

The io item shows disk read and write activity. The bi column is the total amount of data read from block devices (that is, disk reads), in KB per second. The bo column is the total amount of data written to block devices (that is, disk writes), in KB per second.

Here we use 1000 as the reference value for bi+bo. If bi+bo exceeds 1000 and the wa value is also high, the system has a disk I/O problem, and you should consider improving the read and write performance of the disks.

The system item shows interrupts and context switches during the sampling interval. The in column is the number of device interrupts per second observed during the interval. The cs column is the number of context switches per second.

The higher the above two values, the more CPU time will be consumed by the kernel.

The cpu item shows how the CPU is being used, and it is the part we focus on. The us column is the percentage of CPU time consumed by user processes. A high us value means that user processes are consuming a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program or its algorithms. The sy column is the percentage of CPU time consumed by the kernel. A high sy value means that the kernel is consuming a lot of CPU resources.

As a rule of thumb, the reference value of us+sy is 80%. If us+sy is greater than 80%, there may be insufficient CPU resources.

The id column is the percentage of time the CPU is idle. The wa column is the percentage of CPU time spent waiting for I/O. The higher the wa value, the more serious the I/O wait. As a rule of thumb, the reference value for wa is 20%; above that, the I/O wait is serious. I/O wait may be caused by a large number of random disk reads and writes, or by a bandwidth bottleneck in the disk or disk controller (mainly block operations).

To sum up, in the evaluation of CPU, it is important to pay attention to the values of the r column of the procs item and the values of the us, sy, and id columns of the CPU item.
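The reference values in this section can be collected into a small check. The following Python sketch is only an illustration: the function names are made up for this example, "wa is high" is assumed to mean the same 20% threshold, and the article's rules apply to values sustained over time, whereas the sketch looks at a single sample. The input row is the first sample from the vmstat output above.

```python
# Column names in the order vmstat prints them.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat_row(line):
    """Turn one vmstat data row into a dict of column name -> int."""
    values = [int(field) for field in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(VMSTAT_COLUMNS), len(values)))
    return dict(zip(VMSTAT_COLUMNS, values))


def vmstat_warnings(sample, cpu_count):
    """Apply the article's rules of thumb to one parsed sample."""
    warnings = []
    if sample["r"] > cpu_count:
        warnings.append("run queue exceeds CPU count")
    if sample["us"] + sample["sy"] > 80:
        warnings.append("us+sy above 80%")
    if sample["wa"] > 20:
        warnings.append("I/O wait above 20%")
    # Assumption: "high wa" is read as the same 20% reference value.
    if sample["bi"] + sample["bo"] > 1000 and sample["wa"] > 20:
        warnings.append("heavy disk I/O with high wait")
    if sample["si"] > 0 or sample["so"] > 0:
        warnings.append("swapping in progress")
    return warnings


# First data row of the sample output; one CPU is assumed for this demo.
row = parse_vmstat_row("0 0 0 162240 8304 67032 0 0 13 21 1007 23 0 1 98 0 0")
print(vmstat_warnings(row, cpu_count=1))  # prints []
```

In practice you would feed several consecutive samples through the check and only react when the same warning keeps appearing.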

1.2 sar command

The second tool for checking CPU performance is sar. sar is very powerful and can report on every aspect of the system separately. Running sar adds some system overhead, but this overhead can be measured and does not have much impact on the statistics.

The following is the CPU statistical output of the sar command for a system:

[root@webserver]# sar -u 3 5
Linux 2.6.9-42.ELsmp (webserver)    11/28/2008    _i686_    (8 CPU)

11:41:24 AM  CPU   %user   %nice  %system  %iowait  %steal   %idle
11:41:27 AM  all    0.88    0.00      ...     0.00    0.00   98.83
11:41:30 AM  all    0.13    0.00      ...     0.21    0.00   99.50
11:41:33 AM  all    0.04    0.00     0.04     0.00    0.00   99.92
11:41:36 AM  all    0.29    0.00     0.13     0.00    0.00   99.58
11:41:39 AM  all    0.38    0.00     0.17     0.04    0.00   99.41
Average:     all    0.34    0.00     0.16     0.05    0.00   99.45

The output of each of the above items is explained as follows:

The % user column shows the percentage of CPU time consumed by the user process. The % nice column shows the percentage of CPU time spent running a normal process. The % system column shows the percentage of CPU time consumed by the system process. The % iowait column shows the percentage of CPU time spent waiting by IO. The % steal column shows the steal operations that pagein forces on different pages in a relatively memory-tight environment. The % idle column shows the percentage of time that CPU has been idle.

This output is a statistic of the overall CPU usage of the system, the output of each item is very intuitive, and the last line of Average is a summary line, which is an average of the above statistics.

It is important to note that the statistics in the first row contain the statistical consumption of sar itself, so the value of the% user column will be a little higher, but this will not have much impact on the statistical results.

In a multi-CPU system, if the program uses a single thread, there will be such a phenomenon that the overall utilization rate of CPU is not high, but the response of the system application is slow. This may be due to the reason that the program uses a single thread, which only uses one CPU, resulting in the CPU occupancy rate of 100%, unable to process other requests, while other CPU is idle, which leads to the low overall CPU utilization rate. The phenomenon of slow application occurs.

To solve this problem, you can query each CPU of the system separately and count the usage of each CPU:

[root@webserver]# sar -P 0 3 5
Linux 2.6.9-42.ELsmp (webserver)    11/29/2008    _i686_    (8 CPU)

06:29:33 PM  CPU   %user   %nice  %system  %iowait  %steal   %idle
06:29:36 PM    0    3.00    0.00     0.33     0.00    0.00   96.67
06:29:39 PM    0    0.67    0.33     0.00     0.00    0.00   99.00
06:29:42 PM    0    0.00    0.00     0.33     0.00    0.00   99.67
06:29:45 PM    0    0.67    0.00     0.33     0.00    0.00   99.00
06:29:48 PM    0    1.00    0.00     0.33     0.33    0.00   98.34
Average:       0    1.07    0.00     0.33     0.07    0.00   98.53

This output shows the statistics for the first CPU of the system. Note that sar numbers CPUs from 0, so "sar -P 0 3 5" reports on the first CPU of the system, "sar -P 4 3 5" reports on the fifth CPU, and so on. As you can see from the output, this system has eight CPUs.
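The single-thread symptom described above (low overall utilization, one saturated CPU) can be checked mechanically once per-CPU idle figures are available, for example from "sar -P ALL 3 5". The Python sketch below is only an illustration: the function names, the 90% threshold and the idle values are hypothetical and not taken from the system in this article.

```python
def find_saturated_cpus(idle_by_cpu, busy_threshold=90.0):
    """Return the CPUs whose utilization (100 - %idle) reaches the threshold."""
    return sorted(cpu for cpu, idle in idle_by_cpu.items()
                  if 100.0 - idle >= busy_threshold)


def overall_utilization(idle_by_cpu):
    """Average utilization across all CPUs, in percent."""
    return sum(100.0 - idle for idle in idle_by_cpu.values()) / len(idle_by_cpu)


# Hypothetical %idle values for an 8-CPU machine: CPU 3 is pinned by one thread.
idle = {0: 98.5, 1: 99.0, 2: 97.5, 3: 0.5, 4: 99.0, 5: 98.0, 6: 99.5, 7: 98.0}

print(overall_utilization(idle))   # prints 13.75
print(find_saturated_cpus(idle))   # prints [3]
```

An overall figure of under 14% would look healthy in the "all" row, which is why the per-CPU view is needed to spot the saturated CPU.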

1.3 iostat command

The iostat command is mainly used to report disk I/O statistics, but it can also show CPU usage. Its limitation is that it only displays the average across all CPUs in the system. See the following output:

[root@webserver]# iostat -c
Linux 2.6.9-42.ELsmp (webserver) 11

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.52    0.00    0.30    0.24    0.00   96.96

Here we use the "-c" option, which displays only the CPU statistics. Each item in the output has exactly the same meaning as the corresponding item in the sar output, so we will not go into detail again.

1.4 uptime command

uptime is the most commonly used command for monitoring system performance; it reports the current running state of the system. The output shows, in order: the current system time, how long the system has been running since the last boot, how many users are currently logged in, and the system load averages over the last one, five and fifteen minutes. Take a look at the following output:

[root@webserver]# uptime
 18:52:11 up 27 days, 19:44, 2 users, load average: 0.12, 0.08, 0.08

Pay attention to the load average values. As a general rule, these three values should not stay above the number of CPUs in the system. The system in this example has 8 CPUs: if all three load averages stay above 8 for a long time, the CPUs are very busy and the load is high, which may affect system performance. An occasional value above 8 is nothing to worry about and generally does not affect performance. Conversely, if the load averages are below the number of CPUs, the CPUs still have free time slices. In this example the CPUs are almost idle.
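The comparison between the load averages and the CPU count can be written as a short check. This Python sketch is only an illustration: the function names are made up for this example, and it parses a captured uptime line instead of querying the system. On a live machine, os.cpu_count() could supply the CPU count.

```python
import re


def parse_load_averages(uptime_line):
    """Extract the 1-, 5- and 15-minute load averages from uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s*([\d.]+),?\s*([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load average found")
    return tuple(float(value) for value in match.groups())


def is_overloaded(uptime_line, cpu_count):
    """True when all three load averages exceed the number of CPUs."""
    return all(load > cpu_count for load in parse_load_averages(uptime_line))


# The uptime line from the example above, on the 8-CPU system.
line = "18:52:11 up 27 days, 19:44, 2 users, load average: 0.12,0.08,0.08"
print(parse_load_averages(line))         # prints (0.12, 0.08, 0.08)
print(is_overloaded(line, cpu_count=8))  # prints False
```

Requiring all three values to exceed the CPU count is one way to express "for a long time": a brief spike raises the one-minute figure well before the fifteen-minute one.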

1.5 Summary of this section

The four commands above all check CPU usage. What they tell you is whether the CPU is a performance bottleneck: they show whether the CPU is busy and whether the load is too heavy, but not why the CPU is overloaded. Once you have determined that there is a CPU problem, use top, ps and similar commands to find out which processes are causing the load. A shortage of CPU resources may be caused by a badly behaved application or by insufficient hardware, so each case has to be analyzed on its own merits, and the fix is either to optimize the application or to add CPU resources.
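As a follow-up step, the busiest processes can be picked out of ps output. The Python sketch below is only an illustration: the function name is made up, and the process list is hypothetical sample text in the shape produced by "ps -eo pid,pcpu,comm". Keep in mind that ps reports CPU time divided by the lifetime of the process, so top is the better tool for seeing what is busy right now.

```python
def top_cpu_processes(ps_output, limit=3):
    """Return (pid, pcpu, command) tuples sorted by CPU share, highest first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        pid, pcpu, command = line.split(None, 2)
        rows.append((int(pid), float(pcpu), command))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


# Hypothetical output of: ps -eo pid,pcpu,comm
sample = """
  PID %CPU COMMAND
    1  0.0 init
 2314 97.5 java
 2290  1.2 httpd
 4501  0.3 sshd
"""

print(top_cpu_processes(sample, limit=2))
# prints [(2314, 97.5, 'java'), (2290, 1.2, 'httpd')]
```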

2 Memory performance evaluation

Memory management and optimization is an important part of system performance tuning: whether memory is sufficient directly affects application performance. Before tuning memory, you must be familiar with the Linux memory management mechanism, which was described in depth in the previous chapter. This section focuses on how to monitor the memory usage of a Linux system with system commands.

2.1 free command

free is the most commonly used command for monitoring Linux memory usage. Look at the following output:

[root@webserver]# free -m
             total       used       free     shared    buffers     cached
Mem:          8111       7185        925          0        243       6299
-/+ buffers/cache:        643       7468
Swap:         8189          0       8189

"free -m" shows memory usage in megabytes. In this output, focus on the free and cached columns. The system has 8 GB of memory in total and 925 MB free, while the buffer cache occupies 243 MB and the page cache occupies 6299 MB. This shows that the system has cached a lot of files and directories, and that 7468 MB of memory is still available to applications. This 7468 MB includes the buffer cache and page cache, which the kernel can reclaim when applications need the memory. The Swap line shows that the swap partition is not in use yet. From the application point of view, therefore, this system still has plenty of memory.

There is a common rule of thumb: when application-available memory / system physical memory is above 70%, memory resources are ample and system performance is not affected; and when application-available memory / system physical memory
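The ratio used in this rule of thumb can be computed directly from the free output. The Python sketch below is only an illustration: the function name is made up for this example, and application-available memory is taken as free + buffers + cached, which matches the "-/+ buffers/cache" line of this older free format. The numbers are the ones from the free -m output in this section.

```python
def application_available_ratio(total_mb, free_mb, buffers_mb, cached_mb):
    """Share of physical memory that applications can still obtain."""
    available_mb = free_mb + buffers_mb + cached_mb
    return available_mb / total_mb


# Values from the "Mem:" line of the free -m output in this section.
ratio = application_available_ratio(total_mb=8111, free_mb=925,
                                    buffers_mb=243, cached_mb=6299)
print(round(ratio * 100, 1))  # prints 92.1
print(ratio > 0.70)           # prints True
```

The three columns add up to 7467 MB instead of the 7468 MB printed by free, because free rounds each column to whole megabytes separately. At about 92%, the ratio is well above the 70% reference value.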
