How to diagnose server performance with the segmented troubleshooting method

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to use the segmented troubleshooting method to diagnose server performance.

Day-to-day performance testing calls for a few standard strategies and for familiarity with the commands that show how a server is performing, so that problems can be located and diagnosed.

The focus here is on using those commands to diagnose server performance quickly during a stress test.

The segmented troubleshooting method for a Linux server rules out bottlenecks one segment at a time, in the order CPU, memory, disk I/O, and network.

[Figure: reference flow chart of the troubleshooting order]

Analysis steps:

Step 1: use top to get an overview of the system:

Focus on CPU utilization (us + sy). If it stays above 80%, look at the process view to see whether non-core application processes are consuming a lot of CPU. If those can be ruled out, the system is most likely short of CPU resources; vmstat should then show a large r value in the procs columns. Likewise, if CPU utilization is very low but the run queue (r) is very large, this indicates blocking at the CPU.

Focus on %idle (the percentage of time the CPU is idle). If it is high but the system responds slowly, the CPU may be waiting on memory allocation, so check memory usage (see step 3). If %idle stays at 0 and system time (sy) is twice user time (us), the system is short of CPU resources.

Focus on %wait (wa, the percentage of time the CPU waits for I/O). If CPU resources are not exhausted, a high value suggests a bottleneck in storage I/O. Possible causes are: (1) an application problem (the application itself issues many I/O requests); (2) insufficient physical memory; (3) an inefficient I/O subsystem configuration. Check the application first, then the system's memory usage. If there is heavy paging, the disk I/O problem is most likely caused by insufficient physical memory (see step 3); if not, examine the disks with iostat to confirm that the application is generating too much I/O (see step 4).
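The checks in step 1 can be sketched in Python as follows. This is a minimal illustration, assuming the summary line has the shape printed by procps top ("%Cpu(s): ... us, ... sy, ..."). The function names are hypothetical, the 80% threshold comes from the text above, and the I/O wait threshold borrows the 25% rule of thumb given in step 4.

```python
import re

# Rules of thumb from the article, not hard limits.
CPU_BUSY_PCT = 80.0   # us + sy above this suggests a CPU shortage
IOWAIT_PCT = 25.0     # wa above this suggests a storage I/O bottleneck


def parse_top_cpu_line(line):
    """Turn a top CPU summary line into a dict such as {'us': 70.5, 'sy': 15.2}.

    Handles both '70.5 us' and the older '70.5%us' spelling.
    """
    pairs = re.findall(r"(\d+(?:\.\d+)?)[ %]*([a-z]{2})\b", line)
    return {name: float(value) for value, name in pairs}


def first_suspect(cpu):
    """Return 'cpu', 'io' or 'none' for one parsed sample (hypothetical helper)."""
    if cpu.get("us", 0.0) + cpu.get("sy", 0.0) > CPU_BUSY_PCT:
        return "cpu"
    if cpu.get("wa", 0.0) > IOWAIT_PCT:
        return "io"
    return "none"


# Illustrative sample line, not captured from a real server.
SAMPLE = "%Cpu(s): 70.5 us, 15.2 sy,  0.0 ni, 12.0 id,  2.3 wa,  0.0 hi,  0.0 si,  0.0 st"
print(first_suspect(parse_top_cpu_line(SAMPLE)))  # prints "cpu"
```

A single sample proves little. The article's criterion is that the value stays high, so in practice the check would be repeated over several samples taken during the stress test.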

Step 2: use vmstat and sar to monitor the CPU:

Focus on the cpu columns and the 2 procs (kernel threads) columns in the vmstat report.

r: the number of processes running on or waiting for the CPU. It represents CPU load better than the load average because it does not include processes waiting for I/O. If r is greater than the number of logical CPU cores, processes are queuing for the CPU, the system runs slowly, and CPU resources are saturated.

us, sy, id, wa, st: the breakdown of CPU time into user time (user), system (kernel) time (sys), idle time (idle), I/O wait time (wait), and stolen time (stolen, generally consumed by other virtual machines). These values quickly show whether the CPU is busy. In general, if user time plus system time is very high, the CPU is busy executing instructions. If I/O wait time is high, the bottleneck may be disk I/O.

sar -P ALL: reports each CPU separately, so you can see per-CPU usage and check whether the load is balanced across CPUs.
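The run-queue rule above (r greater than the number of logical cores) can be checked mechanically. The sketch below assumes the usual two-line vmstat header followed by numeric rows. The function names and the sample figures are hypothetical.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[:2] == ["r", "b"]:
            # Column-name line; vmstat repeats it periodically.
            header = tokens
        elif (header and len(tokens) == len(header)
              and all(t.isdigit() for t in tokens)):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def cpu_saturated(rows, cores):
    """True when r exceeds the logical core count in every sample."""
    return bool(rows) and all(row["r"] > cores for row in rows)


# Illustrative output in the shape of `vmstat 1 2`; the numbers are made up.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  0      0 801234  10240 512000    0    0     5    10  100  200 88 10  2  0  0
 7  0      0 800100  10240 512100    0    0     0    12  120  260 85 12  3  0  0
"""

rows = parse_vmstat(SAMPLE)
print(cpu_saturated(rows, cores=4))  # prints True
```

On a live system the text could come from running vmstat through subprocess and the core count from os.cpu_count(). Bear in mind that the first row vmstat prints reports averages since boot, so it is usually better to judge from the later rows.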

The first two steps are usually enough to tell whether the CPU is a bottleneck:

If CPU resources are insufficient, tune how the application uses the CPU so that it uses it more efficiently, and consider adding more CPUs.

If the CPU is not the bottleneck, move on to system memory.

Step 3: use vmstat to view memory usage:

Each output line carries a set of core system indicators that give a more detailed picture of the system's state. The command is followed by two parameters: the first, 1, prints statistics once per second; the second, 2, prints them 2 times in total. The header names each column; only the columns relevant to memory tuning are described here:

Memory region

swpd: the amount of virtual memory in use, that is, memory swapped out to the swap area (in KB). A value greater than 0 means the machine has run short of physical memory. If a memory leak in a program is not the cause, upgrade the memory or move memory-hungry tasks to other machines.

free: the amount of free physical memory (in KB). Too little free memory can also cause performance problems.

buff: the amount of memory used as buffers, generally needed for reads and writes to block devices.

cache: the amount of memory used as page cache. It generally caches file system data, so frequently accessed files stay cached. A large cache value means many files are cached; if bi in the io columns is small at the same time, the file system is working efficiently.

Swap region

si: the amount of memory swapped in from disk per second, that is, data read back from the swap area into memory. If this value stays above 0, physical memory is insufficient or memory is leaking; find the memory-hungry process and fix it.

so: the amount of memory swapped out to disk per second, that is, data written from memory to the swap area.

Note: si and so are normally 0. If they stay above 0 for a long time, system memory is insufficient and needs to be increased.
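The note on si and so translates into a simple rule over a series of samples. The sketch below assumes each sample is a dict with si and so keys, for example one parsed vmstat row. The function name and the three labels are hypothetical.

```python
def swap_activity(samples):
    """Classify swap traffic across samples as 'none', 'intermittent' or 'sustained'.

    'sustained' means si or so was above 0 in every sample, the case the
    article associates with insufficient memory.
    """
    if not samples:
        return "none"
    active = [s for s in samples if s["si"] > 0 or s["so"] > 0]
    if not active:
        return "none"
    if len(active) == len(samples):
        return "sustained"
    return "intermittent"


# Made-up samples: swap-in and swap-out are nonzero throughout.
samples = [
    {"si": 120, "so": 340},
    {"si": 96, "so": 0},
    {"si": 0, "so": 512},
]
print(swap_activity(samples))  # prints "sustained"
```

How many samples count as "a long time" is a judgment call that the article leaves open; collecting samples over the whole stress-test run is one reasonable choice.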

Step 4: use iostat to view disk I/O:

tps: the number of transfers per second issued to the device. One transfer is one I/O request to the device. Multiple logical requests may be merged into a single I/O request, and the size of a transfer is indeterminate.

kB_read/s: the amount of data read from the device per second, in kilobytes

kB_wrtn/s: the amount of data written to the device per second, in kilobytes

kB_read: the total amount of data read, in kilobytes

kB_wrtn: the total amount of data written, in kilobytes

Pay attention to %iowait. If neither CPU nor memory is constrained and %iowait stays above 25% for a long time, I/O can be considered a bottleneck.

Collect disk throughput data (iostat -d -k), roughly estimate whether the throughput matches the application load, and check for large volumes of I/O unrelated to the business workload.
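To compare throughput with the application load, the per-device figures have to be read out of the report. The sketch below parses text in the shape of `iostat -d -k` output, using whatever column names appear in the header line. The function names and the sample figures are hypothetical.

```python
def parse_iostat(text):
    """Parse `iostat -d -k` style output into {device: {column: value}}."""
    header = None
    devices = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].rstrip(":") == "Device":
            # Older sysstat versions print 'Device:' with a colon.
            header = tokens[1:]
        elif header and len(tokens) == len(header) + 1:
            try:
                values = [float(t) for t in tokens[1:]]
            except ValueError:
                continue  # not a data row
            devices[tokens[0]] = dict(zip(header, values))
    return devices


def total_kb_per_second(devices):
    """Sum read and write throughput (kB/s) over all devices."""
    return sum(d["kB_read/s"] + d["kB_wrtn/s"] for d in devices.values())


# Illustrative report; device names and numbers are made up.
SAMPLE = """\
Linux 5.15.0 (host1)    07/02/2025      _x86_64_        (4 CPU)

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              12.50       100.25       250.75    1000000    2500000
sdb               0.10         1.00         0.00      10000          0
"""

devices = parse_iostat(SAMPLE)
print(total_kb_per_second(devices))  # prints 352.0
```

Whether 352 kB/s is reasonable depends entirely on what the application is expected to read and write; the code only produces the number to compare.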

Step 5: use sar -d to view disk reads and writes:

Where:

tps: the number of transfers per second issued to the physical disk. Multiple logical requests may be merged into a single I/O request, and the size of a transfer is indeterminate.

rd_sec/s: the number of sectors read per second.

wr_sec/s: the number of sectors written per second.

avgrq-sz: the average size (in sectors) of the I/O requests issued to the device.

avgqu-sz: the average length of the disk request queue.

await: the average time, in milliseconds, for an I/O request to be served, including both the time spent waiting in the request queue and the time spent servicing the request.

svctm: the average time the system spends servicing each request, excluding time spent in the request queue.

%util: the percentage of time during which I/O requests were being issued to the device; the higher the value, the closer the device is to saturation.

Under normal circumstances, svctm should be less than await. If svctm is very close to await, requests spend almost no time waiting and disk performance is good. If await is much higher than svctm, requests wait too long in the I/O queue, applications running on the system slow down, and disk I/O is the system bottleneck.
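The comparison between await and svctm can be expressed as a small function. This is only a sketch: the article does not say how much higher counts as "much higher", so the factor of 2 below is an assumption, and the function name is hypothetical. The sysstat documentation has also long warned that svctm should not be trusted, so the result is a rough hint at best.

```python
def disk_queue_verdict(await_ms, svctm_ms, factor=2.0):
    """Compare average request time (await) with average service time (svctm).

    Returns (verdict, queue_ms), where queue_ms is the estimated time a
    request spends waiting in the queue. `factor` is an assumed cut-off
    for "await is much higher than svctm".
    """
    queue_ms = max(await_ms - svctm_ms, 0.0)
    if queue_ms > 0 and await_ms > factor * svctm_ms:
        return "queueing", queue_ms
    return "ok", queue_ms


# Made-up figures in milliseconds.
print(disk_queue_verdict(40.0, 5.0))    # prints ('queueing', 35.0)
print(disk_queue_verdict(5.2, 5.0)[0])  # prints ok
```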

Step 6: use netstat to view the network:

1. Check network connectivity with the ping command.

2. Check the state of network services with the netstat -nltp option combination.

-u (udp) shows only UDP sockets

-l lists only sockets in the LISTEN state

-n shows addresses and ports numerically instead of resolving names

-t (tcp) shows only TCP sockets

-p shows the PID and name of the program that owns each socket

3. Inspect the system routing table with netstat -r.
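The listing from netstat -nltp is easy to turn into structured data, for example to confirm that the service under test is listening on the expected port. The sketch below assumes the net-tools column layout shown in the sample. The function name, PIDs, and programs are hypothetical.

```python
def parse_listeners(text):
    """Extract listening TCP sockets from `netstat -nltp` style output."""
    listeners = []
    for line in text.splitlines():
        # At most 7 fields; the last one may contain spaces ("nginx: master").
        parts = line.strip().split(None, 6)
        if len(parts) < 7 or not parts[0].startswith("tcp") or parts[5] != "LISTEN":
            continue
        address, _, port = parts[3].rpartition(":")
        pid, _, program = parts[6].partition("/")
        listeners.append({
            "proto": parts[0],
            "address": address,
            "port": int(port),
            # netstat prints "-" when the owner is not visible to the caller.
            "pid": int(pid) if pid.isdigit() else None,
            "program": program or None,
        })
    return listeners


# Illustrative output; the processes are made up.
SAMPLE = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1234/sshd
tcp6       0      0 :::80                   :::*                    LISTEN      5678/nginx: master
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      -
"""

for entry in parse_listeners(SAMPLE):
    print(entry["port"], entry["program"])
```

The program column is filled in only for sockets the calling user is allowed to inspect, which is why the last sample row has no PID or program name.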

Common signs of a CPU bottleneck:

Slow response times

CPU idle time is zero

User CPU time is too high

System CPU time is too high

The run queue stays long for extended periods

CPU tuning methods:

Balance the system load: run processes at different times so that all 24 hours of the day are used more efficiently.

Use nice or renice to adjust scheduling: running processes can be given different priorities so that low-priority work does not consume large amounts of CPU.

Add more resources: add more CPUs
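The nice and renice adjustment mentioned above is also available from Python's standard library on Unix-like systems. The sketch below uses os.nice, os.setpriority, and os.getpriority. The two wrapper names are hypothetical.

```python
import os


def lower_own_priority(increment=5):
    """Raise this process's nice value by `increment`; returns the new value.

    A higher nice value means a lower scheduling priority.
    """
    return os.nice(increment)


def renice(pid, nice_value):
    """Set the nice value of process `pid`, like `renice -n VALUE -p PID`.

    Lowering a nice value (raising priority) normally requires privileges.
    """
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)


current = os.nice(0)            # an increment of 0 just reads the current value
lowered = lower_own_priority(1)
print(current, lowered)
```

A batch job that calls lower_own_priority at start-up gives way to the core application whenever both compete for the CPU, which is the effect the tuning advice is after.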

Common signs of a memory bottleneck:

A high paging rate

Very high swap space utilization

Processes become inactive

Very high activity on all disks holding swap space

High overall system CPU utilization

Out-of-memory errors

Memory tuning methods:

Allocate swap space sensibly (allocate enough swap, keep each swap area the same size, and place each swap area on a different disk)

Adjust memory parameters and thresholds

Add more memory

Common signs of an I/O bottleneck:

Excessive disk utilization

Disk wait queue is too long

The percentage of time spent waiting for disk I/O is too high

Physical I/O rate is too high

Cache hit rate is too low

Run queue is too long while the CPU is idle

I/O tuning methods:

In general, a high %iowait indicates at least one of the following: an application problem, a lack of memory, or an inefficient I/O subsystem configuration.

Check whether the large number of I/O requests generated by the application is justified

Check whether the I/O problem is caused by frequent paging to swap space

Check whether the disk configuration is reasonable

As for optimizing disk I/O itself, although there are I/O tuning parameters comparable to the virtual memory ones, the way to improve disk I/O performance is still to configure the Linux system correctly, not merely to tune parameters.
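Putting the segments together, the overall order described in this article (CPU, then memory, then disk I/O, then network) can be sketched as a single function that returns the first segment that looks suspicious. The dictionary keys are hypothetical names for figures gathered in steps 1 to 6, and the thresholds are the article's rules of thumb.

```python
def diagnose(metrics):
    """Return the first suspect segment, or None if every check passes.

    Expected keys (hypothetical names): cpu_busy_pct (us + sy), run_queue
    (vmstat r), cores, si, so, iowait_pct and network_ok (e.g. ping result).
    Values should describe sustained behaviour, not a single spike.
    """
    # 1. CPU: utilization above 80% or more runnable processes than cores.
    if metrics["cpu_busy_pct"] > 80 or metrics["run_queue"] > metrics["cores"]:
        return "cpu"
    # 2. Memory: ongoing swap-in or swap-out traffic.
    if metrics["si"] > 0 or metrics["so"] > 0:
        return "memory"
    # 3. Disk I/O: iowait above 25% once CPU and memory are ruled out.
    if metrics["iowait_pct"] > 25:
        return "disk"
    # 4. Network: connectivity or service checks failed.
    if not metrics["network_ok"]:
        return "network"
    return None


# Made-up figures: CPU and memory look fine, I/O wait is high.
example = {
    "cpu_busy_pct": 35, "run_queue": 2, "cores": 4,
    "si": 0, "so": 0, "iowait_pct": 40, "network_ok": True,
}
print(diagnose(example))  # prints "disk"
```

Because the checks run in a fixed order, a memory shortage that also drives up I/O wait is reported as a memory problem first, which matches the article's advice to rule out paging before blaming the disks.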

That is how to use the segmented troubleshooting method to diagnose server performance. If you run into similar questions, the analysis above can serve as a reference.
