
How to Diagnose IO Wait with Linux System Monitoring and Diagnostic Tools


This article mainly shows how to diagnose IO wait with Linux system monitoring and diagnostic tools. The content is simple and clear, and I hope it can help resolve your doubts. Let me walk you through it step by step.

1. Question:

Recently, while working on real-time log synchronization, I had stress-tested a single online log stream before going live, and the message queue, the client, and the local machine were all fine. Unexpectedly, once the second log stream was added, the problem appeared:

top on one machine in the cluster showed a very high load. The machines in the cluster have identical hardware configurations and run the same software, yet only this one machine has a load problem. The preliminary guess was that there might be something wrong with the hardware.

At the same time, we also needed to find the culprit behind the abnormal load and then look for solutions at both the software and the hardware level.

2. Troubleshooting:

From top, you can see that the load average is high, %wa is high, and %us is very low:

From the output above, we can roughly infer that IO has hit a bottleneck. Below, we use the relevant IO diagnostic tools to verify this and troubleshoot further.

PS: if you don't know how to use top, please refer to a blog post I wrote last year:

Linux System Monitoring and Diagnostic Tools: top Explained in Detail

There are several common tool combinations, as follows (a minimal command sketch follows the list):

Use vmstat, sar, and iostat to check whether the CPU is the bottleneck.

Use free and vmstat to check whether memory is the bottleneck.

Use iostat and dmesg to check whether disk IO is the bottleneck.

Use netstat to check whether network bandwidth is the bottleneck.
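As a rough, minimal sketch of how these checks might be run (the sampling intervals, counts, and grep patterns below are illustrative choices, not taken from the original article):

# CPU: run queue length and %us/%sy/%wa/%id
vmstat 1 5
sar -u 1 5

# Memory: free/buffers/cache plus swap activity (si/so)
free -m
vmstat 1 5

# Disk IO: per-device utilization, plus kernel messages about the disks
iostat -x -k 1 5
dmesg | grep -i -E "error|sd[a-z]"

# Network: interface and protocol statistics
netstat -i
netstat -s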

2.1 vmstat

The vmstat command displays virtual memory statistics ("Virtual Memory Statistics"), but it can also report the overall running state of the system with respect to processes, memory, IO, and so on.
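For example (the interval and count here are arbitrary choices for illustration), the system can be sampled once per second:

# one report per second, five times; the first line is the average since boot
vmstat 1 5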

Its related fields are described as follows:

Procs (processes)
r: the number of processes in the run queue, i.e. tasks running or waiting for CPU time. This can also be used to judge whether more CPUs are needed (if it stays greater than 1 for a long time); when it exceeds the number of CPUs, there is a CPU bottleneck.
b: the number of processes waiting for IO, that is, processes in uninterruptible sleep.

Memory
swpd: the amount of virtual memory used. If swpd is not 0 but si and so stay at 0 for a long time, system performance is not affected.
free: the amount of free physical memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache. A large cache value means many files are cached; if frequently accessed files are served from cache, the disk read IO (bi) will be very small.

Swap
si: the amount of memory swapped in from disk per second (disk to memory).
so: the amount of memory swapped out to disk per second (memory to disk).
Note: when there is enough memory, these two values are both 0. If they stay above 0 for a long time, system performance is affected and disk IO and CPU resources are consumed. Some people see very little free memory (or close to 0) and conclude that memory is insufficient; do not judge from this alone, but look at si and so as well. If free is very small but si and so are also very small (mostly 0), then there is no need to worry; system performance is not affected.

IO (in current Linux versions the block size is 1 KB)
bi: blocks read per second.
bo: blocks written per second.
Note: for random disk reads and writes, the larger these two values (for example, above 1024 KB), the larger the CPU IO wait (wa) value you will see.

System
in: interrupts per second, including clock interrupts.
cs: context switches per second.
Note: the higher these two values, the more CPU time you will see consumed by the kernel.

CPU (expressed as percentages)
us: percentage of user-process execution time (user time). When us is high, user processes are consuming a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program's algorithms or speeding it up in other ways.
sy: percentage of kernel (system) execution time. When sy is high, the kernel is consuming a lot of CPU resources, which is not a healthy sign; the cause should be investigated.
wa: percentage of IO wait time. When wa is high, IO wait is serious, which may be caused by a large amount of random disk access or by a disk bottleneck (blocked operations).
id: percentage of idle time.

As can be seen from vmstat, the CPU spends most of its time waiting for IO, which may be caused by a large amount of random disk access or by limited disk bandwidth. bi and bo are also above 1024 KB, which points to an IO bottleneck.

2.2 iostat

Next, let's use a more specialized disk IO diagnostic tool, iostat, to look at the relevant statistics.
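A typical invocation might look like this (the interval and count are again arbitrary choices for illustration):

# extended per-device statistics, in kilobytes, once per second, five times
iostat -x -k 1 5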

Its related fields are described as follows:

rrqm/s: the number of read requests merged per second, i.e. delta(rmerge)/s.
wrqm/s: the number of write requests merged per second, i.e. delta(wmerge)/s.
r/s: the number of read I/O requests completed per second, i.e. delta(rio)/s.
w/s: the number of write I/O requests completed per second, i.e. delta(wio)/s.
rsec/s: sectors read per second, i.e. delta(rsect)/s.
wsec/s: sectors written per second, i.e. delta(wsect)/s.
rkB/s: kilobytes read per second. This is half of rsec/s, because each sector is 512 bytes (needs to be calculated).
wkB/s: kilobytes written per second. This is half of wsec/s (needs to be calculated).
avgrq-sz: the average size (in sectors) of each device I/O operation, i.e. delta(rsect+wsect)/delta(rio+wio).
avgqu-sz: the average I/O queue length, i.e. delta(aveq)/s/1000 (because aveq is in milliseconds).
await: the average wait time (in milliseconds) for each device I/O operation, i.e. delta(ruse+wuse)/delta(rio+wio).
svctm: the average service time (in milliseconds) for each device I/O operation, i.e. delta(use)/delta(rio+wio).
%util: the percentage of time in a second spent on I/O operations, or in other words, how much of each second the I/O queue is non-empty, i.e. delta(use)/s/1000 (because use is in milliseconds).
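As a small convenience sketch (not part of the original article; the 90% threshold and the device-name pattern are arbitrary), you can filter the iostat output for devices whose %util is close to saturation:

# print devices whose %util (the last column of iostat -x) exceeds 90
iostat -x -k 1 2 | awk '/^(sd|vd|dm-)/ { if ($NF+0 > 90) print $1, "%util=" $NF }'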

You can see that of the two hard drives, sdb has reached 100% utilization, which is a serious IO bottleneck. The next step is to find out which process is reading from and writing to this disk.

2.3 iotop

According to the results of iotop, we quickly located the culprit: the flume process, which was generating a large amount of IO wait.
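A hedged example of how this kind of check might be run (these are standard iotop options; batch mode and the sample count are arbitrary choices):

# show only processes actually doing IO, aggregated per process, in batch mode, three samples
iotop -o -P -b -n 3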

But as I said at the beginning, the machines in the cluster have the same configuration, and the deployed programs were all copied over with rsync and are exactly the same. Could the hard drive be broken?

This needed to be checked by the operations colleagues. Their conclusion was:

sdb is a two-disk RAID 1 array on an "LSI Logic / Symbios Logic SAS1068E" RAID card with no cache. An IOPS pressure of nearly 400 has reached its hardware limit. The other machines use an "LSI Logic / Symbios Logic MegaRAID SAS 1078" RAID card with 256 MB of cache, which has not hit its hardware bottleneck. The suggested solution is to replace the machine with one that can provide higher IOPS.
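For reference, the RAID controller model can be confirmed from the host itself with a standard tool such as lspci (the grep pattern is just an illustration):

# list PCI devices and filter for RAID / SAS controllers
lspci | grep -i -E "raid|sas"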

However, as mentioned earlier, the point of looking at both hardware and software is to see whether we can find the lowest-cost solution on each side:

Now that we know the hardware cause, we can try moving the read and write operations to another disk and then observe the effect:
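A minimal sketch of one way to do this, assuming the writer stores its data under a directory such as /data/flume on the saturated sdb and that another disk is mounted at /mnt/sda1 (the paths and the service name below are hypothetical, not from the original article):

# stop the writer first (service name is hypothetical)
service flume-ng-agent stop

# move the data directory to the other disk and leave a symlink behind
mv /data/flume /mnt/sda1/flume
ln -s /mnt/sda1/flume /data/flume

# restart and confirm with iostat that the load has moved off sdb
service flume-ng-agent start
iostat -x -k 1 5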

3. Extra: finding another way

In fact, besides using the specialized tools above to locate this problem, we can also use process state directly to find the relevant processes.

We know that a process can be in the following states:

PROCESS STATE CODES
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent

State D, the so-called "uninterruptible sleep", is generally caused by waiting on IO. We can start from this point and locate the problem step by step:

for x in `seq 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 248 [jbd2/dm-0-8]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 22 [kdmflush]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -d /tmp
----

# or
while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
Tue Aug 23 20:03:54 CLT 2011
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:55 CLT 2011
Tue Aug 23 20:03:56 CLT 2011

cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0

lsof -p 16528
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bonnie++ 16528 root cwd DIR 252,0 4096 130597 /tmp
bonnie++ 16528 root 8u REG 252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 9u REG 252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 10u REG 252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 11u REG 252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 12u REG 252,0 501219328 131869 /tmp/Bonnie.16528

df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/workstation-root 7667140 2628608 4653920 37% /

fuser -vm /tmp
USER PID ACCESS COMMAND
/tmp: db2fenc1 1067 ....m db2fmp
db2fenc1 1071 ....m db2fmp
db2fenc1 2560 ....m db2fmp
db2fenc1 5221 ....m db2fmp

The above is the full content of "How to Diagnose IO Wait with Linux System Monitoring and Diagnostic Tools". Thank you for reading! I hope it has given you a clearer understanding and that it proves helpful; if you want to learn more, you are welcome to follow the industry information channel.
