This article explains in detail what iostat and iowait are and how to read them. The approach described here is simple, fast, and practical.
iostat and iowait explained in detail
%iowait does not reflect a disk bottleneck
What iowait actually measures is CPU time:
%iowait = (time the CPU is idle while I/O requests are outstanding) / (total CPU time)
This shows that a fast CPU can produce a high iowait value, but a high iowait does not by itself mean the disk is the system bottleneck. The only reliable sign that the disk is the bottleneck is a very high average read/write time, generally above 20ms, which indicates abnormal disk performance. Why 20ms? Roughly speaking, one read or write costs seek time + rotational delay + data transfer time. On a modern hard disk the data transfer takes only microseconds to tens of microseconds, far less than the 2-20ms seek time and 4-8ms rotational delay, so only those two terms matter, giving roughly 15-20ms per operation. Once the average exceeds 20ms, you must consider whether too many read/write requests are being sent to the disk, degrading its performance.
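To sanity-check the 15-20ms figure, here is a small back-of-the-envelope calculation in Python (the 7200 RPM spindle speed is an assumed typical value, not something from the article):

# Rough check of the 15-20ms estimate for one random disk I/O.
def avg_rotational_delay_ms(rpm):
    # On average the platter must spin half a revolution to reach the sector.
    return 0.5 * 60_000.0 / rpm

seek_ms = (2 + 20) / 2                  # midpoint of the 2-20ms seek range
rot_ms = avg_rotational_delay_ms(7200)  # ~4.17ms, inside the 4-8ms range
print("~%.1fms per random I/O" % (seek_ms + rot_ms))  # ~15.2ms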
The author takes AIX as an example and uses its filemon tool to measure the average time spent on disk reads and writes. Under Linux you can inspect disk performance with the iostat command. Its svctm field reflects disk load: if svctm is greater than 15ms and %util is close to 100%, the disk is the bottleneck of overall system performance.
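As a minimal illustration of that rule of thumb, a Python sketch (the device rows are hypothetical sample values, and the 90% cutoff is an assumed reading of "close to 100%"):

# svctm > 15ms with %util near 100% points to the disk as the bottleneck.
def disk_is_bottleneck(svctm_ms, util_pct):
    return svctm_ms > 15.0 and util_pct >= 90.0

for dev, svctm, util in [("sda", 7.96, 84.40), ("sdX", 18.30, 99.10)]:
    print(dev, "bottleneck" if disk_is_bottleneck(svctm, util) else "ok")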
Using iostat to understand Linux disk I/O performance
I had never made much use of this tool before, but I recently studied iostat carefully, and since an important server happened to be under heavy load, I used it for the analysis. Below is the server whose I/O is under too much pressure.
$ iostat -x 1
Linux 2.6.33-fukai (fukai-laptop)  _i686_  (2 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.47    0.50    8.96   48.26    0.00   36.82
Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda        6.00  273.00  99.00   7.00  2240.00  2240.00    42.26     1.12   10.57   7.96  84.40
sdb        0.00    4.00   0.00 350.00     0.00  2068.00     5.91     0.55    1.58   0.54  18.80
rrqm/s: the number of read requests merged per second, i.e. delta(rmerge)/s
wrqm/s: the number of write requests merged per second, i.e. delta(wmerge)/s
r/s: the number of read I/O requests completed per second by the device, i.e. delta(rio)/s
w/s: the number of write I/O requests completed per second by the device, i.e. delta(wio)/s
rsec/s: the number of sectors read per second, i.e. delta(rsect)/s
wsec/s: the number of sectors written per second, i.e. delta(wsect)/s
rkB/s: kilobytes read per second; half of rsec/s, because each sector is 512 bytes. (needs to be computed manually)
wkB/s: kilobytes written per second; half of wsec/s. (needs to be computed manually)
avgrq-sz: the average size (in sectors) of each device I/O operation, i.e. delta(rsect+wsect)/delta(rio+wio)
avgqu-sz: the average I/O queue length, i.e. delta(aveq)/s/1000 (because aveq is in milliseconds)
await: the average time (in milliseconds) each device I/O operation waits, including service, i.e. delta(ruse+wuse)/delta(rio+wio)
svctm: the average service time (in milliseconds) of each device I/O operation, i.e. delta(use)/delta(rio+wio)
%util: the percentage of each second spent on I/O, i.e. how much of each second the I/O queue is non-empty, i.e. delta(use)/s/1000 (because use is in milliseconds)
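To make the delta() formulas above concrete, here is a rough sketch that samples /proc/diskstats twice and derives several of these fields. It assumes the classic field layout (reads, reads merged, sectors read, ms reading, writes, writes merged, sectors written, ms writing, I/Os in flight, ms doing I/O, weighted ms) and the device name "sda"; it is for illustration only, not a replacement for iostat:

import time

def sample(dev):
    # /proc/diskstats: major, minor, device name, then per-device counters.
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == dev:
                return [int(x) for x in parts[3:14]]
    raise ValueError("device %s not found" % dev)

dev, interval = "sda", 1.0
before = sample(dev)
time.sleep(interval)
after = sample(dev)
# Field 8 (I/Os in flight) is a gauge, not a counter, so its delta is ignored.
rio, rmerge, rsect, ruse, wio, wmerge, wsect, wuse, _, use, aveq = [
    b - a for a, b in zip(before, after)]

ios = rio + wio
print("r/s=%.2f w/s=%.2f rkB/s=%.2f wkB/s=%.2f" % (
    rio / interval, wio / interval,
    rsect / 2 / interval, wsect / 2 / interval))  # sector = 512B = 0.5kB
if ios:
    print("avgrq-sz=%.2f await=%.2fms svctm=%.2fms" % (
        (rsect + wsect) / ios, (ruse + wuse) / ios, use / ios))
print("avgqu-sz=%.2f %%util=%.2f" % (
    aveq / interval / 1000, use / interval / 10))  # use is in ms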
If %util is close to 100%, the device is saturated with I/O requests, the system is fully loaded, and the disk may well be the bottleneck.
When idle drops below 70%, I/O pressure is already high, and reads generally spend more time waiting.
You can cross-check with vmstat: the b column (the number of processes blocked waiting for resources) and the wa column (the percentage of CPU time spent waiting on I/O). I/O pressure is high when wa exceeds 30%.
In addition, await should be read together with svctm; if the two differ too much, there is definitely an I/O problem.
avgrq-sz (note: avgrq-sz, the average request size, not avgqu-sz) is also worth watching when tuning I/O, since it is the amount of data moved per operation. If operations are frequent but each moves little data, overall I/O throughput stays low; only when each operation moves a lot of data does throughput climb. It can be verified via avgrq-sz × (r/s + w/s) = rsec/s + wsec/s; in other words, throughput is determined jointly by request size and request rate (see the check below).
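That identity can be checked against the sda row from the sample output above:

# Check avgrq-sz * (r/s + w/s) = rsec/s + wsec/s with the sda sample.
r_s, w_s = 99.00, 7.00
rsec_s, wsec_s = 2240.00, 2240.00
avgrq_sz = 42.26
print(avgrq_sz * (r_s + w_s))  # 4479.56 -- matches within rounding
print(rsec_s + wsec_s)         # 4480.0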
In addition, you can also look at the relationship between svctm and await:
svctm is generally smaller than await (because the wait time of simultaneously queued requests is counted repeatedly). The size of svctm generally reflects disk performance, though CPU and memory load also affect it, and an excess of requests will indirectly drive svctm up. The size of await depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued. If svctm is close to await, I/O spends almost no time waiting in the queue; if await is much larger than svctm, the queue is too long and application response times suffer. If response times exceed what users will tolerate, consider switching to faster disks, tuning the kernel's elevator (I/O scheduler) algorithm, optimizing the application, or upgrading the CPU.
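A small sketch of that comparison, using the sda and sdb rows from the sample above and treating await − svctm as the queuing component of response time:

# await - svctm approximates time spent queuing rather than being serviced.
for dev, await_ms, svctm_ms in [("sda", 10.57, 7.96), ("sdb", 1.58, 0.54)]:
    queue_ms = await_ms - svctm_ms
    print("%s: ~%.2fms queuing (%.0f%% of await)" % (
        dev, queue_ms, 100.0 * queue_ms / await_ms))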
Queue length (avgqu-sz) can also serve as a measure of system load, but since avgqu-sz is an average over the sampling interval, it cannot reflect instantaneous I/O bursts.
A nice analogy from elsewhere (the I/O system as a supermarket queue)
For example, when queuing to check out at a supermarket, how do we decide which till to join? The first thing is to look at the number of people in the queue: five people beats twenty, right? Besides counting heads, we often look at how much the people in front are buying; if there is an aunt ahead with a week's worth of groceries, we can consider changing queues. Then there is the speed of the cashier: if you hit a novice who can't even count the money, you will be in for a wait. Timing also matters: a till that was packed five minutes ago may be empty now, and it is great to pay then, provided, of course, that what you did in those five minutes was more meaningful than queuing (though I have yet to find anything more boring than queuing).
The I/O system has many similarities with supermarket queues:
r/s + w/s is like the total number of customers paying.
The average queue length (avgqu-sz) is like the average number of people in the queue per unit time.
The average service time (svctm) is like the cashier's speed at ringing things up.
The average wait time (await) is like each person's average time spent waiting.
The average request size (avgrq-sz) is like how much each person buys.
%util is like the proportion of time that someone is standing in a queue at the till.
From these figures we can analyze the pattern of I/O requests, as well as the speed and response time of the I/O.
Below is someone else's analysis of a set of iostat output:
# iostat -x 1
avg-cpu:  %user   %nice    %sys   %idle
          16.24    0.00    4.31   79.44
Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
/dev/cciss/c0d0
           0.00   44.90  1.02  27.55    8.16  579.59    4.08  289.80    20.57    22.35  78.21   5.00  14.29
The iostat output above shows that there are 28.57 device I/O operations per second: total I/O per second = r/s (reads) + w/s (writes) = 1.02 + 27.55 = 28.57, with writes dominating (w:r ≈ 27:1).
On average each device I/O operation takes only 5ms (svctm), yet each I/O request waits 78ms (await). Why? Because too many I/O requests arrive at once (about 29 per second). If we assume those requests were all issued at the same moment, the average wait time would be:
average wait time = single I/O service time × (1 + 2 + … + (total requests − 1)) / total requests
Applied to the example above: average wait time = 5ms × (1 + 2 + … + 28) / 29 = 70ms, which is very close to the 78ms average wait reported by iostat. This in turn suggests that the I/O requests were indeed initiated almost simultaneously.
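The arithmetic is easy to verify:

# Verify the worked example: 5ms service time, 29 near-simultaneous requests.
svctm_ms, n = 5.0, 29
avg_wait_ms = svctm_ms * sum(range(1, n)) / n  # 5 * (1+2+...+28) / 29
print("%.0fms" % avg_wait_ms)                  # 70ms, close to iostat's 78ms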
There are a large number of I/O requests per second (about 29), yet the average queue is not long (only about 2), which indicates that these 29 requests arrive unevenly and the I/O system is idle most of the time.
For 14.29% of each second there are requests in the I/O queue; that is, for 85.71% of the time the I/O system has nothing to do, and all 29 I/O requests are handled within about 143 milliseconds.
delta(ruse+wuse)/delta(io) = await = 78.21 => delta(ruse+wuse)/s = 78.21 × delta(io)/s = 78.21 × 28.57 = 2234.5, meaning the requests accumulate 2234.5ms of total wait time per second. The average queue length should therefore be 2234.5ms / 1000ms ≈ 2.23, yet iostat reports avgqu-sz as 22.35. Why?! Because that version of iostat has a bug: the avgqu-sz value should be 2.23, not 22.35.
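This is just Little's law (average queue length = arrival rate × average wait time), and the correction is easy to reproduce:

# Little's law: avgqu-sz should equal await (in seconds) * IOPS.
await_ms, iops = 78.21, 28.57
print("expected avgqu-sz = %.2f" % (await_ms / 1000.0 * iops))  # ~2.23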
At this point you should have a deeper understanding of what iostat and iowait are. You might as well try these commands out in practice.