The Code Principles Behind Linux Kernel iowait Time


2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how the Linux kernel accounts iowait time by walking through the relevant code. The walkthrough is fairly detailed and may be useful as a reference.

When a task has to wait for I/O, the kernel switches it out and lets a runnable task run instead. Before the switch, the task's in_iowait is set to 1; when the task is woken up again, in_iowait is restored to its previous value. The relevant functions are io_schedule, io_schedule_timeout, mutex_lock_io and mutex_lock_io_nested.

Thus in_iowait indicates whether a task is currently in iowait.
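The save-and-restore handling of in_iowait described above can be modeled in user space. The following is a minimal sketch, not kernel code: struct task_model, model_schedule and model_io_schedule are hypothetical names introduced for illustration, and only the in_iowait field name comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the relevant part of a task. */
struct task_model {
    int in_iowait;            /* 1 while the task is waiting for I/O */
    int seen_while_switched;  /* value of in_iowait observed at switch-out */
};

/* Stub for the scheduler: records what the flag looked like at switch-out. */
void model_schedule(struct task_model *t)
{
    t->seen_while_switched = t->in_iowait;
}

/* Pattern from the text: save the old flag, set it to 1, switch out,
 * then put the old value back once the task runs again. */
void model_io_schedule(struct task_model *t)
{
    int old_iowait;

    assert(t != NULL);
    old_iowait = t->in_iowait;
    t->in_iowait = 1;
    model_schedule(t);          /* a real task would sleep here */
    t->in_iowait = old_iowait;  /* restored after wake-up */
}
```

Restoring the saved value, rather than clearing the flag to 0, keeps the flag correct if the caller was already marked as in iowait.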

Note also that mutex_lock_io and mutex_lock_io_nested set the task state to TASK_UNINTERRUPTIBLE themselves, whereas for io_schedule and io_schedule_timeout the kernel sets the task state to TASK_UNINTERRUPTIBLE before calling them.

When the context-switch function __schedule switches a task out, if the in_iowait of the outgoing task is true, it adds 1 to nr_iowait in the run queue (the rq structure) of that CPU.

Because the task was set to TASK_UNINTERRUPTIBLE beforehand, it has to be woken up explicitly, and nr_iowait is decremented in the task wake-up function.

Thus nr_iowait indicates whether a CPU has any tasks in iowait, and how many.
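The counting on switch-out and wake-up can be sketched as follows. This is a user-space model under stated assumptions: struct rq_model, struct task_model, model_switch_out and model_wake_up are hypothetical names, and a plain int stands in for whatever counter type the kernel uses; only in_iowait and nr_iowait come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU run queue model. */
struct rq_model {
    int nr_iowait;  /* number of tasks from this CPU sleeping in iowait */
};

/* Hypothetical task model; rq remembers where the task went to sleep. */
struct task_model {
    int in_iowait;
    struct rq_model *rq;
};

/* Switch-out path: count the task if it leaves the CPU to wait for I/O. */
void model_switch_out(struct rq_model *rq, struct task_model *prev)
{
    assert(rq != NULL && prev != NULL);
    prev->rq = rq;
    if (prev->in_iowait)
        rq->nr_iowait++;
}

/* Wake-up path: undo the accounting on the run queue the task slept on. */
void model_wake_up(struct task_model *p)
{
    assert(p != NULL);
    if (p->in_iowait && p->rq != NULL)
        p->rq->nr_iowait--;
}
```

In this model the flag is still set when the wake-up runs, because it is only restored after the task resumes, so the decrement pairs with the earlier increment.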

Because a task in iowait is TASK_UNINTERRUPTIBLE, it is not on the ready queue, so load balancing cannot move it to another CPU. nr_iowait therefore needs no special handling for load balancing.

When system idle time is accumulated, if the CPU's nr_iowait is nonzero, that is, the CPU has tasks waiting for I/O, the time is recorded as iowait time.

In kernels with NO_HZ enabled, the relevant code is in update_ts_time_stats.

In kernels without it, the code is in account_idle_time.
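The split between idle and iowait time can be illustrated with a small user-space model. This is a sketch of the rule stated above, not the kernel's implementation: struct cpu_model and model_account_idle are hypothetical names, and nanoseconds are an assumed unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU accounting state. */
struct cpu_model {
    int nr_iowait;       /* tasks from this CPU currently in iowait */
    uint64_t idle_ns;    /* idle time with nobody waiting for I/O */
    uint64_t iowait_ns;  /* idle time while some task waits for I/O */
};

/* Charge a stretch of idle time to either the iowait or the idle bucket. */
void model_account_idle(struct cpu_model *cpu, uint64_t delta_ns)
{
    assert(cpu != NULL);
    if (cpu->nr_iowait > 0)
        cpu->iowait_ns += delta_ns;
    else
        cpu->idle_ns += delta_ns;
}
```

The same stretch of idle CPU time lands in one bucket or the other, never both, which is why iowait is best read as a sub-category of idle time.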

When the /proc/stat interface is read, get_iowait_time fetches this time and returns it.
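From user space, the value surfaces as the fifth number on the aggregate "cpu" line of /proc/stat (user, nice, system, idle, iowait), in units of USER_HZ ticks. A minimal parsing sketch follows; parse_cpu_iowait is a hypothetical helper name, and it only handles the aggregate line, not the per-CPU "cpu0", "cpu1" lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts the idle and iowait fields from an aggregate "cpu" line in
 * /proc/stat format. Returns 0 on success, -1 if the line does not match. */
int parse_cpu_iowait(const char *line,
                     unsigned long long *idle,
                     unsigned long long *iowait)
{
    unsigned long long user, nice, system;

    assert(line != NULL && idle != NULL && iowait != NULL);

    /* Reject per-CPU lines such as "cpu0 ..." and unrelated lines. */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;

    /* Field order: user nice system idle iowait ... */
    if (sscanf(line + 3, "%llu %llu %llu %llu %llu",
               &user, &nice, &system, idle, iowait) != 5)
        return -1;

    return 0;
}
```

A caller would typically read the first line of /proc/stat with fgets and pass it in; sampling twice and subtracting gives the iowait ticks accumulated over the interval.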

To sum up, iowait time is CPU idle time. It is not that the CPU has no tasks at all during this time; rather, one or more of its sleeping tasks is an iowait task.

Of course, during both idle and iowait time, it is the idle task that runs on the CPU.

Finally, an article by the Alibaba kernel team is recommended as further reading: Kernel Documents/new iowait calculation (http://link.zhihu.com/?target=http%3A//kernel.taobao.org/index.php%3Ftitle%3DKernel_Documents/new_iowait_calculation)

This part is the more interesting one:

+ wait_event_interruptible_hrtimeout(ctx->wait,
+         aio_read_events(ctx, min_nr, nr, event, &ret), until);

No matter what the timeout value until is, wait_event_interruptible_hrtimeout is called. Although hrtimer already offers very high precision, in the macro __wait_event_hrtimeout, which actually performs the wait, you can see that the hrtimer is started with:

hrtimer_start_range_ns(&__t.timer, timeout,       \
                       current->timer_slack_ns,   \
                       HRTIMER_MODE_REL);         \

The third parameter, current->timer_slack_ns, is the slack range passed to the hrtimer. hrtimer is very precise, but the system obviously cannot tolerate firing that frequently, so each hrtimer expiry handles all the timers that fall within the time range (see __hrtimer_run_queues). timeout + current->timer_slack_ns is therefore the latest time at which the hrtimer fires. The default value of current->timer_slack_ns is 50000, meaning 50000 nanoseconds. That is, the timer fires at most 50000 nanoseconds after the timeout, and it may also be fired earlier along with a preceding hrtimer.
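The arithmetic behind the expiry range can be written out as a small helper. This is only an illustration of the relation timeout + slack stated above: struct expiry_window, its field names and model_expiry_window are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of when a timer with slack may fire. */
struct expiry_window {
    uint64_t earliest_ns;  /* the requested timeout */
    uint64_t latest_ns;    /* the requested timeout plus the slack */
};

/* Computes the window [timeout, timeout + slack] in nanoseconds. */
struct expiry_window model_expiry_window(uint64_t timeout_ns,
                                         uint64_t slack_ns)
{
    struct expiry_window w;

    w.earliest_ns = timeout_ns;
    w.latest_ns = timeout_ns + slack_ns;
    assert(w.latest_ns >= w.earliest_ns);  /* guard against wrap-around */
    return w;
}
```

With a timeout of 0 and the default slack of 50000, the window is 0 to 50000 nanoseconds, so even a zero timeout leaves room for the task to be put to sleep.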

So in wait_event_interruptible_hrtimeout, once the ctx->wait condition is not ready, schedule is likely to be called even if the timeout is set to 0, which skews iowait time considerably and badly hurts performance.

This problem was fixed by commit 5f785de588735306ec4d7c875caf9d28481c8b21, which changed the code to:

- wait_event_interruptible_hrtimeout(ctx->wait,
-         aio_read_events(ctx, min_nr, nr, event, &ret), until);
+ if (until.tv64 == 0)
+         aio_read_events(ctx, min_nr, nr, event, &ret);
+ else
+         wait_event_interruptible_hrtimeout(ctx->wait,
+                 aio_read_events(ctx, min_nr, nr, event, &ret),
+                 until);

Thus, when until is 0, aio_read_events is called directly. There should no longer be an obvious iowait problem, and as a result this fix improved the performance of io_getevents calls more than a hundredfold.
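The effect of the fix can be mimicked with a toy model that counts how often the caller would be scheduled out. This is a simplified sketch under stated assumptions: all names (struct ctx_model, model_read_events, model_timed_wait, model_getevents_old, model_getevents_fixed) are hypothetical, and the model treats "likely to call schedule" as "always calls schedule once" when no events are ready.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AIO context. */
struct ctx_model {
    int events_ready;    /* nonzero if completed events are available */
    int schedule_calls;  /* how many times the caller was put to sleep */
};

/* Non-blocking check for completed events. */
int model_read_events(struct ctx_model *ctx)
{
    assert(ctx != NULL);
    return ctx->events_ready;
}

/* Timed wait: if nothing is ready, the caller sleeps (once in this model). */
void model_timed_wait(struct ctx_model *ctx, uint64_t until_ns)
{
    (void)until_ns;  /* the model ignores the actual duration */
    if (!model_read_events(ctx))
        ctx->schedule_calls++;
}

/* Before the fix: the timed wait is used even for a zero timeout. */
void model_getevents_old(struct ctx_model *ctx, uint64_t until_ns)
{
    model_timed_wait(ctx, until_ns);
}

/* After the fix: a zero timeout only polls and never sleeps. */
void model_getevents_fixed(struct ctx_model *ctx, uint64_t until_ns)
{
    if (until_ns == 0)
        (void)model_read_events(ctx);
    else
        model_timed_wait(ctx, until_ns);
}
```

In the model, a zero-timeout call with no events ready costs one sleep before the fix and none after it, which is the sleep that showed up as iowait time.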

Of course, there are still reasons why this iowait figure is not fully accurate: once a task switch occurs, the inaccuracy remains.

That concludes this look at the code principles behind Linux kernel iowait time. I hope the content above is of some help and that you learned something from it. If you found the article useful, feel free to share it so more people can see it.
