Analyzing Windows IO Performance in a Virtualized Environment
This article shares a real case of analyzing Windows IO performance in a virtualized environment; the debugging approach and tooling should be useful in practice.
Preface
As cloud computing technology and services mature, more and more customers deploy their business in the cloud. However, the virtualization layer adds an extra indirection, so IO problems often surface during deployment and are usually hard to debug. The following describes how tools such as perf and systemtap were used to debug an IO performance problem for a hosted-cloud customer, and analyzes Windows IO performance in a virtual environment along the way.
The problem
A hosted-cloud customer built their own virtualized environment, created a Windows 2008 R2 virtual machine and a CentOS 6.5 virtual machine on the same host, and measured random-read performance with fio. Windows 2008 R2 reached about 18K IOPS, while Linux reached about 100K IOPS.
The customer's fio configuration for the test:
[global]
ioengine=windowsaio
direct=1
iodepth=64
thread=1
size=20g
numjobs=1

[4k]
bs=4k
filename=d:test.img
rw=randread
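For reference, the job is run with a single command (a sketch, assuming the file above is saved as randread.fio):

fio randread.fio

The Linux comparison run presumably used the same parameters with ioengine=libaio, Linux's native asynchronous IO engine, in place of windowsaio.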
Test result
[Figure: fio test results, Windows vs Linux]
Cloud VM IO stack
[Figure: the IO stack in a virtualized environment]
In a cloud VM environment, the IO stack is comparatively long: inside the Guest OS it involves the application layer, the file system, the Block layer, and the driver layer; then comes the virtualization layer; then the host OS file system, Block layer, and driver layer. With so many layers involved, a problem in any one of them can degrade performance, which also makes IO tracing harder.
From the information gathered so far, the host-side file system, Block layer, and driver layer can be ruled out first, because the Linux guest on the same host and configuration shows no problem.
So attention currently focuses on two areas:
Guest OS (the Windows system):
- the fio test program
- the file system / Block layer
- the VirtIO Block driver (the virtual machine presents a Virtio Block device to the Guest OS)
QEMU
How to eliminate QEMU as a suspect?
For an IOPS performance problem, two possibilities come to mind:
IO latency is too high
The IO queue depth supported by the device is too shallow
On the queue side, the Linux and Windows virtual machines are backed by identical Virtio Block devices, so that factor is equal for both; what needs confirming is latency.
How long does QEMU take to complete a Block IO request?
Fortunately, Stefan Hajnoczi has added tracing support to QEMU, so it is straightforward to measure the time from QEMU receiving an IO request to completing it.
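As a sketch of how such a measurement can be collected (this assumes a QEMU built with a trace backend; trace event names vary across QEMU versions):

# request-level virtio-blk trace events; names are examples from upstream
echo virtio_blk_handle_read > /tmp/events
echo virtio_blk_req_complete >> /tmp/events
# start the guest with tracing enabled (other VM arguments omitted)
qemu-system-x86_64 -trace events=/tmp/events,file=/tmp/trace.bin ...
# after the fio run, decode the binary log; the timestamp delta between the
# two events for a request approximates QEMU's per-IO service time
scripts/simpletrace.py trace-events /tmp/trace.bin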
The resulting statistics showed an average IO completion time of around 130us, provisionally ruling out high latency introduced by the QEMU layer. (Incidentally, the overhead of this dynamic tracing was observed to be roughly 20% during the test.)
With the queue and latency questions settled for QEMU, the Guest OS remains the only place the problem can lie.
Is the VirtIO Block driver the problem?
Even after updating to the latest stable version of the Virtio-Win driver, the problem persists.
Is the Windows file system / Block layer the problem?
It was confirmed that the Windows installation was stock, with no configuration changes.
Is the fio test program the problem?
Then why does the same fio test show no problem on Linux?
Two possibilities
Performance investigations easily run into dead ends: one keeps asking what went wrong while none of the plausible factors seems to have changed. Experience suggests that most performance problems fall into one of two categories:
on-CPU
off-CPU
Looking back at this case, with IO latency essentially ruled out, the problem should match one of the two:
The CPU is extremely busy, but most of that time is not spent processing IO (on-CPU).
The CPU is frequently idle, meaning time is mostly spent waiting rather than processing IO (off-CPU).
Note: the impact of IO latency cannot be fully ruled out yet, because only the QEMU Block layer has been excluded; the Guest OS could still contribute, but it is set aside for now.
First, look at the virtual machine's CPU consumption during the test:
top -H -p 36256
[Figure: top output for the QEMU process]
As the figure shows, the QEMU main thread's CPU load has reached over 90%, which matches the on-CPU class of problem. For problems of this kind, the best first step is usually to sample the process with perf and generate a flame graph, which shows exactly where the CPU time goes.
perf record -a -g -p 36256 sleep 20
Generate a flame graph:
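A sketch of the rendering step, using Brendan Gregg's FlameGraph scripts (assumed to be cloned locally, run in the directory holding the perf.data from the previous command):

perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > cpu-flame.svg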
[Figure: flame graph before optimization (win2008-bad)]
Most of the CPU is clearly consumed by KVM operations, with vmx_handle_exit as the main cost. (The actual flame graph is an SVG vector image, so this is easy to confirm by browsing it interactively.) Two paths lead to vmx_handle_exit here:
IO Port access (handle_pio)
MMIO access (handle_apic_access)
Since the KVM module accounts for the majority of the time, the next step is to see KVM's actual behavior during the test, which another tool, kvm_stat, can show.
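kvm_stat ships with the kernel sources; run it on the host while fio is active in the guest (a sketch; option names can differ slightly between versions):

kvm_stat        # interactive, top-like display of per-event exit counts
kvm_stat -1     # one-shot batch snapshot, convenient for logging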
[Figure: kvm_stat output; kvm_pio and kvm_mmio dominate]
Apart from the VM Entry and VM Exit events themselves, the most frequent events are kvm_pio and kvm_mmio, showing that Windows really does perform a large number of IO Port and MMIO operations; this confirms the conclusion drawn from the flame graph.
In virtualization, both IO Port and MMIO accesses cause VM Exits, possibly even heavy exits. To improve performance, one generally tries to avoid such exits, or at least the heavy ones.
Which IO Port and MMIO accesses cause the VM Exits?
The KVM module exposes many trace events, and kvm_stat itself is built on them, but it does not print the details of each event. Several front-end tools can capture trace events; trace-cmd and perf are both good choices.
List all trace events of the kvm modules:
[xs3c@devhost1]# trace-cmd list -e | grep kvm
kvmmmu:kvm_mmu_pagetable_walk
kvmmmu:kvm_mmu_paging_element
kvmmmu:kvm_mmu_set_accessed_bit
kvmmmu:kvm_mmu_set_dirty_bit
kvmmmu:kvm_mmu_walker_error
kvmmmu:kvm_mmu_get_page
kvmmmu:kvm_mmu_sync_page
kvmmmu:kvm_mmu_unsync_page
kvmmmu:kvm_mmu_zap_page
kvm:kvm_entry
kvm:kvm_hypercall
kvm:kvm_pio
kvm:kvm_cpuid
kvm:kvm_apic
kvm:kvm_exit
kvm:kvm_inj_virq
kvm:kvm_inj_exception
kvm:kvm_page_fault
kvm:kvm_msr
kvm:kvm_cr
kvm:kvm_pic_set_irq
kvm:kvm_apic_ipi
kvm:kvm_apic_accept_irq
kvm:kvm_eoi
kvm:kvm_pv_eoi
kvm:kvm_write_tsc_offset
kvm:kvm_ple_window
kvm:kvm_vcpu_wakeup
kvm:kvm_set_irq
kvm:kvm_ioapic_set_irq
kvm:kvm_ioapic_delayed_eoi_inj
kvm:kvm_msi_set_irq
kvm:kvm_ack_irq
kvm:kvm_mmio
The KVM module instruments many trace points; here only two of them, kvm:kvm_pio and kvm:kvm_mmio, are captured.
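A sketch of the capture, recording system-wide on the host for ten seconds while fio runs in the guest:

trace-cmd record -e kvm:kvm_pio -e kvm:kvm_mmio sleep 10
trace-cmd report | head

trace-cmd report prints each event with its port number or address, which is what the statistics below are computed from.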
[Figure: trace-cmd statistics for kvm_pio and kvm_mmio]
The statistics show that the main accesses are:
IO Port: 0x608 and 0xc050
MMIO: 0xFEE003xx
The QEMU monitor command info mtree shows which devices IO Ports 0x608 and 0xc050 and MMIO 0xFEE003xx belong to.
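For a libvirt-managed guest the monitor can be queried without stopping the VM; a sketch (the domain name win2008r2 is hypothetical):

virsh qemu-monitor-command --hmp win2008r2 'info mtree' | grep -iE '0608|c050|fee00'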
IO Port
0000000000000608-000000000000060b (prio 0, RW): acpi-tmr
000000000000c040-000000000000c07f (prio 1, RW): virtio-pci
MMIO
00000000fee00000-00000000feefffff (prio 4096, RW): icc-apic-container
0xc050 can be ignored: it belongs to Virtio Block, which uses it to trigger VM Exits by design (the notification mechanism).
At this point it can be concluded that Windows reads the ACPI Power Management Timer heavily and accesses APIC registers heavily, generating too many VM Exits and consuming large amounts of CPU. Two questions therefore deserve detailed discussion:
1. How to reduce the VM Exits caused by reading the ACPI PM Timer register?
2. How to reduce the VM Exits caused by accessing APIC MMIO?
How to reduce the VM Exits caused by reading the ACPI PM Timer?
Following the usual virtualization-layer optimization approach, reducing VM Exits caused by IO Port accesses generally means asking whether paravirtualization can replace full virtualization here; so the question is what Windows offers in this respect.
Starting with Windows 7, Microsoft has put substantial work into virtualization enlightenments so that Windows performs better on Hyper-V, including the Hyper-V Timer, which is similar to kvmclock in Linux.
Judging from the current support situation:
Windows 7
Windows 7 SP1
Windows Server 2008 R2
Windows Server 2008 R2 SP1/SP2
Windows 8/8.1/10
Windows Server 2012
Windows Server 2012 R2
These Windows systems all include the virtualization enlightenments; more details are available on Microsoft's official website.
In 2014, Red Hat engineers Vadim Rozenfeld and Peter Krempa added Hyper-V Timer support to QEMU and libvirt respectively, so the Hyper-V Timer can be enabled directly through libvirt.
KVM itself has supported the Hyper-V Timer for quite some time, but the customer's host kernel version lacked the feature, so the host was upgraded to the kernel version maintained by UCloud.
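As a sketch, the enlightenment is switched on in the guest's libvirt domain XML (the domain name is hypothetical; element names follow libvirt's domain XML format, so verify against the libvirt version in use):

virsh edit win2008r2
# inside the <clock> element, add the Hyper-V reference time counter:
#   <timer name='hypervclock' present='yes'/>
# and, at the top level of the domain XML, enable the Hyper-V feature block:
#   <features>
#     <hyperv>
#       <relaxed state='on'/>
#     </hyperv>
#   </features>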
How to reduce the VM Exits caused by APIC access?
Intel CPUs support APICv (hardware APIC virtualization), which lets the guest access most APIC registers without a VM Exit; enabling it likewise required upgrading to the kernel version maintained by UCloud.
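Whether APICv is active on the host can be checked through the kvm_intel module parameter (a sketch; the parameter exists on reasonably recent kernels):

cat /sys/module/kvm_intel/parameters/enable_apicv
# 'Y' (or 1) means the kernel and CPU expose APIC virtualization, so most
# guest APIC register accesses no longer trap through handle_apic_access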
Final results
[Figure: fio results after the optimizations (win-fio-good)]
[Figure: system state after the optimizations (win-good)]
This case shows that when Windows IO performance in a virtualized environment is poor compared with a physical one, the IO path itself is not necessarily at fault; virtualization performance issues elsewhere can have a large impact on IO performance.