
KVM Virtualization Principles: Network IO Virtualization


Many newcomers are unclear about how network IO virtualization works in KVM. To help with that, this article explains it in detail; hopefully you will take something away from it.

Introduction to IO Virtualization

The previous articles covered the KVM startup process, CPU virtualization, and memory virtualization. As a complete von Neumann computer system, there must also be input, computation, and output. Traditional IO includes network device IO, block device IO, character device IO, and so on. In this exploration of KVM virtualization principles we focus on network device IO and block device IO. Their principles are in fact very similar, but they are handled separately in the virtualization layer, which is why network device IO virtualization and block device IO virtualization are discussed separately. This chapter covers IO virtualization for network devices; the next chapter covers IO virtualization for block devices.

Traditional network IO process

"Traditional" here does not mean old-fashioned; it simply refers to the network device IO path in a non-virtualized environment. The Linux distributions we usually use, such as Debian or CentOS, ship the standard Linux TCP/IP protocol stack, and the bottom of the stack provides a driver abstraction layer that adapts to different network cards. The most important part of virtualization is device virtualization, but it is much easier to understand once the whole network IO path is clear.

Standard TCP/IP structure

At the user layer, we interact with the kernel through sockets: creating ports, sending and receiving data, and so on.

In the kernel, the TCP/IP protocol stack encapsulates the socket data into TCP or UDP segments, the IP layer adds the IP addressing information, the data link layer adds the MAC address and other header fields, and finally the driver writes the frame to the network card, which sends the data out.
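To make the user-layer side of this path concrete, here is a minimal sketch that sends one UDP datagram through the socket API; everything after sendto() is handled by the kernel stack described above. The address 192.0.2.1 and port 9000 are placeholders chosen purely for illustration.

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Create a UDP socket; the kernel sets up the protocol control block. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Destination address: placeholder IP and port, for illustration only. */
    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9000);
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

    /* From here on, UDP/IP/Ethernet encapsulation and the hand-off to the
     * driver all happen inside the kernel TCP/IP stack. */
    const char msg[] = "hello";
    if (sendto(fd, msg, sizeof(msg), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("sendto");

    close(fd);
    return 0;
}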

In the Linux TCP/IP stack, each packet is described by the kernel's sk_buff structure. When a socket sends a packet, the data enters the kernel and the kernel allocates an sk_buff to carry it.
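For reference, inside the kernel the send path does something along these lines. This is a heavily simplified sketch using kernel-internal APIs, and send_payload is a hypothetical helper, not real protocol code: real code also reserves headroom for the TCP/UDP, IP and Ethernet headers before queueing the buffer.

#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/string.h>

/* Simplified sketch: wrap a payload of `len` bytes in an sk_buff and queue
 * it toward the driver layer. */
static int send_payload(struct net_device *dev, const void *data, size_t len)
{
    struct sk_buff *skb = alloc_skb(len, GFP_KERNEL);
    if (!skb)
        return -ENOMEM;

    memcpy(skb_put(skb, len), data, len);  /* copy the payload into the buffer */
    skb->dev = dev;                        /* bind it to the outgoing device */

    return dev_queue_xmit(skb);            /* hand off to the qdisc/driver layer */
}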

Both the send and receive paths use DMA at the driver layer. When the driver is loaded, it maps memory for the network card and fills in the descriptor state (in registers): the start address, length, remaining space, and so on. To send, the data is placed into the mapped memory, and a network card register is then written to notify the card that there is data; the card processes the data in that memory and, when done, raises an interrupt to the CPU to signal that transmission is complete. The CPU's interrupt handler notifies the driver, and the driver returns up the stack. From the driver's point of view the transmission is synchronous. Receiving data works almost the same way, so it is not repeated here. This DMA model matters for the IO virtualization discussed later.
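The sketch below shows what such a driver-side transmit step can look like. It is purely hypothetical: the descriptor layout, the nic_regs structure, the tx_tail doorbell register and the 256-entry ring are all invented for illustration and do not correspond to any real NIC.

#include <stdint.h>

/* Hypothetical DMA descriptor: where the frame lives and how long it is. */
struct tx_desc {
    uint64_t buf_addr;   /* physical (DMA) address of the frame buffer */
    uint16_t length;     /* number of bytes to transmit */
    uint16_t flags;      /* e.g. "owned by NIC" */
};

#define DESC_OWNED_BY_NIC 0x1

/* Hypothetical memory-mapped register block of the NIC. */
struct nic_regs {
    volatile uint32_t tx_tail;  /* doorbell: next descriptor index for the NIC */
};

/* Place one frame into the ring and ring the doorbell; completion is later
 * signalled by the NIC with an interrupt, as described above. */
static void nic_send(struct tx_desc *ring, unsigned int *tail,
                     struct nic_regs *regs, uint64_t dma_addr, uint16_t len)
{
    struct tx_desc *d = &ring[*tail];

    d->buf_addr = dma_addr;            /* frame already placed in mapped memory */
    d->length   = len;
    d->flags    = DESC_OWNED_BY_NIC;   /* hand ownership to the card */

    *tail = (*tail + 1) % 256;         /* assume a 256-entry ring */
    regs->tx_tail = *tail;             /* doorbell write: "there is data" */
}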


KVM Network IO Virtualization

To be precise, KVM itself only provides the basic CPU and memory virtualization; the actual IO emulation is done by qemu-kvm. In articles introducing KVM, however, qemu-kvm and KVM are usually treated as one system, so the distinction is not drawn so carefully. Strictly speaking, network IO virtualization is performed by qemu-kvm.

KVM fully virtualized IO

Recall the demo in the first chapter: our guest image executed an out instruction, producing an IO operation. Because this is a sensitive device-access operation, it cannot be executed in VMX non-root mode, so the VM exits and the emulator takes over the IO operation.

switch (kvm->vcpus->kvm_run->exit_reason) {
case KVM_EXIT_UNKNOWN:
    printf("KVM_EXIT_UNKNOWN\n");
    break;
// The guest performed an IO operation; the CPU in guest mode pauses the
// virtual machine and transfers control to the emulator.
case KVM_EXIT_IO:
    printf("KVM_EXIT_IO\n");
    printf("out port: %d, data: %d\n",
           kvm->vcpus->kvm_run->io.port,
           *(int *)((char *)(kvm->vcpus->kvm_run) +
                    kvm->vcpus->kvm_run->io.data_offset));
    break;
...

The VM exits with the reason KVM_EXIT_IO; the emulator learns that the guest issued an IO operation and exited, so it takes over the IO and prints the data. This is a minimal simulation of a virtualized IO path: the emulator takes over the IO.

In the fully virtualized IO path of qemu-kvm the principle is the same: KVM traps the IO access and qemu-kvm takes over the IO. Because DMA mapping is used, qemu-kvm registers the device's MMIO information at startup, so it can later find the DMA device's mapped memory and control information.

static int pci_e1000_init(PCIDevice *pci_dev)
{
    e1000_mmio_setup(d);
    // set mmio space for PCI devices
    pci_register_bar(&d->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
    pci_register_bar(&d->dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->io);
    d->nic = qemu_new_nic(&net_e1000_info, &d->conf,
                          object_get_typename(OBJECT(d)), d->dev.qdev.id, d);
    add_boot_device_path(d->conf.bootindex, &pci_dev->qdev, "/ethernet-phy@0");
}

For a PCI device, a contiguous region of physical address space is mapped between the device and the CPU, so the CPU can access the PCI device as if it were ordinary memory. IO devices generally use one of two modes: port IO or MMIO. The former is the in/out instruction used in our demo; the latter is the memory-mapped access mode used by PCI devices. Operations in both modes can be trapped by KVM.
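The two access modes look like this from software (a hedged sketch: port 0x3f8 and the mmio_base pointer are placeholders). On bare metal these touch real hardware; inside a guest, exactly these accesses are what KVM traps, the out instruction as a KVM_EXIT_IO and the memory-mapped store as an MMIO exit.

#include <stdint.h>
#include <sys/io.h>    /* outb, ioperm: x86 Linux, requires root */

/* Port IO: the "out" instruction, as in the demo. Port 0x3f8 (a legacy
 * serial port) is used purely as an example. */
static void port_io_example(void)
{
    if (ioperm(0x3f8, 1, 1) == 0)    /* request access to that port */
        outb('A', 0x3f8);            /* this is what triggers KVM_EXIT_IO */
}

/* MMIO: the device registers appear at some physical address that has been
 * mapped into our address space (e.g. via mmap of a BAR). mmio_base is a
 * placeholder for such a mapping. */
static void mmio_example(volatile uint32_t *mmio_base)
{
    mmio_base[0] = 0x1;              /* a plain store, trapped as an MMIO exit */
}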

So qemu-kvm performs the operation on behalf of the guest and then executes the corresponding "callback": it injects an interrupt into the vCPU to signal that the IO has completed, and the guest resumes execution. vCPU interrupts work like physical CPU interrupts; they are triggered by setting the corresponding registers.
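As a rough sketch of how a user-space emulator can inject such an interrupt through the KVM API, the KVM_IRQ_LINE ioctl pulses a guest interrupt line. This assumes an in-kernel irqchip was created with KVM_CREATE_IRQCHIP; vm_fd and gsi are hypothetical values supplied by the caller.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Pulse a guest interrupt line to signal "IO complete". */
static int inject_guest_irq(int vm_fd, unsigned int gsi)
{
    struct kvm_irq_level irq = { .irq = gsi, .level = 1 };

    if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)   /* assert the line */
        return -1;

    irq.level = 0;                              /* then de-assert it */
    return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}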

In a fully virtualized environment, IO in the guest is taken over by qemu-kvm. The network card seen inside the guest is not a real NIC but a tap device created on the host. During driver registration, qemu-kvm adds the features supported by the tap device to the device information presented to the guest's driver, so the guest sees a network device.

qemu takes over the IO operation from the guest; in a real scenario the data of course has to be sent out, not just printed as in the demo. Since the packet was already encapsulated with its MAC address at layer 2 inside the guest, the qemu layer does not need to unpack or parse it: it simply writes the data to the tap device. The tap device hands it to the bridge, and the bridge forwards it to the physical network card. The NIC, which is bound to the bridge, runs in promiscuous mode so that all such frames can be received and sent.

The following description of tap devices is quoted from another article:

When a TAP device is created, a corresponding character device appears in the Linux device directory, and a user program can open it and read and write it like an ordinary file. When the program performs a write(), the data enters the TAP device; to the Linux network layer this is equivalent to receiving a packet from the TAP device and asking the kernel to accept it, just as a physical network card receives a packet from the outside world, except that here the data actually comes from a user-space program. After receiving the data, Linux processes it according to the network configuration, so a user program can inject data into the network layer of the Linux kernel. When the program performs a read(), it asks the kernel whether there is data on the TAP device that needs to be sent out and, if so, takes it into user space, completing the transmit side of the TAP device. A useful analogy: an application using a TAP device is like another computer, the TAP device is the local network card, and the two are connected to each other; the application talks to the local network stack through read()/write() operations.

The operations look like this:

fd = open("/dev/tap", XXX);
write(fd, buf, 1024);
read(fd, buf, 1024);
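For a more complete sketch of how such a tap file descriptor is usually obtained on modern Linux, the common path is to open /dev/net/tun and attach a tap interface with the TUNSETIFF ioctl. The helper name open_tap and the interface name "tap0" below are illustrative, not taken from the article.

#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open the tun/tap clone device and attach a tap interface to it.
 * Returns an fd that can be read()/written() as described above. */
static int open_tap(const char *name)   /* e.g. "tap0" (example name) */
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;        /* layer-2 frames, no extra header */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {       /* create/attach the tap device */
        close(fd);
        return -1;
    }
    return fd;                                  /* read()/write() Ethernet frames */
}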

The bridge may be a Linux bridge or an OVS (Open vSwitch). For network virtualization, you usually also rely on the VLAN tagging capability provided by the bridge.

That is KVM's fully virtualized network IO path, and its shortcomings are visible: when network traffic is heavy, it causes too many VM exits and too many data copies, and copying wastes CPU cycles. That is why, over the course of development, qemu-kvm introduced the virtio driver model.

KVM Virtio driver

Virtio-based virtualization is also called paravirtualization, because it requires a virtio driver inside the guest, which means the guest knows it is running in a virtual environment.
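At the heart of virtio are shared rings (virtqueues) through which the guest front end and the host back end exchange buffer descriptors instead of trapping on every device register access. The sketch below shows the classic descriptor entry of the split-virtqueue layout; the field names follow the virtio specification, while the plain integer types are a simplification (the real definitions use explicit little-endian types).

#include <stdint.h>

/* One entry in the virtqueue descriptor table: the guest fills in where a
 * buffer lives and how long it is, and the host-side backend consumes it. */
struct vring_desc {
    uint64_t addr;    /* guest-physical address of the buffer */
    uint32_t len;     /* length of the buffer in bytes */
    uint16_t flags;   /* VRING_DESC_F_* bits below */
    uint16_t next;    /* index of the next descriptor when chained */
};

#define VRING_DESC_F_NEXT   1   /* buffer continues in descriptor `next` */
#define VRING_DESC_F_WRITE  2   /* buffer is write-only for the device */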

Vhost-net goes a step further: it bypasses QEMU so that the guest's virtio front end communicates directly with a back end in the host kernel, reducing data copies, especially copies between user space and kernel space. The performance gain is substantial; in terms of throughput, vhost-net can essentially saturate the physical machine's bandwidth.

Vhost-net requires kernel support. It is available starting with Red Hat Enterprise Linux 6.1 and is enabled by default.

KVM's network device IO virtualization has evolved from full virtualization to virtio to vhost-net, and its performance keeps getting closer to that of a real physical network card. There is still a gap in packet processing, but it is no longer the system bottleneck. After years of development KVM's performance has grown stronger and stronger, which is one of the important reasons it leads other virtualization solutions.

Hopefully the content above has been helpful. Thank you for reading.
