
What are the basic principles of KVM virtualization?


What are the basic principles of KVM virtualization? Many readers new to the topic are unsure where to start, so this article walks through the fundamentals. I hope it answers the question for you.

x86 operating systems are designed to run directly on bare-metal hardware, so they assume they have full control of the computer's hardware. The x86 architecture provides four privilege levels for operating systems and applications to access hardware. A ring is the CPU's privilege level: Ring 0 is the most privileged, followed by Ring 1 and Ring 2, with Ring 3 the least privileged. In the case of Linux on x86:

The operating system (kernel) needs direct access to hardware and memory, so its code runs at the most privileged level, Ring 0, where it can use privileged instructions to control interrupts, modify page tables, access devices, and so on.

Application code runs at the least privileged level, Ring 3, and cannot perform privileged operations. To do so, for example to access the disk and write a file, it has to make a system call. During a system call the CPU switches from Ring 3 to Ring 0 and jumps to the corresponding kernel code, the kernel performs the device access on the application's behalf, and the CPU then returns from Ring 0 to Ring 3. This process is known as switching between user mode and kernel mode.
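As a quick illustration of this user-to-kernel transition, you can watch an ordinary program issue system calls with strace. This is just a sketch, assuming a reasonably recent strace is installed and /etc/hostname exists on your system:

# strace -e trace=openat,read,write cat /etc/hostname    (every line strace prints is one system call, i.e. one switch from Ring 3 into Ring 0 and back)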

Virtualization runs into a problem here: the host operating system already occupies Ring 0, so the guest operating system cannot also run in Ring 0. The guest does not know this, however, and keeps issuing the same instructions it always has, and without the privilege to execute them things go wrong. The hypervisor (VMM) has to step in to prevent this. So how does a virtual machine's guest CPU access hardware through the VMM? There are three implementation techniques, distinguished by how they solve this problem:

1. Full virtualization

2. Paravirtualization

3. Hardware-assisted virtualization

1.1 Full virtualization based on binary translation (Full Virtualization with Binary Translation)

1.2 Paravirtualization (operating-system-assisted virtualization, Paravirtualization)

The idea of paravirtualization is to modify the guest operating system kernel, replacing the instructions that cannot be virtualized with hypercalls that communicate directly with the underlying hypervisor. The hypervisor also provides hypercall interfaces for other critical kernel operations, such as memory management, interrupt handling, and timekeeping.

This eliminates the trap-and-emulate overhead of full virtualization and greatly improves efficiency. With a paravirtualization technology such as XEN, the guest operating system therefore runs a customized kernel, maintained much like the kernel ports for architectures such as x86, MIPS, and ARM. There is no trapping of exceptions, no binary translation, and no emulation, so the performance loss is very low. That is the advantage of a paravirtualized architecture like XEN, and it is also why XEN can only paravirtualize Linux and not Windows: Microsoft does not modify its kernel code.

1.3 Hardware-assisted virtualization

With hardware-assisted virtualization (Intel VT-x, for example), the CPU provides two operation modes that can switch between each other. A VMM running in VMX root operation mode switches to VMX non-root operation mode by explicitly executing the VMLAUNCH or VMRESUME instruction; the hardware automatically loads the Guest OS context and the Guest OS starts running. This is called a VM entry. When the Guest OS encounters an event that must be handled by the VMM, such as an external interrupt or a page fault exception, or when it actively executes the VMCALL instruction to request a VMM service (similar to a system call), the hardware automatically suspends the Guest OS, switches to VMX root operation mode, and resumes the VMM. This is called a VM exit. Software in VMX root operation mode behaves essentially the same as on a processor without VT-x, while VMX non-root operation mode is very different; the main difference is that running certain instructions or encountering certain events triggers a VM exit.

In other words, the hardware itself distinguishes the two modes, so the "trap exception, translate, emulate" machinery of software-based full virtualization is no longer needed. CPU vendors keep investing in virtualization support, and the performance of hardware-assisted full virtualization is gradually approaching that of paravirtualization. Combined with the fact that full virtualization does not require modifying the guest operating system, hardware-assisted full virtualization looks like the future direction of the technology.
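A quick way to confirm that a host CPU exposes these hardware virtualization extensions and that the KVM modules are loaded (a minimal sketch; the commands assume a Linux host with the usual procfs layout):

# grep -cE 'vmx|svm' /proc/cpuinfo    (a non-zero count means Intel VT-x or AMD-V is available)
# lsmod | grep kvm    (expect kvm plus kvm_intel or kvm_amd to be listed)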

(1) A KVM virtual machine is a Linux qemu-kvm process, scheduled by the Linux process scheduler just like any other Linux process.

(2) A KVM virtual machine consists of virtual memory, virtual CPUs, and virtual I/O devices; memory and CPU virtualization are implemented by the KVM kernel module, while I/O device virtualization is implemented by QEMU.

(3) The memory of a KVM guest system is part of the address space of its qemu-kvm process.

(4) The vCPUs of a KVM virtual machine run as threads within the context of the qemu-kvm process.
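You can see this directly on the host: each running guest shows up as one qemu-kvm process. A minimal sketch (on some distributions the process name is qemu-system-x86_64, and <qemu-kvm-pid> below is a hypothetical PID):

# pgrep -a qemu-kvm    (one line, i.e. one process, per running virtual machine)
# ps -o pid,nlwp,comm -p <qemu-kvm-pid>    (nlwp shows how many threads, including the vCPU threads, the process contains)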

CPUs that support virtualization gained new features for it. Taking Intel VT as an example, it adds two operation modes: VMX root mode and VMX non-root mode. Generally, the host operating system and the VMM run in VMX root mode, while the guest operating system and its applications run in VMX non-root mode. Because both modes support all rings, the guest can run in the rings it expects (the OS in Ring 0, applications in Ring 3), and the VMM runs in the rings it needs (for KVM, QEMU runs in Ring 3 and the KVM module runs in Ring 0). The CPU switching between the two modes is called a VMX transition: entering non-root mode from root mode is a VM entry, and entering root mode from non-root mode is a VM exit. The CPU thus alternates between the two modes, executing VMM code and Guest OS code in turn.

For a KVM virtual machine, when the VMM running in VMX root mode needs to run Guest OS instructions, it executes the VMLAUNCH instruction to switch the CPU into VMX non-root mode and begin executing guest code (the VM entry); when the Guest OS needs to leave that mode, the CPU automatically switches back to VMX root mode (the VM exit). So the guest code of a KVM virtual machine runs directly on the physical CPU, under the control of the VMM. QEMU only arranges, through KVM, for the virtual machine's code to be executed by the CPU; neither QEMU nor KVM executes the guest code itself. In other words, the CPU is not emulated in software to provide a virtual CPU for the guest.
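If you want to watch these VM exits happening on a real host, the perf tool has a kvm mode that records and summarizes exit reasons. A rough sketch, assuming perf with KVM support is installed and <qemu-kvm-pid> is the hypothetical PID of a running guest:

# perf kvm stat record -p <qemu-kvm-pid>    (record VM exit events; stop with Ctrl-C after a few seconds)
# perf kvm stat report    (summarizes the exits by reason, e.g. external interrupt, EPT violation, HLT)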

The host Linux system treats each virtual machine as a QEMU process, which contains the following threads:

I/O threads, used to manage the emulated devices

vCPU threads, used to run guest code

Other threads, such as those handling the event loop, offloaded tasks, and so on

In my test environment (Red Hat Linux as the hypervisor):

With smp set to 4, there are 8 threads: 1 main thread (the I/O thread), 4 vCPU threads, and 3 other threads.

With smp set to 6, there are 10 threads: 1 main thread (the I/O thread), 6 vCPU threads, and 3 other threads.
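To reproduce this kind of count yourself, you can list the threads of the QEMU process. A minimal sketch, where <qemu-kvm-pid> is a hypothetical PID of a running guest:

# ps -T -p <qemu-kvm-pid>    (one row per thread; on recent QEMU versions the vCPU threads are usually named like "CPU 0/KVM", "CPU 1/KVM", ...)
# ls /proc/<qemu-kvm-pid>/task | wc -l    (total number of threads in the process)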

Scheduling a thread inside the guest onto a physical CPU involves two steps:

The guest thread is scheduled onto a guest "physical" CPU, that is, a KVM vCPU. This is the responsibility of the guest operating system, and each guest OS implements it differently. On KVM, a vCPU looks like a physical CPU to the guest, so the guest schedules onto it just as it normally would.

The vCPU thread is scheduled onto a host physical CPU; this scheduling is handled by the hypervisor, that is, by Linux.

KVM uses the standard Linux process scheduler to schedule vCPU threads. In Linux, the difference between a thread and a process is that a process has its own independent address space, while a thread is the unit of execution, that is, the basic unit of scheduling. Linux threads are lightweight processes, processes that share certain resources (address space, file handles, semaphores, and so on), so threads are scheduled in the same way as processes.
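Because vCPUs are just host threads, you can inspect and influence their placement with the usual libvirt tooling. A minimal sketch, where <domain> is a hypothetical guest name:

# virsh vcpuinfo <domain>    (shows which host physical CPU each vCPU is currently running on)
# virsh vcpupin <domain> 0 2    (pins vCPU 0 of the guest to host physical CPU 2)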

How to configure the number of vCPUs

Assume a host has two sockets, each socket has four cores, and the clock frequency is 2.4 GHz; the total available resource is then 2 * 4 * 2.4 GHz = 19.2 GHz. Suppose three VMs run on this host: VM1 and VM2 are each configured as 1 socket * 1 core, and VM3 as 1 socket * 2 cores. So VM1 and VM2 each have one vCPU, while VM3 has two vCPUs. Assume all other settings are at their defaults.

The three VMs then receive host CPU resources in the following proportions: VM1: 25%; VM2: 25%; VM3: 50%.

If the application running on VM3 supports multithreading, it can make full use of the allocated CPU resources and the 2 vCPU setting is appropriate. If the application on VM3 does not support multithreading, it cannot possibly use both vCPUs, yet the CPU scheduler at the VMkernel layer still has to wait for two free pCPUs in the physical layer before it can schedule the VM, in order to satisfy both vCPUs. With only 2 vCPUs this does not hurt the VM's performance much, but if you allocate 4 vCPUs or more, this scheduling burden is likely to have a significant negative impact on the applications running in that VM.
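To gather the socket, core, and frequency figures used in this kind of calculation on a real host, lscpu is usually enough. A minimal sketch; the exact field names can vary between versions:

# lscpu | grep -E 'Socket|Core|Thread|CPU MHz'    (sockets, cores per socket, threads per core, and clock speed feed the total-GHz estimate)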

Steps for determining the number of vCPUs. When creating a VM, the following steps help determine an appropriate vCPU count:

1 Understand the application and set an initial value

Is the application a critical application, and is there a Service Level Agreement? It is important to understand in depth whether the application running on the virtual machine supports multithreading: ask the application vendor whether it supports multithreading and SMP (symmetric multiprocessing). Also check how many CPUs the application needs when running on a physical server. If there is no reference information, set 1 vCPU as the initial value and then monitor resource usage closely.

2 Observe resource usage

Pick a time period over which to observe the virtual machine's resource usage. The length depends on the characteristics and requirements of the application and may be days or even weeks. Observe not only the CPU utilization of the VM but also the CPU usage of the application inside the guest operating system, and in particular distinguish average CPU usage from peak CPU usage.

For example, if 4 vCPUs are assigned and the peak CPU usage of the application on that VM is 25%, then the application can use at most 25% of all CPU resources, which indicates it is single-threaded and can only use one vCPU (4 * 25% = 1).

If the average usage is below 38% and the peak is below 45%, consider reducing the number of vCPUs.

If the average usage is above 75% and the peak is above 90%, consider increasing the number of vCPUs.

3 Change the number of vCPUs and observe the results

Make as few changes at a time as possible. If you think you may need 4 vCPUs, first configure 2 vCPUs and observe whether the performance is acceptable.
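A minimal sketch of how these steps might look with libvirt tooling on a KVM host (<domain> is a hypothetical guest name; monitoring inside the guest still needs the usual tools such as top or sar):

# virsh cpu-stats <domain> --total    (cumulative CPU time consumed by the guest, useful for spotting sustained load)
# virsh dominfo <domain> | grep 'CPU(s)'    (current vCPU count)
# virsh setvcpus <domain> 2 --config    (set the vCPU count to 2 for the next boot; --live also works if the guest supports CPU hotplug)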

KVM implements guest memory with the mmap system call: it reserves a contiguous region of the required size in the virtual address space of the QEMU process, and that region is mapped as the guest's physical memory.
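You can get a feel for this by looking at the QEMU process's memory map. A rough sketch, where <qemu-kvm-pid> is a hypothetical PID of a running guest:

# pmap -x <qemu-kvm-pid> | sort -n -k2 | tail -5    (the largest anonymous mapping, typically about the size of the guest's RAM, is the mmap'ed guest memory region)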


Intel x86 CPUs normally use 4 KB memory pages, but they can also be configured to use huge pages: 4 MB on x86_32, and 2 MB on x86_64 and x86_32 PAE.

With huge pages, the page tables of a KVM virtual machine use less memory and the CPU works more efficiently; in the best case, efficiency can improve by as much as 20%.

Huge pages and transparent huge pages (THP)

x86 CPUs usually manage memory in 4 KB pages, but larger 2 MB or 1 GB pages, known as huge pages, can also be used. Huge-page memory can back KVM guest deployments and improves performance by increasing CPU cache hits against the translation lookaside buffer (TLB).

This kernel feature is enabled by default in Red Hat Enterprise Linux 7. Huge pages can significantly improve performance, particularly for large-memory and memory-intensive workloads; by increasing the page size, Red Hat Enterprise Linux 7 can manage large amounts of memory more efficiently.
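To check whether transparent huge pages are currently enabled on a host (a minimal sketch; the sysfs path is the standard one on RHEL 7-era kernels):

# cat /sys/kernel/mm/transparent_hugepage/enabled    (the bracketed value, e.g. [always] madvise never, is the active setting)
# grep AnonHugePages /proc/meminfo    (amount of anonymous memory currently backed by transparent huge pages)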

Procedure 7.1. Enabling 1 GB huge pages for guests

Red Hat Enterprise Linux 7.1 supports 2 MB and 1 GB huge pages, which can be allocated at boot time or at run time; both page sizes can also be released at run time. For example, to allocate four 1 GB huge pages and 1024 2 MB huge pages at boot, use the following kernel command line:

'default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024'
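One way to apply these boot parameters persistently on a RHEL-style host is with grubby; this is a sketch of an assumed workflow rather than the only approach:

# grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"    (appends the huge page parameters to every installed kernel's boot entry)
# reboot    (boot-time huge page allocation takes effect on the next boot)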

Huge pages can also be allocated at run time. Runtime allocation lets the system administrator choose which NUMA node the pages come from, but because of memory fragmentation it is more likely to fail than allocation at boot. The following runtime example allocates four 1 GB huge pages from node1 and 1024 2 MB huge pages from node3:

# echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# echo 1024 > /sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages

Next, mount hugetlbfs for the 2 MB and 1 GB huge pages on the host:

# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
# mkdir /dev/hugepages2M
# mount -t hugetlbfs -o pagesize=2M none /dev/hugepages2M

The default 1 GB huge pages are not yet available to guests. To use 1 GB huge pages in a guest, the guest needs to be configured as follows:

In the following example, guest NUMA nodes 0-5, except NUMA node 4, use 1 GB huge pages, while guest NUMA node 4 uses 2 MB huge pages, regardless of where the guest NUMA nodes are placed on the host.
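The example configuration itself did not survive in this text. As a sketch, assuming the guest is defined in libvirt, the memory-backing section of the domain XML (reachable with virsh edit <domain>) could look roughly like this, with the node numbers chosen to match the description above:

<memoryBacking>
  <hugepages>
    <page size="1" unit="G" nodeset="0-3,5"/>
    <page size="2" unit="M" nodeset="4"/>
  </hugepages>
</memoryBacking>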

Transparent huge pages (THP, transparent huge page) automatically tune system settings for performance, improving performance by allowing all free memory to be used as cache. Once /sys/kernel/mm/transparent_hugepage/enabled is set to always, transparent huge pages are used by default. To disable transparent huge pages, run:

# echo never > /sys/kernel/mm/transparent_hugepage/enabled

Transparent huge page support does not prevent the use of hugetlbfs; however, when hugetlbfs is not used, KVM uses transparent huge pages instead of the regular 4 KB page size.

Usage example, in three parts:

# mkdir /dev/hugepages
# mount -t hugetlbfs hugetlbfs /dev/hugepages

Reserve some memory for huge pages:

# sysctl vm.nr_hugepages=2048 (on an x86_64 system this reserves 2048 * 2 MB = 4 GB of physical memory for the virtual machine)

Pass the hugepages path to kvm:

# qemu-kvm -mem-path /dev/hugepages (this option can also be added in the configuration file)

Verification: once the virtual machine has started normally, check on the physical machine:

# cat /proc/meminfo | grep -i hugepages

After reading the above, have you mastered the basic principles of KVM virtualization? If you want to learn more skills or learn more about the topic, you are welcome to follow the industry information channel. Thank you for reading!
