From: SLTechnology News&Howtos (Shulou.com), 05/31 report, updated 2025-01-15
This article covers the principles of virtualization and walks through how QEMU starts a virtual machine on top of KVM.
Starting a virtual machine goes through the following steps:

1. Open the KVM device to get the kvm handle: kvmfd = open("/dev/kvm", O_RDWR);
2. Create a virtual machine and get the VM handle: vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
3. Map memory for the virtual machine: ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &mem); QEMU also initializes PCI devices, signal handling and so on at this stage.
4. Load the virtual machine image into that memory. This is the equivalent of a physical machine's boot process, which also loads an image into memory.
5. Create the vCPU and allocate its runtime area: vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, vcpuid); then size = ioctl(kvmfd, KVM_GET_VCPU_MMAP_SIZE, 0); and mmap that many bytes of vcpufd.
6. Create one thread per vCPU and run the virtual machine: ioctl(vcpufd, KVM_RUN, 0);
7. Each thread enters a loop. It captures the reason the virtual machine exited and handles it accordingly. An exit here does not necessarily mean the virtual machine shut down. When the guest performs I/O, accesses hardware devices, hits page faults and so on, it stops executing and KVM_RUN returns. You can think of an exit as handing the CPU execution context back to QEMU.

In outline:

open("/dev/kvm")
ioctl(KVM_CREATE_VM)
ioctl(KVM_CREATE_VCPU)
for (;;) {
    ioctl(KVM_RUN)
    switch (exit_reason) {
    case KVM_EXIT_IO:  /* ... */
    case KVM_EXIT_HLT: /* ... */
    }
}
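The exit-handling loop above switches on kvm_run->exit_reason. As a minimal sketch, a vCPU thread might turn that reason into a readable name before handling it. The helper exit_reason_name is hypothetical and is not part of KVM or QEMU. Only the KVM_EXIT_* constants come from <linux/kvm.h>, and just a few common reasons are listed.

```c
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/* Hypothetical helper: map kvm_run->exit_reason to a printable name. */
const char *exit_reason_name(uint32_t reason)
{
    switch (reason) {
    case KVM_EXIT_UNKNOWN:        return "KVM_EXIT_UNKNOWN";
    case KVM_EXIT_IO:             return "KVM_EXIT_IO";        /* port I/O (in/out) */
    case KVM_EXIT_HLT:            return "KVM_EXIT_HLT";       /* guest executed hlt */
    case KVM_EXIT_MMIO:           return "KVM_EXIT_MMIO";      /* access to guest memory with no slot */
    case KVM_EXIT_SHUTDOWN:       return "KVM_EXIT_SHUTDOWN";
    case KVM_EXIT_FAIL_ENTRY:     return "KVM_EXIT_FAIL_ENTRY";
    case KVM_EXIT_INTERNAL_ERROR: return "KVM_EXIT_INTERNAL_ERROR";
    default:                      return "other";
    }
}
```

A handler loop could log exit_reason_name(run->exit_reason) and then dispatch as usual.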
The KVM documentation for KVM_CREATE_VM notes that a newly created VM has no CPUs and no memory. The QEMU process must therefore map a block of memory with the mmap system call and register it with the VM through KVM_SET_USER_MEMORY_REGION. This is how memory is created for the VM (see the KVM ioctl interface documentation).
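Creating guest memory therefore has two parts:

- an anonymous mmap on the host;
- a struct kvm_userspace_memory_region that tells KVM where that memory appears in the guest physical address space.

Below is a minimal sketch of both parts. The helper make_guest_ram is hypothetical and only prepares the region. The caller would still pass the region to ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, ...).

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Hypothetical helper: allocate `size` bytes of host memory and describe it
 * as memory slot `slot` starting at guest physical address `gpa`.
 * Returns 0 on success, -1 if mmap fails. */
int make_guest_ram(uint32_t slot, uint64_t gpa, size_t size,
                   struct kvm_userspace_memory_region *region)
{
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memset(region, 0, sizeof(*region));
    region->slot = slot;
    region->flags = 0;
    region->guest_phys_addr = gpa;                        /* where the guest sees it */
    region->memory_size = size;
    region->userspace_addr = (uint64_t)(uintptr_t)mem;    /* where the host sees it */
    return 0;
}
```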
Let's start with a KVM API appetizer
Here is a simple KVM demo that loads a small piece of code into a guest and runs it with KVM.
The program is 8086-style assembly in AT&T syntax. The .code16 directive tells the assembler to generate 16-bit code. The program cannot run directly on the host. To get it running, we use the API provided by KVM to treat it as the simplest possible operating system and run it in a guest.
The program writes the value of the AL register to port 0x3f8, the first serial port (COM1) on a PC. On x86, I/O ports are accessed with the IN/OUT instructions. The PC architecture has 65536 8-bit I/O ports, numbered 0 to 0xFFFF, which make up a 64K I/O address space. Two consecutive 8-bit ports can form a 16-bit port, and four consecutive ports form a 32-bit port. A 32-bit CPU has a 4 GB memory address space, but its I/O address space is only 64K.
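To make the port arithmetic concrete, here is a small sketch. The helper io_access_in_range is hypothetical. It checks whether a 1-, 2- or 4-byte port access stays inside the 64K I/O space. It only models the numbering described above, not how real hardware behaves at the top of the range.

```c
#include <stdbool.h>
#include <stdint.h>

#define IO_SPACE_SIZE 0x10000u /* 65536 8-bit ports, numbered 0x0000-0xFFFF */

/* Hypothetical helper: a 16-bit access uses two consecutive 8-bit ports and a
 * 32-bit access uses four, so the last port touched is port + size - 1. */
bool io_access_in_range(uint32_t port, unsigned size)
{
    if (size != 1 && size != 2 && size != 4)
        return false;
    return port < IO_SPACE_SIZE && port + size <= IO_SPACE_SIZE;
}
```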
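The host sets AL and BL to 2 when it initializes the vCPU registers through KVM. The program adds them, converts the sum to the ASCII digit '4', and writes it out, followed by a newline character. The expected output is therefore 4 followed by a line break (the \n is not printed literally). The final hlt instruction halts the vCPU, which makes the virtual machine exit.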
    .globl _start
    .code16
_start:
    mov $0x3f8, %dx
    add %bl, %al
    add $'0', %al
    out %al, (%dx)
    mov $'\n', %al
    out %al, (%dx)
    hlt
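The arithmetic the guest performs can be traced on the host without KVM at all. This sketch mirrors each instruction with a C statement. The function simulate_guest_output is hypothetical. The example assumes AL and BL start at 2, as the host sets them in the demo below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side simulation (not KVM) of the bytes the guest writes
 * to port 0x3f8. Writes 2 bytes into out and returns how many were written. */
size_t simulate_guest_output(uint8_t al, uint8_t bl, char out[2])
{
    size_t n = 0;
    al = (uint8_t)(al + bl);  /* add %bl, %al */
    al = (uint8_t)(al + '0'); /* add $'0', %al : turn the digit into ASCII */
    out[n++] = (char)al;      /* out %al, (%dx) */
    al = '\n';                /* mov $'\n', %al */
    out[n++] = (char)al;      /* out %al, (%dx) */
    return n;                 /* hlt: the guest stops here */
}
```

With al = bl = 2 the first byte written is '4', which matches the expected output described above.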
Let's assemble and link this program to get the flat binary file bin.bin:

as -32 bin.S -o bin.o
ld -m elf_i386 --oformat binary -N -e _start -Ttext 0x10000 -o bin.bin bin.o
Check the binary format
➜ demo1 hexdump -C bin.bin
00000000  ba f8 03 00 d8 04 30 ee  b0 0a ee f4              |......0.....|
0000000c

These bytes correspond to the code array below. The demo can therefore embed the bytecode directly instead of loading it from a file:

#include <err.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    int kvm, vmfd, vcpufd, ret;
    const uint8_t code[] = {
        0xba, 0xf8, 0x03, /* mov $0x3f8, %dx */
        0x00, 0xd8,       /* add %bl, %al */
        0x04, '0',        /* add $'0', %al */
        0xee,             /* out %al, (%dx) */
        0xb0, '\n',       /* mov $'\n', %al */
        0xee,             /* out %al, (%dx) */
        0xf4,             /* hlt */
    };
    uint8_t *mem;
    struct kvm_sregs sregs;
    size_t mmap_size;
    struct kvm_run *run;

    // get the kvm handle
    kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm == -1)
        err(1, "/dev/kvm");

    // make sure this is the expected API version
    ret = ioctl(kvm, KVM_GET_API_VERSION, NULL);
    if (ret == -1)
        err(1, "KVM_GET_API_VERSION");
    if (ret != 12)
        errx(1, "KVM_GET_API_VERSION %d, expected 12", ret);

    // create a virtual machine
    vmfd = ioctl(kvm, KVM_CREATE_VM, (unsigned long)0);
    if (vmfd == -1)
        err(1, "KVM_CREATE_VM");

    // allocate memory for the VM and load the code (the "image") into it
    mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        err(1, "allocating guest memory");
    memcpy(mem, code, sizeof(code));

    // expose this 4 KB page to the guest at guest physical address 0x1000
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0x1000,
        .memory_size = 0x1000,
        .userspace_addr = (uint64_t)mem,
    };
    // set the VM's memory region
    ret = ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);
    if (ret == -1)
        err(1, "KVM_SET_USER_MEMORY_REGION");

    // create a virtual CPU
    vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, (unsigned long)0);
    if (vcpufd == -1)
        err(1, "KVM_CREATE_VCPU");

    // get the size of the KVM runtime structure (struct kvm_run)
    ret = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
    if (ret == -1)
        err(1, "KVM_GET_VCPU_MMAP_SIZE");
    mmap_size = ret;
    if (mmap_size < sizeof(*run))
        errx(1, "KVM_GET_VCPU_MMAP_SIZE unexpectedly small");
    // map this vCPU's kvm_run so that we can read KVM's runtime information
    run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpufd, 0);
    if (run == MAP_FAILED)
        err(1, "mmap vcpu");

    // read the special registers
    ret = ioctl(vcpufd, KVM_GET_SREGS, &sregs);
    if (ret == -1)
        err(1, "KVM_GET_SREGS");
    // set the code segment base to 0; the code itself sits at guest physical address 0x1000
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    // KVM_SET_SREGS writes the special registers back
    ret = ioctl(vcpufd, KVM_SET_SREGS, &sregs);
    if (ret == -1)
        err(1, "KVM_SET_SREGS");

    // set the entry point, the equivalent of main() in a 32-bit program; our 16-bit
    // code starts at 0x1000. For a real image, rip would point at the code loaded
    // from the boot sector.
    struct kvm_regs regs = {
        .rip = 0x1000,
        .rax = 2,      // initial value of ax is 2
        .rbx = 2,      // likewise for bx
        .rflags = 0x2, // bit 1 of RFLAGS is reserved and must be 1 on x86, otherwise entry fails
    };
    ret = ioctl(vcpufd, KVM_SET_REGS, &regs);
    if (ret == -1)
        err(1, "KVM_SET_REGS");

    // run the VM; qemu-kvm runs a loop like this in one thread per vCPU
    while (1) {
        ret = ioctl(vcpufd, KVM_RUN, NULL);
        if (ret == -1)
            err(1, "KVM_RUN");
        // find out why the VM exited
        switch (run->exit_reason) {
        case KVM_EXIT_HLT:
            puts("KVM_EXIT_HLT");
            return 0;
        // The guest executed out, which causes a VM exit under VMX, so control
        // passes to the host and the guest context is saved in the VMCS (covered
        // later under CPU virtualization). KVM places the written data in the shared
        // kvm_run area, so the host reads the guest's output directly. This is also
        // a basis for virtualizing PCI devices, such as DMA-mode PCI devices.
        case KVM_EXIT_IO:
            if (run->io.direction == KVM_EXIT_IO_OUT && run->io.size == 1 &&
                run->io.port == 0x3f8 && run->io.count == 1)
                putchar(*(((char *)run) + run->io.data_offset));
            else
                errx(1, "unhandled KVM_EXIT_IO");
            break;
        case KVM_EXIT_FAIL_ENTRY:
            errx(1, "KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x%llx",
                 (unsigned long long)run->fail_entry.hardware_entry_failure_reason);
        case KVM_EXIT_INTERNAL_ERROR:
            errx(1, "KVM_EXIT_INTERNAL_ERROR: suberror = 0x%x", run->internal.suberror);
        default:
            errx(1, "exit_reason = 0x%x", run->exit_reason);
        }
    }
}
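The KVM_EXIT_IO case is the heart of this demo. KVM does not put the written byte inside run->io itself. It stores the byte at run->io.data_offset bytes from the start of the mmap'ed kvm_run area.

Below is a minimal sketch of that decoding as a standalone function. The name decode_byte_out is hypothetical. You can exercise it without /dev/kvm by filling a fake kvm_run buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <linux/kvm.h>

/* Hypothetical helper: return 1 and store the byte in *c if this exit is a
 * single 1-byte OUT to `port`, otherwise return 0. */
int decode_byte_out(const struct kvm_run *run, uint16_t port, char *c)
{
    if (run->exit_reason != KVM_EXIT_IO ||
        run->io.direction != KVM_EXIT_IO_OUT ||
        run->io.size != 1 || run->io.count != 1 || run->io.port != port)
        return 0;
    /* the data lives data_offset bytes past the start of kvm_run */
    *c = *((const char *)run + run->io.data_offset);
    return 1;
}
```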
Compile and run this demo
gcc -g demo.c -o demo
➜ demo1 ./demo
4
KVM_EXIT_HLT

Another simple QEMU emulator demo
Xu from IBM has written an introduction to this demo. Building on it, I will walk through the qemu-kvm startup process in detail.
    .globl _start
    .code16
_start:
    xorw %ax, %ax     # clear the ax register
loop1:
    out %ax, $0x10    # write the contents of ax to port 0x10 (AT&T operand order is the reverse of Intel's)
    inc %ax           # add one to ax
    jmp loop1         # continue the loop
The program writes an ever-increasing 16-bit value from AX to port 0x10 in an endless loop.
Start with the main function
int main(int argc, char **argv)
{
    int ret = 0;
    // initialize the kvm structure
    struct kvm *kvm = kvm_init();
    if (kvm == NULL) {
        fprintf(stderr, "kvm init fault\n");
        return -1;
    }

    // create the VM and allocate its memory
    if (kvm_create_vm(kvm, RAM_SIZE) < 0) {
        fprintf(stderr, "create vm fault\n");
        return -1;
    }

    // load the image
    load_binary(kvm);

    // only support one vcpu now
    kvm->vcpu_number = 1;
    // create the execution context (the vCPU)
    kvm->vcpus = kvm_init_vcpu(kvm, 0, kvm_cpu_thread);

    // start the virtual machine
    kvm_run_vm(kvm);

    kvm_clean_vm(kvm);
    kvm_clean_vcpu(kvm->vcpus);
    kvm_clean(kvm);
    return ret;
}
The first step is to initialize the kvm structure by calling kvm_init(). First, let's look at how a simple kvm structure is defined.
struct kvm {
    int dev_fd;              // /dev/kvm handle
    int vm_fd;               // guest (VM) handle
    __u64 ram_size;          // guest memory size
    __u64 ram_start;         // start of guest memory: the host address that the
                             // emulator obtained through mmap
    int kvm_version;
    struct kvm_userspace_memory_region mem;  // memory slot, filled in from user space;
                             // guest memory can be split into several slots that
                             // together form the guest's linear address space
    struct vcpu *vcpus;      // vcpu array
    int vcpu_number;         // number of vcpus
};
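The comment on mem says that several slots can together make up the guest's address space. The sketch below shows how a guest physical address maps to the host address that backs it, by searching the slots. gpa_to_hva is a hypothetical helper, and this demo registers only one slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/kvm.h>

/* Hypothetical helper: translate a guest physical address to the host
 * user-space address backing it. Returns 0 if no slot covers gpa; in KVM,
 * a guest access to such an address reaches user space as an MMIO exit. */
uint64_t gpa_to_hva(const struct kvm_userspace_memory_region *slots, size_t n,
                    uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        const struct kvm_userspace_memory_region *s = &slots[i];
        if (gpa >= s->guest_phys_addr &&
            gpa - s->guest_phys_addr < s->memory_size)
            return s->userspace_addr + (gpa - s->guest_phys_addr);
    }
    return 0;
}
```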
Initialize the kvm structure.
struct kvm *kvm_init(void)
{
    struct kvm *kvm = malloc(sizeof(struct kvm));
    if (kvm == NULL)
        return NULL;
    kvm->dev_fd = open(KVM_DEVICE, O_RDWR);  // open /dev/kvm to get the kvm handle
    if (kvm->dev_fd < 0) {
        perror("open kvm device fault: ");
        free(kvm);
        return NULL;
    }
    kvm->kvm_version = ioctl(kvm->dev_fd, KVM_GET_API_VERSION, 0);  // get the KVM API version
    return kvm;
}
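kvm_init stores the API version but, unlike the first demo, never checks it. Below is a minimal sketch of such a check using a hypothetical helper kvm_check_api. It also asks KVM_CHECK_EXTENSION whether KVM_CAP_USER_MEMORY is available, which KVM_SET_USER_MEMORY_REGION relies on.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: return 0 if kvm_fd (an open /dev/kvm handle) reports
 * the stable API version and supports user-space memory slots, else -1. */
int kvm_check_api(int kvm_fd)
{
    int version = ioctl(kvm_fd, KVM_GET_API_VERSION, 0);
    if (version != KVM_API_VERSION) /* 12; -1 if the ioctl itself failed */
        return -1;
    /* KVM_CHECK_EXTENSION returns a positive value when the capability exists */
    if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) <= 0)
        return -1;
    return 0;
}
```

In kvm_init, such a check would go right after the open() of KVM_DEVICE succeeds.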
Steps 2 and 3: create the virtual machine, get its handle, and allocate memory for it.
int kvm_create_vm(struct kvm *kvm, int ram_size)
{
    int ret = 0;
    // call KVM_CREATE_VM to get the vm handle
    kvm->vm_fd = ioctl(kvm->dev_fd, KVM_CREATE_VM, 0);
    if (kvm->vm_fd < 0) {
        perror("can not create vm");
        return -1;
    }

    // allocate memory for the VM with the mmap system call
    kvm->ram_size = ram_size;
    kvm->ram_start = (__u64)mmap(NULL, kvm->ram_size,
                                 PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                                 -1, 0);
    if ((void *)kvm->ram_start == MAP_FAILED) {
        perror("can not mmap ram");
        return -1;
    }

    // once initialized, kvm->mem is passed to the KVM_SET_USER_MEMORY_REGION interface
    // there is only one memory slot
    kvm->mem.slot = 0;
    // starting guest physical address
    kvm->mem.guest_phys_addr = 0;
    // size of the VM's memory
    kvm->mem.memory_size = kvm->ram_size;
    // host user-space address backing the guest memory; this binds the memory to the guest
    kvm->mem.userspace_addr = kvm->ram_start;

    // call KVM_SET_USER_MEMORY_REGION to give the memory to the virtual machine
    ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &(kvm->mem));
    if (ret < 0) {
        perror("can not set user memory region");
        return ret;
    }
    return ret;
}

Next, load_binary loads the binary file into the virtual machine's memory. In the first demo we placed the bytecode directly in memory. Here we simulate the image-loading step by reading a binary file into memory.

void load_binary(struct kvm *kvm)
{
    int fd = open(BINARY_FILE, O_RDONLY);  // open the binary file (the image)
    if (fd < 0) {
        fprintf(stderr, "can not open binary file\n");
        exit(1);
    }

    int ret = 0;
    char *p = (char *)kvm->ram_start;
    while (1) {
        ret = read(fd, p, 4096);  // load the image contents into the VM's memory
        if (ret <= 0)
            break;
        printf("read size: %d", ret);
        p += ret;
    }
    close(fd);
}

Next, kvm_init_vcpu creates the vCPU and maps its runtime structure:

struct vcpu *kvm_init_vcpu(struct kvm *kvm, int vcpu_id, void *(*fn)(void *))
{
    struct vcpu *vcpu = malloc(sizeof(struct vcpu));
    vcpu->vcpu_id = vcpu_id;
    // call KVM_CREATE_VCPU on kvm->vm_fd (returned by KVM_CREATE_VM) to get the vCPU handle
    vcpu->vcpu_fd = ioctl(kvm->vm_fd, KVM_CREATE_VCPU, vcpu->vcpu_id);
    if (vcpu->vcpu_fd < 0) {
        perror("can not create vcpu");
        return NULL;
    }

    // get the size of the KVM runtime structure
    vcpu->kvm_run_mmap_size = ioctl(kvm->dev_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (vcpu->kvm_run_mmap_size < 0) {
        perror("can not get vcpu mmsize");
        return NULL;
    }
    printf("%d\n", vcpu->kvm_run_mmap_size);

    // map vcpu_fd's memory onto vcpu->kvm_run; this associates the two, so that
    // information such as the exit reason is available when the VM exits
    vcpu->kvm_run = mmap(NULL, vcpu->kvm_run_mmap_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, vcpu->vcpu_fd, 0);
    if (vcpu->kvm_run == MAP_FAILED) {
        perror("can not mmap kvm_run");
        return NULL;
    }

    // set the thread's execution function
    vcpu->vcpu_thread_func = fn;
    return vcpu;
}
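load_binary copies the image into guest memory without checking it against ram_size. Below is a sketch of a bounded loader under that assumption, using the hypothetical name load_image. It stops at cap bytes, reports an image that is too large, and distinguishes read errors from end of file.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the whole image from fd into dst without writing
 * past cap bytes. Returns the number of bytes loaded, or -1 on a read error
 * or if the image does not fit. */
ssize_t load_image(int fd, char *dst, size_t cap)
{
    size_t total = 0;
    for (;;) {
        if (total == cap) {
            char extra;
            ssize_t r = read(fd, &extra, 1); /* any byte left means it does not fit */
            return r == 0 ? (ssize_t)total : -1;
        }
        size_t chunk = cap - total < 4096 ? cap - total : 4096;
        ssize_t r = read(fd, dst + total, chunk);
        if (r < 0)
            return -1;           /* read error */
        if (r == 0)
            return (ssize_t)total; /* end of file */
        total += (size_t)r;
    }
}
```

In load_binary, a call such as load_image(fd, (char *)kvm->ram_start, kvm->ram_size) would replace the while loop.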
Finally, once everything above is ready, start the virtual machine.
void kvm_run_vm(struct kvm *kvm)
{
    int i = 0;
    for (i = 0; i < kvm->vcpu_number; i++) {
        // start a thread that runs vcpu_thread_func, passing the kvm structure as its argument
        if (pthread_create(&(kvm->vcpus[i].vcpu_thread), (const pthread_attr_t *)NULL,
                           kvm->vcpus[i].vcpu_thread_func, kvm) != 0) {
            perror("can not create kvm thread");
            exit(1);
        }
    }
    // wait for the vCPU threads only after all of them have been started
    for (i = 0; i < kvm->vcpu_number; i++)
        pthread_join(kvm->vcpus[i].vcpu_thread, NULL);
}
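Starting every vCPU thread before joining any of them is what lets several vCPUs run concurrently. Below is a minimal pthreads sketch of that pattern. It uses a hypothetical struct vcpu_slot, and a stand-in thread function demo_vcpu_fn takes the place of kvm_cpu_thread. A real vCPU thread would loop on KVM_RUN instead.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-vCPU record: the thread handle and the function it runs. */
struct vcpu_slot {
    pthread_t thread;
    void *(*fn)(void *);
};

/* Stand-in for kvm_cpu_thread: just counts how many threads ran. */
void *demo_vcpu_fn(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
    return NULL;
}

/* Start one thread per vCPU, all sharing arg, then join every thread that
 * was started. Returns 0, or the error code of the first failed create. */
int run_vcpus(struct vcpu_slot *vcpus, int n, void *arg)
{
    int started, rc = 0;
    for (started = 0; started < n; started++) {
        rc = pthread_create(&vcpus[started].thread, NULL, vcpus[started].fn, arg);
        if (rc != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(vcpus[i].thread, NULL);
    return rc;
}
```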
Starting the virtual machine therefore amounts to creating a thread and running the corresponding thread callback function.
The callback is the function that main passed to kvm_init_vcpu, namely kvm_cpu_thread:
void *kvm_cpu_thread(void *data)
{
    // get the argument
    struct kvm *kvm = (struct kvm *)data;
    int ret = 0;
    // set up the vCPU's registers
    kvm_reset_vcpu(kvm->vcpus);

    while (1) {
        printf("KVM start run\n");
        // run the VM, which now has memory and a CPU and is ready to go
        ret = ioctl(kvm->vcpus->vcpu_fd, KVM_RUN, 0);
        if (ret < 0) {
            fprintf(stderr, "KVM_RUN failed\n");
            exit(1);
        }

        // kvm_init_vcpu mapped kvm_run onto the vCPU's memory, so when the VM
        // exits we can read exit_reason, the reason for the exit, from it
        switch (kvm->vcpus->kvm_run->exit_reason) {
        case KVM_EXIT_UNKNOWN:
            printf("KVM_EXIT_UNKNOWN\n");
            break;
        case KVM_EXIT_DEBUG:
            printf("KVM_EXIT_DEBUG\n");
            break;
        // the guest performed an I/O operation: the CPU in guest mode pauses the
        // VM and hands control to the emulator
        case KVM_EXIT_IO:
            printf("KVM_EXIT_IO\n");
            printf("out port: %d, data: %d\n",
                   kvm->vcpus->kvm_run->io.port,
                   *(int *)((char *)(kvm->vcpus->kvm_run) +
                            kvm->vcpus->kvm_run->io.data_offset));
            sleep(1);
            break;
        // the guest performed a memory-mapped I/O operation
        case KVM_EXIT_MMIO:
            printf("KVM_EXIT_MMIO\n");
            break;
        case KVM_EXIT_INTR:
            printf("KVM_EXIT_INTR\n");
            break;
        case KVM_EXIT_SHUTDOWN:
            printf("KVM_EXIT_SHUTDOWN\n");
            goto exit_kvm;
        default:
            printf("KVM PANIC\n");
            goto exit_kvm;
        }
    }

exit_kvm:
    return NULL;
}

void kvm_reset_vcpu(struct vcpu *vcpu)
{
    if (ioctl(vcpu->vcpu_fd, KVM_GET_SREGS, &(vcpu->sregs)) < 0) {
        perror("can not get sregs\n");
        exit(1);
    }
    // #define CODE_START 0x1000
    /* the x86 sregs structure:
    struct kvm_sregs {
        struct kvm_segment cs, ds, es, fs, gs, ss;
        struct kvm_segment tr, ldt;
        struct kvm_dtable gdt, idt;
        __u64 cr0, cr2, cr3, cr4, cr8;
        __u64 efer;
        __u64 apic_base;
        __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
    };
    */
    // cs is the code segment register; its base is where the program's code starts
    vcpu->sregs.cs.selector = CODE_START;
    vcpu->sregs.cs.base = CODE_START * 16;
    // ss is the stack segment register, which holds the start of the stack
    vcpu->sregs.ss.selector = CODE_START;
    vcpu->sregs.ss.base = CODE_START * 16;
    // ds is the data segment register, which holds the start of the data
    vcpu->sregs.ds.selector = CODE_START;
    vcpu->sregs.ds.base = CODE_START * 16;
    // es is the extra segment register
    vcpu->sregs.es.selector = CODE_START;
    vcpu->sregs.es.base = CODE_START * 16;
    // fs and gs are also segment registers
    vcpu->sregs.fs.selector = CODE_START;
    vcpu->sregs.fs.base = CODE_START * 16;
    vcpu->sregs.gs.selector = CODE_START;
    vcpu->sregs.gs.base = CODE_START * 16;

    // write the registers above into the vCPU
    if (ioctl(vcpu->vcpu_fd, KVM_SET_SREGS, &vcpu->sregs) < 0) {
        perror("can not set sregs");
        exit(1);
    }

    // set the flags register (bit 1 is reserved and must be 1)
    vcpu->regs.rflags = 0x0000000000000002ULL;
    // rip is the program's starting instruction pointer: 0. When loading the image,
    // we read the binary directly to the start of the VM's memory, so the VM runs
    // the binary right from the start.
    vcpu->regs.rip = 0;
    // rsp is the top of the stack
    vcpu->regs.rsp = 0xfffffff;
    // rbp is the bottom of the stack
    vcpu->regs.rbp = 0;

    if (ioctl(vcpu->vcpu_fd, KVM_SET_REGS, &(vcpu->regs)) < 0) {
        perror("KVM SET REGS\n");
        exit(1);
    }
}

Running it shows that when the guest executes the instruction out %ax, $0x10, the virtual machine exits. This special mechanism will be covered in the CPU virtualization part. After the host obtains the reason for the exit, it retrieves the corresponding output. This step resembles I/O virtualization: the host reads the I/O data directly from shared memory and prints the result.

➜ kvmsample git:(master) ✗ ./kvmsample
read size: 712288
KVM start run
KVM_EXIT_IO
out port: 16, data: 0
KVM start run
KVM_EXIT_IO
out port: 16, data: 1
KVM start run
KVM_EXIT_IO
out port: 16, data: 2
KVM start run
KVM_EXIT_IO
out port: 16, data: 3
KVM start run
KVM_EXIT_IO
out port: 16, data: 4
...

The virtual machine startup process can be summarized as: create the kvm handle -> create the VM -> allocate memory -> load the image into memory -> start a thread that executes KVM_RUN. This demo shows two things. First, the host maps the virtual machine's memory into the guest through mmap. Second, a vCPU is simply a host thread. The thread sets the vCPU's registers to point at the guest program's load address and then starts executing the guest's instructions. When the guest performs an I/O operation, the CPU traps it (a VM exit) and returns control to the host.
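kvm_reset_vcpu sets each segment's selector to CODE_START and its base to CODE_START * 16. This follows real-mode segmentation, where a segment's base is its selector shifted left by 4 bits and instructions are fetched from cs.base + ip.

Below is a small sketch of that arithmetic; the helper names are hypothetical. With the commented CODE_START of 0x1000 and rip = 0, it gives address 0x10000. The first demo's cs.base = 0 and rip = 0x1000 give 0x1000.

```c
#include <stdint.h>

/* Hypothetical helper: real-mode segment base = selector * 16. */
uint32_t real_mode_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* Hypothetical helper: address of selector:offset in real mode. */
uint32_t real_mode_linear(uint16_t selector, uint16_t offset)
{
    return real_mode_base(selector) + offset;
}
```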
Of course, the real qemu-kvm is much more complicated than this. It also sets up MMIO for many I/O devices, installs signal handlers, and so on.