2025-02-24 Update From: SLTechnology News&Howtos (Shulou.com)
Many readers are unsure how block device IO virtualization works under KVM. This article summarizes the underlying principles; after reading it, you should have a working picture of the whole IO path.
Introduction to IO Virtualization of Block Devices
As another important virtualized resource, block device IO matters as much as network IO. Like network IO, block device IO can be handled either by full virtualization or by virtio (virtio-blk). Modern block devices work via DMA, so full virtualization of a block device closely resembles that of a network device, and virtio-blk follows the same design as virtio-net; the differences lie on the virtio backend side.
Traditional block device architecture: the block device IO protocol stack
As shown in the figure above, the block device IO path can be viewed as a layered stack, much like the TCP/IP protocol stack; we will walk through it from the top down.
At the page cache layer, a buffered (indirect) write returns as soon as the data has been copied into the cache, provided enough memory is available; in IO terms this is writeback behavior. When persistence is required there are two options: explicitly call a flush operation (such as fsync), which synchronizes that file's cached pages to disk, or wait for the system to flush automatically.
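The buffered-write-plus-explicit-flush behavior described above can be sketched in userspace; this is a minimal illustration (the function name is ours, not a kernel API):

```c
#include <fcntl.h>
#include <unistd.h>

/* Buffered ("indirect") write: write() normally returns once the data is in
 * the page cache; fsync() is the explicit flush that forces it to disk.
 * Returns 0 on success, -1 on failure. */
int buffered_write_sync(const char *path, const char *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {  /* lands in the page cache */
        close(fd);
        return -1;
    }
    if (fsync(fd) != 0) {                       /* explicit flush to disk */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call, a power failure shortly after write() returns could lose the data, which is exactly the writeback trade-off described above.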
VFS is what we usually call the virtual file system layer; it presents a unified set of system calls to the layers above. The familiar create, open, read, write, and close calls all pass through the VFS layer once translated into system calls. Besides providing this unified interface, VFS organizes the file system structure and defines the core file data structures: resolving a dentry, locating file metadata from an inode, and maintaining struct file, which describes an open file. A file is ultimately a description of scattered data stored on disk; on Linux it is represented by an inode, which serves as the file's unique identity and stores most of its metadata. VFS defines callback operations that concrete file systems implement for adaptation: each lower file system must fill in the file_operations interface to perform the actual reads, writes, and so on.
struct file_operations {
        /* file read */
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        /* file write */
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        /* directory listing */
        int (*readdir) (struct file *, void *, filldir_t);
        /* file open */
        int (*open) (struct inode *, struct file *);
};
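From userspace, the inode identity mentioned above can be observed through stat(), which asks the VFS layer for the metadata that struct inode carries; a small sketch (the function name is ours):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Ask the VFS for a file's inode number: the unique identity of the file on
 * its file system. Returns 0 and stores the number in *out, or -1 on error. */
int file_inode(const char *path, unsigned long *out) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *out = (unsigned long)st.st_ino;
    return 0;
}
```

Two hard links to the same file report the same inode number, which is one way to see that the inode, not the path, is the file's real identity.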
Further down are the concrete file systems, such as ext3 and ext4. The interfaces that VFS requires are actually implemented at this layer. A file system here does not manipulate devices directly; it talks to the generic block layer below. Why abstract a generic block layer at all? A file system must work with many device types, such as an SSD or a USB disk, each with its own driver. Rather than adapting to every device, the file system only needs to target the generic block layer's interface.
Below the file systems sits the generic block layer, an interface layer in the programming sense: it hides the differences between the underlying block devices and exposes a unified interface to the file systems above.
Below the generic block layer is the IO scheduling layer. You can see the system's IO scheduling algorithms with the following command; the bracketed entry is the active one.
$ cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
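The kernel marks the active scheduler with square brackets in that sysfs output. A small sketch of extracting the active name from such a line (the function name is ours):

```c
#include <string.h>

/* Given a line like "noop deadline [cfq]", copy the bracketed (active)
 * scheduler name into out. Returns out, or NULL if no bracketed entry fits. */
char *active_scheduler(const char *line, char *out, size_t outsz) {
    const char *l = strchr(line, '[');
    const char *r = l ? strchr(l, ']') : NULL;
    if (!l || !r || (size_t)(r - l - 1) >= outsz)
        return NULL;
    memcpy(out, l + 1, (size_t)(r - l - 1));
    out[r - l - 1] = '\0';
    return out;
}
```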
noop can be thought of as a FIFO (first-in, first-out) queue that only does simple merging of IO requests, such as merging adjacent operations on the same file. This algorithm suits block devices with no seek penalty, such as SSDs.
cfq, the completely fair queue. This algorithm provides fairness at the process level: the fairness target is each process. The system allocates N queues to hold requests from different processes; when a process issues IO, the request is hashed to one of the queues. The hash is consistent, so requests from the same process always land in the same queue. The system then services the N queues round-robin by time slice to perform the actual disk reads and writes.
deadline, based on Linux's elevator scheduling algorithm, adds two extra queues for requests that are about to expire or have expired. These queues have higher priority than the others, which avoids IO starvation.
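The request merging that noop (and the other schedulers) perform can be sketched as folding together requests that are contiguous on disk; this toy model (our own, not kernel code) works on sector ranges:

```c
#include <stddef.h>

struct io_req {
    unsigned long sector;   /* starting sector */
    unsigned long count;    /* number of sectors */
};

/* Noop-style merging: if a request starts exactly where the previous one
 * ends, fold it into that request. Works in place on a sorted array and
 * returns how many merged requests remain. */
size_t merge_adjacent(struct io_req *reqs, size_t n) {
    if (n == 0)
        return 0;
    size_t out = 0;                         /* index of last merged request */
    for (size_t i = 1; i < n; i++) {
        struct io_req *last = &reqs[out];
        if (reqs[i].sector == last->sector + last->count)
            last->count += reqs[i].count;   /* contiguous: extend in place */
        else
            reqs[++out] = reqs[i];          /* gap: start a new request */
    }
    return out + 1;
}
```

Fewer, larger requests mean fewer round trips to the device, which is why even the simplest scheduler bothers to merge.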
The block device driver layer contains the real drivers for the different block devices. It sets up the device's memory mappings, handles the device's interrupts, and performs the actual reads and writes.
Block devices are the real storage devices: SAS, SATA, SSD, and so on. A block device may itself have a cache, commonly called the disk cache. For the driver layer this cache matters: in writeback mode, for example, the driver only needs to write as far as the disk cache before returning, and the device itself is responsible for persistence and consistency.
Block devices with a disk cache usually have battery backing, so the cache contents survive a power failure for some time and are written to disk on the next boot.
Block device IO process
Application-level reads and writes go through the read and write system calls, whose interface is provided by the Linux VFS layer, shielding applications from the complexity of the block devices below. Writes divide into direct IO and buffered (indirect) IO. A buffered write goes to the page cache and returns; persisting the data then depends on the system's flush, and if power is lost before the flush completes, some data may be lost. Direct IO (O_DIRECT) bypasses the page cache: the IO operation does not return until the data has reached the disk.
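One practical detail of O_DIRECT is that it requires the user buffer (and typically the offset and length) to be aligned to the device block size, and not every filesystem supports it at all. A minimal sketch under those assumptions (the function names are ours):

```c
#define _GNU_SOURCE     /* O_DIRECT is a Linux extension */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate a buffer aligned for direct IO (commonly 512 or 4096 bytes).
 * Returns the buffer, or NULL on failure; the caller must free() it. */
void *alloc_direct_io_buffer(size_t block_size, size_t len) {
    void *buf = NULL;
    if (posix_memalign(&buf, block_size, len) != 0)
        return NULL;
    return buf;
}

/* Try to open a file for direct IO. Some filesystems (e.g. tmpfs) do not
 * support O_DIRECT, so the caller must be prepared for failure. */
int open_direct(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
}
```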
The read and write logic of the IO path is complicated; here it is briefly described as follows:
As shown in the figure above, when an IO device is initialized, a portion of physical memory is assigned to it; this region can be managed as shared memory by the CPU's MMU and by the IOMMU attached to the IO bus. Take a read as an example: when the CPU needs to read from a block device, it programs the device with the memory address, the size, and the block address to read, and then returns to other work. When the block device has finished reading, it writes the data into the shared memory and notifies the CPU of completion with an interrupt, after which the CPU reads the data directly from memory.
Write requests are similar: everything goes through shared memory, which frees the CPU from synchronously waiting for IO to complete and keeps its involvement minimal.
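The shared-memory handshake just described can be modeled as a toy mailbox; this single-threaded sketch (our own, not a real driver) uses a flag to stand in for the completion interrupt:

```c
#include <string.h>

/* Shared region visible to both the "device" and the "CPU". */
struct dma_region {
    char buf[64];
    int  done;      /* set by the device to signal completion */
};

/* Device side: fill the shared buffer, then signal completion.
 * In real hardware, this is where the completion interrupt would fire. */
void device_complete_read(struct dma_region *r, const char *data) {
    strncpy(r->buf, data, sizeof r->buf - 1);
    r->buf[sizeof r->buf - 1] = '\0';
    r->done = 1;
}

/* CPU side: consume the data only after the device has signalled;
 * returns NULL if the transfer is not complete yet. */
const char *cpu_collect(struct dma_region *r) {
    return r->done ? r->buf : NULL;
}
```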
Because virtualized block device IO must traverse two IO stacks, one in the guest and one in the hypervisor, it is worth having this concrete picture of the block device IO stack before moving on.
This completes the basic introduction to the Linux block device IO layers. The above is only a brief overview; every part of it can be studied in much greater depth, which limited space does not allow here.
Block device IO Virtualization
Full virtualization of block devices resembles that of DMA-based network devices, so it is not repeated here; the focus is virtio-blk.
As shown in the figure above, blue indicates writethrough, yellow indicates none, and red indicates writeback. The dotted line marks the level that must be written before the write call returns.
Cache=writethrough (blue line)
This means the v-backend opens the image file and writes with buffered IO plus a flush: each write goes to the page cache and is then flushed, and the write call does not return until the data is actually on disk. Writes are therefore slower, but safety is higher.
Cache=none (yellow line)
cache=none means the v-backend writes the file with O_DIRECT, bypassing the hypervisor's page cache and going straight to the device. If the disk has a disk cache, the write returns once it reaches that cache. This mode balances performance and data safety, provided the disk cache is battery-backed. Reads, however, take a performance hit because the hypervisor's page cache is not populated.
Cache=writeback (red line)
This mode means the v-backend uses buffered IO and returns as soon as the data is in the hypervisor's page cache, so a crash or power loss can lose data.
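In QEMU/KVM, the cache mode is selected per drive with the cache= option of -drive; a hypothetical invocation (the image name is an example):

```shell
# cache= selects one of the modes described above:
# writethrough, none, or writeback.
qemu-system-x86_64 \
  -drive file=disk.img,format=raw,if=virtio,cache=none
```

cache=none is a common choice for production guests precisely because of the trade-off described above: near-direct write performance without trusting the host's page cache.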
Block device IO virtualization thus follows the same unified virtio pattern, but virtio-blk pays for traversing the IO protocol stack twice, once in the guest and once in the hypervisor, which adds overhead.
© 2024 shulou.com SLNews company. All rights reserved.