2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
Analysis of the Linux Block Device I/O Stack
Block storage, simply put, uses block devices to provide storage services to a system. There are many types of block storage, including stand-alone block storage, network storage (such as NAS and SAN), and distributed block storage (such as AWS's EBS, Qingyun's cloud disk, Aliyun's cloud disk, and NetEase cloud disk). Block storage is usually presented as a device: what the user sees is a logical device such as sda or sdb. This article introduces Linux block devices and analyzes the Linux block device I/O stack.
1. Basic concepts of block devices
Block devices store information in fixed-size blocks, each with its own address. To the operating system, a block device appears as a device file such as /dev/sda. Although it can be addressed byte by byte in the same way as a character device, data on the device is actually stored and transferred in blocks (at minimum 512 bytes, that is, one sector), and the operating system performs the conversion between the two.
Here are some basic concepts of block devices:
1) Sector: a wedge-shaped region on a disk, used to organize data logically and to simplify the management of disk space. It is the basic unit of data transfer for the hardware device, typically 512 bytes.
2) Block: the basic unit of data transfer for the VFS and file systems. Its size must be an integer multiple of the sector size. The block size can be specified when formatting a file system (typically 512, 1024, 2048, or 4096 bytes).
3) Segment: a memory page, or part of a memory page, that holds the data of some adjacent disk sectors. Each disk I/O operation transfers the contents of some adjacent sectors between the disk and some RAM units. In most cases the disk controller uses DMA for the transfer. If the page frames of different segments are contiguous in RAM and the corresponding data blocks are also contiguous on disk, the generic block layer can merge them into one larger memory area, which is called a physical segment.
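The relationship between byte offsets, sectors, and blocks can be illustrated with a little arithmetic. The following is a minimal sketch, assuming a 512-byte sector; the function names are hypothetical and exist only for this illustration.

```python
SECTOR_SIZE = 512  # bytes; the typical sector size (an assumption, some disks use 4096)


def byte_range_to_sectors(offset, length, sector_size=SECTOR_SIZE):
    """Return (first_sector, sector_count) covering bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return first, last - first + 1


def sectors_per_block(block_size, sector_size=SECTOR_SIZE):
    """A file system block must be an integer multiple of the sector size."""
    if block_size % sector_size != 0:
        raise ValueError("block size must be a multiple of the sector size")
    return block_size // sector_size


if __name__ == "__main__":
    # A 4-byte access that straddles a sector boundary still touches two sectors.
    print(byte_range_to_sectors(510, 4))
    # A 4096-byte file system block spans eight 512-byte sectors.
    print(sectors_per_block(4096))
```

This is why even a tiny byte-granular read or write may turn into a transfer of one or more whole sectors on the device.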
Typically we access a block device through a file system, but we can also use the raw device directly, reading and writing it by specifying an offset and a size.
The most common block storage device is the physical disk. Linux also provides logical devices built on top of other block devices, such as Device Mapper and software RAID.
2. Block device I/O stack
2.1 Basic concepts
Before we introduce the I/O stack of block devices, let's look at a few basic concepts of the block I/O stack.
1) bio: the data structure for an I/O request at the generic block layer. It represents an I/O request submitted by the upper layers. A bio contains multiple pages, and these pages must correspond to a contiguous region on disk. Because a file is not necessarily stored contiguously on disk, a single file I/O is very likely to be split into multiple bio structures before it is submitted to the block device.
2) request: an I/O request at the block device driver layer. After passing through the I/O scheduling layer, it is sent to the block device driver layer for processing.
3) request_queue: the queue that holds the I/O requests of the block device driver layer. All requests are inserted into this queue, and each disk device has only one queue (multiple partitions on the same disk share it).
The relationship between these three structures is as follows: a request_queue contains multiple requests, and each request may contain multiple bios. Merging requests means adding multiple bios to the same request according to various rules.
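The containment relationship can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and field names below are hypothetical and are not the kernel's struct bio, struct request, or struct request_queue definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bio:
    sector: int      # starting sector on disk
    nr_sectors: int  # length in sectors; a bio covers one contiguous disk range


@dataclass
class Request:
    bios: List[Bio] = field(default_factory=list)

    @property
    def sector(self) -> int:
        # A request starts where its first bio starts.
        return self.bios[0].sector

    @property
    def nr_sectors(self) -> int:
        # A request covers the sum of its bios' lengths.
        return sum(b.nr_sectors for b in self.bios)


@dataclass
class RequestQueue:
    # One queue per disk in this model; partitions would share it.
    requests: List[Request] = field(default_factory=list)


if __name__ == "__main__":
    queue = RequestQueue()
    # Two bios that are adjacent on disk end up in the same request.
    queue.requests.append(Request([Bio(0, 8), Bio(8, 8)]))
    # A bio elsewhere on disk gets a request of its own.
    queue.requests.append(Request([Bio(1000, 8)]))
    for req in queue.requests:
        print(req.sector, req.nr_sectors, len(req.bios))
```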
2.2 How requests are handled
As shown in the figure, this is the block device I/O stack; the red text marks the key functions on the I/O path.
Before describing the I/O processing flow, let's first look at the differences between direct I/O and buffered I/O:
1) Direct I/O bypasses the page cache, whereas with buffered I/O a write request is considered complete as soon as the data is written to the page cache; the file system's dirty-page mechanism later flushes the data to disk. With buffered I/O it is therefore possible that dirty data in the page cache has not yet been flushed to disk when power is lost, resulting in data loss.
2) With buffered I/O, DMA can read data directly from disk into the page cache or write it back from the page cache to disk, but it cannot transfer data directly between the application address space and the disk. As a result, the data has to be copied between the application address space and the page cache during the transfer, and the CPU and memory overhead of these copies is significant. The advantage of direct I/O is that, by reducing the number of copies between the kernel buffers and the application address space, it lowers CPU usage and memory bandwidth consumption when reading and writing files. However, a direct I/O read cannot be served from the page cache and always goes to the disk, which costs performance. Direct I/O is therefore commonly combined with asynchronous I/O to improve performance.
3) Direct I/O requires the user-space buffer to be aligned.
4) Direct I/O is generally used by applications that manage their own cache, such as database systems.
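The alignment requirement in point 3 can be sketched from user space. This is a minimal, Linux-only sketch under stated assumptions: the 4096-byte alignment is an assumed value (the real requirement depends on the device and file system), the helper names are hypothetical, and some file systems (tmpfs, for example) reject O_DIRECT altogether.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement depends on device and file system


def is_direct_io_aligned(offset, length, align=ALIGN):
    """Check that a file offset and transfer length are multiples of the alignment."""
    return offset % align == 0 and length % align == 0


def write_direct(path, data, align=ALIGN):
    """Write data at offset 0 with O_DIRECT, using a page-aligned buffer."""
    if len(data) == 0 or not is_direct_io_aligned(0, len(data), align):
        raise ValueError("length must be a non-zero multiple of the alignment")
    # An anonymous mmap starts on a page boundary, which gives an aligned buffer.
    buf = mmap.mmap(-1, len(data))
    buf.write(data)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
        try:
            return os.write(fd, buf)  # bypasses the page cache
        finally:
            os.close(fd)
    finally:
        buf.close()


if __name__ == "__main__":
    print(is_direct_io_aligned(0, 8192))    # aligned offset and length
    print(is_direct_io_aligned(512, 4096))  # offset not aligned to 4096
```

An ordinary bytes object gives no alignment guarantee for its memory, which is why the sketch copies the data into an mmap-backed buffer before issuing the write.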
The logic of the I/O read and write paths is complicated. Here the write path is described briefly:
1) The user invokes the write system call to write a file, which enters the sys_write function.
2) The call passes through the VFS (virtual file system) layer, which calls vfs_write. For a buffered write, the data is written to the page cache and the call returns; flushing dirty pages happens later. For direct I/O, the flow goes to do_blockdev_direct_IO.
3) If the target is a logical device such as an LVM or MD RAID device, the request enters the processing function of the corresponding kernel module for some handling. Otherwise a bio request is constructed directly and submit_bio is called to send the request to the specific block device. submit_bio forwards the bio through generic_make_request; generic_make_request is a loop that hands the bio to each block device through the q->make_request_fn function registered by that device.
4) The request reaches the underlying block device, and the block device's request handling function __make_request is called to process it. This function calls blk_queue_bio, which merges bios into requests and is where the I/O scheduler is actually applied: if the region a bio reads or writes is contiguous with an existing request, the bio is merged into that request; otherwise a new request is created and the bio is attached to it. Merging is also limited: if the merged request would exceed the threshold (set in /sys/block/xxx/queue/max_sectors_kb), the bio can no longer be merged into that request and a new request is allocated instead.
5) The subsequent I/O handling depends on the specific physical device and is performed by the corresponding block device driver. Taking a SCSI device as an example, the queue's handler q->request_fn is the SCSI driver's scsi_request_fn function, which builds SCSI commands from the request and sends them to the SCSI device for processing. When processing completes, the callback functions of each layer are called in turn to handle the completion status, and the result is finally returned to the user.
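The merge rule in step 4 can be sketched as a toy model. This is not the kernel's implementation: it only performs back merges (a bio that starts exactly where a request ends), it ignores the I/O scheduler's other rules, and the function name and default values are assumptions chosen for illustration.

```python
def merge_bios(bios, max_sectors_kb=512, sector_size=512):
    """Group bios into requests.

    bios: list of (start_sector, nr_sectors) tuples in submission order.
    Returns a list of requests, each a list of the bios it contains.
    """
    # Convert the size limit from KiB to sectors.
    max_sectors = max_sectors_kb * 1024 // sector_size
    requests = []
    for start, count in bios:
        for req in requests:
            req_start = req[0][0]
            req_end = req[-1][0] + req[-1][1]
            fits = (req_end - req_start) + count <= max_sectors
            if start == req_end and fits:
                req.append((start, count))  # contiguous and under the limit: merge
                break
        else:
            # Not contiguous with any request, or the limit would be exceeded.
            requests.append([(start, count)])
    return requests


if __name__ == "__main__":
    # Two contiguous bios merge; the third is elsewhere on disk.
    print(merge_bios([(0, 8), (8, 8), (100, 8)]))
    # With a 4 KiB limit (8 sectors), contiguous bios can no longer merge.
    print(merge_bios([(0, 8), (8, 8)], max_sectors_kb=4))
```

Lowering the limit in the second call shows the effect of the threshold: contiguous bios that would otherwise form one request are issued as separate requests.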
2.3 request-based and bio-based
The I/O processing flow of block devices involves two different processing methods:
1) request-based: bios are first merged into requests (that is, I/O scheduling and merging takes place), and the requests are then sent to the physical device. The physical disks in use today are all request-based devices.
2) bio-based: the bio is processed in the make_request_fn function defined by the logical device itself, which then calls generic_make_request to send it to the underlying device. Ramdisk devices, most Device Mapper devices, and virtio-blk are bio-based.
The following figure illustrates the difference between request-based and bio-based processing from the perspective of Device Mapper.
Note that at present only the multipath target in Device Mapper is request-based; the others, such as linear and striped, are all bio-based. So if a file system is created on a linear DM device and its files are read and written with buffered I/O, the requests will not be merged on the DM device when dirty pages are flushed, even if they are contiguous. They are merged only on the underlying device (such as /dev/sdb).
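The bio-based behavior of a linear device can be sketched as a simple remapping step. This is a toy model under assumptions: the function name and parameters are hypothetical, and a real linear target is configured through Device Mapper tables rather than function arguments.

```python
def linear_remap(bios, dm_start, dev_offset):
    """Remap bios from a linear logical device onto its underlying device.

    bios: list of (sector, nr_sectors) relative to the logical device.
    dm_start: first logical sector covered by this linear mapping.
    dev_offset: sector on the underlying device where the mapping begins.
    """
    remapped = []
    for sector, count in bios:
        # Each bio is translated on its own and forwarded; nothing is merged here.
        remapped.append((dev_offset + (sector - dm_start), count))
    return remapped


if __name__ == "__main__":
    # Two contiguous bios stay as two bios after remapping. They are still
    # contiguous on the underlying device, where merging can then happen.
    print(linear_remap([(0, 8), (8, 8)], dm_start=0, dev_offset=2048))
```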