
What is the IO path and scheduling strategy in Linux block devices?

2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

The IO path and scheduling strategy of Linux block devices are not widely understood, so this article summarizes them step by step. Let's take a look.

When a file system submits IO through submit_bio, the request enters the generic block layer. The generic block layer preprocesses the IO so that requests reach the underlying disk device in a sensible order and performance is preserved as far as possible. The most important part of this preprocessing is the IO scheduling module. You may have heard of CFQ, and before it Deadline and Noop; all three are disk scheduling algorithms. Of these, CFQ is the most widely used.

If you ignore the stacking of block devices and their various mappings, the simplified structure has roughly three layers, as shown in figure 1. Not all three layers are software; hardware is included as well. The generic block layer needs little further introduction: it mainly merges and schedules IO. Below it is the driver layer, the driver for the hardware, which converts IO requests into operations on hardware registers (note: block devices differ, and an iSCSI device, for example, naturally involves no register operations). The code in this layer varies with the physical device. For directly attached SAS disks the driver layer is the SAS driver, while for an FC-SAN attached through an FC-HBA card the driver layer is the FC driver (such as the Qlogic driver).

Figure 1 device layering

The lowest layer is the device layer, which is usually a hardware device. Many kinds of hardware appear here, such as SAS cards, SATA cards, FC-HBA cards, iSCSI-HBA cards and so on. Sometimes, however, it is not hardware at all. For iSCSI, for example, this layer may be a device layer emulated in software, whose requests are sent to the target side through the network card.

Main data structure and flow

The vast majority of programs are built from data structures and algorithms: the data structures are the skeleton of the program, and the algorithms are its muscle and flesh. Together they form a complete whole. Human understanding moves from the concrete to the abstract and from the simple to the complex, so we start with the data structures. Once the key data structures are understood, the overall logic of block device IO is much easier to follow.

The most critical data structure in block device IO is request_queue, the request queue. A schematic of the data structure is shown in figure 2. The structure itself is very complex, so it has been simplified here to keep only a few key members. The highlighted part of the figure contains two function pointers, which are used to receive requests and to process requests, respectively.

Figure 2 request queue data structure

An example makes this easier to understand. Take the NBD block device: when the block device is initialized, make_request_fn is set to blk_queue_bio and request_fn is set to do_nbd_request. For SCSI block devices, request_fn is set to scsi_request_fn.
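The two hooks can be pictured with a small user-space model. This is only a sketch in Python, not kernel code: the class `RequestQueue`, the `pending` list and the helper `init_queue` are hypothetical, and the three functions merely borrow the kernel names mentioned above to show which hook each one would occupy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RequestQueue:
    """Toy stand-in for request_queue, keeping only the two hooks."""
    name: str
    # Hook that receives a bio from the generic block layer.
    make_request_fn: Optional[Callable[["RequestQueue", str], None]] = None
    # Hook that hands queued requests to the driver.
    request_fn: Optional[Callable[["RequestQueue"], List[str]]] = None
    pending: List[str] = field(default_factory=list)


def blk_queue_bio(q, bio):
    """Stand-in receive hook: simply queue the bio (no merging here)."""
    q.pending.append(bio)


def do_nbd_request(q):
    """Stand-in for the NBD processing hook: drain the queue."""
    done = [f"nbd:{bio}" for bio in q.pending]
    q.pending.clear()
    return done


def scsi_request_fn(q):
    """Stand-in for the SCSI processing hook: drain the queue."""
    done = [f"scsi:{bio}" for bio in q.pending]
    q.pending.clear()
    return done


def init_queue(name, request_fn):
    """Hypothetical helper: every queue shares the receive hook,
    only the processing hook depends on the driver."""
    return RequestQueue(name, make_request_fn=blk_queue_bio,
                        request_fn=request_fn)
```

For instance, `init_queue("nbd0", do_nbd_request)` and `init_queue("sda", scsi_request_fn)` would receive bios through the same function but process them through different ones, which is the point of keeping the two pointers separate.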

With these data structures and the initialization of their key members in mind, we can analyze the request flow of a block device in detail. The entry point for block device requests is submit_bio, which does some simple processing and then calls generic_make_request.

In this path, the function that actually handles the IO is the one behind the function pointer make_request_fn, which, as we saw in the initialization above, is blk_queue_bio. Requests for block devices are therefore processed by the blk_queue_bio function.
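The hand-off from the entry point to the queue's hook might be sketched as follows. This is a simplified Python model under stated assumptions: the classes `Bio` and `Queue`, the `received` list and the two bounds checks are hypothetical placeholders, and the sketch glosses over where exactly the kernel performs each of its checks.

```python
class Bio:
    """Hypothetical bio: a run of sectors on the device."""
    def __init__(self, sector, nr_sectors):
        self.sector = sector
        self.nr_sectors = nr_sectors


class Queue:
    """Hypothetical queue holding only what this sketch needs."""
    def __init__(self, capacity_sectors, make_request_fn):
        self.capacity_sectors = capacity_sectors
        self.make_request_fn = make_request_fn
        self.received = []


def blk_queue_bio(q, bio):
    """Stand-in for the registered hook: record the bio."""
    q.received.append(bio)


def generic_make_request(q, bio):
    """Call whatever hook the queue registered at initialization."""
    q.make_request_fn(q, bio)


def submit_bio(q, bio):
    """Entry point: placeholder sanity checks, then hand the bio on."""
    if bio.nr_sectors <= 0:
        return False  # nothing to transfer
    if bio.sector < 0 or bio.sector + bio.nr_sectors > q.capacity_sectors:
        return False  # would run past the end of the device
    generic_make_request(q, bio)
    return True
```

Because the call goes through `q.make_request_fn`, the entry point never needs to know which function was registered; with the default initialization it ends up in `blk_queue_bio`.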

Disk scheduling strategy

The Linux kernel is very flexible in how disk scheduling strategies are designed. Each scheduling strategy is registered with the kernel as a plug-in, so the user is free to choose which one to use.
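The plug-in idea can be modelled as a registry that strategies add themselves to and that a queue selects from. The sketch below is Python, not the kernel's registration code: `register_scheduler`, `select_scheduler`, `show_schedulers` and the `Queue` class are hypothetical names. The output format imitates the line Linux exposes in /sys/block/<device>/queue/scheduler, where the active scheduler is shown in square brackets.

```python
# Registry of available strategies, in registration order.
_schedulers = {}


def register_scheduler(name, ops):
    """Hypothetical registration call: make a strategy selectable."""
    if name in _schedulers:
        raise ValueError(f"scheduler {name!r} already registered")
    _schedulers[name] = ops


class Queue:
    """Hypothetical queue that remembers its active scheduler."""
    def __init__(self):
        self.scheduler = None


def select_scheduler(queue, name):
    """Switch the queue to a registered strategy."""
    if name not in _schedulers:
        raise KeyError(f"unknown scheduler {name!r}")
    queue.scheduler = name


def show_schedulers(queue):
    """List registered strategies, bracketing the active one."""
    return " ".join(
        f"[{name}]" if name == queue.scheduler else name
        for name in _schedulers
    )
```

After registering `noop`, `deadline` and `cfq` in that order and selecting `cfq`, `show_schedulers` returns `noop deadline [cfq]`.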

The idea behind the scheduling algorithms is actually very simple: by sorting, merging and batching IO, they reduce disk seek time and the time needed to process requests. It is worth noting that the current scheduling algorithms mainly target mechanical disks, because head positioning accounts for a large share of the total IO processing time on such disks. Scheduling can also help on SSDs, but how much depends on the characteristics of the IO.
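A toy illustration of sorting and merging, assuming each request is a hypothetical `(start_sector, nr_sectors)` pair and that seek cost is simply the distance between consecutive start sectors. It is a sketch of the idea rather than of any particular kernel scheduler.

```python
def sort_and_merge(requests):
    """Sort requests by start sector and merge contiguous neighbours."""
    merged = []
    for start, count in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This request begins exactly where the previous one ends.
            prev_start, prev_count = merged[-1]
            merged[-1] = (prev_start, prev_count + count)
        else:
            merged.append((start, count))
    return merged


def total_seek(start_sectors, head=0):
    """Sum of head movements when visiting the sectors in order."""
    distance = 0
    for sector in start_sectors:
        distance += abs(sector - head)
        head = sector
    return distance
```

With arrivals `[(100, 8), (0, 8), (8, 8), (108, 4), (200, 8)]`, `sort_and_merge` yields `[(0, 16), (100, 12), (200, 8)]`: five requests become three. Under the simplified cost model, serving the start sectors in arrival order costs 400 sectors of head movement, against 200 in sorted order.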

Figure 3 scheduling policy structure

The structure that defines a disk scheduling policy is shown in figure 3. The meaning of each member is fairly clear, so it is not repeated here. This article focuses on the member ops, of type elevator_ops, which holds the concrete implementation of the scheduling policy; every scheduling algorithm has to implement some of these functions.

The scheduling strategy is implemented through these callback functions. To show what the set of functions does, this article collects them in a table, so that we can first see the role of each function as a whole. A scheduling strategy does not have to implement every function; only those marked with * in the following table are required.

In short, these callback functions decide whether a request can be merged, perform the merge, dispatch requests, and so on. There are many of them, their usage scenarios are complex, and the places where they are called are scattered across the scheduler's code paths, so it is hard to introduce every scenario at once. To make their role more concrete, we take the Deadline scheduling strategy as an example for a brief introduction.
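The way a core layer copes with callbacks that a strategy leaves unset can be sketched as below. This is a Python model with assumptions: the `ElevatorOps` class keeps only three callbacks, the choice of which two are mandatory is an assumption of this sketch (it is not taken from the table), and `fifo_add`, `fifo_dispatch` and `call_merge` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ELEVATOR_NO_MERGE = 0  # result used when no merge is possible


@dataclass
class ElevatorOps:
    # Assumed mandatory in this sketch: add a request, dispatch one.
    elevator_add_req_fn: Callable[[List[int], int], None]
    elevator_dispatch_fn: Callable[[List[int]], Optional[int]]
    # Optional: a strategy may leave this unset.
    elevator_merge_fn: Optional[Callable[[List[int], int], int]] = None


def fifo_add(queue, sector):
    """Simplest possible policy: append in arrival order."""
    queue.append(sector)


def fifo_dispatch(queue):
    """Hand out the oldest request, or None if the queue is empty."""
    return queue.pop(0) if queue else None


def call_merge(ops, queue, sector):
    """Core-side wrapper: fall back to a default when the optional
    callback was not provided by the strategy."""
    if ops.elevator_merge_fn is None:
        return ELEVATOR_NO_MERGE
    return ops.elevator_merge_fn(queue, sector)
```

A strategy built as `ElevatorOps(fifo_add, fifo_dispatch)` still works everywhere the wrapper is used, because the missing merge callback simply means "never merge".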

Figure 4 shows the callback functions that Deadline initializes. Not every callback is set: only 9 of the 16 callback functions are initialized.

Figure 4 Deadline callback function

Let's analyze in detail where one of these functions is called. Earlier we mentioned the elevator_merge_fn function, which looks for a request that a bio can be merged into. The whole call stack is shown in figure 5. The entry is blk_queue_bio, the entry point into the scheduler that we introduced earlier. This function calls elv_merge to find out whether any existing request can be merged with the bio, and elv_merge in turn calls the callback function registered by the Deadline scheduler. Once the check is complete, the function returns the matching request (or none, if no candidate is found) together with the direction of the merge (forward merge, backward merge, and so on), and the subsequent flow carries out the actual merge.

Figure 5 function call stack
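The lookup-then-merge step described above might look like this in a simplified model. The three constants follow the kernel's ELEVATOR_NO_MERGE, ELEVATOR_FRONT_MERGE and ELEVATOR_BACK_MERGE values; everything else (the `Request` class, the linear scan in `find_merge`, and `apply_merge`) is a hypothetical Python sketch, and the real schedulers use faster lookup structures than a list scan.

```python
ELEVATOR_NO_MERGE = 0
ELEVATOR_FRONT_MERGE = 1  # bio is prepended to the request
ELEVATOR_BACK_MERGE = 2   # bio is appended to the request


class Request:
    """Hypothetical request covering a contiguous run of sectors."""
    def __init__(self, sector, nr_sectors):
        self.sector = sector
        self.nr_sectors = nr_sectors

    @property
    def end(self):
        return self.sector + self.nr_sectors


def find_merge(requests, bio_sector, bio_nr_sectors):
    """Return (request, direction) for the first mergeable request,
    or (None, ELEVATOR_NO_MERGE) if the bio touches none of them."""
    bio_end = bio_sector + bio_nr_sectors
    for rq in requests:
        if rq.end == bio_sector:
            return rq, ELEVATOR_BACK_MERGE
        if bio_end == rq.sector:
            return rq, ELEVATOR_FRONT_MERGE
    return None, ELEVATOR_NO_MERGE


def apply_merge(rq, bio_sector, bio_nr_sectors, direction):
    """Grow the request in the direction reported by find_merge."""
    if direction == ELEVATOR_FRONT_MERGE:
        rq.sector = bio_sector
    if direction in (ELEVATOR_FRONT_MERGE, ELEVATOR_BACK_MERGE):
        rq.nr_sectors += bio_nr_sectors
```

Given a single queued request covering sectors 100 to 107, a bio starting at sector 108 is a back merge, a bio covering sectors 92 to 99 is a front merge, and a bio at sector 500 finds no candidate and would become a new request.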

