How to introduce Kernel into Linux Block layer Multi-queue 07/11 Update SLTechnology News&Howtos

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article looks at how and why multi-queue support was introduced into the Linux kernel's block layer.

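First, the multi-queue architecture: each CPU has its own software staging queue (struct blk_mq_ctx), and these software queues are mapped onto hardware dispatch queues (struct blk_mq_hw_ctx, usually called hctx), which the driver maps onto the device's hardware submission queues.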

Taking read I/O as an example, the single-queue and multi-queue designs share the same entry path:

read_pages()
{
    ...
    blk_start_plug()                 /* the task starts plugging: requests are held back */
    mapping->a_ops->readpages()      /* requests accumulate on the plug list */
    blk_finish_plug()                /* the task unplugs and flushes the held requests */
    ...
}
io_schedule()                        /* the task waits for the I/O to complete */

(In blk_mq_make_request(), once the plug list already holds request_count >= BLK_MAX_REQUEST_COUNT, i.e. 16 requests, the plug is flushed straight to the block device driver, without waiting for blk_finish_plug() or for the task to sleep in io_schedule().)
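The plugging behaviour described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: struct plug, start_plug(), submit_request(), flush_plug() and finish_plug() are hypothetical names, and only the threshold of 16 (BLK_MAX_REQUEST_COUNT) comes from the text. Submitted requests wait on the plug list and reach the "driver" in batches, either when the list is already full at submission time or when the plug is finished.

```c
#include <assert.h>

/* Threshold from the text: 16 plugged requests force a flush. */
#define BLK_MAX_REQUEST_COUNT 16

/* Hypothetical userspace model of a per-task plug list (not kernel code). */
struct plug {
    int pending;     /* requests currently held on the plug list */
    int dispatched;  /* requests handed to the "driver" so far */
    int flushes;     /* number of batches sent to the "driver" */
};

static void start_plug(struct plug *p)
{
    p->pending = 0;
    p->dispatched = 0;
    p->flushes = 0;
}

/* Hand every plugged request to the driver in one batch. */
static void flush_plug(struct plug *p)
{
    if (p->pending == 0)
        return;
    p->dispatched += p->pending;
    p->pending = 0;
    p->flushes++;
}

/* If the list already holds BLK_MAX_REQUEST_COUNT requests, flush it
 * now instead of waiting for finish_plug(), then plug the new one. */
static void submit_request(struct plug *p)
{
    if (p->pending >= BLK_MAX_REQUEST_COUNT)
        flush_plug(p);
    p->pending++;
    assert(p->pending <= BLK_MAX_REQUEST_COUNT);
}

/* Unplug: whatever is still pending goes out as a final batch. */
static void finish_plug(struct plug *p)
{
    flush_plug(p);
}
```

Submitting 40 requests, for example, produces two full batches of 16 during submission and a final batch of 8 when finish_plug() runs.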

mapping->a_ops->readpages() eventually ends up calling q->make_request_fn().

generic_make_request() calls q->make_request_fn(), which is blk_queue_bio() for a single-queue device (the request is then added to the elevator via __elv_add_request()) or blk_mq_make_request() for a multi-queue device.
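The make_request_fn hook is an ordinary function pointer stored in the queue, which is how one submission entry point can serve both designs. The sketch below assumes a drastically simplified struct request_queue; sq_queue_bio(), mq_make_request() and model_generic_make_request() are hypothetical stand-ins for blk_queue_bio(), blk_mq_make_request() and generic_make_request(), and they only record which path ran.

```c
#include <assert.h>
#include <stddef.h>

enum path { PATH_NONE, PATH_SINGLE_QUEUE, PATH_MULTI_QUEUE };

struct bio { long sector; };

/* Hypothetical, cut-down request_queue: only the submission hook is kept. */
struct request_queue {
    void (*make_request_fn)(struct request_queue *q, struct bio *bio);
    enum path last_path;   /* records which path handled the last bio */
};

/* Stand-in for blk_queue_bio(): the single-queue (elevator) path. */
static void sq_queue_bio(struct request_queue *q, struct bio *bio)
{
    (void)bio;
    q->last_path = PATH_SINGLE_QUEUE;
}

/* Stand-in for blk_mq_make_request(): the multi-queue path. */
static void mq_make_request(struct request_queue *q, struct bio *bio)
{
    (void)bio;
    q->last_path = PATH_MULTI_QUEUE;
}

/* Stand-in for generic_make_request(): the caller never needs to know
 * which design the queue uses; it just calls the hook. */
static void model_generic_make_request(struct request_queue *q, struct bio *bio)
{
    assert(q->make_request_fn != NULL);
    q->make_request_fn(q, bio);
}
```

The queue's creator picks the hook once; after that, submitters are oblivious to whether the queue is single- or multi-queue.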

Why multi-queue was introduced: unlike the single-queue design, multi-queue gives each CPU its own software queue (represented by struct blk_mq_ctx), so inserting a request no longer requires the global queue spinlock. Today's high-speed storage devices, such as NVMe SSDs (I just bought one, and it really is fast), have very low access latency, and the hardware itself supports multiple queues; multi-queue accordingly replaces the single request_queue->delay_work with a per-hardware-queue hctx->delayed_work. The old single-queue architecture can no longer extract the full performance of such devices and has become a bottleneck: both inserting a request and dispatching requests to the block device driver take the global spinlock on the request_queue, which makes people want to bypass the block layer altogether.

In the single-queue path, inserting a request takes the global spinlock on the request_queue:

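blk_queue_bio()
{
    ...
    spin_lock_irq(q->queue_lock);
    elv_merge()                      /* try to merge the bio into an existing request */
    ...
    spin_lock_irq(q->queue_lock);    /* taken again before the new request is added */
    ...
}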
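To see why this lock matters, here is a hypothetical userspace model of single-queue insertion. spinlock_t is a C11-atomics stand-in for the kernel lock (it does not disable interrupts), and struct sq_queue, sq_init() and sq_insert() are illustrative names. Every submitting CPU must pass through the same queue_lock, so on a many-core machine that lock becomes the serialization point.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_REQS 64

/* Userspace stand-in for the kernel spinlock; unlike spin_lock_irq()
 * it does not disable interrupts. */
typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock_init(spinlock_t *l)
{
    atomic_init(&l->locked, 0);
}

static void spin_lock(spinlock_t *l)
{
    /* Busy-wait until we are the one that flips the flag from 0 to 1. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Hypothetical single-queue model: one request list, one lock, all CPUs. */
struct sq_queue {
    spinlock_t queue_lock;
    int reqs[MAX_REQS];
    int nr;
    int lock_acquisitions;   /* how often the shared lock was taken */
};

static void sq_init(struct sq_queue *q)
{
    spin_lock_init(&q->queue_lock);
    q->nr = 0;
    q->lock_acquisitions = 0;
}

/* Every CPU that submits I/O serializes on q->queue_lock here. */
static int sq_insert(struct sq_queue *q, int sector)
{
    int ok = 0;

    spin_lock(&q->queue_lock);
    q->lock_acquisitions++;
    if (q->nr < MAX_REQS) {
        q->reqs[q->nr++] = sector;
        ok = 1;
    }
    spin_unlock(&q->queue_lock);
    return ok;
}
```

Four CPUs inserting one request each means four acquisitions of the same lock; under real concurrency each would wait for the others.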

Dispatching requests from the single queue to the block device driver also runs under the global request_queue spinlock:

struct request_queue *blk_alloc_queue_node()
    INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);

blk_delay_work()
    __blk_run_queue()
        q->request_fn(q);

__blk_run_queue() must be called with the queue lock held, i.e. after spin_lock_irq(q->queue_lock), as its kernel-doc comment states:

/**
 * __blk_run_queue - run a single device queue
 * @q: The queue to run
 *
 * Description:
 *    See @blk_run_queue. This variant must be called with the queue lock
 *    held and interrupts disabled.
 */
void __blk_run_queue(struct request_queue *q)
{
    if (unlikely(blk_queue_stopped(q)))
        return;

    __blk_run_queue_uncond(q);
}
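The locking rule for __blk_run_queue() can be expressed as a small model. In this sketch (hypothetical names throughout), a lock_held flag stands in for q->queue_lock, stopped stands in for blk_queue_stopped(), and delay_work() plays the role of blk_delay_work(): it takes the lock, runs the queue unless it is stopped, and drops the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the single-queue dispatch rule. */
struct queue {
    bool lock_held;        /* stands in for "q->queue_lock is held" */
    bool stopped;          /* stands in for blk_queue_stopped(q) */
    int  request_fn_calls; /* how often the driver's request_fn ran */
};

/* Stand-in for the driver's q->request_fn(q). */
static void request_fn(struct queue *q)
{
    q->request_fn_calls++;
}

/* Like __blk_run_queue(): the caller must already hold the lock. */
static void run_queue_locked(struct queue *q)
{
    assert(q->lock_held);
    if (q->stopped)
        return;
    request_fn(q);
}

/* Like blk_delay_work(): take the lock, run the queue, drop the lock. */
static void delay_work(struct queue *q)
{
    q->lock_held = true;
    run_queue_locked(q);
    q->lock_held = false;
}
```

Calling run_queue_locked() directly without setting lock_held trips the assertion, which is the userspace analogue of the "must be called with the queue lock held" contract.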

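Inserting a request in the multi-queue path takes no global spinlock: the request goes onto the submitting CPU's own software queue, whose ctx->lock is local to that CPU and therefore rarely contended: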

blk_mq_insert_requests()
    __blk_mq_insert_request()
        struct blk_mq_ctx *ctx = rq->mq_ctx;    /* the per-CPU blk_mq_ctx */
        list_add_tail(&rq->queuelist, &ctx->rq_list);
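By contrast, per-CPU software queues can be modelled as an array of contexts, one per CPU. This is a hypothetical sketch: struct sw_ctx, mq_insert() and NR_CPUS = 4 are assumptions, and the per-context lock mirrors the fact that each blk_mq_ctx carries its own small lock rather than sharing one queue-wide lock. Two CPUs inserting at the same time touch different locks and different lists.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4     /* assumption for the example */
#define CTX_MAX 32

/* Hypothetical model of a per-CPU software queue (cf. struct blk_mq_ctx). */
struct sw_ctx {
    atomic_int lock;        /* this CPU's own lock, not shared queue-wide */
    int rq_list[CTX_MAX];   /* stands in for ctx->rq_list */
    int nr;
};

/* Static storage: every lock starts unlocked (zero). */
static struct sw_ctx ctxs[NR_CPUS];

static void ctx_lock(struct sw_ctx *c)
{
    while (atomic_exchange_explicit(&c->lock, 1, memory_order_acquire))
        ;
}

static void ctx_unlock(struct sw_ctx *c)
{
    atomic_store_explicit(&c->lock, 0, memory_order_release);
}

/* Insert into the submitting CPU's own list, like
 * list_add_tail(&rq->queuelist, &ctx->rq_list). */
static int mq_insert(int cpu, int sector)
{
    struct sw_ctx *c;
    int ok = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);
    c = &ctxs[cpu];
    ctx_lock(c);
    if (c->nr < CTX_MAX) {
        c->rq_list[c->nr++] = sector;
        ok = 1;
    }
    ctx_unlock(c);
    return ok;
}
```

Each CPU's requests end up on that CPU's list only; the hardware-queue side later drains these lists in batches.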

Dispatching from the multi-queue path to the block device driver also avoids the global queue lock:

static int blk_mq_init_hw_queues()
    INIT_DELAYED_WORK(&hctx->delayed_work, blk_mq_work_fn);

static void blk_mq_work_fn(struct work_struct *work)
{
    struct blk_mq_hw_ctx *hctx;

    hctx = container_of(work, struct blk_mq_hw_ctx, delayed_work.work);
    __blk_mq_run_hw_queue(hctx);
}

__blk_mq_run_hw_queue()                  /* takes no global queue spinlock */
    q->mq_ops->queue_rq(hctx, rq);       /* the driver's ->queue_rq() callback, invoked per hardware queue */
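blk_mq_work_fn() relies on the kernel's container_of() idiom: the workqueue hands back a pointer to the embedded work_struct, and container_of() walks back to the enclosing blk_mq_hw_ctx. The sketch below re-creates that pattern in userspace C with a simplified container_of() built on offsetof() (the kernel's version adds type checking) and hypothetical cut-down structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel structures named above. */
struct work_struct { void (*func)(struct work_struct *work); };
struct delayed_work { struct work_struct work; long delay; };

struct hw_ctx {
    int queue_num;
    int runs;                         /* how often the queue was "run" */
    struct delayed_work delayed_work; /* embedded, as in blk_mq_hw_ctx */
};

/* Stand-in for __blk_mq_run_hw_queue(). */
static void run_hw_queue(struct hw_ctx *hctx)
{
    hctx->runs++;
}

/* Mirrors blk_mq_work_fn(): recover the hctx from its embedded work item. */
static void mq_work_fn(struct work_struct *work)
{
    struct hw_ctx *hctx = container_of(work, struct hw_ctx, delayed_work.work);
    run_hw_queue(hctx);
}
```

Calling h.delayed_work.work.func(&h.delayed_work.work) recovers &h and increments h.runs, which is the step blk_mq_work_fn() performs before calling __blk_mq_run_hw_queue().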

The performance gain from switching to multi-queue is shown in the following figure:

(I have not benchmarked this myself, but I would expect results in line with the figure.)
