Shulou (Shulou.com), SLTechnology News&Howtos. Updated 2025-03-28.
This article walks through how reads and writes on a Linux block device are processed, from the user-space system call down to the SCSI driver and back. The function names used here (for example generic_file_read, pdflush and elv_next_request) come from older 2.6-series kernels; later kernels have renamed or removed several of them.
1. A user-space program opens the block device with open(). The system call traps into the kernel and runs blkdev_open(), which is registered as the open method in the block device's file_operations. blkdev_open() calls bd_acquire(), which converts the file-system inode into the block device object (bdev) by means of a hash lookup. Once the bdev has been found, do_open() is called to finish opening the device. Inside do_open(), the open method registered by the block device driver is invoked as gendisk->fops->open(bdev->bd_inode, file).
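The call order in step 1 can be pictured with a small user-space model. This is a toy Python sketch, not kernel code: the classes `Inode`, `BlockDevice` and `Gendisk`, the dictionary `bdev_hash`, the driver function `sd_open` and the `trace` list are all illustrative stand-ins, and only the order blkdev_open, bd_acquire, do_open, fops open follows the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Inode:
    i_rdev: Tuple[int, int]  # (major, minor) stored in the file-system inode


@dataclass
class Gendisk:
    name: str
    fops_open: Callable[[Inode, str], int]  # stand-in for gendisk->fops->open


@dataclass
class BlockDevice:
    bd_inode: Inode
    bd_disk: Gendisk
    openers: int = 0


# Hypothetical table playing the role of the kernel's bdev hash.
bdev_hash: Dict[Tuple[int, int], BlockDevice] = {}
trace: List[str] = []


def bd_acquire(inode: Inode) -> BlockDevice:
    """Convert a file-system inode into its block device via hash lookup."""
    trace.append("bd_acquire")
    return bdev_hash[inode.i_rdev]


def do_open(bdev: BlockDevice, file: str) -> int:
    """Finish the open by calling the driver's registered open method."""
    trace.append("do_open")
    bdev.openers += 1
    return bdev.bd_disk.fops_open(bdev.bd_inode, file)


def blkdev_open(inode: Inode, file: str) -> int:
    """Entry point reached from open() through file_operations."""
    trace.append("blkdev_open")
    return do_open(bd_acquire(inode), file)


def sd_open(inode: Inode, file: str) -> int:
    """Illustrative driver open method; always succeeds."""
    trace.append("sd_open")
    return 0


# Register a disk under major 8, minor 0 (the numbers used for /dev/sda).
sda = BlockDevice(bd_inode=Inode((8, 0)), bd_disk=Gendisk("sda", sd_open))
bdev_hash[(8, 0)] = sda

if __name__ == "__main__":
    print(blkdev_open(Inode((8, 0)), "file"), trace)
```

The model keeps two inode objects on purpose: the one passed to `blkdev_open` belongs to the device node in the file system, while `bd_inode` belongs to the block device itself, which is why the lookup step exists.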
2. The user program reads and writes the device through read() and write(), and the file system calls the corresponding methods, usually generic_file_read and blkdev_file_write. Several strategies are used along the read and write paths. The read path is analyzed first.
3. When user space calls read(), the kernel runs generic_file_read. Unless direct I/O is in use, it goes straight to do_generic_file_read -> do_generic_mapping_read(). do_generic_mapping_read (in filemap.c) first checks whether the data is already in the cache. On a hit, the data is returned to user space directly. On a miss, a real read request is issued through address_space->a_ops->readpage. The readpage function builds a buffer_head, sets its completion callback to end_buffer_async_read, and calls submit_bh to issue the request. submit_bh builds a bio from the buffer_head, sets the bio's callback to end_bio_bh_io_sync, and sends the bio to the target block device through submit_bio.
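The cache check and the nested callbacks in step 3 can be sketched as follows. This is a toy Python model under stated assumptions: `ToyDisk`, `AddressSpace` and the dictionary-based bio and buffer_head are invented for illustration, the bio completes synchronously instead of by interrupt, and one buffer_head covers a whole 4096-byte page.

```python
PAGE_SIZE = 4096
SECTOR_SIZE = 512


class ToyDisk:
    """Stand-in for the block device: completes each bio immediately."""

    def __init__(self, data: bytes):
        self.data = data
        self.bios_submitted = 0

    def submit_bio(self, bio):
        self.bios_submitted += 1
        start = bio["sector"] * SECTOR_SIZE
        payload = self.data[start:start + bio["size"]]
        bio["end_io"](bio, payload)  # the real kernel does this asynchronously


def end_buffer_async_read(bh, payload):
    """buffer_head callback: store the data and mark the buffer up to date."""
    bh["data"] = payload.ljust(PAGE_SIZE, b"\0")
    bh["uptodate"] = True


def end_bio_bh_io_sync(bio, payload):
    """bio callback: hand completion back to the owning buffer_head."""
    bh = bio["private"]
    bh["end_io"](bh, payload)


class AddressSpace:
    def __init__(self, disk: ToyDisk):
        self.disk = disk
        self.pages = {}  # page index -> bytes (the cache)

    def submit_bh(self, bh):
        """Build a bio from the buffer_head and submit it."""
        bio = {
            "sector": bh["page"] * (PAGE_SIZE // SECTOR_SIZE),
            "size": PAGE_SIZE,
            "end_io": end_bio_bh_io_sync,
            "private": bh,
        }
        self.disk.submit_bio(bio)

    def readpage(self, index):
        """Cache miss path: buffer_head -> bio -> device -> callbacks."""
        bh = {"page": index, "data": None, "uptodate": False,
              "end_io": end_buffer_async_read}
        self.submit_bh(bh)
        self.pages[index] = bh["data"]


def do_generic_mapping_read(mapping: AddressSpace, pos: int, count: int) -> bytes:
    """Serve a read from the cache, filling missing pages through readpage."""
    out = bytearray()
    while count > 0:
        index, offset = divmod(pos, PAGE_SIZE)
        if index not in mapping.pages:  # miss: go to the device
            mapping.readpage(index)
        chunk = mapping.pages[index][offset:offset + count]
        out += chunk
        pos += len(chunk)
        count -= len(chunk)
    return bytes(out)
```

Reading the same range twice issues bios only the first time; the second read is served entirely from `mapping.pages`.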
4. When user space calls write(), the kernel runs blkdev_file_write. Unless direct I/O is in use, the buffered write path is taken and generic_file_buffered_write is called. A buffered write copies the data into the cache and performs cache replacement where needed. Replacement requires I/O to the actual block device, and address_space->a_ops provides the methods for that. Once the data is in the cache, write() can return; most of the remaining asynchronous write-back is left to the pdflush daemon (some of it is done during replacement).
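The key property of step 4, that write() returns before the device is touched, can be shown with a minimal write-back cache. This is an illustrative Python sketch: `WritebackCache` and its `flush` method (standing in for pdflush) are hypothetical names, and cache replacement is left out to keep the example short.

```python
class WritebackCache:
    """Toy page cache with dirty tracking in front of a backing store."""

    def __init__(self, backing: bytearray, page_size: int = 4096):
        self.backing = backing
        self.page_size = page_size
        self.pages = {}     # page index -> bytearray
        self.dirty = set()  # indexes of pages not yet written back

    def _get_page(self, index: int) -> bytearray:
        """Return the cached page, loading it from the backing store if absent."""
        if index not in self.pages:
            start = index * self.page_size
            page = bytearray(self.backing[start:start + self.page_size])
            page.extend(bytes(self.page_size - len(page)))  # zero-pad short reads
            self.pages[index] = page
        return self.pages[index]

    def buffered_write(self, pos: int, data: bytes) -> int:
        """Copy data into cached pages and mark them dirty; no device I/O."""
        written = 0
        while written < len(data):
            index, offset = divmod(pos + written, self.page_size)
            page = self._get_page(index)
            n = min(self.page_size - offset, len(data) - written)
            page[offset:offset + n] = data[written:written + n]
            self.dirty.add(index)
            written += n
        return written

    def flush(self) -> int:
        """Stand-in for pdflush: write every dirty page to the backing store."""
        flushed = 0
        for index in sorted(self.dirty):
            start = index * self.page_size
            self.backing[start:start + self.page_size] = self.pages[index]
            flushed += 1
        self.dirty.clear()
        return flushed
```

A write that straddles a page boundary dirties two pages, and the backing store only changes when `flush` runs.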
5. At this stage of the data flow, it is clear how user data reaches the kernel. The interface closest to the user is file_operations, which is defined per device type: because Linux treats every device as a file, each type of device has its own file operations, for example def_chr_fops for character devices, def_blk_fops for block devices, and bad_sock_fops for network sockets. The underlying operations differ for each device type, but file_operations hides those differences, which is why Linux can present all devices as files. This raises another question: where, then, do the differences between devices show up? The file system layer defines how a file system accesses its device through address_space_operations, and the file system reaches the specific device through those methods. Character devices do not implement address_space_operations, and do not need to, because the interface of a character device is the same as that of the file system. When a character device is opened, the file_operations obtained from the inode is simply replaced with the file_operations registered in the cdev. After that, user-level reads and writes on the character device call the file_operations methods in the cdev directly.
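The swap of operation tables described for character devices can be modelled in a few lines. This is a toy Python sketch: `cdev_map`, `zero_fops`, `vfs_open` and `vfs_read` are illustrative names, the operation tables are plain dictionaries, and the swap is done on the per-open file object, which is an assumption about where the replaced pointer lives.

```python
class Inode:
    def __init__(self, i_rdev, i_fop):
        self.i_rdev = i_rdev  # (major, minor) of the device node
        self.i_fop = i_fop    # default operation table for this device type


class File:
    def __init__(self, inode):
        self.inode = inode
        self.f_op = inode.i_fop  # starts out as the generic table


def zero_open(inode, file):
    return 0


def zero_read(file, count):
    """Illustrative driver read: returns `count` zero bytes."""
    return bytes(count)


# Hypothetical driver table and registry keyed by device number.
zero_fops = {"open": zero_open, "read": zero_read}
cdev_map = {(1, 5): zero_fops}


def chrdev_open(inode, file):
    """Generic character-device open: install the driver's table, then call it."""
    ops = cdev_map[inode.i_rdev]
    file.f_op = ops
    return ops["open"](inode, file)


# Generic table shared by all character devices; it only knows how to open.
def_chr_fops = {"open": chrdev_open}


def vfs_open(inode):
    file = File(inode)
    rc = file.f_op["open"](inode, file)
    if rc != 0:
        raise OSError(rc, "open failed")
    return file


def vfs_read(file, count):
    # After open, this dispatches straight to the driver's read method.
    return file.f_op["read"](file, count)
```

Note that `def_chr_fops` has no read entry at all: reads work only because open replaced the table, which is the point the paragraph makes.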
6. By the end of step (4), a read that misses the cache issues a block device read request through the readpage function in address_space_operations, and a write issues a block device request when the cache is replaced or when pdflush wakes up. Both paths issue block device requests in the same way. First a bio structure is built as required; it holds the read or write address, the length, the destination device, the callback function and other information. Once the bio is built, the request is forwarded to the specific block device through submit_bio. The block device interface is therefore simple: the interface method is submit_bio (the lower-level function is generic_make_request) and the data structure is struct bio.
7. submit_bio forwards the bio through generic_make_request. generic_make_request is a loop that hands the bio to the block device through the q->make_request_fn function registered for that device. If the device has a request queue, the system's __make_request function is registered as q->make_request_fn; otherwise the block device registers a private method. Because such a device has no queue, the private method does not process the request itself; instead it redirects the bio by modifying fields in the bio. A private make_request method usually returns 1, which tells generic_make_request to keep forwarding this bio. generic_make_request can run in two kinds of execution context: the user process context, and the kernel thread context in which pdflush runs.
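The forwarding loop in step 7 can be sketched with one stacked device on top of one queued device. This is a toy Python model: `LinearDevice`, `QueueDevice`, the device name `md0` and the 2048-sector offset are made up for illustration, and only the return-value convention (1 means forward again, 0 means done) is taken from the text.

```python
class Bio:
    def __init__(self, dev, sector, nr_sectors):
        self.dev = dev              # name of the destination device
        self.sector = sector
        self.nr_sectors = nr_sectors


class QueueDevice:
    """Device with a request queue: accepts the bio and ends forwarding."""

    def __init__(self, name):
        self.name = name
        self.queue = []

    def make_request_fn(self, bio):
        self.queue.append(bio)
        return 0  # bio has been queued, nothing left to forward


class LinearDevice:
    """Queue-less stacked device: only rewrites the bio and passes it on."""

    def __init__(self, name, target, offset):
        self.name = name
        self.target = target  # name of the underlying device
        self.offset = offset  # sector offset on the underlying device

    def make_request_fn(self, bio):
        bio.dev = self.target
        bio.sector += self.offset
        return 1  # ask generic_make_request to forward the bio again


def generic_make_request(bio, devices):
    """Forward the bio until some device reports that it has taken it."""
    hops = []
    while True:
        dev = devices[bio.dev]
        hops.append(dev.name)
        if dev.make_request_fn(bio) == 0:
            return hops
```

For example, a bio aimed at sector 10 of `md0` ends up queued on `sda` at sector 2058, having passed through both make_request functions.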
8. After repeated forwarding by generic_make_request, the request must eventually reach a block device that has a request queue. Assume the final block device is a SCSI disk (/dev/sda). When generic_make_request forwards the request to sda, it calls __make_request, the request handling function that Linux provides for block devices. This function implements an extremely important operation, commonly called I/O scheduling. It tries to merge the forwarded bio into an existing request; if the merge succeeds, the new bio is attached to that request. If it cannot be merged, a new request is allocated and the bio is added to it. Once this is done, the bio forwarded through generic_make_request has arrived in a request, where it stays temporarily inside the kernel. At this point the physical device has still not been touched. Before __make_request returns, it checks the sync flag in the bio. If the flag is set, the bio is a synchronous operation that must not linger in the kernel, so __generic_unplug_device is called, which triggers the next stage. If the flag is not set, the request stays in the queue for a while until the queue's unplug timer fires and triggers the next stage. __make_request returns 0, telling generic_make_request that the bio needs no further forwarding, and forwarding of the bio ends.
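Merging and plugging as described in step 8 can be illustrated with a much simplified queue. This Python sketch is an assumption-heavy model: `RequestQueue`, `Request` and `toy_request_fn` are invented names, merging only checks sector adjacency, and the unplug timer is represented by calling `unplug()` by hand.

```python
class Request:
    def __init__(self, sector, nr_sectors):
        self.sector = sector
        self.nr_sectors = nr_sectors
        self.bios = [(sector, nr_sectors)]  # chain of bios in this request


def toy_request_fn(q):
    """Stand-in for the driver's request_fn: drain the queue in order."""
    while q.requests:
        q.dispatched.append(q.requests.pop(0))


class RequestQueue:
    def __init__(self, request_fn):
        self.requests = []
        self.dispatched = []
        self.plugged = False
        self.request_fn = request_fn

    def make_request(self, sector, nr_sectors, sync=False):
        """Merge the bio into an adjacent request or allocate a new one."""
        for rq in self.requests:
            if rq.sector + rq.nr_sectors == sector:    # back merge
                rq.nr_sectors += nr_sectors
                rq.bios.append((sector, nr_sectors))
                break
            if sector + nr_sectors == rq.sector:       # front merge
                rq.sector = sector
                rq.nr_sectors += nr_sectors
                rq.bios.insert(0, (sector, nr_sectors))
                break
        else:
            self.requests.append(Request(sector, nr_sectors))
            self.plugged = True  # hold requests back so later bios can merge
        if sync:
            self.unplug()        # a sync bio must not wait for the timer
        return 0                 # forwarding of this bio is finished

    def unplug(self):
        """What the unplug timer (or a sync bio) triggers."""
        self.plugged = False
        self.request_fn(self)
```

Three adjacent 8-sector bios collapse into one 24-sector request, and nothing reaches the driver until a sync bio arrives or `unplug()` is called.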
9. So far, the bio sent from the file system (by pdflush or through address_space_operations) has been merged into the request queue. If it is a sync bio, __generic_unplug_device is called directly; otherwise q->unplug_fn runs later in the softirq context of the unplug timer. How a request is processed from here on depends on the specific physical device, so how are the differences between physical devices expressed on top of a standard block device? They are expressed in the queue's methods: different physical devices register different queue methods. The sda in this example is a SCSI device, and the SCSI mid-layer registers scsi_request_fn as the queue's request_fn method. The queue's request handling function, q->request_fn, is called from q->unplug_fn (concretely, generic_unplug_device). At this point the block device layer has been connected to the SCSI bus driver layer, and the interface between them is request_fn (concretely, scsi_request_fn).
10. Following on from point (9), the rest of the process concerns the SCSI bus itself. scsi_request_fn scans the request queue and takes a request from it with elv_next_request. Inside elv_next_request, the request is converted into a SCSI command that the SCSI driver understands, using the q->prep_rq_fn registered by the SCSI bus layer (the SCSI layer registers scsi_prep_fn). After obtaining a request, scsi_request_fn calls scsi_dispatch_cmd to send the SCSI command to a specific SCSI host. This raises a question: which SCSI host is the command sent to? The answer lies in q->queuedata. When the queue was allocated for the sda device, the relationship between the sda block device and the underlying SCSI device (scsi device) was recorded there, so the relationship is maintained through the request queue.
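The hand-off from request to SCSI command in step 10 might look like this in miniature. It is a toy Python sketch: `ToyHost`, `ToyScsiDevice` and `ToyQueue` are illustrative, requests are plain dictionaries, and the prep function builds a 10-byte READ(10)/WRITE(10) command block as one plausible encoding rather than what any particular kernel version emits.

```python
import struct
from collections import deque

READ_10, WRITE_10 = 0x28, 0x2A  # SCSI opcodes for 10-byte read and write


class ToyHost:
    """Stand-in for a SCSI host adapter."""

    def __init__(self):
        self.received = []

    def queuecommand(self, cdb):
        self.received.append(cdb)


class ToyScsiDevice:
    def __init__(self, host):
        self.host = host


def scsi_prep_fn(q, rq):
    """Turn a block request into a command block the SCSI layer understands."""
    opcode = WRITE_10 if rq["write"] else READ_10
    # Layout: opcode, flags, 32-bit LBA, group, 16-bit length, control.
    rq["cdb"] = struct.pack(">BBIBHB", opcode, 0, rq["sector"], 0,
                            rq["nr_sectors"], 0)


class ToyQueue:
    def __init__(self, sdev):
        self.queuedata = sdev          # links the queue to its SCSI device
        self.prep_rq_fn = scsi_prep_fn
        self.pending = deque()


def elv_next_request(q):
    """Peek at the next request, preparing it on first sight."""
    if not q.pending:
        return None
    rq = q.pending[0]
    if "cdb" not in rq:
        q.prep_rq_fn(q, rq)
    return rq


def scsi_dispatch_cmd(sdev, rq):
    sdev.host.queuecommand(rq["cdb"])


def scsi_request_fn(q):
    """Drain the queue, sending each prepared command to the device's host."""
    sdev = q.queuedata  # this is how the right host is found
    while True:
        rq = elv_next_request(q)
        if rq is None:
            break
        q.pending.popleft()
        scsi_dispatch_cmd(sdev, rq)
```

The only link between the block layer object and the SCSI side is `q.queuedata`, which mirrors the answer the paragraph gives to "which host?".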
11. scsi_dispatch_cmd sends the SCSI command to the SCSI host through the host's queuecommand interface method. Typically, the host's queuecommand method places the received SCSI command on a queue that the host maintains itself and then starts a DMA transfer to send the command's data to the disk. When the DMA transfer finishes, the DMA controller interrupts the CPU to report completion, and the interrupt handler, running in interrupt context, schedules the bottom half for DMA completion. When the DMA interrupt service routine returns, the softirq is triggered and the SCSI bottom half runs.
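The split between the interrupt handler and its bottom half in step 11 can be modelled as deferred work that runs only after the handler returns. This is a toy Python sketch: `ToyCpu`, `dma_done_irq` and `scsi_softirq` are hypothetical names and real interrupt handling involves far more state.

```python
from collections import deque


class ToyCpu:
    """Runs a hard-IRQ handler, then any softirq work it scheduled."""

    def __init__(self):
        self.pending_softirq = deque()
        self.log = []

    def raise_softirq(self, fn, arg):
        # Only records the work; nothing runs inside the hard IRQ handler.
        self.pending_softirq.append((fn, arg))

    def handle_irq(self, handler, arg):
        self.log.append("irq enter")
        handler(self, arg)
        self.log.append("irq exit")
        # Bottom halves run once the interrupt service routine has returned.
        while self.pending_softirq:
            fn, pending_arg = self.pending_softirq.popleft()
            fn(self, pending_arg)


def dma_done_irq(cpu, cmd):
    """Top half: acknowledge the DMA completion and defer the rest."""
    cpu.log.append("ack DMA")
    cpu.raise_softirq(scsi_softirq, cmd)


def scsi_softirq(cpu, cmd):
    """Bottom half: this is where command completion would be processed."""
    cpu.log.append("softirq: complete " + cmd)
```

The log order shows the design choice: the handler does the minimum while the interrupt is being serviced, and the heavier completion work happens afterwards.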
12. The SCSI bottom half calls the completion callback of the SCSI command, which is usually scsi_done. scsi_done calls blk_complete_request to complete the request. Each request maintains a chain of bios, so the completion callback of every bio in the request is called to end that bio. Each bio was in turn generated from a file-system buffer_head, so when a bio ends, its callback bio->bi_end_io (registered as end_bio_bh_io_sync) is called, which then invokes the buffer_head's own completion handler. That ends the series of callbacks set off by the interrupt. In summary, the callback chain is: scsi_done -> end_request -> end_bio -> end_bufferhead.
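The callback chain summarised in step 12 can be traced with plain functions. This is a toy Python sketch: commands, requests, bios and buffer heads are dictionaries, `make_bio` is a hypothetical helper, and the `log` list exists only to make the order visible.

```python
log = []


def end_buffer_async_read(bh, uptodate):
    """buffer_head level: the file system learns the outcome of its I/O."""
    log.append("end_bufferhead")
    bh["uptodate"] = uptodate


def end_bio_bh_io_sync(bio, error):
    """bio level: pass completion to the buffer_head that created the bio."""
    log.append("end_bio")
    bh = bio["private"]
    bh["end_io"](bh, error == 0)


def end_request(rq, error):
    """request level: complete every bio chained on the request."""
    log.append("end_request")
    for bio in rq["bios"]:
        bio["end_io"](bio, error)


def scsi_done(cmd):
    """SCSI level: called from the bottom half when a command finishes."""
    log.append("scsi_done")
    end_request(cmd["request"], cmd["result"])


def make_bio():
    """Hypothetical helper: one buffer_head wrapped in one bio."""
    bh = {"uptodate": False, "end_io": end_buffer_async_read}
    return {"end_io": end_bio_bh_io_sync, "private": bh}
```

A request carrying two bios produces one scsi_done and one end_request entry in the log, followed by an end_bio and end_bufferhead pair for each bio.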
13. When the callbacks have finished, the read or write operation started by the file system is complete.