What is the principle of direct I/O in Linux

Updated: 2025-01-15. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01

This article explains the principle of direct I/O in Linux: what buffered I/O is, why some applications bypass it, and how the kernel implements direct I/O through the open() and read() system calls.

What is buffered I/O (cached I/O)

Buffered I/O, also known as standard I/O, is the default mode of operation for most file systems. Under Linux's buffered I/O mechanism, the operating system caches file data in the file system's page cache. On a read, data is first copied into the kernel buffer and only then copied from the kernel buffer into the application's address space; a write moves data in the opposite direction. Buffered I/O has the following advantages:

Buffered I/O goes through the kernel buffer, which to some extent isolates the application's address space from the physical device.

Buffered I/O can improve performance by reducing the number of disk reads.

For read operations: when the application reads a piece of data that is already in the page cache, the data is returned directly without touching the disk. If the data is not in the page cache, it must first be read from the disk into the page cache.

For write operations: the application writes the data to the page cache first. Whether the data is then written to disk immediately depends on the write mechanism in use:

Synchronous mechanism: the data is written to disk immediately, and the write interface does not return until the data has been written.

Delayed mechanism: the write interface returns immediately, and the operating system periodically flushes dirty data from the page cache to disk. This mechanism carries a risk of data loss: if power fails after the write interface has returned but before the cached data has been flushed, the data is lost even though the application assumes it is already on disk.
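The difference between the two write mechanisms can be sketched from user space. The following is a minimal illustration, not code from the kernel: the helper name write_durably() is hypothetical, and it assumes an ordinary buffered file descriptor. A plain write() returns once the data is in the page cache; the explicit fsync() is what forces the data to disk before the function returns.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and do not return until the
 * kernel reports that they have reached stable storage.
 * Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        /* Buffered write: returns once the data is in the page cache. */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Without this call the dirty pages are written back later by the
     * kernel, which is the delayed mechanism described above. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Opening the file with O_SYNC is another way to get synchronous behavior on every write() call, at the cost of making each call wait for the disk.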

Buffered I/O write operation

The drawback of buffered I/O is the extra copying. On a write, for example, the data is copied from user space into the page cache in kernel space, and then written from the page cache to disk. The CPU and memory overhead of these copy operations can be significant.

For some special applications, bypassing the kernel buffer gives better performance, which is why direct I/O exists.

Direct I/O write operation

With direct I/O, data is transferred directly between the user-space address space and the disk, skipping the kernel buffer. Some applications, such as databases, prefer to use their own caching mechanism, which can be tailored to their workload to improve read and write performance. The direct I/O write operation is shown in the figure above.

Design and implementation of direct I/O

To use direct I/O on a block device, the process must set the O_DIRECT access mode when opening the file. This tells the operating system that the process's subsequent read() and write() system calls on that file should use direct I/O, so the transferred data does not pass through the kernel's cache. With O_DIRECT you must pay attention to the alignment and the size of the buffer, that is, the second and third arguments of the read() and write() system calls. Alignment here means alignment to the file system block size, and the buffer size must also be an integer multiple of the block size. (Since Linux 2.6, alignment to the logical block size of the underlying storage, typically 512 bytes, is sufficient.)
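As an illustration of the alignment requirement, the sketch below prepares a buffer that could be passed as the second and third arguments of read() or write() on an O_DIRECT descriptor. The helper names round_up_to_block() and alloc_dio_buffer() are hypothetical, and the block size is an assumption supplied by the caller (for example 512 or 4096); the code does not query the real block size of any device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round len up to the next multiple of block.
 * block must be a power of two, e.g. 512 or 4096. */
size_t round_up_to_block(size_t len, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return (len + block - 1) & ~(block - 1);
}

/* Allocate a buffer whose start address and size are both multiples of
 * block. The padded size is stored in *out_len. Returns NULL on failure.
 * The caller releases the buffer with free(). */
void *alloc_dio_buffer(size_t len, size_t block, size_t *out_len)
{
    assert(out_len != NULL);

    void *buf = NULL;
    size_t padded = round_up_to_block(len, block);

    /* posix_memalign() returns 0 on success and an error number otherwise. */
    if (posix_memalign(&buf, block, padded) != 0)
        return NULL;

    *out_len = padded;
    return buf;
}
```

A caller that wants 5000 bytes with a 4096-byte block size would receive an 8192-byte buffer starting on a 4096-byte boundary, and would then issue reads in multiples of 4096 bytes at file offsets that are also multiples of 4096.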

Three functions are involved: open(), read(), and write(). Linux supports several ways of accessing files, and each of these functions handles the different access modes differently. This article covers only the parts of them that relate to direct I/O. First, the open() system call. Its prototype is as follows:

int open(const char *pathname, int oflag, ... /*, mode_t mode */);

When an application needs to access a file directly, without going through the operating system's page cache, it must specify the O_DIRECT flag when opening the file.

In the kernel, the function that handles the open() system call is sys_open(), which calls do_sys_open() to do the main work. do_sys_open() does three things:

It calls getname() to copy the file's pathname from the process address space.

It calls get_unused_fd() to find a free slot in the process's file table; the corresponding new file descriptor is stored in the local variable fd.

It calls do_filp_open() to perform the actual open operation based on the parameters passed in.

The main call graph for handling the open() system call in the kernel is shown below.

sys_open()
  |- do_sys_open()
       |- getname()
       |- get_unused_fd()
       |- do_filp_open()
            |- nameidata_to_filp()
                 |- __dentry_open()

During execution, do_filp_open() calls nameidata_to_filp(), which eventually calls __dentry_open(). If the process specified the O_DIRECT flag, __dentry_open() checks whether direct I/O can be used on the file. The code in __dentry_open() related to direct I/O is listed below.

    if (f->f_flags & O_DIRECT) {
        /* Reject O_DIRECT when the file's address space provides
         * neither a direct_IO nor a get_xip_page method. */
        if (!f->f_mapping->a_ops ||
            ((!f->f_mapping->a_ops->direct_IO) &&
             (!f->f_mapping->a_ops->get_xip_page))) {
            fput(f);
            f = ERR_PTR(-EINVAL);
        }
    }

Once the file has been opened with the O_DIRECT flag, the operating system knows that subsequent reads and writes on that file are to use direct I/O.
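From user space, a file system that fails the check in __dentry_open() shows up as open() returning an error with errno set to EINVAL. The sketch below is one possible way for an application to cope with that, assuming it is acceptable to fall back to buffered I/O. The function name open_prefer_direct() is hypothetical.

```c
#define _GNU_SOURCE          /* O_DIRECT is a Linux extension in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path read-only with O_DIRECT. If the file
 * system rejects the flag with EINVAL, fall back to a buffered open.
 * *used_direct is set to 1 or 0 to report which mode was obtained.
 * Returns the file descriptor, or -1 on any other error. */
int open_prefer_direct(const char *path, int *used_direct)
{
    assert(path != NULL && used_direct != NULL);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        *used_direct = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1;               /* e.g. ENOENT or EACCES: a real failure */

    *used_direct = 0;
    return open(path, O_RDONLY); /* buffered fallback */
}
```

Whether falling back is appropriate depends on the application: a database that relies on bypassing the page cache may prefer to report the error instead.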

Next, consider what the system does when a process uses the read() system call on a file opened with the O_DIRECT flag. The prototype of read() is as follows:

ssize_t read(int filedes, void *buff, size_t nbytes);

The kernel entry point for read() is sys_read(). Its main call graph is as follows:

sys_read()
  |- vfs_read()
       |- generic_file_read()
            |- generic_file_aio_read()
                 |- generic_file_direct_IO()

After obtaining the file descriptor and the current file position from the process, sys_read() calls vfs_read() to perform the actual work. vfs_read() in turn invokes the read operation registered in the file structure, which here is generic_file_read(). The code is as follows:

    ssize_t
    generic_file_read(struct file *filp, char __user *buf,
                      size_t count, loff_t *ppos)
    {
        /* Describe the single user buffer as a one-element iovec. */
        struct iovec local_iov = { .iov_base = buf, .iov_len = count };
        struct kiocb kiocb;
        ssize_t ret;

        init_sync_kiocb(&kiocb, filp);
        ret = __generic_file_aio_read(&kiocb, &local_iov, 1, ppos);
        /* The request was queued: wait for it to complete. */
        if (-EIOCBQUEUED == ret)
            ret = wait_on_sync_kiocb(&kiocb);
        return ret;
    }

generic_file_read() initializes an iovec and a kiocb descriptor. The iovec stores two things: the address of the user-space buffer that receives the data and the size of that buffer. The kiocb tracks the completion status of the I/O operation. generic_file_read() then calls __generic_file_aio_read(). That function checks that the user-space buffer described by the iovec is usable, then checks the access mode, and executes the direct I/O code if the O_DIRECT flag is set. The code in __generic_file_aio_read() related to direct I/O is as follows:

    if (filp->f_flags & O_DIRECT) {
        loff_t pos = *ppos, size;
        struct address_space *mapping;
        struct inode *inode;

        mapping = filp->f_mapping;
        inode = mapping->host;
        retval = 0;
        if (!count)
            goto out;
        size = i_size_read(inode);
        /* Only read if the position is before end of file. */
        if (pos < size) {
            retval = generic_file_direct_IO(READ, iocb, iov, pos, nr_segs);
            if (retval > 0 && !is_sync_kiocb(iocb))
                retval = -EIOCBQUEUED;
            if (retval > 0)
                *ppos = pos + retval;
        }
        file_accessed(filp);
        goto out;
    }

The code above mainly checks the file position, the file size, and the number of bytes requested. It then calls generic_file_direct_IO(), passing the operation type READ, the iocb descriptor, the iovec array, the current file position, and the number of user-space buffers described by the iovec array. When generic_file_direct_IO() completes, __generic_file_aio_read() carries out the remaining work: it updates the file position and sets the access timestamp on the file's inode, and then returns. generic_file_direct_IO() takes five parameters, with the following meanings:

rw: operation type, which can be READ or WRITE

iocb: pointer to the kiocb descriptor

iov: pointer to an array of iovec descriptors

offset: offset within the file

nr_segs: number of iovec entries in the iov array
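The iov and nr_segs parameters have a direct user-space counterpart in the readv() system call, which takes an array of struct iovec and its length. The sketch below is only an illustration of how such an array describes several buffers at once; the helper name read_into_two() is hypothetical, and it works on any readable descriptor rather than specifically on an O_DIRECT file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: fill two separate buffers with one system call.
 * The iov array plays the role of the iov parameter described above, and
 * the literal 2 plays the role of nr_segs.
 * Returns the total number of bytes read, or -1 on error. */
ssize_t read_into_two(int fd, void *a, size_t a_len, void *b, size_t b_len)
{
    assert(a != NULL && b != NULL);

    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = a_len },  /* filled first */
        { .iov_base = b, .iov_len = b_len },  /* filled once a is full */
    };
    return readv(fd, iov, 2);
}
```

With direct I/O, each entry in such an array would have to satisfy the alignment rules individually, since the kernel checks every iov_base and iov_len.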

The code for generic_file_direct_IO() is as follows:

    static ssize_t
    generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
                           loff_t offset, unsigned long nr_segs)
    {
        struct file *file = iocb->ki_filp;
        struct address_space *mapping = file->f_mapping;
        ssize_t retval;
        size_t write_len = 0;

        if (rw == WRITE) {
            write_len = iov_length(iov, nr_segs);
            /* Drop any memory mappings of the range being written. */
            if (mapping_mapped(mapping))
                unmap_mapping_range(mapping, offset, write_len, 0);
        }

        /* Flush dirty page-cache pages to disk before the direct I/O. */
        retval = filemap_write_and_wait(mapping);
        if (retval == 0) {
            retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);
            if (rw == WRITE && mapping->nrpages) {
                /* Invalidate cached pages that the write made stale. */
                pgoff_t end = (offset + write_len - 1) >> PAGE_CACHE_SHIFT;
                int err = invalidate_inode_pages2_range(mapping,
                                offset >> PAGE_CACHE_SHIFT, end);
                if (err)
                    retval = err;
            }
        }
        return retval;
    }

generic_file_direct_IO() does some special handling for WRITE operations: it unmaps any memory mappings of the affected range beforehand and invalidates the cached pages in that range afterwards. Apart from that, its main job is to call the direct_IO method to perform the direct read or write. Before the direct I/O is performed, the relevant dirty data in the page cache is flushed back to disk, which ensures that a direct read gets up-to-date data from the disk. The direct_IO method here eventually resolves to the __blockdev_direct_IO() function, whose code is as follows:

    ssize_t
    __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
        struct block_device *bdev, const struct iovec *iov, loff_t offset,
        unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
        int dio_lock_type)
    {
        int seg;
        size_t size;
        unsigned long addr;
        unsigned blkbits = inode->i_blkbits;
        unsigned bdev_blkbits = 0;
        unsigned blocksize_mask = (1 << blkbits) - 1;

        /* [...] part of the listing is missing in the source; the blocks
         * closed by the braces below are opened in the missing part. */

                mapping = iocb->ki_filp->f_mapping;
                if (dio_lock_type != DIO_OWN_LOCKING) {
                    mutex_lock(&inode->i_mutex);
                    release_i_mutex = 1;
                }

                /* Flush dirty page-cache pages in the affected range. */
                retval = filemap_write_and_wait_range(mapping, offset, end - 1);
                if (retval) {
                    kfree(dio);
                    goto out;
                }

                if (dio_lock_type == DIO_OWN_LOCKING) {
                    mutex_unlock(&inode->i_mutex);
                    acquire_i_mutex = 1;
                }
            }

            if (dio_lock_type == DIO_LOCKING)
                down_read_non_owner(&inode->i_alloc_sem);
        }

        dio->is_async = !is_sync_kiocb(iocb) &&
            !((rw & WRITE) && (end > i_size_read(inode)));

        retval = direct_io_worker(rw, iocb, inode, iov, offset,
                    nr_segs, blkbits, get_block, end_io, dio);

        if (rw == READ && dio_lock_type == DIO_LOCKING)
            release_i_mutex = 0;

    out:
        if (release_i_mutex)
            mutex_unlock(&inode->i_mutex);
        else if (acquire_i_mutex)
            mutex_lock(&inode->i_mutex);
        return retval;
    }

This function splits up the data to be read or written and checks the buffer alignment. As noted earlier in the discussion of open(), buffer alignment matters when reading and writing with direct I/O; the alignment check itself is carried out in __blockdev_direct_IO(). The user-space buffers are described by the iovec entries in the iov array. Direct I/O reads and writes issued this way are synchronous: __blockdev_direct_IO() does not return until the whole operation has finished, so once the application's read() system call returns, the application can access the data in its user-space buffer. The downside is that the application cannot be shut down until its outstanding read has completed, which can make it slow to exit.
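The alignment test can be sketched in user space as a bit-mask check, modeled on the blocksize_mask variable declared at the top of __blockdev_direct_IO(). This is a simplified illustration, not the kernel's code: the function name dio_aligned() is hypothetical, and it assumes a single block size given as a bit count (for example 9 for 512 bytes or 12 for 4096 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if the buffer address, the transfer size,
 * and the file offset are all multiples of (1 << blkbits), else 0. */
int dio_aligned(const void *addr, size_t size, long long offset,
                unsigned blkbits)
{
    assert(blkbits < 32);

    /* For a power-of-two block size, the low blkbits bits of an aligned
     * value are all zero, so ANDing with the mask must give zero. */
    uintptr_t mask = ((uintptr_t)1 << blkbits) - 1;

    if ((uintptr_t)addr & mask)
        return 0;                /* buffer does not start on a block boundary */
    if (size & mask)
        return 0;                /* length is not a whole number of blocks */
    if ((unsigned long long)offset & mask)
        return 0;                /* file position is not block aligned */
    return 1;
}
```

An application can run a check like this before issuing a direct read or write, so that a misaligned request is caught with a clear message instead of surfacing as an EINVAL from the system call.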

The advantage of direct I/O is that it reduces the number of copies between the kernel buffer and the user address space, which lowers CPU and memory bandwidth overhead. For some applications this is a real benefit and can improve performance considerably.

Direct I/O is not always a win, however. It has costs of its own: if the application does not manage its reads and writes carefully, disk access becomes inefficient. A disk reads and writes data by moving its head between tracks. If the data being written is scattered far apart on the disk, seek time increases sharply and read and write throughput drops.
