
What are the optimization methods of file system based on fuse

2025-01-19 Update. From: SLTechnology News&Howtos



This article covers optimization methods for file systems built on Fuse. It collects a handful of simple, practical mount-time settings and explains the reasoning behind each.

Many file systems today are built on Fuse. After studying the Fuse code in depth, the author summarizes the optimizations worth considering when developing this kind of file system and offers them for discussion. Corrections are welcome. This article assumes you already know Fuse reasonably well: at a minimum, that it consists of two modules, Fuse Kernel and LibFuse, and how a call made by an application is passed to your own Fuse-based file system. If not, it is worth reading up on those first.

Optimization 1: extend the effective time of metadata

Every open file in Linux has two kinds of metadata in the kernel: struct dentry and struct inode. They are the foundation of a file inside the kernel: every operation on a file must first obtain them, and both structures are constructed and populated by the specific file system. Two points explain why metadata optimization is needed:

1) When an application calls a file system interface of the operating system, the argument is usually a file path, for example open("a/b/c/d.txt"). The kernel has to parse the pathname: starting from the root (or current) directory, it obtains the dentry and inode for each path component in turn, then moves on to the next component, until it reaches the dentry and inode of the target file. If the dentry for a path component is not cached in memory, it must be read from the specific file system, which is slow.

2) Many applications call the stat interface to get file attributes. In the kernel, this means finding the file's inode and reading the attributes from it. If the inode is not cached, it must be fetched from the specific file system, which can also be slow.
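The cost described in the two points above can be pictured with a small toy model. This is only a sketch: the names resolve and fake_lookup are made up for illustration and are not kernel or LibFuse APIs, and a plain dictionary stands in for the dentry cache.

```python
# Toy model of pathname resolution with a dentry cache.
# `resolve` and `fake_lookup` are illustrative names, not kernel APIs.

def resolve(path, dcache, lookup_in_filesystem):
    """Walk `path` one component at a time, consulting `dcache` first.

    Returns how many components had to be fetched from the underlying
    file system (the slow path).
    """
    misses = 0
    parent = "/"
    for name in path.strip("/").split("/"):
        key = (parent, name)
        if key not in dcache:
            # Cache miss: ask the specific file system for this component.
            dcache[key] = lookup_in_filesystem(parent, name)
            misses += 1
        parent = parent.rstrip("/") + "/" + name
    return misses


def fake_lookup(parent, name):
    # Stand-in for a round trip to the user-mode file system.
    return {"parent": parent, "name": name}


dcache = {}
first = resolve("a/b/c/d.txt", dcache, fake_lookup)    # cold cache
second = resolve("a/b/c/d.txt", dcache, fake_lookup)   # warm cache
sibling = resolve("a/b/c/e.txt", dcache, fake_lookup)  # shares a/b/c
print(first, second, sibling)  # 4 0 1
```

With a cold cache every component costs a lookup; once the directories are cached, only the last component of a sibling path has to be fetched.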

Fuse's kernel module is only a bridge between the application and the file system we develop on top of Fuse. In principle, then, every time the inode or dentry of a file or directory is needed, the Fuse kernel module would have to go through LibFuse to our file system.

The drawback is obvious: the IO path gets longer and efficiency drops, and if the file system we build on Fuse is a network file system (such as NOS), the extra requests may also increase the load on the back-end servers.

With this in mind, the Fuse authors added metadata caching to the Kernel Fuse module, covering both dentries and inodes. Unlike with a local file system, though, we must always watch for one problem: cache validity. Improving performance while preserving correctness as far as possible is a thorny problem.

When mounting our own file system with Fuse, we can specify how long dentries and inode attributes remain valid. The right value depends on the specific workload; there is no single answer.

Optimization method: specify -o entry_timeout=T -o attr_timeout=T for the fuse mount
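To get a feel for how the validity period affects the number of requests that reach the user-mode file system, here is a minimal simulation. It is a sketch under stated assumptions: TimedAttrCache and count_upcalls are hypothetical names, time is passed in explicitly rather than read from a clock, and the real kernel logic is more involved than a dictionary with expiry times.

```python
class TimedAttrCache:
    """Toy attribute cache with a validity period, similar in spirit to attr_timeout."""

    def __init__(self, timeout, fetch):
        self.timeout = timeout   # seconds a cached entry stays valid
        self.fetch = fetch       # slow path: ask the user-mode file system
        self.entries = {}        # path -> (attrs, expiry time)
        self.upcalls = 0         # how many times the slow path was taken

    def getattr(self, path, now):
        cached = self.entries.get(path)
        if cached is not None and now < cached[1]:
            return cached[0]     # still valid: served from the cache
        attrs = self.fetch(path)
        self.upcalls += 1
        self.entries[path] = (attrs, now + self.timeout)
        return attrs


def count_upcalls(timeout, stat_times):
    """Replay stat() calls at the given times and count slow-path requests."""
    cache = TimedAttrCache(timeout, fetch=lambda path: {"size": 0})
    for t in stat_times:
        cache.getattr("/a/b/c/d.txt", now=t)
    return cache.upcalls


stat_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]   # one stat() every half second
print(count_upcalls(0.0, stat_times))    # 6: no caching, every call goes down
print(count_upcalls(1.0, stat_times))    # 3: every second call is a cache hit
print(count_upcalls(10.0, stat_times))   # 1: only the first call goes down
```

A longer timeout means fewer round trips, but also a longer window in which a change made elsewhere goes unnoticed, which is exactly the correctness tradeoff mentioned above.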

Optimization suggestion: five stars

Optimization 2: expand the number of pages per write

Every time an application writes a file on a Fuse-based file system, the write must pass through the Kernel Fuse module. Kernel Fuse has considerable latitude in deciding when to pass data on to the user-mode file system. The more frequently it writes, the lower the efficiency, though consistency may be better; controlling the write frequency is a tradeoff.

If you are somewhat familiar with the kernel, you may know that kernel IO is done in units of pages. The kernel splits an application's write request into multiple pages of PAGE_SIZE and then performs IO on each page, which is simple and elegant.

Without this optimization, Kernel Fuse issues one write to the user-mode file system for every page the application writes. A 64KB write request from user mode, with the default PAGE_SIZE of 4KB, may therefore trigger 16 user-mode writes; the number of IOs is amplified and efficiency drops sharply. With the optimization, Kernel Fuse by default triggers a write call to the user-mode file system only once per 128KB. You can also set this threshold yourself.

Optimization method: specify -o big_writes -o max_write=N for the fuse mount
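The arithmetic behind the 64KB example can be checked in a few lines. This is only a back-of-the-envelope sketch: write_requests is a hypothetical helper, and the 4KB page size is the typical value assumed in the text, not something read from the running system.

```python
PAGE_SIZE = 4 * 1024          # typical page size, assumed as in the text
DEFAULT_BIG_WRITE = 128 * 1024  # per-request size quoted in the text


def write_requests(total_bytes, max_request_bytes):
    """Number of write calls needed if each call carries at most max_request_bytes."""
    return (total_bytes + max_request_bytes - 1) // max_request_bytes  # ceiling division


print(write_requests(64 * 1024, PAGE_SIZE))            # 16 calls, one per page
print(write_requests(64 * 1024, DEFAULT_BIG_WRITE))    # 1 call
print(write_requests(1024 * 1024, DEFAULT_BIG_WRITE))  # 8 calls for a 1MB write
```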

Optimization suggestion: five stars

Optimization 3: turn on kernel read cache

The Linux file system implementation makes full use of memory to cache file data, so applications often only need to copy data from kernel buffers into user-mode buffers, with no disk IO at all.

Because of the particular nature of Fuse, data caching has to be controlled strictly (see the metadata caching discussed earlier). The Fuse-based file system we implement may well be a network file system, in which case the kernel cache can return stale data, and as a user you have very little control over the kernel's behavior.

Fortunately, the Fuse authors provide several mount options to control caching behavior. A friendly reminder: once you turn caching on, you are responsible for any stale data you may read.

Optimization method: specify -o kernel_cache or -o auto_cache for the fuse mount

A note: the discussion above describes the behavior of kernel_cache, not auto_cache. That is left for readers to study, with one hint: auto_cache is an optimization that checks the validity of the kernel cache based on the file's modification time.
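The idea behind a modification-time-based validity check can be sketched in user space. This is not the kernel's implementation: snapshot and cache_still_valid are hypothetical helpers, and the sketch compares the file size in addition to the modification time so that the outcome does not depend on timestamp granularity.

```python
import os
import tempfile


def snapshot(path):
    """Record the attributes used to judge whether cached data is still fresh."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def cache_still_valid(path, remembered):
    """True if the file looks unchanged since `remembered` was taken."""
    return snapshot(path) == remembered


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"version 1")
    remembered = snapshot(path)           # taken at the first "open"

    valid_before = cache_still_valid(path, remembered)

    with open(path, "ab") as f:           # another writer appends data
        f.write(b" plus more")
    valid_after = cache_still_valid(path, remembered)

print(valid_before, valid_after)  # True False
```

The check only helps if the user-mode file system reports accurate attributes; a back end that does not update the modification time on change would defeat it.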

Optimization suggestion: three stars

Optimization 4: expand the pre-reading window

Pre-reading is an interesting mechanism. Through pre-reading, the Linux kernel changes the application's original read behavior: the application may issue a 16KB read request, and the kernel may read 64KB of data without being asked. It has a good reason to do so; put simply, it is all for performance. The author also plans to publish an article detailing the pre-reading mechanism.

Fuse lets you specify the pre-read window size when mounting a user-mode file system, and uses this setting as the maximum pre-read window size. If it is not specified, Fuse uses Linux's default maximum pre-read window size of 128KB. In practice, however, setting Fuse's pre-read window above the Linux default of 128KB has no effect, because VFS does not allow the pre-read window to exceed the 128KB limit. Generally speaking, this optimization is of little use.

Optimization method: specify -o max_readahead=N for the fuse mount
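The clamping described above amounts to taking a minimum. A minimal sketch, assuming the 128KB limit stated in the text (the actual limit depends on the kernel version and configuration); effective_readahead is a hypothetical helper name:

```python
VFS_READAHEAD_LIMIT = 128 * 1024   # limit described in the text; an assumption here


def effective_readahead(max_readahead=None):
    """Pre-read window actually used for a given max_readahead mount setting."""
    if max_readahead is None:
        return VFS_READAHEAD_LIMIT          # option not given: use the default
    return min(max_readahead, VFS_READAHEAD_LIMIT)


print(effective_readahead())               # 131072: default
print(effective_readahead(64 * 1024))      # 65536: a smaller window is honored
print(effective_readahead(1024 * 1024))    # 131072: a larger request is clamped
```

Under this model the option can only shrink the window, which is why the text rates it so low.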

Optimization suggestion: one star

Optimization 5: replace BufferIO with DirectIO

In some cases an application (a database, for example) wants to bypass the OS cache and manage its own, which requires the file system to implement direct IO.

The Fuse authors also provide DirectIO for reads and writes. Compared with BufferIO, the advantage of DirectIO is that it removes the overhead of copying data from the application buffer into kernel space, so performance may improve for workloads with many large sequential writes.

The cost of DirectIO is that reads can no longer use the kernel cache, which is often unacceptable. File systems usually see far more reads than writes, so think twice before applying this optimization.

Optimization method: specify -o direct_io for the fuse mount

Optimization suggestion: one star

