
What is a virtual file system in Linux?

2025-04-05 Update From: SLTechnology News&Howtos



This article explains what the virtual file system in Linux is and how it works. The content is meant to be easy to understand; let's study it together.

The virtual file system is a magical abstraction that makes the "everything is a file" philosophy possible in Linux.

What is a file system? According to Robert Love, an early Linux contributor and author, "a file system is a hierarchical storage of data adhering to a specific structure." However, this description applies equally well to VFAT (Virtual File Allocation Table), Git, and Cassandra (a NoSQL database). So what distinguishes a file system?

Basic concepts of file system

The Linux kernel requires that, for an entity to be a file system, it must implement the open(), read(), and write() methods on persistent objects that have names associated with them. From the point of view of object-oriented programming, the kernel treats the generic file system as an abstract interface: these three big functions are "virtual" and have no default definition. Accordingly, the kernel's default file system implementation is called the virtual file system (VFS).

If we can open(), read(), and write() it, it is a file, as this console session shows.
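To make that concrete, here is a minimal Python sketch (not from the original demo) driving the three calls through the raw syscall wrappers on an ordinary file. Given permission, the same three calls would work unchanged on a character device such as /dev/console; the temporary file and payload below are illustrative.

```python
import os
import tempfile

# The VFS contract: the same open()/read()/write() calls work on
# regular files, devices, and pseudo-files alike.
def roundtrip(path, payload):
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)  # open()
    try:
        os.write(fd, payload)                # write()
        os.lseek(fd, 0, os.SEEK_SET)         # rewind to the start
        return os.read(fd, len(payload))     # read()
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    data = roundtrip(os.path.join(d, "demo"), b"hello, VFS\n")
    print(data)
```

Swapping the temporary path for /dev/console (as root, on a tty) would echo the payload onto the console instead, which is exactly the demo described above.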

VFS underlies the famous observation that "everything is a file" in Unix-like systems. Consider how strange it is that the little demo above actually works: the figure shows an interactive Bash session on a virtual teletype (tty), and sending a string to the virtual console device /dev/console makes it appear on the virtual screen. VFS has other, even stranger properties. For example, it is possible to seek in these files.

The file systems we are familiar with, such as ext4, NFS, and /proc, all provide definitions of the three big functions in a C-language data structure called file_operations. In addition, individual file systems extend and override the VFS functions in the familiar object-oriented way. As Robert Love points out, the abstraction of VFS enables Linux users to blithely copy files to and from foreign operating systems or abstract entities such as pipes without worrying about their internal data formats. On the user space side, via system calls, a process can copy from a file into the kernel's data structures with one file system's read() method, then use another file system's write() method to output the data.
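As a sketch of that uniformity, the following hypothetical Python snippet moves bytes from a pipe into a regular file with exactly the same read()/write() calls, never inspecting what sits behind either file descriptor:

```python
import os
import tempfile

# A pipe and a regular file expose the same read()/write() methods,
# because both sit behind the VFS.
r, w = os.pipe()
os.write(w, b"via the VFS\n")      # write() on a pipe
os.close(w)
chunk = os.read(r, 4096)            # read() on a pipe
os.close(r)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "copy")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    os.write(fd, chunk)             # write() on a regular file
    os.close(fd)
    fd = os.open(path, os.O_RDONLY)
    copied = os.read(fd, 4096)      # read() on a regular file
    os.close(fd)

print(copied)
```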

Function definitions belonging to the VFS base type can be found in the fs/*.c files of the kernel source, while the subdirectories of fs/ contain the specific file systems. The kernel also contains file-system-like entities such as cgroup, /dev, and tmpfs, which are needed early in the boot process and are therefore defined in the kernel's init/ subdirectory. Note that tmpfs, devtmpfs, and cgroup do not make use of the three big functions of file_operations, but read from and write to memory directly.

The figure below outlines how user space accesses the various types of file systems commonly mounted on Linux systems. Structures such as pipes, dmesg, and POSIX clocks are not shown; they also implement struct file_operations, and access to them also passes through the VFS layer.

How userspace accesses various types of filesystems

VFS is a "shim layer" between system calls and the implementors of specific file_operations, such as ext4 and procfs. The file_operations functions can then communicate with device-specific drivers or with memory accessors. tmpfs, devtmpfs, and cgroup do not use file_operations but access memory directly.

The existence of VFS promotes code reuse, because the basic methods associated with file systems need not be re-implemented by every file system type. Code reuse is a widely accepted software engineering best practice! Alas, if the reused code introduces serious bugs, all the implementations that inherit the common methods suffer from them.

/tmp: a tip

An easy way to find the VFSes present on your system is to type mount | grep -v sd | grep -v :/, which on most computers lists all the mounted file systems that do not reside on a disk and are not NFS. One of the listed VFS mounts is sure to be /tmp, right?
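A rough Python equivalent of that grep pipeline, operating on /proc/mounts-style lines (the sample entries below are invented for illustration, not taken from the article):

```python
def virtual_mounts(mounts_text):
    """Keep /proc/mounts-style entries whose source is neither an
    sd* block device nor an NFS export (host:/path)."""
    kept = []
    for line in mounts_text.splitlines():
        if not line.strip():
            continue
        device, mountpoint, fstype = line.split()[:3]
        if "sd" in device or ":/" in device:
            continue  # mimics `grep -v sd` and `grep -v :/`
        kept.append((device, mountpoint, fstype))
    return kept

sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
proc /proc proc rw,nosuid 0 0
tmpfs /tmp tmpfs rw,nosuid 0 0
fileserver:/export /mnt/nfs nfs rw 0 0
sysfs /sys sysfs rw,nosuid 0 0
"""
print(virtual_mounts(sample))
```

On a live Linux system the same function could be fed open("/proc/mounts").read(), and the survivors are exactly the memory-backed VFS mounts the pipeline is after.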

Everyone knows that keeping /tmp on a physical storage device is crazy! (image: https://tinyurl.com/ybomxyfo)

Why is leaving /tmp on a storage device inadvisable? Because the files in /tmp are temporary, and storage devices are slower than memory, which is why file systems such as tmpfs were created. Moreover, physical devices are more subject to wear from frequent writes than memory is. Finally, files in /tmp may contain sensitive information, so having them disappear at every reboot is a feature.

Unfortunately, by default, the installation scripts of some Linux distributions still create /tmp on a storage device. Do not despair if this is the case on your system; simply follow the instructions on the always-excellent Arch Wiki to fix the problem, keeping in mind that memory allocated to tmpfs is not available for other purposes. In other words, a system with a gigantic tmpfs holding large files can run out of memory and crash.

Another tip: when editing /etc/fstab, be sure to end the file with a newline, or your system will not boot. (Guess how I know.)
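For reference, a typical /etc/fstab entry mounting /tmp as tmpfs might look like the line below. The 2 GiB size cap and mode are illustrative choices, not recommendations from the article; note the trailing newline at the end of the file.

```
# mount /tmp in memory; size=2G is an illustrative cap, tune to taste
tmpfs  /tmp  tmpfs  defaults,size=2G,mode=1777  0  0
```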

/proc and /sys

Apart from /tmp, the VFSes most Linux users are most familiar with are /proc and /sys. (/dev relies on shared memory and has no file_operations structure.) Why two flavors? Let's look into the details.

Procfs offers user space a snapshot of the instantaneous state of the kernel and the processes it controls. In /proc, the kernel publishes information about the facilities it provides, such as interrupts, virtual memory, and the scheduler. In addition, /proc/sys is where the settings configurable via the sysctl command are accessible to user space. The status and statistics of individual processes are reported in per-process directories under /proc.

/proc/meminfo is an empty file, yet it contains valuable information.

The behavior of /proc files illustrates how unlike on-disk file systems VFS can be. On the one hand, /proc/meminfo contains the information presented by the command free; on the other hand, it is empty! How can this be? The situation is reminiscent of a famous article written by Cornell University physicist N. David Mermin in 1985 called "Is the moon there when nobody looks? Reality and the quantum theory." The truth is that the kernel gathers its statistics about memory when a process requests them from /proc, and there actually is nothing in the files in /proc when no one is looking. As Mermin said, "It is a fundamental quantum doctrine that a measurement does not, in general, reveal a preexisting value of the measured property." (The answer to the question about the moon is left as an exercise.)
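The free command simply formats the "Key: value kB" text it reads from /proc/meminfo when it asks. A small, hypothetical parser for such text (the sample values below are invented):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a
    dict mapping each key to its integer value (in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

# Invented sample; on Linux, open("/proc/meminfo").read() would
# supply the real text, generated by the kernel on demand.
sample = """\
MemTotal:       16384256 kB
MemFree:         8192128 kB
Buffers:          204800 kB
"""
meminfo = parse_meminfo(sample)
print(meminfo["MemTotal"])
```

The point of the quantum analogy is that the kernel runs code to produce this text at read() time; the "file" stores none of it.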

When no processes access them, the files in / proc are empty. (source)

The empty files of procfs make sense, since the information available there is dynamic. The situation with sysfs is different. Let's compare the number of non-empty files in /proc and /sys.

Procfs has just one non-empty file, the exported kernel configuration, which is an exception since it needs to be generated only once per boot. On the other hand, /sys has many larger files, most of which comprise one page of memory. Typically, sysfs files contain exactly one number or string, in contrast to the tables of information produced by reading files such as /proc/meminfo.

The purpose of sysfs is to expose the readable and writable properties of what the kernel calls "kobjects" to user space. The only purpose of kobjects is reference counting: when the last reference to a kobject is deleted, the system reclaims the resources associated with it. Yet /sys constitutes the kernel's famous "stable ABI to user space," much of which no one may ever "break" under any circumstances. That does not mean the files in sysfs are static, which would be contrary to the reference counting of volatile objects.

The kernel's stable ABI limits what can appear in /sys, not what is actually present at any given moment. Listing the permissions of files in sysfs shows how the configurable, tunable parameters of devices, modules, file systems, and so on can be set or read. Logic compels the conclusion that procfs is also part of the kernel's stable ABI, although the kernel documentation does not state this explicitly.

Files in sysfs each describe exactly one property of an entity and may be readable, writable, or both. The "0" in the file reveals that the SSD is not removable.
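Reading such an attribute from user space is just an open()/read() on a tiny one-value file. A sketch, exercised here against a scratch file standing in for an attribute like /sys/block/sda/removable (the path and value are illustrative):

```python
import os
import tempfile

def read_attr(path):
    """Read a sysfs-style attribute: a single value ending in a newline."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 4096).decode().strip()
    finally:
        os.close(fd)

# Simulate a sysfs attribute with a scratch file: "0" = not removable.
with tempfile.TemporaryDirectory() as d:
    attr = os.path.join(d, "removable")
    with open(attr, "w") as f:
        f.write("0\n")
    value = read_attr(attr)
    print(value)
```

On a real system, pointing read_attr() at an actual path under /sys would return the kernel-generated value for that kobject property.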

A peek inside VFS with eBPF and bcc tools

The easiest way to learn how the kernel manages sysfs files is to watch it in action, and the easiest way to watch on ARM64 or x86_64 is to use eBPF. eBPF (extended Berkeley Packet Filter) consists of a virtual machine running inside the kernel that privileged users can query from the command line. The kernel source tells the reader what the kernel can do; running eBPF tools on a booted system shows what the kernel actually does.

Happily, getting started with eBPF is easy via the bcc tools, which are available as packages in major Linux distributions and have been amply documented by Brendan Gregg. The bcc tools are Python scripts with small embedded C snippets, meaning anyone comfortable with either language can readily modify them. At current count, there are 80 Python scripts in bcc/tools, making it highly likely that a system administrator or developer will find an existing one relevant to his or her needs.

To get an idea of how VFS works on a running system, try the simple vfscount or vfsstat scripts, which show that dozens of calls to vfs_open() and its friends occur every second.

vfsstat.py is a Python script with an embedded C snippet that simply counts VFS function calls.

For a less trivial example, let's watch what happens in sysfs when a USB stick is inserted on a running system.

Using eBPF to observe what happens in /sys when a USB stick is inserted: simple and complex examples.

In the simple example above, the trace.py bcc tool script prints a message whenever the sysfs_create_files() function runs. We see that sysfs_create_files() is started by a kworker thread in response to the USB stick insertion event, but what file does it create? The second example illustrates the power of eBPF. Here, trace.py prints the kernel backtrace (the -K option) plus the name of the file created by sysfs_create_files(). The snippet inside the single quotes is C source code, including an easily recognizable format string; the trace.py Python script invokes the LLVM just-in-time (JIT) compiler to compile and execute it inside the in-kernel virtual machine. The full signature of the sysfs_create_files() function must be reproduced in the second command so that the format string can refer to one of its parameters, and any error in this C fragment results in a recognizable C compiler error. For example, if the -I parameter is omitted, the result is "Failed to compile BPF text." Developers familiar with either C or Python will find the bcc tools easy to extend and modify.

After the USB stick is inserted, the kernel backtrace shows that PID 7711 is a kworker thread that creates a file called events in sysfs. The corresponding call with sysfs_remove_files() shows that removing the USB stick causes the events file to be deleted, in keeping with the idea of reference counting. Watching sysfs_create_link() with eBPF during USB stick insertion (not shown) reveals that no fewer than 48 symbolic links are created.

What is the events file for, anyway? Using cscope to find the function __device_add_disk() shows that it calls disk_add_events(), and either "media change" or "eject request" may be written to the file. Here, the kernel's block layer informs user space about the appearance and disappearance of "disks." Consider how quickly this method of investigating how USB stick insertion works gets to the point, compared with trying to figure out the process purely from the source.

The read-only root file system makes embedded devices possible

Indeed, no one shuts down a server or desktop system by unplugging it. Why? Because file systems mounted on physical storage devices may have pending (incomplete) writes, and the data structures recording their state may be out of sync with what has been written to storage. When that happens, the system owner must wait for the fsck file system recovery tool to finish running at the next boot and, in the worst case, will actually lose data.

Yet enthusiasts will have heard that many Internet-of-things and embedded devices, such as routers, thermostats, and cars, now run Linux. Many of these devices have almost no user interface, and there is no way to shut them down cleanly. Think of jump-starting a car with a dead battery, where the power to the board running Linux goes up and down repeatedly. How does the system boot without a long fsck once the engine is finally running? The answer is that embedded devices rely on a read-only root file system (ro-rootfs).

Ro-rootfs is the reason why embedded systems do not often need fsck. Source: https://tinyurl.com/yxoauoub

Ro-rootfs offers many advantages, though some are less obvious than durability. One is that malware cannot write to /usr or /lib if no Linux process can write there. Another is that a largely immutable file system is critical for field support of remote devices, because support staff possess local systems that are nominally identical to those in the field. Perhaps the most important (but also most subtle) advantage is that ro-rootfs forces developers to decide during a project's design phase which system objects will be immutable. Dealing with ro-rootfs may often be inconvenient or even painful, as is often the case with constant variables in programming languages, but the benefits easily cover the extra overhead.

For embedded developers, creating a read-only root file system does require some extra work, and this is where VFS shows its talents. Linux requires files in /var to be writable, and in addition, many popular applications that run on embedded systems try to create configuration dotfiles in $HOME. One solution for configuration files in the home directory is typically to pre-generate them and build them into the rootfs. For /var, one approach is to mount it on a separate writable partition while / itself is mounted read-only. Using bind or overlay mounts is another popular alternative.

Bind and overlay mounts and their use in containers

Running man mount is a good way to learn about bind mounts and overlay mounts, which give embedded developers and system administrators the power to create a file system in one path location and then provide it to applications at a second one. For embedded systems, this means files can be stored in flash on a non-writable device, but a tmpfs can be overlaid or bind-mounted onto the /var path at boot, so that applications can write there to their heart's content. The changes in /var will vanish at the next power-up. Overlay mounts provide a union between a tmpfs and the underlying file system and allow apparent modification of existing files in a ro-rootfs, while bind mounts can make new, empty tmpfs directories show up as writable at ro-rootfs paths. While overlayfs is a proper file system type, bind mounts are implemented by the VFS namespace facility.
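As an illustration only (all paths invented, not from the article), /etc/fstab entries for the two approaches might look like the following sketches:

```
# Bind mount: make an existing writable directory appear at /var
/mnt/writable/var  /var  none     bind                                               0 0
# Overlay mount: writable upper layer unioned over a read-only lower /etc
overlay            /etc  overlay  lowerdir=/etc,upperdir=/rw/upper,workdir=/rw/work  0 0
```

With the overlay, reads fall through to the read-only lower directory until a file is modified, at which point a copy lands in the writable upper layer; the bind mount simply exposes one directory tree at a second path.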

Based on these descriptions of overlay and bind mounts, no one will be surprised that Linux containers make heavy use of them. Let's monitor what happens when systemd-nspawn starts a container by running bcc's mountsnoop tool:

While mountsnoop.py runs, a systemd-nspawn invocation starts the container.

Let's see what happened:

Running mountsnoop during the container "boot" shows that the container runtime relies heavily on bind mounts. (Only the beginning of the lengthy output is shown.)

Here, systemd-nspawn provides selected files from the host's procfs and sysfs to the container at paths in its rootfs. Besides the MS_BIND flag that makes a mount a bind mount, some of the other flags to the mount system call determine the relationship between changes in the host namespace and changes in the container. For example, the bind mount can either propagate changes in /proc and /sys to the container or hide them, depending on the invocation.

