Chapter V File system

2025-07-19 Update From: SLTechnology News&Howtos shulou

5.1 Root file system

It is well known that a new hard drive must be partitioned and formatted before a system can be installed on it.

For Windows, once partitioning is complete, each partition is a separate file system. This means that drive C and drive D have nothing to do with each other, and each is accessed independently.

For Linux, every file on the host, if it is to be accessible, must logically be reachable from a single starting location called the root file system. This does not mean that all files must live on the root partition: partitioning is still necessary in order to manage multiple file systems independently. After partitioning, however, no partition can be accessed on its own; it can only be accessed through the existing root.

Once the kernel has been booted and loaded, it does not by itself provide any files for users to access, nor is it something users can work with directly. So the kernel must be able to launch many external programs, including shells and various GUI or CLI interfaces. These programs are stored on a partition. But with so many partitions in the system, which one should the kernel look at? To avoid this ambiguity, no matter how many partitions exist, one of them must be the system partition, which is usually the first partition to be loaded after the kernel boots.

As shown in the figure above, assume that partition A is the system partition. The kernel treats partition A as the first partition it must load, so at boot, in order to start the other peripheral programs, the kernel sets up a path in its own workspace and calls it the root. It then associates everything on partition A (the system disk) directly with the root. This means that any file accessed through the root path is actually a file on partition A.

For Linux, the first file system that the kernel can recognize and must load first is called the root file system (rootfs).

Once the kernel treats partition A as the first partition to load, how can partitions B, C, and D be accessed? In Windows, partitions A, B, C and D are all independent, so any of them can be accessed directly. In Linux, any partition other than partition A (the system disk) must be associated with (mounted onto) the existing root file system before it can be accessed.
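The idea that every path starts at the root, while other partitions are attached somewhere below it, can be sketched as a small lookup table. This is only a toy model: the mount points /home and /var/log and the function name owning_partition are assumptions made for illustration, not something the article prescribes.

```python
import posixpath

# Hypothetical mount table: mount point -> partition (illustrative names only).
MOUNTS = {
    "/": "partition A",         # the root file system
    "/home": "partition B",
    "/var/log": "partition C",
}


def owning_partition(path, mounts=MOUNTS):
    """Return the partition whose mount point is the longest prefix of path."""
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best):
                best = mount_point
    return mounts[best]
```

For example, owning_partition("/home/alice/notes.txt") resolves to partition B, while a path such as /etc/passwd, which lies under no other mount point, falls through to partition A on the root.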

5.2 Common file systems

Common file systems include the following:

Linux file systems: ext2, ext3, ext4, xfs, btrfs, reiserfs, jfs, swap

swap: swap partition

iso9660: optical disc file system

ext4: the mainstream file system on CentOS 6

btrfs: a file system that ships with CentOS 7

xfs: the recommended file system on CentOS 7

Windows file system: fat32, ntfs

Unix file system: FFS, UFS, JFS2

Network file system: NFS, CIFS

Cluster file system: GFS2, OCFS2

Distributed file systems: ceph, moosefs, mogilefs, Glusterfs, Lustre

Depending on whether they support journaling, file systems fall into the following two types:

Journaling file systems: ext3, ext4, xfs, ...

A journaling file system first writes metadata changes to a journal area. If a power outage occurs, the file system can be restored to a consistent state from the journal.

Non-journaling file systems: ext2, vfat

A non-journaling file system writes metadata directly to the metadata area. If power is lost in the middle of a write, the data not yet written is lost, the metadata may be left corrupted, and there is no journal to recover from.

Components of the file system:

Modules in the kernel: ext4, xfs, vfat, etc.

User space management tools: mkfs.ext4, mkfs.xfs, mkfs.vfat, etc.

As the lists above show, Linux supports a large number of file systems, and each has its own call interface. This would be a headache for programmers: to write a program that works with files, they would have to learn the interfaces of many different file systems, which would make programming much harder. In practice, programmers do not deal with concrete file systems such as ext2 directly, but with the virtual file system (VFS). VFS unifies the different invocation mechanisms of all file systems behind a single calling interface. So no matter which file system a partition is formatted with, as long as that file system supports VFS, the programmer simply calls the VFS interface and VFS translates the call into a call to the specific file system's interface. Among the many file systems of Linux, those that follow the POSIX file system specification are generally compatible with VFS.
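The role VFS plays can be illustrated with a minimal sketch in Python. It is not kernel code: the classes Ext4Driver, XfsDriver and ToyVFS are hypothetical stand-ins, and the only point is that the caller uses one read() call while the dispatcher picks the file-system-specific implementation.

```python
class Ext4Driver:
    """Stand-in for a concrete file system implementation (not real kernel code)."""

    def read(self, path):
        return f"ext4 read of {path}"


class XfsDriver:
    """A second stand-in with its own implementation of the same operation."""

    def read(self, path):
        return f"xfs read of {path}"


class ToyVFS:
    """One calling interface that forwards to whichever driver owns the path."""

    def __init__(self):
        self._mounts = {}  # mount point -> driver object

    def mount(self, mount_point, driver):
        self._mounts[mount_point] = driver

    def read(self, path):
        matches = [m for m in self._mounts
                   if path == m or path.startswith(m.rstrip("/") + "/")]
        if not matches:
            raise FileNotFoundError(path)
        # The longest matching mount point decides which driver handles the call.
        driver = self._mounts[max(matches, key=len)]
        return driver.read(path)
```

A caller that mounts Ext4Driver at / and XfsDriver at /data issues the same vfs.read(path) in both cases and never needs to know which driver answered.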

/proc/filesystems: lists the file system types currently supported by the kernel

Entries marked nodev do not need a block device (for example proc and sysfs); entries without nodev in front are file systems that are stored on a block device
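As a small illustration, the content of /proc/filesystems can be split into the two groups by checking for the nodev marker. The function name parse_filesystems and the SAMPLE text are assumptions made for this sketch; the real file on a given machine will list different entries.

```python
def parse_filesystems(text):
    """Split /proc/filesystems content into (block_backed, nodev) name lists."""
    block_backed, nodev = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 2 and fields[0] == "nodev":
            nodev.append(fields[1])      # needs no block device
        else:
            block_backed.append(fields[-1])
    return block_backed, nodev


# Illustrative sample in the same tab-separated layout the kernel uses.
SAMPLE = "nodev\tsysfs\nnodev\tproc\n\text4\n\txfs\n"
```

On a Linux host the same function can be fed the real file, for example with open("/proc/filesystems") and read().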

Configuration file for file systems: /etc/fstab

At boot, the OS automatically mounts each file system defined in this file. Each line of the file has the following format:

Device, mount point, file system type, mount options, dump frequency, file system check order (only the root file system can be 1)

The device to be mounted can be specified in several ways:

Device file: /dev/sda5

Volume label: LABEL=""

UUID: UUID=""

Pseudo file system name: proc, sysfs, devtmpfs, configfs

Requirements for mount points:

A) the directory is not in use by other processes

B) the directory must exist in advance

C) files already in the directory are temporarily hidden and become visible again after unmounting

Dump frequency: how often, in days, to make a full backup. 0 means no backup, 1 means daily backup, 2 means a backup every 2 days.

Note: for a swap partition, both the mount point and the file system type are swap. To enable a particular feature when the file system is mounted automatically, for example acl, append it to the mount options after defaults, as in defaults,acl
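The six-field layout can be made concrete with a short parser. This is a minimal sketch: the names FstabEntry and parse_fstab_line are made up for the example, the mount point /data in the usage note is hypothetical, and treating missing trailing fields as 0 is an assumption of the sketch.

```python
from collections import namedtuple

FstabEntry = namedtuple(
    "FstabEntry", "device mount_point fs_type options dump fsck_order")


def parse_fstab_line(line):
    """Parse one /etc/fstab line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    # Assumption of this sketch: missing dump/check fields are treated as 0.
    fields += ["0"] * (6 - len(fields))
    device, mount_point, fs_type, options, dump, fsck_order = fields[:6]
    return FstabEntry(device, mount_point, fs_type,
                      options.split(","), int(dump), int(fsck_order))
```

Parsing the line "/dev/sda5 /data ext4 defaults,acl 0 2" yields an entry whose options are ["defaults", "acl"] and whose check order is 2.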

5.3 Layout structure of the ext file system

5.3.1 Data area layout structure

Any file system consists of data and metadata. Here, the ext family of file systems is taken as an example.

As in the figure above, the data area (data space) is divided into block groups, and each block group contains a superblock, block group descriptors (GDT), a block bitmap, an inode bitmap, an inode table, and data blocks.

The number of blocks in each block group depends on the block size, so a superblock is defined to record these parameters and make it possible to locate the blocks in each block group.
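The dependence on block size can be worked through with a little arithmetic. The sketch assumes the classic ext2-style layout, in which the block bitmap of a group occupies exactly one block, so a group can cover at most 8 × block size blocks; the article itself does not state this rule, and features such as bigalloc change it.

```python
def blocks_per_group(block_size_bytes):
    """One bitmap block holds block_size * 8 bits, one bit per block in the group."""
    return block_size_bytes * 8


def group_size_mib(block_size_bytes):
    """Size of one block group in MiB under the same assumption."""
    return blocks_per_group(block_size_bytes) * block_size_bytes // (1024 * 1024)
```

Under this assumption a 4096-byte block size gives 32768 blocks per group, or 128 MiB per group, while a 1024-byte block size gives 8192 blocks and 8 MiB.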

A superblock can have multiple backup copies. It records the following:

Current file system type

How many inodes the current file system contains

How many blocks there are in the current file system

The size of each block in the current file system

The numbers of free and used disk blocks, and of free and used inodes

You can see the superblock information of /dev/sda1 with the tune2fs -l /dev/sda1 command.

The block group descriptor table (GDT) can also have multiple backup copies. It records the following:

How many block groups there are in the current file system

The first and last block of each block group

With the dumpe2fs /dev/sda1 command, you can see not only the superblock information of the /dev/sda1 file system, but also the block group descriptors.

5.3.2 Metadata area

The metadata area contains the following:

Inode table (stores inodes)

Inode bitmap

Block bitmap

An inode is the storage space that holds all the attribute information of a single file, organized in a specific format.

Inode stands for index node. An inode contains the following:

File / directory size

Time stamps

Permissions

Owner, group

Address pointers: record which blocks the file uses to store its data; each pointer holds the number of the corresponding data block

Direct pointers (point directly to data blocks)

Indirect pointers (point to another block that in turn holds further pointers, much as an extended partition holds further partitions)

Third-level (triple indirect) pointers
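How far the pointer levels reach can be estimated with a short calculation. The sketch assumes the classic ext2/ext3 inode layout of 12 direct pointers plus one single, one double and one triple indirect pointer, with 4-byte block numbers; the article does not give these numbers, and ext4 normally uses extents instead. The result is only what the pointers can address; other limits in a real file system may be lower.

```python
def max_addressable_blocks(block_size, n_direct=12, pointer_size=4):
    """Blocks reachable through direct, single, double and triple indirect pointers."""
    per_block = block_size // pointer_size  # pointers that fit in one block
    return n_direct + per_block + per_block ** 2 + per_block ** 3


def max_addressable_bytes(block_size):
    return max_addressable_blocks(block_size) * block_size
```

With 4096-byte blocks each pointer block holds 1024 pointers, so the triple indirect level alone reaches 1024³ data blocks, which is why the total comes to roughly 4 TiB.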

To access any file, you must first find its inode, learn from the inode which blocks hold the file's data, and then read those blocks.

Every block on the disk must have a number so that it can be referenced by an inode.

To make storing inodes fast, formatting divides the whole inode area of the metadata region into inode slots of a fixed size; right after formatting, all of these inodes are free and unused.

Assuming that a file system has 1 million inodes, how can you tell whether a given inode is free?

Let's assume that each inode is preceded by a flag bit, 1 for used and 0 for free. To use an inode, you would have to scan globally, find the first free inode, and fill it with the information to be stored.

Suppose that each block of the data area is likewise preceded by a flag bit, 1 for used and 0 for free. To use a block, you would again have to scan globally, find the first free block, fill it with data, and map it to the corresponding inode.

If a file is large and needs multiple blocks to store its data, multiple free blocks (ideally consecutive) are allocated to it, and the flag bit of each block used is set to 1

If a file is small enough that one block can hold its data, only that block's flag bit is set to 1; the flag bits of the remaining blocks stay 0

5.3.4 Bitmap index

Consider a problem: if the hard drive holds 100 GB, scanning all of it just to find a free block is far too inefficient. A secondary index solves this problem.

Because there are so many inodes, finding a free one by scanning them would be slow. Instead, a contiguous storage area is set aside and used as a bitwise index over the inodes: bit 0 corresponds to inode number 0, bit 1 to inode number 1, and so on, for N bits in total. A bit of 1 means the corresponding inode is in use, and a bit of 0 means it is free. To create an inode you no longer need a full scan; you only scan this secondary index (the bitwise index), so efficiency improves greatly. This secondary index is the inode bitmap. The data area has a block bitmap that works on the same principle.

Inode bitmap: status information that records whether each inode is free

Even so, if a bitmap covered the whole disk, scanning it would still be time-consuming, say with 1 million blocks on the disk. Therefore inode bitmaps and block bitmaps are not kept for the file system as a whole, but per block group.
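The bitmap idea described in this section can be sketched in a few lines. This is a toy in-memory model, not the on-disk ext format; the class name Bitmap and its methods are chosen for the example.

```python
class Bitmap:
    """One bit per inode (or block): 1 means in use, 0 means free."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)

    def is_used(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def allocate(self):
        """Mark the first free entry as used and return its number, or None if full."""
        for byte_index, byte in enumerate(self.bits):
            if byte == 0xFF:
                continue  # all eight entries covered by this byte are in use
            for bit in range(8):
                i = byte_index * 8 + bit
                if i < self.n_bits and not byte & (1 << bit):
                    self.bits[byte_index] |= 1 << bit
                    return i
        return None

    def free(self, i):
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF
```

Skipping a whole byte when it equals 0xFF is what makes the index cheap to scan: eight entries are ruled out with one comparison instead of inspecting eight inodes.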

5.3.5 File access process

A) look up the index node (inode)

B) find the disk block numbers recorded in the inode

C) find the corresponding disk blocks in the data area

5.3.6 Directory

Accessing a file starts with its inode, but the inode table contains a great many inodes. How do you determine which inode corresponds to a given file? That is what directories are for.

A directory is also a file, stored in blocks in the data area; a directory is essentially a path mapping.

The following are stored in a directory:

A) the names of all files directly under the directory

B) the inode numbers of all files directly under the directory
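The access steps and the role of directories can be combined in one toy model. All inode numbers, block numbers and contents below are made up for illustration (only the use of inode 2 for the root directory follows ext convention), and the dictionaries stand in for the on-disk inode table and data blocks.

```python
# Toy inode table: inode number -> attributes and the blocks the inode uses.
INODES = {
    2: {"type": "dir", "blocks": [100]},         # "/"
    11: {"type": "dir", "blocks": [101]},        # "/etc"
    12: {"type": "file", "blocks": [102, 103]},  # "/etc/hostname"
}

# Toy data area: directory blocks map names to inode numbers, file blocks hold data.
BLOCKS = {
    100: {"etc": 11},
    101: {"hostname": 12},
    102: b"local",
    103: b"host\n",
}

ROOT_INODE = 2


def lookup(path):
    """Walk the path one directory at a time and return the final inode number."""
    inode_no = ROOT_INODE
    for name in [part for part in path.split("/") if part]:
        inode = INODES[inode_no]
        if inode["type"] != "dir":
            raise NotADirectoryError(path)
        entries = {}
        for block_no in inode["blocks"]:
            entries.update(BLOCKS[block_no])  # directory content: name -> inode
        if name not in entries:
            raise FileNotFoundError(path)
        inode_no = entries[name]
    return inode_no


def read_file(path):
    """Inode -> block numbers -> data blocks, as in the access process above."""
    inode = INODES[lookup(path)]
    return b"".join(BLOCKS[block_no] for block_no in inode["blocks"])
```

Reading /etc/hostname therefore touches the root directory's block, then the /etc directory's block, and only then the two data blocks of the file itself.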

5.3.7 File creation

A) find a free inode in the metadata area to store the file's inode information

B) find one or more free blocks in the data area and map them to the inode

C) write the data into these blocks

D) set the corresponding flag bits to 1 (in use)
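The four steps can be followed in a small in-memory model. It is a sketch under simplifying assumptions: ToyFS and BLOCK_SIZE are hypothetical, plain Python lists stand in for the bitmaps, the block size is tiny so that the example stays readable, and adding the name to a directory is left out.

```python
BLOCK_SIZE = 4  # deliberately tiny; real block sizes are 1024 bytes or more


class ToyFS:
    def __init__(self, n_inodes=8, n_blocks=16):
        self.inode_used = [False] * n_inodes  # stands in for the inode bitmap
        self.block_used = [False] * n_blocks  # stands in for the block bitmap
        self.inodes = [None] * n_inodes
        self.blocks = [b""] * n_blocks

    def create(self, data):
        # A) find a free inode (raises ValueError when none is left)
        inode_no = self.inode_used.index(False)
        # B) find enough free blocks for the data
        n_needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        free = [i for i, used in enumerate(self.block_used) if not used]
        chosen = free[:n_needed]
        if len(chosen) < n_needed:
            raise OSError("no space left in the toy file system")
        # C) write the data into those blocks
        for k, block_no in enumerate(chosen):
            self.blocks[block_no] = data[k * BLOCK_SIZE:(k + 1) * BLOCK_SIZE]
        # D) mark the inode and the blocks as in use
        self.inode_used[inode_no] = True
        for block_no in chosen:
            self.block_used[block_no] = True
        self.inodes[inode_no] = {"size": len(data), "blocks": chosen}
        return inode_no
```

Creating an 11-byte file in an empty ToyFS takes inode 0 and blocks 0 to 2; a second, 2-byte file then gets inode 1 and block 3.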

5.3.8 Hard links

When multiple file names point to the same inode, they are called hard links. The names may be the same (in different directories) or different, and a hard link cannot point to a file on a different file system.

Features of hard links:

A) can only be created for files, not for directories

B) cannot cross file systems

C) creating a hard link increases the link count of the file
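These properties can be observed from Python with os.link and os.stat. The sketch below works in a temporary directory; the file names original.txt and alias.txt and the function name hard_link_demo are arbitrary choices for the example.

```python
import os
import tempfile


def hard_link_demo():
    """Return (link count before, link count after, whether both names share an inode)."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "original.txt")
        alias = os.path.join(directory, "alias.txt")
        with open(original, "w") as handle:
            handle.write("data")
        before = os.stat(original).st_nlink
        os.link(original, alias)  # second name for the same inode
        info_original = os.stat(original)
        info_alias = os.stat(alias)
        return (before, info_original.st_nlink,
                info_original.st_ino == info_alias.st_ino)
```

On a typical Linux file system the link count goes from 1 to 2 and both names report the same inode number.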

5.3.9 Soft links

Soft links are also called symbolic links. A soft link is a file whose content is the path name of another file. The target can be any file or directory, and it may be on a different file system.

Features of soft links:

A) can be applied to directories

B) can cross file systems

C) do not increase the link count of the linked file

D) the size of a soft link is the number of bytes in the path it stores

Create a soft link:

ln -s [-v] SRC DEST
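The listed features of soft links can be checked with os.symlink, os.readlink and os.lstat. The file names and the function name soft_link_demo are arbitrary, and the size check reflects the behaviour of typical Linux file systems rather than a guarantee for every file system.

```python
import os
import tempfile


def soft_link_demo():
    """Return (stored path matches, size equals path length, target link count,
    link has its own inode)."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "target.txt")
        link = os.path.join(directory, "link.txt")
        with open(target, "w") as handle:
            handle.write("data")
        os.symlink(target, link)
        link_info = os.lstat(link)  # lstat inspects the link itself
        return (os.readlink(link) == target,
                link_info.st_size == len(os.fsencode(target)),
                os.stat(target).st_nlink,
                os.stat(link).st_ino != link_info.st_ino)
```

os.stat follows the link and reports the target's inode, while os.lstat reports the link's own inode, which is why the last value shows two different inodes.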

5.4 Btrfs file system

5.4.1 Introduction to the Btrfs file system

The Btrfs file system has been supported since CentOS 7.

Btrfs (B-tree file system) is released under the GPL and has been developed by Oracle since 2007.

Core features of the Btrfs file system:

A) support for the copy-on-write mechanism (CoW): copy, update, then replace the pointer, instead of the traditional "in-place" update

Suppose you want to modify a file. With copy-on-write, a copy of the file is made first, the copy is modified, and then the pointer behind the file name is switched from the original to the copy.

In this way the original is still in storage, and if the copy was modified incorrectly, the file can be restored by switching the pointer back to the original.
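The copy, update and pointer-switch sequence can be sketched as follows. CowStore is a hypothetical in-memory model that copies whole files, as the description above does; a real CoW file system such as Btrfs works on smaller units and reclaims old copies once nothing refers to them.

```python
class CowStore:
    def __init__(self):
        self.blocks = {}  # block id -> bytes, never modified in place
        self.names = {}   # file name -> block id (the "pointer")
        self._next_id = 0

    def _new_block(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write(self, name, data):
        self.names[name] = self._new_block(data)

    def modify(self, name, change):
        """Apply change to a copy, switch the pointer, return the old block id."""
        old_id = self.names[name]
        updated_copy = change(self.blocks[old_id])    # the original stays untouched
        self.names[name] = self._new_block(updated_copy)
        return old_id

    def rollback(self, name, old_id):
        self.names[name] = old_id                     # restore the pointer

    def read(self, name):
        return self.blocks[self.names[name]]
```

After modify() the name points at the new content, the old content is still stored under the returned id, and rollback() undoes the change without copying anything.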

B) support for multiple physical volumes

Btrfs can be built on multiple underlying physical volumes and supports RAID; devices can be added, removed and modified online

C) support for data and metadata checksum mechanisms (CheckSum)

When a file is stored, a checksum of its metadata and a checksum of its data are saved alongside it as additional attributes. When the file is read, damage can therefore be detected quickly and conveniently, and if the file is damaged, the file system automatically tries to repair it.
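Damage detection by checksum can be shown with the standard zlib.crc32 function. This is a stand-in chosen for the sketch: Btrfs uses CRC32C by default, which is a different polynomial, and the helper names store and is_intact are hypothetical.

```python
import zlib


def store(data):
    """Keep the data together with a checksum computed at write time."""
    return {"data": bytearray(data), "checksum": zlib.crc32(data)}


def is_intact(record):
    """Recompute the checksum at read time and compare with the stored one."""
    return zlib.crc32(bytes(record["data"])) == record["checksum"]
```

Flipping a single bit in record["data"] makes is_intact return False; repairing the data additionally needs a good copy to read from, for example a RAID mirror.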

D) support for subvolumes (sub_volume, comparable to what lvm/lvm2 provides for the ext series of file systems)

Multiple underlying physical devices (hard drives) can be organized into one btrfs file system, which can be mounted directly or have subvolumes created inside it (just like creating LVs in a VG)

E) support for snapshots (a snapshot is an incomplete copy of a subvolume: it is another volume based on the CoW mechanism that starts out occupying no storage space of its own)

The btrfs file system supports snapshots directly, while ext3/ext4 must rely on lvm2 for snapshots.

You can take a snapshot of a single file or of a volume

You can also take a snapshot of an existing snapshot and accumulate snapshots, which is similar to implementing incremental backup

F) support for transparent compression

When you want to store large files but save space, any data stream sent to the btrfs file system can be compressed automatically, at the cost of some CPU clock cycles, before it is stored. The process is transparent to the user, and compressed files are decompressed automatically when they are read.

There is one drawback: compression and decompression consume additional clock cycles.
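What "transparent" means for the caller can be sketched with Python's zlib module: data is compressed on write and decompressed on read, and the caller only ever sees the original bytes. CompressedStore is a hypothetical in-memory model, not the Btrfs implementation.

```python
import zlib


class CompressedStore:
    def __init__(self):
        self._files = {}  # name -> compressed bytes

    def write(self, name, data):
        self._files[name] = zlib.compress(data)    # costs CPU time on write

    def read(self, name):
        return zlib.decompress(self._files[name])  # and again on read

    def stored_size(self, name):
        return len(self._files[name])
```

For repetitive data such as b"abc" * 1000 the stored size is much smaller than the 3000 bytes written, while read() still returns the original bytes unchanged.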

The main design goal of Btrfs was to replace ext3 and ext4, the file systems Linux used earlier. In practice, however, after the shortcomings of ext3 and ext4 became apparent, another file system (xfs) was made available on CentOS 6.

5.4.2 Implementation of the btrfs file system

mkfs.btrfs: creates a btrfs file system

-L|--label: specify the volume label

-m|--metadata: specify how metadata is stored

Valid values are raid0, raid1, raid5, raid6, raid10, single or dup

-d|--data: specify how data is stored

Valid values are raid0, raid1, raid5, raid6, raid10 or single

-O|--features [,...]: enable the specified features directly when formatting

A list of filesystem features turned on at mkfs time. Not all features are supported by old kernels.

To see all features, run

mkfs.btrfs -O list-all
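The options above can be assembled into a command line programmatically. The sketch only builds the argument list and does not run anything; the function name build_mkfs_btrfs_command and the device names in the usage note are hypothetical, and the accepted values are copied from the lists above.

```python
VALID_METADATA = {"raid0", "raid1", "raid5", "raid6", "raid10", "single", "dup"}
VALID_DATA = {"raid0", "raid1", "raid5", "raid6", "raid10", "single"}


def build_mkfs_btrfs_command(devices, label=None, metadata=None, data=None):
    """Return the argument list for mkfs.btrfs without executing it."""
    if not devices:
        raise ValueError("at least one device is required")
    if metadata is not None and metadata not in VALID_METADATA:
        raise ValueError(f"invalid metadata profile: {metadata}")
    if data is not None and data not in VALID_DATA:
        raise ValueError(f"invalid data profile: {data}")
    command = ["mkfs.btrfs"]
    if label is not None:
        command += ["-L", label]
    if metadata is not None:
        command += ["-m", metadata]
    if data is not None:
        command += ["-d", data]
    return command + list(devices)
```

For two hypothetical devices /dev/sdb and /dev/sdc with label mydata, metadata raid1 and data raid0, the result is the list mkfs.btrfs -L mydata -m raid1 -d raid0 /dev/sdb /dev/sdc. Formatting destroys existing data, so such a list should only be passed to subprocess.run as root after the device names have been double-checked.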

Commonly used btrfs commands:

btrfs: manages btrfs file systems

btrfs filesystem show [--mounted|--all-devices|]    # display btrfs file system information

btrfs filesystem sync    # force the data cached for the specified btrfs file system to be synchronized to the hard disk

btrfs filesystem df [...]    # view the space usage of a mounted btrfs file system

btrfs filesystem defragment [options] |[|.]    # defragment

btrfs filesystem resize [devid:][+/-][gkm]|[devid:]max    # change the file system size

btrfs filesystem label [|] []    # display or update the volume label of a btrfs file system

Mount the btrfs file system:

mount -t btrfs /dev/sdb MOUNT_POINT    # instead of /dev/sdb you can give any device that is one of the underlying physical volumes of the btrfs file system

Transparent compression mechanism:

mount -o compress={lzo|zlib} DEVICE MOUNT_POINT

btrfs-convert: converts an ext series file system to btrfs in place without data loss, or rolls a converted btrfs file system back to the ext series file system

btrfsck: checks a btrfs file system
