This article walks through ext4 and the Linux file systems that came before it and may come after it; I hope you find it a practical reference.
Most modern Linux distributions default to the ext4 file system, just as previous Linux distributions defaulted to ext3, ext2, and, if you go back far enough, ext.
If you are new to Linux or to file systems, you may wonder what ext4 brings to the table that ext3 did not. Given the news coverage of alternative file systems such as Btrfs, XFS, and ZFS, you may also wonder whether ext4 is still in active development.
We cannot cover everything about file systems in one article, but we will try to bring you up to speed on the history of Linux's default file system, where it stands today, and what to expect next.
In writing this overview, I have drawn heavily on various articles about the ext file systems as well as on my own experience.
A brief history of ext
MINIX file system
Before ext, there was the MINIX file system. If you are not familiar with Linux history, MINIX was a very small Unix-like operating system for IBM PC/AT microcomputers. Andrew Tanenbaum developed it for educational purposes and released its source code in 1987 (in print form!).
IBM PC/AT from the mid-1980s, MBlairMartin, CC BY-SA
Although you could study the MINIX source code, it was not actually free and open source software (FOSS). The publisher of Tanenbaum's book required a $69 license fee to run MINIX, which was included in the cost of the book. Still, this was remarkably inexpensive for the time, and MINIX adoption grew quickly, soon exceeding Tanenbaum's original intent of using it to teach operating system coding. Throughout the 1990s, you could find MINIX installations thriving in universities around the world, and a young Linus Torvalds used MINIX to develop the original Linux kernel, first announced in 1991 and released under the GPL in December 1992.
But wait, this is an article about file systems, isn't it? Yes, and MINIX had its own file system, which early versions of Linux relied on. Like MINIX itself, it was little more than a toy: the MINIX file system could handle file names of at most 14 characters and address only 64 MB of storage. By 1991, the typical hard drive was already 40-140 MB. Linux clearly needed a better file system.
Ext
While Linus was developing the fledgling Linux kernel, Rémy Card worked on the first-generation ext file system. First implemented and released in 1992, only a year after the first release of Linux, ext solved the worst problems of the MINIX file system.
The 1992 ext used the new virtual filesystem (VFS) abstraction layer in the Linux kernel. Unlike the MINIX file system before it, ext could address up to 2 GB of storage and handle filenames of up to 255 characters.
But ext did not dominate for long, mainly because of its primitive timestamps: each file had only a single timestamp, rather than the three we are familiar with today (inode change time, most recent access time, and most recent modification time). Barely a year later, ext2 replaced it.
Ext2
Rémy soon realized the limitations of ext, so a year later he designed ext2 to replace it. While ext was still rooted in "toy" operating systems, ext2 was designed from the start as a commercial-grade file system, following the design principles of BSD's Berkeley Fast File System.
Ext2 offered maximum file sizes in the gigabyte range and file system sizes in the terabyte range, firmly placing it in the big leagues among 1990s file systems. It was soon widely used, both in the Linux kernel and eventually in MINIX, and third-party modules made it available to MacOS and Windows.
There were still problems to solve, though: like most file systems of the 1990s, ext2 was prone to catastrophic data corruption if the system crashed or lost power while data was being written to disk. Over time, it also suffered significant performance losses due to fragmentation (a single file's blocks being stored in multiple places, physically scattered around a spinning disk).
Despite these problems, ext2 is still used in some special cases today, most commonly as the format for portable USB drives.
Ext3
In 1998, six years after ext2 was adopted, Stephen Tweedie announced that he was working on improving it. That work became ext3, which was merged into mainline Linux with kernel version 2.4.15 in November 2001.
A mid-1990s Packard Bell computer, Spacekid, CC0
For the most part, ext2 worked well in Linux distributions, but like FAT, FAT32, HFS, and other file systems of the era, it was prone to catastrophic corruption in the event of power loss. If power is lost while data is being written, the file system can be left in what is known as an inconsistent state, where operations are left half-finished. This can result in the loss or corruption of large numbers of files unrelated to the file being saved, and can even leave the entire file system unmountable.
Ext3, like other late-1990s file systems such as Microsoft's NTFS, uses journaling to solve this problem. The journal is a special allocated area on disk whose writes are stored in transactions; once a transaction finishes writing to disk, its data is committed to the file system itself. If the system crashes before that operation is committed, the newly rebooted system recognizes it as an incomplete transaction and rolls it back as though it had never taken place. This means the file being worked on may still be lost, but the file system itself remains consistent, and all other data is safe.
Three levels of journaling are available in the Linux kernel implementation of ext3: journal, ordered, and writeback.
Journal is the lowest-risk mode: both data and metadata are written to the journal before being committed to the file system. This guarantees that the file being written is consistent with the file system as a whole, but it significantly degrades performance.
Ordered is the default mode in most Linux distributions. In ordered mode, metadata is journaled while data is committed directly to the file system. As the name implies, the order of operations is rigid: first the metadata is committed to the journal, then the data is written to the file system, and only then is the associated metadata in the journal flushed to the file system itself. This ensures that metadata associated with incomplete writes is still sitting in the journal when a crash occurs, and the file system can roll back those incomplete transactions when it replays the journal. In ordered mode, a crash may corrupt files that were actively being written at the time, but the file system itself, and files not actively being written, are guaranteed safe.
Writeback is the third, and least safe, journaling mode. In writeback mode, as in ordered mode, metadata is journaled but data is not. Unlike ordered mode, however, metadata and data may be written in whatever order yields the best performance. This can significantly improve performance, but it is much less safe. Although writeback mode still guarantees the integrity of the file system itself, files written shortly before or during a crash can easily be lost or corrupted.
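To make the three modes concrete, here is a minimal sketch, not taken from the original discussion, of how a journaling mode is selected when a file system is mounted. The device and mount point are hypothetical; from a shell the same thing is normally done with the data= option of mount or an /etc/fstab entry.

/* Minimal sketch: selecting an ext3/ext4 journaling mode at mount time via mount(2).
 * /dev/sdb1 and /mnt/demo are hypothetical; root privileges are required.
 * Shell equivalent: mount -o data=journal /dev/sdb1 /mnt/demo */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* The last argument carries filesystem-specific options; for ext3/ext4,
     * "data=" selects the journal, ordered, or writeback journaling mode. */
    if (mount("/dev/sdb1", "/mnt/demo", "ext4", 0, "data=journal") != 0) {
        perror("mount");
        return 1;
    }
    printf("mounted with data=journal\n");
    return 0;
}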
Like ext2 before it, ext3 uses 32-bit internal addressing. This means that with a block size of 4 KiB, ext3 can handle files of up to 2 TiB on a file system of at most 16 TiB.
Ext4
Ext4, released in 2006 by Theodore Ts'o (then the principal developer of ext3), was merged into mainline Linux two years later, in kernel version 2.6.28.
Ts'o describes ext4 as a stopgap technology that significantly extends ext3 while still relying on old technology. He expects it eventually to be replaced by a true next-generation file system.
Dell Precision 380 workstation, Lance Fisher, CC BY-SA 2.0
Ext4 is functionally very similar to ext3, but it brings larger file system support, improved resistance to fragmentation, higher performance, and better timestamps.
Ext4 vs ext3
There are some very clear differences between ext3 and ext4, which are discussed here.
Backward compatibility
Ext4 was specifically designed to be as backward compatible with ext3 as possible. This not only allows an ext3 file system to be upgraded to ext4 in place, it also allows the ext4 driver to automatically mount ext3 file systems in ext3 mode, making it unnecessary to maintain two separate codebases.
Large file systems
The ext3 file system uses 32-bit addressing, which limits it to a 2 TiB maximum file size and a 16 TiB maximum file system size (and this assumes a block size of 4 KiB; some ext3 file systems use smaller block sizes and are therefore limited even further).
Ext4 uses 48-bit internal addressing, making it theoretically possible to allocate files of up to 16 TiB on file systems of up to 1,000,000 TiB (1 EiB); at a 4 KiB block size, 2^48 blocks works out to exactly 1 EiB of addressable space. In early implementations of ext4, some user-space programs still limited it to file systems of at most 16 TiB, but as of 2011, e2fsprogs directly supports ext4 file systems larger than 16 TiB. As one example, Red Hat Enterprise Linux contractually supports ext4 file systems only up to 50 TiB and recommends that ext4 volumes not exceed 100 TiB.
Allocation improvements
Ext4 has made a number of improvements to the way storage blocks are allocated before writing them to disk, which can significantly improve read and write performance.
Extents
An extent is a range of contiguous physical blocks (up to 128 MiB, assuming a 4 KiB block size) that can be reserved and addressed at once. Using extents reduces the amount of metadata needed to map a given file's blocks, significantly reduces fragmentation, and improves performance when writing large files.
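If you want to see extents in practice, the filefrag utility reports them, and the same information is available programmatically through the FIEMAP ioctl. The following is a minimal sketch, not part of the original text, that prints the extents backing a file on Linux; it only retrieves the first 32 extents, which is plenty for illustration.

/* Minimal sketch: listing a file's extents via the FIEMAP ioctl (Linux). */
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    enum { MAX_EXTENTS = 32 };  /* only the first 32 extents are fetched */
    struct fiemap *fm = calloc(1, sizeof(*fm) +
                               MAX_EXTENTS * sizeof(struct fiemap_extent));
    if (!fm) return 1;
    fm->fm_start = 0;
    fm->fm_length = ~0ULL;          /* map the whole file */
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) != 0) { perror("FS_IOC_FIEMAP"); return 1; }

    for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
        printf("extent %u: logical %llu  physical %llu  length %llu\n", i,
               (unsigned long long)fm->fm_extents[i].fe_logical,
               (unsigned long long)fm->fm_extents[i].fe_physical,
               (unsigned long long)fm->fm_extents[i].fe_length);

    free(fm);
    close(fd);
    return 0;
}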
Multi-block allocation
Ext3 called the block allocator once for each new block allocated; when multiple writers have files open at the same time, this can easily lead to severe fragmentation. Ext4, by contrast, uses delayed allocation, which allows it to coalesce writes and make better decisions about how to allocate blocks for writes that have not yet been committed.
Persistent pre-allocation
When preallocating disk space for a file, most file systems must write zeros to the file's blocks at creation time. Ext4 allows the use of fallocate() instead, which guarantees the availability of the space (and tries to find contiguous space for it) without having to write to it first. This significantly improves performance for both writes and future reads of streaming and database applications.
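As an illustration, here is a minimal sketch, with a hypothetical file name, of persistent preallocation using the Linux-specific fallocate(2) call; the portable posix_fallocate() behaves similarly.

/* Minimal sketch: persistently preallocating 1 GiB for a file with fallocate(2).
 * "prealloc.dat" is a hypothetical file name. On ext4 the reserved range is
 * recorded as unwritten extents, so nothing needs to be zeroed out up front. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("prealloc.dat", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Reserve 1 GiB starting at offset 0; mode 0 also extends the file size. */
    if (fallocate(fd, 0, 0, 1024LL * 1024 * 1024) != 0) {
        perror("fallocate");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}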
Delayed allocation
This is an intriguing and controversial feature. Delayed allocation allows ext4 to wait to allocate the actual blocks it will write data to until it is ready to commit that data to disk. (By contrast, ext3 allocates blocks immediately, even while the data is still flowing into the write cache.)
Delaying allocation as data accumulates in the cache allows the file system to make better choices about how to allocate blocks, reducing fragmentation (of writes and, later, of reads) and significantly improving performance. Unfortunately, it also increases the potential for data loss in programs that do not explicitly call fsync() when the programmer wants to make certain the data has been completely flushed to disk.
Suppose a program completely rewrites a file:
fd = open("file", O_TRUNC); write(fd, data); close(fd);
With an older file system, close(fd); was enough to guarantee that the contents of the file would be flushed to disk. Even though the write is not strictly transactional, the risk of losing data if a crash occurs after the file is closed is small.
If the write does not succeed (due to program errors, disk errors, power loss, and so on), both the original and the newer version of the file may be lost or corrupted. If other processes access the file while it is being written, they will see a corrupted version. And if other processes have the file open and do not expect its contents to change, for example a shared library mapped into multiple running programs, those processes may crash.
To avoid these problems, some programmers avoid using O_TRUNC altogether. Instead, they might write a new file, close it, and rename it to the old file name:
fd = open("newfile"); write(fd, data); close(fd); rename("newfile", "file");
On a file system without delayed allocation, this is sufficient to avoid the potential corruption and crash problems listed above: since rename() is an atomic operation, it will not be interrupted by a crash, and running programs will continue to reference the old, now unlinked version of file for as long as they hold an open file handle to it. But because ext4's delayed allocation can cause writes to be delayed and reordered, the rename("newfile", "file") may be carried out before the contents of newfile have actually been written to disk, which reopens the problem of parallel processes getting bad versions of file.
To mitigate this, the Linux kernel (since version 2.6.30) attempts to detect these common code patterns and force the affected blocks to be allocated immediately. This reduces, but does not prevent, the potential for data loss, and it does not help new files at all. If you are a developer, please take note: the only way to guarantee that data is written to disk immediately is to call fsync() correctly.
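To make that advice concrete, here is a minimal sketch, not from the original text and with hypothetical file names, of the rewrite-and-rename pattern with the fsync() calls that make it reliable on a delayed-allocation file system.

/* Minimal sketch: safely replacing a file's contents on a delayed-allocation
 * file system. File names are hypothetical; a robust version would also loop
 * on short writes and report errors in more detail. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int safe_rewrite(const char *path, const char *tmp_path,
                        const void *data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }

    /* Force the new contents to disk before the rename makes them visible. */
    if (fsync(fd) != 0) { close(fd); return -1; }
    close(fd);

    /* Atomically replace the old file with the fully written new one. */
    if (rename(tmp_path, path) != 0) return -1;

    /* Optionally fsync the containing directory (here the current directory)
     * so the rename itself survives a crash. */
    int dirfd = open(".", O_RDONLY | O_DIRECTORY);
    if (dirfd >= 0) { fsync(dirfd); close(dirfd); }
    return 0;
}

int main(void)
{
    const char *msg = "hello, ext4\n";
    return safe_rewrite("file", "file.tmp", msg, strlen(msg)) == 0 ? 0 : 1;
}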
Unlimited subdirectories
Ext3 is limited to 32,000 subdirectories per directory; ext4 allows an unlimited number. Beginning with kernel version 2.6.23, ext4 uses HTree indexes to mitigate the performance loss that would otherwise come with huge numbers of subdirectories.
Journal checksums
Ext3 did not checksum its journal, which presented problems for disks or controller devices with caches of their own, outside the kernel's direct control. If a controller or a disk with its own cache performs writes out of order, it can break ext3's journal transaction ordering, potentially corrupting files written during (or for some time before) a crash.
In theory, write barriers solve this problem: you set barrier=1 in the mount options, and the device then honors fsync requests all the way down to the underlying hardware. In practice, storage devices and controllers have frequently been found not to honor write barriers, improving performance (and benchmark results versus their competitors) but opening up the possibility of data corruption that should have been prevented.
Checksumming the journal allows the file system to recognize, on the first mount after a crash, that some of its entries are invalid or out of order. This avoids the mistake of rolling back partial or out-of-order journal entries and damaging the file system further, even if the storage devices lie about, or fail to honor, write barriers.
Fast file system check
Under ext3, the entire file system, including deleted and empty files, is checked when fsck is invoked. By contrast, ext4 marks unallocated blocks and unused sections of the inode table as such, allowing fsck to skip them entirely. This greatly reduces the time needed to run fsck on most file systems, and it was implemented in kernel version 2.6.24.
Improved timestamps
Ext3 offers timestamps with a granularity of one second. While sufficient for most uses, mission-critical applications frequently need much tighter time control. Ext4 makes itself available to enterprise, scientific, and mission-critical applications by offering timestamps with nanosecond granularity.
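As a brief, illustrative sketch, not part of the original text, this is how a program reads the nanosecond-resolution modification time that ext4 can store; on ext3 (or on ext4 with old 128-byte inodes) the nanosecond field simply reads back as zero.

/* Minimal sketch: reading a file's nanosecond-resolution modification time.
 * stat(2) exposes struct timespec timestamps (st_mtim) as of POSIX.1-2008. */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    struct stat st;
    if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }

    /* Seconds and nanoseconds since the Unix epoch. */
    printf("mtime: %lld.%09ld\n",
           (long long)st.st_mtim.tv_sec, st.st_mtim.tv_nsec);
    return 0;
}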
The ext3 file system also does not provide enough bits to store dates beyond January 18, 2038. Ext4 adds two additional bits here, extending the Unix epoch by 408 years. If you are reading this in 2446 AD, you have hopefully moved on to a better file system by now, but it will make me posthumously very happy if you are still measuring time since 00:00 on January 1, 1970 (UTC).
Online defragmentation
Neither ext2 nor ext3 directly supports online defragmentation, that is, defragmenting the file system while it is mounted. Ext2 had a bundled utility, e2defrag, which did what its name implies, but it had to be run offline while the file system was not mounted. (Obviously, this is especially problematic for a root file system.) The situation was even worse in ext3: although ext3 is much less likely to suffer severe fragmentation than ext2, running e2defrag against an ext3 file system could lead to catastrophic corruption and data loss.
Although ext3 was originally considered "unaffected by fragmentation," workloads that write massively in parallel to the same file, such as BitTorrent, made it clear that this was not entirely the case. Several user-space tools and workarounds, such as Shake, addressed this in one way or another, but they were slower than true, filesystem-aware, kernel-level defragmentation and less satisfactory in various ways.
Ext4 addresses this problem with e4defrag, an online, kernel-mode, filesystem-aware, block-and-extent-level defragmentation utility.
Ongoing ext4 development
Ext4 is, as the Monty Python plague victim once put it, "not dead yet!" Although its principal developer regards it as a mere stopgap on the way to a true next-generation file system, none of the likely candidates will be ready, due to technical or licensing problems, for deployment as a root file system for some time.
There are still some key features planned for future ext4 releases, including metadata checksums, first-class quota support, and large allocation blocks.
Metadata checksums
Since ext4 has redundant superblocks, checksumming the metadata within them gives the file system a way to determine on its own whether the primary superblock is corrupt and an alternate needs to be used. It is possible to recover from a corrupt superblock without checksums, but the user first needs to realize that it is corrupt and then try to mount the file system manually using an alternate superblock. Since mounting a file system read/write with a corrupt primary superblock can, in some cases, cause further damage, this is not a sufficient solution, even for an experienced user!
Ext4's metadata checksumming is a very weak feature compared to the extremely powerful per-block checksumming offered by next-generation file systems such as Btrfs or ZFS, but it is better than nothing. And although checksumming everything sounds simple, there are some significant challenges in bolting checksums onto an existing file system after the fact; see the design document for the details.
First-class quota support
Wait, quotas? We've had those since the days of ext2! Yes, but they have always been an afterthought, and they have always been somewhat clunky. It is probably not worth going into detail here, but the design document outlines how quotas will be moved from user space into the kernel, where they can be enforced more correctly and efficiently.
Large allocation blocks
As time goes on, those pesky storage systems keep getting bigger and bigger. With some solid-state drives already using 8 KiB hardware block sizes, ext4's current limitation to 4 KiB blocks is increasingly restrictive. Larger storage blocks can significantly decrease fragmentation and improve performance, at the cost of increased "slack" space (the space left over when only part of a block is needed to store a file, or the last piece of a file). You can read the details in the design document.
Practical limitations of ext4
Ext4 is a robust, stable file system, and it is what most people should probably be using as a root file system today. But it cannot handle every requirement. Let's talk briefly about some things you should not expect from it, now or probably ever:
Although ext4 can address up to 1 EiB (equivalent to 1,000,000 TiB) of data, you really should not try to do so. There are problems of scale beyond merely being able to remember the addresses of many more blocks, and ext4 does not now (and likely never will) scale well beyond 50-100 TiB of data.
Ext4 is also not sufficient to guarantee data integrity. As big an advance as journaling was back in the days of ext3, it does not cover many common causes of data corruption. If data is corrupted while already on disk, whether by faulty hardware, cosmic rays (yes, really), or simple decay over time, ext4 has no way to detect or repair that corruption.
Building on the two points above, ext4 is a pure file system and not a storage volume manager. This means that even if you have multiple disks, and therefore the parity or redundancy you could in theory recover corrupted data from, ext4 has no way of knowing that or of using it to your benefit. While it is theoretically possible to separate the file system and the storage volume management system into discrete layers without losing automatic corruption detection and repair, that is not how current storage systems are designed, and it would present significant challenges for new designs.
Alternative file systems
Before we get into the alternatives, a word of warning: be very careful with any alternative file system that is not built into and directly supported as part of your distribution's mainline kernel.
Even if a file system is safe, using it as the root file system can be terrifying if something goes wrong during a kernel upgrade. If you are not comfortable booting from alternative media and patiently poking at kernel modules, grub configuration, and DKMS from a chroot, do not stray from the beaten path with the root file system on a system that matters.
There may well be good reasons to use a file system your distribution does not directly support, but if you do, I strongly recommend mounting it only after the system is up and running. (For example, you might have an ext4 root file system but store most of your data in a ZFS or Btrfs pool.)
XFS
XFS is about as mainline as a non-ext file system gets under Linux. It is a 64-bit journaling file system that has been built into the Linux kernel since 2001 and offers high performance for large file systems and high degrees of concurrency (that is, very large numbers of processes writing to the file system at once).
XFS became the default file system of Red Hat Enterprise Linux as of RHEL 7. It still has a few drawbacks for home or small-business users, most notably that resizing an existing XFS file system is painful enough that it usually makes more sense to create a new one and copy the data over.
While XFS is stable and performant, there is not enough of a concrete end-use difference between it and ext4 to recommend it anywhere it is not already the default (such as RHEL 7), unless it addresses a specific problem you are having with ext4, such as file systems larger than 50 TiB.
XFS is not in any way a "next-generation" file system in the way that ZFS, Btrfs, or even WAFL (a proprietary SAN file system) are. Like ext4, it should most likely be considered a stopgap on the way toward something better.
ZFS
ZFS was developed by Sun Microsystems and named after the zettabyte, the equivalent of one trillion gigabytes, because it can theoretically address storage systems that large.
A true next-generation file system, ZFS offers volume management (the ability to handle multiple individual storage devices within a single file system), block-level cryptographic checksums (allowing data corruption to be detected with extremely high accuracy), automatic corruption repair (where redundant or parity storage is available), fast asynchronous incremental replication, inline compression, and more.
From a Linux user's perspective, the biggest problem with ZFS is licensing. ZFS is licensed under the CDDL, a semi-permissive license that conflicts with the GPL. There is a great deal of controversy over what this means for using ZFS with the Linux kernel, with opinions ranging from "it's a GPL violation" to "it's a CDDL violation" to "it's fine, it just hasn't been tested in court." Most notably, Canonical has shipped ZFS code inline in its default kernels since 2016, so far without legal challenge.
At this point, even as an avid ZFS user, I do not recommend ZFS as a Linux root file system. If you want to take advantage of ZFS on Linux, set up a small root file system on ext4, then put ZFS on the rest of your storage and keep data, applications, and whatever else you like there, but keep root on ext4 until your distribution explicitly supports a ZFS root.
Btrfs
Btrfs is short for B-Tree Filesystem and is usually pronounced "butter." It was released by Chris Mason in 2007, during his tenure at Oracle. Btrfs aims at most of the same goals as ZFS, offering multiple-device management, per-block checksumming, asynchronous replication, inline compression, and more.
As of 2018, Btrfs is reasonably stable and usable as a standard single-disk file system, but it probably should not be relied on as a volume manager. It suffers from significant performance problems compared to ext4, XFS, or ZFS in many common use cases, and its next-generation features (replication, multiple-disk topologies, and snapshot management) can be buggy enough to produce anything from catastrophically degraded performance to outright data loss.
The maintenance status of Btrfs is also contentious: SUSE Enterprise Linux adopted it as its default file system in 2015, whereas Red Hat announced in 2017 that it would no longer support Btrfs beginning with RHEL 7.4. It is worth noting that production, supported deployments use Btrfs as a single-disk file system rather than as a multi-disk volume manager in the manner of ZFS; even Synology, which uses Btrfs on its storage appliances, layers it on top of conventional Linux kernel RAID (mdraid) to manage the disks.
Finally, here is a brief summary of the ext file systems, from ext through ext4, and their characteristics and advantages:
ext: The first extended file system, released in April 1992, was the first file system written specifically for the Linux kernel. It adopted the metadata structure of the Unix File System (UFS) and was the first file system to use Linux's virtual file system layer, overcoming the poor performance of the MINIX file system.
ext2: The second extended file system was designed by Rémy Card as a replacement for ext and gained Linux kernel support in January 1993. Its classic implementation is the ext2fs driver in the Linux kernel, which supported file systems of up to 2 TB; from the 2.6 kernel onward this can be extended to 32 TB. In ext2, a file is uniquely identified by its inode, which holds all the information about the file. A file may have multiple names, and it is not deleted until all of those names have been removed. The inode stored on disk and the inode of an opened file are separate copies, and the kernel is responsible for keeping them synchronized. Ext2 is efficient and stable.
ext3: The third extended file system is a journaling file system commonly used on Linux, developed directly from ext2. It is stable and reliable, fully compatible with ext2, and lets users migrate smoothly to a journaled file system. Its advantages: 1. High availability: the file system does not need to be checked even after an abnormal shutdown. 2. Data integrity: damage to the file system from unexpected downtime is avoided. 3. Speed: journaling optimizes the movement of the disk's read/write heads, so read and write performance is no lower than ext2's. 4. Easy conversion: moving from ext2 to ext3 is straightforward. 5. Multiple journaling modes.
ext4: The fourth extended file system is a journaling file system and the successor to ext3. It was implemented by a development team led by ext3 maintainer Theodore Ts'o and introduced in the Linux 2.6.19 kernel. Ext4 is an improved version of ext3 that modifies important data structures, rather than simply adding journaling the way ext3 did to ext2, and it offers better performance and reliability along with a richer feature set: 1. Ext3 compatibility: a few commands migrate a file system from ext3 to ext4 online, without reformatting the disk or reinstalling the system. 2. Larger file systems and files: compared to ext3's maximum 16 TB file system and 2 TB file size, ext4 supports file systems of up to 1 EB (1,048,576 TB) and files of up to 16 TB. 3. Unlimited subdirectories: ext3 supports only 32,000 subdirectories, while ext4 supports an unlimited number. 4. Extents: ext4 introduces extents, a concept popular in modern file systems; each extent is a group of contiguous data blocks, which is far more efficient than ext3's indirect block mapping. 5. Multi-block allocation: ext4's multiblock allocator (mballoc) can allocate many blocks in a single call. 6. Delayed allocation. 7. Fast fsck. 8. Journal checksums. 9. A "no journaling" mode. 10. Online defragmentation. 11. Inode improvements: ext4's default inode size is 256 bytes, compared with ext3's default of 128 bytes.
Thank you for reading! I hope this overview of ext4 and the Linux file systems before and after it has been helpful.