
In-depth analysis of the ext2 file system

2025-04-18 Update From: SLTechnology News&Howtos


I have wanted to write an article about the ext family of file systems for a long time. At work I once accidentally deleted a lot of files with rm -rf, and I badly wanted a data recovery tool that could bring them back. Of course, to learn about data recovery you first have to understand the file system. Because of work I have not read or studied anything related to the Linux kernel for a long time, which bothers me. Enough digression; let's begin exploring the ext2 file system.

I won't go over the features of ext2; you can find them in any reliable Linux tutorial. Let's get straight to exploring. First we need an ext2 file system. Disk space on my Ubuntu machine is limited, so I set aside 500 MB to study ext2 from scratch. The dd command below creates a file of 512000 blocks of 1 KB each, all zeros, which is the 500 MB file. losetup then attaches the file to a loop device, which lets a regular file be used as a block device. Finally, mke2fs formats the loop device as an ext2 file system, and we have our ext2 file system. Note that mke2fs is run with its default options here:

root@libin:~# dd if=/dev/zero of=bean bs=1K count=512000
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 9.40989 s, 55.7 MB/s
root@libin:~# ll bean
-rw-r--r-- 1 root root 524288000 2012-07-06 22:24 bean
root@libin:~# ll -h bean
-rw-r--r-- 1 root root 500M 2012-07-06 22:24 bean
root@libin:~#
root@libin:~# losetup /dev/loop0 bean
root@libin:~# cat /proc/partitions
major minor  #blocks  name
   7     0     512000 loop0
   8     0  312571224 sda
   8     1   49182966 sda1
root@libin:~# mke2fs /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
128016 inodes, 512000 blocks
25600 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

But we are not done yet. We still cannot access the new ext2 file system because it has not been mounted, so I mount the loop device on the /mnt/bean directory.

mkdir /mnt/bean
mount -t ext2 /dev/loop0 /mnt/bean
root@libin:/mnt/bean# mount
/dev/loop0 on /mnt/bean type ext2 (rw)
root@libin:/mnt/bean# ll
total 17
drwxr-xr-x 3 root root  1024 2012-07-06 22:31 ./
drwxr-xr-x 4 root root  4096 2012-07-06 22:32 ../
drwx------ 2 root root 12288 2012-07-06 22:31 lost+found/

Through our efforts, we finally created our ext2 file system. Next we need to talk about the structure of the ext2 file system.

The following figure is the classic layout diagram of an ext2 file system. Similar images can be found all over the Internet, but I drew this one to clarify two points: (1) not every block group contains a super block and group descriptors; (2) the group descriptor table (GDT) does not describe only its own block group; on the contrary, it describes all block groups.

(The number of inode table blocks and data blocks is not necessarily equal; my picture is inaccurate in that respect.)

We know the super block is important because it tells Linux how the block device is organized: what the file system is, how big each block is (1024, 2048 or 4096 bytes), how many blocks each block group holds, how many bytes an inode occupies, and so on. Precisely because the super block is so important, we cannot keep only one copy of it. Imagine that only one block group held the super block and it got corrupted: we would be finished, unable to reach the nearly 500 MB of space and the data in it. That much is easy to understand. But should every block group hold a copy of the super block? That is unnecessary and wastes space. So which block groups hold the super block?

Superblock backups stored on blocks:

8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

This is what mke2fs printed to the terminal when it formatted the loop device. Because each block group has 8192 blocks (the reason is discussed later), block groups 0, 1, 3, 5, 7, 9, 25, 27 and 49 store a super block: group 0 holds the primary copy and the others hold the backups listed above.
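As a quick cross-check of that list, here is a minimal Python sketch. It assumes the sparse_super placement rule (a backup in group 1 and in every group whose number is a power of 3, 5 or 7) and takes the group count, blocks per group and first data block from the mke2fs output above. The helper names `is_power_of` and `backup_groups` are made up for this illustration.

```python
def is_power_of(n, base):
    """Return True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_groups(group_count):
    """Groups holding a backup super block under the assumed sparse_super rule."""
    return [g for g in range(1, group_count)
            if any(is_power_of(g, b) for b in (3, 5, 7))]


BLOCKS_PER_GROUP = 8192   # from the mke2fs output
FIRST_DATA_BLOCK = 1      # from the mke2fs output
GROUP_COUNT = 63          # from the mke2fs output

groups = backup_groups(GROUP_COUNT)
# The first block of group g is FIRST_DATA_BLOCK + g * BLOCKS_PER_GROUP.
blocks = [FIRST_DATA_BLOCK + g * BLOCKS_PER_GROUP for g in groups]
print(groups)
print(blocks)
```

The computed block numbers agree with the "Superblock backups stored on blocks" line printed by mke2fs.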

How are these groups chosen? The rule is that block groups 0 and 1, plus every group whose number is a power of 3, 5 or 7, hold a copy of the super block. Before explaining the block group descriptor, let's take a look at the fields of the super block:

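struct ext2_super_block {
    __u32 s_inodes_count;          /* total number of inodes */
    __u32 s_blocks_count;          /* total number of blocks */
    __u32 s_r_blocks_count;        /* reserved blocks */
    __u32 s_free_blocks_count;     /* free blocks */
    __u32 s_free_inodes_count;     /* free inodes */
    __u32 s_first_data_block;      /* first data block */
    __u32 s_log_block_size;        /* block size = 1024 << s_log_block_size */
    __u32 s_dummy3[7];
    unsigned char s_magic[2];      /* magic number, 0xEF53 */
    __u16 s_state;
    ...
};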

Let's get some information about our ext2 file system with dumpe2fs.

root@libin:/mnt/bean# dumpe2fs /dev/loop0
dumpe2fs 1.41.11 (14-Mar-2010)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 3bff7535-6f39-4720-9b64-1dc8cf9fe61d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr resize_inode dir_index filetype sparse_super
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 128016
Block count: 512000
Reserved block count: 25600
Free blocks: 493526
Free inodes: 128005
First block: 1
Block size: 1024
Fragment size: 1024
Reserved GDT blocks: 256
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2032
Inode blocks per group: 254
Filesystem created: Fri Jul 6 22:31:09 2012
Last mount time: Fri Jul 6 22:33:28 2012
Last write time: Fri Jul 6 22:33:28 2012
Mount count: 1
Maximum mount count: 24
Last checked: Fri Jul 6 22:31:09 2012
Check interval: 15552000 (6 months)
Next check after: Wed Jan 2 22:31:09 2013
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Default directory hash: half_md4
Directory Hash Seed: 0140915d-91ae-43df-9d84-9536cedc0d2b

Group 0: (Blocks 1-8192)
Primary superblock at 1, Group descriptors at 2-3
Reserved GDT blocks at 4-259
Block bitmap at 260 (+259), Inode bitmap at 261 (+260)
Inode table at 262-515 (+261)
7663 free blocks, 2021 free inodes, 2 directories
Free blocks: 530-8192
Free inodes: 12-2032
...
Group 62: (Blocks 507905-511999)
Block bitmap at 507905 (+0), Inode bitmap at 507906 (+1)
Inode table at 507907-508160 (+2)
3839 free blocks, 2032 free inodes, 0 directories
Free blocks: 508161-511999
Free inodes: 125985-128016

OK, we have this information, but how can I prove that what dumpe2fs reports is correct? There is only one way: drill into the super block itself and, following the super block data structure, read the value of each field. Sounds exciting. OK, just do it.

root@libin:/mnt/bean# dd if=/dev/loop0 bs=1k count=261 | od -tx1 -Ax > /tmp/dump_hex
261+0 records in
261+0 records out
267264 bytes (267 kB) copied, 0.0393023 s, 6.8 MB/s
root@libin:/mnt/bean# vi /tmp/dump_hex

I dumped the first 261 KB of the loop device into /tmp/dump_hex. Block 0 is the boot block, which we will skip. Block 1 is the super block. Exciting: we can finally see the legendary super block in the raw.

000400 10 f4 01 00 00 d0 07 00 00 64 00 00 d6 87 07 00

000410 05 f4 01 00 01 00 00 00

000420 00 20 00 00 00 20 00 00 f0 07 00 00 5f cb f7 4f

000430 5f cb f7 4f 01 00 1a 00 53 ef 00 00 01 00 00 00

000440 25 cb f7 4f 00 4e ed 00 00 00 01 00 00 00

000450 00 00 00 0b 00 00 00 80 00 00 00 38 00 00 00

000460 02 00 00 00 01 00 00 00 5a 65 4b 92 fe 63 43 eb

000470 b6 86 3e f3 6e 44 19 af 00 00 00

000480 00 00 00

0004c0 00 00 00 01

0004d0 00 00 00

0004e0 00 00 00 f9 6f 16 79

0004f0 b7 dc 4f 8a a1 a1 18 82 72 a7 d8 25 01 00 00 00

000500 00 00 00 25 cb f7 4f 00 00 00

000510 00 00 00

000560 01 00 00 00

000570 00 00 00

000800 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07

The leftmost column is the address, in hexadecimal. 000400 = 1K, that is, byte offset 1024 in the file, and 000800 = 2K. The bytes between them are the super block we have been looking forward to. I was so excited that I pasted the whole super block. Fortunately I am not paid by the word, otherwise I would be despised for padding. Here is the ext2 super block data structure again, so that we can compare it field by field and see whether dumpe2fs is right.

struct ext2_super_block {
    __u32 s_inodes_count;
    __u32 s_blocks_count;
    __u32 s_r_blocks_count;
    __u32 s_free_blocks_count;
    __u32 s_free_inodes_count;
    __u32 s_first_data_block;
    __u32 s_log_block_size;
    ...
};

The first field is s_inodes_count and takes up four bytes. The first four bytes starting at 1K are 10 f4 01 00. Byte order can be little-endian or big-endian. To keep the file system portable, the ext2 designers specified that all on-disk data is little-endian. When the data is read into memory, the kernel converts it to the CPU's native byte order.
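To make the little-endian decoding concrete, here is a small Python sketch. It takes the 16 bytes shown at address 000400 in the dump and the two magic bytes shown at offset 56 (address 000438), and unpacks them with the standard `struct` module. The variable names mirror the struct fields; everything else is just illustration.

```python
import struct

# First 16 bytes of the super block, copied from the dump at address 000400.
raw = bytes.fromhex("10 f4 01 00 00 d0 07 00 00 64 00 00 d6 87 07 00")

# "<" selects little-endian, "4I" means four unsigned 32-bit integers.
(s_inodes_count,
 s_blocks_count,
 s_r_blocks_count,
 s_free_blocks_count) = struct.unpack("<4I", raw)

# The magic number sits at byte offset 56 of the super block (address 000438).
(s_magic,) = struct.unpack("<H", bytes.fromhex("53 ef"))

print(s_inodes_count, s_blocks_count, s_r_blocks_count, s_free_blocks_count)
print(hex(s_magic))
```

The four integers can be compared directly with the Inode count, Block count, Reserved block count and Free blocks lines of the dumpe2fs output.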

Since it is little-endian, the value is 0x0001f410 = 128016. dumpe2fs reported Inode count: 128016, exactly the same. As another example, take s_free_blocks_count. According to the data structure, this field starts at byte offset 12 of the super block, that is, at address 00040c. The bytes there are d6 87 07 00, which gives 0x000787d6 = 493526, the same as the Free blocks value from dumpe2fs. You can check any other fields you care about in the same way.

Having seen the super block in the raw, we now know its structure. To sum up, not every block group has a super block, and each copy occupies one block. When the block size is 4K, most of that block is wasted, but since the number of super block copies is small, not much space is lost.

Next, block group descriptors. A group descriptor is 32 bytes in total. Most textbooks leave the misleading impression that each block group stores its own group descriptor. That is not the case. A single group descriptor occupies only 32 bytes, yet the textbooks say that the group descriptors in a block group occupy k blocks; one descriptor does not need that much space. The truth is that all group descriptors are stored together, as an array, in those k blocks. In other words, a block group may have no group descriptors at all, while a block group that does have them stores the descriptors of every block group in its k blocks. Let me confirm this:

struct ext2_group_desc
{
    __u32 bg_block_bitmap;          /* Blocks bitmap block */
    __u32 bg_inode_bitmap;          /* Inodes bitmap block */
    __u32 bg_inode_table;           /* Inodes table block */
    __u16 bg_free_blocks_count;     /* Free blocks count */
    __u16 bg_free_inodes_count;     /* Free inodes count */
    __u16 bg_used_dirs_count;       /* Directories count */
    __u16 bg_flags;
    __u32 bg_exclude_bitmap_lo;     /* Exclude bitmap for snapshots */
    __u16 bg_block_bitmap_csum_lo;  /* crc32c(s_uuid+grp_num+bitmap) LSB */
    __u16 bg_inode_bitmap_csum_lo;  /* crc32c(s_uuid+grp_num+bitmap) LSB */
    __u16 bg_itable_unused;         /* Unused inodes count */
    __u16 bg_checksum;              /* crc16(s_uuid+group_num+group_desc) */
};

Group 0: (Blocks 1-8192)
Primary superblock at 1, Group descriptors at 2-3
Reserved GDT blocks at 4-259
Block bitmap at 260 (+259), Inode bitmap at 261 (+260)
Inode table at 262-515 (+261)
7663 free blocks, 2021 free inodes, 2 directories
Free blocks: 530-8192
Free inodes: 12-2032

Group 1: (Blocks 8193-16384)
Backup superblock at 8193, Group descriptors at 8194-8195
Reserved GDT blocks at 8196-8451
Block bitmap at 8452 (+259), Inode bitmap at 8453 (+260)
Inode table at 8454-8707 (+261)
7677 free blocks, 2032 free inodes, 0 directories
Free blocks: 8708-16384
Free inodes: 2033-4064

Group 2: (Blocks 16385-24576)
Block bitmap at 16385 (+0), Inode bitmap at 16386 (+1)
Inode table at 16387-16640 (+2)
7936 free blocks, 2032 free inodes, 0 directories
Free blocks: 16641-24576
Free inodes: 4065-6096

Looking at the dumpe2fs output above, Group 2 has no group descriptors at all. Group 1, on the other hand, stores them in blocks 8194 and 8195. OK, let's see what is stored in them.

The second and third blocks in Group 0 store the group descriptors, that is, the contents from 0x000800 to 0x001000:

000800 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
000810 02 00 04 00 00 00    (block group 0 group descriptor)
000820 04 21 00 00 05 21 00 00 06 21 00 00 fd 1d f0 07
000830 00 00 04 00 00 00    (block group 1 group descriptor)
000840 01 40 00 00 02 40 00 00 03 40 00 00 00 1f f0 07
000850 00 00 04 00 00 00    (block group 2 group descriptor)
000860 04 61 00 00 05 61 00 00 06 61 00 00 fd 1d f0 07
000870 00 00 04 00 00 00
000880 01 80 00 00 02 80 00 00 03 80 00 00 00 1f f0 07
000890 00 00 04 00 00 00
0008a0 04 a1 00 00 05 a1 00 00 06 a1 00 00 fd 1d f0 07
0008b0 00 00 04 00 00 00
0008c0 01 c0 00 00 02 c0 00 00 03 c0 00 00 00 1f f0 07
0008d0 00 00 04 00 00 00
0008e0 04 e1 00 00 05 e1 00 00 06 e1 00 00 fd 1d f0 07
0008f0 00 00 04 00 00 00
000900 01 00 01 00 02 00 01 00 03 00 01 00 00 1f f0 07
000910 00 00 04 00 00 00
....
000fb0 00 00 04 00 00 00
000fc0 01 c0 07 00 02 c0 07 00 03 c0 07 00 ff 0e f0 07
000fd0 00 00 04 00 00 00    (block group 62 group descriptor)
000fe0 00 00 00    (there is no block group 63)
001000 04 20 00 00 04 60 00 00 04 a0 00 00 04 e0 00 00

04 01 00 00 read as a little-endian value is 0x104 = 260, so the block bitmap is in block 260. Likewise the inode bitmap is in block 0x105 = 261 and the inode table starts at block 0x106 = 262, which matches the dumpe2fs output for Group 0. ef 1d gives 0x1def = 7663 free blocks.
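The same check can be sketched in Python. The snippet below decodes only the first 18 bytes of the block group 0 descriptor, copied from addresses 000800 and 000810 of the dump, which cover the first six fields of struct ext2_group_desc. The variable names follow the struct; nothing else about the on-disk layout is assumed.

```python
import struct

# Bytes at 000800 (16 bytes) plus the first two bytes at 000810.
raw = bytes.fromhex(
    "04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07 "
    "02 00"
)

# Three little-endian 32-bit block numbers, then three 16-bit counters.
(bg_block_bitmap,
 bg_inode_bitmap,
 bg_inode_table,
 bg_free_blocks_count,
 bg_free_inodes_count,
 bg_used_dirs_count) = struct.unpack("<3I3H", raw)

print(bg_block_bitmap, bg_inode_bitmap, bg_inode_table)
print(bg_free_blocks_count, bg_free_inodes_count, bg_used_dirs_count)
```

The values line up with the Group 0 entry in the dumpe2fs output: bitmaps at 260 and 261, inode table starting at 262, 7663 free blocks, 2021 free inodes, 2 directories.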

You can parse the entry of any block group yourself and verify that it agrees with dumpe2fs. We have now established that the group descriptors are stored as an array in k blocks. In our case there are only 63 block groups and each needs 32 bytes, so 63 × 32 = 2016 bytes fit in two 1 KB blocks. Moreover, group descriptors, like super blocks, are stored redundantly: the two group descriptor blocks kept in other block groups hold the same information as the two in block group 0. Let me prove it.

Block group 25 also has group descriptor blocks, 204802 and 204803, which record the group descriptors of all 63 block groups. Their content should match the two blocks in block group 0. I have dumped these two blocks below; compare them yourself, and you will find the contents are the same.

Group 25: (Blocks 204801-212992)
Backup superblock at 204801, Group descriptors at 204802-204803
Reserved GDT blocks at 204804-205059
Block bitmap at 205060 (+259), Inode bitmap at 205061 (+260)
Inode table at 205062-205315 (+261)
7677 free blocks, 2032 free inodes, 0 directories
Free blocks: 205316-212992
Free inodes: 50801-52832


root@libin:/mnt/bean# dd if=/dev/loop0 bs=1k skip=204802 count=2 | od -tx1 -Ax > /tmp/dumphex
2+0 records in
2+0 records out
2048 bytes (2.0 kB) copied, 0.000160205 s, 12.8 MB/s
root@libin:/mnt/bean# vi /tmp/dumphex

000000 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
000010 02 00 04 00 00 00 00 00 00 00
000020 04 21 00 00 05 21 00 00 06 21 00 00 fd 1d f0 07
000030 00 00 04 00 00 00 00 00 00 00
000040 01 40 00 00 02 40 00 00 03 40 00 00 00 1f f0 07
000050 00 00 04 00 00 00 00 00 00 00
000060 04 61 00 00 05 61 00 00 06 61 00 00 fd 1d f0 07
000070 00 00 04 00 00 00 00 00 00 00
000080 01 80 00 00 02 80 00 00 03 80 00 00 00 1f f0 07
000090 00 00 04 00 00 00 00 00 00 00
0000a0 04 a1 00 00 05 a1 00 00 06 a1 00 00 fd 1d f0 07
0000b0 00 00 04 00 00 00 00 00 00 00
....
0007c0 01 c0 07 00 02 c0 07 00 03 c0 07 00 ff 0e f0 07
0007d0 00 00 04 00 00 00
0007e0 00 00 00
000800
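Instead of comparing the two hex dumps by eye, the comparison can be scripted. The following is a minimal Python sketch, not the method used above: `read_blocks` and `gdt_copies_match` are hypothetical helpers, and the demo runs on a small synthetic in-memory image (16 blocks, with a fake "backup" super block at block 9) so that it works without the real device.

```python
import io

BLOCK_SIZE = 1024  # block size of the file system examined in this article


def read_blocks(image, first_block, count, block_size=BLOCK_SIZE):
    """Return `count` blocks starting at block number `first_block`."""
    image.seek(first_block * block_size)
    return image.read(count * block_size)


def gdt_copies_match(image, primary_sb_block, backup_sb_block, gdt_blocks):
    """Compare the descriptor blocks that follow two super block copies."""
    primary = read_blocks(image, primary_sb_block + 1, gdt_blocks)
    backup = read_blocks(image, backup_sb_block + 1, gdt_blocks)
    return primary == backup


# Synthetic stand-in for the device: identical fake descriptor data is
# placed in blocks 2-3 and in blocks 10-11; every other block is zero.
fake_gdt = bytes(range(256)) * 8                      # 2048 bytes = 2 blocks
image_bytes = bytearray(16 * BLOCK_SIZE)
image_bytes[2 * BLOCK_SIZE:4 * BLOCK_SIZE] = fake_gdt
image_bytes[10 * BLOCK_SIZE:12 * BLOCK_SIZE] = fake_gdt
demo = io.BytesIO(bytes(image_bytes))

same = gdt_copies_match(demo, primary_sb_block=1, backup_sb_block=9, gdt_blocks=2)
different = gdt_copies_match(demo, primary_sb_block=1, backup_sb_block=5, gdt_blocks=2)
print(same, different)
```

On the real loop device, the equivalent call would open /dev/loop0 in binary mode (as root) and use primary_sb_block=1, backup_sb_block=204801 and gdt_blocks=2, the numbers reported by dumpe2fs for Group 0 and Group 25.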

Finally, let me explain why the number of blocks in each block group (blocks per group) is 8192. One block is used as a bitmap that records the usage of the blocks in the group (bit = 1 means the corresponding block is in use, bit = 0 means it is free). One block is 1024 bytes, that is, 1024 × 8 = 8192 bits, so each block group can have at most 8192 blocks. By the same token, with 4096-byte blocks the bitmap has 4096 × 8 = 32768 bits, so each block group has 32K blocks. The evidence is below.

root@libin:/mnt/bean# cd /home
root@libin:/home# umount /dev/loop0
root@libin:/home# cd /mnt/bean
root@libin:/mnt/bean# ll
total 8
drwxr-xr-x 2 root root 4096 2012-07-06 22:32 ./
drwxr-xr-x 4 root root 4096 2012-07-06 22:32 ../
root@libin:/mnt/bean# mke2fs -b 4096 /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
128000 inodes, 128000 blocks
6400 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=134217728
4 block groups
32768 blocks per group, 32768 fragments per group
32000 inodes per group
Superblock backups stored on blocks:
32768, 98304
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
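The arithmetic behind both mke2fs runs can be summarized in a few lines of Python. This is only a sketch under the assumptions stated in the text: one bitmap block per group, so blocks per group equals the number of bits in a block, and 32 bytes per group descriptor. The function name `geometry` is made up for this example; the block counts and first data block values are taken from the two mke2fs outputs.

```python
import math

GROUP_DESC_SIZE = 32  # bytes per group descriptor, as discussed above


def geometry(block_size, total_blocks, first_data_block):
    """Return (blocks per group, number of groups, blocks needed for the GDT)."""
    blocks_per_group = block_size * 8          # one bit per block in the bitmap
    groups = math.ceil((total_blocks - first_data_block) / blocks_per_group)
    gdt_blocks = math.ceil(groups * GROUP_DESC_SIZE / block_size)
    return blocks_per_group, groups, gdt_blocks


layout_1k = geometry(1024, 512000, 1)   # the default run
layout_4k = geometry(4096, 128000, 0)   # the run with -b 4096
print(layout_1k)
print(layout_4k)
```

The first two numbers of each tuple can be compared with the "blocks per group" and "block groups" lines printed by mke2fs for the 1K and 4K cases.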
