2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
For a long time I have wanted to write an article about the ext family of file systems. At work I once accidentally deleted a lot of files with rm -rf, and I badly wanted a data recovery tool that could bring them back. Of course, to learn about data recovery you must first learn about the file system. Because of work, I have not read or studied anything related to the Linux kernel for a long time, and that bothers me. Enough digression; let's begin our exploration of the ext2 file system.
I won't go over the features of ext2; you can find them in any reliable Linux tutorial. Let's go straight to exploring. First, create an ext2 file system. On my Ubuntu machine, where disk space is limited, I set aside 500 MB to learn ext2 from scratch. The dd command below creates an all-zero file of 512000 × 1 KB, that is, 500 MB. losetup then attaches the file to a loop device, which lets a regular file be used as a block device. Finally, mke2fs formats the loop device as an ext2 file system, and we have something to study. Note that mke2fs is invoked here with its default options:
root@libin:~# dd if=/dev/zero of=bean bs=1K count=512000
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 9.40989 s, 55.7 MB/s
root@libin:~# ll bean
-rw-r--r-- 1 root root 524288000 2012-07-06 22:24 bean
root@libin:~# ll -h bean
-rw-r--r-- 1 root root 500M 2012-07-06 22:24 bean
root@libin:~#
root@libin:~#
root@libin:~# losetup /dev/loop0 bean
root@libin:~# cat /proc/partitions
major minor  #blocks  name
   7        0     512000 loop0
   8        0  312571224 sda
   8        1   49182966 sda1
...
root@libin:~# mke2fs /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
128016 inodes, 512000 blocks
25600 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
But we are not done yet: the new ext2 file system cannot be accessed until it is mounted, so I mount the loop device on the /mnt/bean directory.
mkdir /mnt/bean
mount -t ext2 /dev/loop0 /mnt/bean
root@libin:/mnt/bean# mount
...
/dev/loop0 on /mnt/bean type ext2 (rw)
root@libin:/mnt/bean# ll
total 17
drwxr-xr-x 3 root root  1024 2012-07-06 22:31 ./
drwxr-xr-x 4 root root  4096 2012-07-06 22:32 ../
drwx------ 2 root root 12288 2012-07-06 22:31 lost+found/
Through our efforts, we finally created our ext2 file system. Next we need to talk about the structure of the ext2 file system.
The following figure is the classic structure diagram of the ext2 file system. Similar images can be found all over the Internet; I drew my own to make two points clear: 1. Not all block groups have a super block and block group descriptors. 2. The group descriptor table (GDT) does not describe only its own block group; on the contrary, it holds the information for all block groups.
(The inode table and the data blocks are not necessarily the same size; my picture is misleading on that point.)
We know the super block is important because it tells Linux how the block device is organized: what the file system is, how big each block is (1024, 2048 or 4096 bytes), how many blocks there are in each block group, how many bytes an inode occupies, and so on. Precisely because the super block is so important, we cannot keep only one copy of it. Just imagine: if only one block group held the super block and that copy were damaged, we would be completely finished, unable to reach the nearly 500 MB of space or the data in it. That much is easy to understand. But should every block group have a super block? That is not necessary, and it would waste some space. So which block groups hold a copy of the super block?
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
This is part of what mke2fs printed when it formatted the loop device. Because each block group has 8192 blocks (the reason is discussed later) and the first data block is block 1, these block numbers mean that block groups 0, 1, 3, 5, 7, 9, 25, 27 and 49 store a super block: group 0 holds the primary copy at block 1, and the other eight hold the backups listed above.
How is this calculated, and why these block groups? The rule (the sparse_super feature) is that block groups 0 and 1, plus every block group whose number is a power of 3, 5 or 7, hold a copy of the super block. Before explaining the block group descriptor, let's take a look at the super block itself.
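To make the rule concrete, here is a minimal Python sketch that recomputes the backup locations from the geometry mke2fs reported (63 groups, 8192 blocks per group, first data block 1). The function names are my own and are not taken from e2fsprogs; it only illustrates the rule for a file system with sparse_super enabled.

```python
def is_power_of(n: int, base: int) -> bool:
    """Return True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock(group: int) -> bool:
    """sparse_super rule: groups 0 and 1, plus powers of 3, 5 and 7."""
    if group <= 1:
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


# Geometry taken from the mke2fs output above.
BLOCKS_PER_GROUP = 8192
FIRST_DATA_BLOCK = 1
GROUP_COUNT = 63

groups = [g for g in range(GROUP_COUNT) if has_superblock(g)]
# Group 0 holds the primary copy; every other listed group holds a backup
# in the first block of the group.
backup_blocks = [FIRST_DATA_BLOCK + g * BLOCKS_PER_GROUP for g in groups[1:]]

print(groups)
print(backup_blocks)
```

The second list should be compared with the "Superblock backups stored on blocks" line printed by mke2fs.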
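struct ext2_super_block {
    __u32 s_inodes_count;        /* total number of inodes */
    __u32 s_blocks_count;        /* total number of blocks */
    __u32 s_r_blocks_count;      /* reserved blocks */
    __u32 s_free_blocks_count;   /* free blocks */
    __u32 s_free_inodes_count;   /* free inodes */
    __u32 s_first_data_block;    /* first data block */
    __u32 s_log_block_size;      /* block size = 1024 << s_log_block_size */
    __u32 s_dummy3[7];           /* fields skipped in this abridged listing */
    unsigned char s_magic[2];    /* magic number, 0xEF53 */
    __u16 s_state;               /* file system state */
    ...
};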
Let's get some information about this ext2 file system with dumpe2fs.
root@libin:/mnt/bean# dumpe2fs /dev/loop0
dumpe2fs 1.41.11 (14-Mar-2010)
Filesystem volume name:
Last mounted on:
Filesystem UUID: 3bff7535-6f39-4720-9b64-1dc8cf9fe61d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr resize_inode dir_index filetype sparse_super
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 128016
Block count: 512000
Reserved block count: 25600
Free blocks: 493526
Free inodes: 128005
First block: 1
Block size: 1024
Fragment size: 1024
Reserved GDT blocks: 256
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2032
Inode blocks per group: 254
Filesystem created: Fri Jul 6 22:31:09 2012
Last mount time: Fri Jul 6 22:33:28 2012
Last write time: Fri Jul 6 22:33:28 2012
Mount count: 1
Maximum mount count: 24
Last checked: Fri Jul 6 22:31:09 2012
Check interval: 15552000 (6 months)
Next check after: Wed Jan 2 22:31:09 2013
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Default directory hash: half_md4
Directory Hash Seed: 0140915d-91ae-43df-9d84-9536cedc0d2b
Group 0: (Blocks 1-8192)
  Primary superblock at 1, Group descriptors at 2-3
  Reserved GDT blocks at 4-259
  Block bitmap at 260 (+259), Inode bitmap at 261 (+260)
  Inode table at 262-515 (+261)
  7663 free blocks, 2021 free inodes, 2 directories
  Free blocks: 530-8192
  Free inodes: 12-2032
...
Group 62: (Blocks 507905-511999)
  Block bitmap at 507905 (+0), Inode bitmap at 507906 (+1)
  Inode table at 507907-508160 (+2)
  3839 free blocks, 2032 free inodes, 0 directories
  Free blocks: 508161-511999
  Free inodes: 125985-128016
OK, we have this information, but how can we prove that what dumpe2fs reports is correct? There is only one way: dig into the super block itself and, following the super block data structure, read out the value of each field. Sounds exciting. OK, just do it.
root@libin:/mnt/bean# dd if=/dev/loop0 bs=1k count=261 | od -tx1 -Ax > /tmp/dump_hex
261+0 records in
261+0 records out
267264 bytes (267 kB) copied, 0.0393023 s, 6.8 MB/s
root@libin:/mnt/bean# vi /tmp/dump_hex
I dumped the first 261 KB of the loop device into /tmp/dump_hex. Block 0 is the boot block, which we skip. Block 1 is the super block. Very exciting: we finally get to see the legendary super block in the raw.
000400 10 f4 01 00 00 d0 07 00 00 64 00 00 d6 87 07 00
000410 05 f4 01 00 01 00 00 00 00 00 00 00 00 00 00 00
000420 00 20 00 00 00 20 00 00 f0 07 00 00 5f cb f7 4f
000430 5f cb f7 4f 01 00 1a 00 53 ef 00 00 01 00 00 00
000440 25 cb f7 4f 00 4e ed 00 00 00 00 00 01 00 00 00
000450 00 00 00 00 0b 00 00 00 80 00 00 00 38 00 00 00
000460 02 00 00 00 01 00 00 00 5a 65 4b 92 fe 63 43 eb
000470 b6 86 3e f3 6e 44 19 af 00 00 00 00 00 00 00 00
000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
0004d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0004e0 00 00 00 00 00 00 00 00 00 00 00 00 f9 6f 16 79
0004f0 b7 dc 4f 8a a1 a1 18 82 72 a7 d8 25 01 00 00 00
000500 00 00 00 00 00 00 00 00 25 cb f7 4f 00 00 00 00
000510 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000560 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000570 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000800 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
The leftmost column is the offset, in hexadecimal. 000400 = 1 KB, that is, byte offset 1024 of the device, where block 1 begins; 000800 = 2 KB, where block 2 begins. The 1 KB in between is the super block we have been after for so long. I was so excited that I posted the whole thing. Fortunately I am not paid by the word, or I would be despised for padding. Next I paste the ext2 super block data structure again, and we compare it field by field to see whether dumpe2fs is right.
struct ext2_super_block {
    __u32 s_inodes_count;
    __u32 s_blocks_count;
    __u32 s_r_blocks_count;
    __u32 s_free_blocks_count;
    __u32 s_free_inodes_count;
    __u32 s_first_data_block;
    __u32 s_log_block_size;
    ...
};
The first field is s_inodes_count, which takes four bytes. The first four bytes starting at 1 KB are 10 f4 01 00. We know there are two byte orders, little-endian and big-endian. To keep the file system portable, the ext2 designers specified that all on-disk data is little-endian. When the data is read into memory, the kernel is responsible for converting it to the CPU's native byte order.
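As a small illustration of that little-endian layout, the sketch below decodes the first seven __u32 fields of the super block with Python's struct module. The bytes are copied from offsets 000400 and 000410 of the hex dump, with the trailing zero bytes of the second line written out in full; the variable names simply mirror the struct fields without the s_ prefix.

```python
import struct

# First 32 bytes of the super block, from offsets 000400 and 000410 of the
# hex dump (the trailing zero bytes of the second line are written out).
raw = bytes.fromhex(
    "10f40100 00d00700 00640000 d6870700"
    "05f40100 01000000 00000000 00000000"
)

# "<" selects little-endian; each "I" is one unsigned 32-bit field (__u32).
(inodes_count, blocks_count, r_blocks_count, free_blocks_count,
 free_inodes_count, first_data_block, log_block_size) = struct.unpack_from(
    "<7I", raw, 0)

# The block size is stored as a shift count relative to 1024 bytes.
block_size = 1024 << log_block_size

print(inodes_count, blocks_count, r_blocks_count, free_blocks_count)
print(free_inodes_count, first_data_block, block_size)
```

Each printed value can be checked against the corresponding dumpe2fs line (Inode count, Block count, Reserved block count, Free blocks, Free inodes, First block, Block size).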
Read as little-endian, 10 f4 01 00 is 0x0001f410 = 128016. Compare with what dumpe2fs gave us, Inode count: 128016: exactly the same. As another example, take s_free_blocks_count. According to the data structure, this field starts at byte offset 12 of the super block, that is, at address 00040c. The bytes there are d6 87 07 00, which gives 0x000787d6 = 493526, the same as the Free blocks value from dumpe2fs. You can check any other field you care about in the same way.
Having seen the raw super block, we now know its structure. To summarize: not all block groups have a super block, and each copy occupies exactly one block. Yes, when the block size is 4 KB, most of that block is wasted; but since the number of super block copies is small, not much space is lost.
Next, block group descriptors. A group descriptor is 32 bytes in total. Most textbooks leave us with the misunderstanding that each block group holds its own group descriptor. That is not the case. A group descriptor occupies only 32 bytes, yet the textbooks say that the group descriptors in a block group occupy k blocks; one descriptor cannot need that much space. The truth is that all the group descriptors are stored together, as an array, in those k blocks. In other words, a block group may hold no group descriptors at all, while a block group that does hold them stores the descriptors of every block group in its k blocks. Let me confirm this:
struct ext2_group_desc
{
    __u32 bg_block_bitmap;          /* Blocks bitmap block */
    __u32 bg_inode_bitmap;          /* Inodes bitmap block */
    __u32 bg_inode_table;           /* Inodes table block */
    __u16 bg_free_blocks_count;     /* Free blocks count */
    __u16 bg_free_inodes_count;     /* Free inodes count */
    __u16 bg_used_dirs_count;       /* Directories count */
    __u16 bg_flags;
    __u32 bg_exclude_bitmap_lo;     /* Exclude bitmap for snapshots */
    __u16 bg_block_bitmap_csum_lo;  /* crc32c(s_uuid+grp_num+bitmap) LSB */
    __u16 bg_inode_bitmap_csum_lo;  /* crc32c(s_uuid+grp_num+bitmap) LSB */
    __u16 bg_itable_unused;         /* Unused inodes count */
    __u16 bg_checksum;              /* crc16(s_uuid+group_num+group_desc) */
};
Group 0: (Blocks 1-8192)
  Primary superblock at 1, Group descriptors at 2-3
  Reserved GDT blocks at 4-259
  Block bitmap at 260 (+259), Inode bitmap at 261 (+260)
  Inode table at 262-515 (+261)
  7663 free blocks, 2021 free inodes, 2 directories
  Free blocks: 530-8192
  Free inodes: 12-2032
Group 1: (Blocks 8193-16384)
  Backup superblock at 8193, Group descriptors at 8194-8195
  Reserved GDT blocks at 8196-8451
  Block bitmap at 8452 (+259), Inode bitmap at 8453 (+260)
  Inode table at 8454-8707 (+261)
  7677 free blocks, 2032 free inodes, 0 directories
  Free blocks: 8708-16384
  Free inodes: 2033-4064
Group 2: (Blocks 16385-24576)
  Block bitmap at 16385 (+0), Inode bitmap at 16386 (+1)
  Inode table at 16387-16640 (+2)
  7936 free blocks, 2032 free inodes, 0 directories
  Free blocks: 16641-24576
  Free inodes: 4065-6096
In the dumpe2fs output above, Group 2 has no group descriptors at all, whereas Group 1 stores them in blocks 8194 and 8195. OK, let's see what such blocks contain.
Blocks 2 and 3 in Group 0 store the group descriptors, that is, the group descriptor blocks cover offsets 0x000800 to 0x001000:
000800 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
000810 02 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    (block group 0 group descriptor)
000820 04 21 00 00 05 21 00 00 06 21 00 00 fd 1d f0 07
000830 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    (block group 1 group descriptor)
000840 01 40 00 00 02 40 00 00 03 40 00 00 00 1f f0 07
000850 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    (block group 2 group descriptor)
000860 04 61 00 00 05 61 00 00 06 61 00 00 fd 1d f0 07
000870 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000880 01 80 00 00 02 80 00 00 03 80 00 00 00 1f f0 07
000890 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0008a0 04 a1 00 00 05 a1 00 00 06 a1 00 00 fd 1d f0 07
0008b0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0008c0 01 c0 00 00 02 c0 00 00 03 c0 00 00 00 1f f0 07
0008d0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0008e0 04 e1 00 00 05 e1 00 00 06 e1 00 00 fd 1d f0 07
0008f0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000900 01 00 01 00 02 00 01 00 03 00 01 00 00 1f f0 07
000910 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
....
000fb0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000fc0 01 c0 07 00 02 c0 07 00 03 c0 07 00 ff 0e f0 07
000fd0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    (block group 62 group descriptor)
000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    (no block group 63)
*
001000 04 20 00 00 04 60 00 00 04 a0 00 00 04 e0 00 00
04 01 00 00, read as little-endian, is 0x104 = 260, so the block bitmap is in block 260. Likewise 05 01 00 00 is 0x105 = 261, so the inode bitmap is in block 261, and 06 01 00 00 puts the start of the inode table at block 262. All three match the dumpe2fs output exactly. 0x1def = 7663 is the number of free blocks.
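The same decoding can be done mechanically. Here is a minimal sketch that unpacks the first six fields of group 0's descriptor; the 32 bytes are copied from offsets 000800 and 000810 of the dump, with the trailing zeros written out, and the variable names follow the struct fields without the bg_ prefix. Only the first 18 bytes are interpreted; the remaining fields are left alone.

```python
import struct

# Group 0's 32-byte descriptor, from offsets 000800-00081f of the dump
# (the zero bytes at the end are written out in full).
raw = bytes.fromhex(
    "04010000 05010000 06010000 ef1d e507"
    "0200 0400" + "00" * 12
)

# Little-endian: three __u32 block numbers followed by three __u16 counters.
(block_bitmap, inode_bitmap, inode_table,
 free_blocks, free_inodes, used_dirs) = struct.unpack_from("<3I3H", raw, 0)

print(block_bitmap, inode_bitmap, inode_table)
print(free_blocks, free_inodes, used_dirs)
```

These are the numbers to compare with the Group 0 section of the dumpe2fs output: bitmap locations, start of the inode table, and the free block, free inode and directory counts.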
You can parse the descriptor of any other block group yourself and verify that it agrees with dumpe2fs. We have now established that the group descriptors are stored as an array in k blocks. In our case there are only 63 block groups and each needs 32 bytes, 63 × 32 = 2016 bytes in all, so two 1 KB blocks suffice. Like the super block, the group descriptors are kept redundantly: the two descriptor blocks stored in other block groups hold the same information as the two in block group 0. Let me prove it.
Block group 25 also has group descriptor blocks, 204802 and 204803, which record the descriptors of all 63 block groups. Their contents should be identical to the two blocks in block group 0. I dumped both blocks below; compare them yourself. The result: the contents are the same.
Group 25: (Blocks 204801-212992)
  Backup superblock at 204801, Group descriptors at 204802-204803
  Reserved GDT blocks at 204804-205059
  Block bitmap at 205060 (+259), Inode bitmap at 205061 (+260)
  Inode table at 205062-205315 (+261)
  7677 free blocks, 2032 free inodes, 0 directories
  Free blocks: 205316-212992
  Free inodes: 50801-52832
root@libin:/mnt/bean# dd if=/dev/loop0 bs=1k skip=204802 count=2 | od -tx1 -Ax > /tmp/dumphex
2+0 records in
2+0 records out
2048 bytes (2.0 kB) copied, 0.000160205 s, 12.8 MB/s
root@libin:/mnt/bean# vi /tmp/dumphex
000000 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
000010 02 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000020 04 21 00 00 05 21 00 00 06 21 00 00 fd 1d f0 07
000030 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000040 01 40 00 00 02 40 00 00 03 40 00 00 00 1f f0 07
000050 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000060 04 61 00 00 05 61 00 00 06 61 00 00 fd 1d f0 07
000070 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000080 01 80 00 00 02 80 00 00 03 80 00 00 00 1f f0 07
000090 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0000a0 04 a1 00 00 05 a1 00 00 06 a1 00 00 fd 1d f0 07
0000b0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
....
0007c0 01 c0 07 00 02 c0 07 00 03 c0 07 00 ff 0e f0 07
0007d0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0007e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000800
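The two-block figure and the skip=204802 used with dd can both be re-derived from the geometry. A minimal sketch follows; the helper names are my own, and gdt_start assumes the group in question actually holds a super block copy, so that the descriptor table starts in the block right after it.

```python
GROUP_COUNT = 63         # "63 block groups" reported by mke2fs
DESC_SIZE = 32           # bytes per group descriptor
BLOCK_SIZE = 1024        # bytes per block
BLOCKS_PER_GROUP = 8192
FIRST_DATA_BLOCK = 1


def gdt_blocks(group_count: int, desc_size: int, block_size: int) -> int:
    """Blocks needed to hold the whole descriptor array (rounded up)."""
    return (group_count * desc_size + block_size - 1) // block_size


def gdt_start(group: int) -> int:
    """First descriptor block in a group that holds a super block copy."""
    group_first_block = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    return group_first_block + 1  # the super block copy takes the first block


print(gdt_blocks(GROUP_COUNT, DESC_SIZE, BLOCK_SIZE))
print(gdt_start(0), gdt_start(1), gdt_start(25))
```

The results can be compared with the "Group descriptors at ..." lines that dumpe2fs prints for groups 0, 1 and 25.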
Finally, why is the number of blocks per group 8192? Because one block is used as a bitmap recording the usage of the blocks in the group (bit = 1 means the corresponding block is in use, bit = 0 means it is free). One block is 1024 bytes, 1024 × 8 = 8192 bits in total, so a block group can have at most 8192 blocks. By the same token, with 4096-byte blocks the bitmap has 4096 × 8 = 32768 bits, so each block group has 32K blocks. The evidence is below.
root@libin:/mnt/bean# cd /home
root@libin:/home# umount /dev/loop0
root@libin:/home# cd /mnt/bean
root@libin:/mnt/bean# ll
total 8
drwxr-xr-x 2 root root 4096 2012-07-06 22:32 ./
drwxr-xr-x 4 root root 4096 2012-07-06 22:32 ../
root@libin:/mnt/bean# mke2fs -b 4096 /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
128000 inodes, 128000 blocks
6400 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=134217728
4 block groups
32768 blocks per group, 32768 fragments per group
32000 inodes per group
Superblock backups stored on blocks:
        32768, 98304
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
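The bitmap arithmetic can be sketched in a few lines of Python. The function names are my own, and the group count is a simple rounded-up division; it ignores any corner case in which mke2fs might drop a last group that is too small to be useful.

```python
def blocks_per_group(block_size: int) -> int:
    """One bitmap block has 8 bits per byte, one bit per block in the group."""
    return 8 * block_size


def group_count(total_blocks: int, first_data_block: int, block_size: int) -> int:
    """Number of block groups needed to cover the blocks, rounded up."""
    per_group = blocks_per_group(block_size)
    usable = total_blocks - first_data_block
    return (usable + per_group - 1) // per_group


# 1 KB blocks: 512000 blocks, first data block 1.
print(blocks_per_group(1024), group_count(512000, 1, 1024))
# 4 KB blocks: 128000 blocks, first data block 0.
print(blocks_per_group(4096), group_count(128000, 0, 4096))
```

Both pairs can be checked against the "blocks per group" and "block groups" lines of the two mke2fs runs.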