Write a UNIX file system 07/04 Update SLTechnology News&Howtos

Write a UNIX file system

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/03 Report--

Recently, someone asked for help to write a UNIX file system as a summer homework. This kind of thing is basically something that must be done or has been done to learn the operating system. After all, the file system is an important part of the operating system course. To implement this UNIX file system, many people plunged into the system source code of UNIX V6, as well as Leon's UNIX Source Code Analysis and back to basics: inside UNIX Technology, many people came out, and many people got lost in it. Eventually I forget that I just want to implement a UNIX file system.

Why are you lost? because the code is not written by yourself, and it is old, the programming concept is different, and the author may not be able to understand why he writes that way. In fact, there is always something difficult to understand about any code written by others. Of course, if the author is super high-level, then the code is relatively easy to understand. Therefore, it is always easier to write code than to read it! If you are writing a file system, why use ready-made UNIX V6 code? If you understand the layout and structure of the UNIX file, it is not difficult to do one without referring to any existing code from scratch, the most fundamental thing is the UNIX file system itself, as for the code, it is only an interface to implement and operate with the user. If the code is written bit by bit by yourself, then you must be able to fully understand the exact meaning of every statement on each line, and of course you know why you wrote it this way!

This article left me in a hurry for a few hours to write a UNIX-like file system code, not for others to read, but for their own file, because this article has already said, look at other people's generation can only learn experience, can not understand the essence, not to mention, you have to see super first-class code, while I wrote super junk code. I just want to say that understanding the nature of the problem is much more important than code, which is not as important as many people think! The implementation of this paper is based on the Linux system, that is, a user state program is written on the Linux system to realize the IO interface and operation of UNIX files.

I have always adhered to the principle that everything is fundamental, the essential principle and the ideas behind it are very simple, and the so-called complexity is brought about by optimization and strategic extension, just like TCP, the file system of UNIX is no exception! We must know what is the most fundamental and what is secondary. For the UNIX file system, the most fundamental thing is its layout and its system call interface. One is at the lowest level and the other is open to users at the top, as shown below:

System call API: open,write,read,close...

File system layout: boot Fast, Super Block, inode area Table, data Block area

All the parts in between are secondary, which means that those things are fine without them, such as caching, caching, VFS layer, block layer. So there is no such thing in my implementation code, and what I have done is only the minimum set required by the UNIX file system, that is:

In the face of a "block device" that is laid out according to the UNIX file system standard, you can use interfaces such as open,read,write for IO operations.

In the implementation, I use a standard Linux large file to simulate the disk block, so that the operation of the block is basically mapped to the Linux standard write,read and other system calls.

First define the necessary structures:

/ / Super block structure struct filesys {unsigned int size; / / Total size unsigned int size unsigned int s_freeinodesize; / / Table size unsigned int s_freeinodesize / / number of free I nodes unsigned int sidle nextfreeblock; / / next idle I node unsigned int sfreeblock [num]; / / array of idle I nodes unsigned int sidle freeblock; / / number of free blocks unsigned int sidle nextfreeblock; / / next idle block unsigned int sfreeblock [NUM] / / Free block array unsigned int s_pad []; / / fill to 512 bytes}; / / disk inode structure struct finode {int fi_mode; / / Type: file / directory int fi_uid_unused; / / uid. Since it is not associated with the process, it is only simulating a FS, which is not used. It is the same as int fi_gid_unused below. The number of int fi_nlink; / / links. When the number of links is 0, it means that the long int fi_size; / / file size long int fi_ addr [BNUM] is deleted; / / the first-level pointer to the file block is not implemented, and the multi-level pointer time_t fi_atime_unused; / / does not implement time_t fi_mtime_unused;} / / memory inode structure struct inode {struct finode ifixfinder; struct inode * icalendar; / / the directory to which it belongs: I node int ifixino; / / I node number int iadministrators users; / / reference count}; / / directory entry structure (non-Linux kernel directory entry) struct direct {char dname [Max] / / the name of the file or directory unsigned short diterino; / / the I node number of the file or directory}; / / the directory structure struct dir {struct direct direct [DIRNUM]; / / the directory item array unsigned short size; / / contains the directory entry size}; / / the abstract file structure struct file {struct inode * size / / I node int of the file; / / the current read and write pointer of the file}

It is called a UNIX-like file system because I did not accurately confirm the super block and the structure of the inode table of the UNIX file system at that time, but roughly imitated its layout, such as the fields in the super block, and the order of the fields may not be exactly the same as the standard UNIX file system. But in any case, the UNIX file system at that time was basically like this. In addition, you can see that the file structure has very little content, because in essence, I just want to say that "an inode node is a file relative to a reader," that's all. Next is the concrete implementation, my approach is bottom-up, the advantage of this is to facilitate future expansion. Then the first thing to complete is the allocation and release of the I node. In my implementation, the file I node is mapped to the memory I node, which may be contrary to my original intention. Didn't I say not to disturb the audio-visual system with so many "extra" things? Yes, however, I don't like calling read and write more frequently than those so-called extra optimizations. Anyway, as long as you can control the situation.

In the implementation, there is also a major event is the allocation and release of memory, which is not essential, remember, to implement only a UNIX file system, the rest can be bypassed! Obviously, malloc,free and so on are also things that we have to bypass, so I basically use something that pre-allocates space-global arrays. The following are global variables:

/ / memory I node array. NUM is the number of files contained in the file system struct inode g_usedinode [NUM]; / / the memory I node of ROOT struct inode * gopened rootwitter / the array of open files struct file* g_opened [OPENNUM]; / / the super block struct filesys * gmemory superbeat / the file descriptor int g_fake_disk =-1 for Linux large files that simulate the secondary file system

Before giving the implementation code, I want to make it clear that when I delete the file, I don't clean up the file block area and I node, which is notoriously time-consuming, and like many implementations, I just recorded some information indicating that the file block or inode field can be overwritten at any time.

/ synchronize the I node and write it to the "disk" void syncinode (struct inode * inode) {int ino =-1, ipos =-1; ino = inode- > iDevino; / / ipos is the offset of the inode node table in the file system block ipos = IBPOS + ino*sizeof (struct finode); / / read the inode information lseek (g_fake_disk, ipos, SEEK_SET) from the specified offset position of the simulation block Write (g_fake_disk, (void *) & inode- > i_finode, sizeof (struct finode));} / / synchronize super block information int syncsuper (struct filesys * super) {int pos =-1, size =-1; struct dir dir = {0}; pos = BOOTBSIZE; size = SUPERBSIZE; lseek (g_fake_disk, pos, SEEK_SET) Write (g_fake_disk, (void *) super, size); syncinode (g_root); breadwrite (grubroot-> i_finode.fi_addr [0], (char *) & dir, sizeof (struct dir), 0,1); breadwrite (grubroot-> i_finode.fi_addr [0], (char *) & dir, sizeof (struct dir), 0,0) } / / key function to convert pathname to I node. Relative path struct inode * namei (char * filepath, char flag, int * match, char * ret_name) {int in = 0; int repeat = 0; char * name = NULL; char * path = calloc (1, MAXLEN*10); char * back = path; struct inode * root = iget (0); struct inode * parent = root Struct dir dir = {0}; strncpy (path, filepath, MAXLEN*10); if (path [0]! ='/') return NULL; breadwrite (root- > i_finode.fi_addr [0], & dir, sizeof (struct dir), 0,1); while ((name=strtok (path, "/"))! = NULL) {int I = 0 Repeat = 0; * match = 0; path = NULL; if (ret_name) {strcpy (ret_name, name);} for (; ii_parent = parent; * match = 1 If (root- > i_finode.fi_mode = = MODE_DIR) {memset (& dir, 0, sizeof (struct dir)); breadwrite (root- > i_finode.fi_addr [0], & dir, sizeof (struct dir), 0,1) } else {free (back); return root;} repeat = 1 }} if (repeat = = 0) {break;}} if (* match! = 1) {* match = 0 } if (* match = = 0) {if (ret_name) {strcpy (ret_name, name);}} free (back); return root;} / / struct inode * iget (int ino) {int ibpos = 0; int ipos = 0 Int ret = 0; / / tends to get if (g _ usedinode [ino] .I _ users) {gusedinode [ino] .I _ users + +; return & g _ usedinode [ino];} if (g_fake_disk) directly from the memory I node

< 0) { return NULL; } //实在不行则从模拟磁盘块读入 ipos = IBPOS + ino*sizeof(struct finode); lseek(g_fake_disk, ipos, SEEK_SET); ret = read(g_fake_disk, &g_usedinode[ino], sizeof(struct finode)); if (ret == -1) { return NULL; } if (g_super->

S _ freeinode [ino] = 0) {return NULL;} / / if it is a file that has been deleted or an I node that has never been assigned, initialize its link value and the size value if (g _ usedinode [ino]. I _ finode.fi_nlink = = 0) {g _ usedinode [ino] .I _ finode.fi_nlink + + G _ usedinode [ino]. I _ finode.fi_size = 0; syncinode (& g _ usedinode [ino]);} g _ usedinode [ino] .I _ users + +; g _ usedinode [ino]. I _ ino = ino; return & g _ usedinode [ino] } / / release an occupied memory I node void iput (struct inode * ip) {if (ip- > i_users > 0) ip- > i_users -;} / / allocate an unused I node. Note that I did not use the s_freeinodesize field of the super block, / / because there will be a better and faster allocation algorithm struct inode* ialloc () {int ino =-1, nowpos =-1; ino = gsupersuper- > slicnextfreeode; if (ino = =-1) {return NULL;} nowpos = ino + 1; gsupersuper- > s_nextfreeinode =-1 / / find the next idle I node. As mentioned above, this algorithm is not good at for (; nowpos).

< NUM; nowpos++) { if (g_super->

Return iget (ino);} / / attempt to delete a file I node int itrunc (struct inode * ip) {iput (ip) If (ip- > i_users = = 0 & & g_super) {syncinode (ip); gelecsuper- > s _ freestate [IP-> i_ino] = 0; gambisuper- > s_nextfreeinode = ip- > ionomino; return 0;} return ERR_BUSY } / / assign an unused block int balloc () {int bno =-1, nowpos =-1; bno = bno super- > return bno; extfreeblock; if (bno =-1) {return bno;} block = bno + 1; gblocks super- > s_nextfreeblock =-1; for (; block)

< NUM; nowpos++) { if (g_super->

Return bno;} / / read and write operations int breadwrite (int bno, char * buf, int size, int offset, int type) {int pos = BOOTBSIZE+SUPERBSIZE+g_super- > s_itsize + bno*BSIZE Int rs =-1; if (offset + size > BSIZE) {return ERR_EXCEED;} lseek (g_fake_disk, pos + offset, SEEK_SET); rs = type? Read (g_fake_disk, buf, size): write (g_fake_disk, buf, size); return rs;} / / IO reader interface int mfread (int fd, char * buf, int length) {struct file * fs = g _ opened [FD]; struct inode * inode = fs- > fanciinode; int baddr = fs- > fancicurpos; int bondary = baddr%BSIZE; int max_block = (baddr+length) / BSIZE Int size = 0; int I = inode- > iDefinode.fi_ addr [baddr / BSIZE+1]; for (; I

< max_block+1; i ++,bondary = size%BSIZE) { size += breadwrite(inode->

I_finode.fi_addr [I], buf+size, (length-size)% BSIZE, bondary, 1);} return size;} / / IO write interface int mfwrite (int fd, char * buf, int length) {struct file * fs = g _ opened [FD]; struct inode * inode = fs- > fanciinode; int baddr = fs- > fancicurposure; int bondary = baddr%BSIZE; int max_block = (baddr+length) / BSIZE Int curr_blocks = inode- > iSize; int size = 0; int sync = 0; int I = inode- > iambifinode.fil _ addr [baddr / BSIZE+1]; / / if you write for the first time, assign a block if (inode- > i_finode.fi_size = = 0) {int nbno = balloc () If (nbno = =-1) {return-1;} inode- > i_finode.fi_addr [0] = nbno; sync = 1 } / / if you must expand, you can redistribute the block, and you can merge and optimize if (max_block > curr_blocks) {int j = curr_blocks + 1; for (; j) above.

< max_block; j++) { int nbno = balloc(); if (nbno == -1) { return -1; } inode->

IDefinode.fiaddr [j] = nbno;} sync = 1;} for (; I

< max_block+1; i ++,bondary = size%BSIZE) { size += breadwrite(inode->

I_finode.fi_addr [I], buf+size, (length-size)% BSIZE, bondary, 0);} if (size) {inode- > i_finode.fi_size + = size; sync = 1;} if (sync) {syncinode (inode);} return size } / / seek interface of IO int mflseek (int fd, int pos) {struct file * fs = g _ open [FD]; fs- > f_curpos = pos; return pos;} / / IO open interface int mfopen (char * path, int mode) {struct inode * inode = NULL; struct file * file = NULL; int match = 0; inode = namei (path, 0, & match, NULL) If (match = = 0) {return ERR_NOEXIST;} file = (struct file*) calloc (1, sizeof (struct file)); file- > f_inode = inode; file- > f_curpos = 0; g _ open [g _ fd] = file; gopened fdbits; return g_fd-1 } / / IO shuts down interface void mfclose (int fd) {struct inode * inode = NULL; struct file * file = NULL; file = g _ opened[ FD]; inode = file- > fanciinode; iput (inode); free (file);} / IO create interface int mfcreat (char * path, int mode) {int match = 0; struct dir dir; struct inode * new = NULL Char name [MAXLEN] = {0}; struct inode * inode = namei (path, 0, & match, name); if (match = = 1) {return ERR_EXIST;} breadwrite (inode- > i_finode.fi_addr [0], (char *) & dir, sizeof (struct dir), 0,1); strcpy (dir.direct [dir.size] .d _ name, name) New = ialloc (); if (new = = NULL) {return-1;} dir.direct[ size] .d _ ino = new- > iDevino; new- > i_finode.fi_mode = mode; if (mode = = MODE_DIR) {/ / delay in allocating directory entries int nbno = balloc () is not allowed If (nbno =-1) {return-1;} new- > i_finode.fi_addr [0] = nbno;} new- > i_parent = inode; syncinode (new); dir.size + + Breadwrite (inode- > i_finode.fi_addr [0], (char *) & dir, sizeof (struct dir), 0,0); syncinode (inode); iput (inode); syncinode (new); iput (new); return ERR_OK;} / / IO delete interface int mfdelete (char * path) {int match = 0; struct dir dir; struct inode * del = NULL Struct inode * parent = NULL; char name [MAXLEN]; int I = 0; struct inode * inode = namei (path, 0, & match, name); if (match = 0 | | inode- > i_ino = = 0) {return ERR_NOEXIST;} match =-1; parent = inode- > i_parent Breadwrite (parent- > i_finode.fi_addr [0], (char *) & dir, sizeof (struct dir), 0,1); for (; I

< dir.size; i++) { if (!strncmp(name, dir.direct[i].d_name, strlen(name))) { del = iget(dir.direct[i].d_ino); iput(del); if (itrunc(del) == 0) { memset(dir.direct[i].d_name, 0, strlen(dir.direct[i].d_name)); match = i; break; } else { return ERR_BUSY; } } } for (i = match; i < dir.size - 1 && match != -1; i++) { strcpy(dir.direct[i].d_name, dir.direct[i+1].d_name); } dir.size--; breadwrite(parent->

I_finode.fi_addr [0], (char *) & dir, sizeof (struct dir), 0,0); return ERR_OK;} / / sequence initialization interface to initialize the memory structure int initialize (char * fake_disk_path) {g_fake_disk = open (fake_disk_path, O_RDWR) from the analog block device; if (g_fake_disk =-1) {return ERR_NOEXIST } g_super = (struct filesys*) calloc (1, sizeof (struct filesys)); lseek (g_fake_disk, BOOTBSIZE, SEEK_SET); read (g_fake_disk, g_super, sizeof (struct filesys)) ROOT if super- > s_freeblocksize = (gtiny super- > s_size-(BOOTBSIZE+SUPERBSIZE+INODEBSIZE)) / BSIZE; g_root = iget (0); / / for the first time, assign ROOT if (g_root = = NULL) {g_root = ialloc (); gallowed root-> i_finode.fi_addr [0] = balloc ();} return ERR_OK;}

Here is a test program:

Int main () {int fd =-1 laws ws =-1; char buf [16] = {0}; initialize ("bigdisk"); mfcreat ("/ aa", MODE_FILE); fd = mfopen ("/ aa", 0); ws = mfwrite (fd, "abcde", 5); mfread (fd, buf, 5); mfcreat ("/ bb", MODE_DIR) Mfcreat ("/ bb/cc", MODE_FILE); fd = mfopen ("/ bb/cc", 0); ws = mfwrite (fd, "ABCDEFG", 6); mfread (fd, buf, 5); mflseek (0,4); ws = mfwrite (0, "ABCDEFG", 6); mflseek (0,0); mfread (0, buf, 10); mfclose (0) Mfdelete ("/ aa"); fd = mfopen ("/ aa", 0); mfcreat ("/ aa", MODE_FILE); fd = mfopen ("/ aa", 0); syncsuper (g_super);}

This file system is super simple to implement, removes a lot of extra non-essential stuff, and bypasses annoying memory management problems! Therefore, my implementation also shows the nature of the UNIX file system. So take another look, is there anything else that is extra, but essential or at least interesting? The answer is obviously the organization and allocation algorithm of free blocks or free inode, but this algorithm can be abstracted separately.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.