2025-01-18 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/01 Report--
This article explains what the Linux file descriptor (fd) is. The explanation is kept simple and clear, so follow along step by step as we study what an fd really is.
Background
There are two ways to read and write files. One is the system call interface, which operates on an integer fd. The other is the IO layer provided by the Go standard library, which operates on a file structure wrapped by Go but still works on the integer fd internally. Everything ultimately goes through the fd, so what exactly is it? Let's analyze it in depth.
What is fd?
fd is short for file descriptor. A file descriptor is a non-negative integer and is essentially an index value (this is very important).
When do you get an fd?
When a process opens a file, the kernel returns a file descriptor to it (this is the return value of the open system call). To read or write the file afterwards, you only need to pass this file descriptor to read and write to identify the file.
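As a minimal sketch of that flow, the C function below opens a file, writes through the returned fd, rewinds, and reads the data back. The function name roundtrip_via_fd and the idea of a scratch file are assumptions made for illustration; only open, write, lseek, read, and close are real system calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg to path through an fd, then read it back into buf.
 * Returns the number of bytes read, or -1 on error.
 * (Hypothetical helper, not part of any library.) */
ssize_t roundtrip_via_fd(const char *path, const char *msg, char *buf, size_t cap)
{
    assert(path && msg && buf);

    /* open() hands back the fd: a small non-negative integer, or -1 on error. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(fd, msg, len) == (ssize_t)len &&  /* the fd identifies the file */
        lseek(fd, 0, SEEK_SET) == 0)            /* rewind before reading back */
        n = read(fd, buf, cap);

    close(fd);                                  /* give the descriptor back */
    return n;
}
```

Every call after open takes the same integer, which is the only thing user space ever holds.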
What is the range of values for fd?
In POSIX semantics, three fd values have special meanings: 0, 1, and 2 are standard input (STDIN_FILENO), standard output (STDOUT_FILENO), and standard error (STDERR_FILENO).
A file descriptor has a range: 0 to OPEN_MAX-1. In the earliest UNIX systems this range was very small. On today's mainstream systems the upper bound is almost unlimited in itself, and is restricted only by the hardware and by the limits the system administrator configures.
You can view the configuration of the current system through the ulimit command:
➜ ulimit -n
4864
As shown above, a process on my system can have at most 4864 files open by default.
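The same limit can be read from inside a program. The sketch below assumes a POSIX system and uses getrlimit with RLIMIT_NOFILE, whose soft value is what ulimit -n prints. The helper names nofile_soft_limit and new_fd_is_in_range are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Stores the soft limit on open descriptors (the number `ulimit -n` reports).
 * Returns 0 on success, -1 on error. */
int nofile_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    assert(out != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Opens path and reports whether the new fd lies in [0, limit).
 * Returns 1 if it does, 0 if not, -1 on error. */
int new_fd_is_in_range(const char *path)
{
    rlim_t limit;
    if (nofile_soft_limit(&limit) != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ok = (rlim_t)fd < limit;    /* valid fds run from 0 to limit - 1 */
    close(fd);
    return ok;
}
```

The three standard descriptors occupy the bottom of that range: STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO are defined in unistd.h as 0, 1, and 2.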
A look inside the Linux kernel
What exactly is an fd? To answer that, we have to look at the Linux kernel.
A user opens or creates a file with the open or creat system call, and the result returned to user mode is an fd. All subsequent IO operations use the fd to identify the file. As you can imagine, the work the kernel does behind it is not simple. Let's lift the veil.
task_struct
First of all, a process is abstracted by the struct task_struct structure, one of the most complex structures in Linux, with a great many member fields. We don't need to explain it in detail today. Here is a simplified version with only the field we need:
struct task_struct {
    // ...
    /* Open file information: */
    struct files_struct *files;
    // ...
}
The files field is one of today's protagonists. It is a pointer to a struct files_struct, the structure that manages all files opened by the process.
One concept to focus on:
struct task_struct is the abstract encapsulation of a process and identifies it. Every view of a process in Linux is presented through this structure. Creating a process really means allocating a new struct task_struct.
files_struct
The process structure above led us to struct files_struct, which manages all files opened by a process. The structure itself is relatively simple:
/*
 * Open file table structure
 */
struct files_struct {
    // read-related fields
    atomic_t count;
    bool resize_in_progress;
    wait_queue_head_t resize_wait;

    // open file management structure
    struct fdtable __rcu *fdt;
    struct fdtable fdtab;

    // write-related fields
    unsigned int next_fd;
    unsigned long close_on_exec_init[1];
    unsigned long open_fds_init[1];
    unsigned long full_fds_bits_init[1];
    struct file *fd_array[NR_OPEN_DEFAULT];
};
We said files_struct manages all open files. How? In essence with arrays: all open file structures are reachable from an array. You may wonder where the array is. There are two places:
struct file *fd_array[NR_OPEN_DEFAULT] is a static array allocated together with the files_struct structure. On 64-bit systems, its size is 64.
struct fdtable is also an array management structure, except that it manages a dynamic array whose bounds are described by fields.
Think about it: why this static + dynamic approach?
It is a tradeoff between performance and resources. Most processes open only a few files, so the static array is sufficient and no additional memory needs to be allocated. If the capacity of the static array is exceeded, the table is extended dynamically.
You may recall that this resembles the optimization behind an inode's direct index and first-level indirect index.
fdtable
A brief introduction to the fdtable structure: it is the structure that manages fds, and the secret of the fd lies here. The simplified structure is as follows:
struct fdtable {
    unsigned int max_fds;
    struct file __rcu **fd;      /* current fd array */
};
Notice that the fdtable.fd field is a double pointer (a pointer to a pointer). What does that mean?
fdtable.fd is a pointer field, and the memory it points to again holds pointers (of type struct file *). In other words, fdtable.fd points to an array whose elements are pointers of type struct file *.
max_fds indicates the array bound.
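To make the static + dynamic idea concrete, here is a small user-space model. It is only a sketch under simplifying assumptions: every toy_ name is invented, INLINE_SLOTS stands in for NR_OPEN_DEFAULT, and the real kernel code additionally deals with locking, RCU, and bitmaps, none of which is modeled here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SLOTS 4                  /* stand-in for NR_OPEN_DEFAULT */

struct toy_file { int id; };            /* stand-in for struct file */

struct toy_files {
    unsigned int max_fds;                       /* like fdtable.max_fds */
    struct toy_file **fd;                       /* like fdtable.fd: the live array */
    struct toy_file *fd_array[INLINE_SLOTS];    /* embedded array, no extra allocation */
};

void toy_init(struct toy_files *t)
{
    memset(t->fd_array, 0, sizeof t->fd_array);
    t->fd = t->fd_array;                /* start on the embedded array */
    t->max_fds = INLINE_SLOTS;
}

/* Store f in the lowest free slot and return its index (the "fd"), or -1. */
int toy_install(struct toy_files *t, struct toy_file *f)
{
    assert(t && f);
    for (unsigned int i = 0; i < t->max_fds; i++)
        if (t->fd[i] == NULL) { t->fd[i] = f; return (int)i; }

    /* Current array is full: move to a heap array twice as large. */
    unsigned int new_max = t->max_fds * 2;
    struct toy_file **bigger = calloc(new_max, sizeof *bigger);
    if (bigger == NULL)
        return -1;
    memcpy(bigger, t->fd, t->max_fds * sizeof *bigger);
    if (t->fd != t->fd_array)
        free(t->fd);

    int fd = (int)t->max_fds;           /* first slot of the new half */
    t->fd = bigger;
    t->max_fds = new_max;
    t->fd[fd] = f;
    return fd;
}

/* fd -> file lookup is a bounds check plus an array index. */
struct toy_file *toy_lookup(const struct toy_files *t, int fd)
{
    if (fd < 0 || (unsigned int)fd >= t->max_fds)
        return NULL;
    return t->fd[fd];
}

void toy_destroy(struct toy_files *t)
{
    if (t->fd != t->fd_array)
        free(t->fd);
}
```

As long as no more than INLINE_SLOTS files are installed, the model never calls calloc, which mirrors the reason given above for keeping a small array inside the structure.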
Summary of files_struct
files_struct is essentially used to manage all open files. Internally it is implemented with a static array plus a management structure for a dynamic array.
Remember that we said the file descriptor fd is essentially an index? This is where it comes together. The fd is the index into this array, that is, the slot number. Given the non-negative number fd, the kernel can obtain the address of the corresponding struct file structure.
Let's string the concepts together (note that fdtable management is simplified here to highlight the nature of the fd):
The fd really is just an index into the pointer array that the files field leads to (that's all). Through the fd, you can find the struct file structure of the corresponding file.
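The slot-number nature of the fd can be observed from user space. POSIX specifies that open returns the lowest-numbered descriptor not currently open, so a closed slot is handed out again. The sketch below assumes a single-threaded process (another thread could otherwise grab the slot first), and freed_slot_is_reused is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path twice, close the first descriptor, then open once more.
 * Returns 1 if the new descriptor reuses the freed slot number,
 * 0 if it does not, -1 on error. */
int freed_slot_is_reused(const char *path)
{
    assert(path != NULL);

    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }

    close(a);                           /* slot a in the table becomes free */
    int c = open(path, O_RDONLY);       /* lowest free slot is handed out again */
    int reused = (c == a);

    if (c >= 0) close(c);
    close(b);
    return reused;
}
```

The number itself carries no meaning about the file; it only says which array slot currently points to the open file.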
file
Now we know that an fd is essentially an array index and that the array elements are pointers to struct file structures. That brings us to struct file. What is this structure used for?
It represents a file opened by the process. The simplified structure is as follows:
struct file {
    // ...
    struct path f_path;
    struct inode *f_inode;
    const struct file_operations *f_op;

    atomic_long_t f_count;
    unsigned int f_flags;
    fmode_t f_mode;
    struct mutex f_pos_lock;
    loff_t f_pos;
    struct fown_struct f_owner;
    // ...
}
This structure is very important. It identifies a file opened by a process. Here are the most important IO-related fields:
f_path: identifies the file's path
f_inode: a very important field. It points to an inode of the vfs inode type, an abstraction layered over the specific file system.
f_pos: this field is very important. It is the current file offset. Remember the offset mentioned in the previous article on IO basics? This is it. f_pos is set to a default value on open and can be changed by seek, which affects where write/read operate.
Questions to consider
Question 1: a files_struct belongs to only one process, but does a struct file also belong to only one process? Or can it be shared by multiple processes?
Highlight: struct file is a system-level structure. In other words, it can be shared by several different processes.
Question 2: when do the fds of multiple processes point to the same file structure?
For example, with fork: the parent process opens a file and then forks a child process. In that case the two processes share the file structure. As shown in the figure:
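The fork case can also be checked from user space. In this sketch (the function name and the scratch file are assumptions for illustration) only the child writes, yet the parent sees the offset move, which is consistent with both descriptors referring to one shared file structure.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent opens path, child writes 3 bytes through the inherited fd.
 * Returns the offset the parent observes afterwards, or -1 on error. */
off_t offset_after_child_write(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: same fd number, same open file, so this moves the shared offset. */
        ssize_t w = write(fd, "abc", 3);
        _exit(w == 3 ? 0 : 1);
    }

    int status = 0;
    off_t pos = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        pos = lseek(fd, 0, SEEK_CUR);   /* the parent never wrote anything itself */

    close(fd);
    return pos;
}
```

If each process had its own offset, the parent would still be at position 0 after the child's write.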
Question 3: can multiple fds in the same process point to the same file structure?
Yes. That's what the dup function does.
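A short sketch of this, using a scratch file and a made-up helper name: data is written through the original descriptor and the offset is then queried through the duplicate.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Write 5 bytes through fd, then ask for the offset through a dup of fd.
 * Returns that offset, or -1 on error. */
off_t offset_seen_through_dup(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int copy = dup(fd);                 /* second slot, same open file */
    if (copy < 0) {
        close(fd);
        return -1;
    }

    off_t pos = -1;
    if (write(fd, "hello", 5) == 5)
        pos = lseek(copy, 0, SEEK_CUR); /* query through the other descriptor */

    close(copy);
    close(fd);
    return pos;
}
```

Both descriptors report the same position because dup creates a new table entry, not a new offset.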
#include <unistd.h>

int dup(int oldfd);
int dup2(int oldfd, int newfd);
inode
The struct file structure contains a pointer to an inode, which naturally leads to the concept of the inode. The inode it points to is not the inode of a specific file system directly. The operating system abstracts a virtual file system layer called VFS (Virtual File System), and the real file systems, such as ext4, sit underneath the VFS.
The complete architecture diagram is as follows:
Think: why is there such a layer of encapsulation?
It is easy to understand: the goal is decoupling. If struct file interfaced directly with a file-system-specific structure such as struct ext4_inode, the processing logic of struct file would become very complex, because every file system you support would need its own implementation. So the operating system has to shield the underlying file systems, provide a unified inode concept, and define interfaces that each file system registers as callbacks. This unifies the concept of the inode, which is the basis of everything in Unix.
Let's take a look at the structure of VFS's inode:
struct inode {
    // basic file-related information (permissions, mode, uid, gid, etc.)
    umode_t i_mode;
    unsigned short i_opflags;
    kuid_t i_uid;
    kgid_t i_gid;
    unsigned int i_flags;

    // callback functions
    const struct inode_operations *i_op;
    struct super_block *i_sb;
    struct address_space *i_mapping;

    // file size and timestamps such as atime, ctime, mtime
    loff_t i_size;
    struct timespec64 i_atime;
    struct timespec64 i_mtime;
    struct timespec64 i_ctime;

    // callback functions
    const struct file_operations *i_fop;
    struct address_space i_data;

    // points to the private data of the specific back-end file system
    void *i_private;    /* fs or device private pointer */
};
It includes basic file information such as uid, gid, size, mode, type, and timestamps.
One link between vfs and the concrete back-end file system is the i_private field, which is used to pass data structures specific to that file system.
As for the i_op callbacks, they are registered as the functions of the back-end file system (ext4, for example) when the inode is constructed.
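The callback registration can be pictured with a table of function pointers. The following is a toy model, not kernel code: all toy_ names, the enum, and the single fs_type callback are assumptions chosen to keep the example small.

```c
#include <assert.h>

enum toy_fs { TOY_EXT4, TOY_MINIX };

struct toy_inode;

/* Stand-in for struct inode_operations: a table of callbacks. */
struct toy_inode_ops {
    enum toy_fs (*fs_type)(const struct toy_inode *inode);
};

/* Stand-in for the generic inode: it only carries a pointer to the table. */
struct toy_inode {
    const struct toy_inode_ops *i_op;   /* filled in by the file system */
};

/* Each "file system" supplies its own implementation of the callback. */
static enum toy_fs toy_ext4_fs_type(const struct toy_inode *inode)
{
    (void)inode;
    return TOY_EXT4;
}

static enum toy_fs toy_minix_fs_type(const struct toy_inode *inode)
{
    (void)inode;
    return TOY_MINIX;
}

static const struct toy_inode_ops toy_ext4_ops  = { .fs_type = toy_ext4_fs_type };
static const struct toy_inode_ops toy_minix_ops = { .fs_type = toy_minix_fs_type };

/* Generic-layer code: it knows only the common inode and calls through i_op. */
enum toy_fs toy_vfs_fs_type(const struct toy_inode *inode)
{
    assert(inode && inode->i_op);
    return inode->i_op->fs_type(inode);
}
```

The generic function never names a specific file system, which is the decoupling the VFS layer is after.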
Question to consider: the common VFS layer defines a common inode for all file systems, called the vfs inode, while each back-end file system has its own inode format that extends the vfs inode. How do you get the file-system-specific inode from a vfs inode?
Take ext4 as an example (all file systems follow the same routine). The inode type of ext4 is struct ext4_inode_info.
Highlight: the method is actually very simple. It is a common (and characteristic) programming technique in C: a type cast. The vfs inode is allocated as part of an ext4_inode_info structure from birth, so the ext4_inode_info structure can be obtained directly by converting the address of the vfs inode structure.
struct ext4_inode_info {
    // ext4-specific inode fields
    // ...

    // important!
    struct inode vfs_inode;
};
For example, suppose the inode address and the offset of the vfs_inode field are known:
The address of the inode is 0xa89be0
ext4_inode_info has an embedded field vfs_inode of type struct inode, located at offset 64 bytes within the structure.
Then:
The address of ext4_inode_info is
(struct ext4_inode_info *)(0xa89be0 - 64)
The cast is done with a macro called container_of, as follows:
// cast helper
static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
{
    return container_of(inode, struct ext4_inode_info, vfs_inode);
}

// what the cast macro actually does
#define container_of(ptr, type, member) \
    (type *)((char *)(ptr) - (char *) &((type *)0)->member)
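The same arithmetic can be tried in user space. This sketch uses the standard offsetof macro instead of the null-pointer expression, and all toy_ names are stand-ins invented for the example. The 64-byte private area is chosen so that the embedded field sits at offset 64, matching the worked numbers above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct ext4_inode_info. */
struct toy_inode { long i_size; };

struct toy_ext4_inode_info {
    char fs_private[64];            /* file-system-specific fields come first */
    struct toy_inode vfs_inode;     /* generic inode embedded at offset 64 */
};

/* Same idea as container_of: subtract the member's offset from its address. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_ext4_inode_info *TOY_EXT4_I(struct toy_inode *inode)
{
    assert(inode != NULL);
    return toy_container_of(inode, struct toy_ext4_inode_info, vfs_inode);
}
```

Given the address of the inner vfs_inode field, TOY_EXT4_I steps back 64 bytes and lands on the start of the outer structure.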
So, do you understand?
When an inode is allocated, what is actually allocated is an ext4_inode_info structure that contains the vfs inode, and the address of its vfs_inode field is handed out. The VFS layer uses that inode address. The file system underneath casts it back and uses the address of the outer ext4_inode_info structure.
Take the ext4 file system as an example:
static struct inode *ext4_alloc_inode(struct super_block *sb)
{
    struct ext4_inode_info *ei;

    // memory allocation: allocate an ext4_inode_info
    ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);

    // ext4_inode_info structure initialization

    // return the address of the vfs_inode field
    return &ei->vfs_inode;
}
This is the inode address that the vfs gets.
Highlight: the memory of the inode is allocated by the back-end file system, and the vfs inode structure is embedded in each file system's own inode structure. Different layers use different addresses: the ext4 file system uses the address of the ext4_inode_info structure, and the vfs layer uses the address of the ext4_inode_info.vfs_inode field.
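The allocate-outer, hand-out-inner pattern can be modeled in user space as follows. This is a sketch with invented names: calloc and free stand in for the kernel's slab allocator, and the value 42 is an arbitrary marker used only to show that the outer structure is reachable again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct inode and struct ext4_inode_info. */
struct toy_inode { long i_size; };

struct toy_ext4_inode_info {
    int fs_marker;                  /* hypothetical file-system-specific field */
    struct toy_inode vfs_inode;     /* generic part, embedded */
};

/* "File system" side: allocate the outer structure, hand out the inner address. */
struct toy_inode *toy_ext4_alloc_inode(void)
{
    struct toy_ext4_inode_info *ei = calloc(1, sizeof *ei);
    if (ei == NULL)
        return NULL;
    ei->fs_marker = 42;
    return &ei->vfs_inode;
}

/* Cast back from the generic inode to the outer structure. */
struct toy_ext4_inode_info *toy_ext4_i(struct toy_inode *inode)
{
    assert(inode != NULL);
    return (struct toy_ext4_inode_info *)
        ((char *)inode - offsetof(struct toy_ext4_inode_info, vfs_inode));
}

/* Free through the outer address, since that is what calloc returned. */
void toy_ext4_free_inode(struct toy_inode *inode)
{
    if (inode != NULL)
        free(toy_ext4_i(inode));
}
```

The generic layer holds only the inner pointer, while allocation and release always go through the outer one, so the two layers never need to know each other's full layout.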
This usage is common in C programming and is characteristic of C (when you think about it, it resembles how object-oriented polymorphism is implemented).
Question to consider: how should we understand the difference between the vfs inode and structures such as ext2_inode_info and ext4_inode_info?
Everything common to all file systems is abstracted into the vfs inode, and whatever differs between file systems goes into their own inode structures.
Recap
When a user opens a file, the user only gets an fd handle, but the kernel does a lot of work. Going through it, we found several key data structures that build on one another layer by layer. Let's sort them out:
1. Process structure task_struct: represents a process entity. Each process corresponds to one task_struct structure, and task_struct.files points to a files_struct structure that manages open files.
2. File table management structure files_struct: manages the list of files opened by the process, implemented internally as an array (a static array combined with a dynamic array). The fd returned to the user is just the index into this array, and the element at that index points to the file structure.
A files_struct belongs to only one process.
3. file structure: represents an open file. Key fields: the current file offset and the address of the inode structure.
Although the structure is created on behalf of a process, the file structure can be shared between processes.
4. vfs inode structure: the file structure points to the vfs inode, a layer abstracted by the operating system to shield the inode differences of the various back-end file systems.
The inode is a resource at the file system level and is independent of any specific process.
5. ext4 inode structure (standing for any file-system-specific inode): the inode structure of the back-end file system, defined by each file system itself. ext2 has ext2_inode_info, ext4 has ext4_inode_info, and minix has minix_inode_info. All of these structures embed a vfs inode structure, and the principle is the same.
Complete architectural diagram:
Thought experiments
Now that we have a thorough understanding of what lies beneath the fd, that so-called non-negative integer, we can think through a few IO scenarios.
What happens when files are read and written (IO)?
After a write completes, the current file offset in the file structure is increased by the number of bytes written. If this makes the current file offset exceed the current file length, the file length in the inode is set to the current file offset (that is, the file grows).
When a file is opened with the O_APPEND flag, the flag is recorded in the file status flags of the file structure. Each time a write is performed on a file with the append flag, the current file offset in the file structure is first set to the file length in the inode, so every write is appended at the current end of the file (this operation provides atomic semantics to user space).
If lseek positions a file at its current end, the current file offset in the file structure is set to the current file length in the inode.
The seek function only modifies the current file offset in the file structure and performs no I/O operation.
Each process that opens the file itself has its own file structure, which contains its own current file offset. When multiple processes write to the same file, every write eventually lands on one global inode, so this concurrent scenario may produce results the user does not expect.
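Two of these points can be reproduced in a few lines. The sketch below opens the same path twice inside one process, which yields two separate file structures over one inode, much like two processes opening the file independently. The function name and the scratch path are assumptions for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path twice, write 3 bytes through each descriptor, and return the
 * resulting file size, or -1 on error. extra_flags is 0 or O_APPEND. */
off_t size_after_two_writers(const char *path, int extra_flags)
{
    assert(path != NULL);

    int a = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (a < 0)
        return -1;

    int b = open(path, O_WRONLY | extra_flags);  /* second, independent open */
    if (b < 0) {
        close(a);
        return -1;
    }

    off_t size = -1;
    struct stat st;
    if (write(a, "aaa", 3) == 3 &&
        write(b, "bbb", 3) == 3 &&      /* without O_APPEND, b's offset is still 0 */
        fstat(a, &st) == 0)
        size = st.st_size;

    close(a);
    close(b);
    return size;
}
```

With extra_flags set to 0 the second write starts at its own offset 0 and overwrites the first, leaving 3 bytes. With O_APPEND each write is positioned at the end of the file first, leaving 6 bytes.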
Summary
Back to the beginning: what is the use of understanding the concept of fd?
At the system level, all IO is performed through fds. This holds whatever kind of IO it is and whatever language you use. It is the most fundamental layer. Only by understanding the structures behind the fd can you handle I/O with ease.
A brief summary:
In terms of usage, a user who opens a file gets a non-negative integer handle, the fd, and all subsequent IO operations on the file are based on this fd.
The file descriptor fd is essentially an array index. If fd equals 5, it refers to the element at index 5 of the array of all files opened by the process. The array element type is struct file * (a pointer to struct file).
The task_struct structure is the abstraction of a process, and files_struct is the manager of the array of files that the process has open. The fd is the index into that array. Each open file is represented by a file structure, which holds information such as the current offset.
file structures can be shared among processes and are system-level resources. The same file may correspond to multiple file structures. Each file structure holds an inode pointer that points to the inode of the file system.
The inode is a concept at the file system level. It is managed and maintained only by the file system and does not change with the process (a file structure is created by a process, and processes that each open the same file get multiple file structures pointing to the same inode).
Review the architecture diagram:
~ end ~
Postscript
The kernel does the most complex work and exposes only the simplest thing, a non-negative integer fd. So in most scenarios you can simply use the fd without thinking too much about it. Of course, it is best if you take a closer look and understand why it works. This article is meant as basic groundwork, and I hope it gives you a different perspective on IO.