2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the process and threading model in UNIX in detail.
UNIX tradition tends to hand each task to a single process, but a task rarely contains only one line of work. Think of a company in which every member works toward the same goal while each is responsible for only one part of it. Once the granularity is reduced, the parts can proceed at the same time, and everyone shares all of the resources anyway. Hence threads. Threads are essentially separate execution flows that share resources, and their semantics differ from those of plain UNIX processes.
0. The original process model: the famous fork call
The plain UNIX process relies on the famous fork call. Fork is what sets the UNIX process apart from the Windows process, and because of it the two models have little room for compatibility. The call has a long history: it existed in large operating systems before UNIX. When UNIX first appeared in 1969 it had no fork call; there were just two fixed processes, each attached to one of two terminals. Once fork was introduced, the number of processes grew rapidly. Note that at that point there was still no exec call!
Before looking at the philosophy behind fork, consider what a fork is. A fork is literally a fork: prongs branch out from a single handle, much as in the Taoist saying that the Tao gives birth to one, one gives birth to two, two gives birth to three, and three gives birth to all things. With fork, an unlimited number of processes can in theory be created, and all of them can be traced back to the same root. Why did UNIX adopt this model? To answer that, we first need to understand what a process meant before the concept of an "executable file" existed.
Imagine how programs got into the computer in the first place. Today they naturally live on disk, and the idea of an "executable file" is deeply ingrained. In the 1950s and 1960s, however, programs were loaded on the spot from paper tape or heavy reels of magnetic tape, and there was no notion of a file system. The entire contents of the tape were the program the computer was to execute; to run a different program you had to change the medium. People naturally want a program they have written to do its job more than once, so if several "processes" could execute the program on the tape at the same time, system throughput would improve greatly. Note that these processes all execute the same program! This is the simplest process model of a time-sharing system, and it is how fork came into being in the Berkeley time-sharing system. Fork provides a way to copy the current execution flow, so every child process it produces can readily execute the same code.
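As a minimal sketch of "copying the current execution flow", assuming a POSIX system: both processes continue from the same point in the same code, and only the return value of fork tells them apart. The helper name fork_demo is made up for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once and compare what each side ends up with.
 * Returns the child's copy of `value` minus the parent's, or -1 on error. */
int fork_demo(int value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                     /* fork failed */

    if (pid == 0) {
        /* Child: runs the same code but owns a private copy of `value`. */
        value += 1;
        _exit(value & 0xff);           /* hand the copy back as exit status */
    }

    /* Parent: its own `value` is untouched by the child's change. */
    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) - value;
}
```

Calling fork_demo(41) should return 1: the child incremented its own copy to 42 while the parent still holds 41. The exit status only carries 8 bits, so the sketch is meant for small values.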
This fork call deeply influenced how people understood time-sharing systems. Plain UNIX itself took shape in the early 1970s, and fork is called famous because it has shaped the process model of UNIX, and of UNIX-like systems such as Linux, ever since. To summarize why UNIX creates processes with fork: going from 0 to 1 is hard, going from 1 to 2 is easier but still takes effort, and going from 2 to 3 and beyond is simple. That is the Tao giving birth to one, and in the end three giving birth to all things. UNIX already had two processes in 1969, and fork made it very easy to get from two to three and then to all things. Perhaps by coincidence, the fork of the earlier Berkeley time-sharing system happened to be available, and Thompson brought it into UNIX.
I would like to say a few words about why it is three, rather than two, that gives birth to all things. We all know that the Tao giving birth to one is the hardest step. 0 and 1 are two very special numbers, 0 even more so. 2 is also special, whereas 3 is completely general. Why is 2 special? I will not use game theory to explain it; an example will do. When two people are together and one of them smells a fart, each can be 100% sure who did it: I know whether it was me, and if it was not me it must have been the other person. (Of course, both may have done it at once.) When three people are together, the two who did not do it cannot tell which of the others did. This is the essential difference between 3 and 0, 1, and 2, and it is why three gives birth to all things.
1. UNIX process model
In early UNIX the concept of a process was the same as in its predecessors. The file system was still quite immature, and programmers focused on running the tasks that had been so laboriously written rather than on writing new ones (first, there was not that much demand; second, storing and distributing information was a problem, since there was no Internet; compare today's App Store...). The fork call directly organizes UNIX processes into a tree, so:
1. Process 0 (swap/sched) and process 1 (init) have a special status.
2. A "whoever forks a process waits for it and reaps it" model emerges. This is very important in a tree organization because it makes resource reclamation straightforward.
3. If a parent process exits first, all of its child processes are handed over to init, so init must always exist and can never exit. In short, no process can be detached from the process tree.
In short, a plain UNIX process is an executable object at a node of a tree. Note that it is the executable object.
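The tree rules above can be observed directly. The following is a sketch under POSIX assumptions, with a hypothetical helper orphan_demo: a child exits before its own child, the top-level parent reaps the child it forked, and the orphaned grandchild reports who its parent is afterwards.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success and fills in the PID of the
 * grandchild's original parent and of the process that adopted it. */
int orphan_demo(pid_t *old_parent, pid_t *new_parent)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        close(fds[0]);
        pid_t me = getpid();
        pid_t grandchild = fork();
        if (grandchild == 0) {
            /* Grandchild: wait (bounded) until its parent has gone. */
            struct timespec tick = { 0, 1000000 };      /* 1 ms */
            for (int i = 0; i < 5000 && getppid() == me; i++)
                nanosleep(&tick, NULL);
            pid_t adopted_by = getppid();
            ssize_t n = write(fds[1], &adopted_by, sizeof adopted_by);
            _exit(n == (ssize_t)sizeof adopted_by ? 0 : 1);
        }
        _exit(grandchild < 0 ? 1 : 0);  /* child exits before the grandchild */
    }

    /* Top-level parent: whoever forks also waits and reaps. */
    close(fds[1]);
    int status = 0;
    pid_t reaped = waitpid(child, &status, 0);
    assert(reaped == child);

    pid_t adopted_by = -1;
    ssize_t n = read(fds[0], &adopted_by, sizeof adopted_by);
    close(fds[0]);
    if (n != (ssize_t)sizeof adopted_by)
        return -1;

    *old_parent = child;
    *new_parent = adopted_by;
    return 0;
}
```

On a traditional UNIX the adopting process is init (PID 1). On a current Linux system it may instead be a designated "subreaper" process, so the sketch only relies on the new parent being different from the old one.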
The UNIX process model is built on the principles above. In addition, at the periphery, UNIX carried over the shell idea from the Multics project and opens a shell for each terminal. The shell is the second most important feature of UNIX (assuming the file abstraction comes first!), and it requires that a process produced by fork be able to exec a new, different execution flow. As the history above shows, fork and exec were separate from the very beginning, and together they make up the complete UNIX process model: fork+exec.
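To make fork+exec concrete, here is a small sketch of what a shell does for each command, assuming a POSIX system with /bin/sh available. The helper run_shell is an illustrative name, not part of any library.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` through /bin/sh in a new process
 * and return its exit status, or -1 on failure. */
int run_shell(const char *command)
{
    pid_t pid = fork();                /* step 1: duplicate the caller */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* step 2: replace the copy with a different program */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                    /* reached only if exec failed */
    }

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell("exit 7") should return 7. Keeping the two steps separate is what leaves room for the child to adjust itself before the new program starts.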
Let us look at what the UNIX process model can build. Early UNIX organized its processes around the concept of the terminal, and for this purpose it introduced the process group and the session.
A process group is a collection of related processes, such as the commands connected by pipe characters; more generally, how its members are related is defined by the user. A session is a collection of process groups, and its purpose is to let a user conveniently have several process groups share access to a terminal. The operator sitting at a terminal is a single person who performs one operation at a time, so the question is which processes an operation applies to. The operator can create a session, create several process groups inside it, and direct input by choosing which process group is the foreground process group. Sessions and process groups can thus be understood as a time-sharing system in which the scheduler is no longer the operating system but the operator at the terminal. Just as each CPU can run only one process at a time, each terminal session can have only one foreground process group at a time.
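A short sketch of the grouping described above, assuming POSIX: a child moves itself into a new process group, as a job-control shell would do for each pipeline, while staying in the caller's session. The helper new_group_demo is hypothetical; choosing the foreground group (tcsetpgrp) needs a real controlling terminal and is left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child became the leader of
 * its own process group inside the caller's session, 0 if not, -1 on error. */
int new_group_demo(void)
{
    pid_t parent_sid = getsid(0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (setpgid(0, 0) < 0)         /* new group whose ID is our own PID */
            _exit(2);
        int leads_group  = (getpgrp() == getpid());
        int same_session = (getsid(0) == parent_sid);
        _exit(leads_group && same_session ? 0 : 1);
    }

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The child ends up with a process group ID equal to its own PID while its session ID is still the caller's, which is exactly the "group inside a session" nesting.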
We can see that the UNIX process model naturally forms a hierarchy of time-sharing scheduling. At the lowest level is the process, scheduled by the operating system kernel. Above it is the process group, which organizes several processes that cooperate on one task and is scheduled by the operator who created the session. Underneath this hierarchy, all processes are organized into a tree. This is the complete picture of the UNIX process model. Fork+exec is the principle that makes such a beautiful picture possible: in the window between fork and exec the new process controls its own environment, so which group or session it belongs to is decided by the process itself, not by the caller. For a contrasting example, look at CreateProcess in the Win32 API. Now trouble arrives: threads appear. What is to be done? If you want to see how Linux makes history here, skip to the end.
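The window between fork and exec can be sketched as follows, assuming POSIX and /bin/sh. The child itself calls setsid before it execs, so the new program starts life as a session leader without the caller passing any "session" parameter. The helper spawn_in_new_session is a made-up name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child made itself a session leader
 * before exec and the new program exited with status 0, 0 if not,
 * -1 on error. */
int spawn_in_new_session(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Between fork and exec: the child arranges its own environment. */
        close(fds[0]);
        pid_t sid = setsid();          /* new session and new process group */
        ssize_t n = write(fds[1], &sid, sizeof sid);
        close(fds[1]);
        if (sid < 0 || n != (ssize_t)sizeof sid)
            _exit(126);
        execl("/bin/sh", "sh", "-c", "exit 0", (char *)NULL);
        _exit(127);                    /* reached only if exec failed */
    }

    close(fds[1]);
    pid_t sid = -1;
    ssize_t n = read(fds[0], &sid, sizeof sid);
    close(fds[0]);

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    if (n != (ssize_t)sizeof sid || !WIFEXITED(status))
        return -1;
    return sid == pid && WEXITSTATUS(status) == 0;
}
```

With a single "create process" call, every such choice has to be expressed as a parameter of that call; here it is ordinary code that runs in the child.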
I have not mentioned how any particular UNIX version implements the structure above, because the idea matters far more than the implementation, and implementation details only get in the way when building a new model. At the end of this article I show how Linux reconciles the semantics of the different process models and, in doing so, confirms how well designed the UNIX process model is.
2. A process model that provides a resource environment
Although Windows NT borrowed from UNIX in many respects, it took a completely different approach to the process model. When Windows NT was born in the 1990s, applications were flourishing and file systems were mature. The concept of the executable file carried over from the MS-DOS era. (In fact, UNIX v6 already had the concept of an executable file: once UNIX introduced the exec call, an executable file was simply the backing resource of a process, nothing more.) People could develop large numbers of different programs on top of the Win32 API and run them independently. To run a program several times, you simply click on it several times.
In such an era, as mentioned at the beginning of this article, the granularity of execution moved inside the program. To accomplish its task an application has to do several different things, some of them at the same time, much like the overall planning method in mathematics. A process, which in WinNT can be regarded as a named collection of resources extracted from an executable file, is no longer suitable as the executable object; the truly executable object becomes the thread. The process now only provides a resource environment, and its threads use these shared resources to get specific work done together. A process model of this kind is called the resource model.
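The resource model can be illustrated with any thread API. The sketch below uses POSIX threads rather than Win32, purely as an assumption for portability: the process owns one counter, and several threads work on it together. The names shared_counter and count_with_threads are invented for this example.

```c
#include <assert.h>
#include <pthread.h>

/* One resource owned by the process and visible to all of its threads. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

static void *add_thousand(void *arg)
{
    struct shared_counter *c = arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;                    /* the same memory in every thread */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Hypothetical helper: run `nthreads` workers (at most 16) on one counter
 * and return the final value, or -1 on error. */
long count_with_threads(int nthreads)
{
    struct shared_counter c;
    pthread_t workers[16];
    if (nthreads < 0 || nthreads > 16)
        return -1;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;
    c.value = 0;

    int started = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&workers[started], NULL, add_thousand, &c) != 0)
            break;
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(workers[i], NULL);
        assert(rc == 0);
    }

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

With four workers the result should be 4000. Had the workers been forked processes instead, each would have incremented a private copy. On older toolchains this may need to be built with -pthread.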
I use WinNT as the example in this section only because it is a relatively pure representative of this model. In fact, many versions of UNIX have also tried to integrate the fork model with the resource model, in an attempt to keep UNIX semantics while supporting multithreaded scheduling.
3. Reconciling the two models
The conflict between the fork model and the resource model is obvious and shows up in two typical places:
1. Signals: which thread handles a signal?
2. Fork semantics: if a process already has several threads running and one of them calls fork, which execution flows does the fork apply to?
The first problem is the easier one. The rule is that a signal caused by an exception in a thread is handled by that thread, and any other signal can be handled by any thread. The second problem is tricky, and the tricky part lies in how a given UNIX implements its process model.
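The signal rule can be tried out with POSIX threads. In this sketch, which is one common idiom rather than the only way to do it, a signal aimed at the whole process is blocked everywhere and picked up by a dedicated thread through sigwait. The helper signal_thread_demo is hypothetical, and the sketch assumes no other thread in the process has SIGUSR1 unblocked.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: synchronously accept SIGUSR1 on behalf of the process. */
static void *signal_waiter(void *arg)
{
    int *received = arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigwait(&set, received) != 0)
        *received = -1;
    return NULL;
}

/* Hypothetical helper: send a process-directed SIGUSR1 and return the
 * signal number picked up by the dedicated thread, or -1 on error. */
int signal_thread_demo(void)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block the signal first; threads created later inherit this mask,
     * so only the sigwait() call can consume the signal. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    int received = 0;
    pthread_t waiter;
    if (pthread_create(&waiter, NULL, signal_waiter, &received) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    int sent = kill(getpid(), SIGUSR1);   /* aimed at the process as a whole */
    assert(sent == 0);

    int rc = pthread_join(waiter, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return rc == 0 ? received : -1;
}
```

A signal raised by a fault inside a thread, such as SIGSEGV, is instead delivered to the faulting thread itself, which matches the other half of the rule.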
Maintain a linked list of thread control block pointers in the process structure or the u area? Oh, no! What is going on? How could UNIX forget that the executable object is the process? Doesn't that turn the process into a container for threads? That goes straight to the resource model, yet this is supposed to be pure UNIX! Is designing LWPs (lightweight processes) a good idea? Perhaps, but it introduces many high-level abstractions and looks complicated. And what happens if yet another scheme is introduced a few years later? In short, no approach that modifies the plain UNIX process model is a good one. What about user-level thread libraries? They live outside the kernel, which only shows that the kernel is not up to the job.
Set the implementation aside and return to the idea. Look at the relationship among processes, process groups, and sessions. The most basic executable object is the process. Process groups and sessions each wrap a set of processes in some form of organization, and each such set has resources shared by its members, for example the environment variables of a session or the command-line variables of a process group. So what is a thread? Isn't a set of threads simply a set of execution flows that share a memory address space? Does that ring a bell? If not, replace "process" in the picture of the UNIX process model with "scheduling entity" and extend the picture one level downward; threads are then supported naturally:
Threads, thread collections, process groups, sessions.
In terms of scheduling entities, this is:
Scheduling entity, scheduling entity group, process group, session.
Just as a process group may contain only one process, in which case the group ID equals the process ID, a process may contain only one thread, in which case the thread ID is the process ID. Everything fits into the picture of the UNIX process model. If a thread set contains only one thread, we call it a process. If it contains more than one, we call the set a process and its elements threads. At this point the names hardly matter.
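On Linux the single-thread case can be checked in a few lines. This sketch is Linux-specific and uses the raw gettid system call through syscall, since not every C library version ships a gettid wrapper; the helper names current_tid and tid_equals_pid are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: ID of the calling scheduling entity (thread). */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical helper: returns 1 when the caller's thread ID equals its
 * process ID, which holds for the first (or only) thread of a process. */
int tid_equals_pid(void)
{
    pid_t tid = current_tid();
    assert(tid > 0);
    return tid == getpid();
}
```

Called from the initial thread of a program, tid_equals_pid should return 1; called from an additional thread it would return 0.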
What is still missing is a way for the members of a thread set to share a memory address space. The traditional UNIX fork clearly cannot do this, because it takes no parameters with which to request such behavior. The fork semantics therefore need a slight extension: a clone call with parameters that the caller controls:
int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */);
The caller controls not only the location of the child's user stack but also a large set of flags. To share the caller's memory, the CLONE_VM flag is required. Creating a thread with clone needs more than this one flag, of course; see NPTL for details.
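The effect of CLONE_VM alone can be seen with a small Linux-only sketch built on the glibc clone wrapper. The child here is still an ordinary, waitable child process (SIGCHLD is its exit signal); the only thing that changes is whether it shares the caller's memory. The helper clone_demo and the stack size are choices made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Runs in the cloned child; writes through a pointer it was handed. */
static int set_flag(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Hypothetical helper: clone a child with or without CLONE_VM and return
 * the value the parent sees afterwards (-1 on error). */
int clone_demo(int share_memory)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    volatile int flag = 0;
    int flags = SIGCHLD | (share_memory ? CLONE_VM : 0);

    /* The stack grows downwards on most architectures, so pass its top. */
    pid_t pid = clone(set_flag, stack + CHILD_STACK_SIZE, flags,
                      (void *)&flag);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    free(stack);
    return flag;
}
```

With CLONE_VM the parent should see 42, because the child wrote into the shared address space; without it the child only changed its private copy and the parent still sees 0.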
4. Linux's implementation of the UNIX process model
Thread support in the Linux implementation is remarkably neat: it barely touches the existing task_struct structure and does not change the existing fork semantics. It merely introduces a PID type called TGID, the thread group ID. The only executable object in Linux is the task_struct. Each task_struct carries several IDs, and depending on how these IDs are interpreted, that is, on their types, a task_struct is identified as a process or as a thread of a process. The ID types are as follows:
enum pid_type { PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };
Where:
PIDTYPE_PID: the scheduling entity ID. If the task_struct is one of several threads of a process, this is its thread ID; if the process has only one thread, it is also the process ID.
PIDTYPE_TGID: the thread set (thread group) ID. If the process that the task_struct belongs to has several threads, this is the process ID; if it has only one thread, it is the same as PIDTYPE_PID.
PIDTYPE_PGID: the process group ID. No explanation needed.
PIDTYPE_SID: the session ID. No explanation needed.
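The two views can be compared from user space. This is a Linux-specific sketch that assumes POSIX threads are available: getpid reports the thread set ID, while the raw gettid system call reports the ID of the individual scheduling entity. The struct task_ids and the helper compare_ids are invented for the illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct task_ids {
    pid_t pid;   /* what getpid() reports: the thread set (TGID) view */
    pid_t tid;   /* what gettid reports: the scheduling entity view   */
};

static void read_ids(struct task_ids *out)
{
    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
}

static void *thread_main(void *arg)
{
    read_ids(arg);
    return NULL;
}

/* Hypothetical helper: collect the IDs seen by the caller and by one
 * extra thread. Returns 0 on success, -1 on error. */
int compare_ids(struct task_ids *caller, struct task_ids *extra)
{
    pthread_t t;
    read_ids(caller);
    if (pthread_create(&t, NULL, thread_main, extra) != 0)
        return -1;
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    return rc == 0 ? 0 : -1;
}
```

Both threads should report the same pid but different tid values. On older toolchains this may need to be built with -pthread.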
It follows that, whether a process has one thread or many, its process ID (PID) is always the ID of type PIDTYPE_TGID, while the ID of type PIDTYPE_PID is interpreted according to the situation. The implementation works as follows:
1. Each task_struct is given an ID that is unique within its PID namespace; at initialization this ID is assigned as both the process ID and the thread ID.
2. If the task_struct is the first thread of a process, that is, it was created by a standard fork call, the values initialized in step 1 are left unchanged.
3. If the task_struct is not the first thread of a process, that is, it was created by a clone call with CLONE_VM and the other thread-related flags (in Linux, CLONE_THREAD is the flag that places the new task in the caller's thread group), the caller's PIDTYPE_TGID ID overwrites the new task_struct's PIDTYPE_TGID ID.
4. The process group ID and session ID are set through dedicated system calls such as setpgid, setpgrp, and setsid, and are handled in much the same way as the process and thread IDs above.
5. Each task_struct contains four pid structures. It is these pid structures, rather than the task_struct itself, that are chained into linked lists, and the links indicate which task is a process, which is a thread of which process, and which is the leader of which process group.
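Steps 1 to 3 above can be modeled in a few lines of ordinary C. This is a toy user-space model, not kernel code: struct toy_task, the toy_ functions, and the starting ID of 100 are all invented for the illustration, and the inheritance of the group and session IDs from the creator is added as an assumption that matches normal fork behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the IDs carried by a task_struct; not kernel code. */
struct toy_task {
    int pid;    /* PIDTYPE_PID:  scheduling entity ID */
    int tgid;   /* PIDTYPE_TGID: thread set ID        */
    int pgid;   /* PIDTYPE_PGID: process group ID     */
    int sid;    /* PIDTYPE_SID:  session ID           */
};

static int next_id = 100;   /* arbitrary starting point for fresh IDs */

/* Step 1: every new task gets a fresh ID, used as both PID and TGID.
 * Process group and session are inherited from the creator. */
static struct toy_task toy_new_task(const struct toy_task *creator)
{
    assert(creator != NULL);
    struct toy_task t;
    t.pid  = next_id++;
    t.tgid = t.pid;
    t.pgid = creator->pgid;
    t.sid  = creator->sid;
    return t;
}

/* Step 2: a fork-style child keeps the initial values. */
struct toy_task toy_fork(const struct toy_task *parent)
{
    return toy_new_task(parent);
}

/* Step 3: a thread-style child takes over the caller's TGID. */
struct toy_task toy_clone_thread(const struct toy_task *caller)
{
    struct toy_task t = toy_new_task(caller);
    t.tgid = caller->tgid;
    return t;
}

/* In this model, the "process ID" of any task is its TGID. */
int toy_getpid(const struct toy_task *t)
{
    return t->tgid;
}
```

A task made with toy_fork has pid equal to tgid, while one made with toy_clone_thread has a fresh pid but reports the same toy_getpid value as its creator, which is the whole trick.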
In short, threads and processes in Linux both use the task_struct structure, and the way its PID-type values are linked determines how the picture of the UNIX process model is built, which is really neat. Personally, I find a diagram a more intuitive way to show these links, since prose is weak at this:
If you understand the figure above, you will see how elegantly Linux implements the UNIX process model. Such a simple model and Linux's concise implementation fit together perfectly, yet the traditional UNIX implementations somehow took the same model in a very complicated direction. The Linux implementation clearly grasped the hierarchy in the UNIX process model, that is, processes, process groups, and sessions: extend it by one more level and move task_struct down to the bottom, and you have essentially drawn the picture above.
© 2024 shulou.com SLNews company. All rights reserved.