The difference between Linux process and thread

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article mainly introduces the differences between Linux processes and threads. Many people have doubts about how the two differ in daily work, so the editor has consulted a variety of materials and put together a simple, practical explanation. I hope it helps resolve your doubts about the difference between Linux processes and threads. Now, please follow the editor through it!

0. First, a brief look at processes and threads. As far as the operating system is concerned, the process is the core of the core: the foundation of every modern operating system is to execute tasks in units of processes, and the system's management architecture is likewise organized at the process level.

After the power button is pressed, the computer goes through a complex startup sequence. There is a classic question here: how does a computer, starting from rest, boot itself when the power button is pressed? This article does not discuss the boot process; interested readers can research it on their own. The startup of an operating system can be likened to the creation story: in the beginning there was nothing, then the world was created, then all things, then man, and then man's emotions and desires, after which human society reproduced according to the laws of nature. The stage in which the operating system starts its first processes corresponds to the stage in which man was created, and everything discussed in this article happens after that point.

The first process created is process 0. It is invisible at the operating system level, but it exists. Process 0 completes the loading and initial setup of the operating system, and then creates process 1 (init), the "Jesus" of the operating system. Process 1 is sent to manage the entire system, which is why looking at the process tree with pstree shows process 1 at the root. After that, many of the system's management programs are created as processes by process 1, which also creates the bridge for communicating with human beings: the shell. From then on, humans can talk to the operating system, write programs, and run tasks.

And all of this is based on processes. When each task (process) is created, the system allocates the necessary resources, such as storage space, to it, and then creates a management node for the process in the kernel's management area so that it can later control and schedule the task's execution.

When a process actually enters the execution stage, it also needs to obtain the right to use the CPU. All of this is controlled by the operating system through scheduling: once the necessary conditions are met (resources and CPU time are granted), the process begins to execute.

Besides the CPU, another crucial resource is memory. The system allocates a private storage space to each process, along with any other resources it specifically needs, such as access to external devices for writing. With the above introduction, we can briefly summarize the process:

A process is a running activity of a program on a certain data set. It is the basic unit of resource allocation and scheduling in the system and the foundation of the operating system's structure. Before it can execute, the system must allocate resources and create an entity for it.

As technology developed, it became clear that small tasks do not need separately allocated resources: multiple tasks can share one set of resources, for example all child tasks sharing the resources of a parent. The process mechanism would still tediously divide up resources for each of them, wasting both space and time. So a dedicated multitasking technique was created: the thread.

The defining characteristic of threads is that they can run without independent resources of their own, which greatly reduces resource overhead and processing time.

1. The previous section briefly introduced the two terms, process and thread. The goal of this article is to explain the differences between them; for the technical implementation of each, please consult the relevant materials.

Now let's get to the heart of this article and examine the differences between processes and threads from the following aspects.

1). Similarities between the two

2). Differences in implementation

3). Differences in multitasking programming patterns

4). Differences in communication between entities (process to process, thread to thread)

5). Similarities and differences in control

6). Similarities and differences in resource management

7). Differences in ancestry relationships between individuals

8). Differences in implementing process pools and thread pools

Then we will explain them one by one.

1). Similarities between the two

Both processes and threads are technical means for programmers to achieve multitasking concurrency. Both can be scheduled independently, so in a multitasking environment there is no functional difference between them. Both have their own entities and are individually managed objects at the system level, so both can be controlled by technical means. Their state models are also very similar. Moreover, in multitasking programs, child processes (or child threads) are generally scheduled on an equal footing with their parent process (or parent thread).

In fact, before version 2.4 of the Linux kernel, threads were implemented and managed entirely as processes. A dedicated thread implementation (NPTL) did not arrive until the 2.6 kernel.

2). Differences in implementation

Process is the basic unit of resource allocation and thread is the basic unit of scheduling.

This classic saying has been around for decades and appears in every operating systems textbook. It really is the key difference between the two, but readers should pay attention to the word "basic". Some readers, seeing the first half of the sentence, may wonder: "does that mean processes cannot be scheduled?" No! Both processes and threads can be scheduled; otherwise, how would multi-process programs run?

Rather, the thread is the smaller schedulable unit: once we are down at the thread level, scheduling is naturally possible. The sentence emphasizes that resources are allocated only to processes; the system never allocates managed resources to a thread on its own. If you want to run a task and obtain resources, there must be at least one process; other subtasks can then run as threads and share its resources.

In short, processes are completely independent individuals, while threads are interdependent. In a multi-process environment, the termination of any one process does not affect the others. In a multithreaded environment, if the parent thread terminates, all child threads are forced to terminate (they have no resources of their own). The termination of a child thread generally does not affect other threads, unless it calls the exit() system call, in which case all threads in the process die together.

In fact, no one writes a program with threads but no process. A multithreaded program has at least one main thread: the process that carries the main function. It is the process of the whole program, and all other threads are its child threads. We usually call the main process of a multithreaded program the main thread.

From the system's point of view, a process is created with the fork system call:

pid_t fork(void);

A thread is created with the clone system call:

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

fork() copies all of the parent's resources for the child, while a thread's clone copies only the small portion that is strictly necessary; parameters to clone control exactly what is copied. You could say fork implements an enhanced, full version of clone. Of course, the operating system later optimized fork with copy-on-write: resources are copied only when the child actually needs its own copy (for example, when it writes to memory shared with the parent); nothing is copied at creation time.

In practice, multi-process programs use fork to create child processes, but multithreaded programs do not call clone directly; they use a thread library. The common thread libraries on Linux are the native LinuxThreads library and the POSIX thread library (pthreads), of which the POSIX library is by far the most widely used. That is why readers see pthread_create rather than clone in multithreaded programs.

We know that a library is a collection of functions built on top of the operating system, so its functionality is ultimately provided by the operating system. It follows that the thread library most likely implements its internals by calling clone. Whether the entity is a process or a thread, it runs on the operating system.

Finally, a word about vfork(). This is also a system call for creating a new process, but the process it creates does not copy the parent's resource space; it shares it. In other words, vfork actually creates an entity close to a thread but manages it as a process. Moreover, the execution order of parent and child after vfork() is fixed: the parent does not run until the child "ends". Note the word "ends": it does not mean the child finishes and exits, but that the child returns control. A child created with vfork() typically calls execv immediately to start a completely new program whose process space is fully independent of the parent, so there is no need to copy the parent's resource space at all. Once execv takes over, the parent considers the child "ended" and resumes running, while the child actually continues in its own completely separate space. Think of a chat program that pops up a video player: why should the video player inherit the process space of your chat program? Is it supposed to pry into your chat privacy? Exactly!

3). Differences in multitasking programming patterns

Because processes are independent, multi-process programs have a natural advantage when resources are managed independently, whereas threads are much more troublesome. Consider the server side of a multitasking TCP program: the parent process calls accept() on a client connection request, which returns a newly established connection descriptor. The parent then fork()s a child, which carries a copy of that descriptor into its own address space and handles the connection, while the parent goes back to accept() and waits for other clients. The design is very clean. The parent can even reuse the same variable (say, val) to hold the return value of accept(), because each child copies val into its own space; when the parent overwrites the previous value, the children are unaffected.

Switch to multithreading, however, and the parent thread cannot reuse a single variable val across multiple accept() calls. A child thread does not copy val; it uses the parent's storage directly. If the parent accepts another client and overwrites the value while a child is still reading val, that child can no longer handle its connection. An improvement is to have each child thread immediately copy val into its own stack, but then the parent must be sure the copy has happened before calling accept() again. That is not easy, because the child is scheduled independently of the parent, and the parent cannot know when the copy is done. It takes inter-thread communication: the child must actively notify the parent after copying. The parent's processing thus becomes stop-and-go, and it looks less efficient than the multi-process version.

PS: here is a well-known interview question: in a multi-process TCP server, can the positions of fork() and accept() be swapped? Readers are invited to think it over.

The lack of resource independence looks like a disadvantage, but in some situations it becomes an advantage. Processes are completely independent, so to communicate they must use inter-process communication, which is usually time-consuming. Threads, on the other hand, share data without any special mechanism. Of course, multiple child threads writing at the same time must exclude one another, or the data will be written "dirty".

4). Differences in communication between entities (process to process, thread to thread)

There are several ways to communicate between processes:

a. shared memory
b. message queues
c. semaphores
d. named pipes (FIFOs)
e. anonymous pipes
f. signals
g. files
h. sockets

Threads can use the inter-process methods above, and they also have several ways of their own:

a. mutexes
b. spinlocks
c. condition variables
d. read-write locks
e. thread signals
f. global variables

It is worth noting that the signals used for inter-thread communication cannot be the ordinary inter-process signals, because those are process-based and all threads live in the same process space. Thread signals must be used instead.

To sum up, there are eight means of inter-process communication and thirteen means of inter-thread communication (the seven inter-process methods other than signals, plus the six thread-specific ones).

Moreover, inter-process communication either requires switching into the kernel or goes through peripherals (named pipes, files), so it is slow. When threads use their own mechanisms, communication completes almost entirely within the process's own space, with no such switching, so it is faster. In other words, process and thread communication differ not only in kind but also in speed.

In addition, threads may freely mix in the inter-process mechanisms listed above, with the exception of signals.

Threads come in kernel-level and user-level varieties. For details, see my other blog post, "The essence of Linux threads".

5). Similarities and differences in control

Processes and threads are identified and managed through different kinds of ID. A process ID has type pid_t, which is actually an int (and therefore finite):

/usr/include/unistd.h:260: typedef __pid_t pid_t;
/usr/include/bits/types.h:126: #define __STD_TYPE typedef
/usr/include/bits/types.h:142: __STD_TYPE __PID_T_TYPE __pid_t;
/usr/include/bits/typesizes.h:53: #define __PID_T_TYPE __S32_TYPE
/usr/include/bits/types.h:100: #define __S32_TYPE int

Within the whole system, the process ID is the unique identity, and process management is carried out through the PID. Each time a process is created, the kernel allocates a structure to store all of its information:

Note: the following code is from the Linux kernel 3.18.1

include/linux/sched.h:1235:
struct task_struct {
	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
	void *stack;
	...
	pid_t pid;
	pid_t tgid;
	...
};

Each node that stores process information also holds the process's PID, and this ID is used whenever the process must be managed (for example, to send it a signal). When a child process ends (by calling exit() or by running off the end of its code), it must be reaped with the wait() system call. A dying process that has not been reaped becomes a zombie: its execution entity no longer exists, but it still occupies a PID, so reaping is necessary.

A thread's ID, by contrast, is an unsigned long:

/usr/include/bits/pthreadtypes.h:60: typedef unsigned long int pthread_t;

Its range is much larger, and its management style is different. A thread ID is generally meaningful within its own process space, but the system still needs to record information when managing threads. It does so by maintaining kernel-level threads that correspond to user-created threads. (The mapping model varies: on Linux, the NPTL implementation is one-to-one, each user thread backed by a kernel thread, while some other threading systems have mapped many user-level threads onto a single kernel thread.) Readers are again referred to "The essence of Linux threads" for the related concepts. The blog address is:

Http://my.oschina.net/cnyinlinux/blog/367910

For a thread to terminate actively, it calls pthread_exit(), and the main thread must call pthread_join() to reap it (provided the thread is not detached; see the thread "detach attribute" for the related concepts). Thread signals are likewise sent by thread ID.

6). Similarities and differences in resource management

The process itself is the basic unit of resource allocation, so its resources are independent. If several processes need to share resources, they must use inter-process communication, such as shared memory: the shared data is placed in shared memory where everyone can access it. To keep writes safe, semaphores are used alongside it; in general, shared memory and semaphores go together. Message queues are different: because sending and receiving a message is an atomic operation, mutual exclusion comes automatically, and they are safe to use on their own.

For threads, sharing resources requires no shared-memory machinery at all: a global variable, or memory obtained with malloc(), is enough. It is convenient and direct. Mutual exclusion is handled with mutexes inside the same process space, which also gives an efficiency advantage.

In practice, in order to keep a program's resources well organized, shared memory is often used to store core data, for processes and threads alike. One reason is that shared memory is a resource separate from the process: if the process terminates unexpectedly, the shared memory continues to exist and is not reclaimed (whether to reclaim it is up to the user's program). The process's own space, on the other hand, is reclaimed by the system the moment the process crashes; the coredump mechanism offers only limited compensation. Since the shared memory survives intact after a crash, it can be used to analyze the cause of the failure, valuable runtime data is not lost, and after a restart the program can pick up the tasks it had not finished. That is another advantage of shared memory.

To sum up, inter-process communication mechanisms are separate from the process itself and visible to the whole system. A single process failure therefore does not corrupt the data, but that is not automatically an advantage. For example, if a process locks a semaphore and then crashes, restarts, and runs again, trying to lock it directly may now deadlock, and the program can no longer proceed. Likewise, shared memory is visible system-wide; if your process's resources are read or written by others by mistake, the consequences are surely unwanted. Each approach has its advantages and disadvantages; the key is how you weigh them in your design and guard against the pitfalls technically. Once again it comes down to programming skill and experience.

7). Differences in ancestry relationships between individuals

The ancestry of processes is strict. Until the parent process ends, all child processes respect the parent-child relationship: if A creates B, then A and B are parent and child; if B creates C, then B and C are parent and child, and A and C form a grandparent-grandchild relationship, i.e. C is a grandchild process of A. Printing the system's process tree with the pstree command shows this ancestry clearly.

Relationships among threads are not so strict. Whether a new thread is created by the parent thread or by a child thread, it shares the parent thread's resources, so it can only be called a child thread of the parent thread. In other words, there is only one parent thread; all other threads are its children.

8). Differences in implementing process pools and thread pools

As everyone knows, creating processes and threads takes time, and there is an upper limit to how many the system can bear. If the business needs to create child processes or threads dynamically while running, and the system cannot create them immediately, the business inevitably suffers. So clever programmers invented a new method: pooling.

When the program starts, it creates a batch of child processes or threads in advance, ready for use when needed, in the spirit of the old saying "more children, more blessings". When the program has just started there are not yet many service requests, so a large number of processes or threads sit idle. These are generally put to "sleep" so they consume no resources; otherwise feeding so many children becomes a burden. The approach differs between processes and threads, as does the means of handing a task to one of the children. Let's take them separately.

Process pool

First you create a batch of processes, and you have to manage them, which means saving their process IDs, in an array or a linked list. An array is recommended, since it finds a process in constant time. And since you are building a process pool, you can estimate in advance how many processes are appropriate; the pool generally is not extended dynamically. Even if you do want dynamic extension, you can estimate the range and allocate a sufficiently large array up front. Why? For quick response. After all, the whole point of a process pool is efficiency.

The next step is to put the idle processes to sleep. You can suspend them with pause(), block them on a semaphore, or block them on IPC; there are many ways, each with advantages and disadvantages to weigh against the actual situation.

Then comes task assignment: when there is work, a child has to be woken up, but where should it start? Inter-process communication is required. For example, send it a signal to wake it, then have it read the task from a pre-agreed place; this can be done with a function pointer, placing a pointer to the code to run at the agreed location. That only tells the child how to do the work, not what data to work on, so the data to be processed is set up in shared memory, and then the child knows what to do. When the work is done there is another round of inter-process communication, after which the child goes back to sleep on its own, and the parent, knowing the child has finished, collects the results.

Finally, at shutdown, the children must be reclaimed: send each process a signal to wake it, change its activation state so it ends voluntarily, and then wait() for them one by one.

Thread pool

The idea of a thread pool is similar, except that threads are more lightweight, so there is no waiting for additional resources to be scheduled.

To block a thread, use a condition variable: the child thread sleeps on the condition, and when the parent thread has work, it changes the condition, which activates the child.

The communication between threads needs no elaboration: it is achieved without tedious IPC, which makes it more efficient than the process version.

When a thread finishes its work, it changes the condition itself, so the parent thread knows it is time to collect the results.

At the end of the whole program, change the conditions one by one, switch the activation state so the child threads end, and finally reap them one by one.

At this point, the study of "the difference between Linux processes and threads" is over. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it! To keep learning more, please continue to follow the site; the editor will keep working to bring you more practical articles!
