When a program starts to execute, the running image of it in memory, from the moment execution begins until it ends, is called a process.
Linux is a multitasking operating system, meaning that multiple processes can appear to execute at the same time, even though the single-CPU computers most of us use can actually execute only one instruction at any given moment.
So how does Linux achieve the apparent simultaneous execution of multiple processes? It uses a technique called process scheduling. Each process is assigned a slice of running time, usually very short, on the order of milliseconds. According to certain rules, one of the many runnable processes is chosen to run while the others wait. When the running process uses up its time, exits after finishing, or is suspended for some reason, Linux reschedules and picks another process to run. Because each process runs for only a short period, from the user's point of view it looks as if multiple processes are running at the same time.
In Linux, every process is given a data structure called the process control block (PCB) when it is created. The PCB contains much of the information needed for the system to schedule the process and for the process to execute, the most important item being the process ID. The process ID, also called the process identifier, is a non-negative integer and is the unique identifier of a process in a Linux system. On the most common i386 architecture, this non-negative integer ranges from 0 to 32767, which covers all the process IDs we can possibly get.
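As a small illustration (a minimal sketch added here, not part of the original article), a process can read its own ID and its parent's ID with the standard getpid() and getppid() calls:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* Every process can ask the kernel for its own ID and its parent's ID. */
    pid_t pid = getpid();
    pid_t ppid = getppid();

    printf("my PID: %d, parent PID: %d\n", (int)pid, (int)ppid);
    return 0;
}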
The generation of zombie processes
A zombie process is a process that has terminated but has not yet been removed from the process table. A zombie takes up almost no system resources, but if too many of them accumulate, the entries in the process table fill up, which in turn can bring the system down.
Among process states, the zombie state is a very special one. A zombie has given up almost all of its memory, has no executable code, and cannot be scheduled; it merely keeps a slot in the process table to record its exit status and related information for other processes to collect. Beyond that slot, the zombie no longer occupies any memory, and it needs its parent process to reap it. If the parent neither installs a SIGCHLD handler that calls wait() or waitpid() to wait for the child, nor explicitly ignores the signal, the child stays in the zombie state. If the parent itself ends, the init process automatically adopts the child, reaps it, and the zombie can still be cleaned up. But if the parent keeps looping and never ends, the child remains a zombie forever.
The cause of the zombie process:
Each Linux process has an entry in the process table, where all the information the kernel uses to run the process is stored. When you use the ps command to view process information on the system, what you see is data from the process table.
When the fork system call creates a new process, the kernel allocates an entry for the new process in the process table and stores the relevant information there; one piece of that information is the identifier of the parent process.
When a process reaches the end of its life cycle, it executes the exit() system call, and the data in its process table entry is replaced by the process's exit code, the CPU time it used, and so on. This information is kept until the system hands it over to the parent process. From this it can be seen that the zombie state exists in the window after the child terminates but before the parent reads that data.
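That window can be reproduced with a short sketch (an illustration added here, with an arbitrary 30-second delay standing in for a busy parent): the child exits immediately while the parent postpones its waitpid(), so for a while ps reports the child as <defunct> (state Z).

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        /* Child: terminate immediately; its exit status stays in the
           process table until the parent collects it. */
        exit(0);
    }

    /* Parent: delay the reaping, so for about 30 seconds the child shows
       up as <defunct> (state Z) in the output of ps. */
    printf("child %d has exited; check ps for a zombie\n", (int)pid);
    sleep(30);

    /* Reaping removes the zombie's entry from the process table. */
    waitpid(pid, NULL, 0);
    return 0;
}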
How to avoid zombie processes
1. The parent process waits for the child to finish by calling wait() or waitpid(); the drawback is that this blocks the parent.
2. If the parent is busy, it can use signal() or sigaction() to install a handler for SIGCHLD; when a child ends, the parent receives the signal and can call wait() or waitpid() inside the handler to collect it (see the sketch after this list).
3. If the parent does not care when its children end, it can call signal(SIGCHLD, SIG_IGN) to tell the kernel that it is not interested; the kernel then reaps terminated children itself and no longer sends the signal to the parent.
4. There is also the trick of calling fork() twice: the parent forks a child and continues its own work, the child forks a grandchild and exits immediately, and the grandchild is adopted by init, which reaps it when it ends. The first child, however, still has to be reaped by the parent itself.
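Here is a rough sketch of methods 2 and 3 (not code from the article; the one-second child and the pause() call are placeholders): a SIGCHLD handler reaps every finished child with a non-blocking waitpid() loop, and the commented-out SIG_IGN line shows the simpler alternative.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* SIGCHLD handler: reap every child that has finished, without blocking.
   The WNOHANG loop matters because several children may exit while the
   signal is pending but only one SIGCHLD is delivered. */
static void reap_children(int sig)
{
    (void)sig;
    int saved_errno = errno;            /* keep errno intact for the main code */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) < 0) {
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    /* Method 3 instead: let the kernel reap children automatically.
       signal(SIGCHLD, SIG_IGN); */

    if (fork() == 0) {                  /* child works briefly, then exits */
        sleep(1);
        exit(0);
    }

    /* The parent keeps doing its own work; the handler cleans up the child. */
    pause();                            /* returns once the signal is handled */
    return 0;
}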
Processes vs. threads
Let's start with an analogy: multithreading is like a flat intersection with traffic lights, cheap to build but prone to jams, while multiprocessing is like an overpass, expensive to build and costing extra fuel on the ramps, but free of traffic jams. This is an abstract picture; I believe you will see the resemblance after reading on.
Process and thread are two related concepts. Roughly speaking, a process can be defined as an instance of a program. In Win32, a process does not execute anything by itself; it merely owns the address space used by the application. For a process to do any work, it must own at least one thread, and that thread is responsible for executing the code contained in the process's address space.
In fact, a process can contain several threads, all of which can execute code in the process's address space at the same time. To make this possible, each thread has its own set of CPU registers and its own stack. At least one thread in each process must be executing code in the address space; if no thread is doing so, there is no reason for the process to continue to exist, and the system automatically destroys the process and its address space.
The implementation principle of multithreading
When a process is created, its first thread, called the primary thread, is generated automatically by the system. This main thread can then create additional threads, and those threads can create still more. When a multithreaded program runs, the threads appear to be running at the same time. In reality they are not: to run all of these threads, the operating system schedules some CPU time for each of them.
A single-CPU operating system hands out time slices (quanta) to threads in round-robin fashion. Each thread gives up control when its time slice is used up, and the system then allocates a CPU time slice to the next thread. Because each time slice is short enough, this creates the illusion that the threads are running simultaneously. The only purpose of creating additional threads is to make the most of the CPU's time.
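Since the article discusses Win32 threads, here is a minimal Win32 C sketch (the worker function and its tags are invented for illustration): two threads created with CreateThread appear to run at the same time on one CPU because the scheduler interleaves their time slices.

#include <stdio.h>
#include <windows.h>

/* Each thread prints its tag a few times; on a single CPU the lines
   interleave because the scheduler hands each thread a time slice in turn. */
static DWORD WINAPI worker(LPVOID param)
{
    const char *tag = (const char *)param;
    for (int i = 0; i < 5; i++)
        printf("%s: iteration %d\n", tag, i);
    return 0;
}

int main(void)
{
    HANDLE threads[2];

    threads[0] = CreateThread(NULL, 0, worker, "thread A", 0, NULL);
    threads[1] = CreateThread(NULL, 0, worker, "thread B", 0, NULL);

    WaitForMultipleObjects(2, threads, TRUE, INFINITE);
    CloseHandle(threads[0]);
    CloseHandle(threads[1]);
    return 0;
}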
The problem of multithreading
Multithreaded programming not only gives programmers a great deal of flexibility, it also makes it easier to solve problems that previously required complex techniques. However, you should not arbitrarily carve a program into fragments and run each fragment on its own thread; that is not the right way to develop applications.
Threads are useful, but they may create new problems while solving old ones. For example, suppose you are developing a word processor and want the print function to run as a separate thread. This sounds like a good idea, because while printing is in progress the user can immediately go back to editing the document.
But then the document's data may be modified while it is being printed, and the printed result is no longer what was expected. It may be best not to put printing in a separate thread at all, but if you must use multithreading, consider the following approaches. The first is to lock the document being printed and let the user edit other documents, so that the document cannot be modified until printing finishes. A probably more efficient approach is to copy the document to a temporary file, print the contents of the temporary file, and let the user continue modifying the original document.
When the temporary copy of the document has been printed, delete it. As this analysis shows, multithreading can introduce new problems while helping to solve others, so it is necessary to be clear about when multiple threads are needed and when they are not. In general, multithreading is used when the program needs to perform background computation or logic at the same time as foreground operations.
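Below is a minimal sketch of the "copy first, then print in the background" idea (the in-memory document and the printf standing in for the print job are hypothetical): the snapshot is taken before the printing thread starts, so later edits cannot affect the output.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>

/* Background "printer": works on a private copy of the document, so any
   edits the user makes afterwards cannot change what gets printed. */
static DWORD WINAPI print_snapshot(LPVOID param)
{
    char *copy = (char *)param;
    printf("printing: %s\n", copy);   /* stands in for the real print job */
    free(copy);
    return 0;
}

int main(void)
{
    char document[256] = "draft v1";  /* hypothetical in-memory document */

    /* Take the snapshot first, then hand the copy to the printing thread. */
    size_t len = strlen(document) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return 1;
    memcpy(copy, document, len);

    HANDLE printer = CreateThread(NULL, 0, print_snapshot, copy, 0, NULL);

    /* The user keeps editing the original while the snapshot is printed. */
    strcpy(document, "draft v2");

    WaitForSingleObject(printer, INFINITE);
    CloseHandle(printer);
    return 0;
}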
Classification of threads
In MFC, threads are divided into two categories, namely worker threads and user interface threads. If a thread only completes background calculations and does not need to interact with the user, you can use worker threads; if you need to create a thread that handles the user interface, you should use user interface threads. The main difference between the two is that the MFC framework adds a message loop to the user interface thread so that the user interface thread can process messages in its own message queue.
From this point of view, if you only need to do some simple computation in the background (such as recalculating a spreadsheet), a worker thread should be your first choice. When the background thread has to handle more complex tasks, specifically, when its execution path changes with the actual situation, a user interface thread is the better choice, because it can respond to different messages.
Priority of the thread
When the system needs to execute multiple processes or threads at the same time, it is sometimes necessary to specify thread priorities. A thread's priority generally refers to its base priority, that is, the combination of the thread's priority relative to its process and the priority of the process that contains it.
The operating system arranges all active threads by priority, and each thread in the system is assigned a priority ranging from 0 to 31. At run time, the system simply assigns CPU time to the first thread with priority 31; when that thread's time slice ends, it assigns CPU time to the next thread with priority 31. Only when there are no runnable threads of priority 31 does the system start giving CPU time to priority-30 threads, and so on.
Besides the priorities programmers set in their code, the system sometimes changes a thread's priority during execution in order to keep the system responsive to the end user. For example, when the user presses a key, the system temporarily raises the priority of the thread that processes the WM_KEYDOWN message by 2 or 3. The CPU lets that thread run for a full time slice, and when the time slice ends, the thread's priority drops by 1.
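A thread's relative priority can also be set explicitly with the Win32 SetThreadPriority call; the sketch below (the worker function and the chosen priority are illustrative only) lowers the priority of a background thread before it starts running.

#include <stdio.h>
#include <windows.h>

/* A simple worker; the interesting part is the priority set on it below. */
static DWORD WINAPI worker(LPVOID param)
{
    printf("worker running: %s\n", (const char *)param);
    return 0;
}

int main(void)
{
    /* Create the thread suspended so the priority is in place before it runs. */
    HANDLE h = CreateThread(NULL, 0, worker, "background task",
                            CREATE_SUSPENDED, NULL);
    if (h == NULL)
        return 1;

    /* Relative priority within the process; the system combines it with the
       process priority class to produce the 0-31 value described above. */
    SetThreadPriority(h, THREAD_PRIORITY_BELOW_NORMAL);
    ResumeThread(h);

    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    return 0;
}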
Synchronization of threads
Another very important issue in multithreaded programming is thread synchronization. Thread synchronization refers to the ability of threads to interact and communicate without corrupting each other's data. The synchronization problem arises from the Win32 time-slice scheduling described earlier.
Although only one thread occupies the CPU (on a single-CPU machine) at any given moment, there is no way to know when and where a thread will be interrupted, so it is especially important to ensure that threads do not corrupt each other's data. In MFC, four synchronization objects can be used to keep multiple threads running together safely: the critical section object (CCriticalSection), the mutex object (CMutex), the semaphore object (CSemaphore), and the event object (CEvent).
Of these objects, the critical section object is the easiest to use; its drawback is that it can only synchronize threads within the same process. There is also a more basic approach, referred to here as linearization: during programming, all write operations on a given piece of data are performed in a single thread. Because code within one thread always executes sequentially, the data can never be overwritten concurrently.
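MFC's CCriticalSection is essentially a wrapper around the Win32 critical section; the plain-C sketch below (the shared counter is just an example) shows two threads updating shared data safely with it.

#include <stdio.h>
#include <windows.h>

/* Shared counter and the critical section that protects it. */
static LONG counter = 0;
static CRITICAL_SECTION counter_lock;

static DWORD WINAPI increment(LPVOID param)
{
    (void)param;
    for (int i = 0; i < 100000; i++) {
        /* Only one thread at a time may execute this block, so the
           read-modify-write on counter is never interleaved. */
        EnterCriticalSection(&counter_lock);
        counter++;
        LeaveCriticalSection(&counter_lock);
    }
    return 0;
}

int main(void)
{
    InitializeCriticalSection(&counter_lock);

    HANDLE threads[2];
    threads[0] = CreateThread(NULL, 0, increment, NULL, 0, NULL);
    threads[1] = CreateThread(NULL, 0, increment, NULL, 0, NULL);
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);

    printf("counter = %ld\n", counter);   /* always 200000 with the lock */

    CloseHandle(threads[0]);
    CloseHandle(threads[1]);
    DeleteCriticalSection(&counter_lock);
    return 0;
}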
Summary:
Compared with a process, a thread is a concept closer to the unit of execution: it shares data with the other threads in the same process, but it has its own stack space and an independent flow of execution. Both processes and threads can improve a program's concurrency, running efficiency, and response time.
Threads and processes each have advantages and disadvantages: threads are cheap to create and switch but are less amenable to resource management and protection, while processes are the opposite. The fundamental difference is that each process has its own address space, whereas the threads of one process share a single address space. In terms of speed, threads are created quickly, communicate with each other quickly, and switch quickly, because they live in the same address space.
In terms of resource utilization, threads also use resources more efficiently, again because they share the same address space. In terms of synchronization, threads must use synchronization mechanisms when operating on shared variables or memory, precisely because they share an address space. A child process, by contrast, is a copy of its parent: it receives copies of the parent's data space, heap, and stack.