This article introduces the relevant knowledge of "how to understand Python processes, threads, and coroutines". Many people run into these questions in real-world work, so let the editor walk you through how to handle them. I hope you read carefully and come away with something useful!
What is a process?
A process is an abstract concept provided by the operating system: the basic unit of resource allocation and scheduling, and the foundation of the operating system's structure. A program is a description of instructions, data, and their organization; a process is the running instance of a program. A program by itself has no life cycle; it is just instructions on disk. Once the program is run, it becomes a process.
When a program needs to run, the operating system loads its code and all static data into memory, into the process's address space (each process has a unique address space, as shown in the following figure). It then creates and initializes the stack (local variables, function parameters, and return addresses), allocates heap memory, and performs I/O-related setup. When this preparation is complete, the OS transfers control of the CPU to the newly created process, which starts running.
The operating system controls and manages the process through the PCB (Process Control Block). The PCB is usually a contiguous memory area within the system's memory that stores all the information the operating system needs to describe and control the process (process identifier, process state, process priority, file system pointers, the contents of the registers, and so on). The PCB is the only entity through which the system perceives the process.
A process has at least five basic states: initial state, execution state, waiting (blocking) state, ready state, and termination state.
Initial state: the process has just been created; it cannot run yet because other processes occupy the CPU, so it remains in the initial state.
Execution state: at any moment there can be only one process in the execution state on a given CPU.
Ready state: only a process in the ready state can be scheduled and move to the execution state.
Waiting (blocking) state: the process is waiting for some event to complete.
Termination state: the process has finished running.
Switching between processes
Whether on a multi-core or a single-core system, a CPU appears to execute multiple processes concurrently, which is achieved by the processor switching between processes.
The operating system calls the mechanism of exchanging CPU control between different processes a context switch: it saves the context of the current process, restores the context of the new process, and then transfers CPU control to the new process, which resumes from where it last stopped. Processes therefore take turns using the CPU; the CPU is shared by several processes, and a scheduling algorithm decides when to stop one process and serve another.
The case of a single-core CPU with two processes
The two processes perform context switches and take turns using CPU resources when a specific triggering mechanism, such as an I/O interrupt, occurs.
The case of a dual-core CPU with two processes
Each process has exclusive use of a CPU core, and the core is in a blocked state while its process waits for an I/O request to complete.
Inter-process data sharing
Processes share CPU and main-memory resources with the other processes in the system. To manage main memory better, the system provides an abstraction of main memory called virtual memory (VM), which gives every process the illusion that it is using main memory exclusively.
Virtual memory provides three main capabilities:
It treats main memory as a cache for data stored on disk, keeps only the active areas in main memory, and transfers data back and forth between disk and main memory as needed; in this way, main memory is used more efficiently
It provides a consistent address space for each process, simplifying memory management
It protects the address space of each process from being corrupted by other processes
Because each process has its own exclusive virtual address space, the CPU translates virtual addresses into real physical addresses through address translation, and each process can only access its own address space. Therefore, without the assistance of other mechanisms (inter-process communication), data cannot be shared between processes.
Take multiprocessing in Python as an example:
import multiprocessing
import threading
import time

n = 0

def count(num):
    global n
    for i in range(100000):
        n += i
    print("Process {0}: n={1}, id(n)={2}".format(num, n, id(n)))

if __name__ == '__main__':
    start_time = time.time()
    process = list()
    for i in range(5):
        p = multiprocessing.Process(target=count, args=(i,))  # test with multiple processes
        # p = threading.Thread(target=count, args=(i,))       # test with multiple threads
        process.append(p)
    for p in process:
        p.start()
    for p in process:
        p.join()
    print("Main: n={0}, id(n)={1}".format(n, id(n)))
    end_time = time.time()
    print("Total time: {0}".format(end_time - start_time))
Result
Process 1: n=4999950000, id(n)=139854202072440
Process 0: n=4999950000, id(n)=139854329146064
Process 2: n=4999950000, id(n)=139854202072400
Process 3: n=4999950000, id(n)=139854202072400
Process 4: n=4999950000, id(n)=139854202069320
Main: n=0, id(n)=9462720
Total time: 0.03138256072998047
The variable n has its own independent address space in each of the five child processes (p0 through p4) and in the main process (Main).
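As a hedged illustration of the inter-process communication mentioned above (not from the original article), the sketch below places the counter in shared memory with multiprocessing.Value, so the child processes really do update one shared n; the helper name add_to_shared is an illustrative choice:

import multiprocessing

def add_to_shared(shared_n, num):
    # Each increment is done while holding the Value's internal lock,
    # so updates from different processes are not lost.
    for i in range(100000):
        with shared_n.get_lock():
            shared_n.value += i

if __name__ == '__main__':
    # 'q' = signed long long; the Value lives in shared memory visible to all children.
    shared_n = multiprocessing.Value('q', 0)
    procs = [multiprocessing.Process(target=add_to_shared, args=(shared_n, i)) for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("Shared n =", shared_n.value)  # 5 * 4999950000 = 24999750000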
What is a thread?
A thread is also an abstract concept provided by the operating system: a single sequential control flow within a running program, the smallest unit of program execution, and the basic unit of processor scheduling and dispatch. A process can have one or more threads, and multiple threads in the same process share all of the process's system resources, such as the virtual address space, file descriptors, and signal handlers. However, each thread in a process has its own call stack and thread-local storage (as shown in the following figure).
Just as the system uses the PCB to control and manage processes, it assigns each thread a TCB (Thread Control Block) that records all the information used to control and manage the thread. The TCB usually includes:
Thread identifier
A set of registers
Thread running state
Priority
Thread-local (private) storage area (a minimal Python sketch using threading.local follows this list)
Signal mask
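The thread-local storage listed above has a direct counterpart in Python's standard library: threading.local. The following is a minimal sketch (not from the original article) showing that each thread sees only its own attributes on a threading.local object:

import threading

local_data = threading.local()

def worker(num):
    # Each thread gets its own 'value' attribute; assignments in one
    # thread are invisible to the others.
    local_data.value = num
    print("Thread {0}: local_data.value = {1}".format(num, local_data.value))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()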
Like processes, threads have five states: initial state, execution state, wait (blocking) state, ready state, and termination state. Switching between threads, like processes, requires context switching, which is not discussed here.
There are many similarities between processes and threads, so what is the difference between them?
Process vs. thread
A process is an independent unit of resource allocation and scheduling. Processes have a complete virtual address space, and when a process switch occurs, different processes have different virtual address spaces. However, multiple threads of the same process can share the same address space.
Thread is the basic unit of CPU scheduling, and a process contains several threads.
A thread is lighter-weight than a process and owns essentially no system resources of its own. Creating and destroying a thread takes far less time than creating and destroying a process.
Because threads share the same address space, synchronization and mutual exclusion must be considered (a minimal lock sketch follows this comparison).
The unexpected termination of one thread affects the normal operation of the whole process, whereas the unexpected termination of one process does not affect other processes. Therefore, multi-process programs are more robust.
In short, multi-process programs are safer but have high process-switching overhead and lower efficiency; multi-threaded programs are harder to maintain but have low thread-switching overhead and higher efficiency. (Python's multithreading is pseudo-multithreading, which is described in more detail below.)
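As a minimal sketch of the mutual exclusion mentioned in the comparison above (not from the original article), the count example can be rewritten so that a threading.Lock protects the shared counter; the helper name count_with_lock is an illustrative choice:

import threading

n = 0
lock = threading.Lock()

def count_with_lock(num):
    global n
    for i in range(100000):
        # The lock makes the read-modify-write on n atomic with
        # respect to the other threads, so no increments are lost.
        with lock:
            n += i

threads = [threading.Thread(target=count_with_lock, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("n =", n)  # 5 * 4999950000 = 24999750000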
What is a coroutine?
A coroutine (also known as a micro-thread) is even more lightweight than a thread; it is not managed by the operating system kernel but is controlled entirely by the program. The relationship between coroutines, threads, and processes is shown in the following figure.
A coroutine can be compared to a subroutine, except that during execution it can be suspended internally, other subroutines can then run, and it can later resume from where it stopped at an appropriate time. Switching between coroutines does not involve any system calls or blocking calls.
Coroutines execute within a single thread; switching between them is a switch between subroutines that happens in user mode. Blocking and waking a thread, by contrast, is done by the operating system kernel and happens in kernel mode, so coroutines save the overhead of thread creation and switching compared with threads.
Because coroutines in one thread never write to a variable at the same time, there is no need for synchronization primitives such as mutexes or semaphores to guard critical sections, and no operating system support is required.
Coroutines are suitable for scenarios with heavy I/O blocking and high concurrency. When I/O blocking occurs, the coroutine scheduler takes over: the coroutine yields control, the data on its current stack is recorded, and once the blocking operation completes the stack is immediately restored and the blocked coroutine resumes running on the thread, as the asyncio sketch below illustrates.
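As a minimal sketch of this idea (not from the original article), the asyncio example below runs five coroutines in a single thread; asyncio.sleep stands in for a blocking I/O call, and each await is the point where a coroutine yields control to the event loop so another coroutine can run:

import asyncio
import time

async def fetch(num):
    # asyncio.sleep stands in for a blocking I/O call; while one coroutine
    # is "waiting", the event loop switches to the others in the same thread.
    await asyncio.sleep(1)
    return "result {0}".format(num)

async def main():
    start = time.time()
    results = await asyncio.gather(*(fetch(i) for i in range(5)))
    print(results)
    print("Total time: {0:.2f}s".format(time.time() - start))  # roughly 1s, not 5s

asyncio.run(main())

All five coroutines finish in about one second rather than five, because their waiting overlaps inside a single thread.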
Next, we analyze how to choose between processes, threads, and coroutines in Python for different application scenarios.
How to choose?
Before comparing the three in different scenarios, we first need to look at Python's multithreading, which programmers often criticize as "fake" multithreading.
So why is multithreading in Python considered "pseudo" multithreading?
In the multiprocessing example above, replace p = multiprocessing.Process(target=count, args=(i,)) with p = threading.Thread(target=count, args=(i,)) and keep everything else the same. The result is as follows:
(To keep the code and the article short, please forgive the irregular naming and printing.)
Process 0: n=5756690257, id(n)=140103573185600
Process 1: n=11829507727, id(n)=140103573185600
Process 3: n=14424763612, id(n)=140103573185600
Process 4: n=17812587307, id(n)=140103573185600
Main: n=17812587459, id(n)=140103573185600
Total time: 0.1056210994720459
n is a global variable; Main prints the same id(n) as the threads, and its final value reflects the threads' updates, which shows that the threads share data.
But why does the multithreaded version take longer to run than the multiprocess one? This seems to contradict what we said above about thread-switching overhead being lower. The reason is Python's Global Interpreter Lock (GIL): within one interpreter process, only one thread can execute Python bytecode at a time, so CPU-bound threads cannot actually run in parallel, and the extra lock contention and thread switching only add overhead.
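A minimal sketch (not from the original article) that makes the GIL's effect visible: a CPU-bound function takes roughly as long in two threads as it does when run twice sequentially, because only one thread can execute Python bytecode at a time:

import threading
import time

def cpu_bound():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

# Sequential: run the function twice in the main thread.
start = time.time()
cpu_bound()
cpu_bound()
print("Sequential: {0:.2f}s".format(time.time() - start))

# Two threads: the GIL prevents them from executing Python bytecode in
# parallel, so the total time is roughly the same (often slightly worse).
start = time.time()
t1 = threading.Thread(target=cpu_bound)
t2 = threading.Thread(target=cpu_bound)
t1.start(); t2.start()
t1.join(); t2.join()
print("Two threads: {0:.2f}s".format(time.time() - start))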