Shulou (Shulou.com) 06/01 report, updated 2025-01-20. From: SLTechnology News&Howtos, Development
This article introduces how to implement multitasking in Python with multithreading (and multiprocessing). Many people run into these situations in real projects, so let's walk through how to handle them. I hope you read it carefully and get something out of it!
1 Multithreading for multitasking
1.1 What is a thread?
A process is the unit to which the operating system allocates resources for program execution, while a thread is an entity within a process and the unit of CPU scheduling. Every process has a main thread, and we can create additional threads in a process to implement multitasking.
1.2 Ways for one program to multitask
There are several ways to achieve multitasking:
(1) Start multiple child processes from the main process; the main process and the child processes handle the tasks together.
(2) Start multiple child threads in the main process; the main thread and the child threads handle the tasks together.
(3) Start multiple coroutines in the main process; the coroutines handle the tasks together.
Note: because multiple threads working on shared data can cause thread-safety problems, development often combines multiple processes with coroutines to achieve multitasking.
1.3 Ways to create a thread
1.3.1 Create a threading.Thread object

```python
import threading

def function_name(arg):  # placeholder for the function the thread should run
    print(arg)

p1 = threading.Thread(target=function_name, args=("argument",))  # args must be a tuple
p1.start()  # start the p1 thread
```
Let's simulate multithreading for multitasking.
Suppose you are listening to music on NetEase Cloud Music while downloading songs. NetEase Cloud Music is a process. Suppose that internally it uses multithreading to multitask and opens two child threads: one caches the music currently being played, and the other downloads the music the user wants. The code skeleton looks like this:
```python
import threading
import time

def listen_music(name):
    while True:
        time.sleep(1)
        print(name, "Music is playing")

def download_music(name):
    while True:
        time.sleep(2)
        print(name, "downloading music")

if __name__ == "__main__":
    p1 = threading.Thread(target=listen_music, args=("NetEase Cloud Music",))
    p2 = threading.Thread(target=download_music, args=("NetEase Cloud Music",))
    p1.start()
    p2.start()
```
Output:
Looking at the output above, we can see that:
The CPU runs the child threads by time-slice polling, distributing time slices among them. When program A's time slice comes up but A is sleeping, the CPU automatically switches to program B.
Strictly speaking, a CPU core executes only one task at any given moment, but because it runs and switches so quickly, the tasks appear to run at the same time.
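The music example loops forever, which makes the interleaving hard to inspect. Here is a minimal bounded sketch of the same idea; the function names `play` and `download`, the round counts and the shortened sleep times are hypothetical. Each thread records what it did in a shared list, and the main thread waits for both with `join()`, so we can look at the interleaving and the total time afterwards.

```python
import threading
import time

events = []  # shared record of what each thread did, in completion order

def play(name, rounds):
    # hypothetical bounded stand-in for listen_music
    for _ in range(rounds):
        time.sleep(0.1)
        events.append((name, "playing"))

def download(name, rounds):
    # hypothetical bounded stand-in for download_music
    for _ in range(rounds):
        time.sleep(0.2)
        events.append((name, "downloading"))

def run_demo():
    events.clear()
    t1 = threading.Thread(target=play, args=("player", 4))
    t2 = threading.Thread(target=download, args=("downloader", 2))
    start = time.perf_counter()
    t1.start()
    t2.start()
    t1.join()  # wait for both threads so the record is complete
    t2.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_demo()
    for event in events:
        print(event)
    # roughly 0.4 s, rather than the 0.8 s the two loops would take back to back
    print("elapsed: %.2fs" % elapsed)
```

Because both threads spend most of their time sleeping, the CPU switches between them and the total time is close to that of the longer thread alone.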
1.3.2 Subclass threading.Thread and override run
Besides the method above, there is another way to create a thread: write a class that inherits from threading.Thread and overrides the parent class's run method.

```python
import threading
import time

class MyThread(threading.Thread):
    def run(self):
        for i in range(5):
            time.sleep(1)
            print(self.name, i)

t1 = MyThread()
t2 = MyThread()
t3 = MyThread()
t1.start()
t2.start()
t3.start()
```
Output:
The output order differs between runs, which shows that the threads really are running concurrently.
Here are the thread object methods and attributes provided by threading.Thread:
start(): starts the thread; once the CPU schedules it, its run() method is executed.
run(): the method the thread executes. The default implementation calls the target function passed to the constructor (with args and kwargs); a subclass can override it instead.
join([timeout]): blocks the calling thread until the thread whose join() was called finishes or the timeout expires. It is usually called from the main thread to wait for other threads to finish.
name, getName() & setName(): the thread's name (getName() and setName() are deprecated since Python 3.10; use the name attribute directly).
ident: the thread's integer identifier; it is None before the thread has been started (before start() is called).
is_alive(): returns True from just before run() starts until just after run() returns (the old isAlive() spelling was removed in Python 3.9).
daemon, isDaemon() & setDaemon(): whether the thread is a daemon thread (isDaemon() and setDaemon() are deprecated since Python 3.10; use the daemon attribute).
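A small sketch of these attributes in action, assuming a hypothetical `worker` that just sleeps for 0.2 s so the thread stays alive long enough to observe. It uses the `name` and `daemon` constructor arguments rather than the deprecated setters.

```python
import threading
import time

def worker():
    time.sleep(0.2)  # keep the thread alive long enough to observe it

t = threading.Thread(target=worker, name="worker-1", daemon=True)
ident_before = t.ident         # None: the thread has not been started
alive_before = t.is_alive()    # False: run() has not begun

t.start()
alive_during = t.is_alive()    # True while worker() is still sleeping
t.join(timeout=1)              # wait up to 1 s for worker() to return
alive_after = t.is_alive()     # False once run() has returned

print(t.name, t.daemon, ident_before, t.ident)
print(alive_before, alive_during, alive_after)
```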
1.4 When threads start and end
(1) A child thread starts when thread.start() is called, and then runs the thread's code.
(2) A child thread ends as soon as it finishes executing the function referenced by target, or, for a subclass, when its run() method returns.
(3) threading.enumerate() returns all currently running threads, so len(threading.enumerate()) gives the current number of threads.
(4) When the main thread reaches the end of its code, the program still waits until all (non-daemon) child threads have finished before it exits.
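To see point (3) concretely, here is a sketch with three hypothetical idle workers that wait on a `threading.Event` until the main thread releases them. While they wait, `threading.enumerate()` lists them alongside `MainThread`, and `threading.active_count()` gives the same count as `len(threading.enumerate())`.

```python
import threading

stop = threading.Event()

def idle_worker():
    # hypothetical worker that simply waits until it is told to stop
    stop.wait()

workers = [threading.Thread(target=idle_worker, name="idle-%d" % i) for i in range(3)]
for w in workers:
    w.start()

running = [t.name for t in threading.enumerate()]
count_during = threading.active_count()  # equals len(threading.enumerate())
print(running)
print(count_during)

stop.set()  # release the workers so they can finish
for w in workers:
    w.join()
print(any(w.is_alive() for w in workers))  # False: every worker has ended
```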
Example 1:
```python
import threading
import time

def run():
    for i in range(5):
        time.sleep(1)
        print(i)

t1 = threading.Thread(target=run)
t1.start()
print("where will I show up")
```
Output:
Why does the main thread's line appear first? Because the CPU uses time-slice polling: when it reaches the child thread and finds that it wants to sleep for 1 second, it runs the main thread in the meantime. Time-slice polling keeps the CPU busy instead of letting it sit idle.
What if we want the main thread's line to be printed last? For that we need the join() method.
join() method

```python
import threading
import time

def run():
    for i in range(5):
        time.sleep(1)
        print(i)

t1 = threading.Thread(target=run)
t1.start()
t1.join()  # wait here until t1 has finished
print("where will I show up")
```
Output:
t1.join() blocks the thread that calls it (here the main thread) until the t1 child thread finishes, and then unblocks it. It only blocks the caller; any other child threads keep running.
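A compact sketch of the same behaviour that records events in a list instead of printing, so the ordering can be checked. The second half shows `join(timeout=...)`: it returns after the timeout even if the thread is still running, which `is_alive()` then reports. The sleep lengths are arbitrary choices for the illustration.

```python
import threading
import time

order = []

def run():
    for i in range(3):
        time.sleep(0.05)
        order.append(i)

t1 = threading.Thread(target=run)
t1.start()
t1.join()             # the main thread waits here until t1 has finished
order.append("main")  # therefore this always comes last
print(order)          # [0, 1, 2, 'main']

# join() with a timeout only waits that long
t2 = threading.Thread(target=time.sleep, args=(0.5,))
t2.start()
t2.join(timeout=0.1)
alive_after_timeout = t2.is_alive()  # True: join() gave up after 0.1 s
t2.join()                            # now wait for it to really finish
print(alive_after_timeout)
```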
1.6 Problems with threads sharing global variables
Let's start two child threads that share a global variable initialized to 0. Each thread adds 1 to it a million times, and a problem appears. Here is the code:

```python
import threading
import time

num = 0

def work1(loop):
    global num
    for i in range(loop):
        # equivalent to num += 1
        temp = num
        num = temp + 1
    print(num)

def work2(loop):
    global num
    for i in range(loop):
        # equivalent to num += 1
        temp = num
        num = temp + 1
    print(num)

if __name__ == "__main__":
    t1 = threading.Thread(target=work1, args=(1000000,))
    t2 = threading.Thread(target=work2, args=(1000000,))
    t1.start()
    t2.start()
    # wait until only the main thread is left
    while len(threading.enumerate()) != 1:
        time.sleep(1)
    print(num)
```
Output
1459526 # Global variables are added to this number after the end of the first child thread
1588806 # Global variables are added to this number after the end of the second child thread
1588806 # when both threads are finished, the global variables are added to this number
That's strange: didn't each thread add 1 a million times? In theory the final result should be 2,000,000. What went wrong?
We know that the CPU runs the threads by time-slice polling.
Suppose the CPU first runs work1(), num is currently 100, and work1()'s time slice runs out right after temp = num. At that point temp has been assigned but the addition has not happened yet: temp=100, num=100.
Then the time slice goes to work2(), which performs its own increment: num=101.
Back at work1()'s breakpoint, it executes num = temp + 1 with temp=100, so num=101 again.
So one increment is lost! Repeated many times, these errors accumulate, and the final result is only 1588806 (the exact number varies from run to run and between Python versions).
This is a thread-safety problem!
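Because the interleaving above depends on timing, it does not show up reliably. As a sketch, we can force it with a `threading.Barrier` (a tool not used in the original article): each thread reads `num` into `temp`, then waits until the other thread has read it too, and only then writes `temp + 1`. Both threads therefore write the same value and one increment is lost every time.

```python
import threading

num = 0
barrier = threading.Barrier(2)  # both threads must reach wait() before either continues

def racy_increment():
    global num
    temp = num        # both threads read the same old value here ...
    barrier.wait()    # ... then pause until the other thread has read it too
    num = temp + 1    # both write old value + 1, so one increment is lost

threads = [threading.Thread(target=racy_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(num)  # 1, not 2
```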
1.7 Mutexes can fix some thread-safety problems (mutex locks and the GIL are different things!)
When multiple threads modify shared data at almost the same time, synchronization control is needed.
Thread synchronization ensures that multiple threads can safely access a contended resource, and the simplest synchronization mechanism is the mutex lock.
A mutex gives the resource a state: locked or unlocked.
When a thread wants to change shared data, it first locks it. While the resource is "locked", other threads cannot change it; only after the thread releases the lock, setting the state back to "unlocked", can another thread lock the resource. The mutex ensures that only one thread writes at a time, which keeps the data correct when multiple threads are involved.
There are three common steps for using a mutex:

```python
import threading

lock = threading.Lock()  # create a lock
lock.acquire()  # lock
lock.release()  # unlock
```
Let's use mutexes to solve the thread safety problem in the above example.
```python
import threading
import time

num = 0
lock = threading.Lock()  # create a lock

def work1(loop):
    global num
    for i in range(loop):
        # equivalent to num += 1
        lock.acquire()  # lock
        temp = num
        num = temp + 1
        lock.release()  # unlock
    print(num)

def work2(loop):
    global num
    for i in range(loop):
        # equivalent to num += 1
        lock.acquire()  # lock
        temp = num
        num = temp + 1
        lock.release()  # unlock
    print(num)

if __name__ == "__main__":
    t1 = threading.Thread(target=work1, args=(1000000,))
    t2 = threading.Thread(target=work2, args=(1000000,))
    t1.start()
    t2.start()
    while len(threading.enumerate()) != 1:
        time.sleep(1)
    print(num)
```
Output:
1945267 # Global variables are added to this number after the end of the first child thread
2000000 # Global variables are added to this number after the end of the second child thread
2000000 # when both threads are finished, the global variables are added to this number
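The acquire()/release() pairs above can also be written with a `with` statement on the lock, which releases it even if the code inside raises an exception. Below is a sketch of the same counter written that way, with smaller loop counts; `failing_update` is a hypothetical helper that only exists to show the lock is released after an error.

```python
import threading

num = 0
lock = threading.Lock()

def work(loop):
    global num
    for _ in range(loop):
        with lock:  # acquire() on entry, release() on exit
            temp = num
            num = temp + 1

def failing_update():
    with lock:
        raise ValueError("error while holding the lock")

t1 = threading.Thread(target=work, args=(100000,))
t2 = threading.Thread(target=work, args=(100000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(num)  # 200000

try:
    failing_update()
except ValueError:
    pass
print(lock.locked())  # False: the with block released the lock despite the exception
```

If the code between a bare acquire() and release() raised an exception, the lock would stay held and every other thread would block forever; the `with` form avoids that.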
1.8 Thread Pool ThreadPoolExecutor
Starting with Python 3.2, the standard library provides the concurrent.futures module, which offers two classes, ThreadPoolExecutor and ProcessPoolExecutor. They add a further layer of abstraction over threading and multiprocessing (here we focus on the thread pool), which not only schedules threads for us automatically but also provides the following:
The main thread can get the status of a thread (or task) and its return value.
When a thread finishes, the main thread knows immediately.
The programming interface for multithreading and multiprocessing is the same.
1.8.1 Create a thread pool
Example:
```python
from concurrent.futures import ThreadPoolExecutor
import time

# parameter times is used to simulate the time of the network request
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
# submit the function to the thread pool; submit returns immediately and does not block
task1 = executor.submit(get_html, 3)
task2 = executor.submit(get_html, 2)
# done() tells whether a task has finished
print("1:", task1.done())
# cancel() cancels a task; it fails if the task is already running in the pool
print("2:", task2.cancel())
time.sleep(4)
print("3:", task1.done())
# result() returns the task's return value
print("4:", task1.result())
```
Output:
When constructing a ThreadPoolExecutor instance, pass the max_workers parameter to set the maximum number of threads in the pool that can run at the same time.
Use submit() to submit a task (the function and its arguments) to the thread pool. It returns a handle to the task (a Future object, similar to a file handle). Note that submit() does not block; it returns immediately.
With the handle returned by submit(), done() tells whether the task has finished. In the example above, task1 sleeps for 3 seconds, so it is not finished right after submission, but it is finished after the main thread has slept for 4 seconds.
cancel() cancels a submitted task, but only if the task has not started running in the pool yet. In this example the pool size is 2, so task2 is already running and cancellation fails. If you change the pool size to 1, task1 is submitted first and occupies the only thread while task2 waits in the queue, so task2 can be cancelled successfully.
result() returns the task's return value. Reading its source shows that it blocks until the task finishes (or until its optional timeout expires).
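Here is a sketch of the pool-size-1 case described above, using shorter delays than the article. With a single worker, task1 occupies the only thread, so task2 is still queued and `cancel()` succeeds. The `with` block calls `shutdown(wait=True)` on exit, so the pool's thread is cleaned up automatically.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def get_html(times):
    # same stand-in for a network request as above
    time.sleep(times)
    return times

with ThreadPoolExecutor(max_workers=1) as executor:
    task1 = executor.submit(get_html, 0.5)  # occupies the only worker thread
    task2 = executor.submit(get_html, 0.2)  # has to wait in the queue
    cancelled = task2.cancel()              # succeeds because task2 has not started
    print("task2 cancelled:", cancelled)
    print("task1 result:", task1.result())  # blocks until task1 finishes
print(task2.cancelled())  # True
```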
1.8.2 as_completed
done() tells us whether a task has finished, but the main thread cannot keep checking it in a loop. Often we want to fetch a task's result as soon as it finishes, instead of repeatedly checking whether each task is done. as_completed() does this: it hands each task back to us as soon as it completes.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

# parameter times is used to simulate the time of the network request
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]  # not real urls
all_task = [executor.submit(get_html, url) for url in urls]

for future in as_completed(all_task):
    data = future.result()
    print("in main: get page {}s success".format(data))

# execution result
# get page 2s finished
# in main: get page 2s success
# get page 3s finished
# in main: get page 3s success
# get page 4s finished
# in main: get page 4s success
```
as_completed() returns a generator: it blocks while no task has completed, yields a task as soon as one completes so that the body of the for loop runs, and then blocks again, until all tasks are finished. As the results show, the task that finishes first notifies the main thread first.
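A common variation, sketched here with hypothetical delays in place of URLs, keeps a dictionary from each future to its input so the main thread knows which task just finished. With three workers all tasks start at once, so they complete in order of their delays.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def get_html(times):
    time.sleep(times)
    return times

urls = [0.6, 0.2, 0.4]  # hypothetical delays standing in for URLs
finished = []
with ThreadPoolExecutor(max_workers=3) as executor:
    # remember which input each future belongs to
    future_to_url = {executor.submit(get_html, url): url for url in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        finished.append(future.result())
        print("in main: page with delay {}s done".format(url))
print(finished)  # completion order, not submission order
```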
1.8.3 map
In addition to the as_completed method above, you can also use the executor.map method, but it's a little different.
```python
from concurrent.futures import ThreadPoolExecutor
import time

# parameter times is used to simulate the time of the network request
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]  # not real urls

for data in executor.map(get_html, urls):
    print("in main: get page {}s success".format(data))

# execution result
# get page 2s finished
# get page 3s finished
# in main: get page 3s success
# in main: get page 2s success
# get page 4s finished
# in main: get page 4s success
```
With map() there is no need to call submit() first. It means the same as the built-in map: it applies the same function to each element of the sequence. The code above runs get_html on each element of urls, using the threads of the pool. Unlike as_completed(), the output order matches the order of the urls list: even though the 2s task finishes first, the 3s result is printed first and then the 2s result.
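A quick sketch confirming the ordering rule with short hypothetical delays. The second call shows that, like the built-in map, `executor.map` accepts several iterables and passes one element from each to the function.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def get_html(times):
    time.sleep(times)
    return times

urls = [0.3, 0.1, 0.2]  # hypothetical delays standing in for URLs
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(get_html, urls))
    # like the built-in map, several iterables are consumed in parallel
    powers = list(executor.map(pow, [2, 3], [5, 2]))
print(results)  # [0.3, 0.1, 0.2]: input order, even though 0.1 finished first
print(powers)   # [32, 9]
```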
1.8.4 wait
The wait method allows the main thread to block until the set requirements are met.
```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED, FIRST_COMPLETED
import time

# parameter times is used to simulate the time of the network request
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]  # not real urls
all_task = [executor.submit(get_html, url) for url in urls]
wait(all_task, return_when=ALL_COMPLETED)
print("main")

# execution result
# get page 2s finished
# get page 3s finished
# get page 4s finished
# main
```
wait() takes three parameters: the sequence of tasks to wait for, a timeout, and the wait condition return_when. return_when defaults to ALL_COMPLETED, meaning wait until all tasks have finished. The output shows that all tasks indeed finished before the main thread printed main. The condition can also be set to FIRST_COMPLETED, which stops waiting as soon as the first task completes.
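A sketch of the FIRST_COMPLETED case with hypothetical short delays. `wait()` returns a pair of sets, the finished futures and the unfinished ones, so the main thread can act on the first result while the others keep running.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

def get_html(times):
    time.sleep(times)
    return times

with ThreadPoolExecutor(max_workers=3) as executor:
    tasks = [executor.submit(get_html, t) for t in (0.6, 0.2, 0.4)]
    # returns (done, not_done) as soon as any one task has finished
    done, not_done = wait(tasks, timeout=5, return_when=FIRST_COMPLETED)
    first = [f.result() for f in done]
    print("finished first:", first)
    print("still running:", len(not_done))
# leaving the with block waits for the remaining tasks
```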
2 Multiprocessing for multitasking
2.1 multiprocessing
Creating a process is similar to creating a thread:
Instantiate a multiprocessing.Process object and pass in a function object (target) as the new process's entry point
Inherit multiprocessing.Process and override the run function
2.1.1 Method 1
Before we start, we need to know what a process is. Put simply, a QQ client opened on your computer is a process; open another QQ client and that is another process. So how can a single piece of Python code start several processes? Here is a simple example:
```python
import multiprocessing
import time

def task1():
    while True:
        time.sleep(1)
        print("I am task1")

def task2():
    while True:
        time.sleep(2)
        print("I am task2")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task1)  # multiprocessing.Process creates the child process object p1
    p2 = multiprocessing.Process(target=task2)  # multiprocessing.Process creates the child process object p2
    p1.start()  # child process p1 starts
    p2.start()  # child process p2 starts
    print("I am main task")  # this is the task of the main process
```
Output:
The child process objects are created with the Process class from the multiprocessing module. Besides the two child processes p1 and p2, there is of course the main process: it runs our code from beginning to end, and the child processes are themselves created by it.
The points to pay attention to are:
(1) First, concurrency: when there are more tasks than CPU cores, the operating system's scheduling algorithms let multiple tasks run "together". (In fact some tasks are always waiting, but switching between tasks is so fast that they appear to run at the same time.)
(2) Under concurrency, there is no fixed order between the child processes and the main process; the CPU uses time-slice polling to decide which one runs.
(3) By default, the main process waits for all child processes to finish before it exits. So in the example above, p1 and p2 are infinite loops: the main process finishes its last line, print("I am main task"), but it does not exit; it keeps waiting for the child processes.
(4) By default, the child processes created by the main process are non-daemon processes. Read this together with points 3 and 5.
(5) But! If the child processes are daemons, the main process exits as soon as it has run its last line of code, and the daemon children are terminated with it, whether or not they have finished running!
2.1.2 Method 2

```python
from multiprocessing import Process
import os
import time

class CustomProcess(Process):
    def __init__(self, p_name, target=None):
        # step 1: call the base class __init__
        super(CustomProcess, self).__init__(name=p_name, target=target, args=(p_name,))

    def run(self):
        # step 2: override run
        # time.sleep(0.1)
        print("CustomProcess name: %s, pid: %s" % (self.name, os.getpid()))

if __name__ == "__main__":
    p1 = CustomProcess("process_1")
    p1.start()
    p1.join()
    print("subprocess pid: %s" % p1.pid)
    print("current process pid: %s" % os.getpid())
```
Output:
Think about this: if, as with multithreading, there is a global variable share_data, will it be a problem for different processes to access share_data at the same time?
No. Each process has its own isolated memory address space, so every process sees its own copy of share_data, located in a different address space, and concurrent access causes no conflict. Note the flip side, though: a change made in one process is not visible in the others.
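A sketch that makes the isolation visible, assuming a hypothetical global `share_data` list. The child appends to its own copy and sends that copy back through a `multiprocessing.Queue`; the parent's list never sees the child's change.

```python
import multiprocessing

share_data = []  # the hypothetical global from the question above

def child(queue):
    share_data.append("child")   # changes only the child's own copy
    queue.put(list(share_data))  # send the child's view back to the parent

def run_demo():
    share_data.clear()
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=child, args=(queue,))
    p.start()
    child_view = queue.get()  # read before join(), as the multiprocessing docs advise for queues
    p.join()
    share_data.append("parent")
    return child_view, list(share_data)

if __name__ == "__main__":
    child_view, parent_view = run_demo()
    print("child saw:", child_view)     # ['child']
    print("parent sees:", parent_view)  # ['parent']
```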
2.2 Daemon processes
Test:

```python
import multiprocessing
import time

def task1():
    while True:
        time.sleep(1)
        print("I am task1")

def task2():
    while True:
        time.sleep(2)
        print("I am task2")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task1)
    p2 = multiprocessing.Process(target=task2)
    p1.daemon = True  # make the p1 child process a daemon
    p2.daemon = True  # make the p2 child process a daemon
    p1.start()
    p2.start()
    print("I am main task")
```
Output:
I am main task
Isn't the output a little strange? Why is there no output from the p1 and p2 child processes?
Let's walk through what happens:
Create the p1 and p2 child processes
Make p1 and p2 daemons
Start p1 and p2
Both p1 and p2 sleep before printing, so rather than waste time, the CPU runs the rest of the main process's code first
Run the rest of the main process's code: print("I am main task")
The main process has finished its code, and the remaining child processes are all daemons, so they are terminated. But what if, when the main process finishes, there are two child processes, one daemon and one non-daemon? In that case the daemon is terminated right away, while the main process waits for the non-daemon child to finish before exiting.
p1 and p2 were terminated while they were still sleeping, so they printed nothing.
For example, leave p1 as a non-daemon:

```python
import multiprocessing
import time

def task1():
    i = 1
    while i < 5:
        time.sleep(1)
        i += 1
        print("I am task1")

def task2():
    while True:
        time.sleep(2)
        print("I am task2")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task1)
    p2 = multiprocessing.Process(target=task2)
    p2.daemon = True  # make the p2 child process a daemon
    p1.start()
    p2.start()
    print("I am main task")
```
Output:
Two points are involved here:
(1) When the main process ends, it sends a termination signal to its daemon child processes, and they stop immediately.
(2) The CPU runs multiple processes by time-slice polling, running whichever is ready. If a child process calls time.sleep, the CPU does something else in the meantime rather than waste resources.
So a daemon can be cut off at any moment. What is the point of having one?
Daemons are mainly used for tasks that are independent of the main business logic and can safely be dropped, such as memory garbage collection or periodically running certain methods.
2.3 Passing arguments to a child process

```python
import multiprocessing

def task(a, b, *args, **kwargs):
    print(a)
    print(b)
    print(args)
    print(kwargs)

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task, args=(1, 2, 3, 4, 5), kwargs={"name": "chichung", "age": 23})
    p1.start()
    print("the main process has finished running the last line of code")
```
Output:
The function the child process runs takes the variables a and b, a tuple (*args), and a dictionary (**kwargs). When we create the child process, the values for a and b are placed in the args tuple; when task is called, the first two values are assigned to a and b respectively, the remaining values are collected into args, and the kwargs dictionary is passed as keyword arguments.
2.4 Several commonly used methods of a child process
p.start(): starts the child process p
p.name: the name of the child process
p.pid: the process id of the child process
p.is_alive(): whether the child process is still alive
p.join(timeout): blocks the main process until child process p finishes, then unblocks it so the main process can run its subsequent code.
If timeout=2, the main process is blocked for at most 2 s, during which it cannot run its subsequent code; after 2 s the main process continues even if the child process has not finished.
p.terminate(): terminates the child process p

```python
import multiprocessing

def task(a, b, *args, **kwargs):
    print(a)
    print(b)
    print(args)
    print(kwargs)

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task, args=(1, 2, 3, 4, 5), kwargs={"name": "chichung", "age": 23})
    p1.start()
    print("name of p1 child process: %s" % p1.name)
    print("id of p1 child process: %d" % p1.pid)
    p1.join()
    print(p1.is_alive())
```
Output:
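The timeout form of join() and terminate() work well together, as in this sketch with a hypothetical `slow_task` that would run for 10 s. The main process waits only 0.5 s, sees that the child is still alive, and terminates it. On POSIX systems a negative exitcode -N means the child was killed by signal N.

```python
import multiprocessing
import time

def slow_task():
    time.sleep(10)  # hypothetical long-running job

def run_demo():
    p = multiprocessing.Process(target=slow_task, name="slow")
    p.start()
    p.join(timeout=0.5)                 # wait at most 0.5 s for the child
    alive_after_timeout = p.is_alive()  # still running: slow_task needs 10 s
    p.terminate()                       # stop the child without waiting for slow_task
    p.join()                            # wait for the terminated child to be cleaned up
    return alive_after_timeout, p.is_alive(), p.exitcode

if __name__ == "__main__":
    alive_after_timeout, alive_at_end, exitcode = run_demo()
    print("alive after join(timeout=0.5):", alive_after_timeout)
    print("alive after terminate():", alive_at_end)
    print("exitcode:", exitcode)
```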
2.5 Global variables cannot be shared between processes
Global variables cannot be shared between processes, not even between a child process and the main process. The reason is simple: a new process occupies its own memory space, and variables in different memory spaces cannot be shared. The following experiments demonstrate this:
Example 1:
```python
import multiprocessing

g_list = [123]

def task1():
    g_list.append("task1")
    print(g_list)

def task2():
    g_list.append("task2")
    print(g_list)

def main_process():
    g_list.append("main_processs")
    print(g_list)

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task1)
    p2 = multiprocessing.Process(target=task2)
    p1.start()
    p2.start()
    main_process()
    print("11111:", g_list)
```
Output:
[123, 'main_processs']
11111: [123, 'main_processs']
[123, 'task1']
[123, 'task2']
Example 2:
```python
import multiprocessing
import time

num = 0

def task1(loop):
    global num
    for i in range(loop):
        # equivalent to num += 1
        temp = num
        num = temp + 1
    print(num)
    print("I am task1")

def task2(loop):
    global num
    for i in range(loop):
        # equivalent to num += 1
        temp = num
        num = temp + 1
    print(num)
    print("I am task2")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=task1, args=(100000,))  # creates the child process object p1
    p2 = multiprocessing.Process(target=task2, args=(100000,))  # creates the child process object p2
    p1.start()  # child process p1 starts
    p2.start()  # child process p2 starts
    print("I am main task")  # this is the task of the main process
```
Output:
2.6 Python process pool: multiprocessing.Pool
A process pool can be thought of as a queue with a set number of worker processes: when all of them are busy, new tasks have to wait in the queue until one of the processes is free to take them.
When using Python for system administration, especially when operating on many file directories at once or remotely controlling many hosts, running operations in parallel can save a lot of time. When there are only a few targets, you can use Process from multiprocessing directly to spawn processes dynamically; a dozen is fine, but with hundreds or thousands of targets, limiting the number of processes by hand becomes far too tedious. This is where a process pool comes in.
Pool provides a specified number of processes for the user. When a new task is submitted to the pool, it is run by a free worker process if there is one; if all the specified processes are busy, the task waits until a process in the pool becomes free.
2.6.1 Using a process pool (non-blocking)

```python
# coding: utf-8
import multiprocessing
import time

def func(msg):
    print("msg:", msg)
    time.sleep(3)
    print("end")

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)  # set the number of processes to 3
    for i in range(4):
        msg = "hello %d" % i
        # at most 3 tasks run at once; when a worker finishes one, it picks up the next
        pool.apply_async(func, (msg,))
    print("Mark~ Mark~ Mark~~")
    pool.close()  # no new tasks can be added after close()
    pool.join()  # call close() before join(); join() waits for all worker processes to exit
    print("Sub-process(es) done.")
```
Output:
Function explanation:
apply_async(func[, args[, kwds[, callback]]]) is non-blocking, while apply(func[, args[, kwds]]) is blocking (compare the results of 2.6.1 and 2.6.2 to see the difference).
close(): closes the pool so that it no longer accepts new tasks.
terminate(): stops the worker processes immediately without finishing outstanding tasks.
join(): blocks the main process until the worker processes exit; it must be called after close() or terminate().
apply() vs. apply_async():
apply(): blocks the main process while each task runs, so the tasks run one by one; only after a task has finished does the main process continue with the code after apply().
apply_async(): non-blocking and asynchronous; the main process does not wait for the task to finish but keeps running, and the system scheduler switches between processes.
Execution walkthrough: create a process pool with 3 processes. range(4) produces four values, 0, 1, 2 and 3, which are submitted to the pool one after another. Because the pool has 3 processes, 0, 1 and 2 are sent to processes immediately; when one of them finishes, its process is freed to handle 3, so "msg: hello 3" appears after an "end". Because apply_async() is non-blocking, the main process carries on without waiting for the workers: it prints "Mark~ Mark~ Mark~~" right after the for loop, and then waits at pool.join() for every worker to finish.
2.6.2 Using a process pool (blocking)

```python
# coding: utf-8
import multiprocessing
import time

def func(msg):
    print("msg:", msg)
    time.sleep(3)
    print("end")

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)  # set the number of processes to 3
    for i in range(4):
        msg = "hello %d" % i
        pool.apply(func, (msg,))  # blocks until func(msg) has finished in a worker
    print("Mark~ Mark~ Mark~~")
    pool.close()  # no new tasks can be added after close()
    pool.join()  # call close() before join(); join() waits for all worker processes to exit
    print("Sub-process(es) done.")
```
Output:
2.6.3 Using a process pool and collecting the results

```python
import multiprocessing
import time

def func(msg):
    print("msg:", msg)
    time.sleep(3)
    print("end")
    return "done" + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = []
    for i in range(3):
        msg = "hello %d" % i
        result.append(pool.apply_async(func, (msg,)))
    pool.close()
    pool.join()
    for res in result:
        print("::", res.get())
    print("Sub-process(es) done.")
```
Output:
Note: calling get() on each AsyncResult returns the value that func returned for that task.
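A more compact sketch of the same ideas, using a hypothetical `square` function. `Pool` can be used as a context manager; note that leaving the `with` block calls `terminate()` rather than `close()`/`join()`, so all results should be collected inside it.

```python
import multiprocessing

def square(x):
    return x * x

def run_demo():
    with multiprocessing.Pool(processes=3) as pool:
        # map() blocks until every result is ready and keeps the input order
        squares = pool.map(square, range(5))
        # apply_async() returns an AsyncResult; get() waits for that one value
        single = pool.apply_async(square, (7,)).get(timeout=10)
    # leaving the with block calls pool.terminate(), so results are collected inside it
    return squares, single

if __name__ == "__main__":
    print(run_demo())  # ([0, 1, 4, 9, 16], 49)
```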
3 Comparison of Python multithreading and multiprocessing
Let's take a look at two examples:
(1) Example 1, multithreading vs. a single thread: start two Python threads that each perform 100 million increments, then perform 100 million increments in a single thread:

```python
import threading
import time

def tstart(arg):
    var = 0
    for i in range(100000000):
        var += 1
    print(arg, var)

if __name__ == "__main__":
    t1 = threading.Thread(target=tstart, args=("This is thread 1",))
    t2 = threading.Thread(target=tstart, args=("This is thread 2",))
    start_time = time.time()
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print("Two thread cost time: %s" % (time.time() - start_time))
    start_time = time.time()
    tstart("This is thread 0")
    print("Main thread cost time: %s" % (time.time() - start_time))
```
Output:
In the example above, if only one of the t1 and t2 threads is started, the run time is basically the same as that of the main thread.
(2) example 2, using two processes
```python
from multiprocessing import Process
import os
import time

def pstart(arg):
    var = 0
    for i in range(100000000):
        var += 1
    print(arg, var)

if __name__ == "__main__":
    p1 = Process(target=pstart, args=("1",))
    p2 = Process(target=pstart, args=("2",))
    start_time = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Two process cost time: %s" % (time.time() - start_time))
    start_time = time.time()
    pstart("0")
    print("Current process cost time: %s" % (time.time() - start_time))
```
Output:
Comparative analysis:
Running the same code in two parallel processes takes about as long as running it once in a single process (assuming at least two CPU cores are available). The two-process run is slightly slower, probably because creating and destroying processes requires system calls, which add extra time overhead.
For Python threads, however, running two threads concurrently takes far longer than the single-threaded run; in this test the difference was nearly 10 times. Now change the two threads to run one after the other, that is:
import threading
import time


def tstart(arg):
    var = 0
    for i in range(100000000):
        var += 1
    print(arg, var)


if __name__ == "__main__":
    t1 = threading.Thread(target=tstart, args=("This is thread 1",))
    t2 = threading.Thread(target=tstart, args=("This is thread 2",))
    # Run t1 to completion before starting t2 (serial execution)
    start_time = time.time()
    t1.start()
    t1.join()
    print("thread1 cost time: %s" % (time.time() - start_time))
    start_time = time.time()
    t2.start()
    t2.join()
    print("thread2 cost time: %s" % (time.time() - start_time))
    start_time = time.time()
    tstart("This is thread 0")
    print("Main thread cost time: %s" % (time.time() - start_time))
Output:
You can see that when the three runs execute one after another, each takes roughly the same time.
The underlying reason is that the two threads run concurrently rather than truly in parallel, and the cause of that is the GIL.
4 The GIL (Global Interpreter Lock)
Any discussion of Python multithreading has to mention the GIL (Global Interpreter Lock). It is a lock implemented in CPython, the dominant Python interpreter, to keep the interpreter's internal data safe. No matter how many threads a process has, only the thread holding the GIL can run Python bytecode on the CPU, even on a multi-core processor, so at any moment only one thread in the process is executing. For CPU-intensive threads, multithreading therefore brings no speedup and may even be slower than a single thread. Python multithreading is better suited to I/O-intensive programs, because a thread releases the GIL while it waits on blocking I/O and other threads can run in the meantime. For programs that really need to run in parallel, consider using multiple processes.
Threads competing for the lock, the CPU scheduling of threads, and context switches between threads all add time overhead.
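The claim that threads suit I/O-intensive work can be checked with a small sketch. Here time.sleep stands in for a blocking I/O call (both release the GIL while waiting); the worker name `fake_io`, the helper `run_io_threads`, and the 0.2-second delay are illustrative assumptions, not part of the original article.

```python
import threading
import time


def fake_io(delay, results, index):
    # time.sleep releases the GIL, much like a blocking socket read
    time.sleep(delay)
    results[index] = "done %d" % index


def run_io_threads(n=5, delay=0.2):
    results = [None] * n  # each thread writes only to its own slot
    threads = [threading.Thread(target=fake_io, args=(delay, results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_io_threads()
    # The waits overlap, so this takes about 0.2 s rather than 5 * 0.2 = 1 s
    print(results, "%.2f s" % elapsed)
```

Replacing the sleep with a pure-Python loop (as in Example 1) removes the overlap, because the loop holds the GIL instead of releasing it.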
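CPython makes a running thread give up the GIL at a regular interval so that other threads get a turn, and that interval is exposed through sys.getswitchinterval() and sys.setswitchinterval() (Python 3.2+). The sketch below only reads the value and changes it temporarily; the helper name `show_switch_interval` is hypothetical, and the default value is a CPython implementation detail that other interpreters may not share.

```python
import sys


def show_switch_interval():
    # Seconds a thread may hold the GIL before CPython asks it
    # to release the lock so that another thread can run
    return sys.getswitchinterval()


if __name__ == "__main__":
    original = show_switch_interval()
    print("switch interval:", original)
    # A larger interval means fewer forced switches (less switching
    # overhead) but slower response for the other threads
    sys.setswitchinterval(0.01)
    print("new interval:", sys.getswitchinterval())
    sys.setswitchinterval(original)  # restore the previous value
```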
5 Comparison of threads and processes
5.1 Differences between threads and processes
Here is a simple comparison of threads and processes:
A process is the basic unit of resource allocation, and a thread is the basic unit of CPU execution and scheduling.
Communication / synchronization mechanisms:
Threads:
Synchronization: mutexes, recursive locks, condition variables, semaphores.
Communication: threads in the same process share the process's resources, so they do not need the data-transfer mechanisms that processes use; communication between threads is mainly used for synchronization.
Processes:
Communication: pipes, FIFOs, message queues, signals, shared memory, sockets, streams.
Synchronization: PV operations on semaphores, pipes.
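As a minimal sketch of thread synchronization with a mutex, the code below has several threads increment a shared counter while holding a threading.Lock. Without the lock, `self.value += 1` is a non-atomic read-modify-write, so updates can be lost. The class `Counter` and the helper `run` are illustrative names, not from the article.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # mutex guarding self.value

    def increment(self, times):
        for _ in range(times):
            # Only one thread at a time can execute this block
            with self._lock:
                self.value += 1


def run(num_threads=4, times=100000):
    counter = Counter()
    threads = [threading.Thread(target=counter.increment, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # 400000
```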
What actually runs on the CPU is a thread. Threads are lighter-weight than processes, so switching and scheduling them costs less.
Because threads share the process's data, thread safety must be considered. Processes, by contrast, are isolated from one another and each has its own memory space; data can only be exchanged through the IPC (Inter-Process Communication) mechanisms listed above, which makes them relatively safe.
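The isolation between processes can be seen in a short sketch: a child process modifies a module-level list, the parent's copy stays unchanged, and the only way the parent learns the child's result is through IPC, here a multiprocessing.Queue. The names `data`, `worker`, and `run` are hypothetical.

```python
import multiprocessing

data = [1, 2, 3]  # module-level data; each process gets its own copy


def worker(queue):
    # Runs in a child process and modifies only the child's copy
    data.append(4)
    # The parent can only see the result through IPC
    queue.put(list(data))


def run():
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    child_view = queue.get()  # read before join() so the child can exit cleanly
    p.join()
    return child_view, list(data)


if __name__ == "__main__":
    child_view, parent_view = run()
    print("child: ", child_view)   # [1, 2, 3, 4]
    print("parent:", parent_view)  # [1, 2, 3]
```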
A system is made up of processes. Each process consists of a code segment, a data segment, heap space, and stack space, plus a part shared with the operating system, and a process is always in one of three states: waiting, ready, or running.
A process can contain multiple threads. The process's resources (file descriptors, global variables, heap space, etc.) are shared among its threads, while registers and stack space are private to each thread.
In the operating system, the death of one process does not affect other processes. However, if a thread in a process dies and the OS implements threads with a many-to-one model, the whole process dies with it.
If the CPU and the operating system support both multithreading and multiprocessing, multiple processes can run in parallel while the threads within each process also run in parallel, making the fullest use of the hardware.
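For contrast with processes, the following sketch shows threads sharing module-level data while keeping their local variables private: every thread appends to the same list, and each thread's `local_count` lives in its own call frame. The names `shared`, `worker`, and `run` are illustrative assumptions.

```python
import threading

shared = []  # module-level list: one copy, visible to every thread


def worker(name):
    local_count = 0  # local variable in this thread's own call frame
    for _ in range(3):
        local_count += 1
    shared.append((name, local_count))  # list.append is atomic in CPython


def run():
    shared.clear()  # start from an empty list on every call
    threads = [threading.Thread(target=worker, args=("t%d" % i,))
               for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread sees every thread's append: the list is shared
    return sorted(shared)


if __name__ == "__main__":
    print(run())  # [('t0', 3), ('t1', 3), ('t2', 3)]
```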
5.2 context switching between threads and processes
Switching between processes involves a lot of work, such as saving register contents to the task state segment (TSS) and switching page tables and stacks. Put simply, it can be divided into the following two steps:
(1) Switch the page global directory so that the CPU can address the linear address space of the new process.
(2) Switch the kernel stack and the hardware context; the hardware context holds the contents of the CPU registers and is stored in the TSS.
A thread runs within its process's address space, so switching between threads of the same process does not change the address space and only the second step is needed.
5.3 Multithreading or multiprocessing?
CPU-intensive: the program spends most of its time using the CPU for computation and data processing; multiprocessing is suitable.
I/O-intensive: the program performs frequent I/O operations, such as sending and reading socket data over the network; multithreading is suitable.
Because Python threads do not execute in parallel, multithreading is better suited to I/O-intensive programs, while multiprocessing, which does execute in parallel, suits CPU-intensive programs.
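The rule of thumb above maps directly onto the standard concurrent.futures module, which offers a process pool and a thread pool behind the same interface. This is a sketch under assumptions, not the article's code: `cpu_task` and `io_task` are hypothetical workloads, and time.sleep again stands in for real I/O.

```python
import concurrent.futures
import time


def cpu_task(n):
    # CPU-bound: pure Python arithmetic holds the GIL
    return sum(i * i for i in range(n))


def io_task(delay):
    # I/O-bound stand-in: sleeping releases the GIL
    time.sleep(delay)
    return delay


def run_cpu(values):
    # Separate processes, so each task can use its own CPU core
    with concurrent.futures.ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_task, values))


def run_io(delays):
    # Threads are cheaper and sufficient when most time is spent waiting
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(io_task, delays))


if __name__ == "__main__":
    print(run_cpu([10, 100]))  # [285, 328350]
    print(run_io([0.1, 0.1, 0.1]))  # [0.1, 0.1, 0.1]
```

Switching between the two strategies only means swapping the executor class, which makes it easy to measure both on a real workload before choosing.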
© 2024 shulou.com SLNews company. All rights reserved.