Shulou (Shulou.com), 06/02 report. Updated 2025-04-06.
This article explains how to understand Python multithreading. The approach presented is simple, fast, and practical.
When processing real data, system memory is limited, so we cannot export all the data at once and must export it in batches. To speed this up, we process the batches with multiple threads. Below is my template for multithreaded batch data processing:
import threading


class Scheduler:
    def __init__(self):
        self._lock = threading.RLock()
        self.start = 0
        # Fetch 10000 rows at a time
        self.step = 10000

    def getdata(self):
        # Lock so that threads accessing the database at the same time
        # do not fetch duplicate data
        with self._lock:
            # Fetch the data. This is a placeholder: run the query here and
            # return its rows. BETWEEN is inclusive, hence the "- 1".
            data = ('select * from table '
                    'where id between {} and {}'.format(
                        self.start, self.start + self.step - 1))
            # After fetching, move the pointer forward
            self.start += self.step
        return data


# The data-processing flow goes here
def processdata():
    # Fetch data from the scheduler instance
    data = scheduler.getdata()
    # Note: getdata() must return an empty result once the table is
    # exhausted, otherwise this loop never ends
    while data:
        # Specific data-processing operations go here:
        # de-duplication, filling, computation, ...
        # As long as there is still data, this thread keeps fetching more

        # Fetch the next batch and loop
        data = scheduler.getdata()


# Create the threads; threads_num is the number of threads to create
def threads_scheduler(threads_num):
    threads = []
    for i in range(threads_num):
        # Create a thread
        td = threading.Thread(target=processdata, name='th' + str(i + 1))
        threads.append(td)
    for t in threads:
        # Start the thread
        t.start()
    for t in threads:
        # Wait for the child thread to finish
        t.join()
    print('all data has been processed successfully')


if __name__ == '__main__':
    # Instantiate a scheduler and initialize its parameters
    scheduler = Scheduler()
    # Create the threads and start processing data
    threads_scheduler(4)
The template has three main parts:
The Scheduler class initializes the parameters, and its getdata method fetches one batch of data
The processdata function contains the actual data-processing flow
The threads_scheduler function creates, starts, and waits for the threads
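The three parts above can be tried without a database. The following is a minimal sketch, assuming the data lives in an in-memory list instead of a table; BatchScheduler, run, and the processed list are hypothetical names used only for illustration. It shows the key property of the template: each batch is handed out exactly once, and an empty batch tells every thread to stop.

```python
import threading


class BatchScheduler:
    """Hands out consecutive slices of an in-memory list, one batch per call."""

    def __init__(self, rows, step):
        self._lock = threading.Lock()
        self._rows = rows
        self._start = 0
        self._step = step

    def getdata(self):
        # Read and advance the cursor under the lock so that no two
        # threads receive the same batch.
        with self._lock:
            batch = self._rows[self._start:self._start + self._step]
            self._start += self._step
        return batch  # an empty list means the data is exhausted


def run(rows, step, threads_num):
    scheduler = BatchScheduler(rows, step)
    processed = []
    processed_lock = threading.Lock()

    def processdata():
        data = scheduler.getdata()
        while data:
            # Stand-in for real processing: just record the rows.
            with processed_lock:
                processed.extend(data)
            data = scheduler.getdata()

    threads = [threading.Thread(target=processdata, name='th' + str(i + 1))
               for i in range(threads_num)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run(list(range(25)), step=10, threads_num=4)
print(sorted(result) == list(range(25)))  # True: every row processed exactly once
```

The order of rows in result depends on thread scheduling, which is why the check sorts it first.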
I will explain Python multithreading in four parts. Let's go through the main points below:
Multithreading with threading
This part first introduces the concepts behind threading:
Main thread: when a program starts, the operating system (OS) creates a process, and one thread starts running immediately. This thread is usually called the program's main thread. Because it runs from the start of the program, any other thread you create is a child of the main thread.
Child threads: threads created with threading or ThreadPoolExecutor are all child threads.
The importance of the main thread shows in two ways: 1. It is the thread that creates the other child threads; 2. It usually has to finish last, for example to perform the various shutdown actions.
Take a racing game such as "flying car" as an example. Without multithreading, we could not play the game and listen to music at the same time. With multithreading, we can listen to background music while playing. In this example, launching the game is a process, and playing the game and playing the music are two threads.
Python provides the threading module to implement multithreading. threading.Thread creates a thread. setDaemon(True) marks a thread as a daemon thread, which is terminated when the main thread exits; the default is False (since Python 3.10, setDaemon() is deprecated in favor of the daemon attribute). join() makes the main thread wait for a child thread to finish.
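The effect of join() and of the daemon flag can be seen in a few lines. This is a small sketch rather than part of the original template; worker and log are hypothetical names, and the daemon flag is set through the daemon=True constructor argument.

```python
import threading
import time


def worker(delay, log):
    # Simulate some work, then record that it finished.
    time.sleep(delay)
    log.append('child done')


log = []

# join(): the main thread blocks until the child thread finishes.
child = threading.Thread(target=worker, args=(0.2, log), name='child')
child.start()
child.join()
log.append('main thread ends')
print(log)  # ['child done', 'main thread ends']

# daemon=True: this thread would not keep the program alive
# if the main thread exited first.
background = threading.Thread(target=worker, args=(0.1, []), daemon=True)
background.start()
print(background.daemon)  # True
background.join()
```

Without the join() call, 'main thread ends' would usually be appended first, because the child is still sleeping at that point.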
from time import sleep
import threading


def music(music_name):
    for i in range(2):
        print('listening {}'.format(music_name))
        sleep(1)
    print('music over')


def game(game_name):
    for i in range(2):
        print('playing {}'.format(game_name))
        sleep(3)
    print('game over')


threads = []
t1 = threading.Thread(target=music, args=('Daoxiang',))
threads.append(t1)
t2 = threading.Thread(target=game, args=('flying car',))
threads.append(t2)

if __name__ == '__main__':
    for t in threads:
        # t.setDaemon(True)
        t.start()
    for t in threads:
        t.join()
    print('main thread ends')

Thread pool
Creating a thread requires the system to allocate resources, and terminating one requires the system to reclaim them. Reusing threads removes this create/terminate overhead and improves performance. A thread pool also has a more concise syntax than creating and running threads yourself.
Python provides ThreadPoolExecutor to implement a thread pool. By default, the main thread waits for the pool's threads to finish. A thread pool suits sudden bursts of requests, or workloads that need many threads to complete tasks whose individual processing time is short.
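The thread reuse described above can be observed directly. In this short sketch (which_thread is a hypothetical helper), 20 tasks are submitted to a pool of 3 workers, and each task reports the name of the thread that ran it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def which_thread(_):
    # Report the name of the pool thread that runs this task.
    return threading.current_thread().name


with ThreadPoolExecutor(max_workers=3) as executor:
    names = list(executor.map(which_thread, range(20)))

# 20 tasks ran, but on at most 3 distinct threads.
print(len(names), len(set(names)))
```

The exact number of distinct names can be smaller than 3, since the pool only starts a new worker when it needs one.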
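from time import sleep
from concurrent.futures import ThreadPoolExecutor, as_completed


# fun is the function to run; this body is only a stand-in
def fun(value):
    sleep(0.1)
    return value


# values holds the items to iterate over; replace with your own data
values = [1, 2, 3]

with ThreadPoolExecutor(max_workers=5) as executor:
    ans = executor.map(fun, values)
    for res in ans:
        print(res)

with ThreadPoolExecutor(max_workers=5) as executor:
    ans = [executor.submit(fun, i) for i in values]
    for res in as_completed(ans):
        print(res.result())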
Here max_workers is the number of threads in the thread pool. The two common ways to iterate are map and submit + as_completed. Which one to use depends on the business scenario: if the results must come back in input order, use map; if each result should be handled as soon as its task finishes, use submit + as_completed.
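The ordering difference between the two approaches is easy to see when tasks take different amounts of time. This is a sketch under the assumption that sleeping stands in for real work; slow_echo and delays are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep


def slow_echo(delay):
    # Sleep for `delay` seconds, then return it.
    sleep(delay)
    return delay


delays = [0.3, 0.1, 0.2]

# map: results come back in the order of the inputs.
with ThreadPoolExecutor(max_workers=3) as executor:
    in_input_order = list(executor.map(slow_echo, delays))

# submit + as_completed: results come back as tasks finish.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_echo, d) for d in delays]
    in_finish_order = [f.result() for f in as_completed(futures)]

print(in_input_order)   # [0.3, 0.1, 0.2]
print(in_finish_order)  # typically [0.1, 0.2, 0.3], depending on timing
```

With map, the first result is not available until the slowest first task has finished, so as_completed is usually the better fit when each result can be processed independently.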
Thread mutual exclusion
A resource that only one thread may use at a time is called a critical resource, and access to critical resources must be mutually exclusive. Mutual exclusion is also known as an indirect constraint relationship. Thread mutual exclusion means that while one thread is accessing a critical resource, any other thread that wants the same resource must wait. Only after the current thread has finished and released the resource can another thread access it. The purpose of a lock is to implement thread mutual exclusion.
I compare thread mutual exclusion to using a single-stall restroom. Because the room has only one stall, only one person can use it at a time. The first person to go in locks the door. If a second person also needs the stall, they must wait outside until the first person unlocks the door. Locks in code work on the same principle, with the stall as the critical resource.
Python's threading module provides locks. Its Lock class has the following methods for acquiring and releasing a lock:
acquire(): acquires the lock; the optional timeout parameter specifies the maximum number of seconds to wait for the lock
release(): releases the lock
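The two methods above can be exercised in a single thread. The sketch below shows what acquire() returns when the lock is free and when a timeout expires, and also uses the with statement, which calls acquire() on entry and release() on exit.

```python
import threading

lock = threading.Lock()

first = lock.acquire()               # the lock is free, so this succeeds
second = lock.acquire(timeout=0.1)   # still held: gives up after 0.1 s
lock.release()

# "with lock:" acquires on entry and releases on exit,
# even if the body raises an exception.
with lock:
    held_inside = lock.locked()
held_after = lock.locked()

print(first, second, held_inside, held_after)  # True False True False
```

Using the with form avoids leaving the lock held when an exception skips the release() call.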
import threading


class Account:
    def __init__(self, card_id, balance):
        # Encapsulate the account ID and the account balance
        self.card_id = card_id
        self.balance = balance


def withdraw(account, money):
    # Lock
    lock.acquire()
    # The account balance covers the withdrawal amount
    if account.balance >= money:
        # Dispense the cash
        print(threading.current_thread().name
              + " withdrew money successfully! Dispensed: " + str(money), end='')
        # Update the balance
        account.balance -= money
        print("\tbalance: " + str(account.balance))
    else:
        print(threading.current_thread().name
              + " failed to withdraw money! Insufficient balance")
    # Unlock
    lock.release()


# Create an account with bank card ID 8888 and deposit 1000 yuan
acct = Account("8888", 1000)
# Create a lock in the main thread
lock = threading.Lock()
# Simulate two withdrawals from the same account
# (the thread names are illustrative)
threading.Thread(name='window A', target=withdraw, args=(acct, 800)).start()
threading.Thread(name='window B', target=withdraw, args=(acct, 800)).start()

The difference between Lock and RLock
Difference one: Lock is a primitive lock, which a thread can acquire only once before releasing it. RLock is a reentrant lock, which the same thread can acquire multiple times, so acquisitions can be nested.
import threading


def main():
    lock.acquire()
    print('first lock')
    lock.acquire()
    print('second lock')
    lock.release()
    lock.release()


if __name__ == '__main__':
    lock = threading.Lock()
    main()
This program prints only "first lock"; after that it neither terminates nor makes progress. The same thread calls acquire a second time before releasing the Lock, so the second acquire blocks forever, the release calls are never reached, and the lock can never be released. This is a deadlock. With RLock, the program runs normally and no deadlock occurs.
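The same comparison can be made without hanging the program by asking for the second acquisition in non-blocking mode. In this sketch, nested is a hypothetical helper: it acquires the lock once, then reports whether a second acquire from the same thread succeeds.

```python
import threading


def nested(lock):
    """Acquire once, then try a second, non-blocking acquire in the same thread."""
    lock.acquire()
    # blocking=False returns immediately instead of waiting forever.
    got_second = lock.acquire(blocking=False)
    if got_second:
        lock.release()  # undo the nested acquisition
    lock.release()      # undo the first acquisition
    return got_second


print(nested(threading.Lock()))   # False: a Lock cannot be acquired twice
print(nested(threading.RLock()))  # True: an RLock is reentrant
```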
Difference two: a locked Lock does not belong to any particular thread and can be released from another thread. An RLock can be released only by the thread that acquired it, not by any other thread. So with RLock, acquire and release must appear in pairs within the same thread: whoever tied the bell must untie it.
import threading


def main():
    lock.release()
    print("print after child thread unlocks")


if __name__ == '__main__':
    lock = threading.Lock()
    lock.acquire()
    t = threading.Thread(target=main)
    t.start()
We create the Lock in the main thread and acquire it, then create a child thread t that runs the main function to release the lock. The output is printed normally, which shows that a Lock acquired in the main thread can be released by a child thread.
If you change the Lock above to an RLock, the release in the child thread raises a RuntimeError. In practice, we wrap each piece of functionality in its own function, and each function may have its own critical section, so functions that call each other while holding the lock need RLock.
import threading
import time


def fun_1():
    print('start')
    time.sleep(1)
    lock.acquire()
    print("first lock")
    fun_2()
    lock.release()


def fun_2():
    lock.acquire()
    print("second lock")
    lock.release()


if __name__ == '__main__':
    lock = threading.RLock()
    t1 = threading.Thread(target=fun_1)
    t2 = threading.Thread(target=fun_1)
    t1.start()
    t2.start()
In short, Lock cannot be nested but RLock can; a Lock can be released by another thread, while an RLock can be released only by the thread that acquired it.
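The ownership rule in this summary can be checked with one helper that works for both lock types. This is a sketch for illustration; release_from_child is a hypothetical name. It acquires the lock in the calling thread, lets a child thread attempt the release, and reports whether that release succeeded.

```python
import threading


def release_from_child(lock):
    """Acquire in the calling thread, then try to release from a child thread."""
    errors = []

    def child():
        try:
            lock.release()
        except RuntimeError as exc:
            # An RLock refuses a release from a thread that does not own it.
            errors.append(exc)

    lock.acquire()
    t = threading.Thread(target=child)
    t.start()
    t.join()
    if errors:
        lock.release()  # the child failed, so the owning thread releases
    return not errors


print(release_from_child(threading.Lock()))   # True: any thread may release a Lock
print(release_from_child(threading.RLock()))  # False: only the owner may release
```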