Python Asynchronous Crawlers: Multithreading and Thread Pools

Shulou (Shulou.com), 06/03

Updated 2025-01-17. From: SLTechnology News&Howtos, Development

This article explains how a Python crawler can issue requests asynchronously using multithreading and thread pools, with short examples that compare the running time of each approach.

Contents

Background

Asynchronous crawler modes

Multithreading, multiprocessing (not recommended)

Thread pool, process pool (use where appropriate)

Single thread + asynchronous coroutines (recommended)

Multithreading

Thread pool

Background

When a crawler sends requests to multiple URLs, the second URL is requested only after the first request has completed, because requests is a blocking library. The time spent waiting for each response is wasted, which makes the crawler very inefficient. Can we start a separate process or thread for each request, so that while one request is waiting the next URL is already being requested and the requests run concurrently?

Asynchronous crawler modes

Multithreading, multiprocessing (not recommended)

Benefits: a separate thread or process can be started for each blocking operation, so the blocking operations run asynchronously.

Disadvantages: threads and processes cannot be created without limit, and creating and destroying them frequently is expensive.

Thread pool, process pool (use where appropriate)

Benefits: a pool reduces how often threads or processes are created and destroyed, which lowers system overhead.

Disadvantages: the number of threads or processes in the pool is capped.
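The trade-off above can be seen in a small sketch. It assumes the standard-library concurrent.futures.ThreadPoolExecutor as the pool (the article itself uses multiprocessing.dummy later), and fake_download and run_pool are hypothetical names used only for illustration. With two workers and four tasks, the same two threads are reused, but only two tasks can wait at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name):
    """Hypothetical stand-in for a blocking request."""
    time.sleep(0.1)
    # Report which worker thread handled this task.
    return name, threading.current_thread().name


def run_pool(names, max_workers=2):
    # The pool creates at most max_workers threads and reuses them
    # for later tasks instead of creating a new thread per task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, names))


if __name__ == '__main__':
    start = time.time()
    results = run_pool(['xiaozi', 'aa', 'bb', 'cc'])
    elapsed = time.time() - start
    workers = {worker for _, worker in results}
    print('tasks:', len(results), 'worker threads used:', len(workers))
    print('elapsed: %.2f s' % elapsed)
```

Raising max_workers lets more tasks wait concurrently, at the cost of more threads held by the pool.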

Single thread + asynchronous coroutines (recommended)

Multithreading

Run sequentially, the following code takes about 8 seconds (4 tasks × 2 seconds each), because sleep is a blocking call and nothing else runs while it waits, which greatly reduces efficiency.

    import time
    from time import sleep

    def xx(name):
        print('downloading:', name)
        sleep(2)  # simulates a blocking request

    names = ['xiaozi', 'aa', 'bb', 'cc']

    start = time.time()
    for name in names:
        xx(name)  # each call blocks for 2 seconds before the next one starts
    end = time.time()
    print('running time:', end - start)

With multithreading:

    import time
    from threading import Thread
    from time import sleep

    def xx(name):
        print('downloading:', name)
        sleep(2)  # simulates a blocking request

    names = ['xiaozi', 'aa', 'bb', 'cc']

    def main():
        threads = []
        for s in names:
            # Start a thread: target is the function, args holds the
            # arguments passed to it and must be a tuple, so a single
            # argument needs the trailing comma.
            t = Thread(target=xx, args=(s,))
            t.start()
            threads.append(t)
        # Wait for every thread, so the timing covers the downloads
        for t in threads:
            t.join()

    if __name__ == '__main__':
        start = time.time()
        main()
        end = time.time()
        print('program run time:', end - start)

However, the output order is no longer guaranteed: the threads run concurrently, so the 'downloading:' lines may appear out of order or interleaved.
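If the order of the results matters, one possible approach is to let each thread write into its own slot of a pre-sized list and to join every thread before reading it. This is only a sketch: fake_download and download_all are hypothetical names, and the short sleep stands in for a real blocking request.

```python
from threading import Thread
from time import sleep


def fake_download(name, results, index):
    """Hypothetical stand-in for a blocking request."""
    sleep(0.1)
    # Each thread writes only to its own slot, so no lock is needed here.
    results[index] = 'done: ' + name


def download_all(names):
    results = [None] * len(names)
    threads = [
        Thread(target=fake_download, args=(name, results, i))
        for i, name in enumerate(names)
    ]
    for t in threads:
        t.start()
    # join() blocks until each thread finishes, so results is complete below.
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    print(download_all(['xiaozi', 'aa', 'bb', 'cc']))
```

The threads may still finish in any order; only the position of each result in the list is fixed by its index.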

Thread pool

Rewrite the example above to use a thread pool and run it:

    import time
    from multiprocessing.dummy import Pool  # thread-based Pool
    from time import sleep

    def xx(name):
        print('downloading:', name)
        sleep(2)  # simulates a blocking request

    names = ['xiaozi', 'aa', 'bb', 'cc']

    start = time.time()
    # Create a thread pool with four worker threads, so the four
    # blocking operations are handled concurrently
    pool = Pool(4)
    # map passes each element of the iterable to xx (the blocking operation).
    # It returns a list of the function's return values; xx returns nothing,
    # so the result is ignored here.
    pool.map(xx, names)
    pool.close()
    pool.join()
    end = time.time()
    print('program run time:', end - start)
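When the worker function does return something, map collects the return values into a list in the same order as the input, regardless of which thread finishes first. A minimal sketch, assuming a hypothetical fetch_length function in place of a real request (lengths is likewise an illustrative name):

```python
from multiprocessing.dummy import Pool  # thread-backed Pool
from time import sleep


def fetch_length(name):
    """Hypothetical stand-in for a request that returns a value."""
    sleep(0.1)
    return len(name)


def lengths(names, workers=4):
    # map has returned every result before the with-block exits
    # and shuts the pool down.
    with Pool(workers) as pool:
        return pool.map(fetch_length, names)


if __name__ == '__main__':
    print(lengths(['xiaozi', 'aa', 'bb', 'cc']))  # [6, 2, 2, 2]
```

In a crawler, the return value would typically be the response body or the parsed data for each URL.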
