This article explains the concept and usage of coroutines in Python in detail. It is shared here for reference; I hope you will have a solid understanding of the topic after reading it.
Regarding coroutines, I have said before that their efficiency is something multithreading cannot match, but I had never looked into them in depth. Over the past few days I consulted some material, studied and organized it, and I am sharing the result here for your reference.
Coroutines
Concept
A coroutine, also known as a microthread or fiber (English: coroutine), works like this: while executing function A, you can interrupt it at any point to execute function B, then interrupt B and resume A (switching back and forth freely). This is not an ordinary function call (there is no call statement). The whole process looks like multithreading, yet everything runs in a single thread.
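To make the idea concrete, here is a tiny sketch of my own (not from the original article) that switches between two plain generator functions within one thread; func_a, func_b, and the round-robin loop are illustrative names only:

def func_a():
    print("A: step 1")
    yield                    # hand control back to the scheduler
    print("A: step 2")
    yield

def func_b():
    print("B: step 1")
    yield
    print("B: step 2")
    yield

# a minimal round-robin "scheduler": resume each task until its next yield
tasks = [func_a(), func_b()]
for _ in range(2):
    for t in tasks:
        next(t)

The output interleaves A and B step by step, even though no function ever calls the other and only one thread is involved.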
Advantages
Execution efficiency is extremely high, because switching between subroutines (functions) is not thread switching; it is controlled by the program itself, so there is no thread-switching overhead. Compared with multithreading, the more threads there are, the more obvious the performance advantage of coroutines becomes.
No multithreaded locking mechanism is needed. Because there is only one thread, there are no conflicts from writing a variable simultaneously, so shared resources can be managed without locks, and execution efficiency is much higher.
Note: coroutines are good at improving the efficiency of IO-bound programs; CPU-bound programs are not their strength. To make full use of the CPU you can combine multiple processes with coroutines, as sketched below.
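As a rough illustration of that combination (my own sketch, not from the article; fake_io, worker, and the batch sizes are made up), each process in a pool runs its own asyncio event loop, so CPU cores work in parallel while the IO waits inside each process overlap:

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def fake_io(i):
    await asyncio.sleep(1)          # stands in for a network or disk wait
    return i * i                    # plus a little CPU work

def worker(batch):
    # each process owns a private event loop for its coroutines
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(
            asyncio.gather(*(fake_io(i) for i in batch)))
    finally:
        loop.close()

if __name__ == "__main__":
    batches = [range(0, 5), range(5, 10)]       # hypothetical workload
    with ProcessPoolExecutor(max_workers=2) as ex:
        for result in ex.map(worker, batches):
            print(result)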
Those are the basic concepts of coroutines, which may still sound abstract, so let me explain them further with code. This article focuses on using coroutines in Python. Python 2's support for coroutines is relatively limited: the yield-based generator implementation is partial, but the gevent module provides a better one. Python 3.4 introduced the asyncio module, which supports coroutines well.
Coroutines in Python 2.x
Coroutine options in Python 2.x:
yield
gevent
There are not many modules that support coroutines in Python 2.x; gevent is the most commonly used. Here is a brief introduction to its usage.
Gevent
gevent is a third-party library that implements coroutines through greenlets. Its basic idea is:
When a greenlet encounters an IO operation, such as accessing the network, it automatically switches to another greenlet, and switches back to continue execution at the appropriate time once the IO operation completes. Because IO operations are very time-consuming and often leave the program waiting, gevent switches greenlets for us automatically, which guarantees that some greenlet is always running instead of waiting on IO.
Install
pip install gevent
Version * seems to support Windows; when I tested earlier, it did not seem to run on Windows.
Usage
First, let's look at a simple example of a crawler:
# -*- coding: utf-8 -*-
import gevent
from gevent import monkey; monkey.patch_all()
import urllib2

def get_body(i):
    print "start", i
    urllib2.urlopen("http://cn.bing.com")
    print "end", i

tasks = [gevent.spawn(get_body, i) for i in range(3)]
gevent.joinall(tasks)
Running result:
start 0
start 1
start 2
end 2
end 0
end 1
Note: judging from the result, get_body first outputs "start"; then, when urllib2 hits IO blocking, execution automatically switches to the next task (which prints the next "start"), and "end" is printed only after urllib2 returns its result. In other words, instead of waiting for urllib2 to finish the request, the program moves on and comes back for the return value after the other tasks have run. It is worth noting that only one thread executes throughout, which is what distinguishes this from multithreading.
Now replace it with multithreaded code and compare:
import threading
import urllib2

def get_body(i):
    print "start", i
    urllib2.urlopen("http://cn.bing.com")
    print "end", i

for i in range(3):
    t = threading.Thread(target=get_body, args=(i,))
    t.start()
Running result:
start 0
start 1
start 2
end 1
end 2
end 0
Note: from the result, multithreading achieves the same effect as the coroutine version; both switch away when IO blocks. The difference is that multithreading switches between threads, while coroutines switch context within one thread (think of it as switching between executing functions). The cost of switching threads is clearly greater than the cost of switching context, so as the number of threads grows, coroutines become more efficient than threads. (I would guess that the switching cost of multiple processes is * *.)
Notes on using gevent
monkey.patch_all() can make some blocking modules non-blocking. The mechanism: switch automatically whenever an IO operation is encountered; gevent.sleep(0) can be used to switch manually. (If you put gevent.sleep(0) into the crawler code in place of the IO call, you get the same context-switching effect; see the sketch after these notes.)
gevent.spawn starts a coroutine; its arguments are the function and the function's arguments.
gevent.joinall waits for all the spawned coroutines to finish.
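A minimal sketch of that manual switching (my own illustration, with made-up task names; no monkey patching is needed here because there is no real IO, only explicit yields):

import gevent

def task(name):
    for step in range(3):
        print name, "step", step
        gevent.sleep(0)      # yield control so the other greenlet can run

gevent.joinall([gevent.spawn(task, "A"),
                gevent.spawn(task, "B")])

The output interleaves A and B line by line, just like the IO-triggered switching in the crawler example above.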
Coroutines in Python 3.x
To test coroutines under Python 3.x, I installed a Python 3.6 environment with virtualenv.
Coroutine options in Python 3.x:
asyncio + yield from (Python 3.4)
asyncio + await (Python 3.5)
gevent
Python 3.4 introduced the asyncio module, which provides good support for coroutines.
asyncio
asyncio is a standard library introduced in Python 3.4 with built-in support for asynchronous IO. In asyncio, asynchronous operations are performed through yield from inside a coroutine.
Usage
Example (requires Python 3.4 or later):
import asyncio

@asyncio.coroutine
def test(i):
    print("test_1", i)
    r = yield from asyncio.sleep(1)
    print("test_2", i)

loop = asyncio.get_event_loop()
tasks = [test(i) for i in range(5)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
Running result:
test_1 3
test_1 4
test_1 0
test_1 1
test_1 2
test_2 3
test_2 0
test_2 2
test_2 4
test_2 1
Note: as the output shows, the effect is the same as with gevent: switching happens at the IO operation (so all the test_1 lines are printed first, and the test_2 lines follow). Unlike the gevent output, however, the test_1 lines are not printed in order; this is most likely because asyncio.wait stores the tasks in a set, so their start order is not guaranteed.
Notes on asyncio
@asyncio.coroutine marks a generator as a coroutine, which we then throw into the EventLoop for execution.
test() first prints test_1; then the yield from syntax lets us conveniently call another generator. Because asyncio.sleep() is itself a coroutine, the thread does not wait on asyncio.sleep(); it simply interrupts and runs the next message loop. When asyncio.sleep() returns, the thread obtains its return value from yield from (None in this case) and then continues with the next statement.
Think of asyncio.sleep(1) as a one-second IO operation. During that second the main thread does not wait; it executes the other runnable coroutines in the EventLoop, which is how concurrent execution is achieved. A small sketch of the return-value point follows.
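To illustrate how a value comes back through yield from, here is a small sketch of my own (using the same Python 3.4-era asyncio style as the example above; sub() is a made-up coroutine):

import asyncio

@asyncio.coroutine
def sub(i):
    yield from asyncio.sleep(1)     # stands in for a 1-second IO wait
    return i * 10                   # this value travels back through yield from

@asyncio.coroutine
def test(i):
    r = yield from sub(i)           # r receives sub()'s return value
    print("test", i, "got", r)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait([test(i) for i in range(3)]))
loop.close()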
async/await
To simplify coroutine code and make asynchronous IO easier to identify, Python 3.5 introduced the new keywords async and await, which make coroutine code more concise and readable.
Note that async and await are the new syntax for coroutines; to use them you only need to make two simple replacements:
Replace @asyncio.coroutine with async (i.e. declare the function with async def);
Replace yield from with await.
Usage
Example (requires Python 3.5 or later):
import asyncio

async def test(i):
    print("test_1", i)
    await asyncio.sleep(1)
    print("test_2", i)

loop = asyncio.get_event_loop()
tasks = [test(i) for i in range(5)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
The running result is the same as before.
Explanation: compared with the previous section, @asyncio.coroutine has simply been replaced by async and yield from by await; everything else stays the same.
Gevent
Usage is the same as under Python 2.x; a sketch of the Python 3 version is below.
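For completeness, here is my own sketch of the earlier crawler example adjusted for Python 3, where urllib2 has become urllib.request (assuming gevent is installed in the Python 3 environment):

import gevent
from gevent import monkey; monkey.patch_all()
import urllib.request

def get_body(i):
    print("start", i)
    urllib.request.urlopen("http://cn.bing.com")
    print("end", i)

tasks = [gevent.spawn(get_body, i) for i in range(3)]
gevent.joinall(tasks)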
Coroutines vs. multithreading
If the introduction above has already made the difference between multithreading and coroutines clear, then I don't think a benchmark is necessary. As the number of threads grows, the main overhead of multithreading is spent on thread switching, whereas coroutines switch within a single thread, so the overhead is much smaller; this is probably the fundamental performance difference between the two. (Personal opinion.)
Asynchronous crawler
Perhaps most people interested in coroutines use them to write crawlers (since coroutines handle IO blocking so well). However, I find that the commonly used urllib and requests modules cannot be combined with asyncio, perhaps because those modules are themselves synchronous (or perhaps I just haven't found the right way). So how do you use coroutines for an asynchronous crawler? Or, how do you write an asynchronous crawler?
Here are the approaches I know of:
grequests (an asynchronous version of the requests module)
A crawler module + gevent (this is the one I recommend)
aiohttp (there does not seem to be much material on it, and I have not yet worked out how to use it; a rough sketch follows this list)
asyncio's built-in fetching support (this is also difficult to use)
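Since I could not find much material on aiohttp, the following is only a rough sketch of my own of how it is commonly combined with asyncio (assuming aiohttp is installed via pip install aiohttp; Python 3.5+ syntax; the URLs are placeholders):

import asyncio
import aiohttp

urls = ["http://cn.bing.com", "http://www.example.com"]   # placeholder URLs

async def fetch(session, url):
    async with session.get(url) as resp:
        body = await resp.text()
        print(url, resp.status, len(body))

async def main():
    # one session is reused for all requests; fetches run concurrently
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session, u) for u in urls))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()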
Coroutine pool
Purpose: to control the number of coroutines running at the same time.
from bs4 import BeautifulSoup
import requests
import gevent
from gevent import monkey, pool
monkey.patch_all()

jobs = []
links = []
p = pool.Pool(10)          # at most 10 coroutines run at the same time
urls = [
    'http://www.google.com',
    # ... another 100 urls
]

def get_links(url):
    r = requests.get(url)
    if r.status_code == 200:
        soup = BeautifulSoup(r.text)
        links.extend(soup.find_all('a'))   # extend the shared list in place

for url in urls:
    jobs.append(p.spawn(get_links, url))

gevent.joinall(jobs)

That's all on the concept and usage of Python coroutines. I hope the content above is of some help and lets you learn something new. If you think the article is good, feel free to share it so more people can see it.