Shulou (Shulou.com) | SLTechnology News&Howtos | Updated 2025-02-27
This article explains how to quickly master Python coroutines. The explanation is simple and clear, and easy to follow if you work through it step by step.
1. Concepts related to coroutines
1.1 Processes and threads
A process is a running instance of a program. It has its own code, data, open files and independent memory space, and it is the smallest unit of resource management in the operating system. Each process contains one or more threads. Threads carry out the program's computation, and the thread is the smallest unit of execution.
The key point is that the operating system allocates a process's resources, so control lies mainly with the OS. A thread is the unit that executes a task, and it moves through five states:

- new
- runnable: start() has been called, so it is in the scheduling pool and waiting for CPU time
- running: it has the CPU and is executing
- blocked: it has given up the CPU and is waiting again
- dead

These transitions are also controlled by the operating system. When threads share resources they need locks. The producer-consumer pattern is an example: producers put data into a shared queue and consumers take data from it.
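Below is a minimal sketch of the producer-consumer pattern. It uses the standard library's queue.Queue, which does its own internal locking, plus an explicit threading.Lock around a shared tally. The function names, the sentinel convention and the queue size are illustrative choices, not taken from the original article.

```python
import queue
import threading


def producer(q, items, n_consumers):
    for item in items:
        q.put(item)               # Queue.put does its own internal locking
    for _ in range(n_consumers):
        q.put(None)               # one sentinel per consumer: "no more data"


def consumer(q, totals, lock):
    while True:
        item = q.get()            # blocks until data is available
        if item is None:
            break
        with lock:                # explicit lock around the shared dict
            totals["sum"] += item
            totals["count"] += 1


def run(items, n_consumers=2):
    q = queue.Queue(maxsize=8)    # bounded queue: the producer blocks when it is full
    totals = {"sum": 0, "count": 0}
    lock = threading.Lock()
    threads = [threading.Thread(target=producer, args=(q, items, n_consumers))]
    threads += [threading.Thread(target=consumer, args=(q, totals, lock))
                for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


if __name__ == "__main__":
    print(run(range(100)))
```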
Handing the CPU from one thread or process to another costs performance. Each time a thread gives up the CPU, its execution state must be saved so that it can resume where it left off when it gets the CPU again. Threads also carry the extra cost of locking.
1.2 Parallelism and concurrency
Parallelism and concurrency both sound like performing different tasks at the same time, but "at the same time" means different things in each case.
Parallelism: on a multicore CPU, tasks genuinely execute at the same time. Each task uses independent resources, and there is no ordering between them.
Concurrency: tasks appear to execute at the same time, but at the micro level they run in sequence. The operating system schedules processes and the CPU switches context rapidly. Each process runs for a while and then stops, and the CPU switches to another process. Each switch is very short, so multiple tasks seem to run simultaneously. To achieve a high degree of concurrency, tasks need to be cut into small pieces.
Even on a multicore CPU, tasks only run simultaneously if the operating system schedules them on different cores. If it schedules them on the same core, that core must switch contexts between them. In practice, on multicore machines the scheduler tries to spread work across different cores as much as possible.
A context switch means saving the state and data of one task and restoring those of another. Any concurrent processing involves at least three steps: queuing, waking up and executing.
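To make "saving and restoring task state" concrete, here is a toy round-robin scheduler built on generators. It only illustrates queue / wake up / execute within a single thread; it is not how an operating system actually switches contexts. The names task and round_robin are hypothetical.

```python
from collections import deque


def task(name, steps):
    """A hypothetical task split into small steps; its local state survives each pause."""
    for i in range(steps):
        yield f"{name}:{i}"      # pause here and hand control back to the scheduler


def round_robin(tasks):
    ready = deque(tasks)         # queue of runnable tasks
    log = []
    while ready:
        current = ready.popleft()          # wake up the next task
        try:
            log.append(next(current))      # run it until its next pause
        except StopIteration:
            continue                       # task finished: do not requeue it
        ready.append(current)              # requeue it behind the others
    return log


if __name__ == "__main__":
    print(round_robin([task("A", 2), task("B", 3)]))
```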
1.3 Coroutines
We know that threads were introduced to achieve parallelism on multicore CPUs, and that thread execution is controlled entirely by the operating system. A coroutine lives inside a thread, and control lies with the user (the program itself). The essence is to let multiple routines take turns executing within one thread, so that none of them monopolizes its resources. This is how coroutines achieve high concurrency.
Advantages of coroutines:
The biggest advantage of coroutines is their very high execution efficiency. Switching between subroutines is not a thread switch; the program controls it itself, so there is no thread-switching overhead. The more concurrent tasks there are, the more pronounced the performance advantage of coroutines over multithreading becomes.
The second advantage is that no multithreaded locking mechanism is needed. There is only one thread, so two pieces of code can never write the same variable at the same time. Shared resources need no locks, only state checks, and execution is much more efficient than with multiple threads. This holds as long as a coroutine does not give up control (at an await or yield) partway through updating shared state.
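Here is a small asyncio sketch of the no-lock claim; the function names are hypothetical.

- The first version updates a shared counter with no await between the read and the write. No other coroutine can run in between, so no lock is needed.
- The second version awaits in the middle of the update and loses increments, which shows where the guarantee ends.

```python
import asyncio

counter = 0


async def safe_increment(times):
    """Read-modify-write with no await in between: cannot be interrupted."""
    global counter
    for _ in range(times):
        counter += 1


async def unsafe_increment():
    """Yields control between the read and the write, so updates can be lost."""
    global counter
    current = counter
    await asyncio.sleep(0)      # hands control back to the event loop
    counter = current + 1


async def main():
    global counter
    counter = 0
    await asyncio.gather(*(safe_increment(1000) for _ in range(10)))
    safe_total = counter

    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(10)))
    unsafe_total = counter
    return safe_total, unsafe_total


if __name__ == "__main__":
    print(asyncio.run(main()))
```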
Differences between coroutines and threads:
Coroutines do not take part in multicore parallel processing; coroutines are not parallel.
Threads run in parallel on multicore processors; on a single-core processor they are scheduled by the operating system.
A coroutine must keep the state of its last call.
A thread's state is controlled by the operating system.
Let's set these abstract concepts aside for now; they will become clearer once we connect them with actual code.
2. Threads in Python
For historical reasons, threads in Python cannot achieve true parallelism, even on a multicore CPU. The reason is the global interpreter lock (GIL). Strictly speaking, the GIL is not a feature of the Python language but of CPython, the reference interpreter. When running multiple threads, CPython uses the GIL to ensure that only one thread executes Python code at a time.
Why does CPython need the GIL? CPython, the most widely used Python interpreter, manages memory by reference counting. In Python everything is an object, and an object's reference count is the number of references pointing to it. When the count drops to 0, the object is garbage-collected and its memory is released automatically. Under multithreading, however, the reference count becomes a shared variable, and CPython's memory management is not thread-safe.
Consider two threads, A and B, that both reference an object obj, so its reference count is 2. A drops its reference and has just decremented the count to 1 when a thread switch occurs. A is suspended before it finishes its release logic. B now runs, drops its own reference, decrements the count to 0 and destroys obj, freeing its memory. When A wakes up and continues, it acts on an object that no longer exists. CPython introduced the GIL to prevent this kind of data corruption.
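Reference counts can be observed directly with sys.getrefcount. The absolute numbers are a CPython implementation detail: the call itself holds a temporary reference, and recent versions may report slightly different values. This sketch therefore only looks at how the count changes when a second name is bound and then deleted.

```python
import sys


def refcount_demo():
    obj = []                         # a fresh list object
    before = sys.getrefcount(obj)    # baseline (includes the call's own temporary reference)
    alias = obj                      # bind a second name to the same object
    after = sys.getrefcount(obj)     # one more reference than before
    del alias                        # dropping the name decrements the count again
    restored = sys.getrefcount(obj)
    return before, after, restored


if __name__ == "__main__":
    print(refcount_demo())
```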
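Each thread must acquire the GIL before it runs and release it afterwards. When a running thread releases the GIL is decided by a separate mechanism. In Python 2 this was the "check interval", a count of bytecode instructions. Since Python 3.2 it is a time-based switch interval, 5 ms by default, which can be read with sys.getswitchinterval().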
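The switch interval can be read and changed at runtime. This is a minimal sketch; the 0.001 s value is arbitrary, and the original setting is restored afterwards.

```python
import sys


def switch_interval_demo(new_value=0.001):
    default = sys.getswitchinterval()   # CPython's default is 0.005 s
    sys.setswitchinterval(new_value)    # ask running threads to offer up the GIL more often
    changed = sys.getswitchinterval()
    sys.setswitchinterval(default)      # restore the original setting
    return default, changed


if __name__ == "__main__":
    print(switch_interval_demo())
```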
On a multicore CPU, acquiring and releasing the GIL adds extra overhead, and scheduling makes it worse. When one thread releases the lock, the scheduler may hand the CPU straight back to that same thread, which immediately reacquires the GIL while the other threads keep waiting. Time is wasted acquiring and releasing the lock.
Threads in Python are therefore better suited to I/O-bound work (disk I/O or network I/O), because a thread releases the GIL while it waits on blocking I/O.
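This sketch shows why threads still help with I/O. Here time.sleep stands in for a blocking I/O call; both release the GIL while waiting. Five 0.2 s "requests" in a thread pool therefore take roughly 0.2 s in total instead of 1 s. fake_io is a hypothetical placeholder, and exact timings depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    time.sleep(seconds)      # like blocking I/O, sleeping releases the GIL
    return seconds


def timed(n=5, seconds=0.2):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as executor:
        results = list(executor.map(fake_io, [seconds] * n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed()
    print(results, f"{elapsed:.2f}s")
```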
The use of threads
```python
from concurrent import futures


def to_do(info):
    # CPU-bound busy loop
    for i in range(100000000):
        pass
    return info[0]


MAX_WORKERS = 10
param_list = []
for i in range(5):
    param_list.append(('text%s' % i, 'info%s' % i))

workers = min(MAX_WORKERS, len(param_list))
# `with` does not return until all tasks have completed, so this blocks
with futures.ThreadPoolExecutor(workers) as executor:
    results = executor.map(to_do, sorted(param_list))
# print all results
for result in results:
    print(result)

# Non-blocking style: suitable when the results are not needed right away
workers = min(MAX_WORKERS, len(param_list))
executor = futures.ThreadPoolExecutor(workers)
results = []
for idx, param in enumerate(param_list):
    result = executor.submit(to_do, param)
    results.append(result)
    print('result %s' % idx)
# manually wait for all tasks to complete (shutdown waits by default)
executor.shutdown()
print('=' * 10)
for result in results:
    print(result.result())
```
3. Processes in Python
Python's multiprocessing package sidesteps the GIL's limitation and achieves parallelism on a multicore CPU, because each process runs its own interpreter with its own GIL. multiprocessing also provides mechanisms for sharing data and memory between processes. Here we use the concurrent.futures interface. Usage is essentially the same as with threads: just replace ThreadPoolExecutor with ProcessPoolExecutor.
```python
import time
from concurrent import futures


def to_do(info):
    # CPU-bound busy loop
    for i in range(10000000):
        pass
    return info[0]


MAX_WORKERS = 10

if __name__ == '__main__':  # required where worker processes are spawned (e.g. Windows, macOS)
    start_time = time.time()
    param_list = []
    for i in range(5):
        param_list.append(('text%s' % i, 'info%s' % i))
    workers = min(MAX_WORKERS, len(param_list))
    # by default `with` does not return until all tasks have completed, so this blocks
    with futures.ProcessPoolExecutor(workers) as executor:
        results = executor.map(to_do, sorted(param_list))
    # print all results
    for result in results:
        print(result)
    print(time.time() - start_time)
    # The author measured about 0.37 s here versus about 14.94 s for the thread version
    # (note that the thread version's loop runs 10x more iterations)
```
4. Coroutines in Python
4.1 A simple coroutine
First, let's look at how Python implements coroutines. The answer is yield. The following example computes a running (moving) average.
```python
from collections import namedtuple

Result = namedtuple('Result', 'count average')


# coroutine function
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        # pause here until the caller sends in the next value
        term = yield None
        if term is None:
            break  # a None value tells the coroutine to exit
        # accumulate state; it persists between calls
        total += term
        count += 1
        average = total / count
    return Result(count, average)


# create the coroutine
coro_avg = averager()
# prime (pre-activate) the coroutine: run it up to the first yield
next(coro_avg)
# the caller feeds data to the coroutine
coro_avg.send(10)
coro_avg.send(30)
coro_avg.send(6.5)
try:
    coro_avg.send(None)
except StopIteration as exc:
    # when the coroutine finishes it raises StopIteration;
    # its return value is stored in the exception's value attribute
    result = exc.value
print(result)
```
The yield keyword does two things: it outputs a value and it yields control. The value to the right of yield is passed to the caller. At the same time, the coroutine suspends itself so that the caller's program can continue.
As the example above shows:
The coroutine uses yield to control its flow and to receive and output data.
next(): primes (pre-activates) the coroutine.
send(): passes data from the caller into the coroutine.
StopIteration: signals the end of the coroutine and carries its return value.
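This sketch is a variant of averager, assumed for illustration, that yields the current average instead of None, so each send() returns a value. It also uses inspect.getgeneratorstate to show the states the coroutine passes through: created, suspended at a yield, and closed.

```python
import inspect


def running_average():
    total, count, average = 0.0, 0, None
    while True:
        term = yield average     # output the current average, then wait for input
        total += term
        count += 1
        average = total / count


coro = running_average()
states = [inspect.getgeneratorstate(coro)]        # not started yet
next(coro)                                        # prime: run to the first yield
states.append(inspect.getgeneratorstate(coro))    # paused at the yield
outputs = [coro.send(10), coro.send(30), coro.send(6.5)]
coro.close()                                      # raises GeneratorExit inside it
states.append(inspect.getgeneratorstate(coro))    # finished

if __name__ == "__main__":
    print(states, outputs)
```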
Let's revisit the concept from 1.3: the essence of coroutines is letting multiple routines take turns executing within a single thread, so that none of them monopolizes the resources. How does the example above fit this?
Each coroutine can be treated as one task, here a moving average.
Each task can be split into small steps (subroutines), here averaging in one number at a time.
What if several tasks need to run? The caller acts as the controller.
Suppose there are 10 tasks. If the caller sends data to each task in turn, or in any order it likes, the tasks execute interleaved with one another, which achieves concurrency.
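Here is a minimal sketch of that idea, assuming a simple round-robin caller. Several averager coroutines, with the same logic as the example above, are primed and then fed one number each per round, so their steps interleave in one thread. Finally each one is told to finish. run_interleaved is a hypothetical helper name.

```python
from collections import namedtuple

Result = namedtuple('Result', 'count average')


def averager():
    total, count, average = 0.0, 0, None
    while True:
        term = yield
        if term is None:
            break
        total += term
        count += 1
        average = total / count
    return Result(count, average)


def run_interleaved(data):
    """data maps a task name to the list of numbers that task should average."""
    tasks = {name: averager() for name in data}
    for coro in tasks.values():
        next(coro)                         # prime every coroutine
    order = []
    longest = max(len(values) for values in data.values())
    for step in range(longest):            # one number per task per round
        for name, values in data.items():
            if step < len(values):
                tasks[name].send(values[step])
                order.append(name)
    results = {}
    for name, coro in tasks.items():
        try:
            coro.send(None)                # tell each task to finish
        except StopIteration as exc:
            results[name] = exc.value
    return order, results


if __name__ == "__main__":
    print(run_interleaved({"a": [10, 30], "b": [1, 2, 3]}))
```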
4.2 asyncio: Python's coroutine package
asyncio provides asynchronous I/O. It suits, for example, network requests at very high concurrency (even millions of concurrent connections). Asynchronous I/O means that instead of waiting for an operation to finish, you can do other work in the meantime. asyncio is built on coroutines underneath. Let's look at an example to understand its internals.
```python
import time
import asyncio

now = lambda: time.time()


# async def defines a coroutine
async def do_some_work(x):
    print("waiting:", x)
    # await suspends this coroutine (similar to yield); usually wraps a slow operation
    await asyncio.sleep(x)
    return "Done after {}s".format(x)


# callback run when a task finishes; plays a role similar to yield's output
def callback(future):
    print("callback:", future.result())


start = now()
# create the event loop explicitly; calling asyncio.get_event_loop()
# with no running loop is deprecated in recent Python versions
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
tasks = []
for i in range(1, 4):
    # create several coroutines and wrap each in a Task scheduled on the loop
    coroutine = do_some_work(i)
    task = loop.create_task(coroutine)
    task.add_done_callback(callback)
    tasks.append(task)
try:
    # run the tasks until all are done (asyncio.gather can also collect several tasks)
    loop.run_until_complete(asyncio.wait(tasks))
    for task in tasks:
        print("Task ret:", task.result())
except KeyboardInterrupt:
    # state control of the tasks: cancel whatever is still pending
    pending = asyncio.all_tasks(loop)
    for task in pending:
        print(task.cancel())
    loop.run_until_complete(asyncio.gather(*pending, return_exceptions=True))
finally:
    loop.close()

print("Time:", now() - start)
```
The concepts involved are:
event_loop (event loop): the program starts an infinite loop and registers functions (coroutines) with it; when an event occurs, the loop calls the corresponding coroutine.
coroutine: a coroutine object. Calling a function defined with async def does not execute it immediately; it returns a coroutine object. The coroutine object must be registered with the event loop, which then runs it.
task: a coroutine object can be suspended natively; a task wraps a coroutine further and tracks its state (pending, done, cancelled).
future: represents the result of work that will or will not complete in the future. There is no fundamental difference between a future and a task; Task is in fact a subclass of Future.
async/await: keywords introduced in Python 3.5 for defining coroutines. async def defines a coroutine, and await suspends it on an asynchronous, potentially blocking call. As the example shows, asyncio takes over the caller's role through the event loop, including sending data into the coroutines. All we have to do is follow these steps:
1. Define coroutines with async.
2. Mark the blocking points with await.
3. Wrap the coroutines into tasks (futures).
4. Put the tasks into the event loop and wait for the results.
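For comparison, here is a sketch of the same pattern using asyncio.run and asyncio.gather (available since Python 3.7), which create the event loop and the tasks for you. The sleep durations are arbitrary. Because the three coroutines wait concurrently, the total time is close to the longest sleep rather than the sum.

```python
import asyncio
import time


async def do_some_work(x):
    await asyncio.sleep(x)      # suspends this coroutine; the loop runs others meanwhile
    return f"Done after {x}s"


async def main():
    # gather wraps each coroutine in a Task and waits for all of them, keeping input order
    return await asyncio.gather(*(do_some_work(x) for x in (0.1, 0.2, 0.3)))


start = time.perf_counter()
results = asyncio.run(main())   # creates, runs and closes an event loop
elapsed = time.perf_counter() - start

if __name__ == "__main__":
    print(results, f"{elapsed:.2f}s")
```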
Next, let's look at an HTTP download example. Suppose you want to download five different URLs; the same approach applies to handling millions of external requests.
```python
import asyncio
import time

from aiohttp import ClientSession  # third-party package: pip install aiohttp

url = "https://www.baidu.com/{}"


async def hello(session, url):
    async with session.get(url) as response:
        body = await response.read()
        # print(body)
        print('Hello World: %s' % time.time())


async def main():
    # one session is shared by all requests, as aiohttp recommends
    async with ClientSession() as session:
        tasks = [asyncio.create_task(hello(session, url.format(i))) for i in range(5)]
        await asyncio.wait(tasks)


if __name__ == '__main__':
    asyncio.run(main())
```
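Real crawlers usually cap how many requests are in flight at once, and this sketch uses asyncio.Semaphore for that. fake_fetch is a hypothetical stand-in that sleeps instead of making a network request, so the example runs without aiohttp or network access. The limit of 50 and the example.com URLs are arbitrary.

```python
import asyncio


async def fake_fetch(url, delay=0.01):
    """Hypothetical stand-in for a real HTTP request (no network access)."""
    await asyncio.sleep(delay)
    return f"<body of {url}>"


async def fetch_all(urls, limit=100):
    sem = asyncio.Semaphore(limit)      # at most `limit` requests in flight
    in_flight = 0
    peak = 0

    async def bounded(url):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    bodies = await asyncio.gather(*(bounded(u) for u in urls))
    return bodies, peak


urls = [f"https://example.com/{i}" for i in range(1000)]
bodies, peak = asyncio.run(fetch_all(urls, limit=50))

if __name__ == "__main__":
    print(len(bodies), peak)
```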
4.3 Application scenarios for coroutines
Highly concurrent I/O, for example writing servers that handle many simultaneous connections
Providing concurrency in place of threads
Tornado and gevent implement similar functionality, and so does Twisted, mentioned in a previous article
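As an illustration of the high-concurrency server scenario, here is a minimal asyncio TCP echo server, with a batch of clients served concurrently by a single thread. Two assumptions keep the sketch self-contained: it binds to 127.0.0.1 on port 0 (the OS picks a free port), and it uses a simple line-based protocol.

```python
import asyncio


async def handle_echo(reader, writer):
    data = await reader.readline()      # waiting here does not block other clients
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, message):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def main(n_clients=20):
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]
    async with server:                  # closes the server when the block exits
        replies = await asyncio.gather(*(client(port, f"msg {i}") for i in range(n_clients)))
    return replies


if __name__ == "__main__":
    print(asyncio.run(main()))
```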
That concludes this introduction to Python coroutines. Working through it should give you a deeper understanding of how they work, though the specifics are best verified in practice.