2025-01-17 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report
Today, I will talk to you about how to use an open source tool to visualize multithreaded Python programs, a topic many people may not understand well. To help you understand it better, I have summarized the following; I hope you can get something out of this article.
VizTracer can trace concurrent Python programs to help with logging, debugging, and profiling.
Concurrency is an essential part of modern programming because we have multiple cores and many tasks that need to cooperate. However, concurrent programs do not run sequentially, which makes them hard to understand. For engineers, finding bugs and performance problems in these programs is not as easy as in single-threaded, single-task programs.
In Python, you have a variety of concurrency options. Probably the most common are multithreading with the threading module, multiprocessing with the subprocess and multiprocessing modules, and, more recently, the async syntax provided by the asyncio module. Before VizTracer, there was a lack of tools to analyze programs that use these techniques.
VizTracer is a tool for tracing and visualizing Python programs, which is very helpful for logging, debugging, and profiling. Although it is easy to use on single-threaded, single-task programs, its usefulness on concurrent programs is what makes it unique.
Try a simple task
Start with a simple exercise: determine whether each integer in an array is prime and return an array of Booleans. Here is a simple solution:
def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]
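As an aside (not from the original article): the trial division above tests every divisor up to n, which is O(n) per number. A common optimization, sketched below, checks divisors only up to the square root and handles n < 2 explicitly, which the naive version does not:

```python
import math

def is_prime_fast(n):
    # A correct check should reject 0 and 1 (and negatives).
    if n < 2:
        return False
    # If n = a * b with a <= b, then a <= sqrt(n), so testing
    # divisors up to isqrt(n) is sufficient.
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True
```

Profiling often reveals that an algorithmic fix like this beats adding concurrency, which is worth keeping in mind while reading the timings below.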
Try to run it normally, in a single thread, with VizTracer:

import random

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(6000)]
    get_prime_arr(num_arr)

viztracer my_program.py
Running code in a single thread
The call-stack report shows that it takes about 140 ms, with most of the time spent in get_prime_arr.
Call-stack report
It just executes the is_prime function over and over on the elements in the array. This is what you would expect, and it is not particularly interesting (if you know VizTracer).
Try a multithreaded program
Try to do this with a multithreaded program:
from threading import Thread
import random

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(2000)]
    thread1 = Thread(target=get_prime_arr, args=(num_arr,))
    thread2 = Thread(target=get_prime_arr, args=(num_arr,))
    thread3 = Thread(target=get_prime_arr, args=(num_arr,))
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
To match the workload of the single-threaded program, this uses a 2,000-element array for each of three threads, simulating a situation in which three threads share the task.
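A side note, not from the original article: the manual start/join bookkeeping above can be replaced with the standard library's concurrent.futures.ThreadPoolExecutor, which behaves identically with respect to the GIL. A self-contained sketch (with a smaller workload than the article's, for brevity):

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Same naive checks as in the article, repeated here so the
# sketch is self-contained.
def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]

# A smaller array than the article uses, just for brevity.
num_arr = [random.randint(100, 1000) for _ in range(200)]

# Three workers mirror the three explicit Thread objects; the GIL
# still serializes the bytecode, so this is no faster than one thread.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(get_prime_arr, num_arr) for _ in range(3)]
    results = [f.result() for f in futures]
```

The executor also propagates exceptions from workers through result(), which the bare Thread version silently swallows.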
Multi-thread program
If you are familiar with Python's global interpreter lock (GIL), you will expect that this won't be any faster. Because of the extra overhead, it took a little more than 140 ms. However, you can observe the concurrency of multiple threads:
Concurrency of multiple threads
When one thread is working (executing multiple is_prime calls), the other threads are frozen (each mid-way through an is_prime call); later, they switch. This is because of the GIL, and it is why Python does not have true multithreading: it can achieve concurrency, but not parallelism.
Try using multiple processes
The way to achieve parallelism is the multiprocessing library. Here is another version that uses multiprocessing:
from multiprocessing import Process
import random

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(2000)]
    p1 = Process(target=get_prime_arr, args=(num_arr,))
    p2 = Process(target=get_prime_arr, args=(num_arr,))
    p3 = Process(target=get_prime_arr, args=(num_arr,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()
To run it using VizTracer, you need an extra parameter:
viztracer --log_multiprocess my_program.py
Running with extra argument
The whole program finishes in a little over 50 ms, with the actual tasks completing before the 50 ms mark. The program is roughly three times faster.
For comparison with the multithreaded version, here is the multiprocess version:
Multi-process version
Without the GIL, multiple processes can run in parallel; that is, multiple is_prime calls can execute at the same time.
However, Python's multithreading is not useless. Unlike compute-intensive programs, I/O-intensive programs can benefit from it. You can use sleep to fake an I/O-bound task:

import time

def io_task():
    time.sleep(0.01)
Try it in a single-threaded, single-task program:
if __name__ == "__main__":
    for _ in range(3):
        io_task()
I/O-bound single-thread, single-task program
The whole program takes about 30 ms; nothing special.
Now use multithreading:
from threading import Thread

if __name__ == "__main__":
    thread1 = Thread(target=io_task)
    thread2 = Thread(target=io_task)
    thread3 = Thread(target=io_task)
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
I/O-bound multi-thread program
The program now takes about 10 ms, and it is clear that the three threads work concurrently, which improves overall performance.
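A self-contained timing sketch (durations are illustrative, not the article's) shows why: time.sleep releases the GIL, so three threads sleeping 0.05 s each finish in roughly 0.05 s of wall time rather than the 0.15 s a sequential run would take:

```python
import time
from threading import Thread

def io_task(duration=0.05):
    # time.sleep releases the GIL, so the other threads
    # can run (sleep) at the same time.
    time.sleep(duration)

start = time.perf_counter()
threads = [Thread(target=io_task) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.3f} s")  # roughly 0.05 s, not 0.15 s
```

The same overlap is what VizTracer's timeline makes visible without any manual instrumentation.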
Try using asyncio.
Python has introduced another interesting feature: asynchronous programming. You can make an async version of the task:
import asyncio

async def io_task():
    await asyncio.sleep(0.01)

async def main():
    t1 = asyncio.create_task(io_task())
    t2 = asyncio.create_task(io_task())
    t3 = asyncio.create_task(io_task())
    await t1
    await t2
    await t3

if __name__ == "__main__":
    asyncio.run(main())
Since asyncio is literally a single-threaded scheduler with tasks, you can use VizTracer directly on it:
VizTracer with asyncio
It still takes about 10 ms, but most of the functions shown are from asyncio's underlying machinery, which may not be what you are interested in. To solve this, you can use --log_async to separate out the real tasks:
viztracer --log_async my_program.py
Using --log_async to separate tasks
Now the user's tasks are much clearer. Most of the time, no task is running (because the only thing they do is sleep). The interesting part is here:
Graph of task creation and execution
This shows when each task was created and executed. Task-1 is the main() coroutine, which creates the other tasks. Task-2, Task-3, and Task-4 execute io_task, sleep, and then wait to be woken up. As the graph shows, because this is a single-threaded program, there is no overlap between tasks; VizTracer visualizes this to make it easier to understand.
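As a side note (standard asyncio, not shown in the article), the three create_task/await pairs can be written more compactly with asyncio.gather, which schedules the coroutines concurrently and collects their results in argument order. The return value here is hypothetical, added only to show how results come back:

```python
import asyncio

async def io_task(n):
    await asyncio.sleep(0.01)
    return n * 2  # hypothetical result, just to demonstrate collection

async def main():
    # gather runs all three concurrently and preserves argument order.
    return await asyncio.gather(io_task(1), io_task(2), io_task(3))

results = asyncio.run(main())
print(results)
```

On VizTracer's timeline this produces the same task layout as the explicit create_task version.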
To make it more interesting, add a blocking time.sleep call to the task, which blocks the async event loop:

async def io_task():
    time.sleep(0.01)
    await asyncio.sleep(0.01)
Time.sleep call
The program takes much longer (about 40 ms), and the blocking calls fill the gaps in the async scheduler's timeline.
This feature is very helpful in diagnosing behavior and performance problems of asynchronous programs.
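If a blocking call like time.sleep really must happen inside a coroutine, one common remedy (assuming Python 3.9+, and not covered in the article) is asyncio.to_thread, which runs the blocking function in a worker thread so the event loop stays responsive:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.01)  # a stand-in for real blocking I/O
    return "done"

async def io_task():
    # Offload the blocking call to a thread; the event loop keeps
    # scheduling other tasks while it runs.
    result = await asyncio.to_thread(blocking_io)
    await asyncio.sleep(0.01)
    return result

async def main():
    return await asyncio.gather(io_task(), io_task(), io_task())

outcome = asyncio.run(main())
print(outcome)
```

With this change, a VizTracer trace should no longer show the tasks filling the scheduler's gaps, because the event loop is never blocked.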
See what is really happening with VizTracer
With VizTracer, you can see the progress of your program on a timeline instead of piecing it together from complex logs. This helps you understand your concurrent programs better.
VizTracer is open source, released under the Apache 2.0 license, and supports all common operating systems (Linux, macOS, and Windows). You can learn more about its functions and access its source code in VizTracer's GitHub repository.
After reading the above, do you have a better understanding of how to use an open source tool to visualize multithreaded Python programs? Thank you for your support.