In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
Editor to share with you how to speed up the speed of pandas computing, I believe that most people do not know much about it, so share this article for your reference, I hope you can learn a lot after reading this article, let's go to understand it!
What's bothering us?
Using pandas, when you run the following lines:
# Standard apply
Df.apply (func)
Get this CPU usage:
Standard pandas applies-only 1 CPU is used
Even if your computer has multiple CPU, only one is completely dedicated to your calculation.
Instead of using the following CPU, you want an easy way to get something like this:
Parallel Pandas applies-use all CPU
How can Pandaral lel help solve this problem?
The idea of Pandaral lel is to distribute pandas computing across all available CPU on the computer to significantly increase speed.
Installation:
$pip install pandarallel [--user]
Import and initialize:
# Import
From pandarallel import pandarallel
# Initialization
Pandarallel.initialize ()
Usage:
Using the simple use case df with pandas DataFrame and the function func to be applied, you only need to replace the parallel_apply of the classic apply.
# Standard pandas apply
Df.apply (func)
# Parallel apply
Df.parallel_apply (func)
Done!
Note that if you don't want to parallelize the calculation, you can still use the classical apply method.
You can also use the initialize function of a progress bar progress_bar=True that will display one progress bar for each work CPU.
Parallel application progress bar
And equipped with more complex cases with pandas DataFrame df, two columns of column1,column2 for the data frame and functional application func:
# Standard pandas apply
Df.groupby (column1) .column2.column (4) .apply (func)
# Parallel apply
Df.groupby (column1). Column 2. Column (4). Parallel_apply (func)
Datum
For the four examples provided here, perform the following configuration:
Https://github.com/nalepae/pandarallel/blob/master/docs/examples.ipynb
Operating system: Linux Ubuntu 16.04
Hardware: Intel Core i7 @ 3.40 GHz-4 core
Standard and parallelism on 4 cores (the lower the better)
Except that the df.groupby.col_name.rolling.apply speed only increases by x3.2 factor, the average speed increases by about x4 factor, even if the number of cores on the used computer.
How does it work under the hood?
When calling parallel_apply, Pandaral ·lel:
Instantiate an Pyarrow Plasma shared memory
Https://arrow.apache.org/docs/python/plasma.html
Create a child process for each CPU, and then require each CPU to work on a child part of the DataFrame
Merge all results into the parent process
The above is all the contents of the article "how to Speed up pandas Computing". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.