Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to Speed up pandas Computing

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

Editor to share with you how to speed up the speed of pandas computing, I believe that most people do not know much about it, so share this article for your reference, I hope you can learn a lot after reading this article, let's go to understand it!

What's bothering us?

Using pandas, when you run the following lines:

# Standard apply

Df.apply (func)

Get this CPU usage:

Standard pandas applies-only 1 CPU is used

Even if your computer has multiple CPU, only one is completely dedicated to your calculation.

Instead of using the following CPU, you want an easy way to get something like this:

Parallel Pandas applies-use all CPU

How can Pandaral lel   help solve this problem?

The idea of Pandaral lel   is to distribute pandas computing across all available CPU on the computer to significantly increase speed.

Installation:

$pip install pandarallel [--user]

Import and initialize:

# Import

From pandarallel import pandarallel

# Initialization

Pandarallel.initialize ()

Usage:

Using the simple use case df with pandas DataFrame and the function func to be applied, you only need to replace the parallel_apply of the classic apply.

# Standard pandas apply

Df.apply (func)

# Parallel apply

Df.parallel_apply (func)

Done!

Note that if you don't want to parallelize the calculation, you can still use the classical apply method.

You can also use the initialize function of a progress bar progress_bar=True that will display one progress bar for each work CPU.

Parallel application progress bar

And equipped with more complex cases with pandas DataFrame df, two columns of column1,column2 for the data frame and functional application func:

# Standard pandas apply

Df.groupby (column1) .column2.column (4) .apply (func)

# Parallel apply

Df.groupby (column1). Column 2. Column (4). Parallel_apply (func)

Datum

For the four examples provided here, perform the following configuration:

Https://github.com/nalepae/pandarallel/blob/master/docs/examples.ipynb

Operating system: Linux Ubuntu 16.04

Hardware: Intel Core i7 @ 3.40 GHz-4 core

Standard and parallelism on 4 cores (the lower the better)

Except that the df.groupby.col_name.rolling.apply speed only increases by x3.2 factor, the average speed increases by about x4 factor, even if the number of cores on the used computer.

How does it work under the hood?

When calling parallel_apply, Pandaral ·lel:

Instantiate an Pyarrow Plasma shared memory

Https://arrow.apache.org/docs/python/plasma.html

Create a child process for each CPU, and then require each CPU to work on a child part of the DataFrame

Merge all results into the parent process

The above is all the contents of the article "how to Speed up pandas Computing". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report