
How to improve the running speed of Pandas

2025-01-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

This article shares several ways to improve the running speed of Pandas. They are practical enough to be worth keeping as a reference.

1. Optimization of data reading

Reading data is a necessary step before any data analysis. pandas has many built-in reading functions, the most common being pd.read_csv() for csv files. Data in pkl format is the fastest to read, so for everyday data sets (mostly in csv format) you can read them into pandas once, dump the data to pkl or hdf format, and save time on every later read. The code is as follows:

import pandas as pd

# read the csv file
df = pd.read_csv('xxx.csv')

# pkl format
df.to_pickle('xxx.pkl')             # save as pickle
df = pd.read_pickle('xxx.pkl')      # read

# hdf format (requires the PyTables package)
df.to_hdf('xxx.hdf', key='df')      # save as HDF5
df = pd.read_hdf('xxx.hdf', key='df')  # read
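To check how much time the conversion saves on your own machine, the two readers can be timed side by side. This is a minimal sketch, assuming a synthetic data frame written to temporary files (the file names, column names and data are illustrative, not from the article):

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

# Synthetic data standing in for 'xxx.csv' (an assumption for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 5)), columns=list("abcde"))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    # time a single read of each format
    start = time.perf_counter()
    from_csv = pd.read_csv(csv_path)
    csv_seconds = time.perf_counter() - start

    start = time.perf_counter()
    from_pkl = pd.read_pickle(pkl_path)
    pkl_seconds = time.perf_counter() - start

print(f"read_csv:    {csv_seconds:.4f} s")
print(f"read_pickle: {pkl_seconds:.4f} s")
```

The absolute numbers depend on the machine and the data, so a single run like this is only a rough indication; repeating the reads gives a steadier comparison.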

2. Optimization of aggregation operation

When using agg and transform, try to use Python's built-in functions, which improves running efficiency. (The data is the same as in the test case above.)

(1) agg + Python built-in function

(2) agg + non-built-in function

For the agg method, you can see a 60% improvement in efficiency when using built-in functions.

(3) transform + Python built-in function

(4) transform + non-built-in function

For the transform method, using built-in functions is twice as efficient.
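As a minimal sketch of the four cases, assuming that "built-in" refers to aggregations pandas can dispatch by name (such as "sum" or "mean") and "non-built-in" to an equivalent lambda, the two styles look like this. The data frame and the column names group and value are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.integers(0, 50, size=100_000),  # hypothetical grouping key
    "value": rng.random(100_000),                # hypothetical numeric column
})
grouped = df.groupby("group")["value"]

# (1) agg with a named built-in aggregation
agg_builtin = grouped.agg("sum")
# (2) agg with an equivalent user-defined function
agg_custom = grouped.agg(lambda s: s.sum())

# (3) transform with a named built-in aggregation
tr_builtin = grouped.transform("mean")
# (4) transform with an equivalent user-defined function
tr_custom = grouped.transform(lambda s: s.mean())
```

Both styles return the same values; wrapping each call in timeit (or %timeit in a notebook) shows the difference in speed on your own data.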

3. Optimization of row-by-row operations

Suppose we have a data set of electricity consumption together with the electricity price for each time period. The data set records hourly power consumption; for example, the first row shows a consumption of 0.586 kWh at 00:00 on January 13, 2001. The price differs by time of use, and our aim is to find the total electricity charge, so we need to multiply the unit price for each period by the consumption in that period. Here are three ways to write this; we test each one to compare how they differ and how efficient the code is.
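As a rough sketch of this kind of comparison, the same charge can be computed with an explicit loop, with apply, and with a vectorized expression. The tariff rates, the date_time column name and the synthetic data below are illustrative assumptions, not the article's figures:

```python
import numpy as np
import pandas as pd

# Hypothetical tariff in cents per kWh (an assumption, not the article's rates)
def get_cost(kwh, hour):
    if 0 <= hour < 7:
        rate = 12
    elif 7 <= hour < 17:
        rate = 20
    else:
        rate = 28
    return rate * kwh

# Synthetic hourly readings standing in for the real data set
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date_time": pd.date_range("2001-01-13", periods=48, freq="h"),
    "energy_kwh": rng.random(48),
})

# Way 1: explicit Python loop over the rows
costs = []
for _, row in df.iterrows():
    costs.append(get_cost(row["energy_kwh"], row["date_time"].hour))
df["cost_loop"] = costs

# Way 2: apply the function to each row
df["cost_apply"] = df.apply(
    lambda r: get_cost(r["energy_kwh"], r["date_time"].hour), axis=1
)

# Way 3: vectorized, choosing the rate for every row at once
hours = df["date_time"].dt.hour
rates = np.select([hours < 7, hours < 17], [12, 20], default=28)
df["cost_vec"] = rates * df["energy_kwh"]

total_cost = df["cost_vec"].sum()
```

All three columns hold the same values; only the time needed to compute them differs, and that gap grows with the number of rows.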

# write a function to get the corresponding result
def get_cost(kwh, hour):
    if 0 [...]

pip install numba

Let's test the speed-up effect of numba with a simple example.

import numba

@numba.vectorize
def f_with_numba(x):
    return x * 2

def f_without_numba(x):
    return x * 2

# method 1: apply, row by row
df["double_energy"] = df.energy_kwh.apply(f_without_numba)

# method 2: vectorized operation
df["double_energy"] = df.energy_kwh * 2

# method 3: use numba to accelerate
# the input needs to be passed as a numpy array,
# otherwise an error will be reported
df["double_energy"] = f_with_numba(df.energy_kwh.to_numpy())

The test results once again highlight the advantage of vectorized processing. At the same time, numba can further speed up vectorized processing that is already very fast.
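One way to run such a comparison yourself is to time each method with the standard timeit module. The sketch below assumes a synthetic energy_kwh column and treats numba as optional (its timing is skipped if the package is not installed); absolute timings will vary by machine:

```python
import timeit

import numpy as np
import pandas as pd

try:
    import numba  # optional; the numba timing is skipped if it is missing
except ImportError:
    numba = None

# Synthetic stand-in for the electricity data (an assumption)
df = pd.DataFrame({"energy_kwh": np.random.default_rng(0).random(200_000)})

def f_without_numba(x):
    return x * 2

RUNS = 5
timings = {
    "apply": timeit.timeit(
        lambda: df.energy_kwh.apply(f_without_numba), number=RUNS
    ),
    "vectorized": timeit.timeit(lambda: df.energy_kwh * 2, number=RUNS),
}

if numba is not None:
    f_with_numba = numba.vectorize(f_without_numba)
    values = df.energy_kwh.to_numpy()
    f_with_numba(values)  # first call compiles; keep it out of the timing
    timings["numba"] = timeit.timeit(lambda: f_with_numba(values), number=RUNS)

for name, seconds in timings.items():
    print(f"{name:>10}: {seconds / RUNS:.6f} s per run")
```

The warm-up call matters: numba compiles the function the first time it is called, so including that call in the measurement would make numba look slower than it is in steady use.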

