In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article is to share with you about how to improve the speed of Pandas. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.
1. Optimization of data reading
Reading data is a necessary link before data analysis, and there are many data reading functions built into pandas, the most common of which is to use pd.read_csv () function to read data from csv files. Data in pkl format is the fastest to read, so for everyday data sets (mostly in csv format), you can first read them in pandas, then dump the data into pkl or hdf format, and then save some time each time you read the data. The code is as follows:
Import pandas as pd
# read csv
Df = pd.read_csv ('xxx.csv')
# pkl format
Save in df.to_pickle ('xxx.pkl') # format
Df = pd.read_pickle ('xxx.pkl') # read
# hdf format
Save in df.to_hdf ('xxx.hdf','df') # format
Df = pd.read_hdf ('xxx.pkl','df') # read
2. Optimization of aggregation operation
When using agg and transform for operations, try to use the built-in functions of Python, which can improve the running efficiency. (the data is still used by the above test case)
(1) agg+Python built-in function
(2) agg+ non-built-in functions
You can see a 60% improvement in efficiency when using built-in functions for the agg method.
(3) transform+Python built-in function
(4) transform+ non-built-in functions
For the transform method, it is twice as efficient when using built-in functions.
3. Optimize the data line by row.
Suppose we now have such a data set of electricity consumption and the price of electricity for the corresponding period of time. The dataset records the hourly power consumption. For example, the first row represents the power consumption of 0.586kwh at zero on January 13, 2001. The price of electricity is different in different periods of use, and our aim now is to find out the total electricity charge, so we need to calculate the unit electricity charge × electricity consumption in the corresponding period. Here are three writing methods, and we test them separately to compare the differences between the three writing methods and the differences in code efficiency.
# write a function to get the corresponding result
Def get_cost (kwh, hour):
If 0 pip install numba
Let's test the speed-up effect of numba with a simple example.
Import numba
@ numba.vectorize
Def f_with_numba (x):
Return x * 2
Def f_without_numba (x):
Return x * 2
# method 1: apply operation line by line
Df ["double_energy"] = df.energy_kwh.apply (f_without_numba)
# method 2: vectorization operation
Df ["double_energy"] = df.energy_kwh*2
# method 3: use numba to accelerate
# need to be passed in the form of numpy array
# otherwise an error will be reported
Df ["double_energy"] = f_with_numba (df.energy_kwh.to_numpy ())
From the test results, it once again highlights the advantages of vectorization processing, at the same time, numba can also improve the efficiency of vectorization processing which is already very fast.
Thank you for reading! This is the end of the article on "how to improve the running speed of Pandas". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, you can share it for more people to see!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.