In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article introduces the knowledge of "how to speed up the Pandas iteration by 150x". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!
Make a data set
The easiest way to make a point clear is to declare a single-column data box object with integer values ranging from 1 to 100000:
You really don't need anything more complicated to solve the speed problem of Pandas. To verify that everything is going well, here are the first few rows and the overall shape of the dataset:
Well, now that the preparation is done, let's take a look at how to traverse and how not to traverse the rows of the data box. First of all, I introduce how not to make a choice.
Here are some things you shouldn't do.
Ah, the author has been using (and overusing) so many iterrows () methods. It is slow by default, but you know why I bother to find alternatives (short-sighted).
To prove that you shouldn't use the iterrows () method to iterate through the data box, I'll do a quick demonstration-declare a variable and initially set it to 0muri-and then increment it by the current value of the Values property at each iteration.
If you want to know the%% time magic function returns the number of seconds / milliseconds it takes for the cell to complete all operations.
Let's see how this function works:
You might now think that it's not much to iterate through 100000 lines in 15 seconds and increment the values of some external variables. But the truth is-- see the reasons in the next section.
Here's what you should do.
Now there is a magic way to save it-itertuples (). As the name implies, itertuples () loops through the rows of the data box and returns a named tuple. This means that these values cannot be accessed with parentheses [], but need to be used. The reason for the symbol.
You will now demonstrate the same example as a few minutes ago, but using the itertuples () method:
Look at that! Using itertuples () to do the same operation is about 154x faster! Now imagine your daily work scenario, where you are dealing with millions of lines-itertuples () can save you a lot of time.
In this simple example, we have seen the huge impact that small changes to the code can have on the overall result.
This does not mean that itertuples () is 150x faster than iterrows () in every scenario, but it does mean that it is faster each time.
That's all for "how to speed up the Pandas iteration by 150x". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.