2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article shows how to use transform() together with groupby() in Pandas, working through one example solved in two ways.
First, suppose we have the following restaurant data set:
import pandas as pd
df = pd.DataFrame({
    'restaurant_id': [101, 102, 103, 104, 105, 106, 107],
    'address': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 102, 103, 104, 105, 106, 107],
})
Suppose we want to know what percentage of its city's total sales each restaurant accounts for.
Compared to the original data set, the expected output has two more columns: the total sales of all restaurants in the restaurant's city, and the restaurant's percentage of that total. There are two solutions:
Option 1 (more verbose):
1. Use groupby('city') to group by city. For each group, select the sales column with ['sales'], then sum that city's sales with sum() (apply(sum) also works).
After that, rename the new column to city_total_sales and reset the index. Note that reset_index() cannot be omitted: groupby('city') makes city the index of the result, and we want city to be an ordinary column so it can be merged on.
city_sales = df.groupby('city')['sales'].sum().rename('city_total_sales').reset_index()
The resulting city_sales has one row per city, with the columns city and city_total_sales.
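To make the role of reset_index() concrete, here is a minimal sketch that inspects the result before and after the call. The sales figures are assumed values chosen for illustration, and the intermediate name totals is hypothetical.

```python
import pandas as pd

# Illustrative data; the sales values are assumptions, not figures from the article.
df = pd.DataFrame({
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 20, 70, 40, 60, 25, 75],
})

# Before reset_index(): a Series whose index holds the city labels.
totals = df.groupby('city')['sales'].sum().rename('city_total_sales')
print(totals.index.name)         # city

# After reset_index(): a DataFrame in which city is an ordinary column.
city_sales = totals.reset_index()
print(list(city_sales.columns))  # ['city', 'city_total_sales']
```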
2. Merge city_sales back into df with the merge() function to get df_new. Since city is the only column the two frames share, it is used as the join key:
df_new = pd.merge(df, city_sales, how='left')
3. Finally, calculate the percentage and format it with two decimal places:
df_new['pct'] = df_new['sales'] / df_new['city_total_sales']
df_new['pct'] = df_new['pct'].apply(lambda x: format(x, '.2%'))
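The three steps of Option 1 can be put together as one runnable sketch. The sales figures below are assumed values, picked so that each city totals 100 and the percentages are easy to check by eye. Passing on='city' to merge() is an explicit variant of the default join on shared columns.

```python
import pandas as pd

# Illustrative data; the sales values are assumptions, not figures from the article.
df = pd.DataFrame({
    'restaurant_id': [101, 102, 103, 104, 105, 106, 107],
    'address': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 20, 70, 40, 60, 25, 75],
})

# Step 1: one row per city with the summed sales.
city_sales = (
    df.groupby('city')['sales']
      .sum()
      .rename('city_total_sales')
      .reset_index()
)

# Step 2: left-merge the totals back onto every restaurant row.
df_new = pd.merge(df, city_sales, on='city', how='left')

# Step 3: share of the city total, formatted as a percentage string.
df_new['pct'] = (df_new['sales'] / df_new['city_total_sales']).map('{:.2%}'.format)

print(df_new)
```

Note that pct ends up as a column of strings, which is convenient for display but not for further arithmetic. Keeping the unformatted ratio in a separate column may be preferable if more calculations follow.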
Option 2 (more convenient):
1. The transform() function returns a result with the same number of rows as the original data set. Using groupby() followed by transform('sum') therefore produces the city totals already aligned with the original rows, so they can be assigned directly as a new column:
df['city_total_sales'] = df.groupby('city')['sales'].transform('sum')
The code translates as follows: the dataset is grouped based on the city, then the sales column is selected, the sales of each group are summed up, and a new column with the same length as the original column is returned.
2. Calculate the percentage exactly as in Option 1:
df['pct'] = df['sales'] / df['city_total_sales']
df['pct'] = df['pct'].apply(lambda x: format(x, '.2%'))
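Option 2 as a whole fits in a few lines, sketched here under the same assumptions as before: the sales figures are illustrative values chosen so that each city totals 100.

```python
import pandas as pd

# Illustrative data; the sales values are assumptions, not figures from the article.
df = pd.DataFrame({
    'restaurant_id': [101, 102, 103, 104, 105, 106, 107],
    'address': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 20, 70, 40, 60, 25, 75],
})

# transform('sum') repeats each city's total on every row of that city,
# so the result shares df's index and can be assigned without a merge.
df['city_total_sales'] = df.groupby('city')['sales'].transform('sum')

# Share of the city total, formatted as a percentage string.
df['pct'] = (df['sales'] / df['city_total_sales']).map('{:.2%}'.format)

print(df[['restaurant_id', 'city', 'sales', 'city_total_sales', 'pct']])
```

The intermediate city_sales frame and the merge step are not needed here, which is the main practical difference from Option 1.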
Summary: after grouping a DataFrame with groupby(), apply() with an aggregating function or a statistical function called directly (such as sum()) returns one value per group, so the result is as long as the number of groups. transform() returns one value per original row, so the result has the same length as the columns of the DataFrame.
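The length difference described in the summary can be checked directly. This is a small sketch with assumed sales values; the names grouped, aggregated, applied, and broadcast are hypothetical and used only for illustration.

```python
import pandas as pd

# Illustrative data; the sales values are assumptions, not figures from the article.
df = pd.DataFrame({
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 20, 70, 40, 60, 25, 75],
})

grouped = df.groupby('city')['sales']

aggregated = grouped.sum()                   # one value per city, indexed by city
applied = grouped.apply(lambda s: s.sum())   # same shape as aggregated
broadcast = grouped.transform('sum')         # one value per row, indexed like df

print(len(aggregated), len(applied), len(broadcast))  # 3 3 7
print(broadcast.index.equals(df.index))               # True
```

Because the transform() result shares the index of df, it can be assigned as a new column directly, while the aggregated result has to be merged or mapped back first.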
That covers how to use transform() together with groupby() in Pandas. Thank you for reading!