2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use apply with pandas groupby to transform grouped data. Many people run into difficulties with this in real-world cases, so the examples below walk through how to handle them.
Knowledge: pandas GroupBy follows the split-apply-combine pattern
Here, split is the pandas groupby step, apply is a function we write ourselves, and combine is pandas merging the values returned by that function into the final result.
GroupBy.apply(function)
The first argument passed to function is a DataFrame containing the rows of one group.
The function's return value can be a DataFrame, a Series, a single value, or even something completely unrelated to the input DataFrame.
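As a minimal sketch of the three return types listed above, the following uses a small made-up DataFrame (the column names key and value are assumptions for illustration, not from the article's datasets). It shows how pandas combines the results differently depending on what the function returns.

```python
import pandas as pd

# Toy data (hypothetical), not taken from the article's datasets
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "value": [1, 3, 2, 4, 6],
})

# Select the columns to operate on, so each group is passed to the
# function as a DataFrame holding only the "value" column
grouped = df.groupby("key")[["value"]]

# 1) The function returns a single value:
#    the combined result is a Series indexed by group key
value_range = grouped.apply(lambda g: g["value"].max() - g["value"].min())

# 2) The function returns a Series:
#    the combined result is a DataFrame with one row per group
stats = grouped.apply(
    lambda g: pd.Series({"min": g["value"].min(), "max": g["value"].max()})
)

# 3) The function returns a DataFrame:
#    the pieces are concatenated, with the group key added as an index level
top1 = grouped.apply(lambda g: g.nlargest(1, "value"))

print(value_range)
print(stats)
print(top1)
```

Selecting the columns before calling apply is a design choice in this sketch; it keeps the grouping column out of the frames passed to the function.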
This article demonstrates two examples:
How do I normalize a column of values within each group?
How do I get the top N rows for each group?
Example 1: How do I normalize a column of values within each group?
Normalization maps numeric columns with different ranges onto the interval [0, 1]:
Data becomes easier to compare across columns; for example, a price field may range from hundreds to thousands while an increase field ranges from 0 to 100.
Machine learning models often train faster and perform better on normalized features.
Normalization formula: x_norm = (x - x_min) / (x_max - x_min)
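As a small illustration of min-max normalization, (x - min) / (max - min), the sketch below scales two columns with very different ranges onto [0, 1]. The data values and the helper name min_max are hypothetical and chosen only to mirror the price and increase fields mentioned above.

```python
import pandas as pd

# Hypothetical values on very different scales
df = pd.DataFrame({
    "price": [200, 800, 1400, 2000],
    "increase": [0, 25, 50, 100],
})

def min_max(s):
    """Scale a Series to [0, 1] using (x - min) / (max - min)."""
    return (s - s.min()) / (s.max() - s.min())

# DataFrame.apply calls min_max once per column
normalized = df.apply(min_max)
print(normalized)
```

After scaling, both columns run from 0 to 1, so they can be compared side by side.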
Demo: normalizing users' movie ratings
Rating habits differ between users: optimists tend to score high and pessimists tend to score low, so we normalize the ratings within each user.
import pandas as pd

# The file uses "::" as its separator, which needs the Python parsing engine
ratings = pd.read_csv(
    "./datas/movielens-1m/ratings.dat",
    sep="::",
    engine="python",
    names="UserID::MovieID::Rating::Timestamp".split("::"),
)
ratings.head()
# Group by user ID, then normalize the Rating column within each group
def ratings_norm(df):
    """
    @param df: the DataFrame for one user's group
    """
    min_value = df["Rating"].min()
    max_value = df["Rating"].max()
    # Vectorized min-max scaling; if all of a user's ratings are equal,
    # the result is NaN because the denominator is zero
    df["Rating_norm"] = (df["Rating"] - min_value) / (max_value - min_value)
    return df

ratings = ratings.groupby("UserID").apply(ratings_norm)
ratings[ratings["UserID"] == 1].head()
For UserID == 1, the lowest rating given is 3, which suggests this user is an optimist; after normalization, that rating of 3 becomes 0.
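As an alternative sketch of the same per-user normalization, GroupBy.transform can compute the new column directly, since it returns a result aligned with the original rows. This is not the article's method; the small ratings table below is made up and stands in for the MovieLens file.

```python
import pandas as pd

# Hypothetical ratings for two users, standing in for the MovieLens data
ratings = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2, 2],
    "MovieID": [10, 11, 12, 10, 11, 12],
    "Rating": [3, 4, 5, 1, 2, 5],
})

# transform applies the function to each user's ratings and returns
# a Series with the same index as the original DataFrame
ratings["Rating_norm"] = ratings.groupby("UserID")["Rating"].transform(
    lambda s: (s - s.min()) / (s.max() - s.min())
)
print(ratings)
```

Here user 1's lowest rating (3) and user 2's lowest rating (1) both map to 0, even though the raw scores differ.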
Example 2: How do I get the top N rows for each group?
Get the 2 days with the highest temperature in each month of 2018
fpath = "./datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
# Strip the ℃ suffix and convert the temperatures to integers
df["bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
df["yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")
# Add a new column for the month (the first 7 characters of ymd)
df["month"] = df["ymd"].str[:7]
df.head()
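def getWenduTopN(df, topn):
    """
    df here is the DataFrame for one month's group
    """
    # Sort ascending by high temperature and keep the last topn rows
    return df.sort_values(by="bWendu")[["ymd", "bWendu"]][-topn:]

# Extra keyword arguments given to apply are forwarded to the function
df.groupby("month").apply(getWenduTopN, topn=2).head()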
We can see that the DataFrame returned by the function passed to GroupBy.apply can have a completely different structure from the original DataFrame.
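As an alternative sketch for the top-N task in Example 2, sorting once and then taking the first rows of each group with GroupBy.head avoids writing a custom function. This is not the article's approach; the temperatures below are made up and only mimic the ymd and bWendu columns of the weather file.

```python
import pandas as pd

# Hypothetical daily highs; the real file has one row per day of 2018
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-02", "2018-01-03",
            "2018-02-01", "2018-02-02", "2018-02-03"],
    "bWendu": [3, 7, 5, 10, 8, 12],
})
df["month"] = df["ymd"].str[:7]

# Sort by temperature (highest first), keep the first 2 rows of each month,
# then order the result by month for readability
top2 = (
    df.sort_values("bWendu", ascending=False)
      .groupby("month")
      .head(2)
      .sort_values(["month", "bWendu"], ascending=[True, False])
)
print(top2)
```

Unlike apply, head keeps the original columns and index, which can be convenient when the result needs to be joined back to the source data.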
That concludes this introduction to using apply with pandas groupby. Thank you for reading.