How to use the grouping function groupby and the aggregation function agg in Python


2025-07-12 Update. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/02 Report--

This article introduces how to use the pandas grouping function groupby and the aggregation function agg in Python. Many people have questions about these two functions in day-to-day work, so the editor has consulted a range of materials and collected a set of simple, easy-to-use methods. Follow along with the examples below.

groupby:

First create the data:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                       'B': [2, 7, 1, 3, 3, 2, 4, 8],
                       'C': [100, 87, 96, 130, 105, 87, 96, 155]})
    df
    Out[2]:
       A  B    C
    0  a  2  100
    1  b  7   87
    2  a  1   96
    3  c  3  130
    4  a  3  105
    5  c  2   87
    6  b  4   96
    7  c  8  155

Basic operations of groupby in pandas:

1. Group by column A and compute the mean of columns B and C.

    df.groupby('A').mean()
    Out[6]:
              B           C
    A
    a  2.000000  100.333333
    b  5.500000   91.500000
    c  4.333333  124.000000
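To make the grouping step less of a black box, here is a minimal sketch that reproduces the same means by hand: it splits the frame into groups, averages each group, and combines the results. The names manual_means and builtin_means are illustrative only and do not come from the original article.

```python
import pandas as pd

# Same sample data as in the article.
df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
# Apply: take the mean of the value columns in each sub-frame.
manual = {key: group[['B', 'C']].mean() for key, group in df.groupby('A')}

# Combine: one row per group key, one column per value column.
manual_means = pd.DataFrame(manual).T
manual_means.index.name = 'A'

# The built-in shortcut does all three steps in one call.
builtin_means = df.groupby('A').mean()

print(manual_means)
print(builtin_means)
```

Both printed tables should agree, which is a handy way to convince yourself what groupby('A').mean() is doing.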

Of course, you can also group by multiple columns to get the mean of other columns:

    df.groupby(['A', 'B']).mean()
    Out[7]:
             C
    A B
    a 1   96.0
      2  100.0
      3  105.0
    b 4   96.0
      7   87.0
    c 2   87.0
      3  130.0
      8  155.0
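Grouping by several columns produces a hierarchical (MultiIndex) row index. If a flat table is easier to work with, one option is the as_index=False argument of groupby, sketched below on the same sample data. The variable name flat is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

# as_index=False keeps the group keys as ordinary columns
# instead of moving them into the row index.
flat = df.groupby(['A', 'B'], as_index=False)['C'].mean()

print(flat)
```

Calling reset_index() on the MultiIndex result is an alternative way to reach a similar flat layout.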

2. After grouping, select the column to calculate:

    data = df.groupby('A')
    data['B'].std()
    Out[11]:
    A
    a    1.00000
    b    2.12132
    c    3.21455
    Name: B, dtype: float64

    # Select two columns, B and C (note the double brackets)
    data[['B', 'C']].mean()
    Out[12]:
              B           C
    A
    a  2.000000  100.333333
    b  5.500000   91.500000
    c  4.333333  124.000000
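The bracket style used for column selection determines the type of the result. The following sketch, using the same sample data, checks this directly; the names b_std and bc_mean are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

grouped = df.groupby('A')

# A single column name gives a Series result.
b_std = grouped['B'].std()

# A list of column names gives a DataFrame result.
bc_mean = grouped[['B', 'C']].mean()

print(type(b_std).__name__)
print(type(bc_mean).__name__)
```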

3. After grouping by A, you can use a different aggregation method for each column (note: this is very similar to Hive):

    data.agg({'B': 'mean', 'C': 'sum'})  # mean of column B, sum of column C
    Out[14]:
              B    C
    A
    a  2.000000  301
    b  5.500000  183
    c  4.333333  372
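When the aggregated columns should carry descriptive names, pandas also supports "named aggregation", where each keyword argument is an output column and each value is a (source column, function) pair. The sketch below assumes the same sample data; the output names b_mean and c_sum are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

# keyword = name of the output column
# value   = (input column, aggregation to apply)
summary = df.groupby('A').agg(b_mean=('B', 'mean'),
                              c_sum=('C', 'sum'))

print(summary)
```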

4. To apply the same aggregation method to multiple columns after grouping by A, you can use the apply function:

    # Select the value columns, then average each column within every group
    df.groupby('A')[['B', 'C']].apply(lambda g: g.mean())
    Out[25]:
              B           C
    A
    a  2.000000  100.333333
    b  5.500000   91.500000
    c  4.333333  124.000000
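apply is most useful when the per-group computation is not a single built-in aggregation. As a sketch under that assumption, the hypothetical helper value_range below returns max minus min for every column of a group.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})


def value_range(group):
    """Return max - min for each column of one group (illustrative helper)."""
    return group.max() - group.min()


# Each group arrives as a small DataFrame holding only columns B and C.
ranges = df.groupby('A')[['B', 'C']].apply(value_range)

print(ranges)
```

For plain means, sums or counts, the built-in methods or agg are usually the simpler choice; apply earns its place with custom logic like this.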

5. Split a column into range segments (bins) according to its values, then group by those segments.

Create a dataset:

    np.random.seed(0)
    df = pd.DataFrame({'Age': np.random.randint(20, 70, 100),
                       'Sex': np.random.choice(['Male', 'Female'], 100),
                       'number_of_foo': np.random.randint(1, 20, 100)})
    df.head()
    Out[38]:
       Age     Sex  number_of_foo
    0   64  Female             14
    1   67  Female             14
    2   20  Female             12
    3   23    Male             17
    4   23  Female             15

Goal: divide the Age column into range-based groups, which can be done in either of the following two ways:

    # Method 1: pass a number of bins; bins=4 gives four equal-width intervals
    pd.cut(df['Age'], bins=4)
    0      (56.75, 69.0]
    1      (56.75, 69.0]
    2    (19.951, 32.25]
    3    (19.951, 32.25]
    4    (19.951, 32.25]
    ...

    # Method 2: pass explicit bin edges, bins=[19, 40, 65, np.inf]
    pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
    Out[40]:
    0    (40.0, 65.0]
    1     (65.0, inf]
    2    (19.0, 40.0]
    3    (19.0, 40.0]
    4    (19.0, 40.0]
    ...

    # Group by the age ranges and take the mean of the numeric columns
    age_groups = pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
    df.groupby(age_groups).mean(numeric_only=True)
    Out[43]:
                        Age  number_of_foo
    Age
    (19.0, 40.0]  29.840000       9.880000
    (40.0, 65.0]  52.833333       9.452381
    (65.0, inf]   67.375000       9.250000

    # Cross-tabulate the age ranges against Sex
    pd.crosstab(age_groups, df['Sex'])
    Out[44]:
    Sex           Female  Male
    Age
    (19.0, 40.0]      22    28
    (40.0, 65.0]      18    24
    (65.0, inf]        3     5

agg:

1. After grouping by one column (A) with groupby, apply several aggregation methods to another column (B). The examples in this section use the first DataFrame, with columns A, B and C:

    # keyword = output column name, value = aggregation method
    df.groupby('A')['B'].agg(mean='mean', std='std')
    Out[16]:
           mean      std
    A
    a  2.000000  1.00000
    b  5.500000  2.12132
    c  4.333333  3.21455
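Two other spellings of single-column aggregation are worth knowing. The sketch below, on the same sample data, passes a list of method names and then a custom function under a chosen output name. The names stats, custom and spread are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

# A list of method names: one output column per method,
# named after the method itself.
stats = df.groupby('A')['B'].agg(['min', 'max', 'mean'])

# A custom function under a chosen output name ('spread' is hypothetical).
custom = df.groupby('A')['B'].agg(spread=lambda s: s.max() - s.min())

print(stats)
print(custom)
```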

2. After grouping by a column, use different aggregation methods for different columns:

    # each list holds the two methods to apply to that column
    df.groupby('A').agg({'B': ['mean', 'sum'], 'C': ['count', 'std']})
    Out[17]:
              B         C
           mean sum count        std
    A
    a  2.000000   6     3   4.509250
    b  5.500000  11     2   6.363961
    c  4.333333  13     3  34.394767
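Passing lists of methods yields two-level column labels such as ('B', 'mean'). If single-level names are preferred, one common approach is to join the two levels, as in this sketch on the same sample data. The underscore separator is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

result = df.groupby('A').agg({'B': ['mean', 'sum'],
                              'C': ['count', 'std']})

# Each column label is a tuple such as ('B', 'mean');
# join the parts into a single string like 'B_mean'.
result.columns = ['_'.join(col) for col in result.columns]

print(result)
```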

transform:

The results of the previous two approaches are indexed by the values of column A. What if you want the result to keep the original, ungrouped index instead? That is what the transform function is for. The transform(func, *args, **kwargs) method applies func to each group and then places the results back on the index of the original object:

    df
    Out[31]:
       A  B    C
    0  a  2  100
    1  b  7   87
    2  a  1   96
    3  c  3  130
    4  a  3  105
    5  c  2   87
    6  b  4   96
    7  c  8  155

    df.groupby('A')[['B', 'C']].transform('count')  # Note: count does not include NaN values
    Out[32]:
       B  C
    0  3  3
    1  2  2
    2  3  3
    3  3  3
    4  3  3
    5  3  3
    6  2  2
    7  3  3

As the output shows, after grouping by column A and counting columns B and C, group a occupies index positions [0, 2, 4], so the values at index [0, 2, 4] in the result are all 3. In effect, each group's count is broadcast back to every row of that group. The same holds for column C.
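Because the result of transform lines up with the original rows, it can be combined directly with the original columns. A minimal sketch, assuming the same sample data, subtracts each group's mean of B from the individual B values; the column name B_minus_group_mean is invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
                   'B': [2, 7, 1, 3, 3, 2, 4, 8],
                   'C': [100, 87, 96, 130, 105, 87, 96, 155]})

# One value per original row: the mean of B within that row's group.
group_mean = df.groupby('A')['B'].transform('mean')

# Same index as df, so ordinary column arithmetic works row by row.
df['B_minus_group_mean'] = df['B'] - group_mean

print(df)
```

The same pattern works for other per-group statistics, for example dividing by a group sum to get each row's share of its group.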

This concludes the walkthrough of the grouping function groupby and the aggregation function agg in Python. Combining theory with practice is the best way to learn, so try the examples yourself.

