How to use groupby grouping in Pandas

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to use groupby grouping in Pandas through worked examples, and also covers multilevel indexes, string methods, and index-based queries. It is meant as a reference for readers who want to try these operations themselves.

Groupby grouping

Build a small DataFrame, then group it by one column, by two columns, and by a function applied to the column labels:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8),
                       'D': np.random.randn(8)})
    print(df)
    grouped = df.groupby('A')
    print('-' * 30)
    print(grouped.count())
    print('-' * 30)
    grouped = df.groupby(['A', 'B'])
    print(grouped.count())
    print('-' * 30)

    # group by a function
    def get_letter_type(letter):
        if letter.lower() in 'aeiou':
            return 'a'
        else:
            return 'b'

    # axis=1 applies the function to the column labels
    # (the axis argument is deprecated in pandas 2.1 and later)
    grouped = df.groupby(get_letter_type, axis=1)
    print(grouped.count())

Output:

         A      B         C         D
    0  foo    one  1.429387  0.643569
    1  bar    one -0.858448 -0.213034
    2  foo    two  0.375644  0.214584
    3  bar  three  0.042284 -0.330481
    4  foo    two -1.421967  0.768176
    5  bar    two  1.293483 -0.399003
    6  foo    one -1.101385 -0.236341
    7  foo  three -0.852603 -1.718694
    ------------------------------
         B  C  D
    A
    bar  3  3  3
    foo  5  5  5
    ------------------------------
               C  D
    A   B
    bar one    1  1
        three  1  1
        two    1  1
    foo one    2  2
        three  1  1
        two    2  2
    ------------------------------
       a  b
    0  1  3
    1  1  3
    2  1  3
    3  1  3
    4  1  3
    5  1  3
    6  1  3
    7  1  3

A Series can be grouped by its index level:

    se = pd.Series([1, 2, 3, 4, 5], [6, 9, 8, 9, 8])
    print(se)
    se.groupby(level=0)

Output:

    6    1
    9    2
    8    3
    9    4
    8    5
    dtype: int64

    # sum within each group
    grouped = se.groupby(level=0).sum()
    print(grouped)

Output:

    6    1
    8    8
    9    6
    dtype: int64

Use get_group to pull out a single group:

    # the last Y value is lost in the source; 4 is assumed here
    df2 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 2, 3, 4]})
    print(df2)

Output:

       X  Y
    0  A  1
    1  B  2
    2  A  3
    3  B  4

    # group by column X and query the data of group A
    grp = df2.groupby('X').get_group('A')
    print(grp)

Output:

       X  Y
    0  A  1
    2  A  3

Pandas multilevel index

    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
    index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
    print(index)

Output:

    MultiIndex([('bar', 'one'),
                ('bar', 'two'),
                ('baz', 'one'),
                ('baz', 'two'),
                ('foo', 'one'),
                ('foo', 'two'),
                ('qux', 'one'),
                ('qux', 'two')],
               names=['first', 'second'])

    s = pd.Series(np.random.randn(8), index=index)
    print(s)

Output:

    first  second
    bar    one       0.120979
           two      -0.440384
    baz    one       0.515106
           two      -0.019882
    foo    one       1.149595
           two      -0.369984
    qux    one      -0.930438
           two       0.146044
    dtype: float64

    # sum within each group of the first level
    grouped = s.groupby(level='first')
    print(grouped.sum())

Output:

    first
    bar   -0.319405
    baz    0.495224
    foo    0.779611
    qux   -0.784394
    dtype: float64

    grouped = df.groupby(['A', 'B'])
    print(grouped.size())

Output:

    A    B
    bar  one      1
         three    1
         two      1
    foo  one      2
         three    1
         two      2
    dtype: int64

    print(df)  # prints the same DataFrame shown at the start of this section
    print(grouped.describe().head())

Output (the wide table wraps into three parts):

                  C
              count      mean       std       min       25%       50%       75%
    A   B
    bar one     1.0 -0.858448       NaN -0.858448 -0.858448 -0.858448 -0.858448
        three   1.0  0.042284       NaN  0.042284  0.042284  0.042284  0.042284
        two     1.0  1.293483       NaN  1.293483  1.293483  1.293483  1.293483
    foo one     2.0  0.164001  1.789526 -1.101385 -0.468692  0.164001  0.796694
        three   1.0 -0.852603       NaN -0.852603 -0.852603 -0.852603 -0.852603

                      C     D
                    max count      mean       std       min       25%       50%
    A   B
    bar one   -0.858448   1.0 -0.213034       NaN -0.213034 -0.213034 -0.213034
        three  0.042284   1.0 -0.330481       NaN -0.330481 -0.330481 -0.330481
        two    1.293483   1.0 -0.399003       NaN -0.399003 -0.399003 -0.399003
    foo one    1.429387   2.0  0.203614  0.622191 -0.236341 -0.016364  0.203614
        three -0.852603   1.0 -1.718694       NaN -1.718694 -1.718694 -1.718694

                      D
                    75%       max
    A   B
    bar one   -0.213034 -0.213034
        three -0.330481 -0.330481
        two   -0.399003 -0.399003
    foo one    0.423592  0.643569
        three -1.718694 -1.718694
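The axis=1 grouping used earlier in this section applies the function to column labels instead of rows. Because recent pandas releases deprecate the axis argument of groupby, one possible alternative is to group the transposed frame and transpose the result back. The following is a minimal sketch under that assumption; the small fixed frame is illustrative and replaces the article's random data so that the result is reproducible.

```python
import pandas as pd


def get_letter_type(letter):
    """Classify a label as 'a' (vowel) or 'b' (anything else)."""
    return 'a' if letter.lower() in 'aeiou' else 'b'


# Illustrative fixed data (an assumption, not the article's random values).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo'],
    'B': ['one', 'one', 'two'],
    'C': [1.0, 2.0, 3.0],
    'D': [4.0, 5.0, 6.0],
})

# Transpose so the column labels become the index, group those labels with
# the function, count the non-null cells, then transpose back.
counts = df.T.groupby(get_letter_type).count().T
print(counts)
```

Each row ends up with one cell in group 'a' (column A) and three cells in group 'b' (columns B, C and D), which matches the shape of the axis=1 result.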

    grouped = df.groupby('A')
    grouped['C'].agg([np.sum, np.mean, np.std])

              sum      mean       std
    A
    bar  0.477319  0.159106  1.080712
    foo -1.570925 -0.314185  1.188767
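The same aggregation can also be written with function names given as strings, which avoids the FutureWarning that recent pandas versions emit when NumPy callables such as np.sum are passed to agg. The sketch below also shows named aggregation; the fixed data and the output column names total and average are illustrative assumptions.

```python
import pandas as pd

# Illustrative fixed data (an assumption, not the article's random values).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 6.0],
})

grouped = df.groupby('A')

# String names select the built-in groupby reductions.
stats = grouped['C'].agg(['sum', 'mean', 'std'])
print(stats)

# Named aggregation: keyword = output column, value = (input column, function).
named = grouped.agg(total=('C', 'sum'), average=('C', 'mean'))
print(named)
```

Named aggregation is convenient when the result feeds later steps, because the output columns get explicit names rather than the names of the functions.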

String manipulation

    import pandas as pd
    import numpy as np

    s = pd.Series(['A', 'b', 'c', 'D', np.nan])
    print(s)
    # to lowercase
    print(s.str.lower())
    # to uppercase
    print(s.str.upper())
    # length of each string
    print(s.str.len())

Output:

    0      A
    1      b
    2      c
    3      D
    4    NaN
    dtype: object
    0      a
    1      b
    2      c
    3      d
    4    NaN
    dtype: object
    0      A
    1      B
    2      C
    3      D
    4    NaN
    dtype: object
    0    1.0
    1    1.0
    2    1.0
    3    1.0
    4    NaN
    dtype: float64

Stripping whitespace from index labels:

    # the exact padding is lost in the source; the spaces below are illustrative
    index = pd.Index([' Index', 'ru ', ' men '])
    # remove spaces on both sides
    print(index.str.strip())
    # remove spaces on the left
    print(index.str.lstrip())
    # remove spaces on the right
    print(index.str.rstrip())

Output:

    Index(['Index', 'ru', 'men'], dtype='object')
    Index(['Index', 'ru ', 'men '], dtype='object')
    Index([' Index', 'ru', ' men'], dtype='object')

Replacing characters in column names:

    df = pd.DataFrame(np.random.randn(3, 2), columns=['A a', 'B b'], index=range(3))
    print(df)

Output:

            A a       B b
    0  3.005273  0.486696
    1  1.093889  1.054230
    2 -2.846352   0.30246

    # replace spaces with underscores
    print(df.columns.str.replace(' ', '_'))

Output:

    Index(['A_a', 'B_b'], dtype='object')

Splitting strings:

    s = pd.Series(['a_b_C', 'c_d_e', 'f_g_h'])
    print(s)

Output:

    0    a_b_C
    1    c_d_e
    2    f_g_h
    dtype: object

    print(s.str.split('_'))

Output:

    0    [a, b, C]
    1    [c, d, e]
    2    [f, g, h]
    dtype: object

    # expand=True returns a DataFrame; n=1 splits at most once
    print(s.str.split('_', expand=True, n=1))

Output:

       0    1
    0  a  b_C
    1  c  d_e
    2  f  g_h

Testing for a substring:

    s = pd.Series(['A', 'rumen', 'ru', 'rumen', 'xiao', 'zhan'])
    print(s.str.contains('ru'))

Output:

    0    False
    1     True
    2     True
    3     True
    4    False
    5    False
    dtype: bool

Building indicator columns:

    s = pd.Series(['a', 'a|b', 'a|c'])
    print(s)

Output:

    0      a
    1    a|b
    2    a|c
    dtype: object

    print(s.str.get_dummies(sep='|'))

Output:

       a  b  c
    0  1  0  0
    1  1  1  0
    2  1  0  1

Index

    s = pd.Series(np.arange(5), np.arange(5)[::-1], dtype='int64')
    print(s)

Output:

    4    0
    3    1
    2    2
    1    3
    0    4
    dtype: int64

    print(s[s > 2])

Output:

    1    3
    0    4
    dtype: int64

    # isin checks whether each value is in the given list
    print(s.isin([1, 3, 4]))

Output:

    4    False
    3     True
    2    False
    1     True
    0     True
    dtype: bool

    # use the boolean result to select the matching data
    print(s[s.isin([1, 3, 4])])

Output:

    3    1
    1    3
    0    4
    dtype: int64

    # construct data with a joint (multilevel) index
    s = pd.Series(np.arange(6),
                  index=pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c']]))
    print(s)

Output:

    1  a    0
       b    1
       c    2
    2  a    3
       b    4
       c    5
    dtype: int64

    print(s.iloc[s.index.isin([(1, 'b'), (2, 'c')])])

Output:

    1  b    1
    2  c    5
    dtype: int64

    # construct a time series
    dates = pd.date_range('20200920', periods=8)
    print(dates)

Output:

    DatetimeIndex(['2020-09-20', '2020-09-21', '2020-09-22', '2020-09-23',
                   '2020-09-24', '2020-09-25', '2020-09-26', '2020-09-27'],
                  dtype='datetime64[ns]', freq='D')

    df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    print(df)

Output:

                       A         B         C         D
    2020-09-20 -1.218522  2.067088  0.015009  0.158780
    2020-09-21 -0.546837 -0.601178 -0.894882  0.172037
    2020-09-22  0.189848 -0.910520  0.196186 -0.073495
    2020-09-23 -0.566892  0.899193 -0.450925  0.633253
    2020-09-24  0.038838  1.577004  0.580927  0.609050
    2020-09-25  1.562094  0.020813 -0.618859 -0.515212
    2020-09-26 -1.333947  0.275765  0.139325  1.124207
    2020-09-27 -1.271748  1.082302  1.036805  -1.04120

    # query the data of column A
    print(df['A'])

Output:

    2020-09-20   -1.218522
    2020-09-21   -0.546837
    2020-09-22    0.189848
    2020-09-23   -0.566892
    2020-09-24    0.038838
    2020-09-25    1.562094
    2020-09-26   -1.333947
    2020-09-27   -1.271748
    Freq: D, Name: A, dtype: float64

    # keep the numbers less than 0; by default the other values are set to NaN
    df.where(df < 0)

    # query rows by comparing column values
    # print(df.query('(…   [the rest of this call is cut off in the source]
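To make the behaviour of where and query concrete, here is a minimal sketch on a small fixed frame. The column values and the query expression 'B > A' are illustrative assumptions; they are not taken from the article's random data.

```python
import pandas as pd

dates = pd.date_range('20200920', periods=3)

# Illustrative fixed data (an assumption) so the result is reproducible.
df = pd.DataFrame(
    {'A': [-1.0, 2.0, -3.0], 'B': [0.5, -0.5, 1.5]},
    index=dates,
)

# where() keeps values that satisfy the condition and replaces the rest with NaN.
negatives = df.where(df < 0)
print(negatives)

# query() filters rows with a boolean expression written over the column names.
b_gt_a = df.query('B > A')
print(b_gt_a)
```

Note that where keeps the shape of the frame and masks individual cells, while query drops whole rows that fail the condition.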
