What are the common operations in pandas

2025-03-28 Update From: SLTechnology News&Howtos

This article introduces routine pandas operations: aggregate functions, pivot tables, crosstabs, and merging tables. It is intended as a reference for readers who want to look up these operations quickly.

1 Aggregate functions

1. Built-in statistics from numpy and pandas

Pass a list to agg to apply several functions to every selected column, or a dict to apply different functions to different columns ('xx' and 'xx2' stand for column names). The selected columns must be numeric.

    data[['counts', 'amounts']].agg([np.mean, np.std])
    data.agg({'xx': np.mean, 'xx2': [np.sum, np.std]})

2. Custom functions with transform

When neither pandas nor numpy provides a ready-made function, write your own and apply it with transform. For example, to multiply every value in a given column by 2:

Method 1: pass a lambda.

    data['counts'].transform(lambda x: x * 2)

Method 2: define the rule in a named function and pass that function.

    def transform_func(values):
        """Custom function: define the data manipulation rule."""
        return values * 2

    data['counts'].transform(transform_func)  # one-dimensional data
    # apply the custom function within each group
    data1 = data.groupby(by='brand')['sales'].transform(transform_func)
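The difference between agg and transform is easier to see on a small table. The following is a minimal sketch, assuming a made-up DataFrame (the values, and the helper name double, are illustrative and not from the article's dataset). Function names are passed as strings here, which pandas also accepts.

```python
import pandas as pd

# Hypothetical sample data; only the column names echo the article.
data = pd.DataFrame({
    "brand": ["a", "a", "b", "b"],
    "sales": [10, 20, 30, 50],
    "counts": [1, 2, 3, 4],
})

# A list applies every function to every selected column.
stats = data[["sales", "counts"]].agg(["mean", "std"])

# A dict applies different functions to different columns.
per_column = data.agg({"sales": ["mean"], "counts": ["sum", "std"]})


def double(values):
    """Custom rule: multiply every value by 2."""
    return values * 2


# transform keeps the original shape: one output value per input row.
doubled = data["counts"].transform(double)

# After groupby, transform still returns one value per original row,
# here the mean of the row's own group.
group_mean = data.groupby("brand")["sales"].transform("mean")

print(stats)
print(per_column)
print(doubled.tolist())
print(group_mean.tolist())
```

agg reduces each column to one value per function, while transform returns a result aligned with the input rows, which makes it convenient for adding a per-group statistic back onto the original table.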

2 Pivot tables: pivot_table

Parameters in the source code:

    def pivot_table(
        data,                # DataFrame to operate on
        values=None,         # fields to display
        index=None,          # row grouping key; may be an array or a list; an array must have the same length as the data
        columns=None,        # column grouping key
        aggfunc="mean",      # aggregate function; the default is mean
        fill_value=None,     # value that replaces NaN in the result
        margins=False,       # summary switch; the default is False
        dropna=True,
        margins_name="All",  # name of the summary row or column; can be changed
        observed=False,
    )

1. index: the row grouping key. After grouping, the values of the grouping key become the row index.

    pd.pivot_table(data, index=['order_id', 'dishes_name'],
                   aggfunc=[np.mean, np.sum],
                   values=['add_inprice', 'counts'])

The result is indexed by (order_id, dishes_name), from order 137 to order 1323, and has four columns: the mean and the sum of add_inprice and of counts. Its shape is [2778 rows x 4 columns].

2. columns: the column grouping key. After grouping, the values of the grouping key become the column index. The column grouping key is effectively the transpose of the row grouping key.

    pd.pivot_table(data, columns=['order_id', 'amounts'],
                   aggfunc=[np.mean, np.sum],
                   values=['add_inprice', 'counts'])

The result has one row each for add_inprice and counts, and a three-level column index (function, order_id, amounts). Its shape is [2 rows x 4956 columns].

3. Combining index and columns

    # aggfunc: aggregate function
    # fill_value: what to display for empty cells; the default is NaN
    # margins: add a summary row and column; off by default
    # margins_name: name of the summary row or column; the default is "All"
    pd.pivot_table(data, index=['order_id'], columns='dishes_name',
                   values='counts', aggfunc=np.sum,
                   fill_value=0, margins=True, margins_name='total')

The result has one row per order_id and one column per dish name, with empty cells filled with 0 and an extra 'total' row and column. For example, order 137 totals 9, order 165 totals 21, and the grand total is 3088.
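Because the printed pivot tables are large, a small table makes the effect of fill_value and margins easier to check by hand. This is a minimal sketch, assuming a made-up order table (the rows are illustrative and not the article's data).

```python
import pandas as pd

# Hypothetical order lines; column names follow the article.
data = pd.DataFrame({
    "order_id": [137, 137, 165, 165, 165],
    "dishes_name": ["rice", "soda", "rice", "salad", "rice"],
    "counts": [1, 2, 4, 1, 1],
})

table = pd.pivot_table(
    data,
    index="order_id",        # row grouping key
    columns="dishes_name",   # column grouping key
    values="counts",         # field to aggregate
    aggfunc="sum",
    fill_value=0,            # show 0 instead of NaN for missing combinations
    margins=True,            # add a summary row and column
    margins_name="total",    # name of the summary row and column
)

print(table)
```

Order 165 contains rice twice, so its rice cell is the sum 5. Order 137 has no salad, so that cell shows the fill value 0 rather than NaN.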

3 Crosstabs: crosstab

    def crosstab(
        index,                      # row grouping key
        columns,                    # column grouping key
        values=None,                # displayed fields
        rownames=None,              # row names
        colnames=None,              # column names
        aggfunc=None,               # aggregate function
        margins=False,              # summary switch
        margins_name: str = "All",  # name of the summary row or column
        dropna: bool = True,
        normalize=False,
    )

Basic syntax:

    pd.crosstab(index=data['order_id'], columns=data['dishes_name'],
                values=data['counts'], aggfunc=np.sum)

The result has one row per order_id and one column per dish name. Combinations that never occur are NaN, because no fill value is given. Its shape is [278 rows x 156 columns].

4 Merging tables

1. When every table has the same columns, use pd.concat((df1, df2, df3...))

axis=0: vertical concatenation; the tables are stacked row by row.
axis=1: horizontal concatenation; rows are matched on their index.
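The effect of axis and join on the result shape can be checked with two tiny frames. This is a minimal sketch, assuming made-up frames df1 and df2 that share only column B and only index label 1.

```python
import pandas as pd

# Hypothetical frames: they overlap in column "B" and in index label 1.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]}, index=[1, 2])

# axis=0 stacks rows; the default outer join keeps the union of columns.
vertical_outer = pd.concat((df1, df2), axis=0)

# join="inner" keeps only the columns present in every table.
vertical_inner = pd.concat((df1, df2), axis=0, join="inner")

# axis=1 places tables side by side; inner keeps only shared index labels.
horizontal_inner = pd.concat((df1, df2), axis=1, join="inner")

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat((df1, df2), axis=0, ignore_index=True)

print(vertical_outer)
print(vertical_inner)
print(horizontal_inner)
print(renumbered)
```

With axis=1 the match is made purely on index labels, so only the row labelled 1 survives the inner join. This is why horizontal concatenation gives poor results when the index carries no meaning.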

Function source code:

    def concat(
        objs: Union[Iterable["NDFrame"], Mapping[Label, "NDFrame"]],  # the DataFrames to combine
        axis=0,                         # direction of concatenation
        join="outer",                   # outer join by default
        ignore_index: bool = False,     # if True, discard the existing index and renumber from 0
        keys=None,
        levels=None,
        names=None,
        verify_integrity: bool = False,
        sort: bool = False,
        copy: bool = True,
    )

Example tables:

    left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K3'],
                         'key2': ['K0', 'K1', 'K0', 'K1'],
                         'A': ['A0', 'A1', 'A2', 'A3'],
                         'B': ['B0', 'B1', 'B2', 'B3']})
    right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                          'key2': ['K0', 'K0', 'K0', 'K0'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})

    # use an inner join for the concatenation; outer is the default
    pd.concat((left, right), axis=0, join='inner')
    pd.concat((left, right), axis=1, join='inner')

2. merge: joins tables on key columns. This solves the row mismatch that occurs when the row index carries no meaning, which is the weakness of horizontal concatenation with concat.

    def merge(
        left,                        # left table
        right,                       # right table
        how: str = "inner",          # inner join by default
        on=None,                     # key column(s); must be present in both tables
        left_on=None,                # key column in the left table
        right_on=None,               # key column in the right table
        left_index: bool = False,
        right_index: bool = False,
        sort: bool = False,
        suffixes=("_x", "_y"),
        copy: bool = True,
        indicator: bool = False,
        validate=None,
    )

(1) Both tables have the same key column

on: the key column(s) to join on; must be present in both tables.
how: the join type; inner by default.
    outer: outer join, returns all rows from both tables
    inner: inner join, returns only the rows whose keys match
    left: keeps every row of the left table
    right: keeps every row of the right table

    pd.merge(left, right, on='key1', how='outer')

      key1 key2_x    A    B key2_y    C    D
    0   K0     K0   A0   B0     K0   C0   D0
    1   K0     K1   A1   B1     K0   C0   D0
    2   K1     K0   A2   B2     K0   C1   D1
    3   K1     K0   A2   B2     K0   C2   D2
    4   K3     K1   A3   B3    NaN  NaN  NaN
    5   K2    NaN  NaN  NaN     K0   C3   D3

Joining on several shared key columns:

    pd.merge(left, right, on=['key1', 'key2'], how='outer')

      key1 key2    A    B    C    D
    0   K0   K0   A0   B0   C0   D0
    1   K0   K1   A1   B1  NaN  NaN
    2   K1   K0   A2   B2   C1   D1
    3   K1   K0   A2   B2   C2   D2
    4   K3   K1   A3   B3  NaN  NaN
    5   K2   K0  NaN  NaN   C3   D3
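To compare the four join types side by side, the sketch below counts the rows each one returns. The left and right frames are reconstructed so that they agree with the merge results printed above; treat them as an assumption rather than the article's exact listing. The indicator=True option, which appears in the merge signature, adds a _merge column recording where each row came from.

```python
import pandas as pd

# Frames reconstructed to be consistent with the printed merge results
# (an assumption, since the original listing is partly unreadable).
left = pd.DataFrame({
    "key1": ["K0", "K0", "K1", "K3"],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
    "B": ["B0", "B1", "B2", "B3"],
})
right = pd.DataFrame({
    "key1": ["K0", "K1", "K1", "K2"],
    "key2": ["K0", "K0", "K0", "K0"],
    "C": ["C0", "C1", "C2", "C3"],
    "D": ["D0", "D1", "D2", "D3"],
})

# Number of rows returned by each join type on the two shared keys.
row_counts = {
    how: len(pd.merge(left, right, on=["key1", "key2"], how=how))
    for how in ("inner", "outer", "left", "right")
}

# indicator=True labels each row as both, left_only, or right_only.
tagged = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)
origin_counts = tagged["_merge"].value_counts().to_dict()

print(row_counts)
print(origin_counts)
```

The row order of an outer merge can differ between pandas versions, so the sketch compares counts rather than positions.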

(2) The two tables have no key column with the same name

left_on: the key column in the left table.
right_on: the key column in the right table.

    pd.merge(left, right, left_on='key1', right_on='key2', how='outer')

      key1_x key2_x    A    B key1_y key2_y    C    D
    0     K0     K0   A0   B0     K0     K0   C0   D0
    1     K0     K0   A0   B0     K1     K0   C1   D1
    2     K0     K0   A0   B0     K1     K0   C2   D2
    3     K0     K0   A0   B0     K2     K0   C3   D3
    4     K0     K1   A1   B1     K0     K0   C0   D0
    5     K0     K1   A1   B1     K1     K0   C1   D1
    6     K0     K1   A1   B1     K1     K0   C2   D2
    7     K0     K1   A1   B1     K2     K0   C3   D3
    8     K1     K0   A2   B2    NaN    NaN  NaN  NaN
    9     K3     K1   A3   B3    NaN    NaN  NaN  NaN
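A more typical use of left_on and right_on is joining two tables whose key columns simply have different names. This is a minimal sketch, assuming two made-up tables, orders and customers (all names and values are hypothetical).

```python
import pandas as pd

# Hypothetical tables: the key is "customer" on the left and "cust_id" on the right.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer": ["c1", "c2", "c9"]})
customers = pd.DataFrame({"cust_id": ["c1", "c2", "c3"], "city": ["X", "Y", "Z"]})

# A left join keeps every order; unmatched orders get NaN in the right-hand columns.
merged = pd.merge(orders, customers, left_on="customer", right_on="cust_id", how="left")

# Both key columns are kept in the result, so drop the redundant one if it is not needed.
tidy = merged.drop(columns="cust_id")

print(merged)
print(tidy)
```

Customer c9 has no match in the customers table, so its city is NaN after the left join.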

(3) Renaming a column

    left.rename(columns={'key1': 'key11111'}, inplace=True)
    print(left)

The printed table now has the columns key11111, key2, A, B.
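Renaming is also a way to make key columns match so that a plain on= merge works. The sketch below assumes two made-up frames; it uses rename without inplace=True, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical frames whose key columns are named differently.
left = pd.DataFrame({"key1": ["K0", "K1"], "A": ["A0", "A1"]})
right = pd.DataFrame({"key": ["K0", "K1"], "C": ["C0", "C1"]})

# Without inplace=True, rename returns a renamed copy.
renamed = left.rename(columns={"key1": "key"})

# The key columns now share a name, so on= can be used directly.
merged = pd.merge(renamed, right, on="key")

print(list(left.columns))
print(list(renamed.columns))
print(merged)
```

Compared with left_on and right_on, this keeps a single key column in the result.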

(4) Overlapping merge: combines incomplete tables into one complete table with df1.combine_first(df2).

The pattern is main_table.combine_first(secondary_table): values in the main table take precedence, and its missing values are filled from the secondary table.

    dict1 = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
             'System': ['w10', 'w10', np.nan, 'w10', np.nan, np.nan, 'w7', 'w7', 'w8']}
    dict2 = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
             'System': [np.nan, np.nan, 'w7', 'w7', 'w7', 'w7', 'w8', np.nan, np.nan]}
    df1 = pd.DataFrame(dict1)
    df2 = pd.DataFrame(dict2)
    print(df1, df2)
    # the table that calls combine_first takes precedence
    print(df1.combine_first(df2))

       ID System
    0   1    w10
    1   2    w10
    2   3     w7
    3   4    w10
    4   5     w7
    5   6     w7
    6   7     w7
    7   8     w7
    8   9     w8
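The precedence rule and the index alignment of combine_first can be seen on two small frames that overlap only partly. This is a minimal sketch, assuming made-up frames primary and backup (the names and values are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical frames with partly overlapping index labels and columns.
primary = pd.DataFrame({"System": ["w10", np.nan, np.nan]}, index=[0, 1, 2])
backup = pd.DataFrame(
    {"System": ["w7", "w8", np.nan], "Owner": ["a", "b", "c"]},
    index=[0, 1, 3],
)

# Values from primary win; only its missing cells are filled from backup.
# The result covers the union of both indexes and both sets of columns.
combined = primary.combine_first(backup)

print(combined)
```

Label 0 keeps w10 from primary even though backup has w7 there, label 1 is filled with w8, and label 2 stays missing because backup has no row for it.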

© 2024 shulou.com SLNews company. All rights reserved.
