Common Pandas Tips and Tricks

Updated 2025-01-18. From: SLTechnology News&Howtos (Shulou.com), 06/01

This article walks through common pandas techniques, each with a short code example. The examples assume pandas is imported as pd and, where needed, NumPy as np.

1. Calculate the missing rate of variables

    import pandas as pd

    df = pd.read_csv('titanic_train.csv')

    def missing_cal(df):
        """
        df: dataset
        return: missing rate of each variable
        """
        # Fraction of null values per column
        missing_series = df.isnull().sum() / df.shape[0]
        missing_df = pd.DataFrame(missing_series).reset_index()
        # After reset_index the columns are 'index' and 0; give them readable names
        missing_df = missing_df.rename(columns={'index': 'col', 0: 'missing_pct'})
        missing_df = missing_df.sort_values('missing_pct', ascending=False).reset_index(drop=True)
        return missing_df

    missing_cal(df)

To compute the missing rate of each sample (row) instead, pass axis=1 to sum() and divide by the number of columns, df.shape[1].
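As a minimal sketch of the row-wise variant, the snippet below uses a small made-up DataFrame (the column names and values are assumptions, not the Titanic data) and computes the fraction of missing values in each row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "fare": [7.25, 71.28, np.nan, 8.05],
    "cabin": [np.nan, "C85", np.nan, np.nan],
})

# Fraction of missing values in each row: count nulls across columns (axis=1)
# and divide by the number of columns
row_missing_rate = df.isnull().sum(axis=1) / df.shape[1]

# Equivalent shortcut: the mean of a boolean mask is the fraction of True values
row_missing_rate_alt = df.isnull().mean(axis=1)

print(row_missing_rate)
```

The mean-based form avoids having to remember which dimension of df.shape to divide by.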

2. Get the row that holds the maximum value within each group

There are two cases: the maximum within a group is unique, or it is shared by several rows (duplicate values).

Case 1: no duplicate values.

    # Sample data; the 'Sp' labels and the trailing 'Mt' and 'Count' entries are illustrative fill-ins
    df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                       'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                       'Value': [1, 2, 3, 4, 5, 6],
                       'Count': [3, 2, 5, 10, 10, 6]})
    df
    # idxmax returns the index label of the first maximum in each group
    df.loc[df.groupby('Mt')['Count'].idxmax()]

First group by the Mt column, then use idxmax to get the index label of the row holding the maximum Count in each group, and finally use loc to fetch those rows. idxmax returns only the first occurrence, so if several rows share the maximum, only one of them is kept.

Case 2: duplicate values.

    import numpy as np

    # Here df is a DataFrame with 'ID', 'score' and 'class' columns
    df["rank"] = df.groupby("ID")["score"].rank(method="min", ascending=False).astype(np.int64)
    df[df["rank"] == 1][["ID", "class"]]

After grouping by ID, rank is applied to score in descending order. With method="min", equal scores receive the same rank, so every row tied for the group maximum gets rank 1. Selecting the rows with rank 1 then returns all of them.
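The rank-based approach can be tried on a small made-up table. The data below is hypothetical, and the transform("max") variant is an alternative that is not part of the original text.

```python
import pandas as pd

# Hypothetical scores; IDs 1 and 2 each have a tie for the top score
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "class": ["a", "b", "c", "a", "b", "c"],
    "score": [90, 90, 70, 60, 85, 85],
})

# method="min" gives tied scores the same (lowest) rank, so every row that
# shares the group maximum gets rank 1
df["rank"] = (
    df.groupby("ID")["score"]
    .rank(method="min", ascending=False)
    .astype("int64")
)
top_rows = df.loc[df["rank"] == 1, ["ID", "class"]]

# Alternative without a helper column: compare each score with its group max
top_rows_alt = df.loc[
    df["score"] == df.groupby("ID")["score"].transform("max"), ["ID", "class"]
]

print(top_rows)
```

Both selections keep every tied row, unlike idxmax, which keeps only the first.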

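3. Merge multiple rows into one row per group

    # Sample data; the grouping column name 'v_id' and some values are illustrative fill-ins
    df = pd.DataFrame({'id_part': ['a', 'b', 'c', 'd'],
                       'pred': [0.1, 0.2, 0.3, 0.4],
                       'pred_class': ['women', 'man', 'cat', 'dog'],
                       'v_id': ['d1', 'd2', 'd3', 'd1']})

    # Per group: join the class labels, collect the predictions in a list, keep the first id_part
    df.groupby('v_id').agg({'pred_class': ', '.join,
                            'pred': lambda x: list(x),
                            'id_part': 'first'}).reset_index()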

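4. Delete rows that contain a specific string

    # Sample data; some values in column 'b' are illustrative fill-ins
    df = pd.DataFrame({'a': [1, 2, 3, 4],
                       'b': ['s1', 'exp_s2', 's3', 'exps4'],
                       'c': [5, 6, 7, 8],
                       'd': [3, 2, 5, 10]})
    # str.contains builds a boolean mask; ~ negates it, so rows whose 'b' contains 'exp' are dropped
    df[~df['b'].str.contains('exp')]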

5. Sort within groups

    df = pd.DataFrame([['A', 1], ['A', 3], ['A', 2], ['B', 5], ['B', 9]], columns=['name', 'score'])
    df

Here are two ways to sort within groups.

    # Method 1: sort by the group key first, then by score in descending order
    df.sort_values(['name', 'score'], ascending=[True, False])
    # Method 2: sort each group separately (slower; newer pandas versions may warn about apply touching the grouping column)
    df.groupby('name').apply(lambda x: x.sort_values('score', ascending=False)).reset_index(drop=True)

6. Select columns of a specific type

    drinks = pd.read_csv('data/drinks.csv')
    # Select all numeric columns
    drinks.select_dtypes(include=['number']).head()
    # Select all string (object) columns
    drinks.select_dtypes(include=['object']).head()
    # Select several types at once
    drinks.select_dtypes(include=['number', 'object', 'category', 'datetime']).head()
    # Exclude the given data types with the exclude keyword
    drinks.select_dtypes(exclude=['number']).head()

7. Convert strings to numeric values

    # Sample data; the numeric strings are illustrative, the '-' in column 3 stands for 0
    df = pd.DataFrame({'column 1': ['1.1', '2.2', '3.3'],
                       'column 2': ['4.4', '5.5', '6.6'],
                       'column 3': ['7.7', '8.8', '-']})
    df
    df.astype({'column 1': 'float', 'column 2': 'float'}).dtypes

Converting the third column the same way raises an error, because the column uses a dash to stand for 0 and pandas cannot parse it as a number. To handle this, use the to_numeric() function with errors='coerce', which makes pandas turn any invalid input into NaN, and then replace the NaN values with 0.

    df = df.apply(pd.to_numeric, errors='coerce').fillna(0)
    df

8. Reduce DataFrame memory usage

Method 1: read only the columns you actually need, using the usecols parameter.

    cols = ['beer_servings', 'continent']
    small_drinks = pd.read_csv('data/drinks.csv', usecols=cols)

Method 2: convert object columns that hold categorical data to the category dtype by specifying the dtype parameter.

    dtypes = {'continent': 'category'}
    smaller_drinks = pd.read_csv('data/drinks.csv', usecols=cols, dtype=dtypes)
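To see roughly how much the category dtype saves, the following sketch compares the two representations with memory_usage(deep=True). The Series is made up for illustration and does not come from drinks.csv.

```python
import pandas as pd

# Hypothetical column with few distinct values repeated many times
continents = pd.Series(["Asia", "Europe", "Africa", "Europe"] * 25_000)

as_object = continents.astype("object")
as_category = continents.astype("category")

# deep=True also counts the bytes used by the Python string objects themselves
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

The saving comes from storing each distinct string once and keeping only small integer codes per row, so it is largest when a column has few unique values relative to its length.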

9. Filter a DataFrame by its largest categories

    movies = pd.read_csv('data/imdb_1000.csv')
    counts = movies.genre.value_counts()
    # Keep only the rows whose genre is one of the 3 most frequent
    movies[movies.genre.isin(counts.nlargest(3).index)].head()
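The same pattern can be run without the CSV file. This sketch uses a tiny made-up table (titles and genres are assumptions) and keeps the two most frequent genres.

```python
import pandas as pd

# Hypothetical stand-in for the movies data
movies = pd.DataFrame({
    "title": ["m1", "m2", "m3", "m4", "m5", "m6", "m7"],
    "genre": ["Drama", "Drama", "Drama", "Comedy", "Comedy", "Action", "Horror"],
})

counts = movies["genre"].value_counts()
top_genres = counts.nlargest(2).index  # labels of the 2 most frequent genres
filtered = movies[movies["genre"].isin(top_genres)]

print(filtered)
```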

10. Split a string into multiple columns

    df = pd.DataFrame({'name': ['Zhang San', 'Li Si', 'Wang Wu'],
                       'location': ['Beijing-Dongcheng District', 'Shanghai-Huangpu District', 'Guangzhou-Baiyun District']})
    df
    # Split on the space; expand=True returns one column per piece
    df['name'].str.split(' ', expand=True)

11. Expand a Series of lists into a DataFrame

    # Sample data; the values in 'column 1' are illustrative fill-ins
    df = pd.DataFrame({'column 1': ['a', 'b', 'c'],
                       'column 2': [[10, 20], [20, 30], [30, 40]]})
    df
    # Each list becomes a row of a new DataFrame
    df_new = df['column 2'].apply(pd.Series)
    pd.concat([df, df_new], axis='columns')

12. Aggregate with multiple functions

    orders = pd.read_csv('data/chipotle.tsv', sep='\t')
    orders.groupby('order_id').item_price.agg(['sum', 'count']).head()
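A self-contained sketch of the same idea follows. It uses made-up order lines and assumes item_price is already numeric. The named-aggregation form is an addition that lets you choose the output column names.

```python
import pandas as pd

# Hypothetical order lines; item_price is already numeric here
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "item_price": [2.5, 3.0, 10.0, 1.0, 2.0, 3.0],
})

# A list of function names yields one output column per function
summary = orders.groupby("order_id")["item_price"].agg(["sum", "count"])

# Named aggregation: output_name=(input_column, function)
named = orders.groupby("order_id").agg(
    total=("item_price", "sum"),
    n_items=("item_price", "count"),
)

print(summary)
```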

13. Grouping and aggregation

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'key1': ['one', 'two', 'one', 'two', 'one'],
                       'data1': np.random.randn(5),
                       'data2': np.random.randn(5)})
    df
    # Iterating over a groupby yields (group name, sub-DataFrame) pairs
    for name, group in df.groupby('key1'):
        print(name)
        print(group)
    # Collect the groups into a dict keyed by group name
    dict(list(df.groupby('key1')))

Group by dictionary or Series

    people = pd.DataFrame(np.random.randn(5, 5),
                          columns=['a', 'b', 'c', 'd', 'e'],
                          index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
    # Map each column label to a group name (illustrative values)
    mapping = {'a': 'red', 'b': 'blue', 'c': 'blue', 'd': 'blue', 'e': 'red'}
    # groupby(..., axis=1) is deprecated in recent pandas; transpose, group the rows, then transpose back
    by_column = people.T.groupby(mapping)
    by_column.sum().T
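Grouping by a Series works the same way as grouping by a dictionary. In this sketch the team Series and the numbers are hypothetical; pandas aligns the Series with the DataFrame's index and groups the rows by the Series values.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    columns=["a", "b"],
    index=["Joe", "Steve", "Wes", "Jim", "Travis"],
)

# Hypothetical mapping from row label to a team name, held in a Series
team = pd.Series(
    ["x", "y", "x", "y", "x"],
    index=["Joe", "Steve", "Wes", "Jim", "Travis"],
)

# groupby aligns the Series with people's index and groups rows by its values
by_team = people.groupby(team).sum()

print(by_team)
```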

