2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article presents a collection of common Pandas skills. The content is simple and clear, and I hope it helps answer your questions. Let's go through them one by one.
1. Calculate the missing rate of variables
import pandas as pd

df = pd.read_csv('titanic_train.csv')

def missing_cal(df):
    """
    df: dataset
    return: missing rate of each variable
    """
    # Share of null values in each column
    missing_series = df.isnull().sum() / df.shape[0]
    missing_df = pd.DataFrame(missing_series).reset_index()
    missing_df = missing_df.rename(columns={'index': 'col', 0: 'missing_pct'})
    # Columns with the most missing values come first
    missing_df = missing_df.sort_values('missing_pct', ascending=False).reset_index(drop=True)
    return missing_df

missing_cal(df)
If you need the missing rate of each sample (row) instead, pass axis=1 to sum() and divide by the number of columns, df.shape[1], rather than the number of rows.
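As a minimal sketch of the row-wise variant, the snippet below computes both rates on a small made-up DataFrame. The data and the names `col_missing` and `row_missing` are assumptions for illustration and do not come from the Titanic file used above.

```python
import numpy as np
import pandas as pd

# Small hypothetical dataset (not the Titanic file from the article)
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, np.nan],
    'b': ['x', 'y', None, None],
    'c': [10, 20, 30, 40],
})

# Per-column missing rate: count nulls down the rows, divide by the row count
col_missing = df.isnull().sum() / df.shape[0]

# Per-row missing rate: count nulls across the columns, divide by the column count
row_missing = df.isnull().sum(axis=1) / df.shape[1]

print(col_missing)
print(row_missing)
```

The same per-row result can also be obtained with df.isnull().mean(axis=1), since the mean of a boolean mask is the share of True values.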
2. Get the row containing the maximum value within each group
There are two cases: the maximum value is unique within each group, or it is tied (duplicated).
The case without duplicate values.
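# Parts of the Sp, Mt and Count lists were garbled in the source;
# the Sp values and the trailing Mt and Count values are illustrative placeholders.
df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's3', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})
df
# Index of the maximum Count in each Mt group, then fetch those rows
df.iloc[df.groupby(['Mt']).apply(lambda x: x['Count'].idxmax())]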
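First group by the Mt column, then use the idxmax function to get the index of the row holding the maximum Count within each group, and finally use iloc to fetch those rows. Note that idxmax returns index labels, so iloc works here only because the DataFrame has the default integer index. idxmax also returns only the first maximum in each group, so ties are not handled.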
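When the index is not the default 0..n-1, looking the result up by label is safer. The following is a sketch under that assumption; the data, the string index, and the names `max_labels` and `max_rows` are made up for illustration.

```python
import pandas as pd

# Illustrative data; the index is deliberately not 0..n-1
df = pd.DataFrame(
    {'Mt': ['s1', 's1', 's2', 's2'], 'Count': [3, 2, 5, 10]},
    index=['w', 'x', 'y', 'z'],
)

# idxmax returns the index label of the first maximum in each group
max_labels = df.groupby('Mt')['Count'].idxmax()

# Labels are looked up with .loc, which works for any index type
max_rows = df.loc[max_labels]
print(max_rows)
```

Selecting the Count column before calling idxmax also avoids the apply with a lambda, which tends to be slower on large frames.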
The case with duplicate values.
import numpy as np

# Here df is a DataFrame with ID, score and class columns
df["rank"] = df.groupby("ID")["score"].rank(method="min", ascending=False).astype(np.int64)
df[df["rank"] == 1][["ID", "class"]]
After grouping by ID, the rank function is applied to score within each group. With method="min", equal scores receive the same rank, so selecting the rows with rank 1 returns every row tied for the maximum in its group.
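To see the tie handling in action, here is a small sketch with made-up ID, class and score values (the article does not show the data for this case). It also compares the result against an alternative that filters on the group maximum with transform; that alternative is not from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 1 has two rows tied for the top score
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'class': ['a', 'b', 'c', 'd', 'e'],
    'score': [90, 90, 80, 70, 60],
})

# Rank within each ID; tied scores share the smallest rank
df['rank'] = df.groupby('ID')['score'].rank(method='min', ascending=False).astype(np.int64)
top = df[df['rank'] == 1][['ID', 'class']]

# Alternative: keep rows whose score equals the maximum of their group
group_max = df.groupby('ID')['score'].transform('max')
top_alt = df[df['score'] == group_max][['ID', 'class']]

print(top)
```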
3. Merge multiple rows into one row
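# The id_part values, one pred value and the name of the last column were garbled in the source;
# the id_part values, the pred value 0.3, the value 'd2' and the column name 'v_id' are placeholders.
df = pd.DataFrame({'id_part': ['a', 'b', 'c', 'd'],
                   'pred': [0.1, 0.2, 0.3, 0.4],
                   'pred_class': ['women', 'man', 'cat', 'dog'],
                   'v_id': ['d1', 'd2', 'd3', 'd1']})
# Collapse the rows of each group: join the classes, collect the predictions, keep the first id_part
df.groupby(['v_id']).agg({'pred_class': [', '.join], 'pred': lambda x: list(x), 'id_part': 'first'}).reset_index()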
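4. Delete rows that contain a specific string
# The fourth value of column b was garbled in the source; 'exps4' is a placeholder.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['s1', 'exp_s2', 's3', 'exps4'], 'c': [5, 6, 7, 8], 'd': [3, 2, 5, 10]})
# Rows whose column b contains 'exp'
df[df['b'].str.contains('exp')]
# Negate the mask with ~ to delete those rows and keep the rest
df[~df['b'].str.contains('exp')]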
5. Intra-group sorting
df = pd.DataFrame([['A', 1], ['A', 3], ['A', 2], ['B', 5], ['B', 9]], columns=['name', 'score'])
df
Here are two efficient ways to sort within groups.
# Method 1: sort by the group key, then by score in descending order
df.sort_values(['name', 'score'], ascending=[True, False])
# Method 2: sort each group separately, then drop the group index
df.groupby('name').apply(lambda x: x.sort_values('score', ascending=False)).reset_index(drop=True)
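Once the rows are sorted within each group, it is often useful to number them or keep only the top row per group. The sketch below builds on the first method; the `position` column and the names `sorted_df` and `top1` are additions for illustration, not part of the article.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 1], ['A', 3], ['A', 2], ['B', 5], ['B', 9]],
    columns=['name', 'score'],
)

# Sort by group key, then by score descending within each group
sorted_df = df.sort_values(['name', 'score'], ascending=[True, False]).reset_index(drop=True)

# Number the rows inside each group, starting at 1
sorted_df['position'] = sorted_df.groupby('name').cumcount() + 1

# Keep only the highest-scoring row of each group
top1 = sorted_df.groupby('name').head(1)
print(sorted_df)
print(top1)
```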
6. Select a specific type of column
drinks = pd.read_csv('data/drinks.csv')
# Select all numeric columns
drinks.select_dtypes(include=['number']).head()
# Select all string (object) columns
drinks.select_dtypes(include=['object']).head()
# Select several data types at once
drinks.select_dtypes(include=['number', 'object', 'category', 'datetime']).head()
# Exclude the specified data types with the exclude keyword
drinks.select_dtypes(exclude=['number']).head()
7. Convert a string to a numeric value
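# The cell values were garbled in the source; the strings below are illustrative placeholders.
# The third column uses '-' to stand for 0.
df = pd.DataFrame({'column 1': ['1.1', '2.2', '3.3'],
                   'column 2': ['4.4', '5.5', '6.6'],
                   'column 3': ['7.7', '8.8', '-']})
df
# Convert the first two columns to float and check the resulting dtypes
df.astype({'column 1': 'float', 'column 2': 'float'}).dtypes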
Converting the third column the same way raises an error, because the column contains a non-numeric placeholder character standing for 0, which pandas cannot interpret as a number. To solve this, process the column with the to_numeric() function and pass errors='coerce' so that pandas converts any invalid input to NaN.
# Invalid strings become NaN, which is then replaced by 0
df = df.apply(pd.to_numeric, errors='coerce').fillna(0)
df
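The difference between the two conversions can be checked on a single column. This is a minimal sketch with made-up values, assuming a dash is the placeholder for 0; the names `astype_failed`, `coerced` and `filled` are illustrative.

```python
import pandas as pd

# Hypothetical column where '-' stands for 0
s = pd.Series(['7.7', '8.8', '-'])

# astype cannot parse '-' and raises ValueError
try:
    s.astype('float')
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable strings into NaN
coerced = pd.to_numeric(s, errors='coerce')

# Replace the NaN values with 0
filled = coerced.fillna(0)
print(filled)
```

Keep in mind that errors='coerce' silently converts every invalid string to NaN, not just the expected placeholder, so it is worth checking how many NaN values appear before filling them with 0.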
8. Optimize DataFrame memory footprint
Method 1: read only the columns you really need, using the usecols parameter
cols = ['beer_servings', 'continent']
small_drinks = pd.read_csv('data/drinks.csv', usecols=cols)
Method 2: convert object columns that hold categorical data to the category data type by specifying the dtype parameter.
dtypes = {'continent': 'category'}
smaller_drinks = pd.read_csv('data/drinks.csv', usecols=cols, dtype=dtypes)
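The saving from the category type can be measured with memory_usage(deep=True). The sketch below uses a synthetic column instead of drinks.csv, so the values and the names `object_bytes` and `category_bytes` are assumptions for illustration.

```python
import pandas as pd

# Synthetic column with only three distinct values repeated many times
df = pd.DataFrame({'continent': ['Asia', 'Europe', 'Africa'] * 1000})

# deep=True counts the memory held by the strings themselves
object_bytes = df['continent'].memory_usage(deep=True)

# As a category, each distinct string is stored once plus small integer codes
category_bytes = df['continent'].astype('category').memory_usage(deep=True)

print(object_bytes, category_bytes)
```

The gain depends on how few distinct values the column has relative to its length; a column of mostly unique strings benefits little from the conversion.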
9. Filter a DataFrame by its largest categories
movies = pd.read_csv('data/imdb_1000.csv')
# Number of movies per genre
counts = movies.genre.value_counts()
# Keep only the rows that belong to the three most frequent genres
movies[movies.genre.isin(counts.nlargest(3).index)].head()
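The same pattern can be tried without the CSV file. This sketch uses a small made-up table in place of imdb_1000.csv; the titles, genre counts and the names `top_genres` and `filtered` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the movies data: 4 Drama, 3 Comedy, 2 Action, 1 Horror
movies = pd.DataFrame({
    'title': list('ABCDEFGHIJ'),
    'genre': ['Drama'] * 4 + ['Comedy'] * 3 + ['Action'] * 2 + ['Horror'],
})

# value_counts gives the frequency of each genre
counts = movies['genre'].value_counts()

# nlargest(3).index holds the labels of the three most frequent genres
top_genres = counts.nlargest(3).index

# isin builds a boolean mask that keeps only rows in those genres
filtered = movies[movies['genre'].isin(top_genres)]
print(filtered)
```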
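10. Split a string into multiple columns
df = pd.DataFrame({'name': ['Zhang San', 'Li Si', 'Wang Wu'],
                   'location': ['Beijing-Dongcheng District', 'Shanghai-Huangpu District', 'Guangzhou-Baiyun District']})
df
# Split the name on the space into separate columns
df.name.str.split(' ', expand=True)
# The location column can be split the same way on the hyphen
df.location.str.split('-', expand=True)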
11. Convert the list in Series to DataFrame
Df = pd.DataFrame ({'column 1x: [' axiomagemagem]], 'column 2p: [[10lemen 20], [20je 30], [30je 40]]}) dfdf_new = df. Column 2.apply (pd.Series) pd.concat ([df,df_new], axis='columns')
12. Aggregate with multiple functions
orders = pd.read_csv('data/chipotle.tsv', sep='\t')
# Total price and item count for each order
orders.groupby('order_id').item_price.agg(['sum', 'count']).head()
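13. Grouping and aggregation
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df
# Iterating over a groupby yields (group name, sub-DataFrame) pairs
for name, group in df.groupby('key1'):
    print(name)
    print(group)
# Collect the groups into a dictionary keyed by group name
dict(list(df.groupby('key1')))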
Group by dictionary or Series
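people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
# The mapping values were garbled in the source; 'red' and 'blue' are placeholders.
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red'}
# groupby(mapping, axis=1) is deprecated in recent pandas versions;
# grouping the transposed frame and transposing back gives the same column-wise sums.
by_column = people.T.groupby(mapping)
by_column.sum().T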
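The heading above also mentions grouping by a Series, which works the same way as a dictionary: the Series index is matched against the labels being grouped. The following is a sketch with deterministic values instead of random ones; the color labels and the names `map_series` and `by_color` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data so the sums are easy to check
people = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    columns=['a', 'b', 'c', 'd', 'e'],
    index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'],
)

# A Series whose index holds the column names and whose values are the group labels
map_series = pd.Series({'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red'})

# Transpose so the columns become rows, group by the Series, then transpose back
by_color = people.T.groupby(map_series).sum().T
print(by_color)
```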
That is all for this article on common Pandas skills. Thank you for reading! I hope these tips are helpful.