
Basic usage of pandas

2025-04-06 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

This article introduces the basic usage of pandas. Many people run into these situations in real-world work, so this article walks through how to handle them. I hope you read it carefully and get something useful out of it!

Creating a DataFrame

1. Create a DataFrame from pd.Series() objects

    import pandas as pd

    country1 = pd.Series({'Name': 'China', 'Language': 'Chinese', 'Area': '9.597M km2', 'Happiness Rank': 79})
    country2 = pd.Series({'Name': 'USA', 'Language': 'English (US)', 'Area': '9.834M km2', 'Happiness Rank': 14})
    country3 = pd.Series({'Name': 'Australia', 'Language': 'English (AU)', 'Area': '7.692M km2', 'Happiness Rank': 9})
    # Each Series becomes one row; the list passed to index= labels the rows
    df = pd.DataFrame([country1, country2, country3], index=['CH', 'US', 'AU'])
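When the Series passed to pd.DataFrame() do not all have the same keys, the columns become the union of the keys and the missing cells are filled with NaN. A minimal sketch; the third Series (label 'XX') is a hypothetical row added only to show the gap-filling:

```python
import pandas as pd

# Each Series becomes one row; the Series keys become the column labels
country1 = pd.Series({'Name': 'China', 'Language': 'Chinese', 'Area': '9.597M km2', 'Happiness Rank': 79})
country2 = pd.Series({'Name': 'USA', 'Language': 'English (US)', 'Area': '9.834M km2', 'Happiness Rank': 14})
# Hypothetical row that lacks the 'Language' and 'Area' keys
partial = pd.Series({'Name': 'Example', 'Happiness Rank': 100})

df = pd.DataFrame([country1, country2, partial], index=['CH', 'US', 'XX'])

# Columns are the union of all keys; the missing cells of 'XX' are NaN
print(sorted(df.columns))
print(df.loc['XX'].isna().sum())  # 2 missing cells in the hypothetical row
```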

2. Create a DataFrame from DataFrame() and dictionaries

    columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]
    df = pd.DataFrame(columns=columns)
    log_dic = {"epoch": 1, "train_loss": 0.2, "train_auc": 1, "test_loss": 0, "test_auc": 0}
    # DataFrame.append() was removed in pandas 2.0; pd.concat() adds the row instead
    df = pd.concat([df, pd.DataFrame([log_dic])])
    log_dic = {"epoch": 2, "train_loss": 0.2, "train_auc": 1, "test_loss": 0, "test_auc": 0}
    df = pd.concat([df, pd.DataFrame([log_dic])])
    # Renumber the index: inplace=True modifies df itself,
    # drop=True discards the old index instead of keeping it as a column
    df.reset_index(inplace=True, drop=True)
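Adding rows one at a time copies the whole DataFrame on every call. A common alternative, sketched here under the assumption of a hypothetical three-epoch training loop that logs the same values as above, is to collect the dictionaries in a plain list and build the DataFrame once at the end:

```python
import pandas as pd

columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]

# Hypothetical training loop: collect one dict per epoch in a plain list
logs = []
for epoch in range(1, 4):
    log_dic = {"epoch": epoch, "train_loss": 0.2, "train_auc": 1, "test_loss": 0, "test_auc": 0}
    logs.append(log_dic)

# Build the DataFrame once; the index is already 0..n-1,
# so no reset_index() call is needed
df = pd.DataFrame(logs, columns=columns)
print(df)
```

Passing columns= also fixes the column order, independent of the key order inside each dictionary.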

3. Combine multiple DataFrames with pd.concat()

    columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]
    log_dic = {"epoch": 1, "train_loss": 0.2, "train_auc": 1, "test_loss": 0, "test_auc": 0}
    df1 = pd.DataFrame([log_dic], columns=columns)
    log_dic = {"epoch": 2, "train_loss": 0.1, "train_auc": 1, "test_loss": 0, "test_auc": 1}
    df2 = pd.DataFrame([log_dic], columns=columns)
    # axis=0 stacks the rows; ignore_index=True renumbers the index of the result
    df_new = pd.concat([df1, df2], axis=0, ignore_index=True)
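The effect of ignore_index and axis is easiest to see side by side. A small sketch with two one-row frames (trimmed to two columns for brevity):

```python
import pandas as pd

df1 = pd.DataFrame([{"epoch": 1, "train_loss": 0.2}])
df2 = pd.DataFrame([{"epoch": 2, "train_loss": 0.1}])

# Without ignore_index, each frame keeps its own index, so label 0 appears twice
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())     # [0, 0]

# With ignore_index=True the result is renumbered 0..n-1
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1]

# axis=1 places the frames side by side, aligning rows on the index
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)         # (1, 4)
```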

File operations

1. Write a file that includes the column names

    columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]
    df_new[columns].to_csv('text.txt', index=False, header=columns, sep='\t')

2. Write a file without the column names

    df_new[columns].to_csv('text.txt', index=False, header=False, sep='\t')

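3. Read a file that has no header row

    # header=None: the first line is data; nrows=100 reads at most 100 rows
    df = pd.read_csv('text.txt', sep='\t', header=None, nrows=100)
    # Name the columns afterwards
    df.columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]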
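The write-without-header and read-with-header=None steps can be checked as a round trip. This sketch uses an in-memory io.StringIO buffer in place of 'text.txt' (an assumption, so that no file is left behind); the two logged rows reuse the values from the examples above:

```python
import io
import pandas as pd

columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]
df_new = pd.DataFrame([[1, 0.2, 1, 0, 0], [2, 0.1, 1, 0, 1]], columns=columns)

# The in-memory buffer stands in for 'text.txt'
buf = io.StringIO()
df_new.to_csv(buf, index=False, header=False, sep='\t')
buf.seek(0)

# header=None: the first line is data, so the columns are numbered 0..4
df = pd.read_csv(buf, sep='\t', header=None, nrows=100)
print(df.columns.tolist())  # [0, 1, 2, 3, 4]
df.columns = columns        # attach the real names afterwards
print(df)
```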

4. Read a file that includes the column names

    # header gives the row number that holds the column names, usually row 0 (the default)
    df = pd.read_csv('text.txt', sep='\t', header=0)
    # usecols reads only the listed columns; index_col makes 'Country' the row index
    report_2016_df = pd.read_csv('2016.csv', index_col='Country', usecols=['Country', 'Happiness Rank', 'Happiness Score', 'Region'])
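Because '2016.csv' is not included with the article, here is a self-contained sketch of header=0, usecols, and index_col on a tiny CSV string built from the country data of the first section (the CSV layout itself is an assumption for illustration):

```python
import io
import pandas as pd

# Small CSV built from the country data used earlier in this article
csv_text = (
    "Country,Language,Area,Happiness Rank\n"
    "China,Chinese,9.597M km2,79\n"
    "USA,English (US),9.834M km2,14\n"
    "Australia,English (AU),7.692M km2,9\n"
)

# header=0 (the default): row 0 holds the column names
full = pd.read_csv(io.StringIO(csv_text), header=0)

# usecols keeps only the listed columns; index_col turns 'Country' into the row index
subset = pd.read_csv(io.StringIO(csv_text), index_col='Country',
                     usecols=['Country', 'Happiness Rank'])
print(subset.loc['USA', 'Happiness Rank'])  # 14
```

Note that the column named in index_col must also appear in usecols, otherwise it is never read.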

5. Write a binary pickle file

    log_dic = {"epoch": 2, "train_loss": 0.1, "train_auc": 1, "test_loss": 23, "test_auc": 1}
    df = pd.DataFrame([log_dic], columns=["epoch", "train_loss", "train_auc", "test_loss", "test_auc"])
    df.to_pickle('df_log.pickle')
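Unlike CSV, a pickle file preserves the index and the column dtypes exactly. A round-trip sketch in a temporary directory; the index label 'run-a' is hypothetical and only there to show that a non-default index survives:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame([{"epoch": 2, "train_loss": 0.1, "train_auc": 1, "test_loss": 23, "test_auc": 1}],
                  index=['run-a'])  # hypothetical non-default index label

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'df_log.pickle')
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The index and the dtypes come back unchanged
print(restored.equals(df))  # True
print(restored.dtypes['train_loss'])
```

Unpickling can execute arbitrary code, so read_pickle() should only be used on files you trust.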

6. Load a pickle file

    df = pd.read_pickle('df_log.pickle')

Indexing

The examples below use the country DataFrame df created at the start of "Creating a DataFrame" (row labels CH, US, AU).

1. Row indexing

    df.loc['CH']               # a single row, returned as a Series
    df.loc['CH'].index         # Index(['Name', 'Language', 'Area', 'Happiness Rank'], dtype='object')
    df.loc['CH']['Name']       # 'China'
    df.loc['CH'].to_numpy()    # array(['China', 'Chinese', '9.597M km2', 79], dtype=object)
    df.iloc[1]                 # the second row, selected by integer position

    df.loc[['CH', 'US']]       # several rows by label, returned as a DataFrame
    df.iloc[[0, 1]]            # the same rows by position
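Two behaviours of .loc and .iloc that often surprise newcomers: label slices include the end label, while position slices exclude the stop position. .loc also accepts a boolean mask. A sketch on a trimmed copy of the country data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['China', 'USA', 'Australia'], 'Happiness Rank': [79, 14, 9]},
    index=['CH', 'US', 'AU'],
)

# Label slices with .loc include both endpoints
print(df.loc['CH':'US'].index.tolist())  # ['CH', 'US']
# Position slices with .iloc exclude the stop position, like Python lists
print(df.iloc[0:1].index.tolist())       # ['CH']

# A boolean mask selects the rows that satisfy a condition
print(df.loc[df['Happiness Rank'] < 20, 'Name'].tolist())  # ['USA', 'Australia']
```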

2. Column indexing

    df['Area']              # a single column: Series
    df[['Name', 'Area']]    # a list of columns: DataFrame

3. Mixed indexing

    print('fetch column first, then row:')
    print(df['Area']['CH'])
    print(df['Area'].loc['CH'])
    print(df['Area'].iloc[0])
    print('fetch row first, then column:')
    print(df.loc['CH']['Area'])
    print(df.iloc[0]['Area'])
    print(df.at['CH', 'Area'])
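Chained lookups such as df['Area']['CH'] are fine for reading, but for writing it is safer to give row and column in one call so the change lands in df itself. A sketch using .loc, .at, and the position-based .iat; the new Area value is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Area': ['9.597M km2', '9.834M km2', '7.692M km2']},
                  index=['CH', 'US', 'AU'])

# One-step lookups: label-based and position-based
print(df.loc['CH', 'Area'])
print(df.iat[0, 0])   # row 0, column 0 by position

# For writing, index row and column in a single call
df.loc['AU', 'Area'] = '7.7M km2'  # hypothetical new value
print(df.at['AU', 'Area'])
```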

Deleting data

    df.drop(['CH'], inplace=True)            # delete a row; inplace=True modifies df itself
    df.drop(['Area'], axis=1, inplace=True)  # delete a column; this requires axis=1

Handling NaN data

1. Deleting rows or columns with missing values

Use the following data

    import numpy as np

    df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                       "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                       "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

    # axis:    0 = drop rows (default), 1 = drop columns
    # how:     'any' = drop if any value is null (default), 'all' = drop only if every value is null
    # inplace: False = return a new DataFrame (default), True = modify df itself
    df.dropna(axis=0, how='any', inplace=True)
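On the sample data, the dropna() options give quite different results. A sketch that rebuilds the three-row frame (so the in-place drop above does not affect it) and compares them; thresh= is an extra parameter not used in the article, keeping rows with at least that many non-null values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

print(df.dropna()['name'].tolist())                # ['Batman']
print(df.dropna(subset=['toy'])['name'].tolist())  # ['Batman', 'Catwoman']
print(df.dropna(how='all')['name'].tolist())       # all three: 'name' is never null
print(df.dropna(axis=1).columns.tolist())          # ['name']
print(df.dropna(thresh=2)['name'].tolist())        # ['Batman', 'Catwoman']
```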

    df.dropna(axis=0, how='any', subset=['toy'], inplace=False)  # subset restricts the check to the listed columns

2. Filling missing values

Use the following data

    df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                       [3, 4, np.nan, 1],
                       [np.nan, np.nan, np.nan, 5],
                       [np.nan, 3, np.nan, 4]],
                      columns=list('ABCD'))
    df.fillna(0, inplace=True)  # replace every missing value with 0

    # Fill each missing value horizontally with the value to its left
    # (fillna(method='ffill') is deprecated since pandas 2.1; ffill() replaces it)
    df.ffill(axis=1)

    # Fill each missing value vertically with the value below it
    df.bfill(axis=0)

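    # Fill only column 'A'; assigning the result back is more reliable than
    # df['A'].fillna(0, inplace=True), which recent pandas versions warn about
    df['A'] = df['A'].fillna(0)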
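fillna() also accepts a dictionary that maps each column to its own fill value, and ffill() leaves a leading NaN untouched because there is nothing before it to copy. A sketch on a fresh copy of the ABCD data; the per-column fill values are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))

# A dict gives each column its own fill value; unlisted columns stay as they are
filled = df.fillna(value={'A': 0, 'B': 1, 'C': 2})
print(filled)

# ffill() copies the last valid value downwards; the NaN in row 0
# has nothing above it, so it stays NaN
print(df.ffill()['A'].tolist())  # [nan, 3.0, 3.0, 3.0]
```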

3. Checking for missing values

    df.isnull()       # element-wise True/False for the whole DataFrame
    df['A'].isna()    # isna() is an alias of isnull()
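Summing the boolean mask from isna() gives the number of missing values per column, and any(axis=1) flags the rows that contain at least one. A sketch on the unfilled ABCD data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))

# True counts as 1, so the column sums are the missing-value counts
print(df.isna().sum().to_dict())  # {'A': 3, 'B': 1, 'C': 4, 'D': 0}
# Every row has a NaN in column C, so all rows are flagged
print(df.isna().any(axis=1).tolist())  # [True, True, True, True]
```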

Merge operations

    import pandas as pd

    staff_df = pd.DataFrame([{'name': 'Zhang San', 'department': 'research and development department'},
                             {'name': 'Li Si', 'department': 'finance department'},
                             {'name': 'Zhao Liu', 'department': 'marketing department'}])
    student_df = pd.DataFrame([{'name': 'Zhang San', 'major': 'computer science'},
                               {'name': 'Li Si', 'major': 'accounting'},
                               {'name': 'Wang Wu', 'major': 'marketing'}])

1. Merge on a column

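The how parameter decides which keys are kept: 'inner' (intersection of keys), 'outer' (union), 'left' (all keys of the left DataFrame), and 'right' (all keys of the right DataFrame).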

    pd.merge(staff_df, student_df, how='inner', on='name')
    pd.merge(staff_df, student_df, how='outer', on='name')
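To see where each row of an outer merge came from, merge() accepts indicator=True, which adds a _merge column with the values 'both', 'left_only', or 'right_only'. A sketch with the staff and student data above:

```python
import pandas as pd

staff_df = pd.DataFrame([{'name': 'Zhang San', 'department': 'research and development department'},
                         {'name': 'Li Si', 'department': 'finance department'},
                         {'name': 'Zhao Liu', 'department': 'marketing department'}])
student_df = pd.DataFrame([{'name': 'Zhang San', 'major': 'computer science'},
                           {'name': 'Li Si', 'major': 'accounting'},
                           {'name': 'Wang Wu', 'major': 'marketing'}])

merged = pd.merge(staff_df, student_df, how='outer', on='name', indicator=True)
print(merged[['name', '_merge']])

# The inner merge keeps only the two names present in both tables
print(len(pd.merge(staff_df, student_df, how='inner', on='name')))  # 2
```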

2. Merge on the index

    staff_df.set_index('name', inplace=True)
    student_df.set_index('name', inplace=True)
    pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
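For an index-on-index merge, DataFrame.join() is a shorter spelling; its default is how='left'. A sketch comparing the two on the same data (the comparison is illustrative; both calls are standard pandas):

```python
import pandas as pd

staff_df = pd.DataFrame([{'name': 'Zhang San', 'department': 'research and development department'},
                         {'name': 'Li Si', 'department': 'finance department'},
                         {'name': 'Zhao Liu', 'department': 'marketing department'}]).set_index('name')
student_df = pd.DataFrame([{'name': 'Zhang San', 'major': 'computer science'},
                           {'name': 'Li Si', 'major': 'accounting'},
                           {'name': 'Wang Wu', 'major': 'marketing'}]).set_index('name')

via_merge = pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
via_join = staff_df.join(student_df, how='left')

# Zhao Liu has no student record, so 'major' is NaN for that row
print(via_join)
```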

3. Merge on columns with different names

    # Reset the index back to a default RangeIndex
    staff_df.reset_index(inplace=True)
    student_df.reset_index(inplace=True)
    staff_df.rename(columns={'name': 'employee name'}, inplace=True)
    student_df.rename(columns={'name': 'student name'}, inplace=True)
    pd.merge(staff_df, student_df, how='left', left_on='employee name', right_on='student name')
    # Merging on several key columns; this assumes both DataFrames also have
    # an 'address' column, which the sample data above does not include
    pd.merge(staff_df, student_df, how='inner', left_on=['employee name', 'address'], right_on=['student name', 'address'])

Other operations

    report_data = pd.read_csv('./2015.csv')
    report_data.head()

Basic attributes

    report_data.head()       # the first 5 rows
    report_data.info()       # column dtypes and non-null counts
    report_data.describe()   # summary statistics of the numeric columns
    report_data.columns
    report_data.index

Renaming columns

    df.rename(columns={'Region': 'region', 'Happiness Rank': 'ranking', 'Happiness Score': 'happiness index'}, inplace=True)

Data cleaning

    df.fillna(0, inplace=False)  # replace nulls with 0, returning a new DataFrame
    df.dropna()                  # drop rows that contain nulls
    df.ffill()                   # forward fill
    df.bfill(inplace=True)       # backward fill, modifying df itself

apply

    # Family name: the first word of the romanized name
    staff_df['employee name'].apply(lambda x: x.split(' ')[0])
    # Given name: everything after the first space
    staff_df['employee name'].apply(lambda x: x.split(' ', 1)[1])
    # Store the results as new columns
    staff_df.loc[:, 'last name'] = staff_df['employee name'].apply(lambda x: x.split(' ')[0])
    staff_df.loc[:, 'first name'] = staff_df['employee name'].apply(lambda x: x.split(' ', 1)[1])

Grouping

Grouping by a column

    grouped = report_data.groupby('Region')
    grouped['Happiness Score'].mean()

    # Iterate over the groupby object: each item is (group key, sub-DataFrame)
    for group, frame in grouped:
        mean_score = frame['Happiness Score'].mean()
        max_score = frame['Happiness Score'].max()
        min_score = frame['Happiness Score'].min()
        print('Region {}: average happiness index {}, highest {}, lowest {}'.format(group, mean_score, max_score, min_score))
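The loop above can also be written as a single agg() call that returns one row per group and one column per statistic. Since 2015.csv is not included, this sketch uses a tiny frame with made-up scores and hypothetical region names, only to illustrate the call:

```python
import pandas as pd

# Made-up scores for illustration (not data from the happiness report)
report_data = pd.DataFrame({'Region': ['Region X', 'Region X', 'Region Y', 'Region Y', 'Region Y'],
                            'Happiness Score': [7.0, 6.0, 5.0, 4.0, 6.0]})

# One row per region, one column per statistic
summary = report_data.groupby('Region')['Happiness Score'].agg(['mean', 'max', 'min'])
print(summary)
```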

Grouping with a custom function

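    report_data2 = report_data.set_index('Happiness Rank')

    def get_rank_group(rank):
        rank_group = ''
        ...  # assign a bucket label to rank_group based on rank
        return rank_group

    # When groupby() receives a custom function, the function is applied to the index values
    grouped = report_data2.groupby(get_rank_group)

    # Group by a derived column: the integer part of the happiness score
    report_data['score group'] = report_data['Happiness Score'].apply(lambda score: int(score))
    grouped = report_data.groupby('score group')
    for group, frame in grouped:
        print('Score group {}: {} rows'.format(group, len(frame)))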
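A runnable sketch of grouping by a function of the index. The bucket boundaries (10 and 20), the rank values, and the scores are all hypothetical, chosen only to show that groupby() calls the function once per index label and groups rows by the returned value:

```python
import pandas as pd

# Toy frame indexed by rank, with made-up scores
report_data2 = pd.DataFrame({'Happiness Score': [7.5, 7.4, 6.9, 6.1, 5.2]},
                            index=pd.Index([1, 5, 12, 18, 30], name='Happiness Rank'))

def get_rank_group(rank):
    # Hypothetical bucket boundaries, chosen only for this sketch
    if rank <= 10:
        return '1-10'
    elif rank <= 20:
        return '11-20'
    return '>20'

# groupby() calls get_rank_group on every index label
counts = report_data2.groupby(get_rank_group).size()
print(counts.to_dict())  # {'1-10': 2, '11-20': 2, '>20': 1}
```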

To count how many rows carry each label, draw the value counts as a bar chart (kind='bar'):

    # train_df: a DataFrame with a 'label' column; plotting requires matplotlib
    train_df.label.value_counts().plot(kind='bar')

This is the end of "basic usage of pandas". Thank you for reading.

© 2024 shulou.com SLNews company. All rights reserved.