
What are the common functions in pandas data analysis

2025-01-15 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Here I share with you the functions commonly used in pandas data analysis. Most people may not know all of them, so this article is offered for reference; I hope you gain a lot from reading it. Let's go through it together!

1. import module

import pandas as pd #Here are two modules used: pandas and numpy

import numpy as np

2. Create a dataset and read

2.1 create a dataset

I constructed a dataset of supermarket purchases with attributes such as order ID, order date, money, product, department, and origin.

#Both a list and a dictionary can be passed to DataFrame; here I pass a dictionary:

data=pd.DataFrame({

"id":np.arange(101,111), # np.arange generates the values in the range; here it produces id numbers 101~110.

"date":pd.date_range(start="20200310",periods=10), # Generate date data; periods must equal the number of rows, so periods=10.

"money":[5,4,65,-10,15,20,35,16,6,20], # Deliberately plant a -10 here; it will be filled in later (sadly, a pit I dug for myself)

"product":['soda water','cola','beef jerky','laoganma','pineapple','ice cream','facial cleanser','onion','toothpaste','potato chips'],

"department":['beverage','beverage','snacks','condiments','fruit',np.nan,'daily necessarys','vegetable','daily necessarys','snacks'], # Plant a null value here

"origin":['China',' China','America','China','Thailand','China','america','China','China','Japan'] # Plant a lowercase "america" here (note the stray leading space in ' China' too)

})

data #Output View Data Set

Output:

2.2 data write and read

data.to_csv("shopping.csv",index=False) # index=False means do not write the row index; otherwise an extra index column is saved

data=pd.read_csv("shopping.csv")
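A minimal round-trip sketch of the write-and-read step above. To keep it self-contained it uses an in-memory buffer instead of a real file, and a tiny toy frame rather than the article's dataset:

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [101, 102], "money": [5, 4]})

# Write without the index column, then read it back.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

print(df2.columns.tolist())  # ['id', 'money'] — no extra index column
```

Without `index=False`, the row index would be written as an unnamed extra column and come back as `Unnamed: 0` on read.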

3. data viewing

3.1 Data Set Basic Information Query

data.shape #Number of rows and columns

data.dtypes #Data type of all columns

data['id'].dtype #Data type of a column

data.ndim #Data Dimensions

data.index #row index

data.columns #Column index

data.values #Object Value

3.2 Data Set Overview Query

data.head() #Show a few lines at the head (default 5 lines)

data.tail() #Show the last few lines (default 5 lines)

data.info() #Overview of dataset related information: index status, column data type, non-null value, memory usage

data.describe() #Rapid Synthesis of Statistical Results
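A quick sketch of these inspection calls on a small toy frame (not the article's dataset), showing the kind of values they return:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": np.arange(101, 104), "money": [5.0, 4.0, 65.0]})

print(df.shape)            # (3, 2): rows and columns
print(df["money"].dtype)   # float64: type of one column

# describe() returns per-column statistics; index it like a Series.
stats = df["money"].describe()
print(stats["mean"])       # (5 + 4 + 65) / 3
```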

4. data cleaning

4.1 View Outliers

This dataset is small, of course, so outliers can be spotted by eye; but when a dataset is large, I use the following method to check whether it contains outliers (if there is a better method, please share it):

for i in data:

print(i+": "+str(data[i].unique())) #View unique values in a column

Output: We find a negative value for money, a null value for department, and a case problem for origin in this dataset.
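One slightly more systematic alternative (a sketch on toy data, not the article's dataset): restrict range checks to numeric columns with `select_dtypes`, and count nulls per column separately:

```python
import pandas as pd

df = pd.DataFrame({
    "money": [5, -10, 15],
    "department": ["beverage", None, "snacks"],
})

# Numeric columns only: flag rows with a value outside the expected range
# (here, any negative value).
numeric = df.select_dtypes(include="number")
bad_rows = numeric[(numeric < 0).any(axis=1)]

# All columns: count missing values.
null_counts = df.isnull().sum()

print(bad_rows.index.tolist())    # rows holding a negative numeric value
print(null_counts["department"])  # nulls in the department column
```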

4.2 null processing

4.2.1 Null detection


data.isnull() #View null values for entire dataset

data['department'].isnull() #View null values in a column

Output:

Summing the null counts per column and sorting the result makes the check more intuitive. sort_values defaults to ascending=True; pass ascending=False to list the columns with the most nulls first.

data.isnull().sum().sort_values(ascending=False)

Output:

4.2.2 Handling of null values

pandas.DataFrame.fillna(value = None,method = None,inplace = False)

value: The value used for filling, which can be a concrete value, dictionary or array, but not a list;

method: filling method, such as ffill and bfill;

inplace: defaults to False; if True, the original object is modified in place.

data['department'].fillna(method="ffill") #Fill with the previous value, i.e. "fruit"

Output:

data['department'].fillna(method="bfill") #Fill with the next value, i.e. "daily necessarys"

data['department'].fillna(value="frozen food",inplace=True) #Replace with a concrete value and modify the original object

Output:
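A small self-contained sketch of the three fill styles (toy Series, not the article's column). Note that recent pandas versions deprecate the `method=` argument of `fillna` in favour of the `.ffill()` / `.bfill()` methods shown here:

```python
import numpy as np
import pandas as pd

s = pd.Series(["fruit", np.nan, "vegetable"])

# Equivalent to fillna(method="ffill") / fillna(method="bfill").
print(s.ffill().tolist())  # ['fruit', 'fruit', 'vegetable']
print(s.bfill().tolist())  # ['fruit', 'vegetable', 'vegetable']

# Filling with a concrete value returns a new Series unless inplace=True.
print(s.fillna("frozen food").tolist())  # ['fruit', 'frozen food', 'vegetable']
```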

4.3 space handling

Only for object type data

for i in data: #Traverse through every column in the dataset

if pd.api.types.is_object_dtype(data[i]): #If it is data of type object, execute the following code

data[i]=data[i].str.strip() #Remove spaces

data['origin'].unique() #Verify

Output result: array(['China', 'America', 'Thailand', 'america', 'Japan'], dtype=object)
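The same strip loop, sketched on a toy frame so it can be run on its own. The object-dtype check skips numeric columns, which have no `.str` accessor:

```python
import pandas as pd

df = pd.DataFrame({"origin": ["China", " China", "America "], "money": [5, 4, 65]})

for col in df:  # iterate over column names
    if pd.api.types.is_object_dtype(df[col]):  # only string-like columns
        df[col] = df[col].str.strip()          # drop leading/trailing spaces

print(sorted(df["origin"].unique()))  # ['America', 'China']
```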

4.4 case conversion

data['origin'].str.title() #Capitalize the first letter of every word

data['origin'].str.capitalize() #Capitalize only the first letter of the string

data['origin'].str.upper() #All caps

data['origin'].str.lower() #All lower case
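The difference between `title` and `capitalize` only shows on multi-word strings, so here is a one-value toy Series to make it visible:

```python
import pandas as pd

s = pd.Series(["south africa"])

print(s.str.title()[0])       # 'South Africa'  — every word capitalized
print(s.str.capitalize()[0])  # 'South africa'  — only the first letter
print(s.str.upper()[0])       # 'SOUTH AFRICA'
print(s.str.lower()[0])       # 'south africa'
```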

4.5 data replacement

data['origin'].replace("america","America",inplace=True) #Replace the first value with the second, inplace defaults to False

data['origin']

Output:

data['money'].replace(-10,np.nan,inplace=True) #Replace negative values with null values

data['money'].replace(np.nan,data['money'].mean(),inplace=True) #Replace null with mean

data['money']

Output:
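The two replace steps above can be sketched end to end on the money values. Replacing the negative with NaN first matters, because `mean()` skips NaN, so the hole does not drag the average down:

```python
import numpy as np
import pandas as pd

money = pd.Series([5, 4, 65, -10, 15, 20, 35, 16, 6, 20])

money = money.replace(-10, np.nan)      # negative value -> NaN
mean_without_nan = money.mean()         # NaN is ignored: 186 / 9
money = money.fillna(mean_without_nan)  # fill the hole with that mean

print(round(mean_without_nan, 2))
```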

4.6 data deletion

method one

data1 = data[data.origin != 'America'] #Remove rows whose origin is America

data1

data2=data[(data != 'Japan').all(1)] #Remove all rows containing "Japan": a row is kept when every one of its columns is not equal to "Japan"

data2
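A self-contained sketch of the `.all(axis=1)` trick on a toy frame: `df != 'Japan'` produces a boolean frame, and `.all(axis=1)` reduces it row-wise, keeping only rows where the condition holds in every column:

```python
import pandas as pd

df = pd.DataFrame({"origin": ["China", "Japan", "America"], "money": [5, 4, 65]})

# Boolean frame -> row-wise AND -> boolean mask -> filtered frame.
no_japan = df[(df != "Japan").all(axis=1)]

print(no_japan["origin"].tolist())  # ['China', 'America']
```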

method two

data['origin'].drop_duplicates() #By default later duplicates are dropped, i.e. the first occurrence of each value is kept

Output:

data['origin'].drop_duplicates(keep='last') #Drop earlier duplicates, i.e. keep the last occurrence of each value

Output:
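A toy example making the `keep` behaviour visible through the surviving index labels (original row order is preserved either way):

```python
import pandas as pd

s = pd.Series(["China", "China", "America", "China"])

print(s.drop_duplicates().index.tolist())             # [0, 2] — keep first occurrence
print(s.drop_duplicates(keep="last").index.tolist())  # [2, 3] — keep last occurrence
```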

For more information on the usage of pandas.DataFrame.drop_duplicates, please click the official link below: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates

4.7 data format conversion

data['id'].astype('str') #Convert id column type to string type.

Common Data Types Comparison
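A small sketch of the conversion (toy Series). Note that `astype` returns a new object rather than modifying the original, and that string columns show up as `object` dtype in classic pandas:

```python
import pandas as pd

s = pd.Series([101, 102])

as_str = s.astype(str)        # returns a new Series; s itself is unchanged
print(as_str.dtype)           # object (strings)
print(as_str[0] + as_str[1])  # string concatenation, not addition: '101102'
```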

4.8 Change column name

data.rename(columns={'id':'ID','origin':'Origin'}) #Rename the id column to ID and the origin column to Origin.

Output:
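A self-contained sketch (toy frame, illustrative column names) showing that `rename` returns a new frame and leaves the original untouched unless `inplace=True` is passed:

```python
import pandas as pd

df = pd.DataFrame({"id": [101], "origin": ["China"]})

renamed = df.rename(columns={"id": "ID", "origin": "Origin"})

print(renamed.columns.tolist())  # ['ID', 'Origin']
print(df.columns.tolist())       # original unchanged: ['id', 'origin']
```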

That is all of "What are the common functions in pandas data analysis". Thank you for reading! I hope the shared content has been helpful; if you want to learn more, welcome to follow the industry information channel!
