This article shares the functions most commonly used in Pandas data analysis. Many people are not familiar with all of them, so it is offered here as a reference; I hope you get a lot out of it.
1. Import modules
import pandas as pd #Here are two modules used: pandas and numpy
import numpy as np
2. Create and read a dataset
2.1 Create the dataset
I constructed a dataset of supermarket purchases with attributes such as order ID, order date, money, product, department, and origin.
#Either a list or a dictionary can be passed to DataFrame; a dictionary is used here:
data=pd.DataFrame({
"id":np.arange(101,111), # np.arange generates the values in the given range; here it produces id numbers 101~110
"date":pd.date_range(start="20200310",periods=10), #generate dates; periods must equal the number of rows
"money":[5,4,65,-10,15,20,35,16,6,20], #a -10 is planted on purpose and will be fixed below (sadly, a pit dug for myself)
"product":['soda water','cola','beef jerky','laoganma','pineapple','ice cream','facial cleanser','onion','toothpaste','potato chips'],
"department":['beverage ',' beverage','snacks',' condiments','fruit',np.nan,' daily necessities','vegetable',' daily necessities','snacks'], #a NaN is planted on purpose
"origin":['China',' China','America','China','Thailand','China','america','China','China','Japan'] #a lowercase 'america' is planted on purpose
})
data #view the dataset
Output:
2.2 Write and read data
data.to_csv("shopping.csv",index=False) # index=False keeps the index out of the file; otherwise an extra index column is written
data=pd.read_csv("shopping.csv")
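As a side note (not from the original), read_csv loads the date column back as plain strings; passing parse_dates restores it as a datetime column:
data=pd.read_csv("shopping.csv",parse_dates=["date"]) #optional: keep "date" as datetime64 instead of object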
3. Data viewing
3.1 Basic dataset information
data.shape #number of rows and columns
data.dtypes #data types of all columns
data['id'].dtype #data type of a single column
data.ndim #number of dimensions
data.index #row index
data.columns #column index
data.values #underlying values as a NumPy array
3.2 Dataset overview
data.head() #Show a few lines at the head (default 5 lines)
data.tail() #Show the last few lines (default 5 lines)
data.info() #Overview of dataset related information: index status, column data type, non-null value, memory usage
data.describe() #quick summary statistics for the numeric columns
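By default describe() covers only the numeric columns; as an extra note (not from the original), include="all" also summarizes the object columns:
data.describe(include="all") #summary statistics for every column, including product, department and origin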
4. Data cleaning
4.1 Check for abnormal values
This dataset is small, so the abnormal values can be spotted by eye, but when a dataset is large I use the method below to check whether any are present. If there is a better way, please let me know:
for i in data:
    print(i+": "+str(data[i].unique())) #view the unique values of each column
Output: the dataset has a negative value in money, a null value in department, and inconsistent capitalization in origin.
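For reference, a minimal sketch of the same checks written as boolean filters (the column names follow the dataset above):
print(data[data["money"] < 0]) #rows with a negative amount
print(data[data["department"].isnull()]) #rows with a missing department
print(data[data["origin"].str.islower()]) #rows whose origin is all lowercase, e.g. 'america'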
4.2 Null value handling
4.2.1 Null detection
data.isnull() #View null values for entire dataset
data['department'].isnull() #View null values in a column
Output:
Summing the null counts per column gives a more intuitive overview. In sort_values, ascending defaults to True; here descending order is used.
data.isnull().sum().sort_values(ascending=False)
Output:
4.2.2 Handling of null values
pandas.DataFrame.fillna(value = None,method = None,inplace = False)
value: The value used for filling, which can be a concrete value, dictionary or array, but not a list;
method: filling method, such as ffill and bfill;
inplace: defaults to False; if True, the object is filled in place (no new object is created), which also affects any other views of it.
data['department'].fillna(method="ffill") #forward fill with the previous value, i.e. 'fruit'
Output:
data['department'].fillna(method="bfill") #backward fill with the next value, i.e. 'daily necessities'
data['department'].fillna(value="frozen food",inplace=True) #fill with a concrete value and modify the original object
Output:
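As an extra note (not in the original), value may also be a dict keyed by column name, so several columns can be filled in one call:
data.fillna({"department":"frozen food","money":0}) #fill department and money with different values at once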
4.3 Whitespace handling
This applies only to object-type columns.
for i in data: #iterate over every column in the dataset
    if pd.api.types.is_object_dtype(data[i]): #if the column is of object type, run the following
        data[i]=data[i].str.strip() #remove leading and trailing spaces
data['origin'].unique() #verify
Output: array(['China', 'America', 'Thailand', 'america', 'Japan'], dtype=object)
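A more compact sketch of the same idea (not from the original) uses select_dtypes to pick the object columns in one step:
obj_cols = data.select_dtypes(include="object").columns #names of all object-type columns
data[obj_cols] = data[obj_cols].apply(lambda s: s.str.strip()) #strip every object column at once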
4.4 Case conversion
data['origin'].str.title() #capitalize the first letter of each word
data['origin'].str.capitalize() #capitalize only the first letter of the string
data['origin'].str.upper() #all uppercase
data['origin'].str.lower() #all lowercase
4.5 Data replacement
data['origin'].replace("america","America",inplace=True) #Replace the first value with the second, inplace defaults to False
data['origin']
Output:
data['money'].replace(-10,np.nan,inplace=True) #Replace negative values with null values
data['money'].replace(np.nan,data['money'].mean(),inplace=True) #Replace null with mean
data['money']
Output:
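The same two steps can also be sketched with mask(); this is just an equivalent alternative, not the article's method:
data['money'] = data['money'].mask(data['money'] < 0) #negative values become NaN
data['money'] = data['money'].fillna(data['money'].mean()) #mean() ignores NaN, so the fill value excludes the bad entry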
4.6 Data deletion
Method 1
data1 = data[data.origin != 'America'] #drop the rows whose origin is 'America'
data1
data2=data[(data != 'Japan').all(1)] #drop every row containing 'Japan'; a row is kept only if all of its values differ from 'Japan'
data2
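For reference, the same filters can be sketched with isin(), which extends more easily to several values (not from the original):
data1 = data[~data['origin'].isin(['America'])] #drop rows whose origin is in the given list
data2 = data[~data.isin(['Japan']).any(axis=1)] #drop rows where any cell equals 'Japan'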
Method 2
data['origin'].drop_duplicates() #by default drop later duplicates, i.e. keep the first occurrence of each value
Output:
data['origin'].drop_duplicates(keep='last') #drop earlier duplicates, i.e. keep the last occurrence of each value
Output:
For more information on pandas.DataFrame.drop_duplicates, see the official documentation: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates
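drop_duplicates also works on a whole DataFrame; as an extra note (not from the original), subset restricts the comparison to the given columns:
data.drop_duplicates(subset=['origin'], keep='first') #keep the first row for each distinct origin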
4.7 Data type conversion
data['id'].astype('str') #Convert id column type to string type.
Common data types for reference: object (text/str), int64 (integer), float64 (floating point), bool (True/False), datetime64 (date and time).
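A couple more conversions of the same kind, sketched for reference (column names follow the dataset above):
data['money'] = data['money'].astype('float64') #ensure a numeric dtype for calculations
data['date'] = pd.to_datetime(data['date']) #parse the strings read from CSV back into datetime64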
4.8 Rename columns
data.rename(columns={'id':'ID','origin':'Origin'}) #rename the id column to ID and origin to Origin
Output:
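One design note, sketched for reference (not from the original): rename returns a new DataFrame, so assign the result back or pass inplace=True to keep the change:
data = data.rename(columns={'id':'ID'}) #or data.rename(columns={'id':'ID'}, inplace=True)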
That is all of "Commonly Used Functions for Pandas Data Analysis". Thank you for reading, and I hope the content shared here is helpful.