Shulou (Shulou.com), SLTechnology News&Howtos, report dated 06/01 (updated 2025-04-13)
This article explains "what are the key points of pandas in Python". The methods it introduces are simple, fast and practical, so if you are interested, read on and learn them with the editor.
Preface
pandas is a tool built on NumPy and designed for data analysis tasks. It incorporates a number of libraries and standard data models, provides the tools needed to work with large datasets efficiently, and offers many functions and methods for processing data quickly and easily.
I. What pandas is used for
Adding, deleting, modifying and querying tabular data
Working with multiple tables
Data cleaning: missing values, duplicate values, outliers, data standardization and data type conversion
Excel-style operations such as PivotTables and crosstabs
Statistical analysis
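As a minimal sketch of the cleaning and Excel-style operations listed above, the following uses a small made-up scores table; the frame name `scores` and the `score` column are hypothetical and only serve the illustration. It drops a duplicated row, fills a missing value with the column mean, then builds a PivotTable with `pd.pivot_table` and a crosstab with `pd.crosstab`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the 'tom' row is duplicated and jerry's score is missing
scores = pd.DataFrame({
    'name': ['alex', 'tom', 'tom', 'jerry', 'make'],
    'sex': ['male', 'female', 'female', 'male', 'female'],
    'class': ['0831', '0830', '0830', '0831', '0770'],
    'score': [88.0, 92.0, 92.0, np.nan, 75.0],
})

# Data cleaning: remove exact duplicate rows, then fill the missing score with the mean
clean = scores.drop_duplicates()
clean = clean.fillna({'score': clean['score'].mean()})

# Excel-style PivotTable: mean score for each class (rows) and sex (columns)
pivot = pd.pivot_table(clean, values='score', index='class', columns='sex', aggfunc='mean')

# Crosstab: number of students of each sex in each class
counts = pd.crosstab(clean['class'], clean['sex'])

print(pivot)
print(counts)
```

Combinations that never occur (for example class 0770 with sex male) appear as NaN in the PivotTable and as 0 in the crosstab.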
II. Creating a DataFrame
1. Import the pandas library
import pandas as pd
2. Building a DataFrame from tabular data
columns: the column index; index: the row index; values: the element data
Method 1:
df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],  # optional: defaults to 0, 1, 2, ...; you can also supply your own labels
    columns=['name', 'age', 'sex', 'class'],
)  # constructor
print(df)  # print the data
   name  age     sex class
a  alex   20    male  0831
b   tom   30  female  0830
Method 2:
df1 = pd.DataFrame(data={'name': ['tom', 'alex'], 'age': [18, 20], 'sex': ['male', 'female'], 'class': ['0831', '0830']})
print(df1)  # no index was given, so the rows are labeled 0, 1, ... by default
   name  age     sex class
0   tom   18    male  0831
1  alex   20  female  0830
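A short check of how the two construction methods differ, reusing the data from the examples above (trimmed to two columns for `df1`): a list of rows needs `columns=`, a dict supplies the column names through its keys, and leaving out `index=` gives the default integer labels.

```python
import pandas as pd

# Method 1: a list of rows plus explicit column names and row labels
df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

# Method 2: a dict whose keys become the column names; no index is given
df1 = pd.DataFrame(data={'name': ['tom', 'alex'], 'age': [18, 20]})

print(df.index.tolist())     # ['a', 'b']
print(df1.index.tolist())    # [0, 1]: the default integer labels
print(df1.columns.tolist())  # ['name', 'age']: taken from the dict keys
```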
3. DataFrame attributes
Because pandas is built on NumPy, a DataFrame also has the familiar ndarray attributes:
df.shape  # shape: (number of rows, number of columns)
df.ndim  # number of dimensions
df.size  # number of elements
df.dtypes  # data type of each column
df.columns  # column index
df.index  # row index
df.values  # the element data as a NumPy array
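To make the attribute list concrete, here is what each one reports for the two-row `df` from Method 1 (the data is copied from the example above):

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

print(df.shape)          # (2, 4): 2 rows, 4 columns
print(df.ndim)           # 2
print(df.size)           # 8 = 2 * 4 elements
print(df.dtypes['age'])  # an integer dtype (int64 on most platforms)
print(list(df.columns))  # ['name', 'age', 'sex', 'class']
print(list(df.index))    # ['a', 'b']
print(type(df.values))   # <class 'numpy.ndarray'>
```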
III. Querying a DataFrame
1. Selecting a single column
df1['name'] selects one column and returns a one-dimensional Series.
print(df1['name'])  # select the values of one column
0     tom
1    alex
Name: name, dtype: object
2. Selecting multiple columns
print(df1[['name', 'age']])
   name  age
0   tom   18
1  alex   20
print(type(df1[['name', 'age']]))  # <class 'pandas.core.frame.DataFrame'>: a list of columns gives a two-dimensional DataFrame, whereas a Series is one-dimensional with a single axis
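The rule above depends on the type of the key, not on how many columns come back: a single label gives a Series, while any list (even a one-element list) gives a DataFrame. A quick check using `df1` from Method 2:

```python
import pandas as pd

df1 = pd.DataFrame(data={'name': ['tom', 'alex'], 'age': [18, 20]})

one = df1['name']            # single label -> Series (one axis)
many = df1[['name', 'age']]  # list of labels -> DataFrame (two axes)
also_df = df1[['name']]      # a one-element list still gives a DataFrame

print(type(one).__name__, one.ndim)           # Series 1
print(type(many).__name__, many.ndim)         # DataFrame 2
print(type(also_df).__name__, also_df.shape)  # DataFrame (2, 1)
```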
3. Slicing rows and selecting by label
Method 1:
print(df[['name', 'age']][:2])  # [] cannot take row and column labels together, so a positional row slice is chained after the column selection
   name  age
a  alex   20
b   tom   30
Method 2:
Label-based selection: df.loc[row labels or a boolean condition, column labels]
print(df.loc['a', 'name'])
alex
df.loc['a', ['name']]  # if exactly one of the two keys is a single label, the result is one-dimensional (a Series)
df.loc[['a'], ['name']]  # if both keys are lists, the result is two-dimensional (a DataFrame)
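The shape rules for .loc can be checked directly. This sketch also shows one point the text does not mention: a label slice in .loc such as 'a':'b' includes both endpoints, unlike a positional slice.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

scalar = df.loc['a', 'name']           # label + label -> a single value
series = df.loc['a', ['name']]         # label + list  -> Series
frame = df.loc[['a'], ['name']]        # list + list   -> DataFrame
rows = df.loc['a':'b', 'name':'age']   # label slices include BOTH ends

print(scalar)                                        # alex
print(type(series).__name__, type(frame).__name__)   # Series DataFrame
print(rows.shape)                                    # (2, 2)
```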
4. Conditional selection: boolean masks
mask = df['age'] > 18  # True for students older than 18, False otherwise
mask2 = df['sex'] == 'female'  # True for female students
mask3 = mask & mask2  # combine the two masks with &; Python's `and` does not work element-wise on a Series
print(mask3)
a    False
b     True
dtype: bool
print(df.loc[mask3, :])  # use the mask to select rows
  name  age     sex class
b  tom   30  female  0830
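When the conditions are written inline instead of stored in variables, each comparison needs its own parentheses, because in Python & and | bind more tightly than > and ==. A minimal sketch with & (and), | (or) and ~ (not) on the same `df`:

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

both = df.loc[(df['age'] > 18) & (df['sex'] == 'female'), :]     # and
either = df.loc[(df['age'] > 25) | (df['name'] == 'alex'), :]    # or
not_female = df.loc[~(df['sex'] == 'female'), :]                # not

print(both['name'].tolist())        # ['tom']
print(either['name'].tolist())      # ['alex', 'tom']
print(not_female['name'].tolist())  # ['alex']
```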
5. Position-based selection: iloc[row positions, column positions]  # slices include the start and exclude the end
print(df.iloc[:1, :])
   name  age   sex class
a  alex   20  male  0831
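A few more iloc selections on the same `df`, showing the half-open slices and negative positions (counted from the end):

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

first_row = df.iloc[:1, :]   # positions 0 up to, but not including, 1 -> first row only
corner = df.iloc[0:2, 0:2]   # rows 0-1 and columns 0-1 (name, age)
last_age = df.iloc[-1, 1]    # last row, second column

print(first_row.shape)           # (1, 4)
print(corner.columns.tolist())   # ['name', 'age']
print(last_age)                  # 30
```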
IV. Adding to a DataFrame
1. Adding a column with key-value assignment
# Two ways: assign a list with one value per row, e.g. df['address'] = ['Beijing', 'Shanghai'], or assign a single value such as 'Beijing', which fills every row with it
df['address'] = 'Beijing'
print(df)
   name  age     sex class  address
a  alex   20    male  0831  Beijing
b   tom   30  female  0830  Beijing
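Both assignment styles, plus a column computed from an existing one, can be sketched as follows. The column names `city` and `age_next_year` are hypothetical. A list whose length differs from the number of rows is rejected with a ValueError.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

df['address'] = 'Beijing'              # a single value is broadcast to every row
df['city'] = ['Beijing', 'Shanghai']   # hypothetical column: a list needs one value per row
df['age_next_year'] = df['age'] + 1    # hypothetical column computed from 'age'

try:
    df['bad'] = ['only one value']     # 1 value for 2 rows
except ValueError as err:
    print('rejected:', err)

print(df[['name', 'address', 'city', 'age_next_year']])
```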
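2. Adding rows
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat performs the same row-wise append.
df_mini = pd.DataFrame(data={
    'name': ['jerry', 'make'],
    'age': [15, 18],
    'sex': ['male', 'female'],
    'class': ['0831', '0770'],
    'address': ['Beijing', 'Henan'],
}, index=['a', 'b'])
df4 = pd.concat([df, df_mini])  # stack the rows of df_mini under those of df
print(df4)
    name  age     sex class  address
a   alex   20    male  0831  Beijing
b    tom   30  female  0830  Beijing
a  jerry   15    male  0831  Beijing
b   make   18  female  0770    Henan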
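pd.concat keeps the original row labels, so df4 ends up with the labels a, b, a, b, and a label lookup then returns several rows. Passing ignore_index=True renumbers the rows instead. A small sketch using the same two tables:

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831', 'Beijing'], ['tom', 30, 'female', '0830', 'Beijing']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class', 'address'],
)
df_mini = pd.DataFrame(
    data={'name': ['jerry', 'make'], 'age': [15, 18], 'sex': ['male', 'female'],
          'class': ['0831', '0770'], 'address': ['Beijing', 'Henan']},
    index=['a', 'b'],
)

df4 = pd.concat([df, df_mini])                             # labels a, b, a, b
renumbered = pd.concat([df, df_mini], ignore_index=True)   # labels 0, 1, 2, 3

print(df4.index.tolist())             # ['a', 'b', 'a', 'b']
print(renumbered.index.tolist())      # [0, 1, 2, 3]
print(df4.loc['a', 'name'].tolist())  # ['alex', 'jerry']: a repeated label matches several rows
```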
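V. Deleting rows and columns
axis: 0 deletes rows, 1 deletes columns
inplace: whether to modify the original table (the default, False, returns a new DataFrame and leaves the original unchanged)
a = df4.drop(labels=['address', 'class'], axis=1)  # without inplace, drop returns a new table, so assign it to a variable
df4.drop(labels=['a'], axis=0, inplace=True)  # deletes every row labeled 'a' directly from df4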
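The difference between the two calls can be checked directly. The sketch uses a trimmed copy of df4 and the columns=/index= keywords, which drop accepts as shorthand for labels= with axis=1 or axis=0. With inplace=True, drop returns None, so its result should not be assigned.

```python
import pandas as pd

# Trimmed copy of df4 with the repeated labels a, b, a, b
df4 = pd.DataFrame(
    data={'name': ['alex', 'tom', 'jerry', 'make'], 'age': [20, 30, 15, 18],
          'class': ['0831', '0830', '0831', '0770']},
    index=['a', 'b', 'a', 'b'],
)

a = df4.drop(columns=['class'])   # same as labels=['class'], axis=1; df4 is unchanged
print(df4.columns.tolist())       # ['name', 'age', 'class']
print(a.columns.tolist())         # ['name', 'age']

result = df4.drop(index=['a'], inplace=True)  # modifies df4 itself
print(result)                     # None
print(df4['name'].tolist())       # ['tom', 'make']: both rows labeled 'a' are gone
```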
VI. Modifying values
Select the target cells, then assign the new value to them.
df4.loc[df4['name'] == 'tom', 'class'] = 'problem'  # rows where name is 'tom', column 'class'
print(df4)
   name  age     sex    class  address
b   tom   30  female  problem  Beijing
b  make   18  female     0770    Henan
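Doing the selection and the assignment in a single .loc call matters. Chained indexing such as df4[mask]['class'] = ... may update a temporary copy rather than df4 itself. A minimal sketch on a small stand-in table (default integer index), including an update computed from the current values:

```python
import pandas as pd

df4 = pd.DataFrame(data={'name': ['tom', 'make'], 'age': [30, 18], 'class': ['0830', '0770']})

# Boolean row selector + column label in ONE .loc call updates df4 itself
df4.loc[df4['name'] == 'tom', 'class'] = 'problem'

# An update computed from existing values: add 1 to every age under 20
df4.loc[df4['age'] < 20, 'age'] += 1

print(df4)
```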
VII. Statistical analysis
1. The ten NumPy statistical methods carry over
min()  argmin()  max()  argmax()  std()  var()  sum()  mean()  cumsum()  cumprod()
2. Calling them on a pandas Series
df['age'].min()  df['age'].max()  df['age'].argsort()
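A quick sketch of these methods on an age column, using the ages of the four students from section IV (the Series is built directly here so the example stands alone). It also shows how argmax (a position) differs from idxmax (a label).

```python
import pandas as pd

# Ages of alex, tom, jerry and make, labeled by name
age = pd.Series([20, 30, 15, 18], index=['alex', 'tom', 'jerry', 'make'], name='age')

print(age.min(), age.max())     # 15 30
print(age.sum(), age.mean())    # 83 20.75
print(age.cumsum().tolist())    # [20, 50, 65, 83]
print(age.argmax())             # 1: position of the maximum
print(age.idxmax())             # tom: label of the maximum
print(age.argsort().tolist())   # [2, 3, 0, 1]: positions that would sort the values
```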
3. Mode, non-null count and value frequencies
df['age'].mode()  # the most frequent value(s); when several values tie, all of them are returned in sorted order
df['age'].count()  # the number of non-null elements
df['age'].value_counts()  # how many times each distinct value occurs
df['name'].value_counts()  # for the four students alex, tom, jerry and make, every name occurs exactly once
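The difference between these three methods shows up once a column contains repeats and a missing value. A minimal sketch on a hypothetical `sex` column:

```python
import numpy as np
import pandas as pd

# Hypothetical column with repeated values and one missing entry
sex = pd.Series(['male', 'female', 'male', np.nan, 'female', 'male'])

print(sex.mode().tolist())           # ['male']: the most frequent value
print(sex.count())                   # 5: the NaN is not counted
print(len(sex))                      # 6: len() includes the missing entry
print(sex.value_counts().to_dict())  # {'male': 3, 'female': 2}: NaN is dropped by default
```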
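4. Methods on a whole DataFrame
On a DataFrame, axis chooses the direction: axis=0 (the default) works down each column (vertical comparison) and axis=1 works across each row (horizontal comparison). A Series has only axis 0, so df['age'].idxmax() takes no axis=1.
df['age'].idxmax()  # row label of the largest age
df.mode(axis=0)  # vertical: the most frequent value(s) in each column
For the four-row table (alex, tom, jerry, make), df.mode() returns the table below; columns with fewer modes are padded with NaN:
    name  age     sex class  address
0   alex   15  female  0831  Beijing
1  jerry   18    male   NaN      NaN
2   make   20     NaN   NaN      NaN
3    tom   30     NaN   NaN      NaN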
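The axis argument is easiest to see on a purely numeric table. In this sketch the `scores` frame and its `math`/`english` columns are hypothetical:

```python
import pandas as pd

# Hypothetical scores, one row per student
scores = pd.DataFrame({'math': [80, 95, 70], 'english': [90, 85, 75]},
                      index=['alex', 'tom', 'jerry'])

print(scores.sum(axis=0).to_dict())     # {'math': 245, 'english': 250}: down each column
print(scores.sum(axis=1).to_dict())     # {'alex': 170, 'tom': 180, 'jerry': 145}: across each row
print(scores.idxmax(axis=0).to_dict())  # {'math': 'tom', 'english': 'alex'}: best student per subject
print(scores.idxmax(axis=1).to_dict())  # {'alex': 'english', 'tom': 'math', 'jerry': 'english'}
```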
5. Summary statistics with describe
df['age'].describe()
# for the ages 20, 30, 15 and 18:
# count     4.00   number of non-null values
# mean     20.75   mean
# std       6.50   standard deviation
# min      15.00   minimum
# 25%      17.25   first quartile (25th percentile)
# 50%      19.00   median (50th percentile)
# 75%      22.50   third quartile (75th percentile)
# max      30.00   maximum
df['name'].describe()
# count: number of non-null values
# unique: number of distinct values
# top: the most frequent value (the mode)
# freq: how many times the mode occurs
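The numbers above can be reproduced with describe. In the text column, a hypothetical repeated name ('tom' twice) is added so that top and freq have something to report:

```python
import pandas as pd

age = pd.Series([20, 30, 15, 18], name='age')
# Hypothetical names: 'tom' appears twice so that top/freq are meaningful
name = pd.Series(['alex', 'tom', 'jerry', 'make', 'tom'], name='name')

stats = age.describe()     # numeric column: count, mean, std, min, quartiles, max
summary = name.describe()  # text column: count, unique, top, freq

print(stats.round(2))
print(summary)
```

The std reported here is the sample standard deviation (divisor n - 1), which is why it is 6.50 rather than the population value.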
VIII. Reading Excel files
pandas can read many data formats. Here is how to read Excel data:
pd.read_excel(r'file path')  # returns a DataFrame; reading .xlsx files requires the openpyxl package
At this point, you should have a deeper understanding of the key pandas points in Python; the best next step is to try them out in practice.