What are the pandas knowledge points in Python?

2025-04-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the main pandas knowledge points in Python. The methods introduced here are simple, fast and practical, so let's work through them one by one.

Preface

pandas is a library built on NumPy and created for data analysis tasks. It provides standard data models and the tools needed to operate on large data sets efficiently, along with a large number of functions and methods for processing data quickly and easily.

I. Pandas operation flow

Adding, deleting, modifying and querying tabular data

Multi-table processing

Data cleaning: missing values, duplicate values, outliers, data standardization, data conversion

Excel-style operations: generating pivot tables and crosstabs

Statistical analysis
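The flow above can be sketched end to end as follows. This is only an illustration: the two tables, their column names and the choice of filling a missing age with the mean are assumptions made for the example, not part of the original article.

```python
import pandas as pd

# Hypothetical sample tables, invented for illustration.
students = pd.DataFrame({
    "name": ["alex", "tom", "tom", "jerry"],
    "age": [20, 30, 30, None],
    "class": ["0831", "0830", "0830", "0831"],
})
classes = pd.DataFrame({"class": ["0831", "0830"], "city": ["Beijing", "Shanghai"]})

# Data cleaning: drop the duplicated row, then fill the missing age with the mean.
clean = students.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].mean())

# Multi-table processing: join the two tables on the shared "class" column.
merged = clean.merge(classes, on="class", how="left")

# Excel-style summaries: a pivot table and a crosstab.
pivot = merged.pivot_table(index="city", values="age", aggfunc="mean")
cross = pd.crosstab(merged["city"], merged["class"])
print(pivot)
print(cross)
```

Each step returns a new object, so the intermediate results can be inspected before moving on.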

II. The creation of pandas

1. Import the pandas library

import pandas as pd

2. Building a DataFrame from tabular data

columns: column index; index: row index; values: element data

Method 1:

df = pd.DataFrame(
    data=[['alex', 20, 'male', '0831'], ['tom', 30, 'female', '0830']],
    index=['a', 'b'],  # optional; defaults to 0, 1, ...; you can also pass labels directly
    columns=['name', 'age', 'sex', 'class']
)  # constructor
print(df)  # print the data

   name  age     sex class
a  alex   20    male  0831
b   tom   30  female  0830

Method 2:

df1 = pd.DataFrame(data={'name': ['tom', 'alex'], 'age': [18, 20], 'sex': ['male', 'female'], 'class': ['0831', '0831']})  # every list must have the same length
print(df1)  # print the data; when no index is specified, rows are numbered from 0 by default

   name  age     sex class
0   tom   18    male  0831
1  alex   20  female  0831
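As a small check on the two construction methods, the sketch below compares the resulting row indexes. It also shows what happens when the lists in the dict have different lengths. The variable `length_error` is a name introduced only for this example.

```python
import pandas as pd

# Method 1: a list of rows with explicit row labels and column names.
df = pd.DataFrame(
    data=[["alex", 20, "male", "0831"], ["tom", 30, "female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

# Method 2: a dict of columns; no index given, so rows are numbered from 0.
df1 = pd.DataFrame({"name": ["tom", "alex"], "age": [18, 20]})

print(df.index.tolist())   # ['a', 'b']
print(df1.index.tolist())  # [0, 1]

# Lists of unequal length are rejected with a ValueError.
length_error = None
try:
    pd.DataFrame({"name": ["tom", "alex"], "class": ["0831"]})
except ValueError as exc:
    length_error = str(exc)
print(length_error)
```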

3. Attributes of a DataFrame

Because pandas is built on NumPy, a DataFrame also has the attributes of a NumPy ndarray.

df.shape  # shape as (rows, columns)
df.ndim  # number of dimensions
df.size  # number of elements
df.dtypes  # data type of each column
df.columns  # column index
df.index  # row index
df.values  # element data
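To make the attributes concrete, here is a minimal sketch that rebuilds the two-row table from Method 1 and prints each attribute.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["alex", 20, "male", "0831"], ["tom", 30, "female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

print(df.shape)             # (2, 4): two rows, four columns
print(df.ndim)              # 2
print(df.size)              # 8, i.e. rows * columns
print(df.dtypes)            # one dtype per column
print(df.columns.tolist())  # ['name', 'age', 'sex', 'class']
print(df.index.tolist())    # ['a', 'b']
print(df.values)            # the underlying data as a NumPy array
```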

III. Querying df

1. Index a single column

df1['name'] is a one-dimensional slice and returns a Series.

print(df1['name'])  # select a single column

0 tom

1 alex

2. Selecting multiple columns

print(df1[['name', 'age']])

   name  age
0   tom   18
1  alex   20

print(type(df1[['name', 'age']]))  # a DataFrame, which is two-dimensional; a Series is one-dimensional, with only one axis

3. Slicing by index

Method 1:

print(df[['name', 'age']][:2])  # takes the first two rows; this form only slices rows and cannot pick out specific ones

   name  age
a  alex   20
b   tom   30

Method 2:

Index slicing method: df.loc[row label or condition, column label]

print(df.loc['a', 'name'])

alex

df.loc['a', ['name']]  # if either the row or the column argument is a single label, the result is one-dimensional
df.loc[['a'], ['name']]  # if both arguments are lists, the result is two-dimensional
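The dimensionality rules for column selection and loc can be checked directly by inspecting the type of each result. A minimal sketch using the same two-row table:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["alex", 20, "male", "0831"], ["tom", 30, "female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

col = df["name"]                   # single label -> Series
sub = df[["name", "age"]]          # list of labels -> DataFrame
cell = df.loc["a", "name"]         # two single labels -> one value
row_part = df.loc["a", ["name"]]   # one label, one list -> Series
frame = df.loc[["a"], ["name"]]    # two lists -> DataFrame

for result in (col, sub, cell, row_part, frame):
    print(type(result).__name__)
```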

4. Conditional indexing: boolean slicing

mask = df['age'] > 18  # True for students over 18, False otherwise
mask2 = df['sex'] == 'female'  # True for female students
mask3 = mask & mask2  # combine the two masks; use the element-wise & operator, not the keyword and
print(mask3)

a    False
b     True
dtype: bool

print(df.loc[mask3, :])  # use the mask to slice the data

  name  age     sex class
b  tom   30  female  0830
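The following sketch shows why `&` is needed when combining masks: the keyword `and` asks for a single truth value of a whole Series, which pandas refuses with a ValueError. The variable `and_failed` exists only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["alex", 20, "male", "0831"], ["tom", 30, "female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

mask = df["age"] > 18
mask2 = df["sex"] == "female"

# Element-wise "and": parentheses are required when the comparisons are inlined.
mask3 = (df["age"] > 18) & (df["sex"] == "female")
selected = df.loc[mask3, :]
print(selected)

# The keyword "and" does not work on Series.
and_failed = False
try:
    mask and mask2
except ValueError:
    and_failed = True
print(and_failed)
```

The same pattern applies to `|` for "or" and `~` for "not".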

5. Positional indexing: iloc[row position, column position]  # half-open ranges: the start is included, the end is excluded

print(df.iloc[:1, :])

   name  age   sex class
a  alex   20  male  0831
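One detail worth seeing side by side is that iloc slices exclude the end position, while label slices with loc include the end label. A short sketch on the same table:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["alex", 20, "male", "0831"], ["tom", 30, "female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

by_position = df.iloc[:1, :]    # positions 0 up to, but not including, 1
by_label = df.loc["a":"b", :]   # labels 'a' through 'b', both included

print(len(by_position))  # 1
print(len(by_label))     # 2
print(df.iloc[0, 0])     # alex
```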

IV. Adding to df

1. Adding a column by key assignment

# Two ways: df['address'] = ['Beijing', 'Shanghai'] assigns one value per row; assigning the single value 'Beijing' sets every row to Beijing
df['address'] = 'Beijing'

   name  age     sex class  address
a  alex   20    male  0831  Beijing
b   tom   30  female  0830  Beijing
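Both ways of adding a column can be tried in a few lines. The second column name, `city`, is made up for this sketch so that the two assignments can be compared in one table.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["alex", 20, "male", "0831"], ["tom", 30, "female", "0830"]],
    index=["a", "b"],
    columns=["name", "age", "sex", "class"],
)

# A single value is broadcast to every row.
df["address"] = "Beijing"

# A list assigns one value per row and must match the number of rows.
df["city"] = ["Beijing", "Shanghai"]

print(df)
```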

2. Adding rows (append)

df_mini = pd.DataFrame(data={
    'name': ['jerry', 'make'],
    'age': [15, 18],
    'sex': ['male', 'female'],
    'class': ['0831', '0770'],
    'address': ['Beijing', 'Henan']
}, index=['a', 'b'])
df4 = pd.concat([df, df_mini])  # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
print(df4)

    name  age     sex class  address
a   alex   20    male  0831  Beijing
b    tom   30  female  0830  Beijing
a  jerry   15    male  0831  Beijing
b   make   18  female  0770    Henan
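Since `DataFrame.append` is no longer available in pandas 2.0 and later, a row-adding step along these lines can be sketched with `pd.concat`. The `ignore_index=True` variant is an optional extra shown for comparison; it renumbers the rows instead of keeping the duplicated labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["alex", "tom"], "age": [20, 30], "address": ["Beijing", "Beijing"]},
    index=["a", "b"],
)
df_mini = pd.DataFrame(
    {"name": ["jerry", "make"], "age": [15, 18], "address": ["Beijing", "Henan"]},
    index=["a", "b"],
)

# Stack the rows of both tables; the original row labels are kept.
df4 = pd.concat([df, df_mini])
print(df4.index.tolist())  # ['a', 'b', 'a', 'b']

# Optionally renumber the rows from 0.
renumbered = pd.concat([df, df_mini], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```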

V. Deletion method

axis: whether to delete rows (axis=0) or columns (axis=1)

inplace: whether to modify the original table

a = df4.drop(labels=['address', 'class'], axis=1)  # without inplace=True, the result must be assigned to a variable
df4.drop(labels=['a'], axis=0, inplace=True)  # deletes the rows labeled 'a' from df4 itself
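The difference between the two calls is easiest to see by checking the original table after each one. A minimal sketch, with a smaller stand-in for df4:

```python
import pandas as pd

df4 = pd.DataFrame(
    {
        "name": ["alex", "tom", "jerry", "make"],
        "class": ["0831", "0830", "0831", "0770"],
        "address": ["Beijing", "Beijing", "Beijing", "Henan"],
    },
    index=["a", "b", "a", "b"],
)

# Without inplace: df4 is untouched and the result is a new table.
a = df4.drop(labels=["address", "class"], axis=1)
print(a.columns.tolist())    # ['name']
print(df4.columns.tolist())  # still has all three columns

# With inplace=True: df4 itself loses every row labeled 'a'.
df4.drop(labels=["a"], axis=0, inplace=True)
print(df4.index.tolist())    # ['b', 'b']
```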

VI. Modification

Select the specified data with loc, then assign the new value.

df4.loc[df4['name'] == 'tom', 'class'] = 'problem'  # the assignment modifies df4 in place and returns no table
print(df4)  # output shown with all four rows of df4 present

    name  age     sex    class  address
a   alex   20    male     0831  Beijing
b    tom   30  female  problem  Beijing
a  jerry   15    male     0831  Beijing
b   make   18  female     0770    Henan
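A modification of this kind can be sketched as a single loc assignment with a boolean condition for the rows and a label for the column. The example uses a reduced version of df4.

```python
import pandas as pd

df4 = pd.DataFrame(
    {
        "name": ["alex", "tom", "jerry", "make"],
        "class": ["0831", "0830", "0831", "0770"],
    },
    index=["a", "b", "a", "b"],
)

# Rows where name is 'tom', column 'class': overwrite in place.
df4.loc[df4["name"] == "tom", "class"] = "problem"
print(df4)
```

Selecting and assigning in one loc call avoids chained indexing such as `df4[df4['name'] == 'tom']['class'] = ...`, which may not update the original table.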

VII. Statistical analysis

1. pandas extends 10 statistical methods from NumPy

min() argmin() max() argmax() std() var() sum() mean() cumsum() cumprod()

2. Methods in pandas

df['age'].min()
df['age'].max()
df['age'].argsort()
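As a quick illustration of the methods listed above, the sketch below applies them to the four ages used elsewhere in this article.

```python
import pandas as pd

ages = pd.Series([20, 30, 15, 18])

print(ages.min(), ages.max())        # 15 30
print(ages.argmin(), ages.argmax())  # positions of the min and max: 2 1
print(ages.sum(), ages.mean())       # 83 20.75
print(ages.std(), ages.var())        # sample standard deviation and variance
print(ages.cumsum().tolist())        # [20, 50, 65, 83]
print(ages.cumprod().tolist())       # [20, 600, 9000, 162000]
print(ages.argsort().tolist())       # positions that would sort the values: [2, 3, 0, 1]
```

Note that `std()` and `var()` in pandas use the sample formula (divide by n - 1) by default, unlike the NumPy defaults.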

3. Mode, non-null count, frequency

df['age'].mode()  # mode: the most frequent value(s)
df['age'].count()  # number of non-null elements
df['name'].value_counts()  # how often each distinct value occurs

tom      1
make     1
alex     1
jerry    1
Name: name, dtype: int64

df.min()  # on a DataFrame: the minimum of each column

name          alex
age             20
sex         female
class         0830
address    Beijing
dtype: object
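The three methods can be compared on one small Series. The data below, including the missing value, is invented for this sketch.

```python
import pandas as pd

sex = pd.Series(["male", "female", "male", None])

modes = sex.mode()           # most frequent value(s), returned as a Series
non_null = sex.count()       # missing values are not counted
freq = sex.value_counts()    # frequency of each distinct non-null value

print(modes.tolist())  # ['male']
print(non_null)        # 3
print(freq)
```

`mode()` returns a Series because several values can tie for most frequent.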

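4. For the DataFrame type

# The axis argument applies to a DataFrame, not to the Series df['age'], and idxmax needs numeric columns
df.select_dtypes(include='number').idxmax(axis=1)  # horizontal comparison: for each row, the column label of the maximum
df.select_dtypes(include='number').idxmax(axis=0)  # vertical comparison: for each column, the row label of the maximum
df.mode()  # mode of each column; shorter columns are padded with NaN

    name  age     sex class  address
0   alex   15  female  0831  Beijing
1  jerry   18    male   NaN      NaN
2   make   20     NaN   NaN      NaN
3    tom   30     NaN   NaN      NaN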
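Horizontal versus vertical comparison is easier to follow on a table with more than one numeric column. The `scores` table below is hypothetical and exists only for this sketch.

```python
import pandas as pd

# Hypothetical scores, invented for illustration.
scores = pd.DataFrame(
    {"math": [90, 60], "art": [70, 80]},
    index=["alex", "tom"],
)

# axis=0 (vertical): for each column, the row label holding the maximum.
best_student = scores.idxmax(axis=0)
print(best_student.to_dict())  # {'math': 'alex', 'art': 'tom'}

# axis=1 (horizontal): for each row, the column label holding the maximum.
best_subject = scores.idxmax(axis=1)
print(best_subject.to_dict())  # {'alex': 'math', 'tom': 'art'}
```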

5. Summary statistics: describe

df['age'].describe()
# age
# count 4.00 number of non-null values
# mean 20.75 average
# std 6.50 standard deviation
# min 15.00 minimum
# 25% 17.25 first quartile (1/4)
# 50% 19.00 median (2/4)
# 75% 22.50 third quartile (3/4)
# max 30.00 maximum

df['name'].describe()
# count: number of non-null values
# unique: number of distinct values after deduplication
# top: the mode (most frequent value)
# freq: how many times the mode appears
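The two kinds of summary can be reproduced with the sketch below. The ages are the four values from this article; the `names` Series contains a repeated value, an assumption made here so that `top` and `freq` are unambiguous.

```python
import pandas as pd

ages = pd.Series([20, 30, 15, 18], name="age")
names = pd.Series(["tom", "tom", "alex"], name="name")

num_summary = ages.describe()    # count, mean, std, min, quartiles, max
text_summary = names.describe()  # count, unique, top, freq

print(num_summary)
print(text_summary)
```

The result of `describe()` is itself a Series, so individual statistics can be read by label, for example `num_summary['mean']`.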

VIII. Reading Excel files

pandas can read many data formats. Here is how to read Excel data.

pd.read_excel(r'file path')

At this point you should have a clearer picture of the main pandas knowledge points in Python. The best next step is to try them out in practice.
