What are the most commonly used Pandas functions from Excel to Python

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Which Pandas functions are most commonly used when moving from Excel to Python? Many newcomers are unsure where to start. This article walks through the most common ones, pairing each Excel operation with its Pandas equivalent.

Generate a data table

There are two common ways to generate a data table: import external data, or write the data directly. The File menu in Excel provides a function for obtaining external data and supports several data sources, including databases, text files, and web pages.

Python also supports importing many types of data. Before importing data in Python, we need to import the pandas library. For convenience, we also import the numpy library.

import numpy as np
import pandas as pd

Import external data

# read_csv and read_excel already return a DataFrame
# header=1 takes the column names from the second row of the file
df = pd.read_csv('name.csv', header=1)
df = pd.read_excel('name.xlsx')

Both functions accept many optional parameters, such as the column names, the index column, and the data format of each column.
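As a minimal sketch of those optional parameters, the example below reads a small in-memory CSV instead of a real file. The CSV content and the choice of options are assumptions made for illustration; `header`, `index_col`, `parse_dates`, and `dtype` are standard `read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as 'name.csv'.
csv_text = """id,date,city,price
1001,2013-01-02,Beijing,1200
1002,2013-01-03,SH,
1003,2013-01-04,guangzhou,2133
"""

df = pd.read_csv(
    io.StringIO(csv_text),       # a file path works the same way
    header=0,                    # row 0 holds the column names
    index_col="id",              # use the id column as the index
    parse_dates=["date"],        # parse the date column as datetime
    dtype={"price": "float64"},  # set the data format of a column
)
print(df.dtypes)
```

The empty price cell in the second data row is read as a null value, which is the kind of gap the cleaning steps later in this article deal with.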

Write data directly

df = pd.DataFrame({"id": [1001, 1002, 1003, 1004, 1005, 1006],
                   "date": pd.date_range('20130102', periods=6),
                   "city": ['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'BEIJING '],
                   "age": [23, 4, 4, 54, 32, 32],
                   "category": ['100-A', '100-B', '110-A', '110-C', '210-A', '130-F'],
                   "price": [1200, np.nan, 2133, 5433, np.nan, 4432]},
                  columns=['id', 'date', 'city', 'category', 'age', 'price'])

Data table inspection

The purpose of inspecting the data table is to understand its overall state and get an overview of the data: its size, the memory it occupies, the data formats, whether it contains null values or duplicates, and the specific data content. This prepares for the subsequent cleaning and preprocessing.

1. Data dimensions (rows and columns)

In Excel, you can press Ctrl+Down Arrow and Ctrl+Right Arrow to jump to the last row and the last column and read off the row and column counts. In Python, the shape attribute returns the dimensions of the data table, that is, the number of rows and columns.

df.shape
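A small sketch of how the result can be used: shape returns a (rows, columns) tuple, so it can be unpacked directly. The table contents here are made up for illustration.

```python
import pandas as pd

# Small illustrative table; the values are made up for this sketch.
df = pd.DataFrame({"id": [1001, 1002, 1003], "city": ["Beijing", "SH", "guangzhou"]})

n_rows, n_cols = df.shape   # shape is an attribute, so no parentheses
print(n_rows, n_cols)
print(len(df))              # len() gives the row count only
```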

2. Data table information

Use the info method to view the overall information of the data table, including the data dimensions, column names, data formats, and memory occupied.

# Data table information
df.info()
RangeIndex: 6 entries, 0 to 5
Data columns (total 6 columns):
id          6 non-null int64
date        6 non-null datetime64[ns]
city        6 non-null object
category    6 non-null object
age         6 non-null int64
price       4 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(2), object(2)
memory usage: 368.0+ bytes
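The "+" after the memory figure indicates a lower bound, because the contents of object (text) columns are not measured by default. The sketch below, on a made-up table, shows one way to request a fuller measurement and to capture the report as a string. `buf` and `memory_usage="deep"` are existing `info` parameters; the table itself is an assumption.

```python
import io

import numpy as np
import pandas as pd

# Illustrative table with one text column and one column containing a null.
df = pd.DataFrame({
    "city": ["Beijing", "SH", "guangzhou"],
    "price": [1200, np.nan, 2133],
})

# info() prints to stdout by default; buf redirects the report into a string.
buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")
report = buffer.getvalue()
print(report)

# Per-column memory in bytes, without and with inspecting text contents.
shallow = df.memory_usage().sum()
deep = df.memory_usage(deep=True).sum()
print(shallow, deep)
```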

3. View data format

In Excel, you determine the data format by selecting a cell and checking the number format shown on the Home tab. In Python, the dtypes attribute returns the data format. It can show the formats of all columns in the data table at once, or you can specify a single column and check its format separately.

# View the format of every column in the data table
df.dtypes
id                   int64
date        datetime64[ns]
city                object
category            object
age                  int64
price              float64
dtype: object

# View the format of a single column
df['age'].dtype
dtype('int64')
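Because dtypes returns a Series indexed by column name, it can also be queried programmatically. The following is a sketch on a made-up table; `select_dtypes` is an existing DataFrame method that filters columns by format.

```python
import numpy as np
import pandas as pd

# Illustrative table; the values are assumptions for this sketch.
df = pd.DataFrame({
    "city": ["Beijing", "SH"],
    "age": [23, 32],
    "price": [1200.0, np.nan],
})

age_format = df.dtypes["age"]                    # look up one column's format
numeric_df = df.select_dtypes(include="number")  # keep only numeric columns
print(age_format)
print(numeric_df.columns.tolist())
```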

4. View null values

In Excel, you can locate null values with Go To Special under Find & Select on the Home tab.

In Python, isnull is the method that checks for null values. It can be applied to the whole data table or to a single column, and returns True for each null cell and False otherwise.

# Check the whole data table for null values
df.isnull()

# Check a specific column for null values
df['price'].isnull()
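On larger tables a grid of True/False values is hard to read, so it is common to summarize it. The sketch below assumes a price column with the same two gaps as in this article and counts the nulls per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1001, 1002, 1003, 1004, 1005, 1006],
    "price": [1200, np.nan, 2133, 5433, np.nan, 4432],
})

null_counts = df.isnull().sum()            # number of nulls in each column
has_null = df["price"].isnull().any()      # True if the column has any null
rows_with_null = df[df["price"].isnull()]  # only the rows where price is null
print(null_counts)
print(rows_with_null)
```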

5. View unique values

In Excel, you can see unique values by using conditional formatting to color-mark them.

In Python, use the unique method to view unique values.

# View the unique values in the city column
df['city'].unique()
array(['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'BEIJING '], dtype=object)
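Note that unique compares values exactly, so differences in case or surrounding spaces count as different values. The following sketch, using the same city strings, illustrates why cleaning usually comes before counting unique values.

```python
import pandas as pd

city = pd.Series(['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'BEIJING '])

raw_count = city.nunique()                # 'Beijing ' and 'BEIJING ' are distinct
normalized = city.str.strip().str.lower() # remove spaces, unify case
clean_count = normalized.nunique()        # the two Beijing spellings now match
print(raw_count, clean_count)
```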

6. View data table values

In Python, the values attribute is used to view the values in the data table.

# View the values of the data table
df.values

7. View column names

The columns attribute is used to view the column names of the data table on their own.

# View the column names
df.columns
Index(['id', 'date', 'city', 'category', 'age', 'price'], dtype='object')
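Both attributes return pandas or NumPy objects rather than plain lists. A brief sketch on a made-up table: `tolist` turns the column index into a Python list, and `to_numpy` is the method the pandas documentation recommends as the alternative to `values`.

```python
import pandas as pd

df = pd.DataFrame({"id": [1001, 1002], "age": [23, 32]})

column_names = df.columns.tolist()  # plain Python list of column names
matrix = df.to_numpy()              # 2-D NumPy array of the cell values
print(column_names)
print(matrix.shape)
```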

8. View the first N rows of data

The head method is used to view the first N rows of the data table. Called without an argument, head() shows the first 5 rows.

# View the first 3 rows of data
df.head(3)

9. View the last N rows of data

The tail method is the counterpart of head: it shows the last N rows of the data table, 5 by default.

# View the last 3 rows of data
df.tail(3)

Data table cleaning

This section covers common cleaning problems in data tables, including the handling of null values, letter case, data formats, and duplicate values.

1. Handle null values (delete or fill)

In Excel, null values can be handled with the Find and Replace function.

Python handles null values more flexibly. You can use the dropna method to delete rows that contain null values, or use the fillna method to fill the null values.

# Delete rows that contain null values
df.dropna(how='any')

You can also fill the null values with a number.

# Fill the null values in the data table with 0
df.fillna(value=0)

You can also fill the NA fields with the mean of the price column. This again uses fillna, with the fill value computed by calling mean on the price column.

# Fill NA with the mean of the price column
df['price'].fillna(df['price'].mean())
0    1200.0
1    3299.5
2    2133.0
3    5433.0
4    3299.5
5    4432.0
Name: price, dtype: float64
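One detail worth illustrating: dropna and fillna return new objects and leave the original table unchanged unless the result is assigned back. The sketch below assumes the same price values as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1001, 1002, 1003, 1004, 1005, 1006],
    "price": [1200, np.nan, 2133, 5433, np.nan, 4432],
})

dropped = df.dropna(how="any")                      # new table without null rows
filled = df["price"].fillna(df["price"].mean())     # new column, nulls -> mean
nulls_still_in_df = df["price"].isnull().sum()      # df itself is untouched

# To keep the filled values, assign the result back to the column.
df["price"] = filled
print(len(dropped), nulls_still_in_df)
print(df["price"].tolist())
```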

2. Clear whitespace

Spaces inside text values are another common problem in data cleaning.

# Strip leading and trailing spaces from the city field
df['city'] = df['city'].map(str.strip)
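As an alternative sketch, the `.str.strip()` accessor does the same job and passes null values through, whereas `map(str.strip)` calls `str.strip` on every element and would typically fail on a null in the column. The sample values below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative column containing padded text and one null value.
city = pd.Series([" guangzhou ", np.nan, "BEIJING "])

cleaned = city.str.strip()  # strips text values, leaves the null as null
print(cleaned)
```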

3. Case conversion

Inconsistent letter case in English text fields is another common problem.

Excel has functions such as UPPER and LOWER, and Python has functions with the same names to solve the case problem.

# Convert the city column to lowercase
df['city'] = df['city'].str.lower()
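For comparison, a short sketch of the three common case conversions available through the `.str` accessor, on made-up city names:

```python
import pandas as pd

city = pd.Series(["Beijing", "SH", "guangzhou"])

lower = city.str.lower()   # all lowercase, like Excel's LOWER
upper = city.str.upper()   # all uppercase, like Excel's UPPER
title = city.str.title()   # first letter of each word capitalized
print(lower.tolist(), upper.tolist(), title.tolist())
```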

4. Change the data format

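In Excel, the data format can be modified through the "Format Cells" function.

In Python, the astype method is used to modify the data format. Null values cannot be converted to integers, so the nulls in the price column are filled first.

# Change the data format of the price column to integer
df['price'].fillna(df['price'].mean()).astype('int')
0    1200
1    3299
2    2133
3    5433
4    3299
5    4432
Name: price, dtype: int32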
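The sketch below illustrates the null-value caveat under the assumption that the price column still contains its two gaps: converting directly raises an error, while filling first and then converting succeeds. It uses 'int64' explicitly so the resulting format does not depend on the platform.

```python
import numpy as np
import pandas as pd

price = pd.Series([1200, np.nan, 2133, 5433, np.nan, 4432], name="price")

# A float column that still holds NaN cannot be cast to a plain integer type.
try:
    price.astype("int64")
except ValueError as err:
    print("cannot convert:", err)

# Fill the gaps first; astype then truncates the decimals (3299.5 -> 3299).
converted = price.fillna(price.mean()).astype("int64")
print(converted.tolist())
```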

5. Change column name

rename is the method that changes column names. Here we change the category column of the data table to category-size.

# Change the column name
df.rename(columns={'category': 'category-size'})
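Like the other cleaning methods, rename returns a new table. A minimal sketch on a one-row table (the values are made up) showing that the original keeps its column names unless the result is assigned back:

```python
import pandas as pd

df = pd.DataFrame({"id": [1001], "category": ["100-A"]})

renamed = df.rename(columns={"category": "category-size"})
print(renamed.columns.tolist())  # new table has the new name
print(df.columns.tolist())       # original table is unchanged
```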

6. Delete duplicate values

Excel has a "Remove Duplicates" function on the Data tab.

In Python, use the drop_duplicates method to delete duplicate values.

df['city']
0      beijing
1           sh
2    guangzhou
3     shenzhen
4     shanghai
5      beijing
Name: city, dtype: object

beijing appears twice in the city column, in the first and last positions.

The drop_duplicates() method deletes duplicate values. By default it keeps the first occurrence and deletes the later ones.

# The city column after deleting duplicate values
df['city'].drop_duplicates()
0      beijing
1           sh
2    guangzhou
3     shenzhen
4     shanghai
Name: city, dtype: object

With the keep='last' parameter the result is the opposite: the last occurrence is kept and the first beijing is deleted.

# Delete duplicate values, keeping the last occurrence
df['city'].drop_duplicates(keep='last')
1           sh
2    guangzhou
3     shenzhen
4     shanghai
5      beijing
Name: city, dtype: object
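To check which rows would be removed before deleting anything, the `duplicated` method can be used alongside `drop_duplicates`. The sketch below uses the same cleaned city values and also shows `keep=False`, which drops every copy of a repeated value.

```python
import pandas as pd

city = pd.Series(['beijing', 'sh', 'guangzhou', 'shenzhen', 'shanghai', 'beijing'])

is_repeat = city.duplicated()              # True for later copies of a value
first = city.drop_duplicates()             # keep the first beijing (row 0)
last = city.drop_duplicates(keep='last')   # keep the last beijing (row 5)
none = city.drop_duplicates(keep=False)    # drop both beijing rows
print(is_repeat.tolist())
print(first.index.tolist(), last.index.tolist(), none.index.tolist())
```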

7. Numerical modification and replacement

In Excel, values can be replaced with the Find and Replace function.

In Python, use the replace method to replace data.

# Data replacement
df['city'].replace('sh', 'shanghai')
0      beijing
1     shanghai
2    guangzhou
3     shenzhen
4     shanghai
5      beijing
Name: city, dtype: object
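A point that is easy to miss: called this way, replace matches whole cell values, not parts of a string, so 'shanghai' and 'shenzhen' are left alone even though they contain 'sh'. The sketch below also shows the dictionary form, which can map several values at once; the 'bj' mapping is a hypothetical extra entry that matches nothing in this column.

```python
import pandas as pd

city = pd.Series(['beijing', 'sh', 'guangzhou', 'shenzhen', 'shanghai', 'beijing'])

# Map old values to new ones; only cells equal to a key are changed.
mapping = {'sh': 'shanghai', 'bj': 'beijing'}  # 'bj' is a hypothetical abbreviation
replaced = city.replace(mapping)
print(replaced.tolist())
```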
