What are the methods of data cleaning in Pandas

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains the methods Pandas offers for data cleaning. The methods introduced here are simple, fast, and practical, and each one is compared with the equivalent operation in Excel.

1. Deal with null values in data

Real data often has many missing feature values, the so-called null values, which must be handled before further analysis.

There are many ways to deal with null values; the usual ones are deleting them or filling them in.

In Excel, null values can be replaced in bulk with the Find and Replace function.

They can also be deleted by locating the blank cells first.

Pandas is flexible in dealing with null values. You can use the dropna function to delete rows that contain null values:

import pandas as pd
data = pd.read_csv('transcript.csv', encoding='gbk')
data.dropna(how='any')
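As a minimal sketch of how the how argument changes the result, the example below builds a small made-up table instead of reading the CSV file (the column values and the variable names any_dropped, all_dropped, and chinese_required are assumptions for illustration). It also shows the subset parameter, which the text above does not cover.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the article's CSV file.
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", None],
    "Chinese": [90.0, np.nan, 75.0, np.nan],
    "math": [88.0, 92.0, np.nan, np.nan],
})

# how='any' drops a row if at least one of its values is missing.
any_dropped = data.dropna(how="any")

# how='all' drops a row only if every value in it is missing.
all_dropped = data.dropna(how="all")

# subset restricts the check to the listed columns.
chinese_required = data.dropna(subset=["Chinese"])

print(len(data), len(any_dropped), len(all_dropped), len(chinese_required))
```

Note that dropna returns a new DataFrame and leaves data unchanged, so the result has to be assigned to a name if it is needed later.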

Null values can be filled with the fillna function.

① Fill the null values in the data table with the number 0:

data.fillna(value=0)

② Fill the null values in a column with the column average:

data['Chinese'].fillna(data['Chinese'].mean())
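The calls above return a filled copy rather than modifying data, so the result normally needs to be assigned back. A short sketch on made-up scores (the values and the name filled are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one gap in each column.
data = pd.DataFrame({
    "Chinese": [90.0, np.nan, 70.0],
    "math": [np.nan, 60.0, 80.0],
})

# Fill one column with its own mean and assign the result back.
# mean() skips NaN by default, so the mean here is (90 + 70) / 2 = 80.
data["Chinese"] = data["Chinese"].fillna(data["Chinese"].mean())

# Fill every numeric column with its own column mean in one call.
# Passing a Series to fillna matches its index against the column names.
filled = data.fillna(data.mean(numeric_only=True))

print(filled)
```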

2. Delete whitespace

Cleaning up spaces in Excel is simple: they can be removed directly with Find and Replace.

Deleting spaces is also convenient in pandas, mainly through the map function, which here strips leading and trailing whitespace from each name:

data['name'] = data['name'].map(str.strip)
data
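One caveat worth sketching: map(str.strip) calls str.strip on every element, so a missing value (a float NaN) would typically raise a TypeError. The .str accessor is an alternative that passes missing values through untouched. The sample names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical names with stray spaces and one missing entry.
data = pd.DataFrame({"name": ["  Ann", "Bob  ", np.nan]})

# .str.strip() removes leading and trailing whitespace from each string
# and leaves the missing entry as a missing value.
data["name"] = data["name"].str.strip()

print(data)
```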

3. Case conversion

The case conversion functions in Excel are UPPER() and LOWER().

The corresponding methods in pandas are also named upper() and lower(), and are reached through the .str accessor.

Convert to upper case:

data['Pinyin'] = data['Pinyin'].str.upper()
data

Convert to lower case:

data['Pinyin'] = data['Pinyin'].str.lower()
data
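A small sketch of both conversions on made-up values, plus one common use: normalizing case before comparing strings. The sample data and the names upper, lower, and is_li_si are assumptions for illustration.

```python
import pandas as pd

# Hypothetical Pinyin names with inconsistent casing.
data = pd.DataFrame({"Pinyin": ["zhang san", "LI SI", "Wang Wu"]})

upper = data["Pinyin"].str.upper()
lower = data["Pinyin"].str.lower()

# Lower-casing first makes the comparison case-insensitive.
is_li_si = data["Pinyin"].str.lower() == "li si"

print(upper.tolist())
print(lower.tolist())
print(is_li_si.tolist())
```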

4. Change the data format

To change the data format in Excel, use the shortcut key Ctrl+1 to open the "Format Cells" dialog.

Pandas uses astype to change the data type. As an example, the following converts the "Chinese" column to integers. Null values are dropped first, because NaN cannot be converted to an integer:

data['Chinese'].dropna().astype('int')
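Dropping null values before the conversion shortens the column. As a hedged alternative that the original does not mention, pandas also has a nullable integer dtype, "Int64" (capital I), which keeps the missing entries. A minimal sketch on made-up scores:

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value.
data = pd.DataFrame({"Chinese": [90.0, np.nan, 75.0]})

# Option 1: drop missing values, then convert. The result has fewer rows.
as_int = data["Chinese"].dropna().astype("int")

# Option 2: the nullable integer dtype keeps all rows and stores the gap as <NA>.
nullable = data["Chinese"].astype("Int64")

print(as_int.tolist())
print(nullable)
```

Option 2 is convenient when the converted column has to stay aligned with the rest of the table.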

5. Change column name

Changing a column name in Excel needs no explanation; everyone knows how to do it.

Pandas uses the rename function to change a column name, as follows:

data.rename(columns={'Chinese': 'language grades'})
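Like the other methods shown so far, rename returns a new DataFrame and does not change data itself. A short sketch with a made-up one-row table (the name renamed is an assumption for illustration):

```python
import pandas as pd

# Hypothetical table with the column to be renamed.
data = pd.DataFrame({"name": ["Ann"], "Chinese": [90]})

# rename returns a renamed copy; keys not found in the columns are ignored.
renamed = data.rename(columns={"Chinese": "language grades"})

print(list(data.columns))     # the original is unchanged
print(list(renamed.columns))  # the copy carries the new name
```

To keep the new name, assign the result back, for example data = data.rename(columns={...}).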

6. Delete duplicate values

Under the "Data" tab of the Excel ribbon there is a "Remove Duplicates" command, which deletes duplicate values in the table. By default the first occurrence is kept and later duplicates are deleted.

Pandas uses the drop_duplicates function to delete duplicate values.
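To see which rows each keep option retains, here is a minimal sketch on a made-up four-row table (the names and scores are assumptions for illustration). It also shows keep=False and the DataFrame-level subset parameter, which go slightly beyond what this section covers.

```python
import pandas as pd

# Hypothetical table in which each math score appears twice.
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "math": [88, 92, 88, 92],
})

# Default keep='first': the first occurrence of each value survives.
first = data["math"].drop_duplicates()

# keep='last': the last occurrence of each value survives.
last = data["math"].drop_duplicates(keep="last")

# keep=False: every value that has a duplicate is dropped.
none_kept = data["math"].drop_duplicates(keep=False)

# On a DataFrame, subset picks the columns used to detect duplicates,
# and whole rows are dropped.
rows = data.drop_duplicates(subset=["math"])

print(first.index.tolist(), last.index.tolist(), len(none_kept))
print(rows)
```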

data['math'].drop_duplicates()  # keeps the first occurrence and deletes later duplicates (default)
data['math'].drop_duplicates(keep='last')  # keeps the last occurrence and deletes earlier duplicates

7. Modify and replace data

In Excel, values are replaced with the Find and Replace function.

In pandas, values are replaced with the replace function.
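One detail that is easy to miss: Series.replace matches whole cell values, not parts of a string. The sketch below uses made-up values to contrast it with .str.replace, which works on substrings; the names replaced, substituted, and mapped are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column; "successful" only contains the search text.
data = pd.DataFrame({"name": ["success", "failure", "successful"]})

# replace swaps cells whose entire value equals 'success'.
replaced = data["name"].replace("success", "failure")

# .str.replace substitutes the text wherever it occurs inside a string.
substituted = data["name"].str.replace("success", "failure", regex=False)

# A dict replaces several values in one call.
mapped = data["name"].replace({"success": "pass", "failure": "fail"})

print(replaced.tolist())
print(substituted.tolist())
print(mapped.tolist())
```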

data['name'].replace('success', 'failure')

By now you should have a deeper understanding of the methods of data cleaning in Pandas. Try them out in practice!
