2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains the methods Pandas offers for data cleaning, with the equivalent Excel operation noted for each. The methods introduced here are simple, fast and practical.
1. Deal with null values in data
Real data often has many missing feature values, the so-called null values, and these must be handled before further analysis.
Null values are usually handled in one of two ways: deleting them or filling them in.
In Excel, null values can be replaced in bulk with the find and replace function.
Null cells can also be located first and then deleted.
Pandas is flexible in dealing with null values. You can use the dropna function to delete them:
import pandas as pd
data = pd.read_csv('transcript.csv', encoding='gbk')
data.dropna(how='any')
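As a minimal sketch of how the dropna options differ, the example below builds a small in-memory table instead of reading the CSV. The column values are made up for illustration and are not the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data used in the article.
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "Chinese": [90.0, np.nan, 75.0],
    "math": [88.0, 92.0, np.nan],
})

# how='any' drops a row if at least one of its values is null.
any_dropped = data.dropna(how="any")

# how='all' drops a row only if every value in it is null.
all_dropped = data.dropna(how="all")

# subset restricts the null check to the listed columns.
subset_dropped = data.dropna(subset=["Chinese"])

print(len(any_dropped), len(all_dropped), len(subset_dropped))
```

Note that dropna returns a new DataFrame and leaves data unchanged, so the result has to be assigned to a variable to be kept.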
Use the fillna function to fill null values:
① Fill the null values in the table with the number 0
data.fillna(value=0)
② Fill the null values in a column with that column's mean
data['Chinese'].fillna(data['Chinese'].mean())
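The two fills above can be sketched on a small table as follows. The scores are invented for illustration. Because fillna returns a new object rather than modifying the original, the sketch assigns the result back.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; NaN marks a missing value.
data = pd.DataFrame({
    "Chinese": [90.0, np.nan, 70.0],
    "math": [np.nan, 80.0, 60.0],
})

# Fill every null in the table with 0 (returns a new DataFrame).
zero_filled = data.fillna(value=0)

# Fill nulls in one column with that column's mean.
# mean() skips nulls by default, so here it is (90 + 70) / 2 = 80.
data["Chinese"] = data["Chinese"].fillna(data["Chinese"].mean())

print(zero_filled)
print(data)
```

Filling with 0 can distort averages computed later, so the mean fill is often the safer choice for score-like columns.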
2. Delete whitespace
Cleaning up spaces in Excel is simple: they can be removed directly with find and replace.
Deleting spaces is also convenient in pandas, mainly by using the map function:
data['name'] = data['name'].map(str.strip)
data
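As an alternative sketch, the vectorized .str accessor does the same trimming and passes missing values through unchanged, which can matter when the column contains nulls. The names below are made up for illustration.

```python
import pandas as pd

# Hypothetical names with stray leading/trailing spaces and one missing value.
data = pd.DataFrame({"name": ["  Ann", "Bob  ", None, " Cid "]})

# .str.strip() trims both ends of each string and leaves missing values missing.
data["name"] = data["name"].str.strip()

# To remove every space, including interior ones, use a literal replace.
codes = pd.Series(["A 01", " B02 "])
no_spaces = codes.str.replace(" ", "", regex=False)

print(data)
print(no_spaces.tolist())
```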
3. Case conversion
The case conversion functions in Excel are UPPER() and LOWER().
In pandas the equivalents are the string methods str.upper() and str.lower():
data['Pinyin'] = data['Pinyin'].str.upper()
data
data['Pinyin'] = data['Pinyin'].str.lower()
data
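A short sketch of the case conversions on sample values follows. The Pinyin strings are invented, and str.title() is included only as a related method that the article does not cover.

```python
import pandas as pd

# Hypothetical Pinyin column with inconsistent casing.
data = pd.DataFrame({"Pinyin": ["zhang san", "LI SI"]})

upper = data["Pinyin"].str.upper()   # all letters upper case
lower = data["Pinyin"].str.lower()   # all letters lower case
title = data["Pinyin"].str.title()   # first letter of each word upper case

print(upper.tolist())
print(lower.tolist())
print(title.tolist())
```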
4. Change the data format
To change the data format in Excel, press the shortcut "Ctrl+1" to open "Format Cells".
Pandas uses astype to change the data type. For example, to convert the "Chinese" column to integers, drop the null values first, because a null cannot be cast to a plain integer:
data['Chinese'].dropna(how='any').astype('int')
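The conversion can be sketched as below on made-up scores. The second variant, pandas' nullable 'Int64' type, is an alternative not mentioned in the article: it keeps the missing rows instead of dropping them.

```python
import numpy as np
import pandas as pd

# Hypothetical scores stored as floats because of the missing value.
data = pd.DataFrame({"Chinese": [90.0, np.nan, 75.0]})

# Option 1: drop the nulls, then cast; the result has fewer rows than data.
ints = data["Chinese"].dropna().astype("int")

# Option 2: nullable integer type; the missing entry becomes <NA>.
nullable = data["Chinese"].astype("Int64")

print(ints.tolist())
print(nullable)
```

Because the first result is shorter than the original column, assigning it back to data['Chinese'] would realign on the index and reintroduce the missing value.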
5. Change column name
Renaming a column in Excel needs no explanation; everyone knows how.
Pandas uses the rename function to change a column name, as follows:
data.rename(columns={'Chinese': 'language grades'})
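A minimal sketch of the rename call on a one-row, made-up table shows that rename returns a new DataFrame, so the result must be assigned to take effect.

```python
import pandas as pd

# Hypothetical one-row table.
data = pd.DataFrame({"name": ["Ann"], "Chinese": [90]})

# rename maps old column names to new ones and returns a new DataFrame.
renamed = data.rename(columns={"Chinese": "language grades"})

print(list(data.columns))     # original columns are unchanged
print(list(renamed.columns))  # renamed copy
```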
6. Delete duplicate values
Under the "Data" tab of the Excel ribbon there is a "Remove Duplicates" command, which deletes duplicate values in the table. By default the first occurrence is kept and later ones are deleted.
Pandas uses the drop_duplicates function to delete duplicate values:
data['math'].drop_duplicates()  # default: keeps the first occurrence and deletes later duplicates
data['math'].drop_duplicates(keep='last')  # keeps the last occurrence and deletes earlier duplicates
7. Modify and replace data
In Excel, values are replaced with the "find and replace" function.
In pandas, the replace function does the same job:
data['name'].replace('success', 'failure')
By now you should have a clearer picture of the methods Pandas offers for data cleaning. Try them out in practice!