2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article covers common techniques for data preprocessing in Python. Dirty data turns up in many real-world cases, so the examples below walk through how to handle these situations with pandas.
Data set
This is a sample DataFrame containing dirty data.
Let's see what can be done to clean this dataset:
The first column is redundant and should be deleted.
The Date values do not follow a single standard format.
Name is written as last name, first name, with inconsistent uppercase and lowercase letters.
Payment represents a quantity, but the values are stored as strings and need to be converted.
Note contains some non-alphanumeric characters that should be removed.
Example 1
Deleting a column is a simple operation with the drop function. In addition to the column name, we need to specify the axis parameter, because drop can delete either rows or columns. Finally, the inplace parameter applies the change to the DataFrame itself instead of returning a new one.
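As a minimal sketch of how axis and inplace behave, the snippet below runs drop on a small made-up frame. The frame contents and the variable names cleaned and same are assumptions for illustration only, not the article's dataset.

```python
import pandas as pd

# Hypothetical frame with a leftover index column (not the article's data)
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Name": ["a", "b", "c"],
})

# axis=1 targets a column label; axis=0 would look for a row label instead
cleaned = df.drop("Unnamed: 0", axis=1)

# The columns keyword is an equivalent way to say the same thing
same = df.drop(columns=["Unnamed: 0"])

# Without inplace=True, drop returns a new frame and leaves df untouched
print(list(df.columns))
print(list(cleaned.columns))
```

Assigning the result to a new name, as done here, is an alternative to inplace=True that keeps the original frame available for comparison.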
import pandas as pd
df.drop("Unnamed: 0", axis=1, inplace=True)
Example 2
We have several options for converting date values to an appropriate format. A simple way is to use the astype function to change the data type of the column.
It can handle a wide range of values and convert them to a clean, standard date format.
df["Date"] = df["Date"].astype("datetime64[ns]")
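One of the other options is pd.to_datetime, which is not used in the article itself. The sketch below assumes a small made-up Series and shows how errors="coerce" turns entries that cannot be parsed into NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical values: two valid dates and one entry that is not a date
dates = pd.Series(["2021-11-10", "2021-11-15", "not a date"])

# errors="coerce" replaces unparseable entries with NaT instead of raising
parsed = pd.to_datetime(dates, errors="coerce")

# The result has a datetime dtype, so the .dt accessor is available
print(parsed.dtype)
print(parsed.isna().tolist())
```

Counting the NaT values afterwards with parsed.isna().sum() is a quick way to see how many rows failed to parse and may need manual attention.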
Example 3
For the Name column, we first need to address the following issues:
First, the names should be written consistently, in all uppercase or all lowercase letters. Another option is to capitalize them (that is, only the initial letters are uppercase).
The order of last name and first name should be switched.
We start by splitting the names at the comma:
df["Name"].str.split(",", expand=True)
Then, I combine the second column with the first column, with a space in between. The final step is to use the lower function to convert the letters to lowercase.
df["Name"] = (df["Name"].str.split(",", expand=True)[1] + " " + df["Name"].str.split(",", expand=True)[0]).str.lower()
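The same steps can be sketched with the split done only once. The sample names are invented for illustration, and the str.strip calls are an added assumption to cover values that have a space after the comma. The str.title call shows the capitalized alternative mentioned above.

```python
import pandas as pd

# Hypothetical names in "last, first" order with inconsistent casing
names = pd.Series(["Doe, John", "SMITH, jane", "brown,Alex"])

# Split once at the comma; column 0 is the last name, column 1 the first name
parts = names.str.split(",", expand=True)
last = parts[0].str.strip()   # strip removes stray spaces around each part
first = parts[1].str.strip()

# Option 1: all lowercase, first name first
full_lower = (first + " " + last).str.lower()

# Option 2: capitalize only the initial letter of each word
full_title = (first + " " + last).str.title()

print(full_lower.tolist())
print(full_title.tolist())
```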
Example 4
The data type of the Payment column cannot be used for numerical analysis. Before converting it to a numeric data type (that is, integer or float), we need to remove the dollar sign and replace the comma in the first row with a dot.
We can do all of this in one line of code using pandas:
df["Payment"] = df["Payment"].str.replace("$", "", regex=False).str.replace(",", ".", regex=False).astype("float")
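A minimal sketch of this conversion on made-up values, written out step by step. The sample strings are assumptions, and regex=False is passed explicitly because "$" and "." have special meanings in regular expressions.

```python
import pandas as pd

# Hypothetical payment strings; the first one uses a decimal comma
payments = pd.Series(["$12,5", "$40.0", "$7.25"])

cleaned = (
    payments
    .str.replace("$", "", regex=False)   # literal dollar sign, not the regex anchor
    .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
    .astype("float")
)

print(cleaned.tolist())
```

This only works when the comma is a decimal separator. If the comma were a thousands separator, it would have to be removed rather than replaced with a dot.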
Example 5
Some characters in the Note column also need to be deleted. When working with large datasets, it can be difficult to replace them manually.
What we can do is remove the non-alphanumeric characters (for example ?, !, -, ., and so on). The replace function can be used in this case as well, because it accepts regular expressions.
If we only want alphabetic characters, this is how we use the replace function:
df["Note"].str.replace('[^a-zA-Z]', '', regex=True)
0      Unhappy
1    Satisfied
2      Neutral
3      Unhappy
4      Neutral
Name: Note, dtype: object
If we want letters and numbers (that is, alphanumeric characters), we need to add the digits to our regular expression:
df["Note"].str.replace('[^a-zA-Z0-9]', '', regex=True)
0      Unhappy
1    Satisfied
2      Neutral
3      Unhappy
4     Neutral0
Name: Note, dtype: object
Note that the 0 in the last row is not removed this time. For this dataset, the first option is the one I need. I also want to convert the letters to lowercase after removing the unwanted characters:
df["Note"] = df["Note"].str.replace('[^a-zA-Z]', '', regex=True).str.lower()
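The two character classes can be compared side by side in a short sketch. The note values below are invented for illustration. regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical notes with stray punctuation and one trailing digit
notes = pd.Series(["Unhappy!", "Satisfied.", "Neutral-", "?Unhappy", "Neutral0"])

# Keep letters only: everything outside a-z and A-Z is removed
letters_only = notes.str.replace(r"[^a-zA-Z]", "", regex=True)

# Keep letters and digits: the trailing 0 survives this time
alnum = notes.str.replace(r"[^a-zA-Z0-9]", "", regex=True)

# Chain lower() to normalize the casing after cleaning
lowered = letters_only.str.lower()

print(letters_only.tolist())
print(alnum.tolist())
print(lowered.tolist())
```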
The dataset now looks much better than it did initially. Of course, it is a simple dataset, but these cleaning operations will certainly help you when dealing with large datasets.
© 2024 shulou.com SLNews company. All rights reserved.