What are the common techniques for Python data preprocessing

2025-02-23 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01

This article introduces common techniques for Python data preprocessing. Dirty data of the kind shown here comes up often in real projects, so the examples below walk through how to handle each case.

Data set

This is a sample DataFrame containing dirty data.

Let's see what can be done to make this dataset clean.

The first column is redundant and should be deleted.

The Date column does not follow a consistent format.

The Name column is written as last name, first name, with inconsistent uppercase and lowercase letters.

Payment represents a quantity, but the values are stored as strings and need to be converted.

Note contains some non-alphanumeric characters that should be deleted.
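The original sample table is not reproduced here, so the following is only a minimal sketch of a DataFrame with the same five problems. Every value in it is hypothetical and chosen for illustration; only the column names (Unnamed: 0, Date, Name, Payment, Note) come from the article.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the article.
df = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 2],                                  # redundant column
        "Date": ["2021-10-04", "10/05/2021", "Oct 6, 2021"],      # inconsistent formats
        "Name": ["DOE, John", "smith, jane", "Brown, ALEX"],      # last, first; mixed case
        "Payment": ["$120,50", "$80", "$45"],                     # numbers stored as text
        "Note": ["Unhappy!", "Satisfied :)", "Neutral0"],         # stray characters
    }
)

# Inspect how pandas stores each column before any cleaning.
print(df.dtypes)
print(df)
```

Printing the dtypes first is a quick way to confirm that Date and Payment are held as text rather than as dates or numbers.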

Example 1

Deleting a column is a simple operation with the drop function. In addition to the column name, we need to specify the axis parameter, because drop can delete either rows or columns. Finally, the inplace parameter applies the change to the DataFrame itself instead of returning a modified copy.
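As a minimal sketch of the alternative to inplace, assuming a small hypothetical DataFrame: without inplace=True, drop returns a new DataFrame and leaves the original untouched, which can be easier to reason about in longer pipelines.

```python
import pandas as pd

# Hypothetical two-column frame with a redundant index-like column.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "Note": ["Unhappy", "Neutral"]})

# axis=1 tells drop to look for the label among the columns, not the rows.
trimmed = df.drop("Unnamed: 0", axis=1)

# Equivalent spelling that names the axis through the columns keyword.
trimmed_kw = df.drop(columns=["Unnamed: 0"])

print(list(trimmed.columns))  # the redundant column is gone
print(list(df.columns))       # the original frame is unchanged
```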

import pandas as pd
df.drop("Unnamed: 0", axis=1, inplace=True)

Example 2

We have several options for converting date values to a proper format. A simple one is to use the astype function to change the data type of the column.

It can handle a wide range of input values and convert them to a clean, standard date format.

df["Date"] = df["Date"].astype("datetime64[ns]")
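When a column mixes several date styles, how the conversion behaves can depend on the pandas version. As a hedged alternative sketch (not the article's method), each value can be parsed on its own with pd.to_datetime. The sample strings are hypothetical, and an ambiguous value such as 10/05/2021 is read month-first by default.

```python
import pandas as pd

# Hypothetical date strings written in three different styles.
raw = pd.Series(["2021-10-04", "10/05/2021", "Oct 6, 2021"])

# Parse each element independently so one style does not constrain the others.
parsed = raw.apply(pd.to_datetime)

print(parsed)
```

Parsing element by element is slower than a vectorized conversion, so it is best kept for small or genuinely inconsistent columns.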

Example 3

For the Name column, we need to solve the following problems:

First, the names should use a consistent case, either all uppercase or all lowercase. Another option is to capitalize them (that is, only the initial letters are uppercase).

Second, the order of last name and first name should be switched. The split function separates the two parts at the comma:

df["Name"].str.split(",", expand=True)

Then, I will combine the second column with the first column, with a space in the middle. The final step is to use the lower function to convert letters to lowercase.

df["Name"] = (df["Name"].str.split(",", expand=True)[1] + " " + df["Name"].str.split(",", expand=True)[0]).str.lower()
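The same steps can be sketched on a standalone Series. The names below are hypothetical and assume a "Last, First" layout with a space after the comma, which is why strip is added here; it is not part of the article's one-liner. Splitting once and reusing the result also avoids calling split twice.

```python
import pandas as pd

# Hypothetical names in "Last, First" form with inconsistent casing.
names = pd.Series(["DOE, John", "smith, jane", "Brown, ALEX"])

# Split once at the comma; column 0 is the last name, column 1 the first name.
parts = names.str.split(",", expand=True)

# Strip stray spaces, put the first name first, then normalize to lowercase.
full = (parts[1].str.strip() + " " + parts[0].str.strip()).str.lower()

# Optional variant: capitalize only the initial of each word.
capitalized = full.str.title()

print(full.tolist())         # ['john doe', 'jane smith', 'alex brown']
print(capitalized.tolist())  # ['John Doe', 'Jane Smith', 'Alex Brown']
```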

Example 4

The data type of the Payment column cannot be used for numerical analysis. Before converting it to a numeric data type (that is, an integer or a float), we need to remove the dollar sign and replace the comma in the first row with a dot.

We can do all of this in one line of code using Pandas:

df["Payment"] = df["Payment"].str.replace("$", "", regex=False).str.replace(",", ".", regex=False).astype("float")
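A minimal sketch of the conversion on hypothetical values, assuming the comma is a decimal separator as the text above describes. Passing regex=False matters for the dollar sign, because $ is a special character in regular expressions.

```python
import pandas as pd

# Hypothetical payments stored as text, one with a comma as decimal separator.
payment = pd.Series(["$120,50", "$80", "$45"])

amount = (
    payment.str.replace("$", "", regex=False)   # drop the literal dollar sign
    .str.replace(",", ".", regex=False)         # comma becomes a decimal point
    .astype("float")                            # text becomes a numeric dtype
)

print(amount.tolist())  # [120.5, 80.0, 45.0]
```

If the commas in a real dataset are thousands separators instead, they would need to be removed rather than replaced with a dot.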

Example 5

Some characters in the Note column also need to be deleted. In a large dataset, replacing them by hand is impractical.

What we can do is delete the non-alphanumeric characters (for example ?, !, -, and .). The replace function also works in this case because it accepts regular expressions (in pandas 2.0 and later, this requires passing regex=True).

If we only want alphabetic characters, here's how we use substitution functions:

df["Note"].str.replace('[^a-zA-Z]', '', regex=True)

0      Unhappy
1    Satisfied
2      Neutral
3      Unhappy
4      Neutral
Name: Note, dtype: object

If we want letters and numbers (that is, alphanumeric), we need to add numbers to our regular expression:

df["Note"].str.replace('[^a-zA-Z0-9]', '', regex=True)

0      Unhappy
1    Satisfied
2      Neutral
3      Unhappy
4     Neutral0
Name: Note, dtype: object

Note that the 0 in the last row is not deleted this time. Since I only want letters, I will use the first option. I also want to convert the letters to lowercase after deleting the unwanted characters:

df["Note"] = df["Note"].str.replace('[^a-zA-Z]', '', regex=True).str.lower()
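The difference between the two patterns can be checked on a small standalone Series. The note values below are hypothetical, and regex=True is passed explicitly so the pattern is treated as a regular expression regardless of the pandas version.

```python
import pandas as pd

# Hypothetical notes containing punctuation, spaces, and a stray digit.
note = pd.Series(["Unhappy!", "Satisfied :)", "Neutral", "Unhappy-", "Neutral0"])

# Keep letters only: everything that is not a-z or A-Z is removed.
letters_only = note.str.replace(r"[^a-zA-Z]", "", regex=True)

# Keep letters and digits: the trailing 0 in the last row survives.
alphanumeric = note.str.replace(r"[^a-zA-Z0-9]", "", regex=True)

# Chain lower() to normalize the case after cleaning.
cleaned = letters_only.str.lower()

print(letters_only.tolist())
print(alphanumeric.tolist())
print(cleaned.tolist())
```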

The dataset now looks much better than it did initially. Of course, it is a simple dataset, but the same cleanup operations will help when dealing with large datasets.

That concludes this overview of common techniques for Python data preprocessing. Thank you for reading.
