What Are Some Practical Tips for Pandas Data Analysis?

2025-04-06 update. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article shares five practical tips for data analysis with Pandas. The editor finds them very useful and shares them here for reference.

Tip 1: How do you use map to do feature engineering on a column?

First, the data:

import pandas as pd
d = {"gender": ["male", "female", "male", "female"], "color": ["red", "green", "blue", "green"], "age": [25, 30, 15, 32]}
df = pd.DataFrame(d)
df

Apply the map method to the gender column to encode it with the following mapping:

d = {"male": 0, "female": 1}
df["gender2"] = df["gender"].map(d)
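A minimal self-contained sketch of the same mapping. It also shows one standard behavior of Series.map that the example data does not trigger: a value with no key in the dictionary becomes NaN. The extra "other" row is made up purely to demonstrate this.

```python
import pandas as pd

# "other" is a hypothetical value with no entry in the mapping
df = pd.DataFrame({"gender": ["male", "female", "male", "female", "other"]})

# Series.map looks each value up in the dict; missing keys map to NaN
mapping = {"male": 0, "female": 1}
df["gender2"] = df["gender"].map(mapping)

print(df)
```

Because of the NaN, gender2 ends up as a float column here. If unmapped values should be kept unchanged instead, Series.replace with the same dictionary leaves them alone.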


Tip 2: Clean data using replace and regular expressions

Pandas excels at data analysis, and data cleaning is an essential part of any analysis.

A quick cleaning tip: apply the replace method with a regular expression to a column to clean up its values.

Source data:

d = {"customer": ["A", "B", "C", "D"], "sales": [1100, "950.5RMB", "$400", "$1250.75"]}
df = pd.DataFrame(d)
df

Print the results:

  customer     sales
0        A      1100
1        B  950.5RMB
2        C      $400
3        D  $1250.75

The sales column mixes several forms: a plain integer, a float with an "RMB" suffix (which makes it a string), and dollar-prefixed integers and floats (also strings).

Our goal: strip the "RMB" and "$" markers and convert the column to float.

One line of code does it:

df["sales"] = df["sales"].replace("[$, RMB]", "", regex=True).astype("float")

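The pattern [$, RMB] is a regular-expression character class: it matches any single character among $, comma, space, R, M, and B, and each match is replaced with the empty string "". Non-string values such as the integer 1100 are left as they are.

Finally, astype converts the column to float.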
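An aside on the pattern itself: because [$, RMB] removes those characters one at a time, it can silently "repair" a malformed value. The sketch below contrasts it with an alternation pattern, \$|RMB, which removes only a literal "$" or the whole substring "RMB". That alternative pattern and the malformed value "BMR7" are our own illustrative assumptions, not part of the original tip.

```python
import pandas as pd

# "BMR7" is a made-up malformed value, added only for illustration
s = pd.Series(["950.5RMB", "$400", "BMR7"])

# Character class: deletes every $, comma, space, R, M or B on its own
by_chars = s.replace("[$, RMB]", "", regex=True).tolist()

# Alternation: deletes only a literal "$" or the whole substring "RMB"
by_tokens = s.replace(r"\$|RMB", "", regex=True).tolist()

print(by_chars)   # the malformed value silently becomes "7"
print(by_tokens)  # the malformed value survives, so astype("float") would raise
```

Which behavior is preferable depends on the data: the character class also tolerates thousands separators and stray spaces, while the alternation surfaces unexpected values as conversion errors.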

Print the results:

  customer    sales
0        A  1100.00
1        B   950.50
2        C   400.00
3        D  1250.75

To double-check, inspect the type of each value:

df["sales"].apply(type)

Print the results:

0    <class 'float'>
1    <class 'float'>
2    <class 'float'>
3    <class 'float'>
Name: sales, dtype: object

Tip 3: How do you use melt to reshape data for analysis?

Construct a DataFrame:

d = {
    "district_code": [12345, 56789, 101112, 131415],
    "apple": [5.2, 2.4, 4.2, 3.6],
    "banana": [3.5, 1.9, 4.0, 2.3],
    "orange": [8.0, 7.5, 6.4, 3.9],
}
df = pd.DataFrame(d)
df

Print the results:

   district_code  apple  banana  orange
0          12345    5.2     3.5     8.0
1          56789    2.4     1.9     7.5
2         101112    4.2     4.0     6.4
3         131415    3.6     2.3     3.9

Each cell is a price; for example, 5.2 is the price of apples in district 12345. The apple, banana, and orange columns are all fruits, so how can these three columns be merged into a single column?

Use melt, available both as pd.melt and as the DataFrame.melt method.

For this example, the parameters are set as follows:

df = df.melt(
    id_vars="district_code",
    var_name="fruit_name",
    value_name="price",
)
df

Print the results:

    district_code fruit_name  price
0           12345      apple    5.2
1           56789      apple    2.4
2          101112      apple    4.2
3          131415      apple    3.6
4           12345     banana    3.5
5           56789     banana    1.9
6          101112     banana    4.0
7          131415     banana    2.3
8           12345     orange    8.0
9           56789     orange    7.5
10         101112     orange    6.4
11         131415     orange    3.9

The result above is the long-format DataFrame; the original DataFrame it came from is in wide format.
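To make the wide/long relationship concrete, here is a minimal sketch that melts the fruit table and then restores the wide shape with DataFrame.pivot. The pivot step is our addition for illustration; the original tip only covers melt.

```python
import pandas as pd

wide = pd.DataFrame({
    "district_code": [12345, 56789, 101112, 131415],
    "apple": [5.2, 2.4, 4.2, 3.6],
    "banana": [3.5, 1.9, 4.0, 2.3],
    "orange": [8.0, 7.5, 6.4, 3.9],
})

# Wide -> long: one row per (district, fruit) pair
long_df = wide.melt(id_vars="district_code", var_name="fruit_name", value_name="price")

# Long -> wide: pivot spreads fruit_name back out into columns
back = (long_df.pivot(index="district_code", columns="fruit_name", values="price")
               .reset_index())
back.columns.name = None  # drop the leftover "fruit_name" axis label

print(long_df.shape)  # 4 districts x 3 fruits = 12 rows
print(back)
```

The round trip works here because each (district_code, fruit_name) pair occurs once; with duplicate pairs, pivot raises and pivot_table with an aggregation function is needed instead.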

Tip 4: Given the year and the day of the year, how do you convert them to a datetime?

The original DataFrame:

d = {"year": [2019, 2019, 2020], "day_of_year": [350, 365, 1]}
df = pd.DataFrame(d)
df

Print the results:

   year  day_of_year
0  2019          350
1  2019          365
2  2020            1

The conversion takes two steps.

Step 1: combine the two columns into a single integer of the form YYYYDDD

df["int_number"] = df["year"] * 1000 + df["day_of_year"]

Print the results:

   year  day_of_year  int_number
0  2019          350     2019350
1  2019          365     2019365
2  2020            1     2020001

Step 2: parse the integer with to_datetime

df["date"] = pd.to_datetime(df["int_number"], format="%Y%j")

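Note the %j directive in the format string "%Y%j": it matches the day of the year as a zero-padded three-digit number (001 to 366). Multiplying the year by 1000 in step 1 guarantees the day always occupies exactly three digits, e.g. 2020001 for 1 January 2020.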
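A self-contained sketch of both steps, plus an alternative that avoids string parsing altogether: take January 1 of each year and add (day_of_year - 1) days as a timedelta. Converting the integer to str before parsing and the timedelta approach are our own additions, not the article's method.

```python
import pandas as pd

df = pd.DataFrame({"year": [2019, 2019, 2020], "day_of_year": [350, 365, 1]})

# Article's approach: build YYYYDDD and parse it with %Y%j.
# Casting to str first is an extra precaution added in this sketch.
yyyyddd = (df["year"] * 1000 + df["day_of_year"]).astype(str)
via_format = pd.to_datetime(yyyyddd, format="%Y%j")

# Alternative: January 1 of the year plus (day_of_year - 1) days
jan_first = pd.to_datetime(df["year"].astype(str), format="%Y")
via_offset = jan_first + pd.to_timedelta(df["day_of_year"] - 1, unit="D")

print(via_format.dt.strftime("%Y-%m-%d").tolist())
print((via_format == via_offset).all())
```

The timedelta version handles leap years automatically, since the offset is counted from an actual calendar date rather than parsed.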

Print the results:

   year  day_of_year  int_number       date
0  2019          350     2019350 2019-12-16
1  2019          365     2019365 2019-12-31
2  2020            1     2020001 2020-01-01

Tip 5: How do you group rarely occurring categories into "Others"?

This is a common task in data cleaning and feature construction.

Take the following DataFrame:

d = {"name": ["Jone", "Alica", "Emily", "Robert", "Tomas", "Zhang", "Liu", "Wang", "Jack", "Wsx", "Guo"], "categories": ["A", "C", "A", "D", "A", "B", "B", "C", "A", "E", "F"]}
df = pd.DataFrame(d)
df

Results:

      name categories
0     Jone          A
1    Alica          C
2    Emily          A
3   Robert          D
4    Tomas          A
5    Zhang          B
6      Liu          B
7     Wang          C
8     Jack          A
9      Wsx          E
10     Guo          F

D, E, and F each appear only once, while A appears four times.

Step 1: count how often each category occurs, normalized to proportions

frequencies = df["categories"].value_counts(normalize=True)
frequencies

Results:

A    0.363636
B    0.181818
C    0.181818
F    0.090909
E    0.090909
D    0.090909
Name: categories, dtype: float64

Step 2: set a threshold and select the categories whose frequency falls below it

threshold = 0.1
small_categories = frequencies[frequencies < threshold].index
small_categories

Results:

Index(['F', 'E', 'D'], dtype='object')

Step 3: replace those categories with "Others"

df["categories"] = df["categories"].replace(small_categories, "Others")
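The three steps can be wrapped into one reusable helper. This is a sketch: group_rare is a hypothetical name, and step 3 uses Series.isin with Series.where as an equivalent way to express the replacement, instead of the replace call used above.

```python
import pandas as pd

def group_rare(s: pd.Series, threshold: float, other: str = "Others") -> pd.Series:
    """Replace categories whose relative frequency is below `threshold`.

    group_rare is a hypothetical helper that bundles the three steps of this tip.
    """
    frequencies = s.value_counts(normalize=True)        # Step 1: proportions
    small = frequencies[frequencies < threshold].index  # Step 2: rare categories
    # Step 3: keep values that are not rare, substitute `other` for the rest
    return s.where(~s.isin(small), other)

cats = pd.Series(["A", "C", "A", "D", "A", "B", "B", "C", "A", "E", "F"])
print(group_rare(cats, threshold=0.1).tolist())
```

Raising the threshold to 0.2 would also fold B and C (2/11, about 0.18 each) into "Others", leaving only A.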

The DataFrame after replacement:

      name categories
0     Jone          A
1    Alica          C
2    Emily          A
3   Robert     Others
4    Tomas          A
5    Zhang          B
6      Liu          B
7     Wang          C
8     Jack          A
9      Wsx     Others
10     Guo     Others

Thank you for reading! This concludes the article on practical tips for Pandas data analysis. I hope it helps you learn something new; if you found it useful, feel free to share it.
