What are the commonly used exploratory data analysis methods in Python

2025-04-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces exploratory data analysis methods that are commonly used in Python. It is intended as a practical reference, and readers who are interested can follow along with the examples.

There are many common exploratory data analysis methods, such as .head(), .tail(), .info(), .plot(), and .value_counts(). The methods below are demonstrated on the following example DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Student": ["Mike", "Jack", "Diana", "Charles", "Philipp", "Charles", "Kale", "Jack"],
    "City": ["London", "London", "Berlin", "London", "London", "Berlin", "London", "Berlin"],
    "Age": [20, 40, 18, 24, 37, 40, 44, 20],
    "Maths_Score": [84, 80, 50, 36, 44, 24, 41, 35],
    "Science_Score": [66, 83, 51, 35, 43, 58, 71, 65],
})
df

Create a groupby() object in Pandas

In many cases, we want to split the dataset into multiple groups and process those groups. The Pandas method groupby() is used to group the data in a DataFrame.

Instead of chaining groupby() and an aggregation method every time, create a groupby() object once. We can then reuse this object directly whenever it is needed.

Let's group the given DataFrame by the column "City":

df_city_group = df.groupby("City")

This creates an object, df_city_group, that can be combined with different aggregations, such as min(), max(), mean(), describe(), and count(). For example, df_city_group["Maths_Score"].mean() returns the average maths score for each city.
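As a minimal sketch of how the grouped object can be reused, the snippet below rebuilds the example DataFrame so that it runs on its own and then applies a few aggregations to the same df_city_group. The variable names maths_mean, student_count, and maths_summary are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained
df = pd.DataFrame({
    "Student": ["Mike", "Jack", "Diana", "Charles", "Philipp", "Charles", "Kale", "Jack"],
    "City": ["London", "London", "Berlin", "London", "London", "Berlin", "London", "Berlin"],
    "Age": [20, 40, 18, 24, 37, 40, 44, 20],
    "Maths_Score": [84, 80, 50, 36, 44, 24, 41, 35],
    "Science_Score": [66, 83, 51, 35, 43, 58, 71, 65],
})

# Create the groupby object once...
df_city_group = df.groupby("City")

# ...and reuse it with different aggregations
maths_mean = df_city_group["Maths_Score"].mean()       # mean maths score per city
student_count = df_city_group["Student"].count()       # number of rows per city
maths_summary = df_city_group["Maths_Score"].agg(["min", "max", "mean"])

print(maths_mean)
print(student_count)
print(maths_summary)
```

Selecting a numeric column before aggregating avoids trying to average the text columns "Student" and "City".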

To get the subset of the DataFrame where "City" is Berlin, simply use the method .get_group().

This eliminates the need to create a separate copy of the sub-DataFrame for each group, which saves memory.

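In addition, slicing through the groupby() object can be about twice as fast as the conventional method (a Boolean filter such as df[df["City"] == "Berlin"]), although the exact speedup depends on the data.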
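The following sketch shows .get_group() next to the conventional Boolean filter, assuming the same example data (rebuilt here with only the columns needed). The names berlin_group and berlin_mask are illustrative. It only checks that both approaches select the same rows; it does not measure speed.

```python
import pandas as pd

# Reduced copy of the example DataFrame, enough for this illustration
df = pd.DataFrame({
    "Student": ["Mike", "Jack", "Diana", "Charles", "Philipp", "Charles", "Kale", "Jack"],
    "City": ["London", "London", "Berlin", "London", "London", "Berlin", "London", "Berlin"],
    "Maths_Score": [84, 80, 50, 36, 44, 24, 41, 35],
})

df_city_group = df.groupby("City")

# Subset for one group, taken from the existing groupby object
berlin_group = df_city_group.get_group("Berlin")

# Conventional approach: filter with a Boolean mask
berlin_mask = df[df["City"] == "Berlin"]

# Both approaches pick the same rows of the original DataFrame
print(berlin_group.index.tolist())
print(berlin_mask.index.tolist())
```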

Use .nlargest()

Often we want to know the Top 3 or Top 5 rows of a DataFrame based on the values of a particular column. For example, the three highest scorers in an exam, or the five most-watched movies in a dataset. Using Pandas .nlargest() is the easiest way.

df.nlargest(N, column_name, keep='first')

Using the .nlargest() method, you can retrieve the DataFrame rows that contain the Top N values of the specified column.

Using the example DataFrame above, let's get the three rows with the highest "Maths_Score".
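A minimal sketch of this call, assuming the example data from earlier (rebuilt here with only the columns needed); top3 is an illustrative name:

```python
import pandas as pd

# Reduced copy of the example DataFrame
df = pd.DataFrame({
    "Student": ["Mike", "Jack", "Diana", "Charles", "Philipp", "Charles", "Kale", "Jack"],
    "Maths_Score": [84, 80, 50, 36, 44, 24, 41, 35],
})

# Rows with the three highest values in "Maths_Score", highest first
top3 = df.nlargest(3, "Maths_Score")
print(top3)
```

Here the three rows returned are those of Mike (84), Jack (80), and Diana (50).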

If there is a tie between two values, you can adjust the optional keep parameter. It accepts the values "first", "last", and "all" to retrieve the first, the last, or all of the tied rows. The advantage of this approach is that you don't need to sort the DataFrame yourself.
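To illustrate the keep parameter, the sketch below uses the "Age" column of the example data, where two students share the second-highest age of 40. The variable names are illustrative, and the data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Reduced copy of the example DataFrame; two students are aged 40
df = pd.DataFrame({
    "Student": ["Mike", "Jack", "Diana", "Charles", "Philipp", "Charles", "Kale", "Jack"],
    "Age": [20, 40, 18, 24, 37, 40, 44, 20],
})

# Ask for the two oldest students; second place is a tie at 40
oldest_first = df.nlargest(2, "Age", keep="first")  # keeps the first tied row
oldest_last = df.nlargest(2, "Age", keep="last")    # keeps the last tied row
oldest_all = df.nlargest(2, "Age", keep="all")      # keeps every tied row

print(oldest_first)
print(oldest_last)
print(oldest_all)
```

With keep="all" the result can contain more than N rows, because every row tied for the last place is retained.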

Use .nsmallest()

Similar to the Top 3 or Top 5, sometimes we need the bottom five records in a DataFrame. For example, the five lowest-rated movies, or the five students with the lowest exam scores. Using Pandas .nsmallest() is the easiest way.

df.nsmallest(N, column_name, keep='first')

Using the .nsmallest() method, you can retrieve the DataFrame rows that contain the bottom N values of the specified column.

In the same example, let's get the three rows with the lowest "Maths_Score" in the DataFrame df.
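A minimal sketch of the same idea with .nsmallest(), again assuming the example data (rebuilt here with only the columns needed); bottom3 is an illustrative name:

```python
import pandas as pd

# Reduced copy of the example DataFrame
df = pd.DataFrame({
    "Student": ["Mike", "Jack", "Diana", "Charles", "Philipp", "Charles", "Kale", "Jack"],
    "Maths_Score": [84, 80, 50, 36, 44, 24, 41, 35],
})

# Rows with the three lowest values in "Maths_Score", lowest first
bottom3 = df.nsmallest(3, "Maths_Score")
print(bottom3)
```

The rows returned have the scores 24, 35, and 36, in ascending order.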

Logical comparison

The comparison operators <, >, <=, >=, ==, != and their respective wrappers .lt(), .gt(), .le(), .ge(), .eq(), and .ne() are convenient in the following scenarios:

1. Comparing the DataFrame to a base value. The result is a set of Boolean values that can be used as an indicator later.

2. Slicing the DataFrame based on a comparison. You can extract a subset from the DataFrame based on a comparison with a value.

3. Creating a new column in the existing DataFrame based on the comparison of two columns.

All of these scenarios are shown in the following example:

# 1. Comparing the DataFrame to a base value
# Select only the columns with numerical values (Age, Maths_Score, Science_Score)
df.iloc[:, 2:5].gt(50)
df.iloc[:, 2:5].lt(50)

# 2. Slicing the DataFrame based on comparison
# df1 holds the rows where "Maths_Score" is not equal to 35; df2 holds the rows where it equals 35
df1 = df[df["Maths_Score"].ne(35)]
df2 = df[df["Maths_Score"].eq(35)]

# 3. Creating new columns of True/False values by comparing two columns
# Both lines express the same condition: Maths_Score >= Science_Score
df["Maths_Student"] = df["Maths_Score"].ge(df["Science_Score"])
df["Maths_Student_1"] = df["Science_Score"].le(df["Maths_Score"])

Thank you for reading. I hope this article on commonly used exploratory data analysis methods in Python is helpful to you.
