How to Implement Data Visualization with Python

2025-01-17 Update From: SLTechnology News&Howtos

This article introduces how to implement data visualization in Python. Many people run into questions about this in day-to-day work, so the methods collected here are meant to be simple and easy to use. Let's walk through them.

Exploratory data analysis (EDA) is an important part of a data science or machine learning pipeline. To create a robust and valuable product from data, you need to study the data and understand both the relationships between variables and the underlying structure of the data. Data visualization is one of the most effective tools in EDA.

We will create several different visualizations and try to introduce a feature of the Matplotlib or Seaborn library in each one.

We first import the relevant libraries and read the dataset into a pandas DataFrame.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set(style='darkgrid')

    # Jupyter/Colab magic: render figures inline in the notebook
    %matplotlib inline

    df = pd.read_csv("/content/Churn_Modelling.csv")
    df.head()

The dataset contains 10,000 customers (that is, rows) and 14 features about the customers of a bank and their products. The goal is to use these features to predict whether a customer will churn (that is, Exited = 1).

Let's start with catplot, the categorical plot of the Seaborn library.

    sns.catplot(x='Gender', y='Age', data=df, hue='Exited', height=8, aspect=1.2)

The plot suggests that people between the ages of 45 and 60 are more likely to churn than people in other age groups. There is not much difference between women and men.

The hue parameter is used to distinguish data points based on a categorical variable.
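The effect of hue can be tried without the bank dataset. The following is a minimal sketch on a small synthetic DataFrame; the name toy and its random values are assumptions made for illustration only and say nothing about the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the bank data (values are random, for illustration only)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Gender": rng.choice(["Female", "Male"], size=n),
    "Age": rng.integers(18, 80, size=n),
    "Exited": rng.integers(0, 2, size=n),
})

# Each level of the hue column gets its own color and a legend entry
g = sns.catplot(x="Gender", y="Age", hue="Exited", data=toy, height=4, aspect=1.2)
hue_levels = sorted(toy["Exited"].unique())
```

Here hue_levels holds the distinct categories that Seaborn maps to colors, one per value of Exited.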

The next visualization is the scatter plot, which shows the relationship between two numeric variables. Let's see whether a customer's estimated salary is related to the account balance.

    plt.figure(figsize=(12, 8))
    plt.title("EstimatedSalary vs Balance", fontsize=16)
    sns.scatterplot(x='Balance', y='EstimatedSalary', data=df)

We first used the matplotlib.pyplot interface to create the Figure object and set the title. Then we used Seaborn to draw the actual plot on that Figure.
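The same idea can be written with an explicit Axes object, which makes it clearer where Seaborn draws. This is a sketch of an alternative style, not the approach used in the article's code; the synthetic DataFrame toy is an assumption for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (random values, for illustration only)
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "Balance": rng.normal(100_000, 30_000, size=300),
    "EstimatedSalary": rng.uniform(10_000, 200_000, size=300),
})

# Create the Figure and Axes explicitly, then hand the Axes to Seaborn
fig, ax = plt.subplots(figsize=(12, 8))
ax.set_title("EstimatedSalary vs Balance", fontsize=16)
returned_ax = sns.scatterplot(x="Balance", y="EstimatedSalary", data=toy, ax=ax)
```

Axes-level Seaborn functions such as scatterplot return the Axes they drew on, so later styling can continue with ordinary Matplotlib methods on ax.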

Result: there is no meaningful relationship or correlation between estimated salary and balance. The balance appears to be normally distributed (excluding customers with a zero balance).

The next visualization is the box plot, which shows the distribution of a variable in terms of its median and quartiles.

    plt.figure(figsize=(12, 8))
    ax = sns.boxplot(x='Geography', y='Age', data=df)
    ax.set_xlabel("Country", fontsize=16)
    ax.set_ylabel("Age", fontsize=16)

We also set the text and font size of the x-axis and y-axis labels using set_xlabel and set_ylabel.

A box plot is structured as follows:

The median is the middle point when all points are sorted. Q1 (the first or lower quartile) is the median of the lower half of the dataset. Q3 (the third or upper quartile) is the median of the upper half of the dataset.
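These definitions can be worked through by hand. The sketch below implements the "median of each half" rule described above on a small made-up list of ages (the values and the helper name quartiles are assumptions for illustration). It then applies the common 1.5 × IQR rule to flag outliers, which is the default whisker rule in Matplotlib and Seaborn box plots. Note that those libraries compute quartiles with interpolated percentiles, so their values can differ slightly from this method.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For an odd number of points, leave the middle point out of both halves
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)

ages = [22, 25, 29, 31, 35, 38, 41, 44, 47, 92]  # made-up values
q1, q2, q3 = quartiles(ages)

# Points further than 1.5 * IQR from the box are treated as outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(q1, q2, q3)   # 29 36.5 44
print(outliers)     # [92]
```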

A box plot therefore gives us an idea of both the distribution and the outliers. In the box plot we created, there are many outliers (represented by dots) at the top.

The distribution of the Age variable is skewed to the right. Because of the outliers on the upper side, the mean is greater than the median.
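The link between upper-side outliers and a mean above the median can be checked numerically. A minimal sketch with made-up ages (not taken from the dataset):

```python
import pandas as pd

# Mostly moderate ages plus a few high values on the upper side (made-up data)
ages = pd.Series([30, 32, 35, 36, 38, 40, 42, 75, 80, 85])

mean_age = ages.mean()      # pulled upward by the three high values
median_age = ages.median()  # unaffected by how extreme those values are
skewness = ages.skew()      # positive for a right-skewed sample

print(mean_age, median_age)  # 49.3 39.0
```

The three large values raise the mean to 49.3 while the median stays at 39.0, and the sample skewness is positive, which is the pattern described for Age.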

The right skew can also be observed in the univariate distribution of the variable. Let's create a distplot to look at the distribution.

    plt.figure(figsize=(12, 8))
    plt.title("Distribution of Age", fontsize=16)
    # Note: distplot is deprecated since seaborn 0.11; kdeplot is the current equivalent
    sns.distplot(df['Age'], hist=False)

The tail on the right is heavier than the one on the left. This is caused by the outliers we observed in the box plot.

By default, distplot also draws a histogram, but we turned it off with the hist parameter.

The Seaborn library also provides pair plots, which give an overview of the pairwise relationships between variables. Let's first take a random sample from the dataset to make the plots easier to read. The original dataset has 10,000 observations; we will select a sample of 100 observations and 4 features.

    subset = df[['CreditScore', 'Age', 'Balance', 'EstimatedSalary']].sample(n=100)
    g = sns.pairplot(subset, height=2.5)

On the diagonal, we see the histogram of each variable. The rest of the grid shows the pairwise relationships between the variables.
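Because the sample is drawn at random, the pair plot changes on every run. If a reproducible figure is wanted, pandas' sample accepts a random_state seed. A minimal sketch on a synthetic DataFrame (toy and its values are assumptions for illustration; the seed 42 is arbitrary):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the same four column names (random values)
rng = np.random.default_rng(3)
toy = pd.DataFrame({
    "CreditScore": rng.integers(350, 851, size=1000),
    "Age": rng.integers(18, 93, size=1000),
    "Balance": rng.uniform(0, 250_000, size=1000),
    "EstimatedSalary": rng.uniform(10_000, 200_000, size=1000),
})

cols = ["CreditScore", "Age", "Balance", "EstimatedSalary"]

# The same seed selects the same 100 rows every time
subset_a = toy[cols].sample(n=100, random_state=42)
subset_b = toy[cols].sample(n=100, random_state=42)

print(subset_a.shape)             # (100, 4)
print(subset_a.equals(subset_b))  # True
```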

Another tool for observing pairwise relationships is the heat map, which takes a matrix and produces a color-coded plot. Heat maps are mainly used to check the correlation between features and the target variable.

Let's first create a correlation matrix for some of the features using the corr function of pandas.

    corr_matrix = df[['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary', 'Exited']].corr()

We can now draw this matrix.

    plt.figure(figsize=(12, 8))
    sns.heatmap(corr_matrix, cmap='Blues_r', annot=True)

The heat map shows that the Age and Balance columns are positively correlated with customer churn.
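The same reading can be taken straight from the matrix by pulling out the target column and sorting it. The sketch below uses synthetic data in which Exited is deliberately built from Age, so it only illustrates the mechanics and says nothing about the real dataset; all names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data: Exited depends on Age plus noise, the other columns are unrelated
rng = np.random.default_rng(0)
n = 1000
age = rng.integers(18, 80, size=n)
toy = pd.DataFrame({
    "Age": age,
    "Balance": rng.uniform(0, 250_000, size=n),
    "EstimatedSalary": rng.uniform(10_000, 200_000, size=n),
    "Exited": (age + rng.normal(0, 10, size=n) > 55).astype(int),
})

# Pearson correlation between every pair of columns
corr_matrix = toy.corr()

# Correlation of each feature with the target, highest first
ranked = corr_matrix["Exited"].drop("Exited").sort_values(ascending=False)
print(ranked)
```

Sorting by the signed value puts the strongest positive correlations first; to rank by strength regardless of sign, sort the absolute values instead.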

As the amount of data grows, analyzing and exploring it becomes more and more difficult. Visualization is an important tool in exploratory data analysis, and it is powerful when used effectively and appropriately. Visualization also helps you convey a message to your audience or tell them what you have found.

No single visualization method fits every case, so different tasks call for different types of visualization. What all visualizations have in common is that they are good tools for exploratory data analysis and for storytelling in data science.

That concludes this walkthrough of data visualization in Python. Combining theory with practice is the best way to learn, so go and try these plots yourself.